Object Database


Attribute Flags

These flags are used to define object attributes. See register_object_type_attrs() for more details.

kaa.db.ATTR_SIMPLE

Attribute is persisted with the object, but cannot be used with query(). It can be any Python type that is picklable.

The attribute data is stored inside an internal pickled object and does not occupy a dedicated column in the underlying database table for the object type.

kaa.db.ATTR_SEARCHABLE

Attribute can be used with query(), but the type must be one of int, float, str, unicode, buffer, or bool.

The attribute data is stored in a dedicated column in the underlying database table.

kaa.db.ATTR_INDEXED

If this flag is set, the attribute is indexed for faster queries.

Internally, an SQL index is placed on the column. Multiple ATTR_INDEXED attributes may be used in a composite index by specifying the indexes argument with register_object_type_attrs().
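
As an illustration of what a composite index does at the SQL level, here is a minimal sketch using Python's standard sqlite3 module directly. The table and index names (msg, msg_sender_recipient_idx) are hypothetical and are not necessarily the names kaa.db generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for an object type with two searchable attributes.
conn.execute("CREATE TABLE msg (id INTEGER PRIMARY KEY, sender TEXT, recipient TEXT)")
# A composite index covering both columns, as the indexes argument would request.
conn.execute("CREATE INDEX msg_sender_recipient_idx ON msg (sender, recipient)")

def plan_for(sql, params=()):
    """Return SQLite's query plan details for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)

plan = plan_for("SELECT id FROM msg WHERE sender = ? AND recipient = ?", ("a", "b"))
print(plan)
```

The printed plan should mention the composite index, showing that a query constraining both columns can be answered from it.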

kaa.db.ATTR_IGNORE_CASE

Queries on this attribute are case-insensitive; however, when the attribute is accessed from the ObjectRow, the original case is preserved.

Attributes with this flag require roughly double the space in the database, because two copies are kept (one in lower case for searching, and one in the original case).
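
The two-copy approach can be sketched with plain sqlite3 as follows. This is only an illustration of the idea; the table layout and the name_lower column are assumptions, not the schema kaa.db actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one column keeps the original text, one keeps a lower-case copy.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, name_lower TEXT)")

def add_person(name):
    """Store the original value together with its lower-case search copy."""
    conn.execute(
        "INSERT INTO person (name, name_lower) VALUES (?, ?)",
        (name, name.lower()),
    )

def find_person(name):
    """Search against the lower-case copy, but return the original value."""
    rows = conn.execute(
        "SELECT name FROM person WHERE name_lower = ?", (name.lower(),)
    )
    return [row[0] for row in rows]

add_person("Stewie Griffin")
matches = find_person("STEWIE griffin")
print(matches)
```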

kaa.db.ATTR_INVERTED_INDEX

Values for this attribute are parsed into terms and individual terms can be searched to find the object.

When it’s registered, the attribute must also be associated with a registered inverted index.

kaa.db.ATTR_INDEXED_IGNORE_CASE

A bitmap of ATTR_INDEXED and ATTR_IGNORE_CASE. Provided for convenience and code readability.

Classes

class kaa.db.Database(dbfile)

Open a database, creating one if it doesn’t already exist.

Parameters: dbfile (str) – path to the database file

SQLite is used to provide the underlying database.

Synopsis

Class Hierarchy

kaa.db.Database

Methods
add() – Add an object to the database.
commit() – Explicitly commit any changes made to the database.
delete() – Delete the specified object.
delete_by_query() – Delete all objects returned by the given query.
get() – Fetch the given object from the database.
get_db_info() – Return information about the database.
get_inverted_index_terms() – Obtain terms used by objects for an inverted index.
get_metadata() – Fetch metadata previously set by set_metadata().
query() – Query the database for objects matching all of the given keyword attributes.
query_one() – Like query() but returns a single object only.
register_inverted_index() – Registers a new inverted index with the database.
register_object_type_attrs() – Register one or more object attributes and/or multi-column indexes for the given type name.
reparent() – Change the parent of an object.
retype() – Convert the object to a new type.
set_metadata() – Associate simple key/value pairs with the database.
update() – Update attributes for an existing object in the database.
upgrade_to_py3()
vacuum() – Cleans up the database, removing unused inverted index terms.
Properties
filename (read-only) – Full path to the database file.
lazy_commit (read/write) – The interval after which any changes made to the database will be automatically committed, or None to require explicit committing. (Default is None.)
readonly (read-only)
Signals
This class has no signals.

Methods

add(object_type, parent=None, **attrs)

Add an object to the database.

Parameters:
  • object_type (str) – the name of the object type previously created by register_object_type_attrs().
  • parent (ObjectRow or 2-tuple (object_type, object_id)) – specifies the parent of this object, if any; does not have to be an object of the same type.
  • attrs – keyword arguments specifying the attribute (which must have been registered) values. Registered attributes that are not explicitly specified here will default to None.
Returns:

ObjectRow representing the added object

For example:

import os
from kaa.db import *
db = Database('test.db')
db.register_object_type_attrs('directory',
    name = (str, ATTR_SEARCHABLE),
    mtime = (float, ATTR_SIMPLE)
)
root = db.add('directory', name='/', mtime=os.stat('/').st_mtime)
db.add('directory', parent=root, name='etc', mtime=os.stat('/etc').st_mtime)
db.add('directory', parent=root, name='var', mtime=os.stat('/var').st_mtime)

commit()

Explicitly commit any changes made to the database.

Note

Any uncommitted changes will automatically be committed at program exit.

delete(obj)

Delete the specified object.

Parameters: obj (ObjectRow or (object_type, object_id)) – the object to delete

delete_by_query(**attrs)

Delete all objects returned by the given query.

Parameters: attrs – see query() for details.
Returns: the number of objects deleted
Return type: int

get(obj)

Fetch the given object from the database.

Parameters: obj – a 2-tuple (type, id) representing the object.
Returns: ObjectRow

obj may also be an ObjectRow, but that usage is less likely to be useful, because an ObjectRow already contains all information about the object. One common use case is to reload a possibly changed object from disk.

This method is essentially shorthand for:

database.query(object=(object_type, object_id))[0]

get_db_info()

Return information about the database.

Returns: a dict

The returned dictionary has the following keys:
  • count: dict of object types holding their counts

  • total: total number of objects in the database

  • types: a dict keyed on object type which contains:
    • attrs: a dictionary of registered attributes for this type
    • idx: a list of composite indexes for this type
  • termcounts: a dict of the number of indexed terms for each inverted index

  • file: full path to the database file

get_inverted_index_terms(ivtidx, associated=None, prefix=None)

Obtain terms used by objects for an inverted index.

Parameters:
  • ivtidx (str) – the name of an inverted index previously registered with register_inverted_index().
  • associated (list of str or unicode) – a list of terms; only terms mapped to objects that also have all of the supplied associated terms are returned. If None, all terms for the inverted index are returned.
  • prefix (str or unicode) – only terms that begin with the specified prefix are returned. This is useful for auto-completion while a user is typing a query.
Returns:

a list of 2-tuples, where each tuple is (term, count). If associated is not given, count is the total number of objects that term is mapped to. Otherwise, count reflects the number of objects which have that term plus all the given associated terms. The list is sorted with the highest counts appearing first.

For example, given an otherwise empty database, if you have one object with terms ['vacation', 'hawaii'] and two other objects with terms ['vacation', 'spain'], and the associated list passed is ['vacation'], the return value will be [('spain', 2), ('hawaii', 1)].
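
The counting rule described above can be modelled in a few lines of plain Python. This is a sketch of the documented semantics only, not kaa.db's implementation; the associated_terms function and its objects argument (a list of per-object term lists) are hypothetical.

```python
from collections import Counter

def associated_terms(objects, associated=None, prefix=None):
    """Count terms over objects that contain every associated term.

    objects is a list of term lists, one per object (hypothetical input).
    """
    required = set(associated or [])
    counts = Counter()
    for terms in objects:
        terms = set(terms)
        if not required <= terms:
            continue  # object lacks at least one associated term
        # The associated terms themselves are not reported back.
        counts.update(terms - required)
    result = [
        (term, count)
        for term, count in counts.items()
        if prefix is None or term.startswith(prefix)
    ]
    # Highest counts first; ties broken alphabetically for a stable order.
    result.sort(key=lambda item: (-item[1], item[0]))
    return result

objects = [
    ["vacation", "hawaii"],
    ["vacation", "spain"],
    ["vacation", "spain"],
]
print(associated_terms(objects, associated=["vacation"]))
```

With the three objects from the example, this prints [('spain', 2), ('hawaii', 1)].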

get_metadata(key, default=None)

Fetch metadata previously set by set_metadata().

Parameters:
  • key (str) – the key name for the metadata, prefixed with appname::.
  • default – value to return if key is not found
Returns:

unicode string containing the value for this key, or the default parameter if the key was not found.

query(**attrs)

Query the database for objects matching all of the given keyword attributes.

Keyword arguments can be any previously registered ATTR_SEARCHABLE object attribute for any object type, or the name of a registered inverted index. There are some special keyword arguments:

Parameters:
  • parent (ObjectRow, 2-tuple (object_type, object_id), 2-tuple (object_type, QExpr), or a list of those) – require all matched objects to have the given object (or objects) as their immediate parent. If parent is a list or tuple, it specifies the possible parents, any of which will do.
  • object (ObjectRow or 2-tuple (object_type, object_id)) – match only a specific object. Not usually very useful, but could be used to test if the given object matches terms from an inverted index.
  • type (str) – only search items of this object type; if None (or not specified) then all types are searched
  • limit (int) – return only this number of results; if None (or not specified), all matches are returned.
  • attrs (list of str) – a list of attribute names to be returned; if not specified, all attributes registered with the object type are available in the result. Specifying only the attributes required can help performance moderately, but generally it isn’t required except with distinct below.
  • distinct – if True, ensures that each object in the result set is unique with respect to the attributes specified in the attrs parameter. When distinct is True, attrs is required and none of the attributes specified may be simple.
  • orattrs (list) – attribute names that will be ORed in the query; by default, all attributes are ANDed.
Raises:

ValueError if the query is invalid (e.g. attempting to query on a simple attribute)

Returns:

a list of ObjectRow objects

When any of the attributes are inverted indexes, the result list is sorted according to a score. The score is based upon the frequency of the matched terms relative to the entire database.

Note

If you know which type of object you’re interested in, you should specify the type as it will help improve performance by reducing the scope of the search, especially for inverted indexes.

Another significant factor in performance is whether or not a limit is specified. Query time generally scales linearly with the number of rows found. For searches on inverted indexes, specifying a limit can drastically reduce search time without affecting scoring.

Values supplied to attributes (other than inverted indexes) require exact matches. Searching based on an expression, such as inequality, ranges, substrings, or set inclusion, requires the use of a QExpr object.

Expanding on the example provided in register_object_type_attrs():

>>> db.add('msg', sender=u'Stewie Griffin', subject=u'Blast!',
           keywords='When the world is mine, your death shall be quick and painless.')
>>> # Exact match based on sender
>>> db.query(sender=u'Stewie Griffin')
[<kaa.db.ObjectRow object at 0x7f652b251030>]
>>> # Keyword search requires all keywords
>>> db.query(keywords=['death', 'blast'])
[<kaa.db.ObjectRow object at 0x7f652c3d1f90>]
>>> # This doesn't work, since it does an exact match ...
>>> db.query(sender=u'Stewie')
[]
>>> # ... but we can use QExpr to do a substring/pattern match.
>>> db.query(sender=QExpr('like', u'Stewie%'))
[<kaa.db.ObjectRow object at 0x7f652c3d1f90>]
>>> # How about a regexp search.
>>> db.query(sender=QExpr('regexp', ur'.*\bGriffin'))
[<kaa.db.ObjectRow object at 0x7f652b255030>]

query_one(**attrs)

Like query() but returns a single object only.

This is a convenience method, and query_one(...) is equivalent to:

results = db.query(...)
if results:
    obj = results[0]
else:
    obj = None

limit=1 is implied by this query.

register_inverted_index(name, min=None, max=None, split=None, ignore=None)

Registers a new inverted index with the database.

An inverted index maps arbitrary terms to objects and allows you to query based on one or more terms. If the inverted index already exists with the given parameters, no action is performed.

Parameters:
  • name (str) – the name of the inverted index; must be alphanumeric.
  • min (int) – the minimum length of terms to index; terms smaller than this will be ignored. If None (default), there is no minimum size.
  • max (int) – the maximum length of terms to index; terms larger than this will be ignored. If None (default), there is no maximum size.
  • split (callable, regexp (SRE_Pattern) object, or str) – used to parse string-based attributes using this inverted index into individual terms. In the case of regexps, the split method will be called. (If a string is specified, it will be compiled into a regexp first.) If split is a callable, it will receive a string of text and must return a sequence, and each item in the sequence will be indexed as an individual term. If split is not specified, the default is to split words at non-alphanumeric/underscore/digit boundaries.
  • ignore – a list of terms that will not be indexed (so-called stop words). If specified, each indexed term for this inverted index will first be checked against this list. If it exists, the term is discarded.

For example:

from kaa.db import *
db = Database('test.db')
db.register_inverted_index('tags')
db.register_inverted_index('keywords', min=3, max=30, ignore=STOP_WORDS)
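
To make the min, max, split, and ignore parameters more concrete, here is a rough pure-Python model of how a string might be turned into terms. The parse_terms helper is hypothetical, and the default pattern is only one reading of "non-alphanumeric/underscore/digit boundaries"; kaa.db's actual parsing may differ (for example, in how it normalizes case).

```python
import re

# Assumption: treat runs of non-word characters, underscores and digits as separators.
DEFAULT_SPLIT = re.compile(r"[\W_\d]+")

def parse_terms(text, min=None, max=None, split=None, ignore=None):
    """Split text into terms and filter them by length and stop words."""
    if split is None:
        split = DEFAULT_SPLIT
    if isinstance(split, str):
        split = re.compile(split)  # strings are compiled into a regexp first
    # A callable receives the text; a compiled regexp has its split() method called.
    pieces = split(text) if callable(split) else split.split(text)
    stop_words = set(ignore or [])
    terms = []
    for term in pieces:
        if not term:
            continue  # regexp splitting can produce empty strings
        if min is not None and len(term) < min:
            continue
        if max is not None and len(term) > max:
            continue
        if term in stop_words:
            continue
        terms.append(term)
    return terms

print(parse_terms("When the world is mine", min=3, ignore=["the"]))
```

Here "is" is dropped for being shorter than min and "the" is dropped as a stop word.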

register_object_type_attrs(type_name, indexes=[], **attrs)

Register one or more object attributes and/or multi-column indexes for the given type name.

This function modifies the database as needed to accommodate new indexes and attributes, either by creating the object’s tables (in the case of a new object type) or by altering the object’s tables to add new columns or indexes.

This method is idempotent: if the attributes and indexes specified have not changed from previous invocations, no changes will be made to the database. Moreover, newly registered attributes will not affect previously registered attributes. This allows, for example, a plugin to extend an existing object type created by the core application without interfering with it.
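
The idempotent create-or-alter behaviour can be sketched with sqlite3 as below. The register_attrs helper, the objects_ table prefix, and the use of raw SQL column types are all assumptions made for illustration; they do not reflect kaa.db's real schema. Names are interpolated without sanitizing, which is acceptable only for a sketch.

```python
import sqlite3

def register_attrs(conn, type_name, **attrs):
    """Ensure a table exists for type_name and has a column per attribute.

    attrs maps attribute name to an SQL column type. Returns the names of
    the columns that actually had to be added.
    """
    table = "objects_" + type_name  # hypothetical naming scheme
    conn.execute("CREATE TABLE IF NOT EXISTS %s (id INTEGER PRIMARY KEY)" % table)
    # Column 1 of each PRAGMA table_info row is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)}
    added = []
    for name, sql_type in sorted(attrs.items()):
        if name not in existing:
            conn.execute("ALTER TABLE %s ADD COLUMN %s %s" % (table, name, sql_type))
            added.append(name)
    return added

conn = sqlite3.connect(":memory:")
print(register_attrs(conn, "msg", sender="TEXT"))                   # new table and column
print(register_attrs(conn, "msg", sender="TEXT"))                   # nothing left to do
print(register_attrs(conn, "msg", sender="TEXT", size="INTEGER"))   # only the new column
```

Repeating a registration changes nothing, and a later registration with an extra attribute leaves the existing columns untouched.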

Parameters:
  • type_name (str) – the name of object type the registered attributes or indexes apply to.
  • indexes (list of tuples of strings) – a list of tuples where each tuple contains 2 or more registered ATTR_SEARCHABLE attributes for which a composite index will be created in the underlying database. This is useful for speeding up queries involving these attributes combined.
  • attrs (2, 3, or 4-tuple) – keyword arguments defining the attributes to be registered. The keyword defining the attribute name cannot conflict with any of the names in RESERVED_ATTRIBUTES. See below for a more complete specification of the value.

Previously registered attributes may be updated in limited ways (e.g. by adding an index to the attribute). If the change requested is not supported, a ValueError will be raised.

Note

Currently, indexes and attributes can only be added, not removed. That is, once an attribute or index is added, it lives forever.

Object attributes, which are supplied as keyword arguments, are either searchable or simple. Searchable attributes occupy a column in the underlying database table and so queries can be performed on these attributes, but their types are more restricted. Simple attributes can be any type that can be pickled, but can’t be searched.

The attribute kwarg value is a tuple of 2 to 4 items in length and in the form (attr_type, flags, ivtidx, split).

  • attr_type: the type of the object attribute. For simple attributes (ATTR_SIMPLE in flags), this can be any picklable type; for searchable attributes (ATTR_SEARCHABLE in flags), this must be either int, float, str, unicode, bytes, or bool. (On Python 2.5, you can use kaa.db.RAW_TYPE instead of bytes.)
  • flags: a bitmap of attribute flags
  • ivtidx: name of a previously registered inverted index used for this attribute. Only needed if flags contains ATTR_INVERTED_INDEX
  • split: function or regular expression used to split string-based values for this attribute into separate terms for indexing. If this isn’t defined, then the default split strategy for the inverted index will be used.

Apart from not being allowed to conflict with one of the reserved names, there is a special case for attribute names: when they have the same name as a previously registered inverted index. These attributes must be ATTR_SIMPLE, and of type list. Terms explicitly associated with the attribute are persisted with the object, but when accessed, all terms for all attributes for that inverted index will be contained in the list, not just those explicitly associated with the same-named attribute.

The following example shows what an application that indexes email might do:

from kaa.db import *
from datetime import datetime
db = Database('email.db')
db.register_inverted_index('keywords', min=3, max=30)
db.register_object_type_attrs('msg',
    # Create a composite index on sender and recipient, because
    # (let's suppose) we do a lot of searches for specific
    # senders emailing specific recipients.
    [('sender', 'recipient')],

    # Simple attribute can be anything that's picklable, which datetime is.
    date = (datetime, ATTR_SIMPLE),

    # Sender and recipient names need to be ATTR_SEARCHABLE since
    # they're part of a composite index.
    sender = (unicode, ATTR_SEARCHABLE),
    recipient = (unicode, ATTR_SEARCHABLE),

    # Subject is searchable (standard SQL-based substring matches),
    # but also being indexed as part of the keywords inverted
    # index for fast term-based searching.
    subject = (unicode, ATTR_SEARCHABLE | ATTR_INVERTED_INDEX, 'keywords'),

    # Special case where an attribute name is the same as a registered
    # inverted index.  This lets us index on, for example, the message body
    # without actually storing the message inside the database.
    keywords = (list, ATTR_SIMPLE | ATTR_INVERTED_INDEX, 'keywords')
)

reparent(obj, parent)

Change the parent of an object.

Parameters:
  • obj (ObjectRow, or (type, id)) – the object to reparent
  • parent (ObjectRow, or (type, id)) – the new parent of the object

This is a convenience method to improve code readability, and is equivalent to:

database.update(obj, parent=parent)

retype(obj, new_type)

Convert the object to a new type.

Parameters:
  • obj (ObjectRow, or (type, id)) – the object to be converted to the new type
  • new_type – the type to convert the object to
Returns:

an ObjectRow, converted to the new type with the new id

Any attribute that has not also been registered with new_type (and with the same name) will be removed. Because the object is effectively changing ids, all of its existing children will be reparented to the new id.

set_metadata(key, value)

Associate simple key/value pairs with the database.

Parameters:
  • key (str or unicode) – the key name for the metadata; it is required that key is prefixed with appname:: in order to avoid namespace collisions.
  • value (str or unicode) – the value to associate with the given key

update(obj, parent=None, **attrs)

Update attributes for an existing object in the database.

Parameters:
  • obj (ObjectRow or 2-tuple (object_type, object_id)) – the object whose attributes are being modified
  • parent (ObjectRow or 2-tuple (object_type, object_id)) – if specified, the object is reparented to the given parent object, otherwise the parent remains the same as when the object was added with add().
  • attrs – keyword arguments specifying the attribute (which must have been registered) values. Registered attributes that are not explicitly specified here will preserve their original values (except for special attributes named after inverted indexes; see warning below).

Continuing from the example in add(), consider:

>>> d = db.add('directory', parent=root, name='foo')
>>> db.update(d, name='bar')
>>> d = db.get(d)   # Reload changes
>>> d['name']
'bar'

Warning

When updating an attribute associated with an inverted index, all terms for that inverted index in the object need to be rescored. For special attributes with the same name as inverted indexes, it’s the caller’s responsibility to ensure terms are passed back during update.

In the email example from register_object_type_attrs(), if the subject attribute is updated by itself, any previously indexed terms passed to the keywords attribute (the message body) would be discarded after the update. If updating the subject, the caller would be required to pass the message body in the keywords attribute again, in order to preserve those terms.

If none of the attributes being updated are associated with an inverted index that also has a same-named special attribute then this warning doesn’t apply as the inverted index does not need to be updated.

upgrade_to_py3()

vacuum()

Cleans up the database, removing unused inverted index terms.

This also calls VACUUM on the underlying SQLite database, which rebuilds the database to reclaim unused space and reduce fragmentation.

Applications should call this periodically. However, this operation can be expensive for large databases, so it should be done during an extended idle period.

Properties

filename

Full path to the database file.

lazy_commit

The interval after which any changes made to the database will be automatically committed, or None to require explicit committing. (Default is None.)

The timer is restarted upon each change to the database, so prolonged updates may still benefit from explicit periodic commits.
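
The effect of restarting the timer on every change can be seen in a small simulation. The LazyCommitTimer class below is a hypothetical model driven by explicit timestamps; it is not how kaa.db schedules commits internally.

```python
class LazyCommitTimer:
    """Toy model of a commit timer that restarts on every change."""

    def __init__(self, interval):
        self.interval = interval
        self.deadline = None   # time at which the pending commit fires
        self.commits = 0

    def changed(self, now):
        # Every change pushes the deadline back by a full interval.
        self.deadline = now + self.interval

    def tick(self, now):
        # Commit only once the deadline has passed without further changes.
        if self.deadline is not None and now >= self.deadline:
            self.commits += 1
            self.deadline = None

timer = LazyCommitTimer(interval=5)
# One change per second for 20 seconds: the deadline is never reached.
for t in range(20):
    timer.changed(t)
    timer.tick(t)
commits_during_updates = timer.commits
# Five seconds after the last change (at t=19), the commit finally fires.
timer.tick(24)
commits_after_idle = timer.commits
print(commits_during_updates, commits_after_idle)
```

In this model nothing is committed while changes keep arriving faster than the interval, which is why a long-running update may want to call commit() itself from time to time.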

readonly

class kaa.db.ObjectRow

ObjectRow objects represent a single object from a kaa.db.Database, and are returned by, or may be passed to, many Database methods.

One key feature provided by ObjectRow is on-demand unpickling of ATTR_SIMPLE attributes. It’s often the case that simple attributes don’t need to be accessed, so there’s no point in incurring the unpickling overhead at query time.
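
The idea of on-demand unpickling can be illustrated with a small stand-in class. LazyRow is hypothetical and much simpler than the real ObjectRow; it only shows how column values can be served without touching the pickled data until a simple attribute is requested.

```python
import pickle

class LazyRow:
    """Read-only row that unpickles its simple attributes on first access."""

    def __init__(self, columns, pickled):
        self._columns = dict(columns)  # values from dedicated table columns
        self._pickled = pickled        # bytes holding the simple attributes
        self._simple = None            # unpickled lazily
        self.unpickle_count = 0        # instrumentation for this sketch only

    def __getitem__(self, key):
        if key in self._columns:
            return self._columns[key]  # no unpickling needed
        if self._simple is None:
            self._simple = pickle.loads(self._pickled)
            self.unpickle_count += 1
        return self._simple[key]

row = LazyRow({"name": "etc"}, pickle.dumps({"mtime": 1.5}))
print(row["name"], row.unpickle_count)   # column access, nothing unpickled yet
print(row["mtime"], row.unpickle_count)  # first simple access triggers unpickling
```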

For the most part, ObjectRows behave like a read-only dict, providing most (though not all) of the common dict methods. If running on CPython, there is a higher performance implementation written in C.

class kaa.db.QExpr(operator, operand)

Flexible query expressions for use with kaa.db.Database.query()

Parameters:
  • operator (str) – =, !=, <, <=, >, >=, in, not in, range, like, or regexp
  • operand – the rvalue of the expression; any scalar values as part of the operand must be the same type as the attribute being evaluated

Except for in, not in, and range, the operand must be the type of the registered attribute being evaluated (e.g. unicode, int, etc.).

The operands for in and not in are lists or tuples of the attribute type, used to test inclusion in the given set.

The range operator accepts a 2-tuple specifying min and max values for the attribute. The Python expression age=QExpr('range', (20, 30)) translates to age >= 20 AND age <= 30.
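
A rough sketch of how such operators could map onto parameterized SQL is shown below, using sqlite3 directly. The to_sql function is hypothetical and does not reflect how QExpr is implemented; regexp is left out because SQLite only supports REGEXP when the application supplies a user-defined function.

```python
import sqlite3

def to_sql(column, operator, operand):
    """Translate an (operator, operand) pair into a WHERE fragment and parameters."""
    if operator == "range":
        low, high = operand
        return "%s >= ? AND %s <= ?" % (column, column), [low, high]
    if operator in ("in", "not in"):
        marks = ", ".join("?" for _ in operand)
        return "%s %s (%s)" % (column, operator.upper(), marks), list(operand)
    if operator in ("=", "!=", "<", "<=", ">", ">=", "like"):
        return "%s %s ?" % (column, operator.upper()), [operand]
    raise ValueError("unsupported operator: %r" % operator)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (age INTEGER)")
conn.executemany("INSERT INTO person VALUES (?)", [(a,) for a in (10, 20, 25, 30, 40)])

where, params = to_sql("age", "range", (20, 30))
ages = [row[0] for row in conn.execute(
    "SELECT age FROM person WHERE " + where + " ORDER BY age", params)]
print(where, ages)
```

Both ends of the range are inclusive, matching the translation given above.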
