These flags are used to define object attributes. See register_object_type_attrs() for more details.
Attribute is persisted with the object, but cannot be used with query(). It can be any Python type that is picklable.
The attribute data is stored inside an internal pickled object and does not occupy a dedicated column in the underlying database table for the object type.
Attribute can be used with query(), but the type must be one of int, float, str, unicode, buffer, or bool.
The attribute data is stored in a dedicated column in the underlying database table.
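The difference between the two storage strategies can be illustrated with the standard library alone. This is a minimal sketch, not kaa.db's actual schema: the table name `objects_directory` and the column name `pickle` are assumptions made for the illustration.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
# 'name' plays the role of a searchable attribute (its own column);
# the 'pickle' column holds all simple attributes as one serialized blob.
# Table and column names are hypothetical.
conn.execute(
    "CREATE TABLE objects_directory (id INTEGER PRIMARY KEY, name TEXT, pickle BLOB)"
)

simple_attrs = {"mtime": 1234.5, "tags": ["a", "b"]}
conn.execute(
    "INSERT INTO objects_directory (name, pickle) VALUES (?, ?)",
    ("etc", pickle.dumps(simple_attrs)),
)

# The dedicated column can be used in a WHERE clause ...
row = conn.execute(
    "SELECT name, pickle FROM objects_directory WHERE name = ?", ("etc",)
).fetchone()
# ... while simple attributes are only reachable after unpickling the blob.
restored = pickle.loads(row[1])
```

Because the simple attributes live inside an opaque blob, the database engine has nothing to compare against, which is why such attributes cannot be queried.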
If this flag is set, the attribute is indexed for faster queries.
Internally, an SQL index is placed on the column. Multiple ATTR_INDEXED attributes may be used in a composite index by specifying the indexes argument with register_object_type_attrs().
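To show what a composite index buys at the SQL level, the sketch below builds one directly with the standard `sqlite3` module and asks SQLite for its query plan. The table and index names are made up for the example and are not the ones kaa.db generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msg (id INTEGER PRIMARY KEY, sender TEXT, recipient TEXT)"
)
# One index covering both columns (hypothetical index name).
conn.execute("CREATE INDEX msg_sender_recipient_idx ON msg (sender, recipient)")

# Ask SQLite how it would run a query that constrains both columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM msg WHERE sender = ? AND recipient = ?",
    ("a@example.com", "b@example.com"),
).fetchall()
# The last field of each plan row is the human-readable detail string.
plan_text = " ".join(str(row[-1]) for row in plan)
```

The plan text should name the composite index, indicating that the lookup does not need to scan the whole table.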
Queries on this attribute are case-insensitive; however, when the attribute is accessed from the ObjectRow, the original case is preserved.
Attributes with this flag require roughly double the space in the database, because two copies are kept (one in lower case for searching, and one in the original case).
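The two-copy approach described above can be sketched as follows. This is an illustration using plain `sqlite3`, with a hypothetical `people` table and helper functions; it is not how kaa.db names or lays out its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column keeps the original value, the other a lower-cased copy
# that is used only for matching. Names are hypothetical.
conn.execute("CREATE TABLE people (name TEXT, name_lower TEXT)")


def add_person(name):
    conn.execute("INSERT INTO people VALUES (?, ?)", (name, name.lower()))


def find_person(name):
    # Match against the lower-cased copy, return the original spelling.
    rows = conn.execute(
        "SELECT name FROM people WHERE name_lower = ?", (name.lower(),)
    )
    return [r[0] for r in rows]


add_person("Stewie Griffin")
matches = find_person("STEWIE griffin")
```

The search succeeds regardless of the case used in the query, and the stored value comes back with its original capitalization, at the cost of storing the string twice.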
Values for this attribute are parsed into terms and individual terms can be searched to find the object.
When it’s registered, the attribute must also be associated with a registered inverted index.
A bitmap of ATTR_INDEXED and ATTR_IGNORE_CASE. Provided for convenience and code readability.
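Since the flags are bitmaps, they combine with bitwise OR and are tested with bitwise AND. The sketch below uses invented numeric values and an invented name for the combined constant; the real constants in kaa.db may differ.

```python
# Hypothetical values chosen for illustration only.
ATTR_SEARCHABLE = 0x01
ATTR_INDEXED = 0x02
ATTR_IGNORE_CASE = 0x04
# Hypothetical name for the convenience bitmap of the two flags above.
INDEXED_AND_IGNORE_CASE = ATTR_INDEXED | ATTR_IGNORE_CASE


def has_flag(flags, flag):
    """Return True if every bit of `flag` is set in `flags`."""
    return (flags & flag) == flag


flags = ATTR_SEARCHABLE | INDEXED_AND_IGNORE_CASE
is_indexed = has_flag(flags, ATTR_INDEXED)
is_ignore_case = has_flag(flags, ATTR_IGNORE_CASE)
```

Using the combined constant is equivalent to OR-ing the two individual flags by hand; it only saves typing.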
Open a database, creating one if it doesn’t already exist.
Parameters: | dbfile (str) – path to the database file |
---|
SQLite is used to provide the underlying database.
kaa.db.Database
add() | Add an object to the database. |
---|---|
commit() | Explicitly commit any changes made to the database. |
delete() | Delete the specified object. |
delete_by_query() | Delete all objects returned by the given query. |
get() | Fetch the given object from the database. |
get_db_info() | Return information about the database. |
get_inverted_index_terms() | Obtain terms used by objects for an inverted index. |
get_metadata() | Fetch metadata previously set by set_metadata(). |
query() | Query the database for objects matching all of the given keyword attributes. |
query_one() | Like query() but returns a single object only. |
register_inverted_index() | Registers a new inverted index with the database. |
register_object_type_attrs() | Register one or more object attributes and/or multi-column indexes for the given type name. |
reparent() | Change the parent of an object. |
retype() | Convert the object to a new type. |
set_metadata() | Associate simple key/value pairs with the database. |
update() | Update attributes for an existing object in the database. |
upgrade_to_py3() | |
vacuum() | Cleans up the database, removing unused inverted index terms. |
filename | read-only | Full path to the database file. |
---|---|---|
lazy_commit | read/write | The interval after which any changes made to the database will be automatically committed, or None to require explicit committing. (Default is None.) |
readonly | read-only |
Add an object to the database.
Parameters: |
|
---|---|
Returns: | ObjectRow representing the added object |
For example:
import os
from kaa.db import *
db = Database('test.db')
db.register_object_type_attrs('directory',
    name = (str, ATTR_SEARCHABLE),
    mtime = (float, ATTR_SIMPLE)
)
root = db.add('directory', name='/', mtime=os.stat('/').st_mtime)
db.add('directory', parent=root, name='etc', mtime=os.stat('/etc').st_mtime)
db.add('directory', parent=root, name='var', mtime=os.stat('/var').st_mtime)
Explicitly commit any changes made to the database.
Note
Any uncommitted changes will automatically be committed at program exit.
Delete the specified object.
Parameters: | obj (ObjectRow or (object_type, object_id)) – the object to delete |
---|
Delete all objects returned by the given query.
Parameters: | attrs – see query() for details. |
---|---|
Returns: | the number of objects deleted |
Return type: | int |
Fetch the given object from the database.
Parameters: | obj – a 2-tuple (type, id) representing the object. |
---|---|
Returns: | ObjectRow |
obj may also be an ObjectRow, although that usage is less likely to be useful because an ObjectRow already contains all information about the object. One common use case is to reload a possibly changed object from disk.
This method is essentially shorthand for:
database.query(object=(object_type, object_id))[0]
Return information about the database.
Returns: | a dict |
---|
count: dict of object types holding their counts
total: total number of objects in the database
termcounts: a dict of the number of indexed terms for each inverted index
file: full path to the database file
Obtain terms used by objects for an inverted index.
Parameters: |
|
---|---|
Returns: | a list of 2-tuples, where each tuple is (term, count). If associated is not given, count is the total number of objects that term is mapped to. Otherwise, count reflects the number of objects which have that term plus all the given associated terms. The list is sorted with the highest counts appearing first. |
For example, given an otherwise empty database, if you have one object with terms ['vacation', 'hawaii'] and two other objects with terms ['vacation', 'spain'], and the associated list passed is ['vacation'], the return value will be [('spain', 2), ('hawaii', 1)].
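The counting rule in this example can be reproduced in a few lines of plain Python. This is a sketch of the described behaviour only; `related_terms` is a hypothetical helper, not part of kaa.db, and the tie-breaking by term name is an assumption.

```python
from collections import Counter


def related_terms(objects, associated):
    """Count terms on objects that carry every term in `associated`."""
    required = set(associated)
    counts = Counter()
    for terms in objects:
        terms = set(terms)
        if required <= terms:
            # Only the terms other than the associated ones are reported.
            counts.update(terms - required)
    # Highest counts first; ties broken alphabetically (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


objects = [
    ["vacation", "hawaii"],
    ["vacation", "spain"],
    ["vacation", "spain"],
]
result = related_terms(objects, ["vacation"])
```

With an empty `associated` list the same function degenerates to a plain per-term object count, which matches the first case described for the return value.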
Fetch metadata previously set by set_metadata().
Parameters: |
|
---|---|
Returns: | unicode string containing the value for this key, or the default parameter if the key was not found. |
Query the database for objects matching all of the given keyword attributes.
Keyword arguments can be any previously registered ATTR_SEARCHABLE object attribute for any object type, or the name of a registered inverted index. There are some special keyword arguments:
Parameters: |
|
---|---|
Raises : | ValueError if the query is invalid (e.g. attempting to query on a simple attribute) |
Returns: | a list of ObjectRow objects |
When any of the attributes are inverted indexes, the result list is sorted according to a score. The score is based upon the frequency of the matched terms relative to the entire database.
Note
If you know which type of object you’re interested in, you should specify the type as it will help improve performance by reducing the scope of the search, especially for inverted indexes.
Another significant factor in performance is whether a limit is specified. Query time generally scales linearly with the number of rows found. For searches on inverted indexes, specifying a limit can drastically reduce search time without affecting scoring.
Values supplied to attributes (other than inverted indexes) require exact matches. Searching based on an expression, such as inequality, ranges, substrings, or set inclusion, requires the use of a QExpr object.
Expanding on the example provided in register_object_type_attrs():
>>> db.add('msg', sender=u'Stewie Griffin', subject=u'Blast!',
...        keywords='When the world is mine, your death shall be quick and painless.')
>>> # Exact match based on sender
>>> db.query(sender=u'Stewie Griffin')
[<kaa.db.ObjectRow object at 0x7f652b251030>]
>>> # Keyword search requires all keywords
>>> db.query(keywords=['death', 'blast'])
[<kaa.db.ObjectRow object at 0x7f652c3d1f90>]
>>> # This doesn't work, since it does an exact match ...
>>> db.query(sender=u'Stewie')
[]
>>> # ... but we can use QExpr to do a substring/pattern match.
>>> db.query(sender=QExpr('like', u'Stewie%'))
[<kaa.db.ObjectRow object at 0x7f652c3d1f90>]
>>> # How about a regexp search?
>>> db.query(sender=QExpr('regexp', ur'.*\bGriffin'))
[<kaa.db.ObjectRow object at 0x7f652b255030>]
Like query() but returns a single object only.
This is a convenience method, and query_one(...) is equivalent to:
results = db.query(...)
if results:
    obj = results[0]
else:
    obj = None
limit=1 is implied by this query.
Registers a new inverted index with the database.
An inverted index maps arbitrary terms to objects and allows you to query based on one or more terms. If the inverted index already exists with the given parameters, no action is performed.
Parameters: |
|
---|
For example:
from kaa.db import *
db = Database('test.db')
db.register_inverted_index('tags')
db.register_inverted_index('keywords', min=3, max=30, ignore=STOP_WORDS)
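To make the term-to-object mapping concrete, here is a toy inverted index in plain Python. It is only a sketch of the concept: the class `TinyInvertedIndex` is invented for this illustration, it ignores scoring and term-length limits, and lower-casing terms is an assumption rather than documented behaviour.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Maps each term to the set of object ids that contain it."""

    def __init__(self):
        self._objects_by_term = defaultdict(set)

    def add(self, object_id, terms):
        for term in terms:
            self._objects_by_term[term.lower()].add(object_id)

    def query(self, terms):
        """Return ids of objects that contain all of the given terms."""
        sets = [self._objects_by_term.get(t.lower(), set()) for t in terms]
        if not sets:
            return set()
        return set.intersection(*sets)


index = TinyInvertedIndex()
index.add(1, ["death", "blast", "world"])
index.add(2, ["blast", "furnace"])
both = index.query(["death", "blast"])
one = index.query(["blast"])
```

Looking up a term is a dictionary access rather than a scan over every object, which is what makes term-based searches fast.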
Register one or more object attributes and/or multi-column indexes for the given type name.
This function modifies the database as needed to accommodate new indexes and attributes, either by creating the object’s tables (in the case of a new object type) or by altering the object’s tables to add new columns or indexes.
This method is idempotent: if the attributes and indexes specified have not changed from previous invocations, no changes will be made to the database. Moreover, newly registered attributes will not affect previously registered attributes. This allows, for example, a plugin to extend an existing object type created by the core application without interfering with it.
Parameters: |
|
---|
Previously registered attributes may be updated in limited ways (e.g. by adding an index to the attribute). If the change requested is not supported, a ValueError will be raised.
Note
Currently, indexes and attributes can only be added, not removed. That is, once an attribute or index is added, it lives forever.
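The add-only, idempotent behaviour described above resembles the following pattern, shown here with plain `sqlite3`. This is a sketch under assumptions, not kaa.db's implementation: `ensure_columns` and the table name are hypothetical, and identifiers are interpolated directly into SQL, which is only acceptable for trusted names.

```python
import sqlite3


def ensure_columns(conn, table, columns):
    """Create `table` if needed, add missing columns, return the names added."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY)")
    # Field 1 of each PRAGMA table_info row is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    return added


conn = sqlite3.connect(":memory:")
spec = {"sender": "TEXT", "recipient": "TEXT"}
first = ensure_columns(conn, "objects_msg", spec)
second = ensure_columns(conn, "objects_msg", spec)  # nothing left to do
third = ensure_columns(conn, "objects_msg", {"subject": "TEXT"})  # extends only
```

Calling the function again with the same specification changes nothing, and a later caller (for instance a plugin) can add its own columns without touching the existing ones.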
Object attributes, which are supplied as keyword arguments, are either searchable or simple. Searchable attributes occupy a column in the underlying database table and so queries can be performed on these attributes, but their types are more restricted. Simple attributes can be any type that can be pickled, but can’t be searched.
The attribute kwarg value is a tuple of 2 to 4 items in length and in the form (attr_type, flags, ivtidx, split).
- attr_type: the type of the object attribute. For simple attributes (ATTR_SIMPLE in flags), this can be any picklable type; for searchable attributes (ATTR_SEARCHABLE in flags), this must be either int, float, str, unicode, bytes, or bool. (On Python 2.5, you can use kaa.db.RAW_TYPE instead of bytes.)
- flags: a bitmap of attribute flags
- ivtidx: name of a previously registered inverted index used for this attribute. Only needed if flags contains ATTR_INVERTED_INDEX
- split: function or regular expression used to split string-based values for this attribute into separate terms for indexing. If this isn’t defined, then the default split strategy for the inverted index will be used.
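As an illustration of the split item, the sketch below accepts either a callable or a compiled regular expression and turns a string into terms. The helper `split_terms` and its default pattern are assumptions for this example; the library's real default strategy is not described here.

```python
import re


def split_terms(value, split=re.compile(r"\W+")):
    """Split `value` into terms using a function or a compiled regexp."""
    if callable(split):
        parts = split(value)
    else:
        # Compiled patterns are not callable, so use their split() method.
        parts = split.split(value)
    # Drop empty strings left over at the edges of the input.
    return [p for p in parts if p]


by_regexp = split_terms("Blast! The world is mine.")
by_function = split_terms("red,green,blue", split=lambda s: s.split(","))
```

A function gives full control over tokenization (for example, splitting on commas only), while a regular expression is usually enough for free text.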
Apart from not being allowed to conflict with one of the reserved names, there is a special case for attribute names: when they have the same name as a previously registered inverted index. These attributes must be ATTR_SIMPLE, and of type list. Terms explicitly associated with the attribute are persisted with the object, but when accessed, all terms for all attributes for that inverted index will be contained in the list, not just those explicitly associated with the same-named attribute.
The following example shows what an application that indexes email might do:
from kaa.db import *
from datetime import datetime
db = Database('email.db')
db.register_inverted_index('keywords', min=3, max=30)
db.register_object_type_attrs('msg',
    # Create a composite index on sender and recipient, because
    # (let's suppose) we do a lot of searches for specific
    # senders emailing specific recipients.
    [('sender', 'recipient')],
    # Simple attribute can be anything that's picklable, which datetime is.
    date = (datetime, ATTR_SIMPLE),
    # Sender and recipient names need to be ATTR_SEARCHABLE since
    # they're part of a composite index.
    sender = (unicode, ATTR_SEARCHABLE),
    recipient = (unicode, ATTR_SEARCHABLE),
    # Subject is searchable (standard SQL-based substring matches),
    # but also being indexed as part of the keywords inverted
    # index for fast term-based searching.
    subject = (unicode, ATTR_SEARCHABLE | ATTR_INVERTED_INDEX, 'keywords'),
    # Special case where an attribute name is the same as a registered
    # inverted index. This lets us index on, for example, the message body
    # without actually storing the message inside the database.
    keywords = (list, ATTR_SIMPLE | ATTR_INVERTED_INDEX, 'keywords')
)
Change the parent of an object.
Parameters: |
---|
This is a convenience method to improve code readability, and is equivalent to:
database.update(obj, parent=parent)
Convert the object to a new type.
Parameters: |
|
---|---|
Returns: | an ObjectRow, converted to the new type with the new id |
Any attribute that has not also been registered with new_type (and with the same name) will be removed. Because the object is effectively changing ids, all of its existing children will be reparented to the new id.
Associate simple key/value pairs with the database.
Parameters: |
|
---|
Update attributes for an existing object in the database.
Parameters: |
|
---|
Continuing from the example in add(), consider:
>>> d = db.add('directory', parent=root, name='foo')
>>> db.update(d, name='bar')
>>> d = db.get(d) # Reload changes
>>> d['name']
'bar'
Warning
When updating an attribute associated with an inverted index, all terms for that inverted index in the object need to be rescored. For special attributes with the same name as inverted indexes, it’s the caller’s responsibility to ensure terms are passed back during update.
In the email example from register_object_type_attrs(), if the subject attribute is updated by itself, any previously indexed terms passed to the keywords attribute (the message body) would be discarded after the update. If updating the subject, the caller would be required to pass the message body in the keywords attribute again, in order to preserve those terms.
If none of the attributes being updated are associated with an inverted index that also has a same-named special attribute then this warning doesn’t apply as the inverted index does not need to be updated.
Cleans up the database, removing unused inverted index terms.
This also calls VACUUM on the underlying SQLite database, which rebuilds the database to reclaim unused space and reduce fragmentation.
Applications should call this periodically. Because the operation can be expensive for large databases, it should be done during an extended idle period.
Full path to the database file.
The interval after which any changes made to the database will be automatically committed, or None to require explicit committing. (Default is None.)
The timer is restarted upon each change to the database, so prolonged updates may still benefit from explicit periodic commits.
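Why a restarting timer can postpone commits indefinitely is easier to see in a small deterministic model. The class below is a hypothetical simulation with an explicit clock, not the library's timer, and it assumes the interval is measured in seconds.

```python
class LazyCommitTimer:
    """Model of a commit timer that restarts on every change."""

    def __init__(self, interval):
        self.interval = interval
        self.deadline = None  # None means there are no pending changes

    def change(self, now):
        # Every change pushes the deadline back by a full interval.
        self.deadline = now + self.interval

    def should_commit(self, now):
        return self.deadline is not None and now >= self.deadline


timer = LazyCommitTimer(interval=5)
fired = []
# One change every 3 seconds: the 5-second deadline is never reached.
for now in range(0, 30, 3):
    timer.change(now)
    fired.append(timer.should_commit(now + 2))

# Once the changes stop (the last one was at t=27), the timer can fire.
fires_when_idle = timer.should_commit(27 + 5)
```

In this model a steady stream of changes never triggers the automatic commit, which is the situation where an explicit periodic commit() helps.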
ObjectRow objects represent a single object from a kaa.db.Database, and are returned by, or may be passed to, many Database methods.
One key feature provided by ObjectRow is on-demand unpickling of ATTR_SIMPLE attributes. It’s often the case that simple attributes don’t need to be accessed, so there’s no point in incurring the unpickling overhead at query time.
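On-demand unpickling can be sketched as a row object that keeps the serialized blob and only decodes it when a simple attribute is first requested. `LazyRow` is an invented class for illustration and does not reflect ObjectRow's internals.

```python
import pickle


class LazyRow:
    """Read-only row that defers unpickling until it is needed."""

    def __init__(self, columns, pickled_blob):
        self._columns = dict(columns)  # values from dedicated columns
        self._blob = pickled_blob      # serialized simple attributes
        self._simple = None            # decoded lazily, at most once
        self.unpickle_count = 0

    def __getitem__(self, key):
        if key in self._columns:
            return self._columns[key]
        if self._simple is None:
            self._simple = pickle.loads(self._blob)
            self.unpickle_count += 1
        return self._simple[key]


row = LazyRow({"name": "etc"}, pickle.dumps({"mtime": 1234.5}))
name = row["name"]                  # served from the column data
count_before = row.unpickle_count   # still 0: nothing was unpickled
mtime = row["mtime"]                # first access triggers unpickling
mtime_again = row["mtime"]          # served from the decoded cache
```

Rows whose simple attributes are never touched pay no unpickling cost at all, which matters when a query returns many results.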
For the most part, ObjectRows behave like a read-only dict, providing most (though not all) of the common dict methods. If running on CPython, there is a higher performance implementation written in C.
Flexible query expressions for use with kaa.db.Database.query()
Parameters: |
|
---|
Except for in, not in, and range, the operand must be the type of the registered attribute being evaluated (e.g. unicode, int, etc.).
The operands for in and not in are lists or tuples of the attribute type, used to test inclusion in the given set.
The range operator accepts a 2-tuple specifying min and max values for the attribute. The Python expression age=QExpr('range', (20, 30)) translates to age >= 20 AND age <= 30.
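The translation of operators into SQL conditions can be sketched as below. `to_sql` is a hypothetical helper covering only a few operators, and the set of comparison operators it accepts is an assumption; it is not QExpr's implementation.

```python
import sqlite3


def to_sql(column, operator, operand):
    """Return (sql_fragment, parameters) for a small subset of operators."""
    if operator == "range":
        low, high = operand
        return f"{column} >= ? AND {column} <= ?", (low, high)
    if operator in ("in", "not in"):
        marks = ", ".join("?" for _ in operand)
        return f"{column} {operator.upper()} ({marks})", tuple(operand)
    if operator in ("=", "!=", "<", "<=", ">", ">=", "like"):
        return f"{column} {operator.upper()} ?", (operand,)
    raise ValueError(f"unsupported operator: {operator!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("ann", 19), ("bob", 20), ("cy", 30), ("dee", 31)],
)

where, params = to_sql("age", "range", (20, 30))
in_range = [
    r[0]
    for r in conn.execute(f"SELECT name FROM person WHERE {where} ORDER BY name", params)
]
```

Note that both bounds of the range are inclusive, matching the `>=` and `<=` in the translation given above.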