Usage¶
The key-value store API gives a common API that can be used with a variety of different backends to provide a consistent interface for storage. If used correctly you can swap out the backend used with little or no modification of the user code.
Creating and Connecting¶
Before you use a store, you need to create an instance of the appropriate type, and then connect to it, possibly authenticating if that is required. For example, the following connects to a read-only remote store via HTTP, using HTTP Authentication:
from encore.events.api import EventManager
from encore.storage.static_url_store import StaticURLStore
event_manager = EventManager()
store = StaticURLStore(event_manager, 'http://localhost:8080/', 'data', 'index.json')
store.connect(credentials={'username': 'alibaba', password: 'Open Sesame'})
At this point the store is ready to use. You can check to see whether the store
has connected using the is_connected()
method. When you are finished
with a store, you should call its disconnect()
method to allow it to
cleanly release any resources it may be using, such as database connections.
Reading¶
To read from a store, you use one of the get()
methods:
value = store.get('my_document')
datastream = value.data
metadata = value.metadata
In this case datastream is a file-like object that streams bytes:
data = datastream.read()
print(data)
More likely you will have used some sort of serialization format like XML, JSON or YAML to store your data in the document, so instead you can do:
import json
data = json.load(datastream)
If the data is raw bytes to store into a numpy array, you can do something like this:
import numpy
data = datastream.read()
dtype = numpy.int32
size = len(data)/dtype().nbytes
arr = numpy.empty(shape=size, dtype=dtype)
arr.data[:] = data
The read()
method supports buffered reads if your data is
larger than would comfortably fit into memory.
If you need to support random-access streaming, the value API also supports
a range(start, end)()
method that return the requested
bytes as a readable stream.
The metadata stores auxilliary information about the data that is stored in the key. It is a dictionary of reasonably serializable values (frequently it will serialize to JSON or similar format):
print('Document title:', metadata['title'])
print('Document author:', metadata['author'])
print('Document encoding:', metadata['encoding'])
# checksum
import hashlib
assert hashlib.sha1(document.read()).digest() == metadata['sha1']
What metadata is stored is completely dependent on the use-case for the key-value store: the key-value store makes no assumptions.
If you try to read a key which doe not exist, then the store will raise a KeyError.
If you want to see whether or not a particular key is populated, you can use the
exists()
method.
Frequently you will only be interested in the data or the metadata, not both.
For these cases there are methods get_data()
and get_metadata()
which return the appropriate entities. For metadata, if you are only interested
in the values of some of the dictionary keys, you can supply an additional argument
select
which will restrict the returned keys to this subset of all the keys:
author_info = store.get_metadata(‘document’, select=[‘author’, ‘organization’])
It is very common that you either want to extract the stream of bytes from a value
into a Python bytes object (ie. a string in Python 2, as opposed to unicode) or
into a file on the local filesystem. Two utility methods to_file()
and to_bytes()
are provided which perform these operations. If the
data source is larger than will comfortably fit into memory (particularly for
to_file()
) you can supply an optional buffer size:
store.to_file('document', 'local_document.txt', buffer=8096)
Querying¶
Frequently you want to find keys whose metadata match certain criteria. The key-value store API gives a simple query mechanism that permits this sort of matching:
for key, metadata in store.query(author='alibaba', organization='40 Thieves'):
print(key, ':', metadata['title'])
This will print the key and title of all documents which have an author
key
with value 'alibaba'
and an organization
key with value '40 Thieves'
.
The current API only permits querying for exact matches and matching all of the
query terms. More complex queries would need to be performed on an ad-hoc basis
on top of this API.
If all the user is concerned with is which keys match, there is an alternative
method query_keys()
:
for key in store.query_keys(author='alibaba', organization='40 Thieves'):
print(key)
To iterate over all the keys in a store, you can simply call query_keys()
with no arguments:
for key in store.query_keys():
print(key)
Finally, as a useful utility, you can use glob-style matching on the keys using
the glob()
method:
for key in store.glob('*.jpg'):
print(key)
Writing¶
Most, but not all, stores also allow you to write data to keys. The basic method
is set()
which is the inverse of get()
. It expects a
file-like object with a read()
method that can do buffering, and a
dictionary of metadata as arguments:
from cStringIO import StringIO
data = StringIO("Hello World")
metadata = {'title': "Greeting", 'author': 'alibaba'}
store.set('hello', (data, metadata))
As with reading, there are methods set_data()
and set_metadata()
that permit you to set just one of the two parts of the value, and there are
utility methods from_bytes()
and from_file()
that populate
the data of a key from either a byte string or a binary file. The latter two
methods do not set any metadata: that must be done manually if needed.
If you want to add to the metadata without overwriting it, there is a convenience
method update_metadata()
method that will update the
metadata dictionary in mych the same way that the standard Python dictionary’s
update
method works.
You can delete a key with the delete()
method:
store.delete('hello')
Transactions¶
The key-value store API does not assume that the underlying storage mechanism has a notion of transactions, but if it does then it can be supported by the key-value store. Transactions are handled by context managers and the with statement:
with store.transaction('Setting some values'):
store.set('key1', (data1, metadata1))
store.set('key2', (data2, metadata2))
If any exception were to occur in the with statement, the context manager will ensure that the transaction gets rolled back. Otherwise the transaction will be committed when the with statement finishes.
Transactions are re-entrant, so it is safe to do the following:
def add_keypair(keypair):
with store.transaction('Adding keypair'):
store.set(keypair.key1, (keypair.data1, keypair.metadata1))
store.set(keypair.key2, (keypair.data2, keypair.metadata2))
def add_many_keypairs(keypairs):
with store.transaction('Adding many keypairs'):
for keypair in keypairs:
add_keypair(keypair)
The transaction in the function is effectively ignored, with only the outermost transaction applying.
The “Multi” Methods¶
For convenience there are a collection of methods prefixed by “multi”, such as
multiget()
and multiset_data()
,
which perform the specified operations on a collection of keys at once. If
transactions are available, then these will be done as a single transaction.
Events¶
The various stores use the Encore event system, which is why the stores must be supplied with a reference to an EventManager instance. The events which are emitted are referenced in the documentation for each method.