Redis storage

This submodule provides the RedisTable, an implementation of StorageTable that stores its records in Redis. It uses the redis.asyncio module, which gets pulled in as a dependency when the redis extra is selected.

Records in Redis need to be encoded as byte strings. To this end, this module also contains implementations of the abstract Codec for various data types, including the wrapper types Nullable and Array.

Storing data in Redis seemed like a good idea at first, but after changes to the library’s indexing design, this implementation may be slower than local storage. Supporting multiple indexes requires at least one round trip to fetch the primary keys of the relevant records (one per score interval in the records spec), then another to fetch the records themselves. Deletions are more expensive still: all affected records have to be fetched first so the indexes can be cleaned up, and then they are deleted one by one.
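The round-trip cost of a query can be sketched with an in-memory stand-in. CountingStore and get_records below are hypothetical and not part of tablecache; each method call counts as one round trip, following the cost model described above (one per score interval, plus one to fetch the records). The comments name Redis commands only for comparison.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for Redis that counts round trips."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}  # plays the role of the <table_name>:r hash
        self.index: dict[str, float] = {}   # plays the role of one index sorted set
        self.round_trips = 0

    def primary_keys_in(self, lo: float, hi: float) -> list[str]:
        # One round trip per score interval (comparable to ZRANGEBYSCORE).
        self.round_trips += 1
        return [pk for pk, score in self.index.items() if lo <= score <= hi]

    def fetch(self, pks: list[str]) -> list[dict]:
        # One further round trip for all records at once (comparable to HMGET).
        self.round_trips += 1
        return [self.records[pk] for pk in pks]


def get_records(store: CountingStore, intervals: list[tuple[float, float]]) -> list[dict]:
    pks: list[str] = []
    for lo, hi in intervals:
        pks.extend(store.primary_keys_in(lo, hi))
    return store.fetch(pks)


store = CountingStore()
for pk, age in [("a", 20), ("b", 35), ("c", 50)]:
    store.records[pk] = {"pk": pk, "age": age}
    store.index[pk] = float(age)

result = get_records(store, [(0, 25), (40, 60)])
print([r["pk"] for r in result], store.round_trips)  # ['a', 'c'] 3
```

Two intervals already cost three round trips, and a deletion would add at least one more per record on top of that.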

Storage

tablecache.redis.AttributeCodecs: collections.abc.Mapping[str, tablecache.redis.Codec]: A mapping of attribute names to Codecs used to encode and decode them.

class tablecache.redis.RedisCodingError: Raised when any error relating to en- or decoding occurs.

class tablecache.redis.RedisTable

Bases: StorageTable, Generic

A table stored in Redis.

Enables storage and retrieval of records in Redis. Each record must have a primary key which uniquely identifies it within the table. Only attributes for which a codec is specified are stored.

Each record is also associated with one or more index scores, one for each of the given index names. Defining score functions allows querying multiple records via intervals of scores. Scores must be 64-bit floats (or other numbers that can be represented exactly as 64-bit floats).
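Integers above 2**53 are a common way to violate the 64-bit float requirement, since they may silently lose precision when converted. The helpers as_float64_score and score_user below are hypothetical (not part of tablecache); they sketch a per-index score function that rejects values a double can’t represent exactly.

```python
def as_float64_score(value) -> float:
    """Convert a number to a 64-bit float score, rejecting lossy conversions.

    Hypothetical helper for a score function; not part of tablecache.
    """
    score = float(value)  # Python floats are IEEE 754 double precision
    # int/float comparison in Python is exact, so this catches ints that lose
    # precision (e.g. odd values above 2**53). NaN also fails, since NaN != NaN.
    if score != value:
        raise ValueError(f"{value!r} is not exactly representable as a 64-bit float")
    return score


def score_user(index_name: str, record: dict) -> float:
    """Hypothetical per-index score function for two indexes."""
    if index_name == "primary_key":
        return as_float64_score(record["user_id"])
    if index_name == "age":
        return as_float64_score(record["age"])
    raise ValueError(f"Unknown index {index_name!r}")


print(score_user("age", {"user_id": 7, "age": 30}))  # 30.0
```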

Records are stored in a Redis hash with the key <table_name>:r, with each record’s primary key (encoded via the given primary key codec) as its field. Another hash, <table_name>:s, contains scratch records that haven’t been merged yet. Each index is stored as a Redis sorted set with the key <table_name>:i:<index_name>, whose members are the records’ primary keys, each scored with the record’s score for that index.
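The key layout can be modelled with plain dicts standing in for Redis. table_keys and model_put are hypothetical names; the encoded record is treated as opaque bytes, since how tablecache serializes a whole record into the hash value isn’t described here.

```python
def table_keys(table_name: str, index_names: list[str]) -> dict[str, str]:
    """Redis keys used by one table, following the layout described above."""
    keys = {"records": f"{table_name}:r", "scratch": f"{table_name}:s"}
    for index_name in index_names:
        keys[f"index:{index_name}"] = f"{table_name}:i:{index_name}"
    return keys


def model_put(db: dict, table_name: str, encoded_pk: bytes, encoded_record: bytes,
              scores: dict[str, float]) -> None:
    """Model a record write against a dict standing in for Redis (hypothetical)."""
    # Hash field: encoded primary key -> opaque encoded record (like HSET).
    db.setdefault(f"{table_name}:r", {})[encoded_pk] = encoded_record
    # Sorted set per index: member is the primary key, score is the index score (like ZADD).
    for index_name, score in scores.items():
        db.setdefault(f"{table_name}:i:{index_name}", {})[encoded_pk] = score


db: dict = {}
model_put(db, "users", b"1", b"<encoded record>", {"primary_key": 1.0, "age": 30.0})
print(sorted(db))  # ['users:i:age', 'users:i:primary_key', 'users:r']
```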

Index scores need not be unique, so each index score may map to multiple primary keys. All of the corresponding records have to be fetched and checked, and the non-matching ones are filtered out via a recheck predicate. This can be costly if many records share the same index score.
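To see why shared scores are costly, here is a small sketch with a deliberately coarse, hypothetical index whose score is just the first letter’s code point. The query function is illustrative only: every record with the same score has to be fetched, and the recheck predicate throws away the false positives.

```python
def query(records: dict[str, dict], index: dict[str, float], lo: float, hi: float, recheck):
    """Fetch candidates by score interval, then drop false positives via recheck."""
    candidates = [pk for pk, score in index.items() if lo <= score <= hi]
    fetched = [records[pk] for pk in candidates]
    return fetched, [r for r in fetched if recheck(r)]


# Hypothetical coarse index: "alice" and "anna" share a score, so both are fetched.
records = {pk: {"name": pk} for pk in ["alice", "anna", "bob"]}
index = {pk: float(ord(pk[0])) for pk in records}

fetched, matches = query(records, index, ord("a"), ord("a"), lambda r: r["name"] == "anna")
print(len(fetched), matches)  # 2 [{'name': 'anna'}]
```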

While scratch space is in use (i.e. in between the first call to scratch_put_record() or scratch_discard_records() and the corresponding scratch_merge()), regular write operations (put_record() and delete_records()) are locked. Merging scratch space starts a background task that cleans up data in Redis. During this operation, further scratch activity is locked.

The implementation of the scratch space requires a generation count to be stored with each record, which is used to exclude records in scratch space that haven’t been merged yet. The generation is incremented with each merge. Since it is stored as a 32-bit unsigned integer, at most 2**32-1 merges can be performed. Deletions in scratch space keep some data natively (i.e. in Python data structures rather than in Redis), so scratch operations with many deletions may consume considerable amounts of memory.
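A minimal sketch of generation-based visibility, assuming (this is not stated above) that scratch writes are tagged with the generation the next merge will produce and that readers hide anything newer than the current generation. GenerationTracker, encode_generation and the big-endian byte order are hypothetical; scratch deletions are not modelled.

```python
import struct

MAX_GENERATION = 2**32 - 1  # largest value a 32-bit unsigned integer can hold


def encode_generation(generation: int) -> bytes:
    """Pack a generation as a big-endian uint32 (byte order is an assumption)."""
    return struct.pack(">I", generation)  # struct.error if outside 0..2**32-1


class GenerationTracker:
    """Hypothetical visibility rule: records newer than the current generation are hidden."""

    def __init__(self) -> None:
        self.current = 0

    def scratch_generation(self) -> int:
        # Scratch writes are tagged with the generation the next merge will produce.
        if self.current >= MAX_GENERATION:
            raise OverflowError("generation counter exhausted; no further merges possible")
        return self.current + 1

    def is_visible(self, record_generation: int) -> bool:
        return record_generation <= self.current

    def merge(self) -> None:
        self.current = self.scratch_generation()


tracker = GenerationTracker()
gen = tracker.scratch_generation()
print(tracker.is_visible(gen))  # False: still in scratch space
tracker.merge()
print(tracker.is_visible(gen))  # True: merged
```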

__init__(conn, *, table_name, record_scorer, primary_key_codec, attribute_codecs, attribute_extractor=<built-in function getitem>, record_factory=<function _identity>)

Parameters:

conn (redis.asyncio.Redis) – An async Redis connection. The connection will not be closed and needs to be cleaned up from the outside.
table_name (str) – The name of the table, used as a prefix for keys in Redis. Must be unique within the Redis instance.
record_scorer (RecordScorer) – A RecordScorer used to calculate a record’s scores for all the indexes that need to be represented in storage. The score function must not raise exceptions, or the storage may be left in an undefined state.
primary_key_codec (Codec) – A Codec suitable to encode return values of the record_scorer’s primary_key method. Encoded primary keys are used as keys in the Redis hash.
attribute_codecs (AttributeCodecs) – A dictionary of codecs for record attributes. Must map attribute names (strings) to Codec instances that are able to en-/decode the corresponding values. Only attributes present here are stored.
attribute_extractor (Callable[[Record], Any]) – A function extracting an attribute from a record by name. The default works when records are dicts.
record_factory (Callable[[dict], Record]) – A function that takes a dictionary mapping attribute names to values and returns a Record. The default works when records are dicts.

Raises:

ValueError – If attribute_codecs is invalid.

Return type:

None
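The defaults for attribute_extractor and record_factory assume dict records. For dataclass records, getattr and a small factory could take their place. This is a sketch assuming, as the default operator.getitem suggests, that the extractor is called as extractor(record, attribute_name); User, ATTRIBUTES and record_factory are hypothetical. Such functions would then be passed as the attribute_extractor and record_factory arguments.

```python
import operator
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    name: str


ATTRIBUTES = ["user_id", "name"]  # hypothetical: the keys of attribute_codecs

# Default behaviour for dict records: the extractor is operator.getitem.
as_dict = {"user_id": 1, "name": "alice"}
print({a: operator.getitem(as_dict, a) for a in ATTRIBUTES})

# Hypothetical alternatives for dataclass records, assuming the extractor is
# called as extractor(record, attribute_name) like operator.getitem.
attribute_extractor = getattr


def record_factory(attributes: dict) -> User:
    return User(**attributes)


user = User(user_id=1, name="alice")
attributes = {a: attribute_extractor(user, a) for a in ATTRIBUTES}
print(record_factory(attributes) == user)  # True
```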

async clear()

Delete all data belonging to this table.

Return type: None

async delete_records(records_spec)

Delete multiple records.

Deletes exactly those records that would have been returned by get_records() when called with the same argument.

Asynchronously iterates over the records being deleted, as they exist in storage. The iterator must be fully consumed for the deletion to finish.

Parameters: records_spec (StorageRecordsSpec) – A specification of the records to delete.
Returns: The deleted records as an asynchronous iterator, in no particular order.
Return type: AsyncIterable
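Why full consumption matters can be shown with a plain async generator. delete_matching is a hypothetical stand-in, not tablecache code: work placed after a yield only runs when the consumer asks for the next item, so breaking out early leaves records behind.

```python
import asyncio


async def delete_matching(store: dict, predicate):
    """Hypothetical async generator: yields each matching record, then deletes it."""
    for pk in [pk for pk, record in store.items() if predicate(record)]:
        yield store[pk]
        # This line only runs once the consumer asks for the next item, so an
        # abandoned iteration leaves the last yielded record in place.
        del store[pk]


async def demo() -> tuple[list, dict, dict]:
    full = {"a": {"age": 20}, "b": {"age": 35}, "c": {"age": 50}}
    deleted = [r async for r in delete_matching(full, lambda r: r["age"] > 30)]

    partial = {"a": {"age": 20}, "b": {"age": 35}, "c": {"age": 50}}
    async for _ in delete_matching(partial, lambda r: r["age"] > 30):
        break  # stops early: nothing has been deleted yet
    return deleted, full, partial


deleted, full, partial = asyncio.run(demo())
print(deleted)          # [{'age': 35}, {'age': 50}]
print(sorted(full))     # ['a']
print(sorted(partial))  # ['a', 'b', 'c']
```

Draining with an async comprehension (or an async for loop without break) is the safe pattern for any iterator with this contract.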

async get_records(records_spec)

Get multiple records.

Asynchronously iterates over all records that match the records spec, that is, all records whose score in the specified index lies within one of the specified intervals and that additionally match the recheck predicate.

Records are guaranteed to be unique as long as the records spec’s intervals don’t overlap (as per their contract).

Parameters: records_spec (StorageRecordsSpec) – A specification of the records to get.
Returns: The requested records as an asynchronous iterator, in no particular order.
Return type: AsyncIterable

property name: str: A name for the table.

async put_record(record)

Store a record.

Stores in Redis all attributes of the record for which a codec was configured. Any other attributes are silently ignored. If a record with the same primary key exists, it is overwritten.

Parameters:

record (Record) – The record to add.

Raises:

ValueError – If any attribute for which a codec is configured is missing from the record.
RedisCodingError – If any attribute encodes to something other than bytes, or any error occurs during encoding.

Return type:

None
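The attribute handling and error cases of put_record() can be sketched as follows. encode_attributes is hypothetical, codecs are simplified to plain callables, and RedisCodingError is a local stand-in for tablecache.redis.RedisCodingError.

```python
class RedisCodingError(Exception):
    """Local stand-in for tablecache.redis.RedisCodingError."""


def encode_attributes(record: dict, attribute_codecs: dict) -> dict[str, bytes]:
    """Encode only the attributes that have a codec (codecs are plain callables here)."""
    encoded = {}
    for name, encode in attribute_codecs.items():
        if name not in record:
            raise ValueError(f"Attribute {name!r} is missing from the record")
        try:
            value = encode(record[name])
        except Exception as e:
            raise RedisCodingError(f"Error encoding attribute {name!r}") from e
        if not isinstance(value, bytes):
            raise RedisCodingError(
                f"Attribute {name!r} encoded to {type(value).__name__}, not bytes")
        encoded[name] = value
    return encoded


codecs = {"user_id": lambda v: str(v).encode(), "name": str.encode}
print(encode_attributes({"user_id": 1, "name": "alice", "extra": 42}, codecs))
# {'user_id': b'1', 'name': b'alice'}
```

The attribute "extra" has no codec and is dropped, mirroring the documented behaviour.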

async scratch_discard_records(records_spec)

Mark a set of records to be deleted in scratch space.

Asynchronously iterates over the records marked for discarding, as they exist in storage. These records remain available until scratch space is merged. The iterator must be fully consumed for the operation to finish.

Regular write operations are locked until scratch space is merged.

Marking records for deletion requires some internal state, which may grow large when many records are marked.

Parameters: records_spec (StorageRecordsSpec) – A specification of the records to mark for discarding.
Returns: The records marked for discarding as an asynchronous iterator, in no particular order.
Return type: AsyncIterable

scratch_merge()

Merge scratch space.

Merge records added to scratch space via scratch_put_record() or marked for deletion via scratch_discard_records() so that these changes are reflected in get_record() and get_records().

This method is not async, as the switchover is meant to be fast. However, implementations may start background tasks to handle some cleanup during which further scratch operations are blocked.

Return type: None
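The pattern of a synchronous merge that hands cleanup to a background task can be sketched with asyncio. ScratchMergeSketch is hypothetical and only illustrates the idea: the event is cleared synchronously inside scratch_merge(), so a scratch operation issued right after the merge waits even if the cleanup task hasn’t started yet.

```python
import asyncio


class ScratchMergeSketch:
    """Hypothetical illustration of a synchronous merge with background cleanup."""

    def __init__(self) -> None:
        self.generation = 0
        self.scratch_records = []
        self._cleanup_finished = asyncio.Event()
        self._cleanup_finished.set()  # no cleanup pending initially
        self._cleanup_task = None

    def scratch_merge(self) -> None:
        # Fast, synchronous switchover: readers see the merged state immediately.
        self.generation += 1
        # Clear before the task starts so no scratch operation can slip in first.
        self._cleanup_finished.clear()
        self._cleanup_task = asyncio.create_task(self._cleanup())

    async def _cleanup(self) -> None:
        await asyncio.sleep(0.01)  # placeholder for removing stale data from Redis
        self._cleanup_finished.set()

    async def scratch_put_record(self, record: dict) -> None:
        await self._cleanup_finished.wait()  # blocked while cleanup is running
        self.scratch_records.append((self.generation + 1, record))


async def demo() -> ScratchMergeSketch:
    table = ScratchMergeSketch()
    await table.scratch_put_record({"pk": 1})
    table.scratch_merge()  # not awaited; requires a running event loop
    await table.scratch_put_record({"pk": 2})  # waits for the cleanup task
    return table


table = asyncio.run(demo())
print(table.scratch_records)  # [(1, {'pk': 1}), (2, {'pk': 2})]
```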

async scratch_put_record(record)

Add a record to scratch space.

Regular write operations are locked until scratch space is merged.

Parameters: record (Record) – The record to add to scratch space.
Return type: None

Codecs

class tablecache.redis.Codec

Abstract base for codecs.

A codec can encode certain values to bytes, then decode those back to the original value.

abstract decode(bs)

Decode the bytes to a value.

Parameters: bs (bytes) – A bytes object containing an encoded value.
Returns: The decoded value.
Raises: ValueError – If the input is invalid and can’t be decoded.
Return type: T

abstract encode(value)

Encode the value to bytes.

Parameters: value (T) – The value to encode.
Returns: A representation of the input value as bytes.
Raises: ValueError – If the input value is invalid and can’t be encoded.
Return type: bytes
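A custom codec implements both abstract methods and signals bad input with ValueError. To keep the sketch self-contained, Codec below is a local stand-in with the same two methods; in real code you would subclass tablecache.redis.Codec instead. AsciiCodec is hypothetical.

```python
import abc
from typing import Generic, TypeVar

T = TypeVar("T")


class Codec(abc.ABC, Generic[T]):
    """Local stand-in with the same two abstract methods as tablecache.redis.Codec."""

    @abc.abstractmethod
    def encode(self, value: T) -> bytes:
        """Encode the value to bytes; raise ValueError if that isn't possible."""

    @abc.abstractmethod
    def decode(self, bs: bytes) -> T:
        """Decode bytes to a value; raise ValueError if the input is invalid."""


class AsciiCodec(Codec[str]):
    """Hypothetical codec for ASCII-only strings."""

    def encode(self, value: str) -> bytes:
        if not isinstance(value, str):
            raise ValueError(f"Expected str, got {type(value).__name__}")
        # UnicodeEncodeError already subclasses ValueError, so the contract holds.
        return value.encode("ascii")

    def decode(self, bs: bytes) -> str:
        return bs.decode("ascii")  # UnicodeDecodeError is a ValueError too


codec = AsciiCodec()
print(codec.decode(codec.encode("tablecache")))
```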

class tablecache.redis.Nullable

Wrapper codec that allows representing nullable values.

Encodes optional values by using an inner codec for values, and a marker for None.

__init__(value_codec)

Parameters: value_codec (Codec) – Wrapped codec to encode and decode values.
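One way to build a nullable wrapper is to pair a None marker with a prefix on real values, so the marker can never collide with an encoded value. The marker bytes and NullableSketch are hypothetical; the actual marker used by Nullable isn’t specified here.

```python
class NullableSketch:
    """Hypothetical Nullable-style wrapper; the marker bytes are assumptions."""

    NONE_MARKER = b"\x00"
    VALUE_PREFIX = b"\x01"

    def __init__(self, encode_value, decode_value) -> None:
        self._encode_value = encode_value
        self._decode_value = decode_value

    def encode(self, value) -> bytes:
        if value is None:
            return self.NONE_MARKER
        # Prefixing real values keeps them distinct from the marker, even when
        # the inner codec produces the same bytes (e.g. b"\x00") or empty bytes.
        return self.VALUE_PREFIX + self._encode_value(value)

    def decode(self, bs: bytes):
        if bs == self.NONE_MARKER:
            return None
        if bs[:1] == self.VALUE_PREFIX:
            return self._decode_value(bs[1:])
        raise ValueError(f"Invalid nullable encoding: {bs!r}")


codec = NullableSketch(str.encode, bytes.decode)
print(codec.encode(None), codec.encode(""), codec.encode("x"))  # b'\x00' b'\x01' b'\x01x'
```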

class tablecache.redis.Array

Wrapper codec that allows representing arrays (i.e. lists).

Encodes elements using an inner codec. The length of each encoded element is stored as a 16-bit unsigned integer, so encoded elements must not exceed 65535 bytes.

__init__(value_codec)

Parameters: value_codec (Codec) – Wrapped codec to encode and decode array values.
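The 65535-byte limit follows directly from a 16-bit length field. A sketch of a length-prefixed array encoding, assuming each element is preceded by its big-endian uint16 length (the exact layout and byte order are assumptions); encode_array and decode_array are hypothetical names.

```python
import struct


def encode_array(values, encode_value) -> bytes:
    parts = []
    for value in values:
        element = encode_value(value)
        if len(element) > 0xFFFF:
            raise ValueError(f"Encoded element is {len(element)} bytes; max is 65535")
        parts.append(struct.pack(">H", len(element)) + element)  # uint16 length prefix
    return b"".join(parts)


def decode_array(bs: bytes, decode_value) -> list:
    values, pos = [], 0
    while pos < len(bs):
        if pos + 2 > len(bs):
            raise ValueError("Truncated length prefix")
        (length,) = struct.unpack_from(">H", bs, pos)
        pos += 2
        if pos + length > len(bs):
            raise ValueError("Truncated element")
        values.append(decode_value(bs[pos:pos + length]))
        pos += length
    return values


encoded = encode_array(["a", "bc"], str.encode)
print(encoded, decode_array(encoded, bytes.decode))
```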

class tablecache.redis.BoolCodec: Codec that represents bools as single bytes.

class tablecache.redis.StringCodec: Simple str<->bytes codec (UTF-8).

class tablecache.redis.IntAsStringCodec: Codec that represents ints as strings.

class tablecache.redis.FloatAsStringCodec

Codec that represents floats as strings.

Handles infinities and NaNs, but makes no distinction between signalling and quiet NaNs (all NaNs are decoded as quiet NaNs).
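In Python, repr() and float() already round-trip floats as strings, including infinities and NaN. A sketch of such a codec with hypothetical function names; the exact string format FloatAsStringCodec produces isn’t specified here.

```python
import math


def encode_float_as_string(value: float) -> bytes:
    # repr() gives the shortest string that round-trips; inf, -inf and nan included.
    return repr(float(value)).encode("ascii")


def decode_float_from_string(bs: bytes) -> float:
    try:
        return float(bs.decode("ascii"))
    except ValueError as e:  # UnicodeDecodeError is a subclass of ValueError
        raise ValueError(f"Invalid float string: {bs!r}") from e


for v in [0.1, -2.5, math.inf, -math.inf, math.nan]:
    print(encode_float_as_string(v), decode_float_from_string(encode_float_as_string(v)))
```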

class tablecache.redis.SignedInt8Codec

class tablecache.redis.SignedInt16Codec

class tablecache.redis.SignedInt32Codec

class tablecache.redis.SignedInt64Codec

class tablecache.redis.UnsignedInt8Codec

class tablecache.redis.UnsignedInt16Codec

class tablecache.redis.UnsignedInt32Codec

class tablecache.redis.UnsignedInt64Codec

class tablecache.redis.Float32Codec

class tablecache.redis.Float64Codec
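The fixed-width codecs above are listed without descriptions; their names suggest fixed-size binary integers and IEEE 754 floats. A sketch using the struct module, assuming big-endian byte order (an assumption, not confirmed by this reference); FORMATS, encode_fixed and decode_fixed are hypothetical. Out-of-range values become ValueError, matching the Codec contract.

```python
import struct

# Hypothetical struct formats; big-endian byte order is an assumption.
FORMATS = {
    "SignedInt8": ">b", "SignedInt16": ">h", "SignedInt32": ">i", "SignedInt64": ">q",
    "UnsignedInt8": ">B", "UnsignedInt16": ">H", "UnsignedInt32": ">I",
    "UnsignedInt64": ">Q", "Float32": ">f", "Float64": ">d",
}


def encode_fixed(kind: str, value) -> bytes:
    try:
        return struct.pack(FORMATS[kind], value)
    except (struct.error, OverflowError) as e:  # out of range, or float too large for ">f"
        raise ValueError(f"{value!r} cannot be encoded as {kind}") from e


def decode_fixed(kind: str, bs: bytes):
    try:
        (value,) = struct.unpack(FORMATS[kind], bs)  # requires exactly the right length
    except struct.error as e:
        raise ValueError(f"{bs!r} is not a valid {kind}") from e
    return value


print(encode_fixed("SignedInt16", -2))  # b'\xff\xfe'
print(decode_fixed("Float32", encode_fixed("Float32", 0.1)))  # 0.10000000149011612
```

The Float32 round trip shows that single precision is lossy for values like 0.1.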

class tablecache.redis.UuidCodec: Codec for UUIDs.

class tablecache.redis.UtcDatetimeCodec

Codec for UTC datetimes.

Encodes values as an epoch timestamp in a double-precision float, so precision is limited to what a double-precision float can represent.

Only timezone-naive datetimes and datetimes in timezone UTC can be encoded; any other value results in a ValueError. Naive datetimes are treated as though they were in UTC. Decoding returns datetimes in UTC.
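A sketch of the encoding rules described above, with hypothetical function names. Treating naive values as UTC has to be done explicitly, because datetime.timestamp() interprets naive datetimes as local time. The 8-byte big-endian double layout is an assumption, and returning timezone-aware UTC datetimes is a choice made for this sketch.

```python
import datetime
import struct

UTC = datetime.timezone.utc


def encode_utc_datetime(value: datetime.datetime) -> bytes:
    if not isinstance(value, datetime.datetime):
        raise ValueError(f"Expected a datetime, got {type(value).__name__}")
    if value.tzinfo is None:
        # Naive values are treated as UTC; without this, timestamp() would use local time.
        value = value.replace(tzinfo=UTC)
    elif value.tzinfo != UTC:
        # This sketch only accepts datetime.timezone.utc (or an equal fixed zero offset).
        raise ValueError("Only naive or UTC datetimes can be encoded")
    return struct.pack(">d", value.timestamp())  # epoch seconds as a double


def decode_utc_datetime(bs: bytes) -> datetime.datetime:
    try:
        (timestamp,) = struct.unpack(">d", bs)
    except struct.error as e:
        raise ValueError(f"Expected 8 bytes, got {len(bs)}") from e
    return datetime.datetime.fromtimestamp(timestamp, tz=UTC)


naive = datetime.datetime(2024, 1, 2, 3, 4, 5, 500000)
print(decode_utc_datetime(encode_utc_datetime(naive)))
```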