Client-side row versioning in DynamoDB
Edit 12/14/2018: This document is out of date! One of the great things about DynamoDB is that it is constantly being improved. AWS has since posted documentation on how to do row versioning in “Using Sort Keys for Version Control”. I’d also recommend watching “Advanced Design Patterns for DynamoDB” from re:Invent 2018. The speaker recommends using the new TransactWriteItems operation to get rid of the complexity of ordering requests.
Say your application needs to maintain the history of its records. There are multiple ways of achieving this with DynamoDB. See https://stackoverflow.com/a/24275045/1201381 for one example solution.
Your architecture should be built around your application’s use-cases. The layout below can be used if you don’t expect your users to access older versions frequently.
We can maintain two separate tables for the resources that need versioning.
Resource Table
The `resource` table contains only the latest version of each item.
The primary key of the table is just its partition key.
In the table below, the partition key, and therefore the entire primary key, is `hash`, which is a String. The `version` is a Number.
The remaining attributes define your resource.
Since `hash` is the only attribute in the primary key, any new entry with the same `hash` will overwrite the existing row.
We assume that any operation that changes a record, whether a new update or a rollback, always increments the item’s version number.
| hash | version | attr1..attrN |
|---|---|---|
| 1c5815b2 | 2 | some values |
Resource History Table
The `resource-history` table contains every revision of the items.
It will use more storage, but it can be given a lower read capacity if you do not expect users to retrieve older entries frequently.
The main difference is that the primary key is a composite key `(partitionKey: hash, sortKey: version)`, so every new `version` for the same `hash` will have its own row.
| hash | version | attr1..attrN |
|---|---|---|
| 1c5815b2 | 2 | some values |
| 1c5815b2 | 1 | some old values |
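As a rough sketch, the key layout of the two tables could be written down as follows. The dictionaries follow the shape of DynamoDB's CreateTable request (`TableName`, `AttributeDefinitions`, `KeySchema`); the table names are taken from this post, and capacity or billing settings are left out on purpose, so these are not complete requests.

```python
# Key layout only; throughput/billing settings are intentionally omitted.
RESOURCE_TABLE = {
    "TableName": "resource",
    "AttributeDefinitions": [{"AttributeName": "hash", "AttributeType": "S"}],
    # A single partition key: one row per hash, always the latest version.
    "KeySchema": [{"AttributeName": "hash", "KeyType": "HASH"}],
}

RESOURCE_HISTORY_TABLE = {
    "TableName": "resource-history",
    "AttributeDefinitions": [
        {"AttributeName": "hash", "AttributeType": "S"},
        {"AttributeName": "version", "AttributeType": "N"},
    ],
    # Composite key: one row per (hash, version) pair.
    "KeySchema": [
        {"AttributeName": "hash", "KeyType": "HASH"},      # partition key
        {"AttributeName": "version", "KeyType": "RANGE"},  # sort key
    ],
}
```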
Create
Creating a new item involves first writing the item to the `resource-history` table, and then writing the same entry to the `resource` table.
If the first step fails, then nothing has been written to the tables and the user can safely issue another request.
If the second step fails, then there will be an extra record in the `resource-history` table that no user will ever access.
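The write order can be illustrated with a small in-memory model. This is only a sketch: plain dictionaries stand in for the two tables, `create_item` is a hypothetical helper name, and starting at version 1 is an assumption. With a real DynamoDB client each assignment would be a separate write request that can fail on its own.

```python
def create_item(resource, history, hash_key, attrs):
    """Write version 1 of a new item: history first, then the latest-version table."""
    item = {"hash": hash_key, "version": 1, **attrs}  # assumption: versions start at 1

    # Step 1: the history table is keyed by (hash, version).
    history[(hash_key, item["version"])] = dict(item)

    # Step 2: the resource table is keyed by hash alone. If the process fails
    # before this line, the row written in step 1 is an orphan that no reader
    # of the resource table can reach.
    resource[hash_key] = dict(item)
    return item


resource, history = {}, {}
create_item(resource, history, "1c5815b2", {"attr1": "some values"})
```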
Read
Retrieving the latest item only requires us to fetch the record with that `hash` from the `resource` table. We are guaranteed to have either one or no record for a given `hash`.
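In the same in-memory model (a dictionary standing in for the `resource` table, with the hypothetical helper `read_latest`), a read is a single lookup by `hash` that yields either one record or nothing:

```python
def read_latest(resource, hash_key):
    """Return the latest version of an item, or None if there is no such item."""
    return resource.get(hash_key)


resource = {"1c5815b2": {"hash": "1c5815b2", "version": 2, "attr1": "some values"}}
latest = read_latest(resource, "1c5815b2")
missing = read_latest(resource, "does-not-exist")
```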
Update
Updating an existing item requires us to first fetch the item’s latest version from the `resource` table, increment its version, and then write the new entry to both tables, just as in Create.
The failure scenarios are similar to those of the Create operation. If the new entry is added only to the `resource-history` table, then when the user retries the same update, the previously created entry with the key `(hash, v2)` in `resource-history` will be replaced.
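The retry behaviour can be seen in a small in-memory sketch, again with dictionaries standing in for the two tables. The helper names `update_item` and `rollback_item` are hypothetical, and the rollback shown here is just one way to honour the assumption that a rollback also increments the version.

```python
def update_item(resource, history, hash_key, attrs):
    """Write the next version of an existing item to both tables."""
    current = resource.get(hash_key)
    if current is None:
        raise KeyError(f"no item with hash {hash_key!r}")
    item = {"hash": hash_key, "version": current["version"] + 1, **attrs}
    history[(hash_key, item["version"])] = dict(item)  # step 1: history
    resource[hash_key] = dict(item)                    # step 2: latest version
    return item


def rollback_item(resource, history, hash_key, to_version):
    """Restore an old revision by writing its attributes as a new version."""
    old = history[(hash_key, to_version)]
    attrs = {k: v for k, v in old.items() if k not in ("hash", "version")}
    return update_item(resource, history, hash_key, attrs)


v1 = {"hash": "1c5815b2", "version": 1, "attr1": "some old values"}
resource = {"1c5815b2": dict(v1)}
history = {("1c5815b2", 1): dict(v1)}

# Simulate a failed update: only the history write went through.
history[("1c5815b2", 2)] = {"hash": "1c5815b2", "version": 2, "attr1": "half-written"}

# The retry still reads version 1 from resource, so it targets (hash, 2)
# again and replaces the orphaned history row.
update_item(resource, history, "1c5815b2", {"attr1": "some values"})

# Rolling back to version 1 creates version 3 rather than rewriting history.
rollback_item(resource, history, "1c5815b2", to_version=1)
```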
Delete
Deleting an item only requires us to delete it from the main `resource` table; its earlier revisions remain in `resource-history`.
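In the in-memory model, with the hypothetical helper `delete_item`, a delete touches only the dictionary that stands in for the `resource` table:

```python
def delete_item(resource, hash_key):
    """Remove the latest version; return True if an item was actually deleted."""
    return resource.pop(hash_key, None) is not None


item = {"hash": "1c5815b2", "version": 1, "attr1": "some values"}
resource = {"1c5815b2": dict(item)}
history = {("1c5815b2", 1): dict(item)}

deleted = delete_item(resource, "1c5815b2")
deleted_again = delete_item(resource, "1c5815b2")
```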
Alternatives
If you work with a single table and try to keep records immutable, then the Update operation comes with a user-experience trade-off.
For example, if you decide to implement the Stack Overflow answer linked in the introduction, you will have two writes to the same table. Depending on the order you’ve chosen, if the second write fails, then either the user will see their item deleted or you’ll lose a historical record.
Client-side row versioning is not perfect. Our solution also has drawbacks, as explained above, but the customer experience is better than actually losing data.