Edit 12/14/2018: This document is out of date! One of the great things about DynamoDB is that AWS is constantly improving it. AWS has since posted documentation on how to do row versioning in Using Sort Keys for Version Control. I’d also recommend watching Advanced Design Patterns for DynamoDB from re:Invent 2018, where the speaker recommends using the new TransactWriteItems operation to get rid of the complexity of ordering requests.


Say your application needs to maintain the history of its records. There are multiple ways of achieving this with DynamoDB; see https://stackoverflow.com/a/24275045/1201381 for an example solution.

Your architecture should be built around your application’s use cases. The layout below can be used if you don’t expect your users to access older versions frequently.

We can maintain two separate tables for the resources that need versioning.

Resource Table

The resource table contains only the latest version of each item. Its primary key consists of a partition key alone: in the table below, that key is hash, a String. The version is a Number, and the remaining attributes define your resource. Since hash is the only attribute in the primary key, any new entry with the same hash overwrites the existing row.

We assume that any operation that changes a record, whether a new update or a rollback, always increments the item's version number.

hash      version  attr1..attrN
1c5815b2  2        some values

Resource History Table

The resource-history table contains every revision of every item. It will use more storage, but can be provisioned with a lower read capacity if you do not expect users to retrieve older entries frequently. The main difference is that its primary key is a composite key (partition key: hash, sort key: version), so every new version of the same hash gets its own row.

hash      version  attr1..attrN
1c5815b2  2        some values
1c5815b2  1        some old values
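
To make the difference between the two key layouts concrete, here is a minimal sketch of the two table definitions as plain Python dictionaries, shaped like the parameters of DynamoDB's CreateTable operation. The table names and the capacity numbers are placeholders I made up for illustration; only the hash and version attributes come from the tables above.

```python
# Hypothetical table names and capacity numbers, chosen only for illustration.
RESOURCE_TABLE_DEF = {
    "TableName": "resource",
    # Partition key only: a second write with the same hash replaces the row.
    "KeySchema": [{"AttributeName": "hash", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "hash", "AttributeType": "S"}],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
}

HISTORY_TABLE_DEF = {
    "TableName": "resource-history",
    # Composite key: every (hash, version) pair is stored as its own row.
    "KeySchema": [
        {"AttributeName": "hash", "KeyType": "HASH"},
        {"AttributeName": "version", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "hash", "AttributeType": "S"},
        {"AttributeName": "version", "AttributeType": "N"},
    ],
    # Lower read capacity, assuming old revisions are rarely fetched.
    "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 5},
}
```

With boto3 installed and credentials configured, each dictionary could be passed along as `boto3.client("dynamodb").create_table(**RESOURCE_TABLE_DEF)`.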

Create

Creating a new item involves first writing the item to the resource-history table, and then writing the same entry to the resource table.
If the first step fails, nothing has been written to either table and the user can safely issue another request.
If the second step fails, the resource-history table is left with an extra record that no user will access.
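
The write ordering can be sketched roughly as follows. This is an illustration rather than a finished implementation: it assumes `client` behaves like a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`), and the table names and the `create_item` helper are hypothetical.

```python
RESOURCE_TABLE = "resource"          # hypothetical table names
HISTORY_TABLE = "resource-history"


def create_item(client, hash_key, attrs):
    """Write version 1 of an item: history table first, then resource table.

    `client` is assumed to behave like boto3.client("dynamodb");
    `attrs` holds the remaining attributes in DynamoDB's typed format.
    """
    item = {**attrs, "hash": {"S": hash_key}, "version": {"N": "1"}}
    # Step 1: if this raises, nothing has been written and the caller can retry.
    client.put_item(TableName=HISTORY_TABLE, Item=item)
    # Step 2: if this raises, (hash, 1) exists only in the history table,
    # which readers of the latest version never consult.
    client.put_item(TableName=RESOURCE_TABLE, Item=item)
    return item
```

A call might look like `create_item(client, "1c5815b2", {"attr1": {"S": "some values"}})`.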

Read

Retrieving the latest item only requires fetching the record with that hash from the resource table. We are guaranteed to have either one record or none for a given hash.
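
As a small sketch of that lookup, again assuming a boto3-style low-level client and a hypothetical table name and helper:

```python
RESOURCE_TABLE = "resource"  # hypothetical table name


def get_latest(client, hash_key):
    """Return the latest version of an item, or None if the hash is unknown."""
    response = client.get_item(
        TableName=RESOURCE_TABLE,
        Key={"hash": {"S": hash_key}},
    )
    # GetItem leaves out the "Item" key when no row matches.
    return response.get("Item")
```

Because the resource table is keyed by hash alone, the result is a single item or nothing, never a list.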

Update

Updating an existing item requires us to first fetch the item’s latest version from the resource table, increment its version, and then write the new entry to both tables, in the same order as in CREATE. The failure scenarios are similar to those of the CREATE operation. Suppose the latest version is v1 and the new entry (hash, v2) is added only to the resource-history table. The resource table still holds v1, so when the user retries the update, the new entry is again numbered v2 and replaces the previously created (hash, v2) entry in resource-history.
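
Here is a rough sketch of the read-increment-write sequence, under the same assumptions as before: `client` behaves like a boto3 low-level DynamoDB client, and the table names and the `update_item` helper are made up for illustration. It does not guard against two concurrent updates of the same item.

```python
RESOURCE_TABLE = "resource"          # hypothetical table names
HISTORY_TABLE = "resource-history"


def update_item(client, hash_key, new_attrs):
    """Store a new version of an existing item and return it."""
    response = client.get_item(
        TableName=RESOURCE_TABLE,
        Key={"hash": {"S": hash_key}},
    )
    current = response.get("Item")
    if current is None:
        raise LookupError(f"no item with hash {hash_key!r}")

    # DynamoDB's typed format carries numbers as strings.
    next_version = int(current["version"]["N"]) + 1
    item = {
        **new_attrs,
        "hash": {"S": hash_key},
        "version": {"N": str(next_version)},
    }
    # Same ordering as CREATE: history table first, then resource table.
    client.put_item(TableName=HISTORY_TABLE, Item=item)
    # If this second write fails, the resource table keeps the old version,
    # so a retry computes the same next_version and overwrites the orphaned
    # history row instead of adding another one.
    client.put_item(TableName=RESOURCE_TABLE, Item=item)
    return item
```

The version number is always derived from the resource table, which is what makes a retried request land on the same (hash, version) key.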

Delete

Deleting an item only requires deleting it from the main resource table.
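
A possible sketch of the delete, together with a query showing that the old revisions would still be reachable in resource-history, since nothing is removed from that table. As before, `client` is assumed to behave like a boto3 low-level DynamoDB client, and the table names and both helpers are hypothetical.

```python
RESOURCE_TABLE = "resource"          # hypothetical table names
HISTORY_TABLE = "resource-history"


def delete_item(client, hash_key):
    """Remove the latest version; the history table is left untouched."""
    client.delete_item(
        TableName=RESOURCE_TABLE,
        Key={"hash": {"S": hash_key}},
    )


def list_revisions(client, hash_key):
    """Return the stored revisions for a hash, newest first (first page only)."""
    response = client.query(
        TableName=HISTORY_TABLE,
        KeyConditionExpression="#h = :h",
        # "hash" is on DynamoDB's reserved-word list, so it is aliased here.
        ExpressionAttributeNames={"#h": "hash"},
        ExpressionAttributeValues={":h": {"S": hash_key}},
        ScanIndexForward=False,  # descending order of the sort key (version)
    )
    return response["Items"]
```

For items with many revisions, the query would need to follow `LastEvaluatedKey` to page through the results; that is left out to keep the sketch short.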

Alternatives

If you work with a single table and try to keep records immutable, the UPDATE operation involves a user-experience trade-off.

For example, if you decide to implement the Stack Overflow answer listed in the introduction, you will have two writes to the same table. Depending on the order you’ve chosen, if the second write operation fails, then the user will either see their item deleted or you’ll lose a historical record.

Client-side row versioning is not perfect. Our solution also has drawbacks, as explained above, but the customer experience is better than actually losing data.