70 %
Chris Biscardi

Your first CRDT

Conflict-Free Replicated Data Types (CRDTs) are data structures that we can hold multiple copies of, that each copy can be updated by an actor, and that will converge to the same result when all actors' actions are applied to all data structures. This is, of course, not a formal definition and not even particularly correct. It helps visualize the next example though. Using CRDTs you can put some data on a server and in two different people's browsers. One user can then go offline and make changes while the other stays online and makes changes. The online user syncs with the server's data while the offline user eventually comes back online and syncs with the server. The server's representation now includes the changes from both users.

Offline and Collaborative

This is what makes CRDTs interesting. They support offline and online collaborative use cases. Compared to Operational Transforms (OTs), CRDTs are much newer while OTs have been used to make products such as Google Docs.

It is mathematically always possible to merge or resolve concurrent updates on different replicas of the data structure without conflicts. How is this possible? To be a CRDT the algorithm that merges two data structures together needs to satisfy three properties: Commutativity, Associativity, and Idempotency.

A quick definition for each:

Commutativity: Multiplication is commutative. So is addition. This means that moving things around doesn't matter. Our data can be "on the left" or "on the right". For example: 2 + 3 is the same result as 3 + 2.

Associativity: Multiplication and addition are both associative. This means that the order of operations doesn't matter. We can apply one operation and then another, or vice versa and still get the same result. For example: 2 + (3 + 4) is the same as (2 + 3) + 4.

Idempotency: An operation is said to be idempotent if we can apply the operation multiple times and get the same result. Consider multiplication by 1. If we do 3 x 1 we get the same result as doing (3 x 1) x 1 or really any amount of times we want to multiply by 1.

Now this restricts the kinds of operations we can perform so let's look at some operations we can perform.

Union of a set

The JavaScript Set we are about to use either has a number in it or it doesn't. Let's say we have two Sets, new Set([1,2,3]) and newSet([3,4,5]). We'll start a node repl in the terminal (or you can use the browser console if you prefer). First we define setA, then we define setB. We log both sets to show their contents are as we expect. Then we create a new Set from the contents of the first two Sets.

JS
> const setA = new Set([1,2,3]);
undefined
> const setB = new Set([3,4,5]);
undefined
> setA
Set { 1, 2, 3 }
> setB
Set { 3, 4, 5 }
> const union = new Set([...setA, ...setB]);
undefined
> union
Set { 1, 2, 3, 4, 5 }

If we say that the only action we will ever take is to add numbers to a set, then we've created a Grow-Only Set. Our first CRDT.

Imagine that each of these Sets started out empty and on a different person's computer, each having the equivalent of new Set([]). Person A adds 1, then 2, then 3 which Person B adds 3, then 4, then 5. No matter how many numbers are added, because our merge operation (union) satisfies Commutativity, Associativity, and Idempotency, we will end up with the same results on both computers. All we have to do is make sure both people eventually receive all of the added numbers.