How the Datastore API Handles Conflicts – Part 1: Basics of Offline Conflict Handling

Posted by Guido van Rossum on July 30, 2013

At DBX we launched the Datastore API, which offers automatic conflict resolution for structured data. In this series of blog posts I explain how the datastore implementation detects and resolves conflicts. This is a multi-part series, because it is a complex algorithm and I want to do it justice.

Introduction

There are many interesting ideas underlying the datastore implementation, but the most basic one is to represent changes as objects. Those objects can be inspected, copied, serialized, persisted, transferred between clients and servers, and so on. You may recognize the Command pattern here.

The Dropbox server stores the full list of changes for each datastore, and the state (a.k.a. snapshot) of a datastore can be obtained by executing the entire list of changes, in sequence, starting with an empty datastore. (The practical way to obtain a snapshot is different, and more efficient. :-)

The list of changes stored on the server is structured into deltas, where each delta is a list of changes that has been labeled with a revision number. Revision numbers can also be used to refer to specific datastore snapshots (though not every snapshot is assigned a revision). The initial state of a datastore, completely empty, always has revision 0, and each delta increments the datastore's revision by 1. A delta is labeled with the revision number of the state that precedes it, so the very first delta has revision 0, and after executing it the datastore has revision 1. For this reason we sometimes call the delta's label its base revision.

Example Scenario

For this example I am making up an informal notation to represent changes that should be easy to understand. I need to be able to show two different types of things: datastore snapshots and deltas. Here's a picture of an empty datastore snapshot:

Datastore: revision 0

Table ID Record ID Values

It contains zero records! The headings show what's to come, though: each record has a table ID, a record ID, and some values. In this example there's only one table, so the table ID will always be the same. The record ID distinguishes records from each other. The values column contains simple values (like strings or numbers) each labeled with a field name. Here's a picture of a datastore snapshot with two records:

Datastore: revision 1

Table ID Record ID Values
T1 r1 name="Jack", age=6
T1 r2 name="Jill", age=5

Here are some deltas:

Base revisionChanges
0. insert T1:r1 {name="Jack", age=6}; insert T1:r2 {name="Jill", age=5}
1. update T1:r2 {age=6}
2. delete T1:r1; insert T1:r3 {name="Fred", age=42}

We've already seen the initial state of the datastore (revision 0 above, empty) and the state after executing delta 0 (revision 1 above). After executing delta 1, the datastore's contents is:

Datastore: revision 2

Table ID Record ID Values
T1 r1 name="Jack", age=6
T1 r2 name="Jill", age=6

Finally, after delta 2, the datastore's contents is this:

Datastore: revision 3

Table ID Record ID Values
T1 r2 name="Jill", age=6
T1 r3 name="Fred", age=42

Each delta originates on a particular device. For example, it is possible that delta 0 was generated by device A, delta 1 by device B, and delta 2 by device A again. (The identity of the originating device is not recorded with the delta, as it is not relevant to reconstructing the datastore state later.)

The Dropbox server sends notifications for deltas to devices when the device is online. (This is accomplished through so-called HTTP long-polling, but the actual mechanism isn't relevant to understanding conflict resolution.) In the above example, we will see the following traffic between the server and the two devices:

The same algorithm works for any number of devices. All devices that are currently online except the one originating a delta receive a copy of that delta. Devices that are currently offline will receive it once they come back online.

A Conflict

Now suppose the last message ("server to B: delta 2") is lost, because device B has gone offline. At this point device B has a datastore snapshot with revision 2, while A and the server are at revision 3.

While B is offline with a stale datastore snapshot, the user directs it to make a change, for example to change the age of one of the surviving characters. Once B comes back online, it attempts to send the server a delta containing this change. Because B's datastore snapshot had revision 2, it will label this delta with base revision 2:

Base revision Changes
2. update T1:r2 {age=7}

The server, knowing its datastore is already at revision 3, rejects this delta. In order to decide to reject a delta, the server doesn't look at its contents — if the revision doesn't match, that results in an immediate rejection.

When the server rejects a delta, it also sends the list of deltas that will take a datastore from the device's current revision to the latest revision seen by the server. In this example that's just the same delta that the server received earlier from device A. It is then up to device B to resolve the conflict and send the server an updated delta representing its resolution of the conflict. Here are the precise messages exchanged in this scenario:

The delta labeled with base revision 2 sent by B to the server does not contain the same changes as the delta 2 sent by the server to B in response. In these diagrams we don't concern ourselves with the specific changes, only with the base revisions used to label deltas. However, the delta 2 sent by the server to B is the same one it received from A in the previous diagram, and also the one that was lost on its way from the server to B previously.

Next, let's look at how the client library actually resolves a conflict.

Conflict Resolution Basics

Let's assume the first three three deltas accepted by the server are the ones we showed earlier. Let's furthermore assume that B's rejected delta contained the single change:

	update T1:r2 {age=7}   [rejected]

(Jill is aging rapidly :-). The delta from A that B missed contains these changes:

	delete T1:r1; insert T1:r3 {name="Fred", age=42}   [accepted]

As you can probably see, these deltas "commute" perfectly, by which we mean that it doesn't matter in which order they are executed: the resulting datastore state is the same either way. Therefore, "resolving" this particular conflict is trivial, and B just re-sends the same change relabeled as delta 3. If in the meantime the server didn't receive any other changes, the server will accept this, and transmit a copy of this change to A as well.

Let's look at the datastore snapshots in device B. The last revision before the conflict was revision 2 shown earlier. After locally updating Jill's age, it has tentatively reached the following snapshot state:

Datastore revision: ? (derived from 2)

Table ID Record ID Values
T1 r1 name="Jack", age=6
T1 r2 name="Jill", age=7

However, after receiving the rejection of its delta, it rolls back to the previous snapshot (revision 2 again, as shown earlier). It then applies the delta it received from the server, obtaining an accurate copy of revision 3 shown earlier. Finally, after resolving the conflict, it re-applies the update to Jill's age, and reaches the following snapshot state:

Datastore revision: ? (derived from 3)

Table ID Record ID Values
T1 r2 name="Jill", age=7
T1 r3 name="Fred", age=42

Once the server has accepted the delta, the datastore revision is updated to 4, without changing the contents of the snapshot.

Up Next

In the next installment I'll show how the client library handles more interesting conflicts, which we call collisions — for example, what should happen if one device changes Fred's age to 43 and the other sets it to 17 (the answer may surprise you :-). But before I sign off for the day I'd like to address a question that might occur to certain academic readers and fans of Google Wave:

P.S. What about OT?

If you’re familiar with the theory of Operational Transformation (OT), you might be surprised that the server doesn’t even attempt to resolve conflicts. We use OT-style conflict resolution on the client, but leave the server to simply serialize requests. This allows conflict resolution to be defined by the client — your app — giving you more freedom than traditional approaches (which usually require that the rules for conflict resolution be fixed). As you will see next time, a Datastore API client may customize conflict resolution to fit its own requirements.

We realize that this occasionally leads to redundant network traffic, when the same (or almost the same) delta has to be transmitted a second time. But this should be relatively rare, and if this becomes a serious problem we can optimize it. Until then, we think it's a small price to pay for the added flexibility.