Using the Datastore API in Python

These days, your app needs to store and sync more than just files. With the Datastore API, structured data like contacts, to-do items, and game state can be synced effortlessly. Datastores support multiple platforms, offline access, and automatic conflict resolution.

Here are the basic concepts that underlie the Datastore API:

Client and datastore manager

The client is your starting point. It lets your app start the authentication process to link with a user's Dropbox account. Once you've linked to an account, you use the client to create a datastore manager, which you can use to open datastores, get a list of datastores, wait for changes to multiple datastores, and so on.

Datastores and tables

Datastores are containers for your app's data. Each datastore contains a set of tables, and each table is a collection of records. As you'd expect, the table allows you to query existing records or insert new ones.

A datastore is cached locally once it's opened, allowing for fast access and offline operation. Datastores are also the unit of transactions; changes to one datastore are committed independently from another datastore. After modifying a datastore, call the commit method to send those changes to Dropbox. Call the load_deltas method to receive new changes from Dropbox.

Records

Records are how your app stores data. Each record consists of a set of fields, each with a name and a value. Values can be simple objects, like strings, integers, and booleans, or they can be lists of simple objects. A record has an ID and can have any number of fields.

Unlike in SQL, tables in datastores don't have a schema, so each record can have an arbitrary set of fields. While there's no requirement to have the same fields, it makes sense for all the records in a table to have roughly the same fields so you can query over them.
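To make the model concrete, here is a minimal sketch that mimics the datastore, table, and record hierarchy with plain Python dictionaries. It does not use the SDK: the names datastore and insert, and the way IDs are generated, are stand-ins chosen for illustration only.

```python
import uuid

# A stand-in for a datastore: table name -> {record ID -> fields}.
datastore = {}

def insert(table_name, record_id=None, **fields):
    """Add a record to a table, creating the table on first use."""
    table = datastore.setdefault(table_name, {})
    if record_id is None:
        record_id = uuid.uuid4().hex  # hypothetical ID scheme
    table[record_id] = dict(fields)
    return record_id

# Records in one table may carry different fields; nothing enforces a schema.
milk_id = insert('tasks', taskname='Buy milk', completed=False)
call_id = insert('tasks', taskname='Call Bob', completed=False, due='Friday')

print(sorted(datastore['tasks'][milk_id]))  # ['completed', 'taskname']
print(sorted(datastore['tasks'][call_id]))  # ['completed', 'due', 'taskname']
```

Both records live in the same table even though only one of them has a due field, which is why queries work best when records share roughly the same fields.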

Now that you're familiar with the basics, read on to learn how to get the Datastore API running in your app.

Setting up the SDK

If you want to code along with this guide, start by visiting the SDKs page for instructions on downloading the SDK and setting up your project. A complete working example is available in examples/datastore_app/tasks.py within the SDK download.

Authenticating a user

The Datastore API uses OAuth 2.0, but the Python SDK takes care of most of the flow so you don't have to start from scratch.

In this example, we'll build a web app with Flask, and we'll add the following routes to authenticate users and store their access tokens in a session variable:

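# Imports used by the snippets in this guide.
from flask import abort, redirect, request, session, url_for
from dropbox.client import DropboxClient, DropboxOAuth2Flow
from dropbox.datastore import DatastoreConflictError, DatastoreManager

DROPBOX_APP_KEY = '<YOUR APP KEY>'
DROPBOX_APP_SECRET = '<YOUR APP SECRET>'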

@app.route('/')
def home():
    if 'access_token' not in session:
        return redirect(url_for('dropbox_auth_start'))
    return 'Authenticated.'

@app.route('/dropbox-auth-start')
def dropbox_auth_start():
    return redirect(get_auth_flow().start())

@app.route('/dropbox-auth-finish')
def dropbox_auth_finish():
    try:
        access_token, user_id, url_state = get_auth_flow().finish(request.args)
    except Exception:  # treat any failure in the OAuth handshake as a bad request
        abort(400)
    else:
        session['access_token'] = access_token
    return redirect(url_for('home'))

def get_auth_flow():
    redirect_uri = url_for('dropbox_auth_finish', _external=True)
    return DropboxOAuth2Flow(DROPBOX_APP_KEY, DROPBOX_APP_SECRET, redirect_uri,
                             session, 'dropbox-auth-csrf-token')

Note that before running this code, you'll need to first create a Dropbox API app, substitute your app key and secret in the code, and add the appropriate OAuth redirect URI for your app in the App Console. By default, Flask runs apps locally on port 5000, so your redirect URI will be http://127.0.0.1:5000/dropbox-auth-finish. Once you've set up the redirect URI and added these routes, your app should redirect you to dropbox.com to authorize the app and then display the message "Authenticated."

Creating a datastore and your first table

With an access token in hand, the next step is to open the default datastore. Each app has its own default datastore per user. The following code is meant to be run in a Flask route handler and uses the session variable we set above on authentication.

access_token = session['access_token']
client = DropboxClient(access_token)
manager = DatastoreManager(client)
datastore = manager.open_default_datastore()

In order to store records in a datastore, you'll need to put them in a table. Let's define a table named "tasks":

tasks_table = datastore.get_table('tasks')

In the future, you might choose to add more tables to store related sets of things such as a "settings" table for the app or a "people" table to keep track of people assigned to each task. For now, this app is really simple so you only need one table to hold all your tasks.

You've got a datastore manager, a datastore for your app, and a table for all the tasks you're about to make. Let's start storing some data.

Working with records

A record is a set of name and value pairs called fields, similar in concept to a map. Records in the same table can have different combinations of fields; the table that contains them has no schema. To create a record, call insert on the table and pass the fields as keyword arguments:

first_task = tasks_table.insert(taskname='Buy milk', completed=False)

This task is now in memory, but hasn't been synced to Dropbox. Thankfully, that's simple:

datastore.commit()

Note that commit can fail if another instance of your app has updated the datastore since you last loaded changes from Dropbox, in which case you'll receive a DatastoreConflictError. If this happens, you need to roll back your local changes, load the latest changes from the server, and retry your commit. The code below is a simple implementation of this retry logic:

for _ in range(4):
    try:
        tasks_table.insert(taskname='Buy milk', completed=False)
        datastore.commit()
        break
    except DatastoreConflictError:
        datastore.rollback()    # roll back local changes
        datastore.load_deltas() # load new changes from Dropbox

Because this is a common pattern for working with datastores, the Python SDK provides a method called transaction that encapsulates this logic. The above example rewritten to use transaction() looks like this:

def do_insert():
    tasks_table.insert(taskname='Buy milk', completed=False)
datastore.transaction(do_insert, max_tries=4)

This is the preferred way to commit a transaction to a datastore. After running this code, visit the datastore browser, and you should see your newly created task.
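The retry pattern that transaction() wraps can be sketched without the SDK. The version below is an illustration under stated assumptions, not the SDK's implementation: ConflictError, FlakyStore, and run_transaction are hypothetical stand-ins, and the store simulates a conflict on its first commit so that the retry path is exercised. Returning False when the tries run out is a simplification of this sketch.

```python
class ConflictError(Exception):
    """Stand-in for the SDK's DatastoreConflictError."""

class FlakyStore(object):
    """Hypothetical in-memory store whose first commits conflict."""

    def __init__(self, conflicts):
        self.conflicts = conflicts   # number of commits that will fail
        self.pending = []            # local, uncommitted changes
        self.committed = []          # changes accepted by the "server"

    def insert(self, **fields):
        self.pending.append(fields)

    def commit(self):
        if self.conflicts > 0:
            self.conflicts -= 1
            raise ConflictError('datastore changed on the server')
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []

    def load_deltas(self):
        pass  # a real client would fetch remote changes here

def run_transaction(store, callback, max_tries=1):
    """Call callback, then commit; on conflict, reset and try again."""
    for _ in range(max_tries):
        try:
            callback()
            store.commit()
            return True
        except ConflictError:
            store.rollback()
            store.load_deltas()
    return False

store = FlakyStore(conflicts=1)
ok = run_transaction(
    store, lambda: store.insert(taskname='Buy milk', completed=False),
    max_tries=4)
print('committed %d record(s)' % len(store.committed))  # committed 1 record(s)
```

The callback runs twice here, yet only one record is committed, because the rollback discards the first attempt. A callback may run more than once, so it should avoid side effects outside the datastore.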

Accessing data from a record is straightforward:

task_name = first_task.get('taskname')

Editing tasks is just as easy. This is how you can mark the first task as completed:

def do_update():
    first_task.set('completed', True)
datastore.transaction(do_update, max_tries=4)

Finally, if you want to remove the record completely, just call delete_record(). (Calling delete() with a field name removes only that field.)

def do_delete():
    first_task.delete_record()
datastore.transaction(do_delete, max_tries=4)

Querying records

You can query the records in a table to get the subset of records that match a set of field names and values you specify. The query method takes a set of conditions, passed as keyword arguments. For each condition, a record must have a field with that name, and that field's value must be exactly equal to the specified value; only records that satisfy every condition are returned. For strings, this is a case-sensitive comparison (e.g. "abc" won't match "ABC").

tasks = tasks_table.query(completed=False)
for task in tasks:
    print(task.get('taskname'))

tasks is a list of Record objects.

The records that meet the specified query are not returned in any guaranteed order. The entire result set is returned as a Python list, so you can sort in memory after the request completes.
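The matching rule and the in-memory sort described above can be illustrated with plain dictionaries standing in for records. This is a sketch of the semantics, not the SDK's query code; the matches helper and the sample data are hypothetical.

```python
def matches(record, **conditions):
    """True if every named field exists and equals the given value."""
    return all(name in record and record[name] == value
               for name, value in conditions.items())

records = [
    {'taskname': 'Water plants', 'completed': False},
    {'taskname': 'Buy milk', 'completed': False},
    {'taskname': 'buy stamps'},                      # no 'completed' field
    {'taskname': 'File taxes', 'completed': True},
]

open_tasks = [r for r in records if matches(r, completed=False)]

# Results carry no guaranteed order, so sort them before display.
open_tasks.sort(key=lambda r: r['taskname'])
print([r['taskname'] for r in open_tasks])   # ['Buy milk', 'Water plants']

# String comparison is case-sensitive, so this finds nothing.
print([r for r in records if matches(r, taskname='BUY MILK')])   # []
```

Note that the record without a completed field is excluded from the first result: a missing field never matches a condition.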

If no condition set is provided, the query will return every record in the table.

tasks = tasks_table.query()

Records and fields

The record is the smallest grouping of data in a datastore. It combines a set of fields to make a useful set of information within a table.

Record IDs

Each record has a string ID. An ID can be provided when a record is created, or one will be automatically generated and assigned if none is provided. Once a record is created, the ID cannot be changed.

Other records can refer to a given record by storing its ID. This is similar to the concept of a foreign key in SQL databases.
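A sketch of this pattern, using dictionaries in place of SDK tables: each task stores the ID of a person record, and the app resolves that ID with a lookup. The table contents, the assignee_id field name, and the IDs themselves are made up for illustration.

```python
# Stand-ins for two tables, keyed by record ID.
people = {
    'person-1': {'name': 'Alice'},
    'person-2': {'name': 'Bob'},
}
tasks = {
    'task-1': {'taskname': 'Buy milk', 'assignee_id': 'person-2'},
    'task-2': {'taskname': 'Call plumber', 'assignee_id': 'person-9'},
}

def assignee_name(task):
    """Resolve the stored ID; the referenced record may no longer exist."""
    person = people.get(task['assignee_id'])
    return person['name'] if person is not None else None

print(assignee_name(tasks['task-1']))   # Bob
print(assignee_name(tasks['task-2']))   # None
```

Nothing in this sketch enforces the reference, so code that follows a stored ID should handle a missing target.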

Field types

Records can contain a variety of field types. Earlier in this tutorial, you saw strings and booleans, but you can also specify a number of other types. Here is a complete list of all supported types:

  • String (str or unicode)
  • Boolean (bool)
  • Integer (int or long) – 64 bits, signed
  • Floating point (float) – IEEE double
  • Date (dropbox.datastore.Date) – POSIX-like timestamp stored with millisecond precision.
  • Bytes (dropbox.datastore.Bytes) – Arbitrary data, which is treated as binary, such as thumbnail images and compressed data. Individual records can be up to 100 KB, which limits the size of the data. If you want to store larger files, you should use the Core API and reference the paths to those files in your records.
  • List (dropbox.datastore.List) – A special value that can contain other values, though not other lists.
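Two of these limits are easy to trip over in Python, where integers are unbounded and timestamps are often floats in seconds. The helpers below are a hedged sketch for checking values before storing them; fits_int64 and to_millis are hypothetical names, not part of the SDK.

```python
INT64_MIN = -2 ** 63
INT64_MAX = 2 ** 63 - 1

def fits_int64(value):
    """True if value is within the range of a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX

def to_millis(seconds):
    """Convert a POSIX timestamp in seconds to whole milliseconds."""
    return int(round(seconds * 1000))

print(fits_int64(2 ** 63 - 1))        # True
print(fits_int64(2 ** 63))            # False
print(to_millis(1400000000.1234))     # 1400000000123
```

Anything finer than a millisecond is lost in the conversion, which matches the precision stated for the Date type.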

Customizing conflict resolution

Unlike the mobile and JavaScript SDKs, the Python SDK does not perform automatic conflict resolution. Instead, the function you pass to transaction() will be retried as necessary and must perform the appropriate action based on the current contents of the datastore. For example, the following code will properly increment the value of the count field:

def do_increment():
    new_count = record.get('count') + 1
    record.set('count', new_count)
datastore.transaction(do_increment, max_tries=4)
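To see why the read has to happen inside the callback, consider this self-contained simulation. It is a sketch, not SDK code: Server, Client, and ConflictError are hypothetical stand-ins that model a shared counter with a revision number, and a commit is rejected when the client's revision is stale.

```python
class ConflictError(Exception):
    """Stand-in for the SDK's DatastoreConflictError."""

class Server(object):
    """Hypothetical shared state: one counter plus a revision number."""
    def __init__(self):
        self.rev = 0
        self.count = 0

class Client(object):
    def __init__(self, server):
        self.server = server
        self.load_deltas()

    def load_deltas(self):
        # Refresh the local copy from the server.
        self.rev = self.server.rev
        self.count = self.server.count

    def commit(self):
        # Reject the write if someone else committed first.
        if self.rev != self.server.rev:
            raise ConflictError('stale revision')
        self.server.count = self.count
        self.server.rev += 1
        self.rev = self.server.rev

    def transaction(self, callback, max_tries=1):
        for _ in range(max_tries):
            try:
                callback()
                self.commit()
                return
            except ConflictError:
                self.load_deltas()  # also discards the local change
        raise ConflictError('gave up after %d tries' % max_tries)

server = Server()
a, b = Client(server), Client(server)

def increment(client):
    # Read the current value on every attempt, then write it back.
    client.count = client.count + 1

a.transaction(lambda: increment(a), max_tries=4)
b.transaction(lambda: increment(b), max_tries=4)  # first try conflicts
print(server.count)   # 2
```

Had client b computed the new value once, outside the callback, its retry would have written 1 again and silently overwritten the increment made by client a.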