Efficiently enumerating Dropbox with /delta

Posted by Steve Marx on December 10, 2013

Sometimes your app needs to enumerate all the files it can see. (What exactly your app can see depends on the permissions you chose when you created it.) If you're building a mobile or desktop app with the Sync API, you can just call listFolder (iOS, Android) recursively. Since the Sync API caches this data, these calls will be nearly instantaneous. If you're using the Core API, though, the analogous approach of calling /metadata recursively results in a network call for every folder in Dropbox. Depending on how many folders there are, this can be quite slow.

To enumerate all the files in Dropbox efficiently, you should instead call /delta. Although /delta is typically used for ongoing monitoring of changes in a user's Dropbox, the first time you call it (without a cursor parameter), it tells you about every file that already exists. Because it returns a flat list, rather than requiring recursive calls for each folder, calling /delta will result in many fewer calls to the server and be a lot faster.
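To make the cost of the recursive approach concrete, here is a minimal sketch that walks a small in-memory folder tree and counts one simulated request per folder. It does not talk to Dropbox: TREE, join() and enumerate_recursively() are hypothetical names invented for this illustration, and the tree contents are made up.

```python
# Hypothetical in-memory folder tree standing in for a Dropbox account.
# Keys are folder paths; values are the names of their direct children.
TREE = {
    '/': ['photos', 'docs', 'notes.txt'],
    '/photos': ['2012', '2013'],
    '/photos/2012': ['a.jpg'],
    '/photos/2013': ['b.jpg', 'c.jpg'],
    '/docs': ['report.pdf'],
}


def join(parent, name):
    """Join a folder path and a child name with exactly one slash."""
    return parent.rstrip('/') + '/' + name


def enumerate_recursively(path='/'):
    """Walk the tree one folder at a time, as repeated /metadata calls would.

    Returns (file_paths, number_of_simulated_calls).
    """
    calls = 1  # one simulated request to list this folder
    found = []
    for name in TREE[path]:
        child = join(path, name)
        if child in TREE:  # child is a folder: recurse into it
            sub_found, sub_calls = enumerate_recursively(child)
            found.extend(sub_found)
            calls += sub_calls
        else:
            found.append(child)
    return found, calls


paths, calls = enumerate_recursively()
print('%d files found with %d folder listings' % (len(paths), calls))
```

The number of simulated requests grows with the number of folders, whereas a flat listing only needs as many requests as there are pages of results.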

An example in Python

As an example of how to use /delta for file enumeration, let's build a simple app that lists the top 10 biggest files in a user's Dropbox.

We'll start off by building a list_files() function that returns a dictionary of all files and folders in a user's Dropbox along with their metadata, plus the latest cursor. Handling the response from /delta can be a little tricky. Because /delta is designed for ongoing syncing, each entry it returns is really an instruction that tells your code how to update its internal state. If a returned entry has metadata, you should add that metadata to your local state. If a returned entry has no metadata, you should remove that entry and, in the case of folders, all entries "under" that path. Depending on the number of files, the response may also have the has_more flag set, along with a new cursor. In that case, you should immediately call /delta again with that cursor to get more changes. The following code handles all of these cases and returns the full dictionary of files and their metadata, together with the cursor:

def list_files(client, files=None, cursor=None):
    if files is None:
        files = {}

    has_more = True

    while has_more:
        result = client.delta(cursor)
        cursor = result['cursor']
        has_more = result['has_more']

        for lowercase_path, metadata in result['entries']:

            if metadata is not None:
                files[lowercase_path] = metadata

            else:
                # no metadata indicates a deletion

                # remove if present
                files.pop(lowercase_path, None)

                # in case this was a directory, delete everything under it
                # (iterate over a copy of the keys so deleting is safe)
                for other in list(files.keys()):
                    if other.startswith(lowercase_path + '/'):
                        del files[other]

    return files, cursor

As an added bonus, this function can also accept an existing dictionary of files and cursor and will then update that dictionary instead of creating a new one.
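As a rough illustration of that incremental use, the sketch below runs the same logic against a stand-in client. FakeClient, the cursor strings 'c1' to 'c3' and the sample entries are all hypothetical and exist only so the example runs without a network connection; real cursors are opaque strings returned by the server.

```python
def list_files(client, files=None, cursor=None):
    # Same logic as the function above, condensed for this example.
    if files is None:
        files = {}
    has_more = True
    while has_more:
        result = client.delta(cursor)
        cursor = result['cursor']
        has_more = result['has_more']
        for lowercase_path, metadata in result['entries']:
            if metadata is not None:
                files[lowercase_path] = metadata
            else:
                files.pop(lowercase_path, None)
                # list(...) takes a snapshot so deleting while looping is safe
                for other in list(files):
                    if other.startswith(lowercase_path + '/'):
                        del files[other]
    return files, cursor


class FakeClient(object):
    """Hypothetical stand-in for DropboxClient that replays canned pages."""

    def __init__(self, pages):
        self.pages = pages  # maps a cursor (or None) to a /delta-style result

    def delta(self, cursor=None):
        return self.pages[cursor]


client = FakeClient({
    # First call (no cursor): everything that already exists, in two pages.
    None: {'cursor': 'c1', 'has_more': True, 'entries': [
        ['/photos', {'is_dir': True, 'bytes': 0}],
        ['/photos/a.jpg', {'is_dir': False, 'bytes': 2048}],
    ]},
    'c1': {'cursor': 'c2', 'has_more': False, 'entries': [
        ['/notes.txt', {'is_dir': False, 'bytes': 12}],
    ]},
    # Later call with the saved cursor: the photos folder was deleted.
    'c2': {'cursor': 'c3', 'has_more': False, 'entries': [
        ['/photos', None],
    ]},
})

# Initial enumeration: follows has_more across both pages.
files, cursor = list_files(client)
initial_paths = sorted(files)
print(initial_paths)

# Incremental update: pass the saved dictionary and cursor back in.
files, cursor = list_files(client, files, cursor)
print(sorted(files))
```

The second call only receives the deletion of the folder, yet the file underneath it disappears from the dictionary as well, which is the "delete everything under it" case described earlier.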

To complete our little example, we need to call this function with a valid DropboxClient object, and then use the results to find the biggest files:

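from heapq import nlargest

from dropbox.client import DropboxClient

# ...

# list_files() returns both the file dictionary and the latest cursor
files, cursor = list_files(DropboxClient(token))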

print 'Top 10 biggest files:'

for path, metadata in nlargest(10, files.items(), key=lambda x: x[1]['bytes']):
    print '\t%s: %d bytes' % (path, metadata['bytes'])
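The ranking step can also be tried on its own with sample data. The sketch below is one possible variation, not part of the original example: it skips folders using the is_dir field and prints the case-preserved path field from the metadata rather than the lowercase dictionary key. The sample entries and the biggest_files() helper are hypothetical.

```python
from heapq import nlargest

# Hypothetical sample of the dictionary list_files() builds: lowercase path
# mapped to metadata. All values are made up for illustration.
files = {
    '/photos': {'path': '/Photos', 'is_dir': True, 'bytes': 0},
    '/photos/beach.jpg': {'path': '/Photos/Beach.jpg', 'is_dir': False,
                          'bytes': 3500000},
    '/notes.txt': {'path': '/Notes.txt', 'is_dir': False, 'bytes': 120},
    '/video.mov': {'path': '/Video.mov', 'is_dir': False, 'bytes': 90000000},
}


def biggest_files(files, n=10):
    """Return the metadata of the n largest non-folder entries."""
    only_files = [m for m in files.values() if not m['is_dir']]
    return nlargest(n, only_files, key=lambda m: m['bytes'])


for metadata in biggest_files(files, n=2):
    print('%s: %d bytes' % (metadata['path'], metadata['bytes']))
```

heapq.nlargest avoids sorting the whole dictionary when only a handful of entries are needed.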

That's it! See the full code below if you want to run this yourself. You'll need an access token for your account, which you can either get from your existing code or by following the Core API Python tutorial. If you have any questions or feedback, please share on the developer forum.

Full source code

import heapq
import sys

from dropbox.client import DropboxClient

if len(sys.argv) == 2:
    token = sys.argv[1]
else:
    print 'Usage: python app.py <access token>'
    sys.exit(1)

def list_files(client, files=None, cursor=None):
    if files is None:
        files = {}

    has_more = True

    while has_more:
        result = client.delta(cursor)
        cursor = result['cursor']
        has_more = result['has_more']

        for lowercase_path, metadata in result['entries']:

            if metadata is not None:
                files[lowercase_path] = metadata

            else:
                # no metadata indicates a deletion

                # remove if present
                files.pop(lowercase_path, None)

                # in case this was a directory, delete everything under it
                # (iterate over a copy of the keys so deleting is safe)
                for other in list(files.keys()):
                    if other.startswith(lowercase_path + '/'):
                        del files[other]

    return files, cursor

files, cursor = list_files(DropboxClient(token))

print 'Total Dropbox size: %d bytes' % sum([metadata['bytes'] for metadata in files.values()])

print

print 'Top 10 biggest files:'

for path, metadata in heapq.nlargest(10, files.items(), key=lambda x: x[1]['bytes']):
    print '\t%s: %d bytes' % (path, metadata['bytes'])