Content Hash

To let API apps verify uploaded content, or compare remote files with local files without downloading them, the FileMetadata object includes a hash of the file contents in its content_hash property.

To verify that the server’s copy of the file is identical to yours, make sure the server-generated content_hash is identical to your locally-computed version of the file’s content hash.

To calculate the content_hash of a file:

  1. Split the file into blocks of 4 MB (4,194,304 or 4 * 1024 * 1024 bytes). The last block (if any) may be smaller than 4 MB.
  2. Compute the hash of each block using SHA-256.
  3. Concatenate the hashes of all blocks, in binary format, to form a single binary string.
  4. Compute the hash of the concatenated string using SHA-256. Output the resulting hash in hexadecimal format.

Note that an empty (zero-length) file has no blocks. In that case, step 3 produces an empty string, so the content hash is the SHA-256 of the empty string.
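The steps above can be sketched in Python as follows. This is a minimal illustration rather than official sample code: the function name content_hash and the block_size parameter are our own choices, and the stream is assumed to be an ordinary binary file object whose read(n) returns fewer than n bytes only at end of file.

```python
import hashlib
import io

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB (4,194,304 bytes), as in step 1


def content_hash(stream, block_size=BLOCK_SIZE):
    """Return the hex-encoded content hash of a binary file-like object.

    Assumes stream.read(n) returns exactly n bytes until the final block,
    which holds for regular files opened with open(path, "rb") and for
    io.BytesIO objects.
    """
    overall = hashlib.sha256()
    while True:
        block = stream.read(block_size)       # step 1: next block
        if not block:                         # end of file (or empty file)
            break
        block_digest = hashlib.sha256(block).digest()  # step 2: 32 raw bytes
        # Step 3: feeding each binary digest into the running hash is
        # equivalent to concatenating all digests and hashing the result.
        overall.update(block_digest)
    return overall.hexdigest()                # step 4: hex output


if __name__ == "__main__":
    # An empty file has no blocks, so the result is SHA-256 of the empty string.
    print(content_hash(io.BytesIO(b"")))
```

To check a local file against the server's copy, open it in binary mode, pass the file object to content_hash, and compare the result with the content_hash value in the file's FileMetadata.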

Here is an example of running the above algorithm on this image of the Milky Way from NASA.

The file is 9.7 MB (9,711,423 bytes) in size. We divide it into three blocks and run SHA-256 on each block.

Block  Size (bytes)  SHA-256 (32-byte value, shown in hex)
1      4194304       2a846fa617c3361fc117e1c5c1e1838c336b6a5cef982c1a2d9bdf68f2f1992a
2      4194304       c68469027410ea393eba6551b9fa1e26db775f00eae70a0c3c129a0011a39cf9
3      1322815       7376192de020925ce6c5ef5a8a0405e931b0a9a8c75517aacd9ca24a8a56818b

Concatenate the three 32-byte hashes to get a single 96-byte value. Run SHA-256 on the concatenated value, then hex-encode the result, yielding 485291fa0ee50c016982abbfa943957bcd231aae0492ccbaa22c58e3997b35e0.
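The final step of this example can be reproduced without the image itself, starting from the three block hashes listed in the table. The sketch below is illustrative only, and the variable names are our own. It decodes each hex string to its 32 raw bytes, concatenates them, hashes the result, and compares the outcome with the value quoted above.

```python
import hashlib

# Per-block SHA-256 values from the table, as hex strings.
block_hashes_hex = [
    "2a846fa617c3361fc117e1c5c1e1838c336b6a5cef982c1a2d9bdf68f2f1992a",
    "c68469027410ea393eba6551b9fa1e26db775f00eae70a0c3c129a0011a39cf9",
    "7376192de020925ce6c5ef5a8a0405e931b0a9a8c75517aacd9ca24a8a56818b",
]

# The hashes must be concatenated in binary form, not as hex text.
concatenated = b"".join(bytes.fromhex(h) for h in block_hashes_hex)

final_hash = hashlib.sha256(concatenated).hexdigest()

# Value quoted in the text above, used here only for comparison.
documented = "485291fa0ee50c016982abbfa943957bcd231aae0492ccbaa22c58e3997b35e0"

print(len(concatenated))         # number of bytes hashed in the final step
print(final_hash)
print(final_hash == documented)  # whether the computed value matches the text
```

A common mistake is to hash the concatenated hex strings (192 characters) instead of the 96 raw bytes, which produces a different result.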

Example code in some popular languages is available in our GitHub repo.

You can assume that the content_hash field will always be available and that we will not change how it is generated. However, in the unlikely event that we do decide to change it, we want the transition to be as smooth as possible, which is why the field is declared optional. We may provide the new representation in another field and stop providing the old representation after a certain time. We will give developers advance notice with the details of the transition process.