Arq Cloud Backup Data Format

Arq Cloud Backup stores backups in a format inspired by the open-source version control system git. File data are stored as encrypted, compressed, de-duplicated “blobs”. Metadata such as backup record information and directory structure are also stored as encrypted, compressed, de-duplicated objects.

Direct Access

You can directly access your backup data at Wasabi. Follow the instructions in Direct Access.

File Structure

The files in your Arq Cloud Backup account are structured as follows:

/encrypted_master_keys.dat
/encrypted_password_recovery_keys.dat
/encrypted_password_recovery_password.dat
/master_keys_history/
    /encrypted_master_keys_1538942631.dat
    ...
/plans/
    /<uuid>
        /blobs/
            /00/
                039324f6872acd3e51f9d557b2b906d6576eba8ccecf7b3556e2fe1afb249e
                ...
            /01/
            ...
        /commits/
            /00/
                22f4d5c1b95f00c4982523e0160ff561d10f1f61212ec628889fec6a57fe25
                ...
            /01/
            ...
        /trees/
            /00/
                63a1b267c16d9993af9de5fa223e4d355230c74a13b833dbe97d8ad6af6dc4
                ...
            /01/
            ...
        head_commit_id
        stats

/encrypted_master_keys.dat

This file contains the master keys used to encrypt your backup data. It’s encrypted with either your account password (if you chose the default encryption option when you created your account) or your separate encryption password (if you chose a separate encryption password when you created your account).

The encrypted_master_keys.dat file is 193 bytes:

 25 bytes: header    "ARQ_ENCRYPTED_MASTER_KEYS"
  8 bytes: salt
 32 bytes: HMAC-SHA256 of IV + encrypted key set using derived HMAC key
 16 bytes: IV
112 bytes: encrypted key set

To decrypt the encrypted_master_keys.dat file:

  1. Derive a 64-byte key from the user’s password using 200,000 rounds of PBKDF2-HMAC-SHA1 and the salt.
    • The first 32 bytes will be used for decryption in step 3 below
    • The second 32 bytes will be used for verifying the HMAC in step 2 below
  2. Calculate the HMAC-SHA256 of the last 128 bytes of the file (the IV plus the encrypted key set), using the second 32 bytes of the derived key, and compare it to the given HMAC to verify the password and the integrity of the data.
  3. Decrypt the last 112 bytes using AES-256/CBC, with the first 32 bytes of the derived key from step 1 as the key, and the IV from the file.
  4. Use the resulting 96 bytes as three 32-byte values:
    • encryption key
    • HMAC key
    • salt for BlobId

The file KeySet.m contains Objective-C code that decrypts an encrypted_master_keys.dat file.
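
The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the KeySet.m implementation: the function names are hypothetical, the password is assumed to be encoded as UTF-8, and the AES-256/CBC decryption of step 3 is left out because it needs a third-party AES library.

```python
import hashlib
import hmac

HEADER = b"ARQ_ENCRYPTED_MASTER_KEYS"  # 25 bytes


def parse_master_keys_file(data: bytes) -> dict:
    """Split the 193-byte file into the fields listed above."""
    if len(data) != 193 or not data.startswith(HEADER):
        raise ValueError("not an encrypted_master_keys.dat file")
    return {
        "salt": data[25:33],         # 8 bytes
        "hmac": data[33:65],         # 32 bytes
        "iv": data[65:81],           # 16 bytes
        "ciphertext": data[81:193],  # 112 bytes
    }


def derive_keys(password: str, salt: bytes) -> tuple:
    """Step 1: derive 64 bytes and split them into (AES key, HMAC key)."""
    # Assumption: the password is UTF-8 encoded before key derivation.
    derived = hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt, 200_000, dklen=64
    )
    return derived[:32], derived[32:]


def verify_password(data: bytes, password: str) -> tuple:
    """Step 2: check the HMAC over the last 128 bytes (IV + key set)."""
    fields = parse_master_keys_file(data)
    aes_key, hmac_key = derive_keys(password, fields["salt"])
    expected = hmac.new(hmac_key, data[-128:], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, fields["hmac"]):
        raise ValueError("wrong password or corrupted file")
    # Step 3 (not shown): AES-256/CBC decrypt fields["ciphertext"]
    # with aes_key and fields["iv"].
    return aes_key, fields


def split_key_set(plaintext: bytes) -> dict:
    """Step 4: interpret the 96 decrypted bytes as three 32-byte values."""
    if len(plaintext) != 96:
        raise ValueError("expected 96 bytes of decrypted key set")
    return {
        "encryption_key": plaintext[0:32],
        "hmac_key": plaintext[32:64],
        "blob_id_salt": plaintext[64:96],
    }
```

Checking the HMAC before decrypting means a wrong password is detected without touching the ciphertext.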

/encrypted_password_recovery_keys.dat and /encrypted_password_recovery_password.dat

If you did not choose a separate encryption password at account-creation time, these 2 files contain information we can use to reset your password for you. The file contents and password reset process are documented in Password Recovery.

/master_keys_history/

Each time you change your encryption password, Arq Cloud Backup overwrites the file /encrypted_master_keys.dat. As a safety measure, it also writes the new encrypted_master_keys.dat file into /master_keys_history/ with a timestamp added to the name, e.g. /master_keys_history/encrypted_master_keys_1542106003.dat. That way, if you forget your current encryption password, you can email support@arqcloudbackup.com and we can overwrite your encrypted_master_keys.dat file with a history file whose password you do remember.

/plans/

Each UUID subdirectory under /plans/ contains all backup data for 1 computer in your account, including Commits, Trees and Blobs (see definitions below).

The file /plans/<uuid>/head_commit_id contains the BlobId of the head Commit.

The head Commit has a root Tree per volume and a parent Commit (unless it’s the initial Commit).

Each Tree contains a dictionary of Nodes by name. Each Node has either the BlobId of a Tree, or the BlobId(s) of a file. The Trees and Nodes represent the folders and files within the volume.

/plans/<uuid>/stats

The stats file contains information used by Arq Cloud Backup to calculate the correct billing amount each month. Here’s an example:

{
  "blobsStoredSize" : 8041504,
  "commitsStoredSize" : 5692,
  "treesStoredSize" : 3380
}
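
As a small illustration, the stats file can be read with any JSON parser. The sketch below sums the three documented fields; the helper name is hypothetical, the values are assumed to be byte counts, and the actual billing calculation is not described here.

```python
import json

# The example stats file shown above.
stats_text = """
{
  "blobsStoredSize" : 8041504,
  "commitsStoredSize" : 5692,
  "treesStoredSize" : 3380
}
"""


def total_stored_size(stats: dict) -> int:
    """Add up the stored sizes of blobs, commits and trees."""
    return (
        stats["blobsStoredSize"]
        + stats["commitsStoredSize"]
        + stats["treesStoredSize"]
    )


stats = json.loads(stats_text)
total = total_stored_size(stats)  # 8050576 for the example above
```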

Blob

A Blob is a collection of encrypted bytes. It contains the following:

  4 bytes: header “ARQO”
 32 bytes: HMAC-SHA256
 16 bytes: master IV
 64 bytes: encrypted metadata
remainder: ciphertext

To decrypt a Blob:

  1. Verify that the first 4 bytes are “ARQO”.
  2. Using the HMAC key from encrypted_master_keys.dat, calculate the HMAC-SHA256 of all the data except the first 36 bytes (the header and the stored HMAC). Verify that it matches the HMAC-SHA256 in the Blob.
  3. Decrypt the 64 bytes of metadata using the encryption key from encrypted_master_keys.dat.
  4. Decrypt the ciphertext using AES-256/CBC with bytes 1-16 of the decrypted metadata as the IV and bytes 17-48 of the decrypted metadata as the key.

The file Encryptor.m contains Objective-C code that decrypts a Blob.
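
The non-AES parts of the Blob steps can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the Encryptor.m code: the function names are hypothetical, and both AES decryptions (steps 3 and 4) are omitted because they need a third-party AES library.

```python
import hashlib
import hmac

BLOB_HEADER = b"ARQO"
FIXED_PREFIX_LEN = 4 + 32 + 16 + 64  # header + HMAC + master IV + metadata


def split_blob(blob: bytes) -> dict:
    """Step 1: check the header and slice the Blob into its fields."""
    if len(blob) < FIXED_PREFIX_LEN or blob[:4] != BLOB_HEADER:
        raise ValueError("not an ARQO blob")
    return {
        "hmac": blob[4:36],
        "master_iv": blob[36:52],
        "encrypted_metadata": blob[52:116],
        "ciphertext": blob[116:],
    }


def verify_blob(blob: bytes, hmac_key: bytes) -> dict:
    """Step 2: HMAC-SHA256 over everything after the first 36 bytes."""
    parts = split_blob(blob)
    expected = hmac.new(hmac_key, blob[36:], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, parts["hmac"]):
        raise ValueError("blob HMAC mismatch")
    return parts


def split_metadata(metadata_plaintext: bytes) -> tuple:
    """Step 4 inputs: bytes 1-16 are the IV, bytes 17-48 are the key."""
    if len(metadata_plaintext) < 48:
        raise ValueError("decrypted metadata too short")
    return metadata_plaintext[0:16], metadata_plaintext[16:48]
```

The byte ranges in the steps are 1-based, so "bytes 1-16" corresponds to the Python slice [0:16] and "bytes 17-48" to [16:48].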

Blobs are stored at /plans/<uuid>/blobs/<sha256 prefix>/<sha256 suffix> where ‘sha256’ is the BlobId of the Blob (see next section).

BlobId

A BlobId is a SHA-256 hash of the salt from encrypted_master_keys.dat concatenated with the plaintext of the Blob. Arq Cloud Backup achieves de-duplication by storing Blobs using the BlobId as the name, so that each unique Blob is stored only once in the cloud. Blobs are stored in a subdirectory named with the first 2 characters of the hex representation of the SHA-256 hash. The remaining 62 characters are the filename of the file within that subdirectory.
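
The naming scheme can be sketched as follows. The function names are hypothetical, and "plaintext" is taken to mean whatever bytes Arq Cloud Backup hashes before encryption; whether that is before or after compression is not stated here.

```python
import hashlib


def blob_id(salt: bytes, plaintext: bytes) -> str:
    """SHA-256 of the BlobId salt followed by the plaintext, as hex."""
    return hashlib.sha256(salt + plaintext).hexdigest()


def object_path(plan_uuid: str, kind: str, object_id: str) -> str:
    """Build the storage path; kind is 'blobs', 'trees' or 'commits'."""
    if len(object_id) != 64:
        raise ValueError("expected a 64-character hex SHA-256")
    # First 2 hex characters name the subdirectory, the other 62 the file.
    return f"/plans/{plan_uuid}/{kind}/{object_id[:2]}/{object_id[2:]}"
```

Because the salt is part of the hash input, identical file contents produce the same BlobId within one account but different BlobIds across accounts with different salts.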

Documentation Conventions

In the following data format descriptions:

Tree

A Tree is a Blob with the following format:

4 bytes: version (1)
 UInt32: nodeCount
nodeCount times:
    String: nodeName
    Node
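
As a rough sketch, the fixed-width start of a decrypted Tree could be read like this. The byte order is an assumption (big-endian is used here and is not confirmed by this section), the version field is assumed to be a 4-byte unsigned integer, and the function name is hypothetical. Parsing the nodes themselves needs the String encoding, which is not covered by this sketch.

```python
import struct


def read_tree_header(plaintext: bytes) -> tuple:
    """Return (version, node_count, offset_of_first_node)."""
    # Assumption: both fields are big-endian unsigned 32-bit integers.
    version, node_count = struct.unpack_from(">II", plaintext, 0)
    if version != 1:
        raise ValueError(f"unsupported Tree version: {version}")
    return version, node_count, 8
```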

A Node has the following format:

1 byte: isATree (0 or 1)
UInt32: computerOSType (1=macOS, 2=Windows)
UInt64: blobDescriptorCount
blobDescriptorCount times:
    BlobDescriptor: tree blob descriptor or data blob descriptor (multiple descriptors for large files)
1 byte: aclBlobDescriptorNotNil (0 or 1)
if aclBlobDescriptorNotNil:
    BlobDescriptor: acl blob descriptor
1 byte: xattrsBlobDescriptorNotNil (0 or 1)
if xattrsBlobDescriptorNotNil:
    BlobDescriptor: xattrs blob descriptor
UInt64: item size
UInt64: number of contained files
 Int64: modification time (from struct timespec.tv_sec)
 Int64: modification time nanoseconds (from struct timespec.tv_nsec)
 Int64: change time (from struct timespec.tv_sec)
 Int64: change time nanoseconds (from struct timespec.tv_nsec)
 Int64: creation time (from struct timespec.tv_sec)
 Int64: creation time nanoseconds (from struct timespec.tv_nsec)
 String: user name
 String: group name
 Int32: struct stat.st_dev (Mac only)
 Int64: struct stat.st_ino (Mac only)
 UInt32: struct stat.st_mode (Mac only) (16-bit signed integer stored as UInt32)
 UInt32: struct stat.st_nlink (Mac only) (16-bit signed integer stored as UInt32)
 UInt32: struct stat.st_uid (Mac only) (16-bit signed integer stored as UInt32)
 UInt32: struct stat.st_gid (Mac only) (16-bit signed integer stored as UInt32)
 Int32: struct stat.st_rdev
 UInt32: struct stat.st_flags
 UInt32: attributes (Windows only)

A BlobDescriptor has the following format:

String: BlobId
UInt64: size

Trees are stored at /plans/<uuid>/trees/<sha256prefix>/<sha256suffix>.

Commit

A Commit is a Blob with the following format:

10 bytes: header ("PLANCOMMIT")
UInt64: planCommitVolumeCount
planCommitVolumeCount times:
    PlanCommitVolume
String: parentCommitId
Date: creationDate
Data: backup plan JSON
String: version of Arq Cloud Backup that created this Commit
UInt32: Commit version
UInt64: errorCount
errorCount times:
    PlanCommitError

A PlanCommitVolume has the following format:

String: disk identifier
String: name
String: mount point
Node: root "tree" node of the files backed up for this volume

A PlanCommitError has the following format:

Data: JSON

The PlanCommitError JSON is a dictionary:

The ErrorJSON is a dictionary:

Commits are stored at /plans/<uuid>/commits/<sha256prefix>/<sha256suffix>.

Backup Plan JSON

The Backup Plan settings are stored in JSON format in every Commit. The following is an example Backup Plan JSON object:

{
  "excludedWiFiNetworkNames" : [
    "Google Starbucks"
  ],
  "uuid" : "54c27373-4f9a-41ad-b779-e23287af8df5",
  "exclusionsAreDefaults" : false,
  "throttleKBPS" : 0,
  "slashedIgnoredRelativePathArraysByDiskIdentifier" : {
    "8F51B214-72F4-39C7-AB9E-8730AC577513" : [
      "\/.DS_Store\/",
      "\/opt\/",
      "\/usr\/",
      "\/Volumes\/",
      "\/System\/",
      "\/etc\/",
      "\/.MobileBackups.trash\/",
      ...
    ]
  },
  "pauseOnBattery" : false,
  "thinCommits" : true,
  "retentionMonths" : 60,
  "parallelism" : 3,
  "throttleEnabled" : false,
  "includeNewVolumes" : true,
  "excludedNetworkInterfaces" : [

  ],
  "preventSleep" : false,
  "networkSharesByDiskIdentifier" : {

  },
  "exclusions" : [
    {
      "type" : 1,
      "text" : ".Trash"
    },
    {
      "type" : 1,
      "text" : ".Trashes"
    },
    ...
  ],
  "backupPlanVolumesByDiskIdentifier" : {
    "8F51B214-72F4-39C7-AB9E-8730AC577513" : {
      "mountPoint" : "\/",
      "diskIdentifier" : "8F51B214-72F4-39C7-AB9E-8730AC577513",
      "name" : "Macintosh HD",
      "included" : true
    },
    "%2FVolumes%2FGoogleDrive" : {
      "mountPoint" : "\/Volumes\/GoogleDrive",
      "diskIdentifier" : "%2FVolumes%2FGoogleDrive",
      "name" : "Google Drive",
      "included" : true
    }
  },
  "name" : "Cloud Backup"
}

Most of the values in the JSON are self-explanatory and correspond to controls in the Settings dialog of Arq Cloud Backup.
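
For example, once the Backup Plan JSON has been extracted from a Commit, it can be inspected with a standard JSON parser. The sketch below uses a trimmed copy of the example above and a hypothetical helper that lists the mount points of included volumes.

```python
import json

# Trimmed copy of the example Backup Plan JSON; "\/" is a valid JSON
# escape for "/", so a raw string keeps the backslashes intact.
plan_text = r"""
{
  "name" : "Cloud Backup",
  "retentionMonths" : 60,
  "backupPlanVolumesByDiskIdentifier" : {
    "8F51B214-72F4-39C7-AB9E-8730AC577513" : {
      "mountPoint" : "\/",
      "diskIdentifier" : "8F51B214-72F4-39C7-AB9E-8730AC577513",
      "name" : "Macintosh HD",
      "included" : true
    },
    "%2FVolumes%2FGoogleDrive" : {
      "mountPoint" : "\/Volumes\/GoogleDrive",
      "diskIdentifier" : "%2FVolumes%2FGoogleDrive",
      "name" : "Google Drive",
      "included" : true
    }
  }
}
"""


def included_mount_points(plan: dict) -> list:
    """Return the sorted mount points of volumes marked as included."""
    volumes = plan.get("backupPlanVolumesByDiskIdentifier", {})
    return sorted(
        volume["mountPoint"]
        for volume in volumes.values()
        if volume.get("included")
    )


plan = json.loads(plan_text)
mount_points = included_mount_points(plan)  # ["/", "/Volumes/GoogleDrive"]
```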

The exclusions section is a list of excluded items of up to 5 different types: