Backing Up Dataless Files

What dataless files are

A dataless file (also called a cloud-only file or a placeholder) is a file whose metadata (name, size, modification date, etc.) exists on local disk, but whose content lives only in the cloud.
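As a rough illustration of what "metadata on disk, content in the cloud" looks like to a program, the sketch below tests a file's flag word for the placeholder bit. The two constants come from the platform headers (SF_DATALESS on macOS, FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS on Windows). The function name is_dataless is hypothetical, and nothing here is taken from Arq's own code.

```python
# Illustrative only: flag values are from the platform headers, not from Arq.
SF_DATALESS = 0x40000000  # macOS st_flags bit (sys/stat.h)
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000  # Windows attribute bit (winnt.h)


def is_dataless(platform: str, flags: int) -> bool:
    """Return True if the flag word marks the file as a placeholder.

    platform: "darwin" or "win32" (the values sys.platform reports).
    flags: st_flags on macOS, st_file_attributes on Windows.
    """
    if platform == "darwin":
        return bool(flags & SF_DATALESS)
    if platform == "win32":
        return bool(flags & FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)
    # Other platforms are outside the scope of this sketch.
    return False
```

On macOS the flag word could come from os.stat(path).st_flags, and on Windows from os.stat(path).st_file_attributes. Reading these attributes inspects metadata only, so it should not by itself cause a download.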

On both macOS and Windows, the cloud provider evicts the local content when disk space is tight, or when you take a deliberate action such as enabling “Optimize Mac Storage” or right-clicking a file in File Explorer and choosing “Free up space”. The file’s icon then shows a small cloud badge. Reading the file (for example, opening it in an app) asks the provider to download the content again, a process called materialization.

Why snapshot-based reads can’t capture dataless content

By default, Arq takes a point-in-time snapshot of the source volume at the start of each backup, then reads files from the snapshot. This gives Arq a consistent view of the filesystem and avoids problems caused by files changing during the backup. On macOS, the snapshot is an APFS snapshot; on Windows, it’s a VSS shadow copy.

Dataless files don’t work with this approach. The snapshot captures the placeholder (the metadata stub) but not the file’s contents, because the contents aren’t on the local disk when the snapshot is taken. Worse, reading a dataless file from inside a snapshot does not trigger materialization — the cloud provider only materializes files in the live filesystem, not in snapshots.

To back up a dataless file, Arq has to bypass the snapshot and read the file live from its real path. Reading it live triggers the cloud provider to download the content, and Arq backs up the materialized content. Because the read happens outside the snapshot, the file could in principle be modified during the backup; Arq accepts this trade-off because it’s the only way to capture the file’s contents at all.
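The choice between the snapshot and the live path can be sketched as a small function. This is a simplified model rather than Arq's implementation: path_to_read, snapshot_mount, and volume_root are hypothetical names, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def path_to_read(live_path: str, snapshot_mount: str,
                 volume_root: str, dataless: bool) -> str:
    """Pick where a backup should read a file from."""
    if dataless:
        # Placeholder: only a read on the live filesystem triggers
        # materialization, so bypass the snapshot.
        return live_path
    # Ordinary file: read the frozen copy inside the snapshot.
    relative = PurePosixPath(live_path).relative_to(volume_root)
    return str(PurePosixPath(snapshot_mount) / relative)
```

For example, with a snapshot mounted at /tmp/snap, an ordinary file at /Users/a/doc.txt is read from /tmp/snap/Users/a/doc.txt, while a dataless file at the same location is read from /Users/a/doc.txt directly.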

Dataless files need a user session to materialize

On both macOS and Windows, materialization is performed by a process running inside the logged-in user’s session.

As a result, Arq can only back up a user’s dataless files while that user is logged in. Backups that run when no user is logged in (for example, scheduled overnight backups on a machine that’s been left at the login screen) will fail on any dataless file they encounter.

The Dataless files / Cloud-only files plan option

In your backup plan’s Options section, the Dataless files menu (called Cloud-only files on Windows) has three choices:

Min free disk space (Windows only)

When Materialize and back up is selected on Windows, the Min free disk space (GB) setting protects you against accidentally filling up the disk. Before materializing a dataless file, Arq checks the free space on the volume that contains the file. If materializing this file would push free space below the threshold you set, Arq logs an error and skips the file instead of downloading it.

Leave the field blank to disable the check (no minimum). The default is 10 GB.
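The free-space check described above amounts to a single comparison. The sketch below is a hedged illustration: can_materialize is a hypothetical name, a blank field is modeled as None, and 1 GB is assumed to mean 1024³ bytes, which may differ from how Arq counts.

```python
from typing import Optional

GB = 1024 ** 3  # assumption: binary gigabytes


def can_materialize(file_size: int, free_bytes: int,
                    min_free_gb: Optional[float]) -> bool:
    """Return True if downloading the file keeps free space at or above the minimum."""
    if min_free_gb is None:
        # Blank field: no minimum, so the check is disabled.
        return True
    # Skip the file if the download would push free space below the threshold.
    return free_bytes - file_size >= min_free_gb * GB
```

The free_bytes value could come from shutil.disk_usage(path).free for the volume that contains the file. With 20 GB free and the default 10 GB minimum, a 5 GB file passes and an 11 GB file is skipped.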

How materialized files are re-evicted

Materializing a dataless file writes its content to local disk. If every materialized file stayed in that state, a single backup could consume gigabytes of disk space that the user had deliberately freed up.

On macOS, Arq tracks every dataless file it materializes during a backup. After backing up each one, it asks FileProvider to discard the local content and return the file to its dataless state. If eviction fails for some reason (for example, FileProvider is unresponsive), the file simply stays materialized on local disk and the cloud provider’s own background cleanup will eventually re-evict it.
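The back-up-then-evict flow on macOS can be modeled as a loop that tolerates eviction failures. In this sketch, backup and evict are hypothetical callables standing in for Arq's upload step and the FileProvider eviction request, and a failed eviction is assumed to surface as an OSError.

```python
from typing import Callable, Iterable, List


def backup_then_evict(paths: Iterable[str],
                      backup: Callable[[str], None],
                      evict: Callable[[str], None]) -> List[str]:
    """Back up each materialized file, then try to evict it.

    Returns the paths that could not be evicted and so remain on local disk.
    """
    still_local: List[str] = []
    for path in paths:
        backup(path)
        try:
            evict(path)
        except OSError:
            # Eviction is best effort: leave the file materialized and
            # let the provider's background cleanup handle it later.
            still_local.append(path)
    return still_local
```

A failed eviction never aborts the backup in this model. The file was already backed up, so the only cost is disk space that stays in use for a while.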

On Windows, Arq does not force the file back to its evicted state after backing it up, because Windows doesn’t provide an API for Arq to do so. Instead, re-eviction happens through your sync engine’s own background cleanup.

This is why the Min free disk space (GB) setting matters on Windows: it limits how much disk space Arq will let dataless files consume during a backup, while you wait for your sync engine to re-evict them.

Arq only materializes changed files

Materializing a dataless file is expensive because the content has to be downloaded from the cloud. To minimize that cost, Arq compares each dataless file’s metadata (size, modification date, etc.) against the previous backup record before deciding to materialize it. If the file is unchanged since the last backup, Arq reuses the previously backed-up content and skips materialization entirely.

This means the first backup of a folder full of dataless files is slow (every file has to be downloaded), but subsequent backups touch only the files you have actually modified.
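The change check can be pictured as a metadata comparison against the previous backup record. The sketch below is illustrative only: FileMeta and needs_materialization are hypothetical names, and only size and modification time are compared, whereas the real check may consider more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileMeta:
    size: int       # file size in bytes
    mtime_ns: int   # modification time in nanoseconds


def needs_materialization(current: FileMeta,
                          previous: Optional[FileMeta]) -> bool:
    """Return True if the file must be downloaded before it can be backed up."""
    if previous is None:
        # No earlier record (first backup): the content has to be read.
        return True
    # Unchanged metadata: reuse the previously backed-up content.
    return current != previous
```

On a first backup every file has no previous record, so every file is downloaded. On later backups only files whose size or modification time differs are materialized.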