ChiFS


Efficient Directory Synchronisation

(This document is not part of the ChiFS protocol specification. It describes a possible future extension of ChiFS. See the protocol overview for context.)

The current Share API has a limitation: on each minor change to the shared files, the entire index.json.zst file needs to be downloaded, parsed and compared against what the Hub has already indexed. This is not a problem for small Shares, where the index is small enough that the overhead of a full synchronisation is negligible. For larger Shares (e.g. with a 100 MiB index.json), the overhead becomes more significant, and Hubs need to throttle the update rate to prevent excessive bandwidth and resource usage.

An alternative synchronisation method could allow faster incremental updates. This could be implemented by adding support for a directory listings API, e.g.

These files will contain metadata for one level in the directory tree. As a rough draft of the format:

{
    "files": [
        { "name": "Some video.mp4", .. }
    ],
    "directories": [
        { "name": "subdir", "version": 1 }
    ]
}

The files array has the exact same format as entries in index.json.zst. The only difference is that the path field is replaced by a name field, containing the name of this file entry relative to this directory.

The directories array lists all subdirectories. The important aspect here is the version field: an arbitrary integer that is updated whenever anything changes inside a subdirectory. A change here includes adding, removing, moving or renaming a file or directory, as well as any other change to file metadata. Any version update of a subdirectory should also cause a version update of all its parent directories. The version does not have to be a monotonically increasing number; the only thing that matters is that a change in a directory implies a change in its version number. This could be implemented as a simple increment, but an implementation could also generate a new random number on each update or derive the number from a hash of the directory metadata (SipHash or a truncated BLAKE2 hash will more than suffice for this use case).
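As a rough illustration of the hash-based option, the sketch below derives each directory's version from a truncated BLAKE2b hash of its listing. Because a listing embeds the versions of its subdirectories, a change deep in the tree also changes the version of every parent. The in-memory tree layout, the function names and the "size" field are assumptions made for this example only; they are not part of ChiFS.

```python
import hashlib
import json


def directory_version(listing: dict) -> int:
    """Derive a 64-bit version number from a directory listing."""
    # Serialise canonically so identical metadata always gives the same hash.
    payload = json.dumps(listing, sort_keys=True, separators=(",", ":"))
    digest = hashlib.blake2b(payload.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_listing(tree: dict) -> dict:
    """Build the listing for one directory level.

    `tree` is a hypothetical in-memory layout:
    {"files": [...], "directories": {name: subtree}}
    """
    directories = []
    for name in sorted(tree.get("directories", {})):
        sub_listing = build_listing(tree["directories"][name])
        # The child's version is a hash of the child's own listing, which in
        # turn contains the versions of its children, and so on downwards.
        directories.append({"name": name, "version": directory_version(sub_listing)})
    return {"files": tree.get("files", []), "directories": directories}
```

A hash-derived version has the side effect that reverting a change restores the old version number, which is harmless here since the metadata is then identical again as well.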

With this infrastructure in place, Hubs and Clients can index these directories and keep track of the version number. A subsequent synchronisation step could compare the directory versions against the versions seen in the previous fetch, and only download metadata of directories that have been changed.
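The synchronisation step described above could look roughly like the following sketch. It assumes a hypothetical fetch(path) callable that returns the parsed listing for one directory, and a cache that maps each path to the version and listing seen in the previous fetch. Since this draft does not say how the version of the root directory is obtained, the root listing is simply fetched on every run.

```python
from typing import Callable, Dict, List, Optional, Tuple

Listing = dict
Cache = Dict[str, Tuple[Optional[int], Listing]]


def sync(cache: Cache, fetch: Callable[[str], Listing],
         path: str = "/", version: Optional[int] = None) -> List[str]:
    """Refresh `cache` and return the paths that had to be downloaded."""
    cached = cache.get(path)
    if version is not None and cached is not None and cached[0] == version:
        return []  # Same version as last time: skip this whole subtree.

    listing = fetch(path)
    cache[path] = (version, listing)
    fetched = [path]
    for sub in listing["directories"]:
        child = path.rstrip("/") + "/" + sub["name"]
        fetched += sync(cache, fetch, child, sub["version"])
    return fetched
```

With an unchanged Share, a second call returns only ["/"]; after a change in one subdirectory, only the root and the directories on the path to that change are fetched again. Cleaning up cache entries for directories that have been removed is left out of this sketch.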

Two considerations are important to make this work:

Some downsides compared to synchronising using index.json.zst: