(This document is not part of the ChiFS protocol specification. It describes a possible future extension of ChiFS. See the protocol overview for context.)
ChiFS is primarily concerned with individual files: Each file across the network is uniquely identified by a hash and you can easily and reliably share files with other people by giving them the hash of a file.
This convenience is not available for directories. It is possible to link other people to a path on a specific share, but unlike with file hashes, there is no guarantee that the contents of that directory will be the same by the time they open the link - directories are mutable. Furthermore, it's possible that the Share is not available anymore and the associated metadata is lost, leaving no way to figure out what the link was supposed to represent. This document discusses two possible approaches to make sharing immutable directories as convenient as sharing files: Directory hashing and Bundles.
With directory hashing, each directory is represented as a (sorted) list of files and subdirectories and their associated hash, e.g.:
    b2t:4Z4QfLA1drZesWqAA7KfaxcEhm4VVHM9VS73C4RECGLE about/
    b2t:5uopWWrt7HLp8yU5dWtDjRSF2YLYWWW6tkxD651b8YUQ index.html
    b2t:BgKqxc2f6ND9SGMAfk9k9K8EQBWtWgKjcSXFHKxGUeCk style.css
(A JSON-based format is also possible, but that makes normalization slightly more error-prone.) This "directory object" is then hashed and the resulting hash can be used to uniquely identify the directory across the entire ChiFS network. Even if the Share that originally hosted the directory is gone, a directory with the exact same contents can still be found if it is available on another Share. Directory objects and the associated hashes can be generated by a Share and provided through the ShareApi, or they can be generated by Hubs while indexing a Share - the details don't matter too much. The Hub should provide an API to look up directory objects by their hash so that Clients can resolve and download the directory.
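As a rough illustration of the scheme described above, the following Python sketch builds directory objects recursively and hashes them. It rests on several assumptions that are not part of this document: `example_hash` is a stand-in (plain BLAKE2b-256 in hex with a made-up `ex:` prefix, not the real `b2t:` hash), entries are sorted by name, and each line has the form `<hash> <name>` with a trailing `/` for subdirectories.

```python
import hashlib


def example_hash(data: bytes) -> str:
    # Stand-in for the real ChiFS hash: plain BLAKE2b-256 in hex.
    # The "ex:" prefix is hypothetical and deliberately differs from "b2t:".
    return "ex:" + hashlib.blake2b(data, digest_size=32).hexdigest()


def directory_object(tree: dict) -> str:
    """Serialize one directory level as sorted "<hash> <name>" lines.

    `tree` maps names to bytes (file contents) or to a nested dict
    (a subdirectory).
    """
    lines = []
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, dict):
            # A subdirectory is referenced by the hash of its own directory object.
            lines.append(f"{directory_hash(entry)} {name}/")
        else:
            lines.append(f"{example_hash(entry)} {name}")
    return "\n".join(lines) + "\n"


def directory_hash(tree: dict) -> str:
    """Hash the serialized directory object."""
    return example_hash(directory_object(tree).encode("utf-8"))


site = {
    "about": {"index.html": b"<h1>About</h1>"},
    "index.html": b"<h1>Home</h1>",
    "style.css": b"body { margin: 0 }",
}
print(directory_object(site), end="")
print(directory_hash(site))
```

Because the entries are sorted before serialization, two Shares holding the same contents produce the same hash regardless of the order in which they list the files. In practice a Share would already know the hash of each file and would not need to rehash its contents.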
This approach is relatively simple and efficient, but has one major shortcoming: If a Share adds/moves/renames even a single file in the directory, the directory hash will change. If all Shares that host a directory of interest make such a change, the link to the old directory hash will go stale and may not be available anymore, even if the contents of the directory are still fully available in the network. Another downside is that one can only link to complete directories; it is not possible to link to only a subset of the files, e.g. to only download the .jpg files in a directory and skip any large .mp4's. (Of course, Clients could still provide a user interface to skip downloading files, but this would have to be a manual action; there'd be no generic way to create a link to a subset of the files.)
An alternative approach is to generate file "bundles" on demand. A bundle is similar in format to a directory object, but with one major difference: It is a standalone object - it does not refer to other directory objects or bundles - and it represents a full directory tree. E.g.:
    b2t:4Z4QfLA1drZesWqAA7KfaxcEhm4VVHM9VS73C4RECGLE about/index.html
    b2t:BgKqxc2f6ND9SGMAfk9k9K8EQBWtWgKjcSXFHKxGUeCk files/style.css
    b2t:6inbyJi8M8irG777S1iF1Wg5bATNSuzaCuqbrJRBFpSC files/logo.svg
    b2t:5uopWWrt7HLp8yU5dWtDjRSF2YLYWWW6tkxD651b8YUQ index.html
As with directory objects, hashing this representation generates a unique hash for the bundle, which can then be shared with other people. Bundles are conceptually similar to multi-file Torrents.
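A minimal sketch of how a Client might produce such a bundle from a directory tree, optionally restricted to a subset of the files. The function names, the `include` filter, the sort-by-path normalization and the `ex:` BLAKE2b stand-in hash are all assumptions made for illustration; this document does not define a normalization rule or a hash encoding for bundles.

```python
import hashlib
from typing import Callable, Iterator


def flatten(tree: dict, prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, file_hash) pairs for every file in a nested tree."""
    for name, entry in tree.items():
        if isinstance(entry, dict):
            yield from flatten(entry, f"{prefix}{name}/")
        else:
            yield f"{prefix}{name}", entry


def bundle_object(tree: dict,
                  include: Callable[[str], bool] = lambda path: True) -> str:
    """Serialize the selected files as "<hash> <path>" lines.

    Assumption: lines are sorted by path so that the same selection
    always serializes to the same bytes.
    """
    files = sorted((path, h) for path, h in flatten(tree) if include(path))
    return "".join(f"{h} {path}\n" for path, h in files)


def bundle_hash(text: str) -> str:
    # Stand-in hash with a hypothetical "ex:" prefix, not the real "b2t:" hash.
    return "ex:" + hashlib.blake2b(text.encode("utf-8"), digest_size=32).hexdigest()


tree = {
    "about": {"index.html": "b2t:4Z4QfLA1drZesWqAA7KfaxcEhm4VVHM9VS73C4RECGLE"},
    "files": {
        "style.css": "b2t:BgKqxc2f6ND9SGMAfk9k9K8EQBWtWgKjcSXFHKxGUeCk",
        "logo.svg": "b2t:6inbyJi8M8irG777S1iF1Wg5bATNSuzaCuqbrJRBFpSC",
    },
    "index.html": "b2t:5uopWWrt7HLp8yU5dWtDjRSF2YLYWWW6tkxD651b8YUQ",
}

full = bundle_object(tree)
no_svg = bundle_object(tree, include=lambda path: not path.endswith(".svg"))
print(full, end="")
print(bundle_hash(full))
```

The `include` predicate is where a subset selection, such as skipping large video files, would plug in; a different selection yields a different bundle and therefore a different hash.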
Unlike with directory objects, it is not feasible to generate bundles for every possible directory structure in advance - there are simply far too many combinations. Instead, bundles are generated on demand based on user action. Clients could add a Share button to their ChiFS browser interface, which would allow the user to select which files to include and then generate the bundle.
Bundles share a downside with Torrent files: You need to somehow exchange the bundle metadata with other people for them to be able to download it. Within ChiFS there are two possible approaches to achieve this:
- Hubs can provide a "bundle store" where clients can exchange bundle metadata (kind of like a DHT in the Torrent world, except without the D). The downside is that this adds additional storage and resource requirements to Hubs and Hubs will become a single point of failure for this metadata.
- Bundles can be published as regular files in a Share. For this to be resilient, this sharing should be automatic rather than based on user action. Clients could automatically add every generated and/or downloaded bundle to their own Share. For extra long-term resiliency, it may be possible to introduce specialized "bundle shares" that automatically download, archive and re-share such bundles.
The major advantage of bundles when compared to directory hashes is that bundles offer full flexibility over which files are included and over the directory layout, without compromising on availability. Shares have the full flexibility to rename, move, add and even delete some files (provided that other Shares do still offer these files) without affecting the availability of the bundle.
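The availability property can be sketched as a simple check: a bundle stays downloadable as long as every file hash it lists is offered by at least one Share, regardless of the path under which that Share stores the file. The data model below (a mapping from Share name to the set of file hashes it offers), the function names and the short `ex:` hashes are hypothetical and only serve the illustration.

```python
def parse_bundle(text: str) -> list[tuple[str, str]]:
    """Split "<hash> <path>" lines into (hash, path) pairs."""
    entries = []
    for line in text.splitlines():
        if line:
            file_hash, path = line.split(" ", 1)
            entries.append((file_hash, path))
    return entries


def missing_files(bundle_text: str, shares: dict[str, set[str]]) -> list[str]:
    """Return the bundle paths whose hash is offered by no Share at all."""
    available = set().union(*shares.values())
    return [path for file_hash, path in parse_bundle(bundle_text)
            if file_hash not in available]


bundle = (
    "ex:aaa about/index.html\n"
    "ex:bbb files/logo.svg\n"
    "ex:ccc index.html\n"
)

# Share A has moved or renamed its files, but still offers the same content.
# Share B offers only the logo. Paths on the Shares are irrelevant here.
shares = {
    "share-a": {"ex:aaa", "ex:ccc"},
    "share-b": {"ex:bbb"},
}
print(missing_files(bundle, shares))  # prints []

# If the only Share offering the logo disappears, that file becomes unavailable.
del shares["share-b"]
print(missing_files(bundle, shares))  # prints ['files/logo.svg']
```

Only file hashes enter the check, which is why renames, moves and additions on individual Shares leave the bundle intact.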