- when reading files in from the local filesystem (such as when you run "tahoe backup" to back up your local files to a Tahoe-LAFS grid);
- when writing files out to the local filesystem (such as when you run "tahoe cp -r" to recursively copy files out of a Tahoe-LAFS grid);
- when displaying filenames to the terminal (such as when you run "tahoe ls"), subject to limitations of the terminal and locale;
- when parsing command-line arguments, except on Windows.
Correct handling of Small Immutable Directories
Immutable directories can now be deep-checked and listed in the web UI in all cases. (In v1.6.0, some operations, such as deep-check, on a directory graph that included very small immutable directories, would result in an exception causing the whole operation to abort.) (#948)
Immutable Directories
Tahoe-LAFS can now create and handle immutable directories. (#607, #833, #931) These are read just like normal directories, but are "deep-immutable", meaning that all their children (and everything reachable from those children) must be immutable objects (i.e. immutable or literal files, and other immutable directories).
These directories must be created in a single webapi call that provides all of the children at once. (Since they cannot be changed after creation, the usual create/add/add sequence cannot be used.) They have URIs that start with "URI:DIR2-CHK:" or "URI:DIR2-LIT:", and are described on the human-facing web interface (aka the "WUI") with a "DIR-IMM" abbreviation (as opposed to "DIR" for the usual read-write directories and "DIR-RO" for read-only directories).
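As a small illustration of the naming scheme above, the following sketch maps a directory cap to the abbreviation shown in the WUI. The "URI:DIR2-CHK:" and "URI:DIR2-LIT:" prefixes come from the text; the "URI:DIR2:" and "URI:DIR2-RO:" prefixes for read-write and read-only directories, and the helper name wui_label, are assumptions made for this example.

```python
# Prefix -> WUI abbreviation. Only the two DIR-IMM prefixes are taken from
# the release notes; the DIR and DIR-RO prefixes are assumed for illustration.
WUI_LABELS = [
    ("URI:DIR2-CHK:", "DIR-IMM"),
    ("URI:DIR2-LIT:", "DIR-IMM"),
    ("URI:DIR2-RO:", "DIR-RO"),
    ("URI:DIR2:", "DIR"),
]


def wui_label(cap):
    """Return the WUI abbreviation for a directory cap, or None if unknown."""
    for prefix, label in WUI_LABELS:
        if cap.startswith(prefix):
            return label
    return None
```

A cap that matches none of the prefixes (for example a plain file cap) yields None in this sketch.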
Tahoe-LAFS releases before 1.6.0 cannot read the contents of an immutable directory. 1.5.0 will tolerate their presence in a directory listing (and display them as "unknown"). 1.4.1 and earlier cannot tolerate them: a DIR-IMM child in any directory will prevent the listing of that directory.
Immutable directories are repairable, just like normal immutable files.
The webapi "POST t=mkdir-immutable" call is used to create immutable directories. See docs/frontends/webapi.txt for details.
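Because all children must be supplied in one call, a client has to assemble the complete child list before it sends anything. The sketch below only builds the request and does not send it. The JSON body layout (child name mapped to a node type and an "ro_uri"), the "/uri" path, the node URL, and the function name are assumptions for illustration; docs/frontends/webapi.txt remains the authority on the actual format.

```python
import json
from urllib.parse import urljoin


def build_mkdir_immutable_request(node_url, children):
    """Return (url, body) for a hypothetical POST t=mkdir-immutable call.

    `children` maps child name -> (node type, read-only cap). Every child
    must itself be immutable, so only read-only caps are included.
    """
    url = urljoin(node_url, "/uri?t=mkdir-immutable")
    body = {
        name: [nodetype, {"ro_uri": ro_cap}]
        for name, (nodetype, ro_cap) in children.items()
    }
    return url, json.dumps(body).encode("utf-8")


# Placeholder caps; real caps are produced by the node when files are uploaded.
url, body = build_mkdir_immutable_request(
    "http://127.0.0.1:3456",
    {"notes.txt": ("filenode", "URI:CHK:placeholder")},
)
```

The resulting body could then be sent with any HTTP client; the response handling is left out because it depends on the webapi details.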
"tahoe backup" now creates immutable directories, backupdb has dircache
The "tahoe backup" command has been enhanced to create immutable directories (in previous releases, it created read-only mutable directories) (#828). This is significantly faster, since it does not need to create an RSA keypair for each new directory. Also "DIR-IMM" immutable directories are repairable, unlike "DIR-RO" read-only mutable directories at present. (A future Tahoe-LAFS release should also be able to repair DIR-RO.)
In addition, the backupdb (used by "tahoe backup" to remember what it has already copied) has been enhanced to store information about existing immutable directories. This allows it to re-use directories that have moved but still contain identical contents, or that have been deleted and later replaced. (The 1.5.0 "tahoe backup" command could only re-use directories that were in the same place as they were in the immediately previous backup.) With this change, the backup process no longer needs to read the previous snapshot out of the Tahoe-LAFS grid, reducing the network load considerably. (#606)
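The directory re-use described above can be pictured as a cache keyed by directory contents rather than by location. The following is a minimal sketch of that idea, not the actual backupdb schema: the class name, the use of SHA-256 over a JSON encoding, and the in-memory dictionary are all assumptions.

```python
import hashlib
import json


class DirCache:
    """Toy directory cache keyed by a hash of the directory's contents."""

    def __init__(self):
        self._by_hash = {}

    @staticmethod
    def contents_hash(children):
        # children: dict of child name -> cap string. Sorting makes the hash
        # independent of the order in which the children were enumerated.
        canonical = json.dumps(sorted(children.items())).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def lookup(self, children):
        """Return a previously created dircap with these contents, or None."""
        return self._by_hash.get(self.contents_hash(children))

    def remember(self, children, dircap):
        """Record that `dircap` is an immutable directory with these contents."""
        self._by_hash[self.contents_hash(children)] = dircap
```

Since the key depends only on the children, a directory that has moved or been deleted and recreated with identical contents hits the same entry, and no grid read is needed to discover that.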
A "null backup" (in which nothing has changed since the previous backup) will require only two Tahoe-side operations: one to add an Archives/$TIMESTAMP entry, and a second to update the Latest/ link. On the local disk side, it will readdir() all your local directories and stat() all your local files.
If you've been using "tahoe backup" for a while, you will notice that your first use of it after upgrading to 1.6.0 may take a long time: it must create proper immutable versions of all the old read-only mutable directories. This process won't take as long as the initial backup (where all the file contents had to be uploaded too): it will require time proportional to the number and size of your directories. After this initial pass, all subsequent passes should take a tiny fraction of the time.
As noted above, Tahoe-LAFS versions earlier than 1.5.0 cannot list a directory containing an immutable subdirectory. Tahoe-LAFS versions earlier than 1.6.0 cannot read the contents of an immutable directory.
The "tahoe backup" command has been improved to skip over unreadable objects (like device files, named pipes, and files with permissions that prevent the command from reading their contents), instead of throwing an exception and terminating the backup process. It also skips over symlinks, because these cannot be represented faithfully in the Tahoe-side filesystem. A warning message will be emitted each time something is skipped. (#729, #850, #641)
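The skipping behaviour can be sketched as a classification step applied to every local entry before upload. This is an illustration only, not the code used by "tahoe backup"; the function names and warning text are made up for the example.

```python
import os
import stat


def classify(path):
    """Return "file", "directory", or a skip reason for a local path."""
    st = os.lstat(path)  # lstat does not follow symlinks
    if stat.S_ISLNK(st.st_mode):
        return "skip: symlink"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if not stat.S_ISREG(st.st_mode):
        return "skip: special file"  # device file, named pipe, socket, ...
    if not os.access(path, os.R_OK):
        return "skip: unreadable"
    return "file"


def plan_backup(root):
    """Walk `root`; return (files to upload, warnings for skipped entries)."""
    files, warnings = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            kind = classify(path)
            if kind == "file":
                files.append(path)
            elif kind.startswith("skip"):
                warnings.append("WARNING: %s (%s)" % (path, kind))
    return files, warnings
```

The point of the design is that a problem entry produces one warning and the walk continues, instead of an exception ending the whole run.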
"create-node" command added, "create-client" now implies --no-storage
The basic idea behind Tahoe-LAFS's client+server and client-only processes is that you are creating a general-purpose Tahoe-LAFS "node" process, which has several components that can be activated. Storage service is one of these optional components, as are the Helper, FTP server, and SFTP server. Web gateway functionality is nominally on this list, but it is always active; a future release will make it optional. There are three special-purpose servers that can't currently be run as a component in a node: introducer, key-generator, and stats-gatherer.
So now "tahoe create-node" will create a Tahoe-LAFS node process, and after creation you can edit its tahoe.cfg to enable or disable the desired services. It is a more general-purpose replacement for "tahoe create-client". The default configuration has storage service enabled. For convenience, the "--no-storage" argument makes a tahoe.cfg file that disables storage service. (#760)
"tahoe create-client" has been changed to create a Tahoe-LAFS node without a storage service. It is equivalent to "tahoe create-node --no-storage". This helps to reduce the confusion surrounding the use of a command with "client" in its name to create a storage server. Use "tahoe create-client" to create a purely client-side node. If you want to offer storage to the grid, use "tahoe create-node" instead.
In the future, other services will be added to the node, and they will be controlled through options in tahoe.cfg . The most important of these services may get additional --enable-XYZ or --disable-XYZ arguments to "tahoe create-node".
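To make the difference between the two commands concrete, the sketch below renders the storage-related part of a tahoe.cfg for each case. The "[storage]" section with an "enabled" key is an assumption about the file layout, and render_tahoe_cfg is a hypothetical helper; inspect the tahoe.cfg that the commands actually generate before relying on it.

```python
import configparser
import io


def render_tahoe_cfg(storage_enabled):
    """Render a minimal, assumed tahoe.cfg fragment as text."""
    cfg = configparser.ConfigParser()
    cfg["storage"] = {"enabled": "true" if storage_enabled else "false"}
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


# Roughly what "tahoe create-node" would produce for the storage service ...
node_cfg = render_tahoe_cfg(storage_enabled=True)
# ... and what "tahoe create-node --no-storage" / "tahoe create-client" would.
client_cfg = render_tahoe_cfg(storage_enabled=False)
```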
Performance Improvements
Download of immutable files begins as soon as the downloader has located the K necessary shares (#928, #287). In both the previous and current releases, a downloader will first issue queries to all storage servers on the grid to locate shares before it begins downloading the shares. In previous releases of Tahoe-LAFS, download would not begin until all storage servers on the grid had replied to the query, at which point K shares would be chosen for download from among the shares that were located. In this release, download begins as soon as any K shares are located. This means that downloads start sooner, which is particularly important if there is a server on the grid that is extremely slow or even hung in such a way that it will never respond. In previous releases such a server would have a negative impact on all downloads from that grid. In this release, such a server will have no impact on downloads, as long as K shares can be found on other, quicker, servers. This also means that downloads now use the "best-alacrity" servers that they talk to, as measured by how quickly the servers reply to the initial query. This might cause downloads to go faster, especially on grids with heterogeneous servers or geographical dispersion.
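The change can be illustrated with a small simulation: query every server at once, but stop waiting as soon as K of them have reported a share. This is a sketch using asyncio with simulated delays, not the Tahoe-LAFS downloader; the server names, delays, and function names are invented for the example.

```python
import asyncio


async def query_server(name, delay, has_share):
    """Simulate one storage server answering the share-location query."""
    await asyncio.sleep(delay)
    return name, has_share


async def locate_k_shares(servers, k):
    """Return the first k servers that report a share, in reply order."""
    tasks = [asyncio.create_task(query_server(*s)) for s in servers]
    found = []
    try:
        for next_reply in asyncio.as_completed(tasks):
            name, has_share = await next_reply
            if has_share:
                found.append(name)
            if len(found) == k:
                break  # enough shares located; start downloading now
    finally:
        for task in tasks:
            task.cancel()  # stop waiting on slow or hung servers
    return found


servers = [
    ("fast", 0.01, True),
    ("medium", 0.05, True),
    ("empty", 0.005, False),
    ("hung", 3600.0, True),  # would stall a wait-for-every-reply strategy
]
chosen = asyncio.run(locate_k_shares(servers, k=2))
```

In this simulation the "hung" server never delays the result, and the servers chosen are simply the ones that answered first, which is the "best-alacrity" effect described above.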
For other changes not mentioned here, see <http://tahoe-lafs.org/trac/tahoe/query?milestone=1.6.0&keywords=!~news-done>. To include the tickets mentioned above, go to <http://tahoe-lafs.org/trac/tahoe/query?milestone=1.6.0>.
The big feature for this release is the implementation of garbage collection, allowing Tahoe storage servers to delete shares for old deleted files. When enabled, this uses a "mark and sweep" process: clients are responsible for updating the leases on their shares (generally by running "tahoe deep-check --add-lease"), and servers are allowed to delete any share which does not have an up-to-date lease. The process is described in detail in docs/garbage-collection.txt .
The server must be configured to enable garbage-collection, by adding directives to the [storage] section that define an age limit for shares. The default configuration will not delete any shares.
Both servers and clients should be upgraded to this release to make the garbage-collection as pleasant as possible. 1.2.0 servers have code to perform the update-lease operation but it suffers from a fatal bug, while 1.3.0 servers have update-lease but will return an exception for unknown storage indices, causing clients to emit an Incident for each exception, slowing the add-lease process down to a crawl. 1.1.0 servers did not have the add-lease operation at all.
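The mark-and-sweep process can be sketched as follows: clients "mark" by renewing leases, and the server "sweeps" by deleting shares whose newest lease is older than the configured age limit. This is a simplified model, not the storage server's implementation; the data layout and function names are assumptions, and an age limit of None stands in for the default configuration that deletes nothing.

```python
import time


def renew_leases(shares, share_ids, now=None):
    """Mark phase: a client renews the leases on the shares it still wants."""
    now = time.time() if now is None else now
    for share_id in share_ids:
        shares[share_id] = now


def sweep(shares, age_limit, now=None):
    """Sweep phase: return (kept, deleted) share ids for one pass.

    `shares` maps share id -> timestamp of the most recent lease renewal.
    With age_limit=None (modelling the default config) nothing is deleted.
    """
    now = time.time() if now is None else now
    kept, deleted = [], []
    for share_id, renewed_at in sorted(shares.items()):
        if age_limit is None or now - renewed_at <= age_limit:
            kept.append(share_id)
        else:
            deleted.append(share_id)
    return kept, deleted
```

For example, with an age limit of 50 seconds, a share last renewed 100 seconds ago is deleted while one renewed 10 seconds ago is kept.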
- Many unit tests were changed to use a non-network test harness, speeding them up considerably.
- The immutable verifier is incomplete: not all shares are used, and not all fields of those shares are verified. Therefore the immutable verifier has only a moderate chance of detecting corrupted shares.
- The mutable verifier is mostly complete: all shares are examined, and most fields of the shares are validated.
- The storage server protocol offers no way for the repairer to replace or delete immutable shares. If corruption is detected, the repairer will upload replacement shares to other servers, but the corrupted shares will be left in place.
- Read-only directories and read-only mutable files must be repaired by someone who holds the write-cap: the read-cap is insufficient. Moreover, the deep-check-and-repair operation will halt with an error if it attempts to repair one of these read-only objects.
- Some forms of corruption can cause both download and repair operations to fail. A future release will fix this, since download should be tolerant of any corruption as long as there are at least 'k' valid shares, and repair should be able to fix any file that is downloadable.
The "tahoe backup" command, new in this release, creates efficient versioned backups of a local directory. Given a local pathname and a target Tahoe directory, it will create a read-only snapshot of the local directory in $target/Archives/$timestamp. It will also create $target/Latest, which is a reference to the latest such snapshot. Each time you run "tahoe backup" with the same source and target, a new $timestamp snapshot will be added. These snapshots will share directories that have not changed since the last backup, to speed up the process and minimize storage requirements. In addition, a small database is used to keep track of which local files have been uploaded already, to avoid uploading them a second time. This drastically reduces the work needed to do a "null backup" (when nothing has changed locally), making "tahoe backup" suitable to run from a daily cronjob.
Note that the "tahoe backup" CLI command must be used in conjunction with a 1.3.0-or-newer Tahoe client node; there was a bug in the 1.2.0 webapi implementation that would prevent the last step (create $target/Latest) from working.
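The "small database" idea can be sketched with sqlite3: remember the size and modification time of each uploaded file together with the cap it produced, and skip the upload when those values are unchanged. The table layout, the choice of size and mtime as the change test, and the class name are assumptions for illustration, not the real backup database schema.

```python
import sqlite3


class UploadDB:
    """Toy record of which local files have already been uploaded."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS uploaded "
            "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL, cap TEXT)"
        )

    def check(self, path, size, mtime):
        """Return the stored cap if the file looks unchanged, else None."""
        row = self.db.execute(
            "SELECT cap FROM uploaded WHERE path=? AND size=? AND mtime=?",
            (path, size, mtime),
        ).fetchone()
        return row[0] if row else None

    def record(self, path, size, mtime, cap):
        """Remember that this version of the file was uploaded as `cap`."""
        self.db.execute(
            "INSERT OR REPLACE INTO uploaded VALUES (?, ?, ?, ?)",
            (path, size, mtime, cap),
        )
        self.db.commit()
```

In a null backup every check() call would return a cap, so no file contents need to be read or uploaded.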
tahoe debug dump-cap
tahoe debug dump-share
tahoe debug find-shares
tahoe debug catalog-shares
tahoe debug corrupt-share
The last command ("tahoe debug corrupt-share") flips a random bit of the given local sharefile. This is used to test the file verifying/repairing code, and obviously should not be used on user data.
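Flipping one random bit is simple to express in code. The sketch below operates on an in-memory byte string rather than a sharefile and is not the implementation behind "tahoe debug corrupt-share"; the function name and the optional rng parameter are made up for the example.

```python
import random


def flip_random_bit(data, rng=random):
    """Return a copy of `data` with exactly one randomly chosen bit inverted."""
    if not data:
        raise ValueError("cannot corrupt empty data")
    corrupted = bytearray(data)
    bit = rng.randrange(len(corrupted) * 8)  # pick one bit position
    corrupted[bit // 8] ^= 1 << (bit % 8)    # invert that bit in its byte
    return bytes(corrupted)
```

Passing a seeded random.Random instance as rng makes the corruption reproducible, which is convenient when testing a verifier.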
The CLI might not correctly handle arguments which contain non-ASCII characters in Tahoe v1.3 (although depending on your platform it might, especially if your platform can be configured to pass such characters on the command-line in UTF-8 encoding). See http://tahoe-lafs.org/trac/tahoe/ticket/565 for details.
- The misc/spacetime/ directory contains a "disk watcher" daemon (startable with 'tahoe start'), which can be configured with a set of HTTP URLs (pointing at the wapi '/statistics' page of a bunch of storage servers), and will periodically fetch disk-used/disk-available information from all the servers. It keeps this information in an Axiom database (a sqlite-based library available from divmod.org). The daemon computes time-averaged rates of disk usage, as well as a prediction of how much time is left before the grid is completely full.
- The misc/munin/ directory contains a new set of munin plugins (tahoe_diskleft, tahoe_diskusage, tahoe_doomsday) which talk to the disk-watcher and provide graphs of its calculations.
- To support the disk-watcher, the Tahoe statistics component (visible through the wapi at the /statistics/ URL) now includes disk-used and disk-available information. Both are derived through an equivalent of the unix 'df' command (i.e. they ask the kernel for the number of free blocks on the partition that encloses the BASEDIR/storage directory). In the future, the disk-available number will be further influenced by the local storage policy: if that policy says that the server should refuse new shares when less than 5GB is left on the partition, then "disk-available" will report zero even though the kernel sees 5GB remaining.
- The 'tahoe_overhead' munin plugin interacts with an allmydata.com-specific server which reports the total of the 'deep-size' reports for all active user accounts, and compares this with the disk-watcher data to report on overhead percentages. This provides information on how much space could be recovered once Tahoe implements some form of garbage collection.
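The disk-watcher calculations described in the list above can be sketched in a few lines: an average usage rate from periodic samples, a time-until-full prediction, and a disk-available figure reduced by a reserved amount. This is an illustration under simplifying assumptions (a plain first-to-last average, hypothetical function names), not the daemon's actual code.

```python
def usage_rate(samples):
    """Average growth in bytes/second between the first and last sample.

    `samples` is a time-ordered list of (timestamp_seconds, bytes_used).
    """
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples spanning some time")
    return (used1 - used0) / (t1 - t0)


def seconds_until_full(samples, bytes_available):
    """Predict the time left before the space runs out, or None."""
    rate = usage_rate(samples)
    if rate <= 0:
        return None  # usage is flat or shrinking; no predicted fill time
    return bytes_available / rate


def reported_available(kernel_free, reserved):
    """Disk-available as it might be reported once a reserve policy applies."""
    return max(0, kernel_free - reserved)
```

With a 5GB reserve and 5GB free according to the kernel, reported_available returns zero, matching the policy example in the text.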
This release makes the immutable-file "ciphertext hash tree" mandatory. Previous releases allowed the uploader to decide whether their file would have an integrity check on the ciphertext or not. A malicious uploader could use this to create a readcap that would download as one file or a different one, depending upon which shares the client fetched first, with no errors raised. There are other integrity checks on the shares themselves, preventing a storage server or other party from violating the integrity properties of the read-cap: this failure was only exploitable by the uploader who gives you a carefully constructed read-cap. If you download the file with Tahoe 1.2.0 or later, you will not be vulnerable to this problem. (#491)
This change does not introduce a compatibility issue, because all existing versions of Tahoe will emit the ciphertext hash tree in their shares.
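For readers unfamiliar with hash trees, the sketch below computes the root of a binary hash tree over ciphertext segments. It uses plain SHA-256 with ad hoc "leaf:" and "node:" prefixes and duplicates the last node on odd-sized levels; these choices are assumptions for illustration and do not reproduce the hash construction or tree shape that Tahoe actually uses.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def merkle_root(segments):
    """Return the root hash of a binary hash tree over ciphertext segments."""
    if not segments:
        raise ValueError("need at least one segment")
    # Leaves are hashes of the individual segments.
    level = [_h(b"leaf:" + s) for s in segments]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        # Each parent is the hash of its two children.
        level = [
            _h(b"node:" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Because the root depends on every segment, a downloader that checks fetched ciphertext against a root bound to the read-cap gets the same file regardless of which shares it fetched first.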
Tahoe is slowly acquiring convenient tools to check up on file health, examine existing shares for errors, and repair files that are not fully healthy. This release adds a mutable checker/verifier/repairer, although testing is very limited, and there are no web interfaces to trigger repair yet. The "Check" button next to each file or directory on the wapi page will perform a file check, and the "deep check" button on each directory will recursively check all files and directories reachable from there (which may take a very long time).
Future releases will improve access to this functionality.
- tahoe cp local.txt tahoe:virtual.txt
- tahoe ls work:subdir
- /helper_status : to describe what a Helper is doing
- /statistics : reports node uptime, CPU usage, other stats
- /file : for easy file-download URLs, see #221
- /cap == /uri : future compatibility
- t=deep-size : add up the size of all immutable files reachable from the directory
- t=deep-stats : return a JSON-encoded description of number of files, size distribution, total size, etc
- tahoe-introstats
- tahoe-rootdir-space
- tahoe_estimate_files
- mutable files published/retrieved
- tahoe_cpu_watcher
- tahoe_spacetime