Version 2 (modified by zooko, at 2008-06-10T18:15:25Z)
Known Issues
This page describes known problems in recent releases of Tahoe. Issues are fixed as quickly as possible; however, users of older releases may still need to be aware of these problems until they upgrade to a release which resolves them.
Issues in Tahoe 1.1 (milestone:1.1.0, not quite released)
Servers which run out of space
If a Tahoe storage server runs out of space, writes will fail with an IOError exception. In some situations, Tahoe-1.1 clients do not handle this well:
- If the exception occurs during an immutable-share write, that share will be broken. The client will detect this, and will declare the upload a failure if too few shares can be placed (this "shares of happiness" threshold defaults to 7 out of 10). The code does not yet search for new servers to replace the full ones. If the upload fails, the server's upload-already-in-progress routines may interfere with a subsequent upload attempt.
- If the exception occurs during a mutable-share write, the old share will be left in place (and a new home for the share will be sought). If enough old shares are left around, subsequent reads may see the file in its earlier state, known as a "rollback" fault. Writing a new version of the file should find the newer shares correctly, although it will take longer (more roundtrips) than usual.
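The "shares of happiness" check described above can be sketched as follows. This is a simplified illustration, not Tahoe's actual code: the server objects and their write method are hypothetical stand-ins.

```python
# Simplified sketch of the "shares of happiness" upload check. NOT
# Tahoe's real code; `servers` and their write() method are hypothetical.

def upload_shares(share_data, servers, total_shares=10, shares_of_happiness=7):
    """Try to place each share on a server; fail the upload if fewer
    than shares_of_happiness shares land successfully."""
    placed = 0
    for share, server in zip(share_data, servers):
        try:
            server.write(share)  # may raise IOError if the server is full
            placed += 1
        except IOError:
            # Tahoe-1.1 does not yet search for a replacement server
            # here; the share aimed at the full server is simply lost.
            pass
    if placed < shares_of_happiness:
        raise RuntimeError(
            "upload failed: only %d of %d shares placed (need %d)"
            % (placed, total_shares, shares_of_happiness))
    return placed
```

With the default 7-out-of-10 threshold, an upload survives up to three full servers; a fourth failure turns the whole upload into a failure.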
The out-of-space handling code is not yet complete, and we do not yet have a space-limiting solution that is suitable for large storage nodes. The "sizelimit" configuration uses a /usr/bin/du-style query at node startup, which takes a long time (tens of minutes) on storage nodes that offer 100GB or more, making it unsuitable for highly available servers.
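The du-style scan amounts to walking the entire storage directory and summing file sizes, which is why startup takes so long on a node holding 100GB of shares. A minimal sketch (illustrative only; the function name is hypothetical, not Tahoe's):

```python
import os

def measure_used_space(storage_dir):
    """Walk storage_dir recursively, summing the size of every file --
    a du-style measurement. On a node with 100GB of shares this touches
    every file's metadata, hence the tens-of-minutes startup cost."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(storage_dir):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```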
In lieu of 'sizelimit', server admins are advised to create the NODEDIR/readonly_storage file on their storage nodes (and remove 'sizelimit' from the configuration, then restart their nodes) before space is exhausted. This will stop the influx of immutable shares. Mutable shares will continue to arrive, but since these are mainly used by directories, the amount of space they consume will be smaller.
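The workaround above amounts to the following shell commands. This is a sketch: ./demo-node stands in for your node's real base directory, and the restart command may vary with your installation.

```shell
# Sketch of the readonly_storage workaround. NODEDIR here is a
# demonstration path; point it at your actual node's base directory.
NODEDIR=./demo-node
mkdir -p "$NODEDIR"             # (your real node directory already exists)

# Creating this empty marker file stops the influx of immutable shares:
touch "$NODEDIR/readonly_storage"

# Then remove any 'sizelimit' setting from the node's configuration and
# restart the node, e.g.:
#   tahoe restart "$NODEDIR"
```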
Eventually we will have a better solution for this.
Issues in Tahoe 1.0
Servers which run out of space
In addition to the problems described above, Tahoe-1.0 clients which experience out-of-space errors while writing mutable files are likely to think the write succeeded when in fact it failed. This can cause data loss.
Large Directories or Mutable files in a specific range of sizes
A mismatched pair of size limits causes a problem when a client attempts to upload a large mutable file with a size between 3139275 and 3500000 bytes. (Mutable files larger than 3.5MB are refused outright.) The symptom is very high memory usage (3GB) and 100% CPU for about 5 minutes. The attempted write will fail, but the client may think that it succeeded. This size corresponds to roughly 9000 entries in a directory.
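Back-of-the-envelope arithmetic relating the size range above to the directory-entry count, using only the numbers quoted in this section (the per-entry figure is inferred from them, not an exact Tahoe constant):

```python
# Rough arithmetic from the numbers quoted above: the lower limit of the
# problematic range, divided by the ~9000 directory entries it
# corresponds to, gives an approximate per-entry size.

lower_limit = 3139275   # bytes: lower end of the problematic range
upper_limit = 3500000   # bytes: mutable files above this are refused

entries = 9000          # roughly this many directory entries at that size
bytes_per_entry = lower_limit / float(entries)
print("approx bytes per directory entry: %.0f" % bytes_per_entry)  # -> 349
```

So a directory starts hitting the bug at roughly 9000 entries of about 350 bytes each, well before the 3.5MB hard refusal.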
This was fixed in 1.1, as ticket #379. Files up to 3.5MB should now work properly, and files above that size should be rejected cleanly. Both servers and clients must be upgraded to fully resolve the problem, although upgrading the client alone to 1.1 fixes the memory-usage and false-success problems.
pycryptopp compile errors resulting in corruption
Certain combinations of compiler, linker, and pycryptopp versions may cause errors during decryption, resulting in corrupted plaintext.