Changes between Initial Version and Version 1 of KnownIssues


Timestamp:
2008-06-05T20:02:29Z (16 years ago)
Author:
warner
Comment:

--

= Known Issues =

This page describes known problems for recent releases of Tahoe. Issues are
fixed as quickly as possible; however, users of older releases may still need
to be aware of these problems until they upgrade to a release which resolves
them.

== Issues in [Tahoe 1.1 milestone:1.1.0] (not quite released) ==

=== Servers which run out of space ===

If a Tahoe storage server runs out of space, writes will fail with an
{{{IOError}}} exception. In some situations, Tahoe-1.1 clients will not react
to this very well:

 * If the exception occurs during an immutable-share write, that share will
   be broken. The client will detect this, and will declare the upload a
   failure if too few shares can be placed (this "shares of happiness"
   threshold defaults to 7 out of 10). The code does not yet search for new
   servers to replace the full ones. If the upload fails, the server's
   upload-already-in-progress routines may interfere with a subsequent
   upload.
 * If the exception occurs during a mutable-share write, the old share will
   be left in place (and a new home for the share will be sought). If enough
   old shares are left around, subsequent reads may see the file in its
   earlier state, known as a "rollback" fault. Writing a new version of the
   file should find the newer shares correctly, although it will take
   longer (more round trips) than usual.
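
The immutable-upload behaviour described in the first bullet can be sketched as follows. This is only an illustrative model, not Tahoe's upload code: the function names, the list of per-server free space, and the share size are hypothetical. Only the {{{IOError}}} on a full server, the lack of a search for replacement servers, and the default threshold of 7 out of 10 come from the text above.

```python
def write_share(free_bytes, share_size):
    """Hypothetical stand-in for a share write; a full server raises IOError."""
    if share_size > free_bytes:
        raise IOError("no space left on storage server")


def attempt_upload(server_free_bytes, share_size, happy=7):
    """Try to place one share per server and apply the happiness threshold.

    Returns (shares_placed, upload_ok). A failed write leaves that share
    broken, and no replacement server is sought for it.
    """
    placed = 0
    for free_bytes in server_free_bytes:
        try:
            write_share(free_bytes, share_size)
        except IOError:
            continue  # share is lost; move on without finding a new home
        placed += 1
    return placed, placed >= happy


if __name__ == "__main__":
    # Ten servers, four of them full: only six shares land, so the upload fails.
    print(attempt_upload([100] * 6 + [0] * 4, share_size=50))
```

With three full servers instead of four, seven shares are placed and the same call reports success.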

The out-of-space handling code is not yet complete, and we do not yet have a
space-limiting solution that is suitable for large storage nodes. The
"sizelimit" configuration uses a {{{/usr/bin/du}}}-style query at node startup,
which takes a long time (tens of minutes) on storage nodes that offer 100GB
or more, making it unsuitable for highly-available servers.
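
To illustrate why such a query is slow, here is a minimal sketch of a du-style scan. It is an assumption about the general technique, not the node's actual startup code, and the function name is hypothetical. Every file under the storage directory has to be visited, so the cost grows with the number of stored shares.

```python
import os


def directory_usage(root):
    """Sum the apparent size in bytes of every file under root.

    Note: du normally reports allocated disk blocks; this sketch sums
    st_size instead, which is simpler but not identical.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


if __name__ == "__main__":
    print(directory_usage("."))
```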

In lieu of "sizelimit", server admins are advised to set
NODEDIR/readonly_storage on their storage nodes (and to remove "sizelimit" and
restart the nodes) before space is exhausted. This will stop the influx of
immutable shares. Mutable shares will continue to arrive, but since these are
mainly used by directories, the amount of space consumed will be smaller.
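
The steps above could be scripted roughly as follows. This sketch assumes that both settings are plain files inside the node directory (a {{{readonly_storage}}} marker file and a {{{sizelimit}}} file); that layout is an assumption and should be checked against the configuration documentation for the release in use. The function name is hypothetical, and the node restart is left to the admin.

```python
from pathlib import Path


def switch_to_readonly(node_dir):
    """Mark a storage node read-only and drop its size limit.

    Assumption: readonly_storage and sizelimit are files in node_dir.
    The node must still be restarted afterwards for this to take effect.
    """
    node = Path(node_dir)
    (node / "readonly_storage").touch()  # create the marker if it is missing
    sizelimit = node / "sizelimit"
    if sizelimit.exists():
        sizelimit.unlink()  # remove the size limit setting


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as demo_dir:
        switch_to_readonly(demo_dir)
        print(sorted(p.name for p in Path(demo_dir).iterdir()))
```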

Eventually we will have a better solution for this.

== Issues in Tahoe 1.0 ==

=== Servers which run out of space ===

In addition to the problems described above, Tahoe-1.0 clients which
experience out-of-space errors while writing mutable files are likely to
think the write succeeded, when it in fact failed. This can cause data loss.

=== Large Directories or Mutable files in a specific range of sizes ===

A mismatched pair of size limits causes a problem when a client attempts to
upload a large mutable file with a size between 3139275 and 3500000 bytes.
(Mutable files larger than 3.5MB are refused outright.) The symptom is very
high memory usage (3GB) and 100% CPU for about 5 minutes. The attempted write
will fail, but the client may think that it succeeded. This size corresponds
to roughly 9000 entries in a directory.
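
As a quick way to see which sizes are affected, the ranges above can be written as a small check. This is a sketch only: the function and constant names are hypothetical, and treating both endpoints of the range as inclusive is an assumption, since the text says only "between".

```python
PROBLEM_RANGE_START = 3139275  # bytes; lower edge of the problematic range
MUTABLE_SIZE_LIMIT = 3500000   # bytes; larger mutable files are refused


def classify_mutable_size(size_bytes):
    """Classify a mutable file size for a Tahoe 1.0 client.

    Returns "ok", "affected" (falls in the mismatched-limit range), or
    "refused" (above the limit). Endpoint inclusivity is assumed.
    """
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes > MUTABLE_SIZE_LIMIT:
        return "refused"
    if size_bytes >= PROBLEM_RANGE_START:
        return "affected"
    return "ok"


if __name__ == "__main__":
    for size in (1000000, 3200000, 4000000):
        print(size, classify_mutable_size(size))
```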

This was fixed in 1.1, as ticket #379. Files up to 3.5MB should now work
properly, and files above that size should be rejected properly. Both servers
and clients must be upgraded to resolve the problem, although once the client
is upgraded to 1.1 the memory usage and false-success problems should be
fixed.

=== pycryptopp compile errors resulting in corruption ===

Certain combinations of compiler, linker, and pycryptopp versions may cause
corruption errors during decryption, resulting in corrupted plaintext.