= Known Issues =

This page describes known problems in recent releases of Tahoe. Issues are
fixed as quickly as possible, but users of older releases may still need to
be aware of these problems until they upgrade to a release that resolves
them.

== Issues in [milestone:1.1.0 Tahoe 1.1] (not quite released) ==

=== Servers which run out of space ===

If a Tahoe storage server runs out of space, writes will fail with an
{{{IOError}}} exception. In some situations, Tahoe-1.1 clients do not react
well to this failure:

 * If the exception occurs during an immutable-share write, that share
   will be broken. The client will detect this, and will declare the
   upload a failure if too few shares can be placed (this "shares of
   happiness" threshold defaults to 7 out of 10; see the sketch after
   this list). The code does not yet search for new servers to replace
   the full ones. If the upload fails, the server's
   upload-already-in-progress routines may interfere with a subsequent
   upload.
 * If the exception occurs during a mutable-share write, the old share
   will be left in place (and a new home for the share will be sought).
   If enough old shares are left around, subsequent reads may see the
   file in its earlier state, known as a "rollback" fault. Writing a new
   version of the file should find the newer shares correctly, although
   it will take longer (more roundtrips) than usual.

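The success test itself is a simple threshold check. A minimal sketch in
Python (the names here are illustrative, not Tahoe's actual internals):

{{{
# Sketch only: an upload counts as successful when at least
# "shares of happiness" distinct shares have been placed on servers.
def upload_succeeded(placed_shares, shares_of_happiness=7):
    return len(set(placed_shares)) >= shares_of_happiness
}}}
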
The out-of-space handling code is not yet complete, and we do not yet have
a space-limiting solution that is suitable for large storage nodes. The
"sizelimit" configuration uses a {{{/usr/bin/du}}}-style query at node
startup, which takes a long time (tens of minutes) on storage nodes that
offer 100GB or more, making it unsuitable for highly-available servers.

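To see why that startup scan is slow, here is a minimal sketch of a
du-style traversal (the function name and layout are hypothetical, not
Tahoe's actual code):

{{{
import os

def du(storage_dir):
    # stat() every share file under the storage directory and sum sizes
    total = 0
    for dirpath, dirnames, filenames in os.walk(storage_dir):
        for name in filenames:
            total += os.stat(os.path.join(dirpath, name)).st_size
    return total
}}}

A node offering 100GB can hold millions of small share files, and
stat()ing each one at startup is what takes tens of minutes.
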
In lieu of 'sizelimit', server admins are advised to create a
NODEDIR/readonly_storage file on their storage nodes (removing 'sizelimit'
and restarting the node) before space is exhausted. This will stop the
influx of immutable shares. Mutable shares will continue to arrive, but
since these are mainly used by directories, the amount of space consumed
will be smaller.

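The flag is just a file in the node's base directory; assuming (as the
name suggests) that the file's mere presence is what enables it, creating
it is a one-liner, shown here in Python (equivalent to a shell "touch"):

{{{
import os

nodedir = os.path.expanduser("~/.tahoe")   # assumption: default node dir
open(os.path.join(nodedir, "readonly_storage"), "w").close()
# then remove the 'sizelimit' file, if present, and restart the node
}}}
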
Eventually we will have a better solution for this.

== Issues in Tahoe 1.0 ==

=== Servers which run out of space ===

In addition to the problems described above, Tahoe-1.0 clients which
experience out-of-space errors while writing mutable files are likely to
believe that the write succeeded when it in fact failed. This can cause
data loss.

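Until such clients are upgraded, one cautious (if imperfect) mitigation is
to read a mutable file back through the webapi after writing it. A sketch,
assuming the node's webapi listens on the default port 3456 and that
"writecap" is the file's write-cap:

{{{
import urllib2

BASE = "http://127.0.0.1:3456/uri/"

def write_and_verify(writecap, new_contents):
    # PUT replaces the mutable file's contents; GET reads them back
    req = urllib2.Request(BASE + writecap, data=new_contents)
    req.get_method = lambda: "PUT"
    urllib2.urlopen(req).read()
    if urllib2.urlopen(BASE + writecap).read() != new_contents:
        raise RuntimeError("write does not appear to have taken effect")
}}}

Note that a read-back can itself be fooled by a rollback fault, so this is
a sanity check, not a guarantee.
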
=== Large directories or mutable files in a specific range of sizes ===

A mismatched pair of size limits causes a problem when a client attempts
to upload a large mutable file with a size between 3139275 and 3500000
bytes. (Mutable files larger than 3.5MB are refused outright.) The symptom
is very high memory usage (3GB) and 100% CPU for about 5 minutes. The
attempted write will fail, but the client may think that it succeeded.
This size corresponds to roughly 9000 entries in a directory.

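For scale, a rough back-of-the-envelope calculation (the per-entry size
used here is an illustrative estimate, not a measured constant):

{{{
ENTRY_SIZE = 390        # assumed bytes per serialized directory entry
MUTABLE_CAP = 3500000   # mutable-file size limit, in bytes
print MUTABLE_CAP / ENTRY_SIZE   # about 8974, i.e. roughly 9000 entries
}}}
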
This was fixed in 1.1, as ticket #379. Files up to 3.5MB should now work
properly, and files above that size should be rejected cleanly. Both
servers and clients must be upgraded to fully resolve the problem,
although once the client is upgraded to 1.1, the memory-usage and
false-success problems should be fixed.

=== pycryptopp compile errors resulting in corruption ===

Certain combinations of compiler, linker, and pycryptopp versions can
produce a miscompiled library that corrupts data during decryption,
yielding corrupted plaintext.

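One way to check a build is a known-answer self-test. A sketch, assuming
pycryptopp's {{{pycryptopp.cipher.aes.AES}}} interface (AES in CTR mode
with an all-zero initial counter, so encrypting a block of zero bytes
exposes the raw AES encryption of a zero block):

{{{
from binascii import hexlify
from pycryptopp.cipher.aes import AES

# FIPS-197-derived known answer: AES-128 of an all-zero block under an
# all-zero key.
EXPECTED = "66e94bd4ef8a2c3b884cfa59ca342b2e"

def selftest():
    out = AES("\x00" * 16).process("\x00" * 16)
    assert hexlify(out) == EXPECTED, "pycryptopp build looks miscompiled"

selftest()
}}}

A single known-answer test will catch gross miscompilation, though it
cannot prove the build is fully correct.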