Changes between Version 3 and Version 4 of KnownIssues


Timestamp:
2008-06-10T23:32:12Z
Author:
zooko
Comment:

replace with link to docs/known_issues.txt (for now)

  • KnownIssues

    v3 v4  
    1 = Known Issues =
    2 
    3 Below is a list of known issues in recent releases of Tahoe, and how to manage
    4 them.
    5 
    6 
    7 == issues in Tahoe v1.1.0, released 2008-06-10 ==
    8 
    9 === issue 1: server out of space when writing mutable file ===
    10 
    11 If a v1.0 or v1.1.0 storage server runs out of disk space then its attempts to
    12 write data to the local filesystem will fail.  For immutable files, this will
    13 not lead to any problem (the attempt to upload that share to that server will
    14 fail, the partially uploaded share will be deleted from the storage server's
    15 "incoming shares" directory, and the client will move on to using another
    16 storage server instead).
    17 
    18 If the write was an attempt to modify an existing mutable file, however, a
    19 problem will result: when the attempt to write the new share fails due to
    20 insufficient disk space, then it will be aborted and the old share will be left
    21 in place.  If enough such old shares are left, then a subsequent read may get
    22 those old shares and see the file in its earlier state, which is a "rollback"
    23 failure.  With the default parameters (3-of-10), six old shares will be enough
    24 to potentially lead to a rollback failure.
    25 
    26 ==== how to manage it ====
    27 
    28 Make sure your Tahoe storage servers don't run out of disk space.  This means
    29 refusing storage requests before the disk fills up. There are a couple of ways
    30 to do that with v1.1.
    31 
    32 First, there is a configuration option named "sizelimit" which will cause the
    33 storage server to do a "du" style recursive examination of its directories at
    34 startup, and then if the sum of the size of files found therein is greater than
    35 the "sizelimit" number, it will reject requests by clients to write new
    36 immutable shares.
    37 
    38 However, that can take a long time (something on the order of a minute of
    39 examination of the filesystem for each 10 GB of data stored in the Tahoe
    40 server), and the Tahoe server will be unavailable to clients during that time.
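
As a rough illustration of the startup cost described above, the following sketch estimates how long the "sizelimit" scan might keep a server unavailable. It assumes the "one minute per 10 GB" figure from the text scales linearly and that 1 GB means 10^9 bytes. The function name and parameters are hypothetical and are not part of Tahoe.

```python
def estimated_scan_minutes(stored_bytes, minutes_per_10_gb=1.0):
    """Estimate the startup scan time, in minutes.

    Assumes linear scaling of the rough "one minute per 10 GB" figure,
    with 1 GB taken to be 10**9 bytes (both are assumptions).
    """
    ten_gb = 10 * 10**9
    return stored_bytes / ten_gb * minutes_per_10_gb


# Example: a server holding 500 GB of shares.
print(estimated_scan_minutes(500 * 10**9))
```

Actual times will depend on the filesystem and hardware, so treat the result as an order-of-magnitude guide only.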
    41 
    42 Another option is to set the "readonly_storage" configuration option on the
    43 storage server before startup.  This will cause the storage server to reject
    44 all requests to upload new immutable shares.
    45 
    46 Note that neither of these configurations affects mutable shares: even if
    47 sizelimit is configured and the storage server currently has greater space used
    48 than allowed, or even if readonly_storage is configured, servers will continue
    49 to accept new mutable shares and will continue to accept requests to overwrite
    50 existing mutable shares.
    51 
    52 Mutable files are typically used only for directories, and are usually much
    53 smaller than immutable files, so if you use one of these configurations to stop
    54 the influx of immutable files while there is still sufficient disk space to
    55 receive an influx of (much smaller) mutable files, you may be able to avoid the
    56 potential for "rollback" failure.
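
One way an operator might decide when to apply one of these configurations is to monitor free disk space from outside Tahoe. The sketch below is a minimal example under stated assumptions: the helper names, the storage directory path, and the reserve threshold are all hypothetical, and the script only reports a condition. It does not change any Tahoe setting.

```python
import shutil


def below_reserve(free_bytes, reserve_bytes):
    """Return True when free space has dropped below the chosen reserve."""
    return free_bytes < reserve_bytes


def storage_needs_readonly(storage_dir, reserve_bytes):
    """Check the filesystem holding storage_dir (a hypothetical path).

    The reserve should be large enough to absorb the (much smaller)
    mutable shares that the server will keep accepting.
    """
    free_bytes = shutil.disk_usage(storage_dir).free
    return below_reserve(free_bytes, reserve_bytes)
```

For example, `storage_needs_readonly("/path/to/storage", 5 * 10**9)` would return True once less than 5 GB remains free, which could be the operator's cue to stop the influx of immutable shares.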
    57 
    58 A future version of Tahoe will include a fix for this issue.  Here is
    59 [http://allmydata.org/pipermail/tahoe-dev/2008-May/000630.html the mailing list
    60 discussion] about how that future version will work.
    61 
    62 
    63 == issues in Tahoe v1.1.0 and v1.0.0 ==
    64 
    65 === issue 2: pyOpenSSL and/or Twisted defect resulting in false alarms in the unit tests ===
    66 
    67 The combination of Twisted v8.1.0 and pyOpenSSL v0.7 causes the Tahoe v1.1 unit
    68 tests to fail, even though the behavior of Tahoe itself which is being tested is
    69 correct.
    70 
    71 ==== how to manage it ====
    72 
    73 If you are using Twisted v8.1.0 and pyOpenSSL v0.7, then please ignore XYZ in
    74 XYZ.  Downgrading to an older version of Twisted or pyOpenSSL will cause those
    75 false alarms to stop happening.
    76 
    77 
    78 == issues in Tahoe v1.0.0, released 2008-03-25 ==
    79 
    80 (Tahoe v1.0 was superseded by v1.1 which was released 2008-06-10.)
    81 
    82 === issue 3: server out of space when writing mutable file ===
    83 
    84 In addition to the problems caused by insufficient disk space described above,
    85 v1.0 clients which are writing mutable files when the servers fail to write to
    86 their filesystem are likely to think the write succeeded, when it in fact
    87 failed. This can cause data loss.
    88 
    89 ==== how to manage it ====
    90 
    91 Upgrade the client to v1.1, or make sure that servers are always able to write to
    92 their local filesystem (including that there is space available) as described in
    93 "issue 1" above.
    94 
    95 
    96 === issue 4: server out of space when writing immutable file ===
    97 
    98 Tahoe v1.0 clients using v1.0 servers which are unable to write to their
    99 filesystem during an immutable upload will correctly detect the first failure,
    100 but if they retry the upload without restarting the client, or if another client
    101 attempts to upload the same file, the second upload may appear to succeed when
    102 it hasn't, which can lead to data loss.
    103 
    104 ==== how to manage it ====
    105 
    106 Upgrading either or both of the client and the server to v1.1 will fix this
    107 issue.  Also it can be avoided by ensuring that the servers are always able to
    108 write to their local filesystem (including that there is space available) as
    109 described in "issue 1" above.
    110 
    111 
    112 === issue 5: large directories or mutable files in a specific range of sizes ===
    113 
    114 If a client attempts to upload a large mutable file with a size greater than
    115 about 3,139,000 and less than or equal to 3,500,000 bytes then it will fail but
    116 appear to succeed, which can lead to data loss.
    117 
    118 (Mutable files larger than 3,500,000 bytes are refused outright.)  The symptom of the
    119 failure is very high memory usage (3 GB of memory) and 100% CPU for about 5
    120 minutes, before it appears to succeed, although it hasn't.
    121 
    122 Directories are stored in mutable files, and a directory of approximately 9000
    123 entries may fall into this range of mutable file sizes (depending on the size of
    124 the filenames or other metadata associated with the entries).
    125 
    126 ==== how to manage it ====
    127 
    128 This was fixed in v1.1, under ticket #379.  If the client is upgraded to v1.1,
    129 then it will fail cleanly instead of falsely appearing to succeed when it tries
    130 to write a file whose size is in this range.  If the server is also upgraded to
    131 v1.1, then writes of mutable files whose size is in this range will succeed.
    132 (If the server is upgraded to v1.1 but the client is still v1.0 then the client
    133 will still suffer this failure.)
    134 
    135 
    136 === issue 6: pycryptopp defect resulting in data corruption ===
    137 
    138 Versions of pycryptopp earlier than pycryptopp-0.5.0 had a defect which, when
    139 compiled with some compilers, would cause AES-256 encryption and decryption to
    140 be computed incorrectly.  This could cause data corruption.  Tahoe v1.0
    141 required, and came with a bundled copy of, pycryptopp v0.3.
    142 
    143 ==== how to manage it ====
    144 
    145 You can detect whether pycryptopp-0.3 has this failure when it is compiled by
    146 your compiler.  Run the unit tests that come with pycryptopp-0.3: unpack the
    147 "pycryptopp-0.3.tar" file that comes in the Tahoe v1.0 {{{misc/dependencies}}}
    148 directory, cd into the resulting {{{pycryptopp-0.3.0}}} directory, and execute
    149 {{{python ./setup.py test}}}.  If the tests pass, then your compiler does not
    150 trigger this failure.
    151 
    152 Tahoe v1.1 requires, and comes with a bundled copy of, pycryptopp v0.5.1, which
    153 does not have this defect.
     1 Please see [source:docs/known_issues.txt].