Milestone 1.3.0
Release focus: checker/verifier/repairer:
- checking: just count number of available shares
- verifying: read share contents, check hashes
- repairing: create new shares as necessary to replace bad/missing ones.
- Mutable shares are repaired in place. Note that mutable repair requires a write-cap, to make sure the write-enabler shared secrets are created correctly. It would be nice to be able to repair from just a read-cap or a verify-cap, but this may need to wait until we switch to DSA mutable files, and/or change the way we control server-side write access.
- Bad immutable shares must be manually deleted from the storage servers, so repair needs a mechanism to report which shares should be examined and removed. Immutable repair therefore means creating new shares to make up for the bad ones, rather than fixing them in place.
- there should be a "check" button for each file to initiate a check, or a verify, with or without auto-repair. The page that this button displays should contain the results of the operation: which shares were found where, how much verification was performed, whether repair was deemed necessary, whether repair was actually done, and the success or failure of the repair operation.
- there should be a "deep-check" button for each directory to perform a recursive traversal and check/verify/repair everything reachable from that point. The page this returns should show aggregate information about the check/repair: a count of how many files/dirs were examined, how many were healthy, how many had problems, etc. The page should have a line or two about each problem.
- there should be machine-parseable versions of these buttons: POST operations that return JSON with the same information as the human-targeted HTML described above.
- serious problems (like hash failures) should be automatically reported to some centralized Incident Gatherer, so we can discover bugs, failing drives, or malice.
- allmydata should be able to run a periodic checker/repairer on customer rootcaps. We should be able to count the number of missing/bad shares and track it over time (to inform us of the impact of bouncing/moving storage servers, discover failing drives, etc). We need to find out how long the full check/verify/repair process takes, so we can decide upon a suitable repeat rate (perhaps once a month).
- to handle bad immutable shares, we should add a 'tahoe check-share' command that can be run on the storage-server side and check all the hashes of a single share file on disk. If file-verify observes a bad hash, we should be able to go to the disk and use this tool to see if the problem is transient or persistent, to make decisions about the stability of that disk.
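
The distinction between checking, verifying, and repair planning, and the kind of aggregate data a deep-check might return as JSON, can be sketched roughly as follows. This is only an illustrative sketch, not the actual implementation: the `Share` record, the function names, and the JSON field names are all hypothetical, and a single SHA-256 digest per share stands in for whatever hash checks the real verifier performs.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Share:
    # Hypothetical record for one share as seen on one storage server.
    sharenum: int
    server: str
    data: bytes
    expected_sha256: str  # assumed: hex digest recorded when the share was made


def check(shares):
    """Checking: just count the distinct shares that are available."""
    return len({s.sharenum for s in shares})


def verify(shares):
    """Verifying: read each share's contents and compare against its hash."""
    good, bad = [], []
    for s in shares:
        ok = hashlib.sha256(s.data).hexdigest() == s.expected_sha256
        (good if ok else bad).append(s)
    return good, bad


def examine(name, shares, total_shares):
    """Per-file result: what was found, what is bad, what repair must create."""
    good, bad = verify(shares)
    good_nums = {s.sharenum for s in good}
    # Repair plan: every share number without a good copy needs a new share.
    to_create = sorted(set(range(total_shares)) - good_nums)
    return {
        "name": name,
        "shares_found": check(shares),
        "good_shares": sorted(good_nums),
        # Bad shares are reported (not deleted) so an operator can examine them.
        "bad_shares": [{"sharenum": s.sharenum, "server": s.server} for s in bad],
        "shares_to_create": to_create,
        "healthy": not to_create and not bad,
    }


def deep_summary(results):
    """Aggregate result for a deep-check: counts plus one entry per problem."""
    unhealthy = [r for r in results if not r["healthy"]]
    return {
        "objects_examined": len(results),
        "objects_healthy": len(results) - len(unhealthy),
        "objects_unhealthy": len(unhealthy),
        "problems": [
            {"name": r["name"],
             "bad_shares": r["bad_shares"],
             "shares_to_create": r["shares_to_create"]}
            for r in unhealthy
        ],
    }


def to_json(results):
    """Machine-parseable form of the same information as the HTML page."""
    return json.dumps(deep_summary(results), sort_keys=True)
```

In this sketch a plain check would report a corrupted share as present, since it only counts shares; only the verify step notices the hash mismatch, which is why the per-file result keeps `shares_found` separate from `good_shares`.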