Opened at 2008-02-07T19:12:06Z
Closed at 2008-09-18T05:20:36Z
#301 closed enhancement (fixed)
t=deep-check with JSON output, for automated checking
| Reported by: | zooko | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 1.3.0 |
| Component: | code-encoding | Version: | 0.7.0 |
| Keywords: | | Cc: | |
| Launchpad Bug: | | | |
Description
Run "check" on files and directories in an automated, regular way.
It's not clear how the checker process should get the verifier caps that it needs. See the bigger, more general ticket #119 -- "lease expiration / deletion / filechecking / quotas".
Change History (7)
comment:1 Changed at 2008-02-07T19:32:00Z by zooko
comment:2 Changed at 2008-06-01T21:05:14Z by warner
- Milestone changed from eventually to undecided
comment:3 Changed at 2008-09-03T01:35:36Z by warner
- Milestone changed from undecided to 1.3.0
comment:4 Changed at 2008-09-04T18:58:06Z by warner
- Summary changed from automate checking to t=deep-check with JSON output, for automated checking
The plan for this is:
- provide a deep-check webapi with machine-readable (JSON) output
- give responsibility for running deep-check to users or grid admins. They should periodically run deep-check (possibly with verify=true, probably with repair=true) on their root-caps.
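Because the point of this ticket is unattended checking, here is a minimal sketch of how a cron job might reduce a t=deep-check&output=JSON response to an exit status. It reads only the keys from the deep-check JSON description in webapi.txt (count-objects-unhealthy, count-repairs-successful, list-remaining-corrupt-shares) and assumes count-objects-unhealthy counts objects that were unhealthy before any repair. The function name and the exit-code policy are hypothetical, not part of the webapi.

```python
import json


def deep_check_exit_code(response_text: str) -> int:
    """Turn a t=deep-check&output=JSON response body into a cron-style exit code.

    Hypothetical policy (not defined by the webapi):
      0 -> nothing unhealthy and no corrupt shares left over
      1 -> problems were found, but repair fixed all of them
      2 -> something still needs a human
    """
    data = json.loads(response_text)
    unhealthy = data["count-objects-unhealthy"]
    repaired = data["count-repairs-successful"]
    leftover = data["list-remaining-corrupt-shares"]
    if leftover:
        # Corrupt shares that repair did not (or could not) remove,
        # e.g. any corruption in immutable shares.
        return 2
    if unhealthy == 0:
        return 0
    # Without repair=true every count-repairs-* key is zero, so an unhealthy
    # tree with no successful repairs falls through to 2.
    return 1 if repaired >= unhealthy else 2
```

A wrapper script could run this after each periodic deep-check and alert the grid admin only on a nonzero result.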
comment:5 Changed at 2008-09-04T20:12:25Z by warner
Here are my docs/webapi.txt additions describing the JSON output for t=check and t=deep-check. I'm working on implementing this now.
POST $URL?t=check
This triggers the FileChecker to determine the current "health" of the
given file or directory, by counting how many shares are available. The
page that is returned will display the results. This can be used as a "show
me detailed information about this file" page.
If a when_done=url argument is provided, the return value will be a redirect
to that URL instead of the checker results.
If a return_to=url argument is provided, the returned page will include a
link to the given URL entitled "Return to the parent directory".
If a verify=true argument is provided, the node will perform a more
intensive check, downloading and verifying every single bit of every share.
If an output=JSON argument is provided, the response will be
machine-readable JSON instead of human-oriented HTML. The data is a
dictionary with the following keys:
storage-index: a base32-encoded string with the object's storage index,
or an empty string for LIT files
repair-attempted: (bool) True if repair was attempted
repair-successful: (bool) True if repair was attempted and the file was
fully healthy afterwards.
pre-repair-results: a dictionary that describes the state of the file
before any repair was performed. For LIT files, this
dictionary has only the 'healthy' key, which will
always be True. For distributed files, this dictionary
has the following keys:
count-shares-good: the number of good shares that were found
count-shares-needed: 'k', the number of shares required for recovery
count-shares-expected: 'N', the number of total shares generated
count-good-share-hosts: the number of distinct storage servers with
good shares. If this number is less than
count-shares-good, then some shares are doubled
up, increasing the correlation of failures. This
indicates that one or more shares should be
moved to an otherwise unused server, if one is
available.
count-corrupt-shares: the number of shares with integrity failures
list-corrupt-shares: a list of "share identifiers", one for each share
that was found to be corrupt. Each share identifier
is a list of (serverid, storage_index, sharenum).
needs-rebalancing: (bool) True if there are multiple shares on a single
storage server, indicating a reduction in reliability
that could be resolved by moving shares to new
servers.
servers-responding: list of base32-encoded storage server identifiers,
one for each server which responded to the share
query.
healthy: (bool) True if the file is completely healthy, False otherwise.
Healthy files have at least N good shares. Overlapping shares
(indicated by count-good-share-hosts < count-shares-good) do not
currently cause a file to be marked unhealthy. If there are at
least N good shares, then corrupt shares do not cause the file to
be marked unhealthy, although the corrupt shares will be listed
in the results (list-corrupt-shares) and should be manually
removed to avoid wasting time in subsequent downloads (as the
downloader rediscovers the corruption and uses alternate shares).
post-repair-results: a dictionary (with the same keys as
pre-repair-results) that describes the state of the
file after any repair was performed. If no repair was
requested or required, 'pre-repair-results' and
'post-repair-results' will be identical. Note that
since immutable shares cannot be modified by clients,
any corrupt immutable shares in pre-repair-results
will remain in post-repair-results.
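The health rules above are easy to misread, so here is a minimal sketch that recomputes a few of them from a parsed t=check&output=JSON response: "healthy" means at least N good shares, fewer distinct hosts than good shares means doubled-up shares, and at least k good shares means the file is still recoverable. The function name and the summary keys it returns are hypothetical; it reads only keys documented above and treats a pre-repair-results dictionary without count-shares-good as a LIT file.

```python
def check_summary(results: dict) -> dict:
    """Derive a few facts from a parsed t=check&output=JSON response."""
    pre = results["pre-repair-results"]
    if "count-shares-good" not in pre:
        # LIT files carry only the 'healthy' key, which is always True.
        return {"lit": True, "healthy": pre["healthy"]}
    good = pre["count-shares-good"]
    k = pre["count-shares-needed"]
    n = pre["count-shares-expected"]
    hosts = pre["count-good-share-hosts"]
    return {
        "lit": False,
        # Healthy files have at least N good shares.
        "healthy": good >= n,
        # Good shares beyond one per host; > 0 means some server holds
        # more than one share (the needs-rebalancing situation).
        "doubled-up-shares": good - hosts,
        # Any k good shares are enough to recover the file.
        "recoverable": good >= k,
        "corrupt-shares": pre["count-corrupt-shares"],
    }
```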
POST $URL?t=deep-check
This triggers a recursive walk of all files and directories reachable from
the target, performing a check on each one just like t=check. The result
page will contain a summary of the results, including details on any
file/directory that was not fully healthy.
t=deep-check is most useful to invoke on a directory. If invoked on a file,
it will just check that single object. The recursive walker will deal with
loops safely.
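One way a recursive walker can "deal with loops safely" is to remember every node it has already visited. The sketch below is a hypothetical breadth-first walker, not Tahoe's actual implementation: `children(node)` is an assumed callback returning (childname, childnode) pairs, and nodes must be hashable (think of a verifier cap or storage index). It yields each reachable object exactly once, with a pathname relative to the starting directory, which is the form list-unhealthy-files uses.

```python
from collections import deque


def walk(root, children):
    """Yield (pathname, node) for every object reachable from root, once each."""
    seen = {root}
    queue = deque([((), root)])
    while queue:
        path, node = queue.popleft()
        yield "/".join(path), node
        for name, child in children(node):
            if child in seen:
                # Already queued: this edge closes a loop or reaches a
                # subtree shared by two directories, so skip it.
                continue
            seen.add(child)
            queue.append((path + (name,), child))
```

An object reachable under two names is checked once and reported under whichever path the walk reaches first.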
This accepts the same verify=, when_done=, and return_to= arguments as
t=check.
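For scripted use, the request can be assembled with the standard library. This sketch assumes $URL has the form http://127.0.0.1:3456/uri/<cap> with the cap URL-escaped (3456 is only an example port), and it includes repair=, which this ticket discusses elsewhere; the helper name is hypothetical. It only builds the request and does not send it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def check_request(node_url: str, cap: str, deep: bool = False,
                  verify: bool = False, repair: bool = False,
                  output_json: bool = True, when_done: str = None,
                  return_to: str = None) -> Request:
    """Build (but do not send) a POST for t=check or t=deep-check."""
    params = {"t": "deep-check" if deep else "check"}
    if verify:
        params["verify"] = "true"
    if repair:
        params["repair"] = "true"
    if output_json:
        params["output"] = "JSON"
    if when_done:
        params["when_done"] = when_done
    if return_to:
        params["return_to"] = return_to
    # Caps contain ':' characters, so escape the whole cap as one path segment.
    url = "%s/uri/%s?%s" % (node_url.rstrip("/"), quote(cap, safe=""), urlencode(params))
    return Request(url, data=b"", method="POST")
```

Since the node stays silent until the whole tree has been traversed, a caller would send this with `urllib.request.urlopen(req, timeout=...)` using a generous timeout.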
Be aware that this can take a long time: perhaps a second per object. No
progress information is currently provided: the server will be silent until
the full tree has been traversed, then will emit the complete response.
If an output=JSON argument is provided, the response will be
machine-readable JSON instead of human-oriented HTML. The data is a
dictionary with the following keys:
count-objects-checked: count of how many objects were checked
count-objects-healthy: how many of those objects were completely healthy
count-objects-unhealthy: how many were damaged in some way
count-repairs-attempted: repairs were attempted on this many objects.
The count-repairs- keys will always be provided;
however, unless repair=true is present, they will
all be zero.
count-repairs-successful: how many repairs resulted in healthy objects
count-repairs-unsuccessful: how many repairs did not result in
completely healthy objects
count-corrupt-shares: how many shares were found to have corruption,
summed over all objects examined
list-corrupt-shares: a list of "share identifiers", one for each share
that was found to be corrupt. Each share identifier
is a list of (serverid, storage_index, sharenum).
list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares
that were successfully repaired are not
included. These are shares that need
manual processing. Since immutable shares
cannot be modified by clients, all corruption
in immutable shares will be listed here.
list-unhealthy-files: a list of (pathname, check-results) tuples, for
each file that was not fully healthy. 'pathname' is
relative to the directory on which deep-check was
invoked. The 'check-results' field is the same as
that returned by t=check&output=JSON, described
above.
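To make the relationship between the per-object and deep-check outputs concrete, here is a hedged sketch that folds a list of (pathname, check-results) pairs into most of the summary keys above. It assumes the healthy/unhealthy counts and list-unhealthy-files are based on pre-repair-results, and it omits list-remaining-corrupt-shares because that key depends on whether each share is mutable; the function name is hypothetical.

```python
def summarize_deep_check(results):
    """Fold per-object t=check JSON results into deep-check summary counts.

    `results` is a list of (pathname, check_results) pairs, each check_results
    dict having the t=check&output=JSON shape.
    """
    summary = {
        "count-objects-checked": 0,
        "count-objects-healthy": 0,
        "count-objects-unhealthy": 0,
        "count-repairs-attempted": 0,
        "count-repairs-successful": 0,
        "count-repairs-unsuccessful": 0,
        "count-corrupt-shares": 0,
        "list-corrupt-shares": [],
        "list-unhealthy-files": [],
    }
    for pathname, cr in results:
        pre = cr["pre-repair-results"]
        summary["count-objects-checked"] += 1
        if pre["healthy"]:
            summary["count-objects-healthy"] += 1
        else:
            summary["count-objects-unhealthy"] += 1
            summary["list-unhealthy-files"].append([pathname, cr])
        if cr["repair-attempted"]:
            summary["count-repairs-attempted"] += 1
            if cr["repair-successful"]:
                summary["count-repairs-successful"] += 1
            else:
                summary["count-repairs-unsuccessful"] += 1
        # LIT files have no share keys, so treat missing ones as empty.
        summary["count-corrupt-shares"] += pre.get("count-corrupt-shares", 0)
        summary["list-corrupt-shares"].extend(pre.get("list-corrupt-shares", []))
    return summary
```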
comment:6 Changed at 2008-09-06T05:44:40Z by warner
Ok, I just split check() and check_and_repair() into separate methods, because they return significantly different results. check() returns a single ICheckerResults instance, whereas the return value of check_and_repair() needs to have two such instances (pre-repair and post-repair), as well as indicating whether repair was attempted or not, and whether it was successful or not. This means there is also a deep_check() and deep_check_and_repair().
I left the webapi alone, but internally the POST t=check (and t=deep-check) implementation calls different methods depending upon the value of the repair= argument.
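As a rough sketch of that dispatch (the dataclass, the function name, and the use of plain dicts in place of ICheckerResults instances are all assumptions for illustration), the handler can choose between check() and check_and_repair() based on repair= and still emit the t=check JSON shape in both cases, with identical pre- and post-repair results when no repair was requested:

```python
from dataclasses import dataclass


@dataclass
class CheckAndRepairResults:
    """Hypothetical stand-in for what check_and_repair() returns."""
    repair_attempted: bool
    repair_successful: bool
    pre_repair_results: dict
    post_repair_results: dict


def check_json(node, repair: bool) -> dict:
    """Call check() or check_and_repair() and shape the t=check JSON reply."""
    if repair:
        r = node.check_and_repair()
        return {
            "repair-attempted": r.repair_attempted,
            "repair-successful": r.repair_successful,
            "pre-repair-results": r.pre_repair_results,
            "post-repair-results": r.post_repair_results,
        }
    # Plain check(): a single result serves as both pre- and post-repair.
    cr = node.check()
    return {
        "repair-attempted": False,
        "repair-successful": False,
        "pre-repair-results": cr,
        "post-repair-results": cr,
    }
```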
comment:7 Changed at 2008-09-18T05:20:36Z by warner
- Resolution set to fixed
- Status changed from new to closed
The webapi now does the right thing, and both mutable and immutable checkers provide the right sort of output. (zooko is still working on the immutable verifier).

I think it might be good to go ahead and implement the auto-checker without knowing how it is going to get its verifier caps. The interface to it can include that the client provides verifier caps.