[tahoe-lafs-trac-stream] [tahoe-lafs] #1679: Nondeterministic NoSharesError for direct CHK download in 1.8.3 and 1.9.1
tahoe-lafs
trac at tahoe-lafs.org
Wed Oct 31 10:02:56 UTC 2012
#1679: Nondeterministic NoSharesError for direct CHK download in 1.8.3 and 1.9.1
-------------------------+-------------------------------------------------
     Reporter:  nejucomo |          Owner:  nejucomo
         Type:  defect   |         Status:  new
     Priority:  critical |      Milestone:  soon
    Component:  code-    |        Version:  1.8.3
                network  |       Keywords:  download heisenbug lae
   Resolution:           |                  test-needed review-needed
Launchpad Bug:           |
-------------------------+-------------------------------------------------
Comment (by zooko):
Replying to [comment:20 nejucomo]:
> Can I test this manually without waiting for or writing a unittest?
>
> In order to do so, I need some more clarification:
>
> Where does the invalid cache live? Is it in the downloading gateway?
The cache is in the downloading gateway.
> Does that mean if the gateway cannot connect to the storage server
> during an immutable download, then the cache records this fact and is not
> correctly bypassed later?
I think so. I just checked the code
([source:git/src/allmydata/immutable/download/finder.py]), which I think is
at fault. When the finder starts, it asks the storage broker for a list of
connected servers, then it tries to use those servers. I think that if the
storage broker excludes a server from that list because it isn't connected,
or if the finder tries to use the server and gets an error (because the
connection just failed), then the finder will never try that server again.
The cache causes new downloads to reuse the same finder object.
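
The suspected failure mode can be sketched with a toy model. This is not
Tahoe-LAFS code: StorageBroker, Finder, Gateway, and the reuse_finders flag
are hypothetical stand-ins invented for illustration, and the real finder.py
may differ in every detail. The sketch only assumes what is described above:
a finder snapshots the connected servers once, and a per-cap cache hands the
same finder to later downloads.

```python
# Toy model only: every class here is a hypothetical stand-in,
# not the actual Tahoe-LAFS implementation.

class NoSharesError(Exception):
    pass


class StorageBroker:
    def __init__(self):
        self.connected = set()

    def get_connected_servers(self):
        # Return a copy, so callers hold a snapshot rather than a live view.
        return set(self.connected)


class Finder:
    """Snapshots the connected-server list once, when it is created."""

    def __init__(self, broker):
        self.servers = broker.get_connected_servers()

    def find_shares(self):
        if not self.servers:
            raise NoSharesError("no servers known to this finder")
        return sorted(self.servers)


class Gateway:
    def __init__(self, broker, reuse_finders=True):
        self.broker = broker
        self.reuse_finders = reuse_finders
        self._finders = {}  # cap -> Finder (the suspected stale cache)

    def download(self, cap):
        finder = self._finders.get(cap) if self.reuse_finders else None
        if finder is None:
            finder = Finder(self.broker)
            if self.reuse_finders:
                self._finders[cap] = finder
        return finder.find_shares()


def try_download(gateway, cap):
    try:
        gateway.download(cap)
        return "ok"
    except NoSharesError:
        return "NoSharesError"
```

In this model, a cap first requested while the server is unreachable keeps
failing after the server comes back, while a cap never requested before
succeeds. That matches the reason the protocol quoted below asks for a cap
that has not been recently downloaded.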
> If all those are true, a manual test would be:
>
> a. Pick a known-uploaded CHK cap which has *not* been recently
> downloaded by the target gateway.
>
> b. Prevent the gateway from connecting to relevant storage servers.
> (For LAE service this is easier because there's only one storage node;
> ifdown $iface can work for a local gateway test, or adding a special
> temporary black-hole route for the storage node IP might work for a remote
> gateway.)
>
> c. Attempt to fetch that CAP on the network-impaired gateway.
>
> d. Repair the network of the gateway.
>
> e. Attempt to fetch that CAP again. If the fetch fails, this is
> evidence of the bug. If not, there's some flaw in these assumptions.
>
> f. If e. produces evidence of the bug, then stop that gateway, apply the
> patch, start the patched gateway and repeat steps a. through e. (with a
> *new* CAP to help control the experiment).
>
> g. Publish the results of step e. in the first iteration (unpatched) and
> the second iteration (patched).
>
> At the same time or afterwards, write a unittest.
This sounds like a good protocol!
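
Steps b. through e. could be scripted roughly as follows. This is a sketch
under assumptions, not a tested tool: run_protocol, fetch_cap, and the
break_network / repair_network callables are hypothetical names, and how the
network is actually impaired (ifdown, a black-hole route) is left to the
caller. fetch_cap assumes the gateway's web API is reachable at
http://127.0.0.1:3456 and serves file caps at /uri/<cap>; adjust it for the
gateway under test.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def fetch_cap(cap, gateway_url="http://127.0.0.1:3456"):
    """Return True if the gateway serves the cap, False on any failure.

    Assumption: the gateway web API listens on gateway_url and serves
    file caps at /uri/<cap>.
    """
    url = gateway_url + "/uri/" + quote(cap, safe=":")
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            resp.read()
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def run_protocol(cap, fetch, break_network, repair_network):
    """Run steps b. to e. once and summarize the outcome.

    break_network and repair_network are caller-supplied callables
    (hypothetical hooks) that impair and restore connectivity.
    """
    break_network()                 # step b.
    try:
        impaired_ok = fetch(cap)    # step c.
    finally:
        repair_network()            # step d., even if the fetch raised
    repaired_ok = fetch(cap)        # step e.

    if impaired_ok:
        verdict = "inconclusive: fetch succeeded while network was impaired"
    elif repaired_ok:
        verdict = "no evidence of the bug"
    else:
        verdict = "evidence of the bug"
    return {
        "impaired_ok": impaired_ok,
        "repaired_ok": repaired_ok,
        "verdict": verdict,
    }
```

Passing the fetch and network hooks in as arguments keeps the procedure
identical for the unpatched and patched runs in step f.; only the gateway
and the cap change between iterations.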
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1679#comment:21>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage