[tahoe-dev] [tahoe-lafs] #653: introducer client: connection count is wrong, !VersionedRemoteReference needs EQ
tahoe-lafs
trac at allmydata.org
Wed Sep 2 15:00:54 PDT 2009
#653: introducer client: connection count is wrong, !VersionedRemoteReference
needs EQ
--------------------------+-------------------------------------------------
 Reporter:  warner        |  Owner:          warner
 Type:      defect        |  Status:         assigned
 Priority:  major         |  Milestone:      1.6.0
 Component: code-network  |  Version:        1.3.0
 Keywords:                |  Launchpad_bug:
--------------------------+-------------------------------------------------
Comment(by warner):
Zooko and I talked and did some more analysis. Based on that, we think
there's a high probability of a foolscap bug (still present in the latest
0.4.2) that causes notifyOnDisconnect to sometimes not get called,
probably triggered by "replacement connections" (i.e. where NAT table
expiries or something cause an asymmetric close, one side reconnects, and
the other side must displace an existing-but-really-dead connection with
the new inbound one).
The tahoe code was rewritten to reduce the damage caused by this sort of
thing. We could change it further, to remove the use of notifyOnDisconnect
altogether, with two negative consequences:
* the welcome-page status display would be unable to show "Connected /
Not Connected" status for each known server. Instead, it could say "Last
Connection Established At / Not Connected". Basically we'd know when the
connection was established, and (with extra code) we could know when we
last successfully used the connection. And when we tried to use the
connection and found it down, we could mark the connection as down until
we'd reestablished it. But we wouldn't notice the actual event of
connection loss (or the resulting period of not-being-connected) until we
actually tried to use it. So we couldn't claim to be "connected", we could
merely claim that we *had* connected at some point, and that we haven't
noticed becoming disconnected yet (but aren't trying very hard to notice).
* the share-allocation algorithm allocates share numbers ahead of time
for each batch of requests, but it wouldn't learn about disconnected
servers until it tried to send a message to them (this would fail quickly,
but still not synchronously). This could wind up with shares placed
0,1,3,4,2 instead of 0,1,2,3,4.
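To make the second consequence concrete, here is a minimal sketch of how
pre-assigned share numbers end up out of order when a dead server is only
discovered at send time. It is a toy model, not the tahoe upload code: the
function place_shares, the is_alive callback, and the server names are all
hypothetical, and the real failure would arrive asynchronously rather than
from a direct check.

```python
def place_shares(servers, num_shares, is_alive, known_down=frozenset()):
    """Return the share numbers held by the servers, listed in server order.

    Toy model only: is_alive() stands in for "the send succeeded", and
    known_down stands in for what a disconnect notification would have
    told us before the batch was planned.
    """
    candidates = [s for s in servers if s not in known_down]
    # Share numbers are assigned ahead of time for the whole batch.
    batch = list(zip(candidates, range(num_shares)))
    # Servers beyond the batch are held back as replacements.
    spare = iter(candidates[num_shares:])
    placement = {}
    for server, share in batch:
        while not is_alive(server):
            # The failure is only discovered when we try to send, so the
            # already-numbered share moves to the next unused server.
            server = next(spare)
        placement[server] = share
    return [placement[s] for s in servers if s in placement]


servers = list("ABCDEF")
alive = lambda s: s != "C"   # server C is dead

# Without disconnect notification, C's death is found late.
late = place_shares(servers, 5, alive)
# With notification, C is skipped when the batch is planned.
early = place_shares(servers, 5, alive, known_down={"C"})
```

In this model, late comes out as [0, 1, 3, 4, 2] and early as
[0, 1, 2, 3, 4]. If the spare servers run out, next() raises
StopIteration, which the sketch does not handle.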
The first problem would be annoying, so I think we're going to leave tahoe
alone for now. I'll add a note to the foolscap docs to warn users about
the notifyOnDisconnect bug, and encourage people to not rely upon it in
replacement-connection-likely environments.
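For code that cannot rely on notifyOnDisconnect, the status tracking
described in the first bullet above could look roughly like the sketch
below. It deliberately uses no foolscap API: the class name
LazyConnectionStatus and its methods are hypothetical, and the caller is
assumed to report connection setup and the outcome of each remote call
itself.

```python
class LazyConnectionStatus:
    """Per-server status that never claims to be "connected".

    Hypothetical helper: it records when a connection was established and
    when it was last used successfully, and only marks the server down
    after an attempted call has actually failed.
    """

    def __init__(self):
        self.established_at = None
        self.last_used_at = None
        self.known_down = True   # nothing established yet

    def connection_established(self, now):
        self.established_at = now
        self.known_down = False

    def call_succeeded(self, now):
        self.last_used_at = now

    def call_failed(self):
        # Loss is noticed only here, when we tried to use the connection.
        self.known_down = True

    def describe(self):
        if self.known_down:
            return "Not Connected"
        return "Last Connection Established At %s" % (self.established_at,)


status = LazyConnectionStatus()
status.connection_established(now=100)
status.call_succeeded(now=130)
before_failure = status.describe()
status.call_failed()
after_failure = status.describe()
```

The status stays at "Last Connection Established At 100" until a call
fails, which is exactly the gap described above: a silent disconnect goes
unreported for as long as nobody tries to use the connection.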
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/653#comment:18>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid