[tahoe-dev] [tahoe-lafs] #816: don't rely on notifyOnDisconnect()
tahoe-lafs
trac at allmydata.org
Wed Oct 21 15:19:57 PDT 2009
#816: don't rely on notifyOnDisconnect()
--------------------------+-------------------------------------------------
Reporter: zooko | Owner:
Type: enhancement | Status: new
Priority: minor | Milestone: undecided
Component: code-network | Version: 1.5.0
Keywords: | Launchpad_bug:
--------------------------+-------------------------------------------------
#653 was a long, drawn-out investigation that concluded that there is
probably (but not certainly) a bug in foolscap in which
{{{notifyOnDisconnect()}}} doesn't get triggered sometimes when it is
supposed to. Fixing (and writing automated tests for)
{{{notifyOnDisconnect()}}} is quite tricky. Also, it can never be 100%
correct, because communications are inherently unreliable and are limited
by the speed of light, among other things. My
personal prejudice as someone who has long studied secure and fault-
tolerant networked applications is that you should really avoid relying on
such a service -- a service that attempts to tell you when a remote object
has switched from "likely to respond in a timely way to your next request"
to "unlikely to respond in a timely way to your next request", and instead
design your system so that it works correctly and as efficiently as it can
regardless of the pattern of connections-and-disconnections of the
underlying comms subsystems. (Hm, I guess this is an instance of the
general idiom of "Don't check if it is likely to work and then try and
then handle failure, instead just try and then handle failure.")
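As a minimal sketch of that idiom (not Tahoe-LAFS or foolscap code; `ServerUnreachable`, `first_successful` and the `send` callables are hypothetical names made up for illustration), a caller can simply try each server and treat a failure as the only disconnect signal it needs:

```python
class ServerUnreachable(Exception):
    """Hypothetical error for a request that failed or timed out."""


def first_successful(servers, request):
    """Send `request` to each server in order and return the first answer.

    `servers` is a list of (name, send) pairs, where send(request) either
    returns a response or raises ServerUnreachable. Nothing here asks a
    server whether it is connected beforehand.
    """
    failed = []
    for name, send in servers:
        try:
            return name, send(request)
        except ServerUnreachable:
            failed.append(name)  # remember it and move on to the next server
    raise ServerUnreachable("no server answered; tried %s" % ", ".join(failed))
```

The loop behaves the same whether a server dropped off a second ago or an hour ago, which is the point: correctness does not depend on having been notified.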
Now, Tahoe-LAFS already does it this way! For the most part. There are a
few places where we invoke {{{notifyOnDisconnect()}}}, but removing most
of them would not diminish the functionality of Tahoe-LAFS. Here is what
we ''would'' lose, as Brian wrote on #653:
"""
* the welcome-page status display would be unable to show "Connected /
Not Connected" status for each known server. Instead, it could say "Last
Connection Established At / Not Connected". Basically we'd know when the
connection was established, and (with extra code) we could know when we
last successfully used the connection. And when we tried to use the
connection and found it down, we could mark the connection as down until
we'd reestablished it. But we wouldn't notice the actual event of
connection loss (or the resulting period of not-being-connected) until we
actually tried to use it. So we couldn't claim to be "connected", we could
merely claim that we *had* connected at some point, and that we haven't
noticed becoming disconnected yet (but aren't trying very hard to notice).
* the share-allocation algorithm wouldn't learn about disconnected
servers until it tried to send a message to them (this would fail quickly,
but still not synchronously), but allocates share numbers ahead of time
for each batch of requests. This could wind up with shares placed
0,1,3,4,2 instead of 0,1,2,3,4.
The first problem would be annoying, so I think we're going to leave tahoe
alone for now. I'll add a note to the foolscap docs to warn users about
the notifyOnDisconnect bug, and encourage people to not rely upon it in
replacement-connection-likely environments.
"""
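The status display Brian describes in the first bullet could be backed by a record like the one sketched below. This is only an illustration under stated assumptions: `ServerStatus` and its method names are hypothetical, timestamps are passed in as plain numbers, and nothing here is taken from the actual welcome-page code.

```python
class ServerStatus:
    """Hypothetical per-server record for the welcome page."""

    def __init__(self):
        self.established_at = None   # when the connection was last set up
        self.last_used_at = None     # when a request last succeeded
        self.found_down = False      # set only when a request actually fails

    def connection_established(self, now):
        self.established_at = now
        self.found_down = False

    def request_succeeded(self, now):
        self.last_used_at = now

    def request_failed(self):
        # We only learn about the loss when we try to use the connection.
        self.found_down = True

    def describe(self):
        if self.established_at is None or self.found_down:
            return "Not Connected"
        text = "Last Connection Established At %s" % self.established_at
        if self.last_used_at is not None:
            text += " (last used at %s)" % self.last_used_at
        return text
```

Note that the record never claims "Connected": between a silent connection loss and the next failed request it keeps reporting the last establishment time, which is exactly the limitation described above.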
Since he wrote that, I realized that it would be cool if the welcome-page
had a "ping all servers" button which then changed their statuses to
indicate whether they responded to the ping or not (and how long it took).
This would, in my opinion, be more reliable and more informative than the
current "connected/not-connected" welcome-page.
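A rough sketch of what the button could compute, assuming each server is represented by a hypothetical zero-argument `ping` callable that returns on success and raises on failure (`ping_all` and the `clock` parameter are likewise made up for this example). It is synchronous for brevity; a real version in Tahoe-LAFS would presumably issue the pings concurrently and apply a timeout.

```python
import time


def ping_all(servers, clock=time.monotonic):
    """Ping every server and report (responded, seconds_taken) per server.

    `servers` maps a server name to a hypothetical ping() callable.
    """
    results = {}
    for name, ping in servers.items():
        start = clock()
        try:
            ping()
            responded = True
        except Exception:
            # Any failure counts as "did not respond to this ping".
            responded = False
        results[name] = (responded, clock() - start)
    return results
```

The welcome page could then show, for each server, whether it answered the most recent ping and how long the round trip took, rather than a connected/not-connected flag.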
To close this ticket, make sure you have Brian's approval first, then add
a "ping all servers" feature to the welcome page, then remove all uses of
{{{notifyOnDisconnect()}}} from Tahoe-LAFS.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/816>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid