Opened at 2009-10-21T22:19:57Z
Last modified at 2013-06-25T16:17:32Z
#816 new enhancement
don't rely on notifyOnDisconnect() — at Initial Version
Reported by: | zooko | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | eventually |
Component: | code-network | Version: | 1.5.0 |
Keywords: | usability transparency ostrom statistics notifyOnDisconnect | Cc: | |
Launchpad Bug: |
Description
#653 was a long, drawn-out investigation that concluded that there is probably (but not certainly) a bug in foolscap in which notifyOnDisconnect() sometimes doesn't get triggered when it is supposed to. Fixing (and writing automated tests for) notifyOnDisconnect() is quite tricky. Also, it can never be 100% correct, because communications are inherently unreliable and bounded by the speed of light and so on. My personal prejudice, as someone who has long studied secure and fault-tolerant networked applications, is that you should really avoid relying on such a service (one that attempts to tell you when a remote object has switched from "likely to respond in a timely way to your next request" to "unlikely to respond in a timely way to your next request"), and should instead design your system so that it works correctly, and as efficiently as it can, regardless of the pattern of connections and disconnections of the underlying comms subsystems. (Hm, I guess this is an instance of the general idiom of "Don't check if it is likely to work and then try and then handle failure, instead just try and then handle failure.")
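A minimal sketch of that idiom, in plain Python rather than foolscap. Everything here is hypothetical and for illustration only: `FakeServer`, its `call()` method, and `call_first_responsive()` are not foolscap or Tahoe-LAFS names. The point is that the caller never asks "is this server connected?"; a failed call is the only disconnect signal it acts on.

```python
class FakeServer:
    """Hypothetical stand-in for a remote reference; not a foolscap API."""

    def __init__(self, name, alive):
        self.name = name
        self.alive = alive

    def call(self, request):
        # A dead server fails at call time; there is no separate status query.
        if not self.alive:
            raise ConnectionError(f"{self.name} did not respond")
        return f"{self.name} handled {request}"


def call_first_responsive(servers, request):
    """Try each server in turn and return (result, failures so far).

    No connectivity check is made beforehand: we just try, and handle failure.
    """
    failures = []
    for server in servers:
        try:
            return server.call(request), failures
        except ConnectionError as exc:
            # Remember the failure and move on to the next candidate.
            failures.append((server.name, str(exc)))
    raise ConnectionError(f"no server answered: {failures}")
```

In a real Twisted/foolscap program the call would return a Deferred and the failure would arrive through an errback, but the shape of the logic would be the same.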
Now, Tahoe-LAFS already does it this way! For the most part. There are a few places where we invoke notifyOnDisconnect(), but removing most of them would not diminish the functionality of Tahoe-LAFS. As for what removing them would diminish, Brian wrote on #653:
"""
- the welcome-page status display would be unable to show "Connected / Not Connected" status for each known server. Instead, it could say "Last Connection Established At / Not Connected". Basically we'd know when the connection was established, and (with extra code) we could know when we last successfully used the connection. And when we tried to use the connection and found it down, we could mark the connection as down until we'd re-established it. But we wouldn't notice the actual event of connection loss (or the resulting period of not-being-connected) until we actually tried to use it. So we couldn't claim to be "connected", we could merely claim that we *had* connected at some point, and that we haven't noticed becoming disconnected yet (but aren't trying very hard to notice).
- the share-allocation algorithm wouldn't learn about disconnected servers until it tried to send a message to them (this would fail quickly, but still not synchronously), but allocates share numbers ahead of time for each batch of requests. This could wind up with shares placed 0,1,3,4,2 instead of 0,1,2,3,4
The first problem would be annoying, so I think we're going to leave tahoe alone for now. I'll add a note to the foolscap docs to warn users about the notifyOnDisconnect bug, and encourage people to not rely upon it in replacement-connection-likely environments.
"""
Since he wrote that, I realized that it would be cool if the welcome-page had a "ping all servers" button which then changed their statuses to indicate whether they responded to the ping or not (and how long it took). This would, in my opinion, be more reliable and more informative than the current "connected/not-connected" welcome-page.
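A rough sketch of what the logic behind such a button might compute, assuming a hypothetical `ping(server)` callable that returns on success and raises `ConnectionError` or `TimeoutError` on failure. The names `ping_all`, `ping`, and `clock` are made up for illustration and are not existing Tahoe-LAFS or foolscap APIs. This version pings sequentially for simplicity; a real implementation would presumably send the pings concurrently with Deferreds.

```python
import time


def ping_all(servers, ping, clock=time.monotonic):
    """Ping every server and return {name: (responded, seconds_elapsed)}.

    `servers` maps a display name to whatever object `ping` accepts.
    `clock` is injectable so the timing can be tested deterministically.
    """
    results = {}
    for name, server in servers.items():
        start = clock()
        try:
            ping(server)  # hypothetical: returns on success, raises on failure
            responded = True
        except (ConnectionError, TimeoutError):
            responded = False
        # Record elapsed time either way: a slow failure is informative too.
        results[name] = (responded, clock() - start)
    return results
```

The welcome page could then render each entry as "responded in N seconds" or "no response after N seconds", which reports only what was actually observed rather than a claimed connection state.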
To close this ticket, make sure you have Brian's approval first, then add a "ping all servers" feature to the welcome page, then remove all uses of notifyOnDisconnect() from Tahoe-LAFS.