[tahoe-lafs-trac-stream] [tahoe-lafs] #1765: gossip-introducer should include timeouts

Wed Jun 13 03:52:27 UTC 2012

#1765: gossip-introducer should include timeouts
--------------------------------+--------------------
     Reporter:  warner          |      Owner:  warner
         Type:  enhancement     |     Status:  new
     Priority:  normal          |  Milestone:  soon
    Component:  code-nodeadmin  |    Version:  1.9.1
   Resolution:                  |   Keywords:
Launchpad Bug:                  |
--------------------------------+--------------------

Comment (by zooko):

 I would be kind of sad to make tahoe-lafs require synchronization between
 clocks of different computers. As far as I know, it doesn't currently do
 so. There isn't any way to be sure that your computer's clock is
 synchronized with the clock of another computer (the one you are gossiping
 with), except by relying on a trusted third party -- an NTP server.

 ''Except'', the above is no longer true, now that Bitcoin exists. So I
 retract my longstanding objection against relying on synchronized clocks,
 and replace it with a suggested policy that the only remote-clock-
 synchronization protocol that a tahoe-lafs node is allowed to rely on is
 the Bitcoin blockchain.

 ☺

 P.S. In all seriousness, I don't like the proposed design that much. One
 problem is the part about requiring clock synchronization. (By the way, in
 practice, clocks are ''often'' more than a month out of sync with each
 other, especially in some of the "different" deployment targets that
 people are increasingly interested in, such as embedded systems and
 Windows clients.) I ''am'' concerned about relying on that, because our
 defenses against data deletion, rollback attack on mutables, and
 (hopefully in the future) unadd-attack on add-only-sets rely on the client
 connecting to a sufficient number of good servers. This seems to add
 another path by which accident or malice could prevent clients from
 connecting to good servers, which I think deserves careful risk analysis,
 both now and whenever we change the server-selection behavior.

 In addition to that, the part about waiting for "a few minutes after
 starting up" sounds kind of fragile.

 Let me try to think of a reasonable alternative to consider. What do you
 think of this:

 1. When telling other people gossip about servers, you don't tell them
 about servers that you aren't currently connected to.
 2. Remember the fact that you were unable to connect to a server last time
 you tried. When you start up, don't try reconnecting to that guy until
 you've finished trying to reconnect to more-likely-to-work ones. (This is
 because of a bug that is really important on Windows: #605 (two-hour delay
 to connect to a grid from Win32, if there are many storage servers
 unreachable).)
 3. If it has been more than a month ''on your local clock'' since you were
 able to connect to that guy, ''and'' you are currently able to connect to
 lots of other guys, then forget about that guy.
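
 The three rules above could be sketched roughly as follows. This is only
 an illustration, not tahoe-lafs code: `ServerBook`, `ServerRecord`,
 `MIN_CONNECTED` and every method name are hypothetical, "a month" is
 assumed to mean 30 days, "lots of other guys" is assumed to be a simple
 count threshold, and a server that has never been reached is assumed to
 age from the moment it was first recorded. Only the local clock is read.

```python
import time
from dataclasses import dataclass

MONTH_SECONDS = 30 * 24 * 60 * 60  # assumption: "a month" means 30 days
MIN_CONNECTED = 3                  # assumption: threshold for "lots of other guys"


@dataclass
class ServerRecord:
    server_id: str
    since: float                   # local time of last success (or first sighting)
    connected: bool = False
    last_attempt_failed: bool = False


class ServerBook:
    """Hypothetical bookkeeping for rules 1-3; uses only the local clock."""

    def __init__(self, clock=time.time, min_connected=MIN_CONNECTED):
        self._clock = clock
        self._min_connected = min_connected
        self.records = {}

    def _record(self, server_id):
        if server_id not in self.records:
            self.records[server_id] = ServerRecord(server_id, since=self._clock())
        return self.records[server_id]

    def note_connected(self, server_id):
        rec = self._record(server_id)
        rec.connected = True
        rec.last_attempt_failed = False
        rec.since = self._clock()

    def note_failed(self, server_id):
        rec = self._record(server_id)
        rec.connected = False
        rec.last_attempt_failed = True

    def gossip_list(self):
        # Rule 1: only gossip about servers we are currently connected to.
        return sorted(s for s, r in self.records.items() if r.connected)

    def reconnect_order(self):
        # Rule 2: servers that failed last time are tried last.
        # sorted() is stable, so the original order is kept within each group.
        return sorted(self.records, key=lambda s: self.records[s].last_attempt_failed)

    def prune(self):
        # Rule 3: forget servers unreachable for over a month of local time,
        # but only while we are connected to enough other servers.
        connected = sum(1 for r in self.records.values() if r.connected)
        if connected < self._min_connected:
            return []
        now = self._clock()
        stale = sorted(
            s for s, r in self.records.items()
            if not r.connected and now - r.since > MONTH_SECONDS
        )
        for s in stale:
            del self.records[s]
        return stale
```

 Note that `prune()` does nothing when too few servers are connected, so a
 client that is itself offline does not throw away its whole server list.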

 We need to carefully revisit step 3 when changing anything to do with server
 selection, but at least there is less of a path for remote attackers to
 manipulate this than with the remote-clock-synchronization approach.

 What do you say? This sounds not much more complicated than the initial
 proposal, and maybe less complicated. It is certainly less complicated if
 you include the fact that you have to think about the clock-
 synchronization protocol in that one and you don't in this one. Does this
 proposal satisfy the same values as the initial post does -- i.e. not
 letting dead servers pile up indefinitely in the gossip network?

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1765#comment:2>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage