[tahoe-lafs-trac-stream] [tahoe-lafs] #1765: gossip-introducer should include timeouts
tahoe-lafs
trac at tahoe-lafs.org
Wed Jun 13 03:52:27 UTC 2012
#1765: gossip-introducer should include timeouts
--------------------------------+--------------------
Reporter: warner | Owner: warner
Type: enhancement | Status: new
Priority: normal | Milestone: soon
Component: code-nodeadmin | Version: 1.9.1
Resolution: | Keywords:
Launchpad Bug: |
--------------------------------+--------------------
Comment (by zooko):
I would be kind of sad to make tahoe-lafs require synchronization between
clocks of different computers. As far as I know, it doesn't currently do
so. There isn't any way to be sure that your computer's clock is
synchronized with the clock of another computer (the one you are gossiping
with), except by relying on a trusted third party -- an NTP server.
''Except'', the above is no longer true, now that Bitcoin exists. So I
retract my longstanding objection against relying on synchronized clocks,
and replace it with a suggested policy that the only remote-clock-
synchronization protocol that a tahoe-lafs node is allowed to rely on is
the Bitcoin blockchain.
☺
P.S. In all seriousness, I don't like the proposed design that much, and
not only because it requires clock synchronization. (By the way, in
practice clocks are ''often'' more than a month out of sync with each
other, especially on some of the "different" deployment targets that
people are increasingly interested in, such as embedded systems and
Windows clients.) I ''am'' concerned about relying on that, because our
defenses against data deletion, rollback attacks on mutables, and
(hopefully in the future) unadd attacks on add-only sets rely on the client
connecting to a sufficient number of good servers. This seems to add
another path by which accident or malice could prevent clients from
connecting to good servers, which I think deserves careful risk analysis,
both now and whenever we change the server-selection behavior.
In addition to that, the part about waiting for "a few minutes after
starting up" also sounds kind of fragile.
Let me try to think of a reasonable alternative to consider. What do you
think of this:
1. When telling other people gossip about servers, you don't tell them
about servers that you aren't currently connected to.
2. Remember the fact that you were unable to connect to a server last time
you tried. When you start up, don't try reconnecting to that guy until
you've finished trying to reconnect to the more-likely-to-work ones. (This
matters because of a bug that is really important on Windows: #605,
"two-hour delay to connect to a grid from Win32, if there are many storage
servers unreachable".)
3. If it has been more than a month ''on your local clock'' since you were
able to connect to that guy, ''and'' you are currently able to connect to
lots of other guys, then forget about that guy.
We need to carefully revisit step 3 whenever we change anything to do with
server selection, but at least there is less of a path for remote attackers
to manipulate this than with the remote-clock-synchronization approach.
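
As a rough illustration of the three rules above, here is a minimal Python
sketch. It is not Tahoe-LAFS code: the names (GossipTable, ServerRecord,
MIN_CONNECTED, and so on) are hypothetical, "a month" is assumed to mean 30
days, "lots of other guys" is assumed to mean at least three connected
servers, and a server that was never reachable is assumed to age from the
moment we first heard about it. Only the local clock is ever consulted.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

MONTH_SECONDS = 30 * 24 * 60 * 60  # assumption: "a month" means 30 days
MIN_CONNECTED = 3                  # assumption: threshold for "lots of other guys"


@dataclass
class ServerRecord:
    first_seen: float                       # local clock, when we first heard of it
    connected: bool = False
    last_connected: Optional[float] = None  # local clock, last time it was reachable
    last_attempt_failed: bool = False


class GossipTable:
    """Hypothetical bookkeeping for the three rules; local clock only."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self.servers: Dict[str, ServerRecord] = {}

    def learn(self, server_id: str) -> None:
        # Record a server we heard about, keeping any existing record.
        self.servers.setdefault(server_id, ServerRecord(first_seen=self._clock()))

    def connection_succeeded(self, server_id: str) -> None:
        self.learn(server_id)
        rec = self.servers[server_id]
        rec.connected = True
        rec.last_connected = self._clock()
        rec.last_attempt_failed = False

    def connection_failed(self, server_id: str) -> None:
        # Covers both a failed attempt and the loss of a live connection.
        self.learn(server_id)
        rec = self.servers[server_id]
        if rec.connected:
            rec.last_connected = self._clock()
        rec.connected = False
        rec.last_attempt_failed = True

    def servers_to_gossip(self) -> List[str]:
        # Rule 1: only tell others about servers we are connected to right now.
        return sorted(sid for sid, rec in self.servers.items() if rec.connected)

    def reconnect_order(self) -> List[str]:
        # Rule 2: servers whose last attempt failed are tried last.
        # sorted() is stable, so the original order is kept within each group.
        return sorted(self.servers, key=lambda sid: self.servers[sid].last_attempt_failed)

    def prune(self) -> List[str]:
        # Rule 3: forget servers unreachable for over a month on the local
        # clock, but only while we are connected to enough other servers.
        now = self._clock()
        connected = sum(1 for rec in self.servers.values() if rec.connected)
        if connected < MIN_CONNECTED:
            return []
        forgotten = []
        for sid, rec in list(self.servers.items()):
            if rec.connected:
                continue
            last_ok = rec.last_connected if rec.last_connected is not None else rec.first_seen
            if now - last_ok > MONTH_SECONDS:
                del self.servers[sid]
                forgotten.append(sid)
        return forgotten
```

Note that prune() refuses to forget anything when fewer than MIN_CONNECTED
servers are connected, so a client that is itself offline or partitioned
does not throw away its whole server list. The right threshold is exactly
the kind of thing that would need revisiting alongside server selection.
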
What do you say? This sounds not much more complicated than the initial
proposal, and maybe less complicated. It is certainly less complicated once
you count the clock-synchronization protocol, which you have to think about
in that design and don't in this one. Does this proposal satisfy the same
values as the initial post does -- i.e. not letting dead servers pile up
indefinitely in the gossip network?
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1765#comment:2>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage