[tahoe-lafs-trac-stream] [tahoe-lafs] #1765: gossip-introducer should forget about old nodes somehow (was: gossip-introducer should include timeouts)
tahoe-lafs
trac at tahoe-lafs.org
Wed Jun 13 05:22:32 UTC 2012
#1765: gossip-introducer should forget about old nodes somehow
--------------------------------+--------------------
Reporter: warner | Owner: warner
Type: enhancement | Status: new
Priority: normal | Milestone: soon
Component: code-nodeadmin | Version: 1.9.1
Resolution: | Keywords:
Launchpad Bug: |
--------------------------------+--------------------
Comment (by warner):
Great response!
> I would be kind of sad to make tahoe-lafs require synchronization
> between clocks of different computers. As far as I know, it doesn't
> currently do so.
Yeah, I'm not keen on requiring synchronized clocks either. I was
considering having the recipient note the difference between their
local clock and the sender's clock (or however that would map to the
flooded-announcement scheme, where messages are delivered by third
parties minutes or days after they were created) and then use that to
correct for a static offset in future messages. But that feels fragile.
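Just to make the (fragile) idea concrete, here's the kind of
bookkeeping I mean; all the names here are made up:
{{{
#!python
import time

# Hypothetical sketch of the static-offset idea: remember the skew we
# observed for each sender, and apply it when interpreting timestamps
# in their later messages.
clock_offsets = {}  # sender pubkey -> observed (local - sender) offset

def note_offset(sender, sender_timestamp):
    # Only meaningful for a message received directly from the sender;
    # a flooded announcement may be minutes or days old by the time a
    # third party delivers it, which is why this feels fragile.
    clock_offsets[sender] = time.time() - sender_timestamp

def corrected_time(sender, sender_timestamp):
    # Translate the sender's clock into our local clock.
    return sender_timestamp + clock_offsets.get(sender, 0.0)
}}}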
> 1. When telling other people gossip about servers, you don't tell them
> about servers that you aren't currently connected to.
> 2. Remember the fact that you were unable to connect to a server last
> time you tried. When you start up, don't try reconnecting to that
> guy right away until you've finished trying to reconnect to
> more-likely-to-work ones. (Because of a bug that is really
> important on Windows: #605 (two-hour delay to connect to a grid
> from Win32, if there are many storage servers unreachable))
> 3. If it has been more than a month on your local clock since you were
> able to connect to that guy, and you are currently able to connect
> to lots of other guys, then forget about that guy.
Hey, that sounds great! Let's see, the first rule prevents the
"persistent nonsense" problem, as long as any grid-control-only nodes
(i.e. what the Introducer becomes in the new gossip world) follow this
rule too. The only concern I can think of is that partial connectivity
might prevent a new client from learning about nodes that they could
normally connect to. In particular, could this interact with NAT in some
way that might produce a less-connected grid than our current central
Introducer? I don't think so, but I'd have to study it more.
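Mechanically, the first rule is just a filter on what we relay. A
sketch, where {{{connected_to}}} is a stand-in predicate and the field
name is illustrative:
{{{
#!python
def announcements_to_gossip(known_announcements, connected_to):
    # Rule 1: only relay announcements for servers we are currently
    # connected to, so stale entries cannot propagate through gossip.
    return [ann for ann in known_announcements
            if connected_to(ann["server-id"])]
}}}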
The second rule is really about implementing connection throttling,
which might want to be a Foolscap feature (maybe expressed as
{{{tub.setOption("pending-connection-limit", 10)}}} or similar), and
then asking for connections in a specific order (most-recently-seen
first). Seems like a good idea, but not as critical as the other two.
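A sketch of the intended policy, with the limit and helpers standing
in for whatever Foolscap would really expose:
{{{
#!python
PENDING_LIMIT = 10  # cf. the hypothetical "pending-connection-limit"

def reconnection_order(servers, last_connected):
    # Most-recently-seen first; servers we've never reached sort last,
    # so a pile of dead servers can't stall startup (see #605).
    return sorted(servers,
                  key=lambda s: last_connected.get(s, 0),
                  reverse=True)

def start_connections(servers, last_connected, begin_connecting, pending):
    # `begin_connecting` and `pending` are placeholders for whatever
    # Foolscap would actually give us; this only shows the policy.
    for s in reconnection_order(servers, last_connected):
        if len(pending) >= PENDING_LIMIT:
            break  # throttle: wait for an in-flight attempt to resolve
        pending.add(begin_connecting(s))
}}}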
The third rule prevents local nonsense from sticking around forever. It
also ties into a more general "connection history" mechanism that I
think we want: something to hold historic uptime, RTT, speeds, and
overall reliability for each server we know about. This could be used to
decide how long to wait for a response from the server before declaring
it "overdue" (and switching to an alternate), and could eventually be
published and aggregated to provide some sort of collaborative
reliability-prediction metric to influence share placement or even
storage prices (servers that everyone agrees have been highly available
might command higher fees).
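A sketch of the third rule, assuming we record a last-successful-connect
time per server (the safety threshold is my own guess, not part of the
proposal):
{{{
#!python
import time

MONTH = 30 * 24 * 60 * 60  # seconds, measured on our local clock only

def forget_stale_servers(last_connect_ok, currently_connected,
                         min_connected=5):
    # Rule 3: forget servers we haven't reached in a month, but only
    # while we can reach plenty of other servers, so a local network
    # outage doesn't make us forget the whole grid.
    if len(currently_connected) < min_connected:
        return
    now = time.time()
    for server_id, last_ok in list(last_connect_ok.items()):
        if server_id not in currently_connected and now - last_ok > MONTH:
            del last_connect_ok[server_id]
}}}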
I like it! I'll update this ticket to reflect the new scheme.
Would you still be in favor of changing the Announcement field from
"seqnum" to "announcement-time", even if we don't plan to use it for
that purpose? The specific purpose of that field (which is inside the
signed announcement body) is to prevent replay and rollback attacks
(feeding an old announcement into some client in the hopes of changing
their behavior in some useful way).
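Either way the field is used, the receiving-side check is the same; a
sketch with illustrative field names:
{{{
#!python
def accept_announcement(ann, last_seen):
    # Reject replays and rollbacks: only believe an announcement whose
    # sequence value is newer than the one we already hold for that
    # server. The same check works whether the field is a seqnum or an
    # announcement-time.
    server_id = ann["server-id"]
    if ann["seqnum"] <= last_seen.get(server_id, -1):
        return False  # old or replayed; ignore it
    last_seen[server_id] = ann["seqnum"]
    return True
}}}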
The publishing node could indeed just use a sequence number (incremented
by one for each new message), but:
* the counter would need to be stored and recovered safely, such as when
rebuilding the node after a hard drive failure; otherwise peers would
not believe the new announcements until the rebuilt node's counter
naturally incremented past the values they had already seen.
* This would require periodic backup copies of the counter. In contrast,
the other information needed to rebuild a node (node.privkey,
node.pem) would be static.
I can imagine arguments against using time.time() instead of an actual
counter:
* more entropy for a de-anonymizing attacker to correlate
* providing a potentially high-resolution timestamp (the current code
uses all significant digits of time.time(), frequently microseconds)
that might reveal time consumed during boot, which might help a timing
attack on e.g. key generation or signature generation.
* timequakes causing temporary disbelief of new announcements, requiring
periodic refresh to make sure the disbelief is eventually overcome
(imagine setting your clock back a day and then rebooting: you must
publish at least one announcement more than a day after the reboot
before your timestamps catch up)
Oh, wait, here's an idea: use a counter, remember it somewhere like
NODEDIR/private/announcement.counter, and initialize it to zero upon
node creation. '''But''': listen for your own announcements too. If you
hear a valid announcement with a higher seqnum than what you're
currently publishing, increase your counter to match. (If the
announcement is different from what you're currently publishing,
increase it by one more; that ought to converge.)
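Concretely, I imagine something like this (just a sketch to pin the
idea down; none of this is real tahoe-lafs code):
{{{
#!python
import os

class AnnouncementCounter:
    # Persist the seqnum under NODEDIR/private/announcement.counter
    # and converge by listening for our own announcements.
    def __init__(self, nodedir):
        self.path = os.path.join(nodedir, "private",
                                 "announcement.counter")
        try:
            with open(self.path) as f:
                self.value = int(f.read())
        except (IOError, ValueError):
            self.value = 0  # fresh node, or counter file lost

    def _save(self):
        with open(self.path, "w") as f:
            f.write("%d\n" % self.value)

    def next_seqnum(self):
        self.value += 1
        self._save()
        return self.value

    def heard_own_announcement(self, seqnum, same_body):
        # Someone relayed an announcement of ours with a higher seqnum:
        # catch up to it, and if its body differs from what we now
        # publish, go one past it so our next announcement supersedes it.
        if seqnum > self.value:
            self.value = seqnum if same_body else seqnum + 1
            self._save()
}}}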
What do you think about that? And, given your thoughts about that, what
are your new thoughts about seqnum vs announcement-time? Can you think
of any reason that we'd really like actual (possibly erroneous and/or
malicious) wallclock values in Announcements?
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1765#comment:3>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage