[tahoe-lafs-trac-stream] [tahoe-lafs] #1765: gossip-introducer should forget about old nodes somehow (was: gossip-introducer should include timeouts)
tahoe-lafs
trac at tahoe-lafs.org
Wed Jun 13 05:22:32 UTC 2012
#1765: gossip-introducer should forget about old nodes somehow
--------------------------------+--------------------
Reporter: warner | Owner: warner
Type: enhancement | Status: new
Priority: normal | Milestone: soon
Component: code-nodeadmin | Version: 1.9.1
Resolution: | Keywords:
Launchpad Bug: |
--------------------------------+--------------------
Comment (by warner):
Great response!
> I would be kind of sad to make tahoe-lafs require synchronization
> between clocks of different computers. As far as I know, it doesn't
> currently do so.
Yeah, I'm not keen on requiring synchronized clocks either. I was
considering having the recipient note the difference between their
local clock and the sender's clock (or however that would map to the
flooded-announcement scheme, where messages are delivered by third
parties minutes or days after they were created) and then use that to
correct for a static offset in future messages. But that feels fragile.
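Just to make the (fragile) idea concrete, here's the kind of
bookkeeping I mean; all the names here are made up:
{{{
#!python
import time

# Hypothetical sketch of the static-offset idea: remember the skew we
# observed for each sender, and apply it when interpreting timestamps
# in their later messages.
clock_offsets = {}  # sender pubkey -> observed (local - sender) offset

def note_offset(sender, sender_timestamp):
    # Only meaningful for a message received directly from the sender;
    # a flooded announcement may be minutes or days old by the time a
    # third party delivers it, which is why this feels fragile.
    clock_offsets[sender] = time.time() - sender_timestamp

def corrected_time(sender, sender_timestamp):
    # Translate the sender's clock into our local clock.
    return sender_timestamp + clock_offsets.get(sender, 0.0)
}}}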
> 1. When telling other people gossip about servers, you don't tell them
> about servers that you aren't currently connected to.
> 2. Remember the fact that you were unable to connect to a server last
> time you tried. When you start up, don't try reconnecting to that
> guy right away until you've finished trying to reconnect to
> more-likely-to-work ones. (Because of a bug that is really
> important on Windows: #605 (two-hour delay to connect to a grid
> from Win32, if there are many storage servers unreachable))
> 3. If it has been more than a month on your local clock since you were
> able to connect to that guy, and you are currently able to connect
> to lots of other guys, then forget about that guy.
Hey, that sounds great! Let's see, the first rule prevents the
"persistent nonsense" problem, as long as any grid-control-only nodes
(i.e. what the Introducer becomes in the new gossip world) follow this
rule too. The only concern I can think of is that partial connectivity
might prevent a new client from learning about nodes that they could
normally connect to. In particular, could this interact with NAT in some
way that might produce a less-connected grid than our current central
Introducer? I don't think so, but I'd have to study it more.
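Mechanically, the first rule is just a filter on what we relay. A
sketch, where {{{connected_to}}} is a stand-in predicate and the field
name is illustrative:
{{{
#!python
def announcements_to_gossip(known_announcements, connected_to):
    # Rule 1: only relay announcements for servers we are currently
    # connected to, so stale entries cannot propagate through gossip.
    return [ann for ann in known_announcements
            if connected_to(ann["server-id"])]
}}}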
The second rule is really about implementing connection throttling,
which might want to be a Foolscap feature (maybe expressed as
{{{tub.setOption("pending-connection-limit", 10)}}} or similar), and
then asking for connections in a specific order (most-recently-seen
first). Seems like a good idea, but not as critical as the other two.
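A sketch of the intended policy, with the limit and helpers standing
in for whatever Foolscap would really expose:
{{{
#!python
PENDING_LIMIT = 10  # cf. the hypothetical "pending-connection-limit"

def reconnection_order(servers, last_connected):
    # Most-recently-seen first; servers we've never reached sort last,
    # so a pile of dead servers can't stall startup (see #605).
    return sorted(servers,
                  key=lambda s: last_connected.get(s, 0),
                  reverse=True)

def start_connections(servers, last_connected, begin_connecting, pending):
    # `begin_connecting` and `pending` are placeholders for whatever
    # Foolscap would actually give us; this only shows the policy.
    for s in reconnection_order(servers, last_connected):
        if len(pending) >= PENDING_LIMIT:
            break  # throttle: wait for an in-flight attempt to resolve
        pending.add(begin_connecting(s))
}}}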
The third rule prevents local nonsense from sticking around forever. It
also ties into a more general "connection history" mechanism that I
think we want: something to hold historic uptime, RTT, speeds, and
overall reliability for each server we know about. This could be used to
decide how long to wait for a response from the server before declaring
it "overdue" (and switching to an alternate), and could eventually be
published and aggregated to provide some sort of collaborative
reliability-prediction metric to influence share placement or even
storage prices (servers that everyone agrees have been highly available
might command higher fees).
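A sketch of the third rule, assuming we record a last-successful-connect
time per server (the safety threshold is my own guess, not part of the
proposal):
{{{
#!python
import time

MONTH = 30 * 24 * 60 * 60  # seconds, measured on our local clock only

def forget_stale_servers(last_connect_ok, currently_connected,
                         min_connected=5):
    # Rule 3: forget servers we haven't reached in a month, but only
    # while we can reach plenty of other servers, so a local network
    # outage doesn't make us forget the whole grid.
    if len(currently_connected) < min_connected:
        return
    now = time.time()
    for server_id, last_ok in list(last_connect_ok.items()):
        if server_id not in currently_connected and now - last_ok > MONTH:
            del last_connect_ok[server_id]
}}}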
I like it! I'll update this ticket to reflect the new scheme.
Would you still be in favor of changing the Announcement field from
"seqnum" to "announcement-time", even if we don't plan to use it for
that purpose? The specific purpose of that field (which is inside the
signed announcement body) is to prevent replay and rollback attacks
(feeding an old announcement into some client in the hopes of changing
their behavior in some useful way).
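Either way the field is used, the receiving-side check is the same; a
sketch with illustrative field names:
{{{
#!python
def accept_announcement(ann, last_seen):
    # Reject replays and rollbacks: only believe an announcement whose
    # sequence value is newer than the one we already hold for that
    # server. The same check works whether the field is a seqnum or an
    # announcement-time.
    server_id = ann["server-id"]
    if ann["seqnum"] <= last_seen.get(server_id, -1):
        return False  # old or replayed; ignore it
    last_seen[server_id] = ann["seqnum"]
    return True
}}}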
The publishing node could indeed just use a sequence number (incremented
by one for each new message), but:
* the counter would need to be stored and recovered safely, such as when
rebuilding the node after a hard drive failure; otherwise peers would
not believe the new announcements until the rebuilt node's counter
naturally incremented past the values they had already seen.
* This would require periodic backup copies of the counter. In contrast,
the other information needed to rebuild a node (node.privkey,
node.pem) would be static.
I can imagine arguments against using time.time() instead of an actual
counter:
* more entropy for a de-anonymizing attacker to correlate
* providing a potentially high-resolution timestamp (the current code
uses all significant digits of time.time(), frequently microseconds)
that might reveal time consumed during boot, which might help a timing
attack on e.g. key generation or signature generation.
* timequakes causing temporary disbelief of new announcements, requiring
periodic refresh to make sure the disbelief is eventually overcome
(imagine setting your clock back a day and then rebooting: you must
publish at least one announcement more than a day after the reboot
before your timestamps catch up)
Oh, wait, here's an idea: use a counter, remember it somewhere like
NODEDIR/private/announcement.counter, and initialize it to zero upon
node creation. '''But''': listen for your own announcements too. If you
hear a valid announcement with a higher seqnum than what you're
currently publishing, increase your counter to match. (If the
announcement is different from what you're currently publishing,
increase it by one more; that ought to converge.)
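Concretely, I imagine something like this (just a sketch to pin the
idea down; none of this is real tahoe-lafs code):
{{{
#!python
import os

class AnnouncementCounter:
    # Persist the seqnum under NODEDIR/private/announcement.counter
    # and converge by listening for our own announcements.
    def __init__(self, nodedir):
        self.path = os.path.join(nodedir, "private",
                                 "announcement.counter")
        try:
            with open(self.path) as f:
                self.value = int(f.read())
        except (IOError, ValueError):
            self.value = 0  # fresh node, or counter file lost

    def _save(self):
        with open(self.path, "w") as f:
            f.write("%d\n" % self.value)

    def next_seqnum(self):
        self.value += 1
        self._save()
        return self.value

    def heard_own_announcement(self, seqnum, same_body):
        # Someone relayed an announcement of ours with a higher seqnum:
        # catch up to it, and if its body differs from what we now
        # publish, go one past it so our next announcement supersedes it.
        if seqnum > self.value:
            self.value = seqnum if same_body else seqnum + 1
            self._save()
}}}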
What do you think about that? And, given your thoughts about that, what
are your new thoughts about seqnum vs announcement-time? Can you think
of any reason that we'd really like actual (possibly erroneous and/or
malicious) wallclock values in Announcements?
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1765#comment:3>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage