[tahoe-lafs-trac-stream] [tahoe-lafs] #1767: update Announcement "timestamp": sequence number?

Wed Sep 5 00:17:29 UTC 2012

#1767: update Announcement "timestamp": sequence number?
------------------------------+----------------------
     Reporter:  warner        |      Owner:  warner
         Type:  enhancement   |     Status:  assigned
     Priority:  normal        |  Milestone:  1.10.0
    Component:  code-network  |    Version:  1.9.1
   Resolution:                |   Keywords:
Launchpad Bug:                |
------------------------------+----------------------

Old description:

> One proposal that came out of #1765 was to change the current
> Announcement's "timestamp"-like field to be a sequence number
> instead of an actual clock value. This field is used by both the
> Introducer (server) and the !IntroducerClient to decide when to
> replace a previous announcement with the same (pubkey, servicename)
> index, so it needs to be orderable and mostly
> monotonically-increasing. (it's ok if a publisher briefly uses a
> lower value than it did previously, as long as it's also ok for
> other subscribers to ignore that message, which generally means the
> publisher needs to periodically update their messages).
>
> A timestamp (plus periodic updates) is a simple, cheap way to
> achieve this property. The only rollback would be for a timequake
> (when the publisher's clock has been adjusted backwards, maybe by
> NTP being turned on for the first time), and that will eventually
> be resolved when the new-time increases beyond the old-time of the
> last update (so rolling the clock back by one hour means one hour
> of stale announcements).
>
> #1765 specifically discourages comparing this "timestamp" against
> anybody else's clock (since clocks are never really synchronized).
> So it really doesn't need to be a clock: it could just be a
> sequence number. The advantage of a seqnum would be that it would
> reveal less information about the client (which might help a
> de-anonymying attacker correlate the tahoe node's behavior with
> other externally-visible things).
>
> The disadvantage is that we'd have to manage the counter ourselves,
> and tolerate node restarts which don't maintain the saved counter
> state. We want to make sure folks can back up their nodes by just
> recording some static private keys, and don't need to constantly be
> saving their updated counters.
>
> The proposal is to do the following:
>
> * use a separate counter for each service-name
> * store the current counter values in
>   NODEDIR/private/announcement.counters, one line per service, like:
>   {{{storage: 12}}}
> * initialize all counters to 0 at node creation
> * increment the counter each time {{{IntroducerClient.publish}}} is
> called
> * if we receive a valid signed announcement from out own pubkey:
>   * if the seqnum is higher than our current value, set our counter
>     to one greater than the received value, and re-publish
>   * if the seqnum is equal to our current value, but the signed
>     message body is different, do the same: set the counter to one
>     greater than the received value, and re-publish
>   * if the seqnum is lower, or (equal and the message is
>     identical), do nothing
>
> In conjunction with the gossip protocol from #1765, that ought to
> converge. Nodes that are restored from backup (and thus experience
> a "counterquake") will send stale announcements for a little while
> (which everyone else will ignore) until they hear back their own
> earlier (higher-seqnum) announcements, at which point they'll
> advance their counters enough to become fresh again.
>
> One requirement this imposes on clients is that anyone who
> publishes a record for service-name=X must also subscribe to
> service-name=X. Otherwise they won't know to update their counters
> after a counterquake. Alternatively, we could require that anyone
> else who receives message they recognize as stale must immediately
> send back the fresh version, even if the publisher wasn't
> subscribed to hear about them. This would require some changes to
> the APIs, as publishers and subscribers are quite distinct right
> now.
>
> It might be easier if we only had one counter for the whole node,
> instead of separate counters for each service-name. Then receipt of
> *any* message with a higher counter would trigger the updates.
> (when gossip-introduction happens, all nodes will subscribe to
> "grid-control", so we don't need to require specific loopback
> rules). My concern is that we might announce (counter=0,
> service-name=storage, data=X) and (counter=0,
> service-name=grid-control, data=Y), then have a quake, then some
> small thing changes about the storage server but not about
> grid-control. When the node comes back, it will announce
> (counter=0, storage, data=Z) but still (counter=0, grid-control,
> data=Y). If we aren't subscribed to "storage", we'll see the
> grid-control loopback and conclude that we've converged, and not
> replace the stale storage/data=X announcement. Maybe requiring a
> nonce be added to grid-control messages would avoid this.
>
> I want to get this change into 1.10, even though the #68/#1765
> gossip-introducer won't happen until later, so that old 1.10
> clients can continue to correctly update themselves in a gossipy
> world. Also, since the current implementation uses a clock, I'd
> like to switch to smaller integers as quickly as possible, so there
> are fewer nodes which have ever used a (large) time.time() and will
> thus have problems updating those announcements.

New description:

 One proposal that came out of #1765 was to change the current
 Announcement's "timestamp"-like field to be a sequence number
 instead of an actual clock value. This field is used by both the
 Introducer (server) and the !IntroducerClient to decide when to
 replace a previous announcement with the same (pubkey, servicename)
 index, so it needs to be orderable and mostly
 monotonically-increasing. (it's ok if a publisher briefly uses a
 lower value than it did previously, as long as it's also ok for
 other subscribers to ignore that message, which generally means the
 publisher needs to periodically update their messages).

 A timestamp (plus periodic updates) is a simple, cheap way to
 achieve this property. The only rollback would be for a timequake
 (when the publisher's clock has been adjusted backwards, maybe by
 NTP being turned on for the first time), and that will eventually
 be resolved when the new-time increases beyond the old-time of the
 last update (so rolling the clock back by one hour means one hour
 of stale announcements).

 #1765 specifically discourages comparing this "timestamp" against
 anybody else's clock (since clocks are never really synchronized).
 So it really doesn't need to be a clock: it could just be a
 sequence number. The advantage of a seqnum would be that it would
 reveal less information about the client (a real clock value might
 help a de-anonymizing attacker correlate the tahoe node's behavior
 with other externally-visible things).

 The disadvantage is that we'd have to manage the counter ourselves,
 and tolerate node restarts which don't maintain the saved counter
 state. We want to make sure folks can back up their nodes by just
 recording some static private keys, and don't need to constantly be
 saving their updated counters.

 The proposal is to do the following:

 * use a separate counter for each service-name
 * store the current counter values in
   NODEDIR/private/announcement.counters, one line per service, like:
   {{{storage: 12}}}
 * initialize all counters to 0 at node creation
 * increment the counter each time {{{IntroducerClient.publish}}} is called
 * if we receive a valid signed announcement from our own pubkey:
   * if the seqnum is higher than our current value, set our counter
     to one greater than the received value, and re-publish
   * if the seqnum is equal to our current value, but the signed
     message body is different, do the same: set the counter to one
     greater than the received value, and re-publish
   * if the seqnum is lower, or (equal and the message is
     identical), do nothing
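
 The three loopback rules above can be written as a small decision
 function. This is only a sketch: the function name and the
 (new_seqnum, should_republish) return shape are made up for
 illustration and are not part of the existing IntroducerClient API.

```python
def handle_own_announcement(current_seqnum, current_body,
                            received_seqnum, received_body):
    """Apply the proposed loopback rules to an announcement signed by our own key.

    Hypothetical helper: returns (new_seqnum, should_republish).
    """
    # Rule 1: somebody holds a newer announcement than we remember making.
    if received_seqnum > current_seqnum:
        return received_seqnum + 1, True
    # Rule 2: same seqnum but a different signed body, so jump past it.
    if received_seqnum == current_seqnum and received_body != current_body:
        return received_seqnum + 1, True
    # Rule 3: lower seqnum, or an identical echo of our current message.
    return current_seqnum, False
```

 For example, hearing seqnum 7 while our counter is 3 moves the
 counter to 8 and asks for a re-publish, while an identical echo at
 seqnum 3 changes nothing.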

 In conjunction with the gossip protocol from #1765, that ought to
 converge. Nodes that are restored from backup (and thus experience
 a "counterquake") will send stale announcements for a little while
 (which everyone else will ignore) until they hear back their own
 earlier (higher-seqnum) announcements, at which point they'll
 advance their counters enough to become fresh again.
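
 A toy simulation of that recovery, under some simplifying
 assumptions: the Grid class stands in for "everyone else" and keeps
 only the highest-seqnum announcement, the restored node publishes
 at its saved counter value, and the loopback arrives right after
 the stale publish. None of these names exist in Tahoe-LAFS.

```python
class Grid:
    """Hypothetical stand-in for all other nodes: remembers the best announcement."""

    def __init__(self):
        self.best = None  # (seqnum, body) or None

    def receive(self, seqnum, body):
        # Assumption: replace only on a strictly higher seqnum.
        if self.best is None or seqnum > self.best[0]:
            self.best = (seqnum, body)
            return True
        return False


def recover(grid, restored_counter, body):
    """Publish with a rolled-back counter, then react to the loopback."""
    counter = restored_counter
    log = [(counter, grid.receive(counter, body))]  # stale, gets ignored
    heard_seqnum, heard_body = grid.best            # our own earlier message
    if heard_seqnum > counter or (heard_seqnum == counter and heard_body != body):
        counter = heard_seqnum + 1
        log.append((counter, grid.receive(counter, body)))
    return counter, log
```

 With the grid still holding seqnum 9 and a node restored at
 counter 2, the first publish is rejected and the second, at
 seqnum 10, is accepted.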

 One requirement this imposes on clients is that anyone who
 publishes a record for service-name=X must also subscribe to
 service-name=X. Otherwise they won't know to update their counters
 after a counterquake. Alternatively, we could require that anyone
 else who receives a message they recognize as stale must immediately
 send back the fresh version, even if the publisher wasn't
 subscribed to hear about them. This would require some changes to
 the APIs, as publishers and subscribers are quite distinct right
 now.

 It might be easier if we only had one counter for the whole node,
 instead of separate counters for each service-name. Then receipt of
 *any* message with a higher counter would trigger the updates.
 (when gossip-introduction happens, all nodes will subscribe to
 "grid-control", so we don't need to require specific loopback
 rules). My concern is that we might announce (counter=0,
 service-name=storage, data=X) and (counter=0,
 service-name=grid-control, data=Y), then have a quake, then some
 small thing changes about the storage server but not about
 grid-control. When the node comes back, it will announce
 (counter=0, storage, data=Z) but still (counter=0, grid-control,
 data=Y). If we aren't subscribed to "storage", we'll see the
 grid-control loopback and conclude that we've converged, and not
 replace the stale storage/data=X announcement. Maybe requiring a
 nonce be added to grid-control messages would avoid this.
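
 The failure case can be reproduced with a few lines. This sketch
 assumes subscribers replace an announcement only when the incoming
 seqnum is strictly higher, and that the node is subscribed to
 "grid-control" but not to "storage"; the helper names are
 illustrative only.

```python
def subscriber_accepts(held, incoming):
    # Assumption: replace only on a strictly higher seqnum.
    return incoming[0] > held[0]


def needs_republish(counter, own_body, heard):
    heard_seqnum, heard_body = heard
    return heard_seqnum > counter or (
        heard_seqnum == counter and heard_body != own_body)


# What the rest of the grid remembers from before the quake.
grid = {"storage": (0, "X"), "grid-control": (0, "Y")}

# After the quake: one node-wide counter back at 0, storage data changed.
node_counter = 0
node_announcements = {"storage": "Z", "grid-control": "Y"}
node_subscriptions = ["grid-control"]

# The node re-announces everything with its single counter.
for service, body in node_announcements.items():
    incoming = (node_counter, body)
    if subscriber_accepts(grid[service], incoming):
        grid[service] = incoming

# The node only hears loopback for the services it subscribes to.
triggered = any(
    needs_republish(node_counter, node_announcements[s], grid[s])
    for s in node_subscriptions)
```

 Here triggered ends up False and grid["storage"] is still
 (0, "X"): the grid-control echo looks converged, so the stale
 storage announcement is never replaced.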

 I want to get this change into 1.10, even though the #68/#1765
 gossip-introducer won't happen until later, so that old 1.10
 clients can continue to correctly update themselves in a gossipy
 world. Also, since the current implementation uses a clock, I'd
 like to switch to smaller integers as quickly as possible, so there
 are fewer nodes which have ever used a (large) time.time() and will
 thus have problems updating those announcements.

--

Comment (by warner):

 fixed some typos in the description.

 Two other thoughts:

 * if two nodes are somehow configured with the same private key, they'll
 fight over the announcements: each inbound announcement will trigger an
 outbound one with the higher seqnum, and they won't ever converge because
 they'll undoubtedly have different swissnums for the storage-server FURLs.
 They'll just chase each other up to infinity.
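
 A small model of that fight, assuming both nodes apply the
 loopback rules from the description and each sees the other's
 announcement as its own. The bodies and function names are
 placeholders, not real FURLs or Tahoe-LAFS APIs.

```python
def react(counter, own_body, heard_seqnum, heard_body):
    """Loopback rules: returns (new_counter, republished)."""
    if heard_seqnum > counter or (
            heard_seqnum == counter and heard_body != own_body):
        return heard_seqnum + 1, True
    return counter, False


def duel(rounds):
    """Two nodes sharing one key; returns the list of re-publishes."""
    a_counter, a_body = 1, "furl-with-swissnum-A"
    b_counter, b_body = 1, "furl-with-swissnum-B"
    history = []
    msg = (a_counter, a_body)  # A publishes first
    for _ in range(rounds):
        b_counter, republished = react(b_counter, b_body, *msg)
        if not republished:
            break
        history.append(("B", b_counter))
        msg = (b_counter, b_body)
        a_counter, republished = react(a_counter, a_body, *msg)
        if not republished:
            break
        history.append(("A", a_counter))
        msg = (a_counter, a_body)
    return history
```

 Because the bodies never match, neither break is ever reached:
 every round adds two re-publishes and the seqnum keeps climbing
 for as long as the loop is allowed to run.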
 * it's probably ok to just switch to a locally-managed persistent counter
 for 1.10. I think we can safely defer the feedback/quake-handling code
 until later.
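
 A minimal sketch of such a locally-managed persistent counter,
 using the "storage: 12" one-line-per-service format from the
 description. The function names are hypothetical, the
 write-then-rename step is an added assumption to avoid
 half-written files, and the code is written for Python 3 rather
 than the Python 2 that Tahoe-LAFS 1.9 runs on.

```python
import os


def load_counters(path):
    """Read 'service-name: N' lines; a missing file means all counters are 0."""
    counters = {}
    try:
        with open(path, "r") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                name, _, value = line.rpartition(":")
                counters[name.strip()] = int(value)
    except FileNotFoundError:
        pass
    return counters


def save_counters(path, counters):
    """Write to a temporary file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for name in sorted(counters):
            f.write("%s: %d\n" % (name, counters[name]))
    os.replace(tmp, path)


def next_seqnum(path, service_name):
    """Increment and persist the counter for one service, as on each publish."""
    counters = load_counters(path)
    counters[service_name] = counters.get(service_name, 0) + 1
    save_counters(path, counters)
    return counters[service_name]
```

 Calling next_seqnum(path, "storage") before each publish, with
 path pointing at NODEDIR/private/announcement.counters, would give
 the first announcement seqnum 1 and keep counting across restarts.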

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1767#comment:5>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage