#271 closed enhancement

implement new publish/subscribe introduction scheme — at Version 4

Reported by: zooko Owned by: zooko
Priority: major Milestone: 0.8.0 (Allmydata 3.0 Beta)
Component: code-network Version: 0.7.0
Keywords: Cc:
Launchpad Bug:

Description (last modified by warner)

Implement the new publish/subscribe introduction scheme we've been discussing recently:

  • enumerate the services which can be published and queried for:
    • upload storage server (ones which will accept new shares)
    • download storage server (ones which will let you read shares)
      • (soon-to-be-decommissioned storage servers will be download-only)
    • helpers and other introducers may be added to this list, but we need to talk about that more first. I'm not sure about it.
  • all nodes should have an IntroducerClient, as an attribute of the Node instance.
  • to publish a service, do e.g.:
    if self.get_config("offer_storage"):
        ss = StorageServer()
        ss.setServiceParent(self)
        # advertise the same server under both service names
        self.introducer.publish(ss, "upload_storage")
        self.introducer.publish(ss, "download_storage")
  • if the node cares about a particular service, it must register that intent at startup:
    if want_storage_servers:
        self.introducer.subscribe_to("upload_storage")
        self.introducer.subscribe_to("download_storage")
  • then, to access a service, there are two APIs: one that does permutation (for upload/download) and one which just returns a flat list (mostly for the welcome page):
    ppeers = self.introducer.get_permuted_peers("download_storage", storage_index)
    # ppeers is a list of (permuted_peerid, peerid, RemoteReference)
    all_peers = self.introducer.get_peers("upload_storage")
  • add config flags to disable upload, and to disable storage completely. Client installs (i.e. those created by py2exe) will disable storage service by default. Storage-only nodes won't subscribe to hear about other storage nodes.
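The API described above can be sketched as a minimal, in-memory stand-in. Only the method names `publish`, `subscribe_to`, `get_peers` and `get_permuted_peers` come from this ticket. Everything else is hypothetical and exists only to make the example self-contained: the `deliver()` hook (standing in for announcements arriving from a remote Introducer), the `KNOWN_SERVICES` set, bytes peerids, and the use of SHA-1 over `storage_index + peerid` as the permutation function.

```python
import hashlib

# The two service names enumerated in this ticket; helpers and introducers
# are left out, since the description says they need more discussion first.
KNOWN_SERVICES = frozenset({"upload_storage", "download_storage"})


class IntroducerClient:
    """Hypothetical, in-memory sketch of the proposed IntroducerClient API."""

    def __init__(self):
        self._published = []       # (service_name, obj) pairs this node offers
        self._subscriptions = set()
        self._peers = {}           # service_name -> {peerid: ref}

    def publish(self, obj, service_name):
        """Record a local service; a real client would announce it remotely."""
        if service_name not in KNOWN_SERVICES:
            raise ValueError("unknown service: %r" % (service_name,))
        self._published.append((service_name, obj))

    def subscribe_to(self, service_name):
        """Register (at startup) interest in hearing about a service."""
        if service_name not in KNOWN_SERVICES:
            raise ValueError("unknown service: %r" % (service_name,))
        self._subscriptions.add(service_name)

    def deliver(self, service_name, peerid, ref):
        """Hypothetical hook: an announcement arrived from the Introducer.

        Announcements for services we never subscribed to are dropped.
        """
        if service_name in self._subscriptions:
            self._peers.setdefault(service_name, {})[peerid] = ref

    def get_peers(self, service_name):
        """Flat list of (peerid, ref) sorted by peerid, e.g. for a welcome page."""
        peers = self._peers.get(service_name, {})
        return sorted(peers.items(), key=lambda item: item[0])

    def get_permuted_peers(self, service_name, storage_index):
        """List of (permuted_peerid, peerid, ref) sorted by permuted_peerid.

        Assumption: the permutation is SHA-1 over storage_index + peerid.
        """
        peers = self._peers.get(service_name, {})
        ppeers = [(hashlib.sha1(storage_index + peerid).digest(), peerid, ref)
                  for peerid, ref in peers.items()]
        ppeers.sort(key=lambda entry: entry[0])
        return ppeers
```

Because subscriptions are checked on delivery, a node that never calls `subscribe_to("upload_storage")` (such as a storage-only node) never accumulates upload peers, and `get_peers("upload_storage")` simply returns an empty list.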

Other things to think about:

  • get_permuted_peers could return a Deferred (which would make it easier for us to create a special kind of helper which knows about peers for you), or return an iterator, or both, somehow. Making this actually useful is non-trivial: to reduce the memory footprint, you'd want an iterator that yields Deferreds, but that might also impose a stupidly large number of roundtrips per query. We should probably wait until we identify a need for this before implementing any part of it.
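As a rough sketch of the plain-iterator option, assuming peers are held in a `dict` mapping peerid to reference and reusing a hypothetical SHA-1 permutation, a generator can yield peers in permuted order while paying only for as many heap pops as the caller consumes. It does not shrink the memory footprint (the heap still holds every peer), and no Deferred-yielding variant is attempted here; `iter_permuted_peers` is an illustrative name, not part of the proposed API.

```python
import hashlib
import heapq


def iter_permuted_peers(peers, storage_index):
    """Lazily yield (permuted_peerid, peerid, ref) in permuted order.

    `peers` is a hypothetical dict of peerid -> ref. heapify is O(n) and each
    pop is O(log n), so a caller that stops after the first few peers avoids
    fully sorting the list.
    """
    heap = [(hashlib.sha1(storage_index + peerid).digest(), peerid)
            for peerid in peers]
    heapq.heapify(heap)
    while heap:
        permuted, peerid = heapq.heappop(heap)
        yield permuted, peerid, peers[peerid]
```

An uploader that only needs the first few servers would call `next()` on the generator a handful of times and then drop it.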
  • This API implies a publish/subscribe model in which the subscription accumulates knowledge about peers, and the actual point of use (i.e. upload or download) samples whatever peers have been acquired by that time. This might not be the best approach.

Change History (4)

comment:1 Changed at 2008-01-11T10:21:40Z by warner

In a separate but related topic, we were talking about the possible utility of different "classes" of introduction: a node could publish some object in one category ("storage servers") and a different object in some other category ("upload helpers").

It occurred to me that it might be useful to have "storage servers for upload" and "storage servers for download" as separate categories. One use would be a way to deal with the #269 mistake (in which I accidentally caused most of our storage servers to generate new keys and therefore change nodeids). We could resurrect the old nodeids in a different place, and move all their old shares to be served by those nodes, thus making the mutable slots available once more. But we'd like those nodes to only stick around long enough to allow clients to migrate their data onto the real servers, so we'd want to prevent new shares from being uploaded to them. The only tool we have at the moment is to set size_limit=0, but sizes aren't being enforced for mutable slots yet. But if these "read-only" nodes were published as download storage servers (and *not* upload storage servers), then the upload and download code could use slightly different peersets, and we'd get the desired behavior.

Likewise, if we have a storage server which is scheduled to be decommissioned (say, the hard drive is starting to have soft errors, and we've begun the process of migrating shares off of it but have not yet finished the job), it might be nice to allow it to be available for reading but not accept any new shares. Simply not publishing it as an upload server would be the most efficient way to keep clients from trying to send shares to it.
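The read-only-server idea in the two paragraphs above comes down to upload and download drawing from different peersets. A minimal sketch, with hypothetical peerids and announcement data standing in for what the introducer would report:

```python
# Hypothetical announcements: peerid -> set of services that node publishes.
ANNOUNCEMENTS = {
    b"normal-1": {"upload_storage", "download_storage"},
    b"normal-2": {"upload_storage", "download_storage"},
    # A resurrected pre-#269 nodeid, serving its old shares read-only.
    b"old-nodeid": {"download_storage"},
    # A server being decommissioned: readable, but accepts no new shares.
    b"failing-disk": {"download_storage"},
}


def peers_for(service_name, announcements):
    """Return the sorted peerids that published `service_name`."""
    return sorted(peerid for peerid, services in announcements.items()
                  if service_name in services)


# Upload only considers servers that accept new shares...
upload_peers = peers_for("upload_storage", ANNOUNCEMENTS)
# ...while download also reaches the read-only servers.
download_peers = peers_for("download_storage", ANNOUNCEMENTS)
```

No per-server size limit is involved: a read-only server is excluded from uploads just by not appearing in the upload category.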

comment:2 Changed at 2008-01-21T20:58:37Z by zooko

Rob pointed out that this generalized pubsub mechanism might be a good way to meet upload helpers.

While scrubbing the kitchen floor with Amber on Saturday, I figured out that this might be a good way to meet other introducers, leading to #68 -- "implement distributed introduction, remove Introducer as a single point of failure".

comment:3 Changed at 2008-01-23T02:50:47Z by zooko

merging in #168

comment:4 Changed at 2008-01-23T17:49:29Z by warner

  • Description modified (diff)
  • Summary changed from subscriber-only introducer client to implement new publish/subscribe introduction scheme

Updated summary and description to specify the new introduction scheme we're planning to implement.
