#1471 closed enhancement (fixed)

Make Crawlers Compatible With Pluggable Backends

Reported by: Zancas Owned by:
Priority: major Milestone: undecided
Component: code-storage Version: 1.8.2
Keywords: s3-backend crawler Cc: warner, zancas
Launchpad Bug:


The ShareCrawler class (and its subclasses) was designed under the assumption of a single "Disk"-type backend. In the future, Crawlers will need to support multiple possible backends. We'll (probably) use the composition idiom, in which a Crawler learns about the backend by being passed the relevant backend object in its constructor.
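To make the composition idiom concrete, here is a minimal sketch of what a backend-agnostic crawler might look like. The Backend interface and all method names below are hypothetical illustrations, not the actual tahoe-lafs API:

{{{
#!python
class Backend:
    """Hypothetical abstract interface a crawler programs against."""
    def get_buckets(self):
        """Yield (storage_index, list_of_shares) for every bucket."""
        raise NotImplementedError

class ShareCrawler:
    def __init__(self, backend, statefile):
        # Composition: the crawler holds a backend object instead of
        # assuming a local-disk directory layout.
        self.backend = backend
        self.statefile = statefile

    def crawl(self):
        for storage_index, shares in self.backend.get_buckets():
            self.process_bucket(storage_index, shares)

    def process_bucket(self, storage_index, shares):
        """Overridden by subclasses (lease expiry, stats-gathering, ...)."""
}}}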

Change History (4)

comment:1 Changed at 2011-08-08T18:26:02Z by warner

  • Summary changed from Make Crawler's Compatible With Pluggable Backends to Make Crawlers Compatible With Pluggable Backends

(fixed title: http://www.angryflower.com/bobsqu.gif)

I'd like to point out that the use of a Crawler at all is deeply intertwined with the way the shares are being stored. We decided early on that we'd prefer a storage scheme in which the share files are the primary source of truth, and that anything else is merely a volatile performance-enhancing cache that could be deleted at any time without long-term information loss. The idea was to keep the storage model simple for server-admins, letting them correctly assume that shares could be migrated by merely copying sharefiles from one box to another. (write-enablers violate this assumption, but we're working on that).

Those Crawlers exist to manage things like lease-expiration and stats-gathering across a bunch of independent sharefiles, handling both the initial bootstrap case (e.g. you've just upgraded your storage server to a version that knows how to expire leases) and later recovery cases (e.g. you've migrated some shares into your server, or you manually deleted shares for some reason). This design assumes that share metadata can be retrieved quickly (i.e. from fast local disk).
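For contrast, here is a rough illustration of that local-disk assumption: the crawler walks the shares/ directory tree directly, which is cheap on a local filesystem but would cost a network round-trip per listing on a remote backend. The directory layout and helper name are illustrative only:

{{{
#!python
import os

def crawl_local_shares(storedir):
    """Yield the path of every sharefile under storedir/shares."""
    sharedir = os.path.join(storedir, "shares")
    for prefix in sorted(os.listdir(sharedir)):   # two-char prefix dirs
        prefixdir = os.path.join(sharedir, prefix)
        if not os.path.isdir(prefixdir):
            continue
        for storage_index in os.listdir(prefixdir):
            bucketdir = os.path.join(prefixdir, storage_index)
            for shnum in os.listdir(bucketdir):
                # Each listdir is cheap on local disk; on a backend
                # like S3 every one would be a network round-trip.
                yield os.path.join(bucketdir, shnum)
}}}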

If a server is using a different backend, these rules and goals might not apply. For example, if shares are being stored in S3, is each share a single S3 object? How important is it that you be able to add or remove objects without going through the storage server? It may be a lot easier/faster to use a different approach:

  • all shares must be added/removed through the server: manual tinkering can knock things out of sync
  • canonical share metadata could live in a separate database, updated by the server upon each change (maybe AWS's SimpleDB?); a sketch of this idea follows the list
  • upgrade (introducing a new feature like lease-expiry) could be accomplished with an offline process: to upgrade the server, first stop the server, then run a program against the backing store and DB, then launch the new version of the server. That would reduce the size, complexity, and runtime cost of the actual server code.
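To make the separate-database idea concrete, here is a hypothetical sketch using sqlite3 from the standard library as a stand-in for SimpleDB or whatever store is chosen. The table layout is invented for illustration and is not the actual leasedb schema:

{{{
#!python
import sqlite3
import time

def open_leasedb(path):
    """Open (and create, if needed) a toy lease-metadata database."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS leases (
            storage_index TEXT NOT NULL,
            shnum         INTEGER NOT NULL,
            expiry        INTEGER NOT NULL,   -- unix time
            PRIMARY KEY (storage_index, shnum)
        )""")
    return db

def record_lease(db, storage_index, shnum, duration=31*24*3600):
    # The server updates the DB on every share add/renew, so an
    # offline expiry pass never has to crawl the backing store.
    db.execute("INSERT OR REPLACE INTO leases VALUES (?, ?, ?)",
               (storage_index, shnum, int(time.time()) + duration))
    db.commit()

def expired_shares(db, now=None):
    """Return (storage_index, shnum) pairs whose leases have expired."""
    now = int(now if now is not None else time.time())
    return db.execute("SELECT storage_index, shnum FROM leases"
                      " WHERE expiry <= ?", (now,)).fetchall()
}}}

With the lease data in a database, the offline upgrade/expiry process described above becomes a single SQL query instead of a full crawl of the backing store.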

Anyway, my point is that you shouldn't assume a Crawler is the best way to do things, nor that you must therefore find a way to port the Crawler code to a new backend. It fit a specific use-case for local disk, but it's pretty slow and resource-intensive, and for new uses (e.g. Accounting) I'm seriously considering finding a different approach. Don't be constrained by that particular design choice for new backends.

comment:2 Changed at 2011-08-11T04:42:09Z by Zancas

  • Owner changed from Zancas to zancas

comment:3 Changed at 2011-12-16T16:25:12Z by davidsarah

  • Cc warner zancas added
  • Keywords s3-backend crawler added; backend S3 removed
  • Owner zancas deleted

The current pluggable backend patches (which will be on #1569 shortly) have crawlers disabled for the S3 backend. I mostly agree with comment:1; let's suspend work on this until we see how warner's leasedb changes pan out.

comment:4 Changed at 2012-12-05T20:33:21Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

This ticket has been superseded by #1818 and #1819. See also #1834 for possible future work.
