[tahoe-lafs-trac-stream] [tahoe-lafs] #1471: Make Crawlers Compatible With Pluggable Backends (was: Make Crawler's Compatible With Pluggable Backends)
tahoe-lafs
trac at tahoe-lafs.org
Mon Aug 8 11:26:02 PDT 2011
#1471: Make Crawlers Compatible With Pluggable Backends
------------------------------+------------------------
     Reporter:  Zancas        |      Owner:  Zancas
         Type:  enhancement   |     Status:  new
     Priority:  major         |  Milestone:  undecided
    Component:  code-storage  |    Version:  1.8.2
   Resolution:                |   Keywords:  backend S3
Launchpad Bug:                |
------------------------------+------------------------
Comment (by warner):
(fixed title: http://www.angryflower.com/bobsqu.gif)
I'd like to point out that the use of a {{{Crawler}}} at all is deeply
intertwined with the way the shares are being stored. We decided early
on that we'd prefer a storage scheme in which the share files are the
primary source of truth, and that anything else is merely a volatile
performance-enhancing cache that could be deleted at any time without
long-term information loss. The idea was to keep the storage model
simple for server-admins, letting them correctly assume that shares
could be migrated by merely copying sharefiles from one box to another.
(write-enablers violate this assumption, but we're working on that).
Those Crawlers exist to manage things like lease-expiration and
stats-gathering from a bunch of independent sharefiles, both handling
the initial bootstrap case (i.e. you've just upgraded your storage
server to a version that knows how to expire leases) and later recovery
cases (i.e. you've migrated some shares into your server, or you
manually deleted shares for some reason). They assume that share metadata
can be retrieved quickly (i.e. from fast local disk).
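
To make that assumption concrete, here is a minimal sketch of the crawler
pattern in Python, assuming the local-disk layout of prefix directories
full of sharefiles. The names ({{{crawl_shares}}},
{{{get_lease_expiration}}}) are illustrative placeholders rather than the
real {{{ShareCrawler}}} API, and the real code also spreads its work over
many small timeslices to limit CPU and disk load:

{{{
import os

def crawl_shares(sharedir, expire_before, get_lease_expiration):
    """Walk every sharefile under sharedir and delete those whose
    leases have all expired (the lease-expiration use case)."""
    for prefix in sorted(os.listdir(sharedir)):   # two-letter prefix dirs
        prefix_dir = os.path.join(sharedir, prefix)
        for storage_index in os.listdir(prefix_dir):
            bucket = os.path.join(prefix_dir, storage_index)
            for shnum in os.listdir(bucket):
                sharefile = os.path.join(bucket, shnum)
                # Reading share metadata is cheap only when the backend
                # really is a fast local filesystem.
                if get_lease_expiration(sharefile) < expire_before:
                    os.remove(sharefile)
}}}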
If a server is using a different backend, these rules and goals might
not apply. For example, if shares are being stored in S3, is each share
stored in a single S3 object? How important is it that you be able
to add or remove objects without going through the storage server? It
may be a lot easier/faster to use a different approach:
* all shares must be added/removed through the server: manual tinkering
can knock things out of sync
* canonical share metadata could live in a separate database, updated
by the server upon each change (maybe AWS's SimpleDB?); a rough sketch
follows this list
* upgrade (introducing a new feature like lease-expiry) could be
accomplished with an offline process: to upgrade the server, first
stop the server, then run a program against the backing store and DB,
then launch the new version of the server. That would reduce the
size, complexity, and runtime cost of the actual server code.
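
To illustrate the database-backed alternative, here is a rough sketch that
assumes all shares flow through the server. SQLite stands in for whatever
store (SimpleDB or otherwise) would actually be used, and every name here
(the {{{shares}}} table, {{{record_share_added}}},
{{{delete_share_object}}}) is hypothetical rather than part of any
existing Tahoe-LAFS backend:

{{{
import sqlite3, time

def open_metadata_db(path):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS shares
                  (storage_index TEXT, shnum INTEGER, size INTEGER,
                   lease_expires INTEGER,
                   PRIMARY KEY (storage_index, shnum))""")
    return db

def record_share_added(db, storage_index, shnum, size,
                       lease_duration=31*24*3600):
    # Called by the server on every share write, so the DB stays
    # canonical and no background crawler is needed to discover shares.
    db.execute("INSERT OR REPLACE INTO shares VALUES (?, ?, ?, ?)",
               (storage_index, shnum, size,
                int(time.time()) + lease_duration))
    db.commit()

def expire_leases_offline(db, delete_share_object):
    # The offline pass from the third bullet: run it while the server is
    # stopped, against both the DB and the backing store.
    # delete_share_object is a hypothetical hook that removes the
    # corresponding S3 object.
    now = int(time.time())
    expired = db.execute("SELECT storage_index, shnum FROM shares"
                         " WHERE lease_expires < ?", (now,)).fetchall()
    for storage_index, shnum in expired:
        delete_share_object(storage_index, shnum)
        db.execute("DELETE FROM shares WHERE storage_index=? AND shnum=?",
                   (storage_index, shnum))
    db.commit()
}}}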
Anyway, my point is that you shouldn't assume a Crawler is the best way
to do things, or that you must therefore find a way to port the Crawler
code to a new backend. It fit a specific use-case for local disk, but
it's pretty slow and resource-intensive, and for new uses (e.g.
Accounting) I'm seriously considering finding a different approach.
Don't be constrained by that particular design choice for new backends.
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1471#comment:1>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage
More information about the tahoe-lafs-trac-stream
mailing list