[tahoe-lafs-trac-stream] [tahoe-lafs] #1834: stop using share crawler for anything except constructing a leasedb
tahoe-lafs
trac at tahoe-lafs.org
Tue May 28 02:01:56 UTC 2013
#1834: stop using share crawler for anything except constructing a leasedb
-------------------------+-------------------------------------------------
     Reporter:  zooko    |      Owner:
         Type:  defect   |     Status:  new
     Priority:  normal   |  Milestone:  undecided
    Component:           |    Version:  1.9.2
      code-storage       |   Keywords:  leases garbage-collection
   Resolution:           |              accounting performance crawlers
Launchpad Bug:           |
-------------------------+-------------------------------------------------
Changes (by daira):
 * keywords:  leases garbage-collection accounting
              => leases garbage-collection accounting performance crawlers
New description:
I think we should stop using a "share crawler" — a long-running,
persistent, duty-cycle-limited process that visits every share held by a
storage server — for everything that we can.

And I think that the only thing we can't do in a different way is
constructing a leasedb when we are first upgrading the server to a
leasedb-capable version, or when the leasedb has been lost or corrupted.

Here are the other things that are currently done by crawlers and how I
think they should be done differently:

* Updating and/or checking the leases on shares to see if they have
expired;

On David-Sarah's 666-accounting branch, this is now done for all shares by
a single, synchronous command/query to leasedb. (#666)
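
As a rough illustration (the table and column names here, "leases",
"storage_index", and "expiration_time", are assumptions for the sketch,
not necessarily the actual #666 schema), the whole expiry check collapses
to one query:

{{{
#!python
import time

def find_expired_shares(conn, now=None):
    """Given a sqlite3 connection to the leasedb, return the storage
    indexes whose newest lease has expired, using one synchronous query
    instead of crawling every share file.
    Assumed schema: leases(storage_index, shnum, expiration_time)."""
    if now is None:
        now = time.time()
    cur = conn.execute(
        "SELECT storage_index FROM leases"
        " GROUP BY storage_index"
        " HAVING MAX(expiration_time) < ?",
        (now,))
    return [row[0] for row in cur]
}}}
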
* Delete shares that have lost all their leases (by cancellation or
expiry);

I propose that this be done instead by the storage server maintaining a
persistent set of shares to be deleted. When the lease-updating step
(which, in #666, is synchronous and fast) has identified a share that has
no more leases, the share's id gets added to the persistent set of shares
to delete. A long-running, persistent, duty-cycle-limited process deletes
those shares from the backend and removes their ids from the set of
shares-to-delete. This is cleaner and more efficient than using a crawler,
which has to visit ''all'' shares and which never stops twitching, since
this has to visit only shares that have been marked as to-delete, and it
quiesces when there is nothing to delete. (#1833 — storage server deletes
garbage shares itself instead of waiting for crawler to notice them)
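
A minimal sketch of that shape, assuming a hypothetical leasedb table
named "shares_to_delete" and a hypothetical backend.delete_share() call
(neither is the real storage-server API):

{{{
#!python
def mark_for_deletion(conn, storage_index, shnum):
    """Called by the fast, synchronous lease-updating step when it sees
    that a share has lost its last lease."""
    conn.execute("INSERT OR IGNORE INTO shares_to_delete VALUES (?, ?)",
                 (storage_index, shnum))
    conn.commit()

def delete_pending_shares(conn, backend, batch_size=100):
    """Delete only the shares that were marked, then quiesce, unlike a
    crawler, which keeps revisiting every share forever."""
    while True:
        rows = conn.execute(
            "SELECT storage_index, shnum FROM shares_to_delete LIMIT ?",
            (batch_size,)).fetchall()
        if not rows:
            break  # nothing marked for deletion: stop twitching
        for storage_index, shnum in rows:
            backend.delete_share(storage_index, shnum)  # hypothetical call
            conn.execute(
                "DELETE FROM shares_to_delete"
                " WHERE storage_index = ? AND shnum = ?",
                (storage_index, shnum))
        conn.commit()
}}}
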
* Discover newly added shares that the operator copied into the backend
without notifying the storage server;

I propose that we stop supporting this use case. It can be replaced by
some combination of: 1. requiring you to run a tahoe-lafs storage client
tool (a share migration tool) to upload the shares through the server
instead of copying the shares directly into the backend, 2. various kludgy
workarounds, 3. a new tool for registering specific storage indexes in the
leasedb after you've added the shares directly into the backend, or 4.
simply requiring that the operator manually trigger the crawler to start
instead of expecting the crawler to run continuously. (#1835 — stop
grovelling the whole storage backend looking for externally-added shares
to add a lease to)

* Count how many shares you have;

This can be nicely replaced by leasedb (a simple SQL "COUNT" query), and
the functionality can also be extended to compute the aggregate size of
the data in addition to the mere number of objects, which would be very
useful for customers of !LeastAuthority.com (who pay per byte), among
others. (#1836 — stop crawling share files in order to figure out how many
shares you have)
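
For example, assuming a hypothetical "shares" table with a "used_space"
column (the real leasedb schema may differ), both numbers come from a
single query:

{{{
#!python
def share_stats(conn):
    """Return (number of shares, total bytes used) straight from the
    leasedb, with no crawl over the share files."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(used_space), 0) FROM shares"
    ).fetchone()
    return count, total
}}}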

What would be better about removing these uses of crawler?

1. The storage server would be more efficient in terms of accesses to its
storage backend. This might turn out to matter when the storage backend is
a cloud storage service and you pay per API call. (Or it might not, if the
cost is cheap enough and the crawly way to do it is efficient enough.)

2. The crawling would be a quiescent process — something that finishes its
job and then stops, and doesn't start again unless a user tells it to. I
like this way of doing things. See wiki:FAQ#Q18_unobtrusive_software .

3. Some of these operations would be faster and better if done in the
newly proposed way instead of by relying on a crawler.
--
Comment:
Replying to [comment:4 zooko]:
> I'm pretty interested in taking this design to the extreme to get the
> best efficiency. In that extreme, we ''never'' go to persistent storage
> for either read or write (or existence check) — which requires at least a
> disk seek for a direct-attached-storage backend or at least a cloud
> service API request for a cloud backend — unless the leasedb ''told'' us
> to go to persistent storage. (Except, in the case that we're currently
> building or rebuilding leasedb by crawling persistent storage.)

I agree with returning a positive response to an existence check (DYHB)
when the leasedb says we have a share. The case where we turn out not to
actually have the share is an error that the downloader, uploader, or
repairer should tolerate anyway.

I think that returning a negative response to an existence check when the
leasedb says we don't have a share is more debatable. In principle, this
shouldn't affect latency of downloads because the downloader should use
the first {{{shares.needed}}} servers that respond positively, so the
latency of negative responses shouldn't matter.
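
To make that concrete, here is a sketch of answering a DYHB purely from
the leasedb, never touching the disk or cloud backend (the "shares" table
and the function name are assumptions, not the actual storage-server
code):

{{{
#!python
def get_share_numbers(conn, storage_index):
    """Report every share number the leasedb claims we hold for this
    storage index. If one of them turns out to be missing from the
    backend, that is an error the downloader, uploader, or repairer has
    to tolerate anyway."""
    cur = conn.execute(
        "SELECT shnum FROM shares WHERE storage_index = ?",
        (storage_index,))
    return set(row[0] for row in cur)
}}}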
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1834#comment:5>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage