[tahoe-lafs-trac-stream] [tahoe-lafs] #1834: stop using share crawler for anything except constructing a leasedb

tahoe-lafs trac at tahoe-lafs.org
Tue May 28 02:01:56 UTC 2013


#1834: stop using share crawler for anything except constructing a leasedb
-------------------------+-------------------------------------------------
     Reporter:  zooko    |      Owner:
         Type:  defect   |     Status:  new
     Priority:  normal   |  Milestone:  undecided
    Component:  code-    |    Version:  1.9.2
  storage                |   Keywords:  leases garbage-collection
   Resolution:           |  accounting performance crawlers
Launchpad Bug:           |
-------------------------+-------------------------------------------------
Changes (by daira):

 * keywords:  leases garbage-collection accounting =>
     leases garbage-collection accounting performance crawlers


New description:

 I think we should stop using a "share crawler" — a long-running,
 persistent, duty-cycle-limited process that visits every share held by a
 storage server — for everything that we can.

 And I think that the only thing we can't do in a different way is
 construct a leasedb, either when we are first upgrading the server to a
 leasedb-capable version or when the leasedb has been lost or corrupted.

 Here are the other things that are currently done by crawlers and how I
 think they should be done differently:

 * Updating and/or checking the leases on shares to see if they have
 expired;

 On David-Sarah's 666-accounting branch, this is now done for all shares by
 a single, synchronous command/query to leasedb. (#666)
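 A minimal sketch of what such a single synchronous query could look like,
 assuming a SQLite-backed leasedb. The table and column names (`leases`,
 `expiration_time`) and the helper names are hypothetical, chosen for
 illustration; they are not taken from the 666-accounting branch.

```python
import sqlite3


def make_leasedb():
    # Hypothetical schema for illustration only; not the real leasedb schema.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE leases ("
        " storage_index TEXT, shnum INTEGER, expiration_time INTEGER)"
    )
    return db


def expire_leases(db, now):
    """Remove every lease that expired before `now` with one statement.

    Returns the number of leases removed. No share file is touched.
    """
    cur = db.execute("DELETE FROM leases WHERE expiration_time < ?", (now,))
    db.commit()
    return cur.rowcount
```

 The point of the sketch is that expiry becomes one database statement whose
 cost does not depend on visiting share files in the backend.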

 * Delete shares that have lost all their leases (by cancellation or
 expiry);

 I propose that this be done instead by the storage server maintaining a
 persistent set of shares to be deleted. When the lease-updating step
 (which, in #666, is synchronous and fast) has identified a share that has
 no more leases, the share's id gets added to the persistent set of shares
 to delete. A long-running, persistent, duty-cycle-limited process deletes
 those shares from the backend and removes their ids from the set of
 shares-to-delete. This is cleaner and more efficient than using a crawler,
 which has to visit ''all'' shares and which never stops twitching: the
 deleter has to visit only shares that have been marked as to-delete, and
 it quiesces when there is nothing to delete. (#1833: storage server
 deletes garbage shares itself instead of waiting for crawler to notice
 them)
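 The persistent set of shares-to-delete could be sketched roughly as
 follows, assuming it lives in a SQLite table. The table name
 `shares_to_delete`, the function names, and the `delete_from_backend`
 callback are all hypothetical stand-ins, not existing Tahoe-LAFS APIs.

```python
import sqlite3


def open_queue(path=":memory:"):
    # Hypothetical table holding the ids of shares that have no leases left.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS shares_to_delete ("
        " storage_index TEXT, shnum INTEGER,"
        " PRIMARY KEY (storage_index, shnum))"
    )
    return db


def mark_for_deletion(db, storage_index, shnum):
    """Called by the lease-updating step when a share has no more leases."""
    db.execute(
        "INSERT OR IGNORE INTO shares_to_delete VALUES (?, ?)",
        (storage_index, shnum),
    )
    db.commit()


def delete_batch(db, delete_from_backend, limit=10):
    """Delete up to `limit` marked shares; return how many were handled.

    A return value of 0 means there is nothing to delete, so the caller
    can stop rescheduling itself (quiesce) until something is marked.
    """
    rows = db.execute(
        "SELECT storage_index, shnum FROM shares_to_delete LIMIT ?", (limit,)
    ).fetchall()
    for storage_index, shnum in rows:
        delete_from_backend(storage_index, shnum)
        # Remove the id only after the backend deletion has succeeded.
        db.execute(
            "DELETE FROM shares_to_delete"
            " WHERE storage_index = ? AND shnum = ?",
            (storage_index, shnum),
        )
        db.commit()
    return len(rows)
```

 The `limit` argument is one simple way to bound the work done per
 invocation; how the duty cycle is actually enforced is left open here.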

 * Discover newly added shares that the operator copied into the backend
 without notifying the storage server;

 I propose that we stop supporting this use case. It can be replaced by
 some combination of: 1. requiring you to run a tahoe-lafs storage client
 tool (a share migration tool) to upload the shares through the server
 instead of copying the shares directly into the backend, 2. various kludgy
 workarounds, 3. a new tool for registering specific storage indexes in the
 leasedb after you've added the shares directly into the backend, or 4.
 simply requiring that the operator manually trigger the crawler to start
 instead of expecting the crawler to run continuously. (#1835 — stop
 grovelling the whole storage backend looking for externally-added shares
 to add a lease to)

 * Count how many shares you have;

 This can be nicely replaced by leasedb (a simple SQL "COUNT" query), and
 also the functionality can be extended to compute the aggregate sizes of
 data in addition to the mere number of objects, which would be very useful
 for customers of !LeastAuthority.com (who pay per byte), among others.
 (#1836 — stop crawling share files in order to figure out how many shares
 you have)
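 As an illustration of the "COUNT" query and its extension to aggregate
 sizes, here is a small sketch against a hypothetical `shares` table with a
 `used_space` column. Both names are assumptions made for this example
 rather than the actual leasedb schema.

```python
import sqlite3


def make_sharedb():
    # Hypothetical schema for illustration only.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE shares ("
        " storage_index TEXT, shnum INTEGER, used_space INTEGER)"
    )
    return db


def share_stats(db):
    """Return the number of shares and their total size in bytes."""
    count, total = db.execute(
        # COALESCE turns the NULL that SUM yields on an empty table into 0.
        "SELECT COUNT(*), COALESCE(SUM(used_space), 0) FROM shares"
    ).fetchone()
    return {"count": count, "total_bytes": total}
```

 Adding a per-account column and a `GROUP BY` would be one way to get
 per-customer totals, but that goes beyond what this ticket specifies.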

 What would be better about removing these uses of crawler?

 1. The storage server would be more efficient in terms of accesses to its
 storage backend. This might turn out to matter when the storage backend is
 a cloud storage service and you pay per API call. (Or it might not, if the
 cost is cheap enough and the crawly way to do it is efficient enough.)

 2. The crawling would be a quiescent process — something that finishes its
 job and then stops, and doesn't start again unless a user tells it to. I
 like this way of doing things. See wiki:FAQ#Q18_unobtrusive_software .

 3. Some of these operations would be faster and better if done in the
 newly proposed way instead of by relying on a crawler.

--

Comment:

 Replying to [comment:4 zooko]:
 > I'm pretty interested in taking this design to the extreme to get the
 > best efficiency. In that extreme, we ''never'' go to persistent storage
 > for either read or write (or existence check) — which requires at least a
 > disk seek for a direct-attached-storage backend or at least a cloud
 > service API request for a cloud backend — unless the leasedb ''told'' us
 > to go to persistent storage. (Except, in the case that we're currently
 > building or rebuilding leasedb by crawling persistent storage.)

 I agree with returning a positive response to an existence check (DYHB)
 when the leasedb says we have a share. The case where we turn out not to
 actually have the share is an error that the downloader, uploader, or
 repairer should tolerate anyway.

 I think that returning a negative response to an existence check when the
 leasedb says we don't have a share is more debatable. In principle, this
 shouldn't affect latency of downloads because the downloader should use
 the first {{{shares.needed}}} servers that respond positively, so the
 latency of negative responses shouldn't matter.
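 The two cases above can be sketched as follows. This is only an
 illustration under assumptions: the `shares` table, the `answer_dyhb`
 function, and the optional `check_backend` callback are hypothetical
 names, not part of the storage server's real interface.

```python
import sqlite3


def make_sharedb():
    # Hypothetical schema for illustration only.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE shares (storage_index TEXT, shnum INTEGER)")
    return db


def answer_dyhb(db, storage_index, check_backend=None):
    """Answer an existence check, preferring the leasedb.

    Positive case: if the leasedb lists shares, report them without
    touching persistent storage.
    Negative case: if `check_backend` is given, fall back to it (the
    conservative choice); otherwise trust the leasedb and report nothing.
    """
    rows = db.execute(
        "SELECT shnum FROM shares WHERE storage_index = ? ORDER BY shnum",
        (storage_index,),
    ).fetchall()
    if rows:
        return [shnum for (shnum,) in rows]
    if check_backend is not None:
        return list(check_backend(storage_index))
    return []
```

 Passing `check_backend=None` corresponds to the "extreme" design in the
 quoted comment; passing a callback corresponds to the more cautious
 handling of negative responses discussed here.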

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1834#comment:5>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

