[tahoe-dev] [tahoe-lafs] #633: lease-expiring share crawler
tahoe-lafs
trac at allmydata.org
Wed Feb 18 15:01:49 PST 2009
#633: lease-expiring share crawler
--------------------------+-------------------------------------------------
 Reporter:  warner        |           Owner:  warner
     Type:  task          |          Status:  new
 Priority:  major         |       Milestone:  1.4.0
Component:  code-storage  |         Version:  1.3.0
 Keywords:                |   Launchpad_bug:
--------------------------+-------------------------------------------------
Comment(by warner):

Hm, come to think of it, a share-crawler would also be useful to keep
track of how many shares are being managed by this server. At the moment
we have some broken scripts that try to estimate this by watching a
couple of prefixdirs on a few servers. A crawler which loops once every
few days could give us a better estimate. Of course, we could get the
same information out of a lease DB in O(1) time, in exchange for
complexity and a constant-time overhead per share add/remove.
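
Roughly speaking, the share-counting side of such a crawler needs very
little. A minimal sketch (the prefixdir/bucketdir layout is the real one,
but the path, the pacing, and the skip-check are my assumptions, not
working code):

    import os, time

    SHARES_DIR = "/home/tahoe/storage/shares"  # assumed server layout

    def count_buckets(sharedir=SHARES_DIR, pause=1.0):
        """Count bucketdirs (storage indexes), one prefixdir at a time."""
        total = 0
        for prefix in sorted(os.listdir(sharedir)):
            prefixdir = os.path.join(sharedir, prefix)
            if not os.path.isdir(prefixdir):
                continue  # ignore stray non-prefixdir entries
            total += len(os.listdir(prefixdir))  # one entry per bucket
            time.sleep(pause)  # spread the IO load over time
        return total

    print("total buckets: %d" % count_buckets())
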
If we have multiple crawlers, it might be a good idea to combine them
into a single crawler, basically to improve locality of reference and be
kinder to the filesystem's directory cache.
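
(For instance, one pass over the bucketdirs could feed several consumers
at once; a hypothetical observer interface, not existing code:)

    import os

    class ShareCounter:
        def __init__(self):
            self.count = 0
        def process_bucket(self, bucketdir):
            self.count += 1

    class LeaseChecker:
        def process_bucket(self, bucketdir):
            pass  # examine/expire leases on the shares in this bucket

    def crawl_once(sharedir, observers):
        for prefix in sorted(os.listdir(sharedir)):
            prefixdir = os.path.join(sharedir, prefix)
            if not os.path.isdir(prefixdir):
                continue
            for bucket in os.listdir(prefixdir):
                # each bucketdir is visited exactly once, no matter how
                # many consumers are interested in it
                bucketdir = os.path.join(prefixdir, bucket)
                for obs in observers:
                    obs.process_bucket(bucketdir)
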
Hm, so it feels like a crawler is either a transition tool (used to
first populate the lease DB, or convert shares to a new format, or
something), or a fallback/error-recovery tool (to detect problems in the
DB, or rebuild it after it gets corrupted), or something to use in the
interim until we build ourselves a fast database (like for a
share-counter, or a local-share-verifier). Maybe it isn't worth the
hassle of merging multiple crawlers into a single one.

Some tests I just ran on a prodnet storage server (pt4.st4, with about
1TB of shares) show that it takes about 130-200ms to list the buckets in
each prefixdir (with a lukewarm cache; with a hot one, it's closer to
17ms). There are 1040 prefixdirs, and on this server each one has an
average of 2460 buckets, giving us about 2.56M buckets total. Actually
listing the shares in a prefixdir takes considerably longer, more like
55 seconds, since it requires accessing all 2460 bucketdirs, which
suggests that merely enumerating every share on this server would take
57ksec, or 16 hours. And doing a stat() on every file in a prefixdir
takes 76s, which suggests all bucketdirs would take 79ksec, or 22 hours.
A hot cache again brings down the stat() time considerably, to about
100ms per prefixdir.
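
(The arithmetic, spelled out: the per-prefixdir times are the
measurements above, everything else follows from them.)

    NUM_PREFIXDIRS = 1040

    def extrapolate(seconds_per_prefixdir):
        total = seconds_per_prefixdir * NUM_PREFIXDIRS
        return "%dksec, or %.0f hours" % (total // 1000, total / 3600.0)

    print("enumerate every share: " + extrapolate(55))  # 57ksec, 16 hours
    print("stat() every file:     " + extrapolate(76))  # 79ksec, 22 hours
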
Reading something from each file takes even longer. The other data point
I have is from several months ago, and I don't remember which server it
was run on. What I seem to remember is 5 hours to do a 'find' of all
shares, and 12 hours to create a "share catalog", which must read the
header and leases from each share.

The normal upload/download/do-you-have-block traffic of a tahoe storage
server will cause most of the prefixdirs to be cached (this is the
"lukewarm" state I mentioned above), so the crawler can assume that it
will be cheap to learn the bucketdir names. To do anything with the
actual shares, the crawler will have to bring the bucketdir contents
into memory, which should be assumed to be fairly expensive.
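
To make that concrete, the main loop I have in mind looks something like
this (a sketch: the duty-cycle knob and its 10% default are invented
numbers, not a measured design):

    import os, time

    SHARES_DIR = "/home/tahoe/storage/shares"  # assumed server layout

    def crawl_slowly(process_bucketdir, duty_cycle=0.10):
        for prefix in sorted(os.listdir(SHARES_DIR)):  # cheap: cached
            prefixdir = os.path.join(SHARES_DIR, prefix)
            if not os.path.isdir(prefixdir):
                continue
            start = time.time()
            for bucket in os.listdir(prefixdir):  # expensive: hits disk
                process_bucketdir(os.path.join(prefixdir, bucket))
            elapsed = time.time() - start
            # sleep 9x as long as we worked, so the crawler consumes
            # only ~10% of the disk's attention
            time.sleep(elapsed * (1.0 - duty_cycle) / duty_cycle)
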
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/633#comment:4>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid