[tahoe-dev] [tahoe-lafs] #633: lease-expiring share crawler
tahoe-lafs
trac at allmydata.org
Wed Feb 18 15:01:49 PST 2009
#633: lease-expiring share crawler
--------------------------+-------------------------------------------------
 Reporter:  warner        |           Owner:  warner
     Type:  task          |          Status:  new
 Priority:  major         |       Milestone:  1.4.0
Component:  code-storage  |         Version:  1.3.0
 Keywords:                |   Launchpad_bug:
--------------------------+-------------------------------------------------
Comment(by warner):

Hm, come to think of it, a share-crawler would also be useful to keep
track of how many shares are being managed by this server. At the moment
we have some broken scripts that try to estimate this by watching a
couple of prefixdirs on a few servers. A crawler which loops once every
few days could give us a better estimate. Of course, we could get the
same information out of a lease DB in O(1) time, in exchange for
complexity and a constant-time overhead per share add/remove.
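
Roughly speaking, the share-counting side of such a crawler needs very
little. A minimal sketch (the prefixdir/bucketdir layout is the real one,
but the path, the pacing, and the skip-check are my assumptions, not
working code):

    import os, time

    SHARES_DIR = "/home/tahoe/storage/shares"  # assumed server layout

    def count_buckets(sharedir=SHARES_DIR, pause=1.0):
        """Count bucketdirs (storage indexes), one prefixdir at a time."""
        total = 0
        for prefix in sorted(os.listdir(sharedir)):
            prefixdir = os.path.join(sharedir, prefix)
            if not os.path.isdir(prefixdir):
                continue  # ignore stray non-prefixdir entries
            total += len(os.listdir(prefixdir))  # one entry per bucket
            time.sleep(pause)  # spread the IO load over time
        return total

    print("total buckets: %d" % count_buckets())
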
If we have multiple crawlers, it might be a good idea to combine them
into a single crawler, basically to improve locality of reference and be
kinder to the filesystem's directory cache.
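
(For instance, one pass over the bucketdirs could feed several consumers
at once; a hypothetical observer interface, not existing code:)

    import os

    class ShareCounter:
        def __init__(self):
            self.count = 0
        def process_bucket(self, bucketdir):
            self.count += 1

    class LeaseChecker:
        def process_bucket(self, bucketdir):
            pass  # examine/expire leases on the shares in this bucket

    def crawl_once(sharedir, observers):
        for prefix in sorted(os.listdir(sharedir)):
            prefixdir = os.path.join(sharedir, prefix)
            if not os.path.isdir(prefixdir):
                continue
            for bucket in os.listdir(prefixdir):
                # each bucketdir is visited exactly once, no matter how
                # many consumers are interested in it
                bucketdir = os.path.join(prefixdir, bucket)
                for obs in observers:
                    obs.process_bucket(bucketdir)
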
Hm, so it feels like a crawler is either a transition tool (used to
first populate the lease DB, or convert shares to a new format, or
something), or a fallback/error-recovery tool (to detect problems in the
DB, or rebuild it after it gets corrupted), or something to use in the
interim until we build ourselves a fast database (like for a
share-counter, or a local-share-verifier). Maybe it isn't worth the
hassle of merging multiple crawlers into a single one.

Some tests I just ran on a prodnet storage server (pt4.st4, with about
1TB of shares) show that it takes about 130-200ms to list the buckets in
each prefixdir (with a lukewarm cache; with a hot one, it's closer to
17ms). There are 1040 prefixdirs, and on this server each one has an
average of 2460 buckets, giving us about 2.56M buckets total. Actually
listing the shares in a prefixdir takes considerably longer, more like
55 seconds, since it requires accessing all 2460 bucketdirs, which
suggests that merely enumerating every share on this server would take
57ksec, or 16 hours. And doing a stat() on every file in a prefixdir
takes 76s, which suggests all bucketdirs would take 79ksec, or 22 hours.
A hot cache again brings down the stat() time considerably, to about
100ms per prefixdir.
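
(The arithmetic, spelled out: the per-prefixdir times are the
measurements above, everything else follows from them.)

    NUM_PREFIXDIRS = 1040

    def extrapolate(seconds_per_prefixdir):
        total = seconds_per_prefixdir * NUM_PREFIXDIRS
        return "%dksec, or %.0f hours" % (total // 1000, total / 3600.0)

    print("enumerate every share: " + extrapolate(55))  # 57ksec, 16 hours
    print("stat() every file:     " + extrapolate(76))  # 79ksec, 22 hours
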
Reading something from each file takes even longer. The other data point
I have is from several months ago, and I don't remember which server it
was run on. What I seem to remember is 5 hours to do a 'find' of all
shares, and 12 hours to create a "share catalog", which must read the
header and leases from each share.

The normal upload/download/do-you-have-block traffic of a tahoe storage
server will cause most of the prefixdirs to be cached (this is the
"lukewarm" state I mentioned above), so the crawler can assume that it
will be cheap to learn the bucketdir names. To do anything with the
actual shares, the crawler will have to bring the bucketdir contents
into memory, which should be assumed to be fairly expensive.
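
To make that concrete, the main loop I have in mind looks something like
this (a sketch: the duty-cycle knob and its 10% default are invented
numbers, not a measured design):

    import os, time

    SHARES_DIR = "/home/tahoe/storage/shares"  # assumed server layout

    def crawl_slowly(process_bucketdir, duty_cycle=0.10):
        for prefix in sorted(os.listdir(SHARES_DIR)):  # cheap: cached
            prefixdir = os.path.join(SHARES_DIR, prefix)
            if not os.path.isdir(prefixdir):
                continue
            start = time.time()
            for bucket in os.listdir(prefixdir):  # expensive: hits disk
                process_bucketdir(os.path.join(prefixdir, bucket))
            elapsed = time.time() - start
            # sleep 9x as long as we worked, so the crawler consumes
            # only ~10% of the disk's attention
            time.sleep(elapsed * (1.0 - duty_cycle) / duty_cycle)
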
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/633#comment:4>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid