[tahoe-dev] [tahoe-lafs] #633: lease-expiring share crawler

Wed Feb 18 13:21:15 PST 2009

#633: lease-expiring share crawler
--------------------------+-------------------------------------------------
 Reporter:  warner        |           Owner:  warner
     Type:  task          |          Status:  new   
 Priority:  major         |       Milestone:  1.4.0 
Component:  code-storage  |         Version:  1.3.0 
 Keywords:                |   Launchpad_bug:        
--------------------------+-------------------------------------------------

Comment(by warner):

 Hm, yeah, there are a number of optimizations that can take advantage of
 the fact that we're allowed to delete shares late. You can think of this
 as another factor in the tradeoff diagram I just attached to this ticket:
 with marginally increased complexity, we can reduce the CPU/diskIO costs
 by increasing the lease expiration time.

 For example, we don't need to maintain an exact sorted order: if the
 leases on A and B both don't expire for a month, we don't care (right now)
 whether A comes first or B does, so we can put off that sort for a couple
 of weeks. Likewise, we don't care about timestamp resolution finer than a
 day.
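
 A minimal sketch of that coarse bucketing, assuming leases arrive as
 (storage_index, expiration_time) pairs with times in seconds since the
 epoch. The function names are hypothetical and not part of Tahoe. Leases
 are grouped by whole day with no ordering inside a bucket, and a bucket is
 only treated as expired once its entire day has passed, so shares are
 deleted late but never early.

```python
from collections import defaultdict

SECONDS_PER_DAY = 24 * 60 * 60


def day_bucket(expiration_time):
    """Round a timestamp down to whole days since the epoch."""
    return int(expiration_time) // SECONDS_PER_DAY


def bucket_leases(leases):
    """Group (storage_index, expiration_time) pairs by expiration day.

    No ordering is maintained inside a bucket: two leases that expire on
    the same day are treated as equivalent.
    """
    buckets = defaultdict(set)
    for storage_index, expiration_time in leases:
        buckets[day_bucket(expiration_time)].add(storage_index)
    return buckets


def expired_candidates(buckets, now):
    """Return storage indexes from buckets whose whole day is in the past."""
    today = day_bucket(now)
    expired = set()
    for day, indexes in buckets.items():
        if day < today:
            expired |= indexes
    return expired
```

 With this layout a lease can outlive its expiration time by up to one day,
 which is the kind of lateness the paragraph above says we can tolerate.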

 I definitely like having the share contain the canonical lease
 information, and using the ancillary data structures merely as a cache. If
 we were to go with a traditional database (sqlite or the like), then I'd
 have the DB contain a table with (storageindex, leasedata,
 expirationtime), with an index on both storageindex and expirationtime,
 and the daily or hourly query would then be "SELECT storageindex FROM
 table WHERE expirationtime < now". We'd read the real lease data from the
 share before acting upon it (which incurs an IO cost, but share expiration
 is relatively infrequent, and the safety benefits are well worth it).
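
 The database-as-cache idea above could be sketched with Python's sqlite3
 module roughly as follows. The table name "leases", the index names, and
 the two callbacks (read_real_expirations and delete_share) are assumptions
 made for illustration; nothing here is existing Tahoe code. The query only
 nominates candidates, and a share is deleted only if the lease data read
 back from the share itself agrees that every lease has expired.

```python
import sqlite3


def open_lease_cache(path=":memory:"):
    """Create the cache table with one index per lookup column."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS leases ("
        " storageindex TEXT NOT NULL,"
        " leasedata BLOB,"
        " expirationtime INTEGER NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS leases_by_si"
               " ON leases (storageindex)")
    db.execute("CREATE INDEX IF NOT EXISTS leases_by_expiry"
               " ON leases (expirationtime)")
    return db


def add_lease(db, storageindex, leasedata, expirationtime):
    db.execute("INSERT INTO leases VALUES (?, ?, ?)",
               (storageindex, leasedata, expirationtime))


def expiration_candidates(db, now):
    """The daily or hourly query: shares with at least one expired lease."""
    rows = db.execute(
        "SELECT DISTINCT storageindex FROM leases"
        " WHERE expirationtime < ? ORDER BY storageindex",
        (now,),
    )
    return [row[0] for row in rows]


def expire(db, now, read_real_expirations, delete_share):
    """Delete shares whose canonical leases have all expired.

    read_real_expirations(si) is a hypothetical callback returning the
    expiration times stored in the share itself; the database is only a
    cache, so the share has the final say.
    """
    deleted = []
    for si in expiration_candidates(db, now):
        real = read_real_expirations(si)  # the extra IO cost
        if all(t < now for t in real):
            delete_share(si)
            db.execute("DELETE FROM leases WHERE storageindex = ?", (si,))
            deleted.append(si)
        # Otherwise the cache is stale; this sketch just leaves the row.
    return deleted
```

 A real implementation would also need to refresh stale cache rows and
 decide what to do with a share that carries no leases at all (here it is
 treated as expired).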

 Given the large number of shares we're talking about (a few million per
 server), I'm hesitant to create a persistent data structure that needs one
 file per share. The shares themselves are already wasting GBs of space on
 the minimum block size overhead. Mind you, ext3 is pretty good about
 zero-length files: a quick test shows that it spends one 4kB block for
 each 113 files, or about 36.25 bytes per file. Each file was named with
 the same length as one of our storage index strings, 26 bytes, which means
 ext3's per-file overhead is an impressively small 10.25 bytes. A million
 files would take about 36MB, which is not too bad.
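
 The arithmetic behind those figures, written out as a short calculation.
 The three constants come from the quick test described above; the
 variable names are just for illustration.

```python
BLOCK_SIZE = 4096        # one 4kB ext3 block
FILES_PER_BLOCK = 113    # zero-length files per block, from the quick test
NAME_LENGTH = 26         # bytes in a storage index string

# Directory space consumed per zero-length file, name included.
bytes_per_file = BLOCK_SIZE / FILES_PER_BLOCK

# What is left after subtracting the filename itself.
overhead_per_file = bytes_per_file - NAME_LENGTH

# Space needed for one marker file per share, for a million shares.
total_bytes = 1_000_000 * bytes_per_file

print("%.2f bytes per file" % bytes_per_file)
print("%.2f bytes of overhead per file" % overhead_per_file)
print("%.1f MB for a million files" % (total_bytes / 1e6))
```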

 Having a separate directory for each second would probably result in a
 million directories, but a tree of expire-time directories (as you
 described) that only goes down to the kilosecond might be reasonably
 sized. It would still require a slow initial crawl to set up, though.
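
 One possible shape for such a tree, as a sketch only. The three-level
 layout (gigaseconds, then megaseconds, then kiloseconds) and the function
 names are assumptions, not the layout proposed earlier in the ticket. Each
 directory holds at most 1000 children, and each share is represented by a
 zero-length marker file named after its storage index.

```python
import os


def expire_dir(root, expiration_time):
    """Map an expiration time to ROOT/<Gs>/<Ms>/<ks>."""
    kiloseconds = int(expiration_time) // 1000
    gs, rest = divmod(kiloseconds, 1_000_000)
    ms, ks = divmod(rest, 1000)
    return os.path.join(root, "%d" % gs, "%03d" % ms, "%03d" % ks)


def record_lease(root, storage_index, expiration_time):
    """Drop a zero-length marker file into the matching kilosecond dir."""
    directory = expire_dir(root, expiration_time)
    os.makedirs(directory, exist_ok=True)
    open(os.path.join(directory, storage_index), "w").close()
    return directory


def expired_storage_indexes(root, now):
    """List markers in kilosecond directories that lie wholly before now."""
    cutoff = int(now) // 1000
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) != 3:
            continue  # not a leaf (kilosecond) directory
        bucket = (int(parts[0]) * 1_000_000
                  + int(parts[1]) * 1000
                  + int(parts[2]))
        if bucket < cutoff:
            found.extend(filenames)
    return sorted(found)
```

 For example, an expiration time of 1234992075 lands in ROOT/1/234/992. The
 walk in expired_storage_indexes is written for clarity; a real crawler
 could prune whole subtrees whose prefix is already past the cutoff.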

 Incidentally, a slow-share-crawler could also be used to do local share
 verification (slowly read and check hashes on all local shares, to
 discover local disk failures before the filecap holder gets around to
 doing a bandwidth-expensive remote verification), and even server-driven
 repair (ask other servers if they have other shares for this file, perform
 ciphertext repair if it looks like the file needs it). Hm, note to self:
 server-driven repair should create new shares with the same lease
 expiration time as the original shares, so that it doesn't cause a garbage
 file to live forever like some infectious epidemic.
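
 A rough sketch of the slow-crawl-and-verify idea, under several
 assumptions: shares are presented as (share_id, share) pairs, the check
 function is supplied by the caller, and a flat SHA-256 digest stands in
 for Tahoe's real hash-tree verification. All names are hypothetical. The
 crawler does a small slice of work, then pauses, so that it never
 monopolizes the CPU or disk.

```python
import hashlib
import time


def verify_share(data, expected_sha256_hex):
    """Stand-in integrity check: compare a flat SHA-256 digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256_hex


def slow_crawl(shares, check, budget_seconds=0.1, pause_seconds=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Check every share, pausing whenever the time budget is used up.

    shares yields (share_id, share) pairs; check(share) returns True when
    the share is intact. Returns the ids of shares that failed the check.
    """
    failures = []
    slice_start = clock()
    for share_id, share in shares:
        if not check(share):
            failures.append(share_id)
        if clock() - slice_start >= budget_seconds:
            sleep(pause_seconds)   # yield the disk and CPU to real work
            slice_start = clock()
    return failures
```

 The same loop could drive lease expiration or server-driven repair by
 swapping in a different check function.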

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/633#comment:2>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid