[tahoe-dev] Getting my root writecap for the production grid

zooko zooko at zooko.com
Thu Jul 31 10:41:02 PDT 2008


Dear Brian:

After the big discussion about accounting and garbage collection that
you, Greg Hazel, and I had two weeks ago in Boulder, we agreed that we
needn't start by implementing time-based deletion of expired data, but
could instead start with an active "decref this now" command to delete
data.  Here are the notes that you so helpfully wrote up, by which I am
able to remember what we concluded:

http://allmydata.org/pipermail/tahoe-dev/2008-July/000700.html

The disadvantage of active decref as opposed to expiry is that when the
decref command fails, the result is uncollected garbage.  The advantage
is that it eases the three contending pressures in your diagram -- it
allows low traffic and high reliability while still achieving (we
think) a low rate of uncollected garbage, comparable to the
expiry-based approach.
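
To make that tradeoff concrete, here is the shape of the client-side
flow as I picture it.  This is only a sketch, not real Tahoe code:
storage_servers, cancel_lease, and the secrets are stand-ins for
whatever the actual per-share "decref this now" message turns out to be.

    # Hypothetical sketch of active decref; cancel_lease stands in for
    # whatever per-share decref message we end up with.
    def decref_now(storage_servers, storage_index, cancel_secret):
        """Ask every server holding shares for this storage index to
        drop our lease.  Any server we fail to reach keeps its shares:
        those shares are the uncollected garbage mentioned above, until
        some other mechanism (e.g. expiry) reaps them."""
        failures = []
        for server in storage_servers:
            try:
                server.cancel_lease(storage_index, cancel_secret)
            except ConnectionError:
                failures.append(server)
        return failures   # non-empty => garbage left behind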

Did you change your mind, or forget our plan about expiry vs. decref,
since you wrote that message on the 21st?  Or am I misunderstanding
your comments about using expiration for garbage collection in
yesterday's post?

http://allmydata.org/pipermail/tahoe-dev/2008-July/000727.html

Regards,

Zooko


Below is the quoted content of the message to which I'm replying,
interleaved with a few concrete sketches where I want to check my
understanding.

On Jul 30, 2008, at 19:17, Brian Warner wrote:

> Yeah, that's the plan.. I'm thinking POST /uri/ROOTDIRCAP?t=deep-renew
> which will recursively update the lease timer on everything you can
> reach from that root. I'm figuring we do this once a week or once a
> month, and then have the servers flag anything with a timer more than a
> few months old as garbage.
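
So that I'm sure I understand the proposed webapi call: a weekly cron
job on the client would do something like the following.  The
t=deep-renew operation doesn't exist yet, so the URL shape below is
just my reading of your proposal.

    # Sketch of the proposed (not-yet-implemented) deep-renew operation.
    import urllib.request

    def deep_renew(webapi_base, rootdircap):
        # POST /uri/ROOTDIRCAP?t=deep-renew -- recursively renew the
        # lease on everything reachable from rootdircap.
        url = "%s/uri/%s?t=deep-renew" % (webapi_base, rootdircap)
        req = urllib.request.Request(url, data=b"", method="POST")
        return urllib.request.urlopen(req).read()

    # e.g. from a weekly cron job, against a local Tahoe node:
    # deep_renew("http://127.0.0.1:3456", "URI:DIR2:...")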
>
> There's a whole bunch of tradeoffs here: reliability vs traffic vs
> garbage. I have a diagram (offline) of the issues: if you put renewal
> time on one axis, and expiration time on the other, then you get three
> irreconcilable pressures: long renewal time to get low traffic, short
> expiration time to get minimal garbage, large expire/renewal ratio to
> get high reliability.
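
That matches my understanding.  In back-of-the-envelope terms: traffic
scales with 1/(renewal period), worst-case garbage lifetime is roughly
the expiration time, and the safety margin is the expire/renewal
ratio.  With made-up numbers, just to make the tension concrete:

    # Toy numbers illustrating the three-way tension (values invented).
    renewal_days = 7       # how often each client deep-renews
    expiration_days = 90   # servers reap leases older than this

    traffic = 365.0 / renewal_days         # ~52 deep-renew passes/year
    worst_case_garbage = expiration_days   # deleted data lingers <= 90 days
    safety_margin = expiration_days / renewal_days
    # ~13 renewal periods fit inside one expiration window, i.e. a live
    # file survives about a dozen consecutive failed renewals before a
    # server mistakes it for garbage

    print(traffic, worst_case_garbage, safety_margin)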
>
>> Better yet, why not garbage collect any files that aren't linked to a
>> name inside any directory linked from the root (perhaps through more
>> directories)?
>
> That's also basically the plan, except for files that have been
> uploaded outside a directory structure (the "unlinked upload" feature)
> that somebody wants to retain. The main issue here is who is able to
> renew the leases. The servers can't see inside the directories.
>
> We've been planning to make lease-renewal something that you can safely
> delegate to someone else, a sort of "renewer service": they get to
> renew your leases for you, but they don't get to read your plaintext.
> For example, it would be completely appropriate for Allmydata to
> provide a service like this.
>
> One current vague plan has been for the client to give a list of their
> renewal-caps (basically the same as a verify-cap, which t=manifest has
> returned since v1.0.0) to the renewer service. The client would walk
> their rootcaps every once in a while (maybe once a day) to build up
> their current manifest (a list of everything they want to keep around),
> then give it to this service. The service would then take
> responsibility for renewing the leases every week. The client could go
> offline for an extended period of time, but the renewer would keep
> their files alive. The benefit here is that clients don't need to share
> their rootcaps with the renewer; the downside is that they have to do a
> recursive walk to figure out what *does* need to be given to the
> renewer, and that they need to have a pretty comprehensive view of what
> their rootcaps are (so they don't tell the renewer to forget about
> something that is not actually garbage).
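
To restate that plan in code, here is what I imagine the client-side
daily job looks like.  t=manifest is real (as you say, since v1.0.0),
though I'm guessing at its output format; the renewer service, its URL,
and submit_to_renewer are entirely hypothetical.

    # Sketch of the manifest-of-verifycaps plan.
    import urllib.request

    def build_manifest(webapi_base, rootcaps):
        # Walk each rootcap and collect the verify/renewal caps of
        # everything reachable -- the set of things we want kept alive.
        manifest = set()
        for rootcap in rootcaps:
            url = "%s/uri/%s?t=manifest" % (webapi_base, rootcap)
            with urllib.request.urlopen(url) as resp:
                for line in resp.read().decode("utf-8").splitlines():
                    if line.strip():
                        manifest.add(line.strip())
        return manifest

    def submit_to_renewer(renewer_url, manifest):
        # Hand the complete manifest to the (hypothetical) renewer
        # service, which then renews those leases weekly on our behalf.
        # Anything *not* in the manifest eventually becomes garbage,
        # which is why the rootcap list must be comprehensive.
        body = "\n".join(sorted(manifest)).encode("utf-8")
        urllib.request.urlopen(renewer_url, data=body)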
>
> There are problems with that plan, so as an intermediate position I've
> been thinking about abandoning the manifest-of-verifycaps scheme and
> just concentrating on the recursive walk. Somebody would be responsible
> for doing a deep-renew from their rootcap on a regular basis. That
> somebody might be the client, or it might be somebody else.
>
> We're planning to introduce "traversal caps" in the new DSA-based
> dirnodes (ticket #217), which would let the holder do a recursive walk
> but not see any of the plaintext.. with these, the renewer agent (i.e.
> allmydata) could hold the traversal cap and renew your files for you,
> even though we can't see your plaintext.
>
> In the meantime, allmydata currently retains rootcaps for all our
> customers (both to provide password recovery and to let us do recursive
> traversals like this). So we could just have our own servers do a
> recursive walk-and-renew on all files reachable from those rootcaps.
> When we get #217 done and switch people over to DSA-based files, we can
> retain the traversal-cap instead (and at least put the rootcap in a
> different place, available for password recovery but not generally
> available to our deep-renew cronjob).
>
>  (the action item for 3rd-party allmydata client code is to be prepared
>  to do this deep-renew operation on any directories that allmydata
>  doesn't know about, since we won't be able to do it for you)
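
Noted -- for those of us with directories that allmydata doesn't know
about, I take it the cron job would just loop the deep_renew sketch
from earlier over our own list of rootcaps (the caps below are
placeholders):

    # Sketch: a third-party client keeping its own directories alive,
    # using the (proposed) deep-renew operation from the earlier sketch.
    my_private_rootcaps = [
        "URI:DIR2:...",   # directories allmydata doesn't know about
    ]

    for cap in my_private_rootcaps:
        try:
            deep_renew("http://127.0.0.1:3456", cap)
        except OSError as e:
            # A failed renewal is survivable as long as we retry well
            # within the servers' expiration window.
            print("renewal failed for %s: %s" % (cap, e))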
>
>> 3. Also, if anyone is interested, I would be willing to open source
>> (at least) the lowest level Tahoe client portion of my project. That
>> portion is a very simple Ruby library for performing basic Tahoe
>> operations (put, get, rename, delete, mkdir, list, attributes).
>> Similar to the AWS::S3 (amazon.rubyforge.org) library.
>
> That's great.. is this a front-end to the Tahoe webapi?
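
For anyone else reading along: the webapi is plain HTTP, so the bottom
layer of such a library is tiny.  A minimal sketch of the two most
basic operations, assuming the standard webapi forms (PUT /uri for an
unlinked upload, GET /uri/CAP for a download):

    # Minimal sketch of a webapi client layer.
    import urllib.request

    def put(webapi_base, data):
        # Unlinked upload: PUT /uri returns the new file's cap.
        req = urllib.request.Request("%s/uri" % webapi_base,
                                     data=data, method="PUT")
        return urllib.request.urlopen(req).read().decode("ascii").strip()

    def get(webapi_base, cap):
        # Download by cap: GET /uri/CAP.
        url = "%s/uri/%s" % (webapi_base, cap)
        return urllib.request.urlopen(url).read()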
>
> cheers,
>  -Brian

