[tahoe-dev] How does file deletion work?
Rufus Pollock
rufus.pollock at okfn.org
Fri Jul 24 11:16:22 PDT 2009
2009/7/15 Brian Warner <warner-tahoe at allmydata.com>:
>
> Note that the default server configuration is conservative and does not
> perform any GC at all. You must explicitly enable anything which threatens
> data safety like GC. When you enable it, you will provide the expiration
> timeout, which of course determines how quickly your server's users must
> renew their leases.
And in the absence of GC (which, as you say, may threaten data safety):
how do I do file deletion? (From my reading of the rest of your email,
I think the answer is that you can't ...)
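For reference, here is a sketch of what enabling lease expiration on a
storage server might look like. The option names below are my reading of
the GC documentation and may not match your version exactly, so treat
this as illustrative rather than authoritative:

```ini
# tahoe.cfg on the storage server -- names assumed from the GC docs
[storage]
# GC is off by default; turning it on means unrenewed shares can be deleted
expire.enabled = true
# expire shares whose leases are older than a fixed age
expire.mode = age
# the timeout Brian mentions: clients must renew within this window
expire.override_lease_duration = 3 months
```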
> Basically, since we have a distributed filesystem (which makes reference
> counting expensive at best) with least-authority access semantics (which
> makes it impossible for the servers to count references themselves), the
> easiest and most reliable approach to deletion is to use leases, timers, and
> garbage collection.
Right, though couldn't one have alternatives? For example, each node
could have an optional configuration variable holding the public key of
an "administrator"; the private key would then act as a capability
token granting the ability to, e.g., delete nodes.
>> 3. If a file is listed in a directory then this will lead
>> automatically to renewal of the relevant leases
>
> Nope, not without client involvement. As other folks have pointed out,
> servers can't read directories (since they don't hold a readcap). So clients
> (who *do* hold a readcap, or hold one to a parent directory, which gives them
> transitive access) must be responsible for performing renewals.
>
> (if you do a periodic "tahoe deep-check --add-lease ROOT:", then your
> statement becomes true, because your client is doing the work of finding
> everything reachable from that directory and renewing the leases, and merely
> adding/removing a file to any reachable directory is enough to start/stop
> renewing its lease)
And this can be done by any client node with access to that root
directory, right?
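The periodic renewal Brian describes could simply be driven from cron on
any machine holding the alias. A sketch (schedule and alias name are my
own placeholders):

```shell
# crontab entry: every Sunday at 03:00, walk everything reachable from
# ROOT: and add/renew a lease on each object found
0 3 * * 0  tahoe deep-check --add-lease ROOT:
```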
[...]
> If you're confident that you can enumerate all the files and directories that
> you care about, you can periodically compare this manifest against a previous
> version, and then send out explicit lease-cancel message for the objects that
> are no longer on the list. (the "cancel lease XYZ" message is the closest
> we've got to actual server-side deletion). But note that if you get it wrong
> (perhaps due to race conditions between two machines modifying a shared
> directory), you could accidentally lose files. Adding one lease per starting
> point (i.e. per root directory) per walker instance feels like it might avoid
> the race worries.
Maybe I've missed it in webapi.txt, but can you send lease
cancellation via the webapi? (And what other lease operations can be
done that way?)
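The manifest-comparison step Brian outlines is essentially a set
difference: anything in the previous manifest but missing from the
current one is a candidate for an explicit lease-cancel. A minimal
sketch in Python (function name and cap strings are illustrative, not
part of any Tahoe API; the race-condition caveat above still applies):

```python
def caps_to_cancel(previous_manifest, current_manifest):
    """Return caps present in the previous manifest but absent now.

    These are the objects that are no longer reachable from the roots
    we care about, i.e. the ones we would send "cancel lease" for.
    """
    return sorted(set(previous_manifest) - set(current_manifest))


# Example: one file was unlinked between the two manifest walks
old = ["URI:CHK:aaaa", "URI:CHK:bbbb", "URI:DIR2:cccc"]
new = ["URI:CHK:aaaa", "URI:DIR2:cccc"]
print(caps_to_cancel(old, new))  # -> ['URI:CHK:bbbb']
```

As Brian notes, getting this list wrong loses data, so in practice you
would want the per-root "walker" leases he suggests as a safety net.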
[...]
> Another idea (ticket #308) is to change the encryption format of dirnodes to
> introduce another level into the "writecap -> readcap -> verifycap"
> hierarchy. The new order would be "writecap -> readcap -> traversalcap ->
> verifycap", and a traversalcap would give somebody the ability to get the
> traversalcaps of all children (as well as the verifycap of the dirnode
> itself). Then, if you gave a hypothetical Lease Renewal Service the
> traversalcap of your root directory (as well as your master lease-renewing
> secret), they could renew all of your leases, but couldn't actually read your
> directories or files. (this requires "semi-private DSA keys", see
> http://allmydata.org/trac/pycryptopp/ticket/13 for details). You might be
> willing to reveal the shape of your directory structure to this service, in
> exchange for letting it take responsibility for your lease-renewal duties. If
> the service lives network-wise close to the storage servers, it may be
> considerably faster too.
And that is certainly the case where I don't care about giving out that
information, because I'm running a grid for open data!
Thanks once again for your detailed explanations.
Regards,
Rufus