#1832 new enhancement

support indefinite leases with garbage collection

Reported by: zooko Owned by:
Priority: normal Milestone: undecided
Component: code-network Version: 1.9.2
Keywords: leases gc garbage-collection accounting Cc:
Launchpad Bug:

Description (last modified by Guido Witmond)

LeastAuthority.com runs a storage server and we want to offer our customers an indefinite (within the scope of their business relationship with us) lease. That is: as long as they keep paying their bills (or even longer, if we choose to keep their ciphertext until they bring their account back into good standing) we will not delete ciphertext that their LAFS storage client has marked as something to keep, even if they don't successfully get their LAFS storage client to renew leases ever again.

We need this, because the current protocol offers us only two options, neither of which is acceptable:

  • We can turn on periodic, lease-renewal-based garbage collection, but if the customer fails to renew the leases on their data, it will get deleted. We don't believe that's an acceptable risk to offer our customers.

Or:

  • We can turn off garbage collection altogether and keep everything that the customer ever uploaded, and charge them for keeping it.

The latter is what we currently do, but it isn't sustainable, because:

  • Some of the ciphertext that the customer has uploaded is no longer valuable to them, and their LAFS client could inform us of the fact that we need to delete it (or at the very least, to stop charging them for the cost of keeping it!)

and even more difficult:

  • Some of the ciphertext that the customer has uploaded is no longer valuable to them, but they've forgotten all about it and deleted or lost all of their links to it, and their LAFS storage client never informed us about this fact (possibly due to a network failure when it tried to inform us).

This implies that to satisfy this use case, there must be a protocol whereby the LAFS client can tell the storage server "Okay, here are some ciphertext shares which as of now I want to keep, and any other ones that you might have I hereby cease paying for, so you'd better delete them, if they exist."

Now, it would be troublesome for the LAFS client to be required to build a complete manifest of all ciphertext shares that it wants to keep and then deliver that entire manifest to the server at once. So, a better, more incremental algorithm that would satisfy this use case is like this:

The following protocol is what the LAFS client does when it wants to stop paying for everything not-reachable from a given root.

  1. The client asks the server for a magic token, which is meaningful only to the server. The request means "Give me a special token that when I later give it back to you, you'll know you can delete everything that I didn't touch since you created this special token." As a matter of implementation, the storage server will find it convenient to use a timestamp from its own clock as the token, but in order to deter the client from comparing it to timestamps from the client's clock, it is sent as a string. Ooh, in fact, the server may actually encrypt the token with a secret key known only to the server and unknown to the client, just to prevent the client from comparing its value to a value taken from the client's own clock. This way, the protocol adds absolutely no requirement for clock sync between the client's clock and the server's clock: this "timestamp" is derived from the server's clock, and is only ever compared to the server's clock. If the server's clock is set to 1969 and the client's clock is set to 2099, or vice versa, that's fine.
  2. The client starts traversing the files from the root, and for each one (or batch of them) it sends a message to the storage server saying "Please mark these ones as to-keep." The server replies "Okay, done." (This "mark as to-keep" can be implemented as a "100 year lease" if that helps implementation.) It might make sense for the client to send the magic token with every one of these requests, so it means "Please mark these ones as to-keep when you do the garbage-collection sweep associated with this token." Or "Please exempt these ones from a probable future garbage-collection sweep that will, when and if it comes, be associated with this token."

Note: If the client crashes or gets stopped and restarted or loses and regains connection to the storage server during this process, it can always resume at step 2, provided that it still has the "token" from step 1 written down.

Note: If the client creates new files during this process, the newly created file comes with an equivalent mark ("100 year lease"), so that the client doesn't have to worry about race conditions between its traversal for marking keepers and its addition of new files.

  3. Once the client is satisfied that it has marked all files that it wants to keep, then it sends a message to the storage server saying "Here is a magic token that you gave me earlier. I hereby cease paying you for any files that I haven't marked (or created) since you gave me that magic token."
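
The three steps above can be sketched as a toy in-memory server. This is only an illustration under stated assumptions, not Tahoe-LAFS code: the class `SketchStorageServer` and its methods `upload`, `get_token`, `mark_to_keep` and `sweep` are hypothetical names, a simple counter stands in for the server's clock, the token is a random string that the server maps back to its own timestamp (one way of keeping it opaque to the client), and only a single customer is modelled.

```python
import itertools
import secrets


class SketchStorageServer:
    """Toy single-customer model of the mark/sweep protocol (hypothetical API)."""

    def __init__(self):
        self._clock = itertools.count(1)  # stand-in for the server's own clock
        self._last_touched = {}           # share id -> server time of last upload or mark
        self._tokens = {}                 # opaque token -> server time when it was issued

    def _now(self):
        return next(self._clock)

    def upload(self, share_id):
        # New shares arrive already marked (the "100 year lease" equivalent),
        # so uploads made during a traversal survive the later sweep.
        self._last_touched[share_id] = self._now()

    def get_token(self):
        # Step 1: the client only ever sees a random string, never the timestamp.
        token = secrets.token_hex(16)
        self._tokens[token] = self._now()
        return token

    def mark_to_keep(self, share_ids):
        # Step 2: record, on the server's clock, that these shares were touched.
        # Share ids the server does not hold are ignored.
        now = self._now()
        for share_id in share_ids:
            if share_id in self._last_touched:
                self._last_touched[share_id] = now

    def sweep(self, token):
        # Step 3: delete everything not touched since the token was issued.
        # An unknown or already-used token raises KeyError.
        issued = self._tokens.pop(token)
        doomed = sorted(s for s, t in self._last_touched.items() if t < issued)
        for share_id in doomed:
            del self._last_touched[share_id]
        return doomed
```

In this sketch every comparison uses the server's counter only, so the client's clock never enters the picture, and a client that still has the token can repeat `mark_to_keep` calls after a crash before calling `sweep`.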

(discussed on the mailing list)

Change History (7)

comment:1 Changed at 2012-10-30T18:47:04Z by zooko

  • Description modified (diff)

comment:2 Changed at 2012-10-30T20:50:42Z by zooko

  • Description modified (diff)

comment:3 Changed at 2012-10-31T00:06:02Z by terrell

  • Description modified (diff)

comment:4 follow-up: Changed at 2012-11-19T01:11:18Z by gdt

The mark/sweep scheme sounds reasonable to me. In addition, I as a server operator want a way to ask (with CLI tools - no web browser!) how much storage space is in use total, and how much due to leases from various entities that store data, perhaps even by age of lease. And I want to be able to delete files that are old and belong to some entity, in various combinations. I don't mean one would often want to do this; this is the equivalent of the sysadmin deleting big files to restore the system to functioning after users are warned and don't clean up.
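
A minimal sketch of the kind of summary being asked for here, assuming the server can enumerate its leases as `(account, size_bytes, lease_start_day)` records. That record layout, the function name `usage_report` and the age buckets are all hypothetical; nothing here reflects an existing Tahoe-LAFS interface, and a share holding several leases would be counted once per lease in this toy.

```python
from collections import defaultdict


def usage_report(leases, now, age_buckets=(30, 365)):
    """Summarise space in use: total, per account, and per lease-age bucket.

    leases: iterable of (account, size_bytes, lease_start_day) tuples.
    now: current day number on the same scale as lease_start_day.
    age_buckets: ascending upper bounds, in days, for the age buckets.
    """
    total = 0
    per_account = defaultdict(int)
    per_age = defaultdict(int)
    for account, size, start in leases:
        total += size
        per_account[account] += size
        age = now - start
        # First bucket whose bound covers this age, else the overflow bucket.
        label = next(
            ("<=%dd" % bound for bound in age_buckets if age <= bound),
            ">%dd" % age_buckets[-1],
        )
        per_age[label] += size
    return total, dict(per_account), dict(per_age)
```

A command-line tool could print these three results as plain text; filtering the same records by account and age would give the candidate set for an operator-initiated delete.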

comment:5 in reply to: ↑ 4 Changed at 2012-11-21T00:51:52Z by zooko

Replying to gdt:

The mark/sweep scheme sounds reasonable to me. In addition, I as a server operator want a way to ask (with CLI tools - no web browser!) how much storage space is in use total, and how much due to leases from various entities that store data, perhaps even by age of lease. And I want to be able to delete files that are old and belong to some entity, in various combinations. I don't mean one would often want to do this; this is the equivalent of the sysadmin deleting big files to restore the system to functioning after users are warned and don't clean up.

gdt: the asking-about-resource-usage part would be facilitated by #1836. Would you please open a separate ticket asking for the command-line interface to query these things? Go ahead and specify exactly how it should be spelled!

Please open a separate ticket for a command-line to delete specific shares. (I guess using their verify-cap, and optionally their shnum(s) as the arguments?)

comment:6 Changed at 2013-07-12T14:15:19Z by Guido Witmond

  • Description modified (diff)

From a private mail message about my ventures into offering rented tahoe server nodes:

The automatic garbage collection of Tahoe gave me the biggest headache. It gives the customers a choice between wasting space (for money) and risking data loss when they don't have their repair and renew-lease process in order. It's inherent in the protocol, where the clients link the blocks together and the storage nodes decide when to delete.

To me, that makes the service rather brittle for long-term fire-and-forget backup (which was my main goal). I've toyed with ideas of creating a verifier service where client nodes store a list of CAPs that need to be kept. The verifier would regularly check the health and repair if necessary and keep a log of that for later inspection.

It would allow me to match the repair frequency to the garbage collection to prevent accidental data loss, making my service more robust than the competition ;-) It also spares a lot of traffic on 3G capped connections.
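
One pass of such a verifier could look roughly like the sketch below. It is only an illustration: `verify_once` is a hypothetical name, and the `check` and `repair` callables are placeholders for whatever health check and repair operation the service would actually invoke.

```python
def verify_once(caps, check, repair, log):
    """Check every cap once, repair the unhealthy ones, and log the outcome.

    caps: iterable of cap strings the customer wants kept.
    check: callable(cap) -> bool, True if the file is healthy (placeholder).
    repair: callable(cap) -> bool, True if the repair succeeded (placeholder).
    log: list that receives one dict per cap, for later inspection.
    """
    for cap in caps:
        healthy = check(cap)
        repaired = False
        if not healthy:
            repaired = repair(cap)
        log.append({"cap": cap, "healthy": healthy, "repaired": repaired})
    return log
```

Running this pass on a schedule shorter than the garbage-collection period is what would tie the repair frequency to garbage collection.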

The customer should be able to trust the protocol: when s/he switches off the client but keeps paying for the service, all files are still there, unconditionally.

comment:7 Changed at 2015-06-09T17:08:08Z by daira

  • Keywords gc added