id,summary,reporter,owner,description,type,status,priority,milestone,component,version,resolution,keywords,cc,launchpad_bug
119,lease expiration / deletion / garbage-collection,warner,,"I think the last Big Thing we need to develop (as opposed to implement or fix) is a structure to both maintain the long-term health of files and also ensure their eventual deletion. I think these need to be developed together, since they are closely related. Leases need to expire after a while (we're thinking of one month as a good timeout). Files that are supposed to stick around longer than this need to be kept alive either by the original uploader or by someone to whom they've delegated this task. If the original uploader expects to be around at least once a month, they can do it themselves, but for a backup application we can't impose this requirement. We refer to this task as ""refreshing"", and the provider of this service is either doing it out of the kindness of their heart (in the friend-net use case) or as part of a paid service (in the commercial-offering use case). The refreshing process will also perform ""file checking"", which is simply counting the number of shares that are available for any given file. This gives a rough measure of the ""health"" of the file. The process may also perform ""file verification"" from time to time, which is downloading the crypttext and checking its hash against the value in the URI extension block. If either the checking or the verification process discovers a problem, the ""file repairer"" may be triggered, which uses the remaining shares to reconstruct the correct crypttext, then re-encodes and re-uploads any shares which have been lost. These processes all serve to improve the health of the file, at various bandwidth/CPU costs: refreshing/checking is cheap, repair/re-upload is expensive. 
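
A minimal sketch of the check-then-repair decision described above, not a definitive design. Everything named here is hypothetical: the CheckResult type, the next_action function, and the repair_threshold parameter are illustrations only, and the sketch assumes a file is encoded into total_shares shares of which any needed_shares are enough to reconstruct the crypttext.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one cheap file check (hypothetical structure)."""
    found_shares: int   # shares the checker could locate on the grid
    total_shares: int   # shares created at upload time (assumed N)
    needed_shares: int  # shares required to reconstruct the file (assumed k)

    @property
    def health(self) -> float:
        # Rough health measure: fraction of the original shares still available.
        return self.found_shares / self.total_shares


def next_action(result: CheckResult, repair_threshold: int) -> str:
    """Pick the cheapest action that keeps the file alive.

    repair_threshold is an assumed tunable: the share count below which
    the expensive repairer is triggered.
    """
    if result.found_shares < result.needed_shares:
        # Too few shares remain, so the repairer cannot reconstruct the crypttext.
        return 'unrecoverable'
    if result.found_shares < repair_threshold:
        # Expensive path: reconstruct, re-encode, re-upload the lost shares.
        return 'repair'
    # Cheap path: just renew the leases on the shares that were found.
    return 'refresh'
```

With needed_shares=3, total_shares=10 and repair_threshold=7, a file with all 10 shares is only refreshed, one with 5 shares is repaired, and one with 2 shares is reported as unrecoverable. Raising repair_threshold trades more repair bandwidth for a lower chance of losing the file.
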
The intent is to use the refreshing service to keep the file as healthy as possible at low cost, and to use the checker results to trigger the more costly repair operations as rarely as possible. Refreshing must take place at least once a month to keep the leases alive. The required filecheck frequency will depend upon how quickly storage servers drop out of the grid: we expect file health to decay exponentially, so we must do checks frequently enough to reduce the chance that the health will decay beyond repair. The exact parameters will be tunable, of course, to pick a tradeoff between bandwidth consumed and the chance that a file will decay too quickly to be saved. Files that are deleted from a vdrive need to have their shares dereferenced in a timely fashion (I'm thinking by the end of the day for this). If the reference count drops to zero, the share should either be deleted immediately (for a storage server on the machine of a home user who wants their disk for other purposes), or be marked for deletion as soon as the storage is needed for something else (for a dedicated commercial server with nothing better to do with that disk space; there's a chance that someone will re-upload the file that was just deleted, and if the share is still around then we can avoid repeating the upload). Deleted files should also be removed from the filechecker and repair mechanisms. Note that files should be deleted promptly, rather than allowing their leases to expire on their own, to reduce the storage overhead (storage consumed beyond that required to hold the desired files). The lease expiration mechanism is a necessary fallback to keep storage usage from growing without bound, but without prompt deletion, high churn rates could cause actual storage consumed to grow larger than desired. Finally, many of our use cases will want to enforce a utilization quota on each user, limiting the amount of storage space they are allowed to consume. 
The file-repair service may be a good place to enforce this (with a rule saying that you can upload as much as you want, but the repair service won't help you exceed your quota). Eventually we may want each client to have membership credentials which would allow storage servers to measure how much space each client is consuming: with this, a daily (or slower) process could calculate how much global space is consumed by each client, and flag or revoke membership for clients which use more space than they've contracted for. ",task,closed,major,1.4.1,code-storage,0.7.0,fixed,,,