﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
119	lease expiration / deletion / garbage-collection	warner		"I think the last Big Thing we need to develop (as opposed to implement or
fix) is a structure to both maintain the long-term health of files and also
ensure their eventual deletion. I think these need to be developed together,
since they are closely related.

Leases need to expire after a while (we're thinking of one month as a good
timeout). Files that are supposed to stick around longer than this either
need to be kept alive by the original uploader or by someone to whom they've
delegated this task. If the original uploader expects to be around at least
once a month, they can do it themselves, but for a backup application we
can't impose this requirement. We refer to this task as ""refreshing"", and the
provider of this service is either doing it out of the kindness of their
heart (in the friend-net use case) or as part of a paid service (in the
commercial-offering use case).
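
As a rough sketch of the lease mechanics described above: each share carries one or more leases, a refresh pushes the expiry forward, and a share with no live lease becomes a deletion candidate. The names here (Lease, refresh, collect_garbage) and the 31-day constant are assumptions made for illustration, not an existing interface.

```python
from dataclasses import dataclass

# Assumed timeout: the ticket suggests roughly one month.
LEASE_DURATION = 31 * 24 * 60 * 60  # seconds


@dataclass
class Lease:
    # Hypothetical record; the field names are illustrative only.
    holder: str
    expires_at: float


def refresh(lease, now, duration=LEASE_DURATION):
    # Refreshing pushes the expiry forward from the current time.
    lease.expires_at = now + duration


def is_expired(lease, now):
    return now >= lease.expires_at


def collect_garbage(leases_by_share, now):
    # Drop expired leases in place and return the shares left with no lease.
    deletable = []
    for share_id, leases in leases_by_share.items():
        leases[:] = [lease for lease in leases if not is_expired(lease, now)]
        if not leases:
            deletable.append(share_id)
    return deletable
```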

The refreshing process will also perform ""file checking"", which is simply
counting the number of shares that are available for any given file. This
gives a rough measure of the ""health"" of the file. The process may also
perform ""file verification"" from time to time, which is downloading the
crypttext and checking its hash against the value in the URI extension block.
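
To make the difference between checking and verification concrete, here is a minimal sketch. It assumes each server can be modelled as a mapping from a storage index to the set of share numbers it holds, and it uses plain SHA-256 as a stand-in for whatever hash the URI extension block really records. Both function names are hypothetical.

```python
import hashlib


def check_file(servers, storage_index):
    # File checking: count the distinct shares reported across all servers.
    # Each server is modelled as a dict: storage index -> set of share numbers.
    found = set()
    for shares_by_index in servers:
        found |= shares_by_index.get(storage_index, set())
    return len(found)


def verify_file(crypttext, expected_hash):
    # File verification: hash the downloaded crypttext and compare it with
    # the recorded value. SHA-256 is an assumption made for this sketch.
    return hashlib.sha256(crypttext).hexdigest() == expected_hash
```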

If either the checking or the verification process discovers a problem, the ""file
repairer"" may be triggered, which uses the remaining shares to reconstruct
the correct crypttext, then re-encodes and re-uploads any shares which have
been lost.
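
One possible way to turn a checker result into a repair decision is sketched below. It assumes a k-of-N encoding where any needed shares out of total can rebuild the crypttext, plus a tunable repair_threshold. All three parameters and the function name are assumptions for illustration.

```python
def plan_repair(shares_found, needed, total, repair_threshold):
    # Returns (action, number of shares to re-upload).
    if shares_found < needed:
        # Too few shares remain to reconstruct the crypttext.
        return ('unrecoverable', 0)
    if shares_found <= repair_threshold:
        # Still recoverable, but health has dropped far enough to pay
        # for the expensive reconstruct, re-encode and re-upload cycle.
        return ('repair', total - shares_found)
    return ('healthy', 0)
```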

These processes all serve to improve the health of the file, at
various bandwidth/CPU costs: refreshing/checking is cheap, repair/re-upload
is expensive. The intent is to use the refreshing service to keep the file as
healthy as possible at low cost, and use the checker results to trigger more
costly repair operations as little as possible. Refreshing must take place at
least once a month to keep the leases alive. The required filecheck frequency
will depend upon how quickly storage servers drop out of the grid: we expect
that files will undergo an exponential decay curve, so we must do checks
frequently enough to reduce the chance that the health will decay beyond
repair. The exact parameters will be tunable, of course, to pick a tradeoff
between bandwidth consumed and the chance that a file will decay too quickly
to be saved.
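
The tradeoff can be estimated with a small calculation. This sketch assumes servers leave independently at a constant rate (which gives the exponential decay mentioned above) and a k-of-N encoding. It computes the chance that fewer than needed shares survive one check interval. The model and every name in it are assumptions, not measured behaviour.

```python
import math


def survival_probability(rate, interval):
    # Chance that a single server is still present after the interval,
    # assuming a constant departure rate (exponential decay).
    return math.exp(-rate * interval)


def prob_beyond_repair(needed, shares_now, rate, interval):
    # Chance that fewer than `needed` of the current shares survive until
    # the next check, i.e. that the file decays beyond repair.
    p = survival_probability(rate, interval)
    return sum(
        math.comb(shares_now, i) * p**i * (1 - p) ** (shares_now - i)
        for i in range(needed)
    )
```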

Files that are deleted from a vdrive need to have their shares dereferenced
in a timely fashion (I'm thinking by the end of the day for this). If the
reference count drops to zero, the share should be deleted immediately (for a
storage server on the machine of a home user who wants their disk for other
purposes), or marked for deletion as soon as the storage is needed for
something else (for a dedicated commercial server with nothing better to do
with that disk space; there's a chance that someone will re-upload the file
that was just deleted, and if the share is still around then we can avoid
repeating the upload). Deleted files should also be removed from the
filechecker and repair mechanisms.
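
The two deletion policies can be sketched with a small in-memory model. ShareStore and its methods are hypothetical names. The delete_immediately flag stands for the home-user case, and the reclaimable set stands for shares that a dedicated server keeps until the space is needed.

```python
class ShareStore:
    # Hypothetical in-memory model of one storage server.
    def __init__(self, delete_immediately):
        self.delete_immediately = delete_immediately
        self.refcounts = {}
        self.reclaimable = set()

    def add_reference(self, share_id):
        # A re-upload of a share that is still on disk simply revives it.
        self.reclaimable.discard(share_id)
        self.refcounts[share_id] = self.refcounts.get(share_id, 0) + 1

    def drop_reference(self, share_id):
        count = self.refcounts[share_id] - 1
        if count > 0:
            self.refcounts[share_id] = count
            return
        del self.refcounts[share_id]
        if not self.delete_immediately:
            # Dedicated server: keep the bytes until the space is needed.
            self.reclaimable.add(share_id)

    def has_share(self, share_id):
        return share_id in self.refcounts or share_id in self.reclaimable

    def reclaim_space(self):
        # Called when the disk is needed; frees all unreferenced shares.
        freed = sorted(self.reclaimable)
        self.reclaimable.clear()
        return freed
```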

Note that files should be deleted promptly, rather than allowing their leases
to expire on their own, to reduce the storage overhead (storage consumed
beyond that required to hold desired files). The lease expiration mechanism is a
necessary fallback to keep storage usage from growing without bound, but
without prompt deletion, high churn rates could cause actual storage consumed
to grow larger than desired.

Finally, many of our use cases will want to enforce a utilization quota on
each user, limiting the amount of storage space they are allowed to consume.
The file-repair service may be a good place to enforce this (with a rule
saying that you can upload as much as you want, but the repair service won't
help you exceed your quota). Eventually we may want each client to have
membership credentials which would allow storage servers to measure how much
space each client is consuming: with this, a daily (or slower) process could
calculate how much global space is consumed by each client, and flag or
revoke membership for clients which use more space than they've contracted
for.
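
A rough sketch of the accounting pass, assuming each storage server can report (client, bytes) pairs once membership credentials exist. The report format, the quota table, and the function names are all hypothetical.

```python
from collections import defaultdict


def usage_by_client(server_reports):
    # server_reports: one list of (client_id, size_in_bytes) pairs per server.
    totals = defaultdict(int)
    for report in server_reports:
        for client_id, size in report:
            totals[client_id] += size
    return dict(totals)


def over_quota(totals, quotas):
    # Clients to flag: those using more space than they contracted for.
    return sorted(c for c, used in totals.items() if used > quotas.get(c, 0))


def repair_allowed(client_id, totals, quotas, repair_bytes):
    # The repair service declines work that would push a client past quota.
    return totals.get(client_id, 0) + repair_bytes <= quotas.get(client_id, 0)
```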



"	task	closed	major	1.4.1	code-storage	0.7.0	fixed			
