[tahoe-dev] a few thoughts about the future of leasedb
David-Sarah Hopwood
david-sarah at jacaranda.org
Wed Nov 21 17:42:48 UTC 2012
On 21/11/12 16:36, Zooko Wilcox-O'Hearn wrote:
> This is exciting, because it is the next step in LeastAuthority.com's
> project that we're doing for DARPA — Redundant Array of Independent
> Clouds.
Yes, I'm excited :-)
> Here are a few comments on the leasedb design — not issues which could
> block the acceptance of this patch, but just topics for future
> reference:
>
> • https://github.com/davidsarah/tahoe-lafs/blob/1818-leasedb/docs/proposed/leasedb.rst#design-constraints
>
> "Writing to the persistent store objects is in general not an
> atomic operation. So the leasedb also keeps track of which shares are
> in an inconsistent state because they have been partly written. (This
> may change in future when we implement a protocol to improve atomicity
> of updates to mutable shares.)"
>
> I'm not 100% sure, but I *think* that this use of leasedb could be
> replaced in the future by the end-to-end 2-phase-commit that I
> recently posted about (#1755). End-to-end 2-phase-commit requires more
> complex service from the storage server than the current one-shot
> updates to mutable files do, but it requires less state to be stored
> in the leasedb since the equivalent state is now stored in the storage
> backend plus the LAFS client.
Remember that those share states are needed anyway to avoid race conditions
between adding and removing shares. No additional states are needed just to
support marking of potentially inconsistent shares.
Also, clients will need to support non-leasedb servers for a while. (I'm
looking forward to the point where they can drop that support, since it
will allow deleting the rest of the code that implemented renewal secrets.)
> In E2E 2PC, the storage backend has to
> be able to receive and store updates to a mutable file (including the
> initial upload of a large mutable file, which is the same as a large
> update to an initially empty mutable file), while retaining the option
> of rolling back to the previous version. This means the storage server
> has to write these updates into the storage backend in some
> non-destructive way and then have a relatively efficient way to
> "switch over" from the old to the new version.
>
> If the storage server is able to do that, then it might be nice if
> it can do it without relying on state held in the leasedb, because
> then loss or corruption of the leasedb won't result in the corruption
> of any files.
There's no *persistent* state needed to do that, since if a server
crashes, its foolscap connections will be dropped and the client will
interpret that as a transaction abort (either immediately or after a
timeout, depending on how clean the crash is). We're assuming that the
leasedb stays consistent while its server is running.
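As an illustration of why no persistent state is needed, here is a minimal sketch of a non-destructive update with an explicit "switch over". It assumes the pending version lives only in server memory (a real backend would stage large updates in backend storage), and the class and method names are hypothetical, not Tahoe-LAFS's storage-server API. A crash or dropped connection simply discards `_staged`, which is the same outcome as an abort.

```python
class StagedMutableShare:
    """Hypothetical mutable share with stage / commit / abort."""

    def __init__(self, initial: bytes = b""):
        self.current = initial   # the version readers see
        self._staged = None      # pending version, held in memory only

    def begin(self) -> None:
        # Work on a copy so partial writes never touch the visible version.
        self._staged = bytearray(self.current)

    def write(self, offset: int, data: bytes) -> None:
        if self._staged is None:
            raise RuntimeError("no transaction in progress")
        end = offset + len(data)
        if end > len(self._staged):
            # Zero-fill any gap between the old end and the new write.
            self._staged.extend(b"\x00" * (end - len(self._staged)))
        self._staged[offset:end] = data

    def commit(self) -> None:
        # The "switch over" is a single reference swap.
        if self._staged is None:
            raise RuntimeError("no transaction in progress")
        self.current = bytes(self._staged)
        self._staged = None

    def abort(self) -> None:
        # Explicit abort, or the client's connection dropped.
        self._staged = None
```

For example, `begin()`, `write(0, b"new data")`, `abort()` leaves `current` unchanged, while the same sequence ending in `commit()` makes the new bytes visible at once.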
> • https://github.com/davidsarah/tahoe-lafs/blob/1818-leasedb/docs/proposed/leasedb.rst#accounting-crawler
>
> "A 'crawler' is a long-running process that visits share container
> files at a slow rate, so as not to overload the server by trying to
> visit all share container files one after another immediately."
>
> Since I opened the following group of tickets, I've become happier
> with the idea of removing almost all uses of "crawler", leaving as its
> only remaining use the generation of the initial leasedb or the
> reconstruction of the leasedb in case it has been lost or corrupted.
> I'm waiting for Brian to notice these tickets and weigh in: #1833,
> #1834, #1835, #1836.
+1
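Under that proposal, the only crawler left would be a slow pass that generates or rebuilds the leasedb. A minimal sketch of such a pass follows. The dict standing in for the leasedb and the `per_slice`/`pause_seconds` knobs are assumptions. It treats every listed share as complete, which a real pass would have to verify, and it uses a blocking sleep for simplicity where a real server would schedule the pauses on its event loop.

```python
import time
from typing import Callable, Dict, Iterable


def rebuild_leasedb(
    share_ids: Iterable[str],
    leasedb: Dict[str, str],
    per_slice: int = 100,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Visit shares slowly, adding a STATE_STABLE entry for each unknown one.

    Hypothetical sketch; returns the number of entries added.
    """
    added = 0
    for i, share_id in enumerate(share_ids, start=1):
        if share_id not in leasedb:
            leasedb[share_id] = "STATE_STABLE"
            added += 1
        if i % per_slice == 0:
            sleep(pause_seconds)  # leave the disk to normal traffic for a while
    return added
```

The injectable `sleep` parameter is only there so the pacing can be observed without actually waiting.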
> This would change the leasedb design state machine by changing two triggers:
>
> - STATE_STABLE → NONE; trigger: The accounting crawler noticed that
> all the store objects for this share are gone. implementation: Remove
> the entry in the leasedb.
>
> This edge would still be here, but the trigger would be
> different. There would be no crawler noticing such things, but this
> edge would be triggered when a client requests a share, the storage
> server looks in the leasedb and sees that the share is listed as
> present, but then when it tries to read the share data it finds out
> that all of the share data is gone.
+1
> - NONE → STATE_STABLE; trigger: The accounting crawler discovers a
> complete share. implementation: Add an entry to the leasedb with
> STATE_STABLE.
>
> Likewise, this edge would still be here, but the trigger would be
> different. There would be no crawler noticing such things, but this
> edge would be manually triggered by the server operator using an
> "import" tool (probably option 4 from #1835).
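The two re-triggered edges can be written as a small transition table. This is a hypothetical sketch: the trigger names are invented, the leasedb is a dict, and the other states of the leasedb state machine are left out.

```python
from enum import Enum


class ShareState(Enum):
    NONE = "NONE"                  # no entry in the leasedb
    STATE_STABLE = "STATE_STABLE"


# Hypothetical names for the two new triggers.
READ_FOUND_ALL_DATA_GONE = "read_found_all_data_gone"  # seen while serving a read
OPERATOR_IMPORT = "operator_import"                    # run by the server operator

TRANSITIONS = {
    (ShareState.STATE_STABLE, READ_FOUND_ALL_DATA_GONE): ShareState.NONE,
    (ShareState.NONE, OPERATOR_IMPORT): ShareState.STATE_STABLE,
}


def apply_trigger(leasedb: dict, share_id: str, trigger: str) -> ShareState:
    current = leasedb.get(share_id, ShareState.NONE)
    try:
        new = TRANSITIONS[(current, trigger)]
    except KeyError:
        raise ValueError(f"no edge from {current.name} on {trigger!r}") from None
    if new is ShareState.NONE:
        leasedb.pop(share_id, None)   # implementation: remove the entry
    else:
        leasedb[share_id] = new       # implementation: add entry with STATE_STABLE
    return new
```

Keeping the edges in a table makes it explicit that only the triggers change under this proposal; the implementations of the edges stay the same.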
I'm still quite keen on my suggested variation of option 3 on #1835
(let's call it 3a):
# If [a share that has been added directly to backend storage] is ever
# requested, the server could then notice that it exists and add it to
# the leasedb. In that case, doing a filecheck on that file would be
# sufficient.
I think you didn't want to do that because you thought there would be
a performance advantage in treating the leasedb as authoritative. But
checking whether a share is on disk when it isn't in the leasedb is
needed only in an uncommon case, and does not affect performance in the
common case. (It shouldn't matter if servers take longer to report that
they *don't* have a share, because a downloader should use the first k
servers that respond with a share. Actually, I think the current
downloader might be waiting longer than that, but if so, that is easy to
fix.)
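The lookup order of 3a can be sketched as follows, assuming a dict for the leasedb and a hypothetical `backend_list` callable that returns the store objects found for a share. The backend is only queried on a leasedb miss, so the common "share is present" path costs nothing extra. Like 3a itself, this treats "objects are listed" as "share is present", which is exactly what goes wrong for a share with a corrupted header.

```python
from typing import Callable, Dict, Set


def has_share(
    share_id: str,
    leasedb: Dict[str, str],
    backend_list: Callable[[str], Set[str]],
) -> bool:
    """Option 3a sketch: consult the leasedb first, the backend only on a miss."""
    if share_id in leasedb:
        return True                        # common case: no backend round trip
    objects = backend_list(share_id)       # uncommon case: maybe added directly to the backend
    if objects:
        leasedb[share_id] = "STATE_STABLE" # adopt it so later lookups are fast
        return True
    return False                           # slower "no", which a downloader can ignore
```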
> • https://github.com/davidsarah/tahoe-lafs/blob/1818-leasedb/docs/proposed/leasedb.rst#unresolved-design-issues
>
> "What happens if a write to store objects for a new share fails permanently?"
>
> I don't understand. If an attempt to write fails, how can you
> distinguish between a temporary and permanent failure?
A permanent failure is a failure after any retries.
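In other words, "permanent" is defined operationally: the write is retried, and the failure only becomes permanent once the retries are exhausted. A minimal sketch, assuming three attempts with exponential backoff and treating any `OSError` as possibly temporary; none of these parameters come from the leasedb design.

```python
import time
from typing import Callable


class PermanentWriteFailure(Exception):
    """Raised once every retry of a store-object write has failed."""


def write_with_retries(
    write: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    last_error = None
    for attempt in range(attempts):
        try:
            write()
            return
        except OSError as e:  # assume I/O errors might be temporary
            last_error = e
            if attempt + 1 < attempts:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise PermanentWriteFailure(f"gave up after {attempts} attempts") from last_error
```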
> "What happens if only some store objects for a share disappear unexpectedly?"
>
> Log it, remove the share entry from the leasedb, and leave what's
> left of the share data alone? Because perhaps operators or developers
> want to investigate the exact shape of the lossage/corruption.
That seems reasonable. Note that if the share is re-stored, it will
overwrite (at least some of) those store objects.
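The suggested handling (log it, drop the leasedb entry, leave the remaining data alone) might look like this on the read path. This is a hypothetical sketch: the function name, the return values, and the representation of store objects as sets of names are all invented for illustration.

```python
import logging
from typing import Dict, Set

log = logging.getLogger("leasedb-sketch")


def handle_missing_objects(
    share_id: str,
    expected: Set[str],
    present: Set[str],
    leasedb: Dict[str, str],
) -> str:
    """Return "ok", "all_gone" or "partial".

    Leftover store objects are never deleted here, so they remain
    available for operators or developers to investigate.
    """
    missing = expected - present
    if not missing:
        return "ok"
    leasedb.pop(share_id, None)  # stop advertising the share
    if missing == expected:
        log.info("share %s: all store objects gone", share_id)
        return "all_gone"
    log.warning("share %s: missing %s, keeping %s for inspection",
                share_id, sorted(missing), sorted(present & expected))
    return "partial"
```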
> "Does the leasedb need to track corrupted shares?"
>
> This is the same question as the previous one — a corrupted share
> is the same as a share with some of its objects missing.
If we do 3a) and a share has a corrupted header, then each time the share
is requested, the server will report that it *does* have the share (because
its objects were listed by the backend query), and then it will fail to
provide it (once it sees that the header is corrupted). That's why I
distinguished the two cases.
--
David-Sarah Hopwood ⚥