[tahoe-dev] automatic repair/renewal : where should it go?

Brian Warner warner at lothar.com
Sat Aug 29 12:36:17 PDT 2009


Shawn Willden wrote:
> 
> I'd like to confuse your question by adding more issues to consider
> ;-)

Excellent! :)

> The probability that a single file will survive isn't just a function
> of the encoding parameters and the failure modes of the servers
> holding its shares, it's also dependent on the availability of its
> cap.

Yes, that's an excellent point. The probability of recovering a file
that's 6 subdirectories deep is the product of the file's own recovery
probability and those of its six ancestor dirnodes, so there's a good
argument for making the parents stronger than the child nodes. I
believe I even filed a ticket years ago suggesting that dirnodes should
get a different (more conservative) set of encoding parameters than
regular immutable files; I think it was titled "just set k=1 for
mutable files?".
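
To make the chain effect concrete, here's a toy calculation in Python
(the probabilities are invented, purely for illustration):

    # Illustrative only: the recovery probability of a file behind a
    # chain of dirnodes is the product of every node's own probability.
    p_dir = 0.999    # hypothetical per-dirnode recovery probability
    p_file = 0.999   # hypothetical per-filenode recovery probability
    depth = 6

    p_total = (p_dir ** depth) * p_file
    print("P(recover) = %.6f" % p_total)  # ~0.993: each hop erodes it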

I sometimes try to draw an analogy with signal-processing terms, when
you're looking at how much noise is added by various parts of the
system. If the new component that you're adding causes an order of
magnitude less noise than any other component, you can effectively
ignore its contribution. Likewise, if dirnodes were 10x more reliable
than filenodes, you could chain several of them without significantly
reducing the probability of recovering the file. In most environments,
simply reducing "k" by a bit can achieve that 10x improvement, and
since dirnodes are (usually) much smaller than filenodes, doing so
doesn't cost much extra space.
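
For intuition, here's a quick sketch of that "reduce k" effect under
the usual binomial share-survival model (the per-server survival
probability is made up; the 10 total shares match Tahoe's default):

    from math import comb  # Python 3.8+

    def p_loss(k, n, p):
        """P(fewer than k of n shares survive), i.i.d. survival p."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k))

    p = 0.9  # hypothetical per-server share-survival probability
    for k in (3, 2, 1):
        print("k=%d of 10: P(loss) = %.1e" % (k, p_loss(k, 10, p)))

In this toy model, each step down in k cuts the loss probability by
well over 10x.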

Beyond simply encoding dirnodes differently, I can imagine two
straightforward ways to improve the situation. The first is that the
"repairdb" that I described earlier could have some columns to indicate
whether the object is a directory or a file, and how far away from the
root it was encountered (the latter could be fooled by non-tree-shaped
structures, but would probably be good enough). This information would
then factor into repair prioritization: senior dirnodes would get top
priority when allocating repair bandwidth (frequency of checking,
priority of repair relative to other victims, and repair threshold,
i.e. willingness to repair even minor injuries). The rootcap would be
the highest priority of all.
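
To make that concrete, here's a hypothetical sqlite sketch of the
extra columns and the prioritization query (the table and column names
are invented, not an existing Tahoe schema):

    import sqlite3

    db = sqlite3.connect("repairdb.sqlite")
    db.execute("""CREATE TABLE IF NOT EXISTS objects (
                      cap        TEXT PRIMARY KEY,
                      is_dirnode INTEGER, -- 1 for dirnodes, 0 for files
                      depth      INTEGER, -- hops from the rootcap
                      health     REAL     -- e.g. fraction of shares left
                  )""")

    # Senior dirnodes first: directories before files, shallow before
    # deep, and the sickest objects first within each group.
    victims = db.execute("""SELECT cap FROM objects WHERE health < 1.0
                            ORDER BY is_dirnode DESC, depth ASC,
                                     health ASC""").fetchall()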

The other is to use the repairdb as a secondary backup: a two-column
"child of" table, holding (parent, child) cap pairs, would be enough to
reassemble the shape and nodes of the graph completely (recording
childnames as a third column would also preserve the edge names, at the
expense of some space).
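
A minimal sketch of that table, reusing the invented repairdb schema
from above (the caps are placeholders):

    import sqlite3

    db = sqlite3.connect("repairdb.sqlite")
    db.execute("""CREATE TABLE IF NOT EXISTS children (
                      parent    TEXT, -- dircap of the parent
                      child     TEXT, -- filecap/dircap of the child
                      childname TEXT  -- optional: keeps the edge names
                  )""")
    db.execute("INSERT INTO children VALUES (?, ?, ?)",
               ("URI:DIR2:parent...", "URI:CHK:child...", "photos"))
    db.commit()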

Of course, we could always serialize this table and store *that* in the
grid as a single file, which would reduce the maximum-path-length
(product-of-probabilities) to two, assuming you had a good way to
remember the filecap of this snapshot. If the snapshot referenced only
the (immutable) files and their pathnames, we'd have the "virtual CD"
of ticket #204. If it instead recorded all the dircaps/filecaps and
their relationships, you'd have a snapshot of the tree at that point in
time, which you could navigate later and extract files (extracting
dirnodes would be dubious, because they may have been mutated in the
meantime).
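
Here's a sketch of the serialize-and-stash step, assuming a local node
with the webapi on its default port (the "children" table is the
invented schema from above; PUT /uri is the webapi's plain
upload-an-immutable-file endpoint):

    import json, sqlite3
    from urllib.request import Request, urlopen

    db = sqlite3.connect("repairdb.sqlite")
    edges = db.execute("SELECT parent, child, childname"
                       " FROM children").fetchall()
    snapshot = json.dumps(edges).encode("utf-8")

    # PUT /uri uploads an immutable file and returns its filecap.
    req = Request("http://127.0.0.1:3456/uri", data=snapshot,
                  method="PUT")
    filecap = urlopen(req).read().decode("ascii")
    # Keep 'filecap' somewhere safe: every object in the snapshot is
    # now at most two hops away (the snapshot, then the object).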

On the one hand, if we feel we need this "extra reliability" on top of
Tahoe, maybe it suggests that we should improve Tahoe to provide that
level of reliability natively. On the other hand, maybe data and
metadata are qualitatively different things that deserve different
treatment: the dirnode graph is usually much smaller than the data
itself, so if it's relatively cheap to retain a local copy, why not?

I guess there's a tradeoff between reliability and how much data you're
willing to retain. Recording just the rootcap means you depend upon
longer chains to retain access to everything else, but you've got a
smaller starting point to keep secure (and the rootcap is constant, so
it's easy to keep it safe: no distributed update protocol necessary).
Recording a whole snapshot shrinks those chains down to zero (but
obligates you to retain that table and keep it up-to-date). Writing a
snapshot into a mutable file and remembering the filecap makes it length
one. There's also a tradeoff between the effort required to walk the
tree and how much data you're willing to retain and keep up-to-date.

One thing I like about Tahoe's filecap/dircap design is that this
tradeoff is at least pretty easy to understand, and that it's easy to
imagine other ways to manage the filecaps. Having dircaps in the system
seemed (to me, at the time) like a necessary component, because Tahoe
should provide a reasonable native mechanism for such a common task,
and because it nicely meets our goal of simple fine-grained sharing
(whereas one big table would not make it easy to share an intermediate
node without sharing the whole thing). But, as GridBackup shows,
dirnodes are hardly the only approach. Indeed, since "tags" seem to be
all the rage these days, there's a good argument for some users to
ditch dirnodes entirely and just manage a big mutable database that
maps search terms to filecaps.
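
A toy sketch of what such a tag database might look like (entirely
hypothetical; the caps are placeholders):

    import json

    # Tags map straight to filecaps: no dirnode chain to traverse.
    tag_index = {
        "vacation-2009": ["URI:CHK:aaa...", "URI:CHK:bbb..."],
        "receipts":      ["URI:CHK:ccc..."],
    }
    # Serialize the whole index into one mutable file and remember its
    # writecap; a lookup is then one fetch plus the file itself.
    blob = json.dumps(tag_index).encode("utf-8")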

But yeah, the repair process should definitely be cognizant of the
relative importance of the things being repaired. That's part of why I
lean towards the table-based scheme (as opposed to the simple,
already-implemented deep-check-and-repair operation): it provides a
place where this kind of policy can be applied.

thanks,
 -Brian

