[tahoe-dev] automatic repair/renewal : where should it go?

Shawn Willden shawn at willden.org
Sat Aug 29 16:20:28 PDT 2009


Oops, forgot to address a couple of questions.

On Thursday 27 August 2009 02:18:52 am Brian Warner wrote:
> So.. does this seem reasonable? Can people imagine what the schema of
> this persistent store would look like? What sort of statistics or trends
> might we want to extract from this database, and how would that
> influence the data that we put into it? In allmydata.com's pre-Tahoe
> "MV" system, I really wanted to track some files (specifically excluded
> from repair) and graph how they degraded over time (to learn more about
> what the repair policy should be). It might be useful to get similar
> graphs out of this scheme. Should we / can we use this DB to track
> server availability too?

I think it would be very useful to track both share loss and server 
availability, for lots of reasons even beyond the needs of the repairer.

I think the repairer should also track its backlog, and display that through 
the web API (just a number, so no security concern), so that the user can see 
when the repair process is falling behind.
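
Something like the following is all I'm picturing for the backlog display; the
RepairQueue object and its pending() method are made-up names, and wiring this
into the real webapi would of course look different:

    from twisted.web import resource

    class BacklogPage(resource.Resource):
        """Expose the repair backlog as a single plain-text number."""
        isLeaf = True

        def __init__(self, repair_queue):
            resource.Resource.__init__(self)
            self._queue = repair_queue  # hypothetical: anything with a pending() method

        def render_GET(self, request):
            request.setHeader(b"content-type", b"text/plain")
            # Just a number, so there's nothing sensitive to protect here.
            return str(self._queue.pending()).encode("ascii")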

> How should the process be managed? Should there be a "pause" button? A
> "go faster" button? Where should bandwidth limits be imposed?

Hehe.  Here's my wish list:

I'd like to see a global Tahoe bandwidth limit.  I'd like to be able to 
specify up and down rate limits and ensure that all Tahoe operations fit 
within those limits.  I'd also like simple CLI and WebAPI controls to modify 
those limits, to allow the creation of tools that dynamically adjust the 
limits.  In the absence of user configuration of the limits, I'd like Tahoe 
to automatically determine the available bandwidth that can be used without 
impacting latency, and auto-set the limits just below that level.
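
To make that concrete, here's a rough sketch of the sort of token bucket the
global limit could be built on; the names are mine, and this isn't hooked into
the actual upload/download paths:

    import time

    class TokenBucket:
        def __init__(self, rate_bytes_per_sec, burst_bytes):
            self.rate = float(rate_bytes_per_sec)  # refill rate; adjustable at runtime
            self.capacity = float(burst_bytes)     # maximum burst size
            self.tokens = self.capacity
            self.last = time.monotonic()

        def set_rate(self, rate_bytes_per_sec):
            # The hook a CLI/WebAPI control (or an auto-tuner watching latency)
            # would call to change the limit on the fly.
            self.rate = float(rate_bytes_per_sec)

        def consume(self, nbytes):
            # Returns 0 if the send can proceed now, otherwise the number of
            # seconds to wait before trying again.
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return 0.0
            return (nbytes - self.tokens) / self.rate

One bucket for upstream and one for downstream, shared by every Tahoe
operation, would give the global limit; set_rate() is where the CLI/WebAPI
controls (or the auto-tuning logic) would plug in.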

I'd also like to be able to specify soft allocations of those limits: X% for 
the repairer, Y% for requests from other nodes, and the remainder for this 
node's work.  If any of those categories of work is using less than its 
current allocation, the other categories may use the unused portion.
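
In code, the borrowing rule could be as simple as a single-pass split like the
one below (category names and percentages are purely illustrative):

    def soft_allocate(total, shares, demand):
        """total: global rate limit (bytes/sec); shares: {category: fraction};
        demand: {category: bytes/sec currently wanted}."""
        alloc = {}
        spare = 0.0
        for cat, frac in shares.items():
            entitled = total * frac
            used = min(entitled, demand.get(cat, 0.0))
            alloc[cat] = used
            spare += entitled - used
        # Hand any unused entitlement to the categories that still want more,
        # in proportion to how much more they want.
        wanting = {c: demand[c] - alloc[c]
                   for c in shares if demand.get(c, 0.0) > alloc[c]}
        total_want = sum(wanting.values())
        if total_want > 0:
            for cat, want in wanting.items():
                alloc[cat] += min(want, spare * want / total_want)
        return alloc

    # e.g. with a 1 MB/s global limit and a 30/30/40 split, an idle repairer's
    # unused 200 kB/s gets shared between busy peer requests and local work:
    # soft_allocate(1000000, {"repairer": 0.3, "peers": 0.3, "local": 0.4},
    #               {"repairer": 100000, "peers": 500000, "local": 600000})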

Yeah, that's well beyond what you were asking.  If anyone is interested in 
doing something like that, though, I'm willing to write the bandwidth 
accounting code.  One of these days I'll invest the time to actually 
understand the upload/download code...

> Can we do 
> all of this through the webapi? How can we make that safe? (i.e. does
> the status page need to be on an unguessable URL? how about the control
> page and its POST buttons?). And what's the best way to manage a
> loop-avoiding depth-first directed graph traversal such that it can be
> interrupted and resumed with minimal loss of progress? (this might be a
> reason to store information about every node in the DB, and use that as
> a "been here already, move along" reminder).

It might make for a large DB, but information about every node could provide 
very useful statistics, as well as helping with restarts.  Ideally, it would 
also record share locations and a log of the repair work performed.  With 
that data and a bit of statistical analysis, a lot could be learned about how 
files degrade over time, and perhaps some useful predictions could be made as 
well.
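
To make the shape of that DB concrete, here is a rough sketch of the kind of
tables I have in mind (SQLite, with purely illustrative names, not a proposal
for the actual schema):

    import sqlite3

    db = sqlite3.connect("repairer.db")
    db.executescript("""
    CREATE TABLE IF NOT EXISTS nodes (      -- one row per file/dirnode ever visited
        storage_index TEXT PRIMARY KEY,
        last_checked  INTEGER,              -- unix time of the last check/verify
        healthy       INTEGER               -- 1 if enough shares were found
    );
    CREATE TABLE IF NOT EXISTS shares (     -- where shares were last seen
        storage_index TEXT,
        share_num     INTEGER,
        server_id     TEXT,
        last_seen     INTEGER,
        PRIMARY KEY (storage_index, share_num, server_id)
    );
    CREATE TABLE IF NOT EXISTS repairs (    -- log of repair work performed
        storage_index TEXT,
        started       INTEGER,
        finished      INTEGER,
        shares_before INTEGER,
        shares_after  INTEGER
    );
    CREATE TABLE IF NOT EXISTS server_availability (  -- per-server reachability samples
        server_id     TEXT,
        observed_at   INTEGER,
        reachable     INTEGER               -- 1 = responded, 0 = did not
    );
    """)
    db.commit()

The nodes table doubles as the "been here already, move along" marker for 
resuming an interrupted traversal, and the other tables are what the 
share-loss and server-availability graphs would be computed from.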

	Shawn.

