[tahoe-dev] detecting weak uploads and share rebalancing Re: Share rebalancing

Brian Warner warner at lothar.com
Mon Nov 2 00:55:43 PST 2009


Zooko Wilcox-O'Hearn wrote:
> 
> 1.  I wrote a simulation which convinced me that this is wrong --  
> that both share placement algorithms have an indistinguishable (and  
> highly varying) pattern of servers filling up.  However, the results  
> that I posted to tahoe-dev were confusing and hard to follow, and you  
> seem to have ignored them.  I see that I didn't link them in from  
> #302 either.  I should go find that letter in tahoe-dev archives and  
> link to it from #302.  Here it is:
> http://allmydata.org/pipermail/tahoe-dev/2008-July/000676.html .

What I remember about your simulation was thinking that it was
confusing, hard to follow, and smelled wrong and invalid, like it was
averaging certain values and therefore eliminating the results that we
were most interested in, or that it was trying to determine results that
we weren't actually interested in at all.

When I'm done with DIR-IMM, I'll try to get some time to read over your
earlier message and either analyze your simulator or write my own.

The properties we probably care about:

 * servers filling up at equal rates (bytes per second)
 * servers filling up at equal rates (percentage-full per second)
 * "good" behavior in the face of heterogeneous server storage capacity
 * "good" behavior in the face of heterogeneous server bandwidth

These are likely to be at odds with one another, of course.
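One way to put numbers on the first property is a toy simulation like the one below. It is a sketch, not Zooko's simulator and not Tahoe code. It compares the natural Chord ordering (walk clockwise from the storage index around one shared ring of server IDs) with per-file permutation (sort the servers by a hash of storage index plus server ID). It assumes random 20-byte server IDs and 16-byte storage indexes, same-sized files, and n shares per file on n distinct servers, and it uses SHA-256 as a stand-in hash. It reports how many shares each server ends up holding under each scheme.

```python
import hashlib
import random
from bisect import bisect_left


def chord_placement(storage_index, servers, n):
    """Natural Chord-style ordering: one shared ring sorted by server ID;
    walk clockwise from the storage index and take the next n servers."""
    ring = sorted(servers)
    start = bisect_left(ring, storage_index)
    return [ring[(start + i) % len(ring)] for i in range(n)]


def permuted_placement(storage_index, servers, n):
    """Permute-per-fileid: each file gets its own server ordering, keyed on
    SHA-256(storage_index + server_id) (a stand-in hash); take the first n."""
    return sorted(servers, key=lambda s: hashlib.sha256(storage_index + s).digest())[:n]


def share_counts(placement, num_servers=20, num_files=5000, n=10, seed=0):
    """Place n shares for each of num_files random files and return the
    number of shares each server received, sorted ascending."""
    rng = random.Random(seed)  # randbytes() needs Python 3.9+
    servers = [rng.randbytes(20) for _ in range(num_servers)]
    counts = dict.fromkeys(servers, 0)
    for _ in range(num_files):
        storage_index = rng.randbytes(16)
        for server in placement(storage_index, servers, n):
            counts[server] += 1
    return sorted(counts.values())


if __name__ == "__main__":
    for name, fn in [("chord", chord_placement), ("permuted", permuted_placement)]:
        counts = share_counts(fn)
        print(name, "min", counts[0], "max", counts[-1])
```

Because every file is the same size, share counts stand in for bytes stored. Measuring percentage-full instead would need per-server capacities, which this sketch deliberately leaves out.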

> 2.  Neat!  I hadn't thought of this malicious case before.  Perhaps  
> you could add a link from #302 to your letter about the malicious case.

Done.

> However, Chord and Kademlia have been deployed with success, sometimes
> on a massive scale -- e.g. Cassandra DB [1] and Vuze [2] -- where
> load-balancing is also an issue. This suggests that either this
> phenomenon isn't a problem in many situations in practice (which would
> be consistent with my simulation -- argument 1) or that the designers
> of Cassandra DB and Vuze ought to think about adopting the
> permute-per-fileid trick (or both).

It would be awfully useful to find out what their server-capacity models
are. Do they expect to see new servers added over time? Removed?
Heterogeneous or homogeneous storage capacities? The
one-grid-to-rule-them-all sorts of networks (freenet, bittorrent's DHT)
generally expect lots of churn and variable capacity, but I get the
impression that Cassandra-type networks are being built in data centers
by professional sysadmins with carefully-chosen hardware bought dozens
or hundreds at a time.

I believe the properties we care about will depend a lot upon the sort
of grid and servers involved, but I can't currently guess which way the
dependency leans.

> In fact, Cassandra's unique appeal among "post-relational" (a.k.a.
> "nosql") databases is that it supports range queries, and the way it
> does so relies upon the "natural" chord ordering.

I don't yet understand how this is useful in practice. You only get to
query along a single range value, and the "storage index" (or
equivalent) must be exactly equal to that value, right? So, if e.g.
you're storing web server log events in your Cassandra database, and you
pick event timestamp as the axis along which you want to do range
queries, then you use the timestamp as your event ID. But then your data
gets jammed into this narrow little piece of the ring, doesn't it?

Who is actually using Cassandra in this mode?
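The jamming concern can be made concrete with a small sketch. It assumes a hypothetical 16-server ring with evenly spaced 32-bit tokens, one day of log events keyed by Unix timestamp (one event every 10 seconds), and Chord-style successor ownership. None of these parameters come from Cassandra's actual configuration, and MD5 is just a convenient hash. With these numbers, using the timestamp directly as the ring position (the thing that makes range queries cheap) sends every event to one server, while hashing the key first spreads them across all 16.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2**32
NUM_SERVERS = 16
# Hypothetical cluster: 16 servers with evenly spaced ring tokens.
TOKENS = [i * (RING_SIZE // NUM_SERVERS) for i in range(NUM_SERVERS)]


def owner(position):
    """Chord-style successor: the first token at or after position, wrapping."""
    i = bisect_left(TOKENS, position)
    return TOKENS[i % len(TOKENS)]


def order_preserving(timestamp):
    # The key itself is the ring position, so a time range stays contiguous.
    return timestamp % RING_SIZE


def hashed(timestamp):
    # Hash the key first: spreads the load, but destroys key order.
    digest = hashlib.md5(str(timestamp).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def owners(position_fn, timestamps):
    """Count how many events each server token ends up owning."""
    return Counter(owner(position_fn(ts)) for ts in timestamps)


START = 1257120000  # 2009-11-02 00:00 UTC
EVENTS = range(START, START + 86400, 10)  # one event every 10 s for a day

if __name__ == "__main__":
    print("order-preserving:", len(owners(order_preserving, EVENTS)), "server(s)")
    print("hashed:", len(owners(hashed, EVENTS)), "server(s)")
```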

> [1] http://wiki.apache.org/cassandra/PoweredBy # says Cassandra is  
> used for inbox search at Facebook, which is up to 40 TB of data  
> across 120 machines in two separate data centers

Hrm, would "username" be a useful range query? Seems unlikely... "Show
me all updates for users from AA through AF"?

> 4.  This thread started because Shawn Willden needed to do some  
> mucking about with his shares, and the permute-per-fileid feature  
> makes it harder for him to muck his shares.

As Shawn pointed out:

SW> I think a tool for easily discovering the permuted list for a given
SW> file and the current grid would solve my issue.

I think that'd be a great tool, and would be nearly trivial to build.
The biggest complexity would be how to expose the necessary information
through the webapi, probably by adding a JSON form of the welcome page,
and then writing a CLI tool which grabbed the serverids from that page
and computed the permuted hash itself. Another option would be a form of
the "More Info" page which would tell you where the shares "want" to
live.
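A rough sketch of the CLI half of such a tool follows. It assumes the permuted order is "sort servers by SHA-1(storage_index + server_id)". That matches one recollection of how Tahoe's peer selection orders servers, but treat it as an assumption and check the source before relying on it. The JSON welcome page does not exist yet, so the `servers`/`nodeid` field names and the hex encoding are made up for illustration. The helpers `permuted_server_list`, `server_ids_from_welcome_json`, and `preferred_homes` are hypothetical names.

```python
import hashlib
import json


def permuted_server_list(storage_index, server_ids):
    """Return server_ids in the order a file with this storage index would
    try them, assuming the key is SHA-1(storage_index + server_id)."""
    return sorted(server_ids, key=lambda sid: hashlib.sha1(storage_index + sid).digest())


def server_ids_from_welcome_json(text):
    """Parse a hypothetical JSON welcome page of the form
    {"servers": [{"nodeid": "<hex>"}, ...]} into raw server IDs."""
    return [bytes.fromhex(entry["nodeid"]) for entry in json.loads(text)["servers"]]


def preferred_homes(storage_index, server_ids, total_shares):
    """Pair share numbers with the first servers in the permuted list, i.e.
    where shares would 'want' to live. This simplification ignores full
    servers and grids with fewer servers than shares."""
    ranked = permuted_server_list(storage_index, server_ids)
    return list(zip(range(total_shares), ranked))


if __name__ == "__main__":
    page = '{"servers": [{"nodeid": "aa"}, {"nodeid": "bb"}, {"nodeid": "cc"}]}'
    ids = server_ids_from_welcome_json(page)
    for sharenum, sid in preferred_homes(b"\x00" * 16, ids, 3):
        print(sharenum, sid.hex())
```

Computing the permutation client-side keeps the webapi change small: the node only has to publish its server list, and the hashing stays in the tool.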


cheers,
 -Brian

