[tahoe-dev] Automatic rebalancing

Terrell Russell terrellrussell at gmail.com
Sun Dec 5 14:05:22 UTC 2010


CRUSH seems to require knowledge of a hierarchy of devices (locations,
racks, etc.).  I'm not sure this knowledge is available to Tahoe, and
whatever knowledge we do have should be assumed to be flat.

http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
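For contrast with CRUSH's hierarchy, here is a minimal sketch of placement that needs only a flat list of server IDs. It uses rendezvous (highest-random-weight) hashing, a different and much simpler technique than CRUSH, and it is not Tahoe's actual server-selection code. Any client that knows the same server list computes the same placement for a file without querying the servers, which is the property Ravi describes in the quoted message. The function name `placement` and the choice of SHA-256 over storage index plus server ID are illustrative assumptions.

```python
import hashlib


def placement(storage_index: bytes, server_ids: list[bytes], n: int) -> list[bytes]:
    """Pick n servers for a file using only a flat list of server IDs.

    Hypothetical helper: each server is ranked by SHA-256(storage_index + server_id)
    and the n lowest-ranked servers are chosen. No rack/location hierarchy is used.
    """
    ranked = sorted(
        server_ids,
        key=lambda sid: hashlib.sha256(storage_index + sid).digest(),
    )
    return ranked[:n]


servers = [b"server-%d" % i for i in range(10)]
print(placement(b"example-storage-index", servers, 3))
```

Because each server's rank depends only on the file and that server, adding a new server never reorders the existing ones. Shares can only move onto the newcomer, which keeps the amount of rebalancing small.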

Terrell


On 12/5/10 4:42 AM, Ravi Pinjala wrote:
> As far as description languages for data allocation go, Ceph has
> already solved this problem - check out the "CRUSH" algorithm.
> Basically, it's a description language for data placement that
> controls replication and data placement, and I think it also lets
> clients figure out which servers a piece of data is on without
> querying them first. IIRC, the code for it is in a separate library
> from the rest of Ceph, so it might be feasible to just put a thin
> python wrapper around it and use it.
> 
> On Sun, Dec 5, 2010 at 1:28 AM, Shu Lin <linshu at gmail.com> wrote:
>> Hi,
>> As the answer in this discussion says, Tahoe doesn't currently have an
>> automatic rebalancing capability.
>> http://tahoe-lafs.org/pipermail/tahoe-dev/2010-December/005697.html
>> We also already have a bunch of tickets tracking this problem, such as the
>> rebalancing manager:
>> http://tahoe-lafs.org/trac/tahoe-lafs/ticket/543
>> I think that besides a rebalancing manager that rebalances all files in
>> bulk after a new server is added, Tahoe could also start rebalancing a
>> particular file when a client tries to access it. That is better than
>> asking a human to start a repair manually. The person accessing the file
>> definitely cares about it, wanting it either more widely distributed or
>> faster to access. So the algorithm could be: if no share is on a server
>> "close" to the client yet, put one there. (How to define "close" is
>> another story: it could mean the client and the server run on the same
>> node, or that they are on the same subnet.) This way, shares are
>> rebalanced automatically according to users' intentions, without
>> sacrificing too many resources in a short burst.
>> As for mapping users' intentions to a server-selection algorithm, again, I
>> like Zooko's idea: it should be a framework with a descriptive language for
>> specifying them.
>> Just a little thought. Hope it can fit into your design. :-)
>> Thanks,
>> -Shu
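The access-triggered idea in Shu's message can be sketched as follows, assuming "close" means the same host or the same /24 subnet (one of the options Shu mentions; the prefix length is an assumption). `Server`, `is_close`, and `pick_close_server` are hypothetical names, not Tahoe APIs, and the sketch only chooses a target server. Actually creating the share there (e.g. via a repair) is left out.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical server record; Tahoe's real storage-server objects differ.
    server_id: str
    address: str  # IP address of the storage server


def is_close(client_addr: str, server_addr: str, prefix_len: int = 24) -> bool:
    """Assumed meaning of "close": same host, or same subnet of prefix_len bits."""
    client = ipaddress.ip_address(client_addr)
    server = ipaddress.ip_address(server_addr)
    if client == server:
        return True
    if client.version != server.version:
        return False
    net = ipaddress.ip_network(f"{client}/{prefix_len}", strict=False)
    return server in net


def pick_close_server(client_addr, share_holders, candidates, prefix_len=24):
    """Return a nearby server that should receive a new share, or None."""
    if any(is_close(client_addr, s.address, prefix_len) for s in share_holders):
        return None  # a nearby share already exists; nothing to rebalance
    for s in candidates:
        if s not in share_holders and is_close(client_addr, s.address, prefix_len):
            return s  # first nearby server that lacks a share
    return None  # no nearby server available
```

Because the work happens only when someone actually reads a file, the cost is spread over time instead of being paid in one burst, which is the trade-off described in the message.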

