[tahoe-dev] Tahoe-LAFS v1.8 planning / Administrivia / Big Picture
Ravi Pinjala
ravi at p-static.net
Thu Aug 5 06:29:16 UTC 2010
On 08/03/10 22:23, Zooko O'Whielacronx wrote:
> On Tue, Aug 3, 2010 at 12:22 PM, Francois Deppierraz
> <francois at ctrlaltdel.ch> wrote:
>>
>> I'd really love to see location/rack/server-awareness in the peer-selection
>> process.
>
> Thank you for the feedback! I think a lot of people strongly want this
> feature. This is described on [wiki:ServerSelection].
>
> What's the next step? I still don't know exactly what the UX would be
> for this feature. Would you have a flat file containing a list of
> serverids followed by categories like this:
>
> # serverid category:id, category:id, category:id, ...
> alt6cjddwfnwrnct4lx2ypwricrgtoam colo:us-west-1a, rack:5, chassis:3
> cufg4m4c7bfujnf5tkhjdazicn7ifkae colo:us-west-1b, rack:1, chassis:1
> e5itfysbe3qeqgzflxdnm6ypraufj6vj colo:singapore-1, rack:1, chassis:1
> fp3xjndgjt2npubdl2jqqb26clanyag7 colo:singapore-1, rack:1, chassis:2
>
> and would the server selection algorithm automatically use the
> following as its highest-priority requirement: "spread the shares as
> evenly as possible among the different values of each category"? And
> if there were more than one category, would it treat each successive
> one as the next-highest priority after the previous priorities were
> satisfied?
>
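Just to make the flat-file idea concrete, here's a rough sketch of how
the parsing plus "spread evenly, one category at a time" placement
could work. This is purely illustrative Python, not anything like
Tahoe's actual peer-selection code, and all the names are made up:

# Hypothetical sketch: greedily place each share on the server whose
# category values (colo first, then rack, then chassis) currently hold
# the fewest shares.
from collections import defaultdict

def parse_server_categories(text):
    # Lines look like:
    #   alt6cjddwfnwrnct4lx2ypwricrgtoam colo:us-west-1a, rack:5, chassis:3
    servers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        serverid, rest = line.split(None, 1)
        servers[serverid] = dict(pair.strip().split(':', 1)
                                 for pair in rest.split(','))
    return servers

def place_shares(servers, num_shares, priorities=('colo', 'rack', 'chassis')):
    load = defaultdict(int)        # (category, value) -> shares placed
    per_server = defaultdict(int)  # serverid -> shares placed
    placements = []
    for _ in range(num_shares):
        # Pick the server whose colo has the fewest shares so far;
        # break ties by rack, then chassis, then per-server count.
        def crowding(item):
            serverid, cats = item
            return ([load[(c, cats[c])] for c in priorities if c in cats],
                    per_server[serverid])
        serverid, cats = min(servers.items(), key=crowding)
        placements.append(serverid)
        per_server[serverid] += 1
        for c in priorities:
            if c in cats:
                load[(c, cats[c])] += 1
    return placements

With the four example servers above, place_shares(servers, 10) spreads
the ten shares roughly evenly (4/3/3) across the three colo values
before it worries about racks or chassis.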
>> One of my current deployments is a grid of 3 servers, each with 24
>> SATA disks: one Tahoe-LAFS storage node per disk, with each server
>> located in a different datacenter.
>
> Sweet! How is it working? Could you give us some problem reports,
> success reports, benchmarks? :-)
>
>> With the default 3-of-10 encoding on such a setup, I currently have no
>> way to ensure that two servers can fail without any impact on file
>> availability.
>
> Until we fix ServerSelection to do what you want, you could try to
> accomplish it by changing your parameters. You have 72 storage
> servers. If you set M=72 and K=22 then you'll have approximately the
> same redundancy as M=10 and K=3. Then a normal upload would put one
> share on each of the servers, and you could lose any two of those
> 24-drive servers and still keep the file.
>
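(For what it's worth, the expansion factors do match up: 72/22 is about
3.27x, versus about 3.33x for the default 10/3.)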
> Now what should you set Servers Of Happiness, H, to? If you set H to
> 70 then uploads will abort if fewer than 70 servers are available at
> that time. If 70 separate servers hold shares, then even if two of
> your 24-node machines (48 storage nodes) are gone, at least 22
> share-holding servers remain, which is just enough to reconstruct. :-)
>
> I've never seen real production use of K and M values that large.
> Everyone always uses the defaults, M=10 and K=3. I did actually set K=15
> and M=30-something for a while on a test grid that had 15 live
> servers. It worked okay. If you try setting something like M=72, K=22,
> H=70 then please do run measurements of performance and do please
> report your results to this list! :-)
>
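For anyone who wants to try this, I believe it translates to the
following in the [client] section of tahoe.cfg on the node doing the
uploads (untested at parameters this large, so please double-check):

[client]
shares.needed = 22
shares.happy = 70
shares.total = 72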
> Regards,
>
> Zooko
>
> http://tahoe-lafs.org/trac/tahoe-lafs/wiki/ServerSelection
>
In this vein, I've actually been thinking about a similar feature for
about a week now.
The current server selection algorithm assumes failures are randomly
distributed; it has this in common with most replicating distributed
storage systems, I think. The problem is that there are some cases where
large portions of the grid could go offline, either temporarily or
permanently - Francois mentioned having a server with a lot of disks go
down, for instance. Another problematic case is a home user whose
internet connection goes down, leaving them without access to their files.
I think it'd be neat if we had the ability to define "availability
zones", where you could specify the replication ratio per-zone instead
of globally. For instance, if you have one zone with the default 3-of-10
share ratio, and another zone with a 2-of-5 ratio, Tahoe would upload 10
shares to the first zone and 5 shares to the second, and you'd have a
guarantee that if either zone went down completely, the data would still
be accessible.
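One way to get that guarantee with a single global encoding (since k
can't actually vary per zone, the per-zone "ratio" really just controls
how many shares land in each zone) would be k=3, N=15, with placement
constrained to put 10 shares in the first zone and 5 in the second:
lose the first zone and 5 >= 3 shares survive, lose the second and
10 >= 3 survive, so the file stays recoverable either way.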
(This would also open up some cool scenarios like being able to set up a
grid on the local network and prefer that for read performance, but
still being able to automagically fall back to cloud-based storage if
something fails.)
As far as UI for this goes, it'd be possible to combine it with the work
that's going on right now on multiple introducers, and say that all
servers met through a given introducer constitute an availability zone.
This would make it really easy to set up and administer independent zones.
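Something like this in tahoe.cfg, maybe (completely made-up syntax,
just to show the shape of the idea):

[zones]   # hypothetical section, not real tahoe.cfg
lan = pb://...@192.168.1.5:44801/introducer, shares=3-of-10
cloud = pb://...@cloud.example.net:44801/introducer, shares=2-of-5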
Anyway, just a thought. Curious to see what everybody else thinks.
--Ravi