[tahoe-dev] server selection Re: Node correlations - [Was] best practice for wanting to setup multiple tahoe instances on a single node

Tue Jan 17 19:09:29 UTC 2012

Folks:

About "part two" of the server selection problem, which is: given that
you know the descriptions of all the servers that you could use, to
which ones should you upload which shares?

I remembered that at the second Tahoe-LAFS Summit, after seeing Kevan
Carstensen's presentation `Summit2Day4`_ on his new immutable server
selection algorithm, I had a brainstorm about a fairly simple
algorithm that could satisfy a lot of the use cases I've heard about.

So, first, what's Kevan's new immutable server selection algorithm?
Well, it's the subject of his Master's Thesis (Kevan: citation needed!
Please link your thesis into `Bibliography`_), and if I understand it
correctly, it basically just ensures that the "servers of happiness"
property is satisfied in every case where it is possible to satisfy
it. Kevan also has some related ideas about improving the robustness
and clarity of the server selection mechanism, simplifying the code,
and unifying immutable and mutable upload; see ticket `#1382`_,
"immutable peer selection refactoring and enhancements".
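In case it helps to make "servers of happiness" concrete: as I
understand the definition, it is the size of a maximum matching
between servers and the distinct shares they hold, i.e. the number of
servers that can each be credited with a different share. Here is a
minimal sketch in Python, assuming a placement is given as a dict from
(hypothetical) server names to sets of share numbers. It uses plain
augmenting paths and is only an illustration, not Kevan's code or
Tahoe's implementation.

```python
from typing import Dict, Set


def happiness(placement: Dict[str, Set[int]]) -> int:
    """Size of a maximum matching between servers and share numbers.

    `placement` maps a (hypothetical) server name to the share numbers it
    holds. Simple augmenting-path matching, fine for small N.
    """
    share_owner: Dict[int, str] = {}  # share number -> server it is matched to

    def try_assign(server: str, seen: Set[int]) -> bool:
        # Match `server` to some share, re-routing earlier matches if needed.
        for share in placement[server]:
            if share in seen:
                continue
            seen.add(share)
            if share not in share_owner or try_assign(share_owner[share], seen):
                share_owner[share] = server
                return True
        return False

    # Each server that can be matched to its own distinct share counts once.
    return sum(1 for server in placement if try_assign(server, set()))
```

For example, `happiness({"a": {0, 1}, "b": {0}, "c": {0}})` is 2:
three servers hold shares, but only two distinct shares exist among
them.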

If I recall correctly, my brainstorm was: you could have a two-tier
policy, where the uploader first ensures the Servers of Happiness
("H") is satisfied by allocating shares to H servers that have some
special tag, and then it allocates the remaining shares (N - H) to
other, less preferred, servers.
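Here is a rough sketch of the two-tier policy in Python. Everything in
it is hypothetical: the function name, representing tags as a dict
from server name to a set of strings, and filling the second tier
round-robin. It assumes the server list is already in preference
order and that 0 < H <= N. It puts one share on each of H tagged
servers, so the tagged servers by themselves achieve
servers-of-happiness H, and then spreads the remaining N - H shares
over all the servers not used in the first tier.

```python
from typing import Dict, Sequence, Set


def two_tier_placement(
    servers: Sequence[str],
    tags: Dict[str, Set[str]],
    required_tag: str,
    n: int,
    h: int,
) -> Dict[str, Set[int]]:
    """Map server name -> set of share numbers (hypothetical two-tier policy)."""
    if not 0 < h <= n:
        raise ValueError("need 0 < h <= n")
    tagged = [s for s in servers if required_tag in tags.get(s, set())]
    if len(tagged) < h:
        raise ValueError(f"only {len(tagged)} servers tagged {required_tag!r}, need {h}")
    placement: Dict[str, Set[int]] = {}
    # Tier 1: shares 0..h-1 go to h distinct tagged servers, one share each.
    for share, server in enumerate(tagged[:h]):
        placement[server] = {share}
    # Tier 2: shares h..n-1 go round-robin to every server not used in
    # tier 1, tagged or not.
    rest = [s for s in servers if s not in placement]
    if n > h and not rest:
        raise ValueError("no servers left for the remaining shares")
    for i, share in enumerate(range(h, n)):
        placement.setdefault(rest[i % len(rest)], set()).add(share)
    return placement
```

With servers `["lan1", "wan1", "lan2", "wan2", "lan3"]`, only the
`lan*` ones tagged `"lan"`, N=5 and H=3, this puts shares 0, 1 and 2
on the three LAN servers and shares 3 and 4 on `wan1` and `wan2`.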

For example, if your goal is to make sure you have a complete local
copy of the file on your LAN, you tag the servers on your LAN with
some description; the uploader then ensures that H of those servers
have shares of the file, after which it is free to upload the
remaining shares to servers outside your LAN.

Likewise, if you want to stash a complete, recoverable copy of the
file on Tor Hidden Servers, you could tag which of the servers are Tor
Hidden Servers, and then the uploader could upload to H of them and
the remaining N-H shares to any servers it wanted.

I guess that within each set, and within the constraints of the
servers-of-happiness algorithm, the preference for which servers to
use would be determined by the permuted server id, which makes it
fastest for the downloader to find the right servers by trying the
same permutation of the server ids.
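A sketch of that ordering in Python. The per-file permutation here
sorts server ids by a hash of the server id concatenated with the
file's storage index; using SHA-256 for that is my assumption for
illustration, not necessarily the hash or seed Tahoe actually uses.
The helper names are hypothetical. Tagged servers come first and
untagged ones after, each group keeping the permuted order, so the
order stays predictable for a downloader that tries the same
permutation.

```python
import hashlib
from typing import Dict, List, Set


def permuted_order(storage_index: bytes, server_ids: List[bytes]) -> List[bytes]:
    # Same storage index -> same order on every client; different files
    # get different orders. SHA-256 is an illustrative assumption.
    return sorted(server_ids, key=lambda sid: hashlib.sha256(sid + storage_index).digest())


def preference_order(
    storage_index: bytes,
    server_ids: List[bytes],
    tags: Dict[bytes, Set[str]],
    required_tag: str,
) -> List[bytes]:
    """Tagged servers first, then the rest; each group keeps the permuted order."""
    ordered = permuted_order(storage_index, server_ids)
    tagged = [s for s in ordered if required_tag in tags.get(s, set())]
    others = [s for s in ordered if required_tag not in tags.get(s, set())]
    return tagged + others
```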

Another potentially interesting application of this would be "upload
to H servers which have been tagged as members in good standing, then
you can upload the remaining shares to random strangers who just
connected for the first time today".

This couldn't be used to satisfy the "repair within a colo" use case
-- the requirement of including at least K+1 shares in each of Q
different colocations -- but it could be used to satisfy a few other
use cases.
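To make the contrast concrete, here is a sketch of a checker for the
colo requirement, under my reading that "K+1 shares" means K+1
distinct share numbers within one colo. The function name and the
server-to-colo mapping are hypothetical. The two-tier policy only
promises a lower bound on one tagged set, whereas this condition needs
a separate lower bound in each of Q groups at once.

```python
from typing import Dict, Set


def satisfies_colo_repair(
    placement: Dict[str, Set[int]],
    colo_of: Dict[str, str],
    k: int,
    q: int,
) -> bool:
    """True if at least q colos each hold at least k+1 distinct shares."""
    shares_per_colo: Dict[str, Set[int]] = {}
    for server, shares in placement.items():
        # Union of share numbers held by all servers in the same colo.
        shares_per_colo.setdefault(colo_of[server], set()).update(shares)
    good_colos = [c for c, s in shares_per_colo.items() if len(s) >= k + 1]
    return len(good_colos) >= q
```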

What do you think? If you could tag servers with arbitrary
descriptions, and then could specify that the uploader had to achieve
servers-of-happiness H among servers with some specific tag before
allocating the rest of the shares however it likes, would that satisfy
your use case?

Regards,

Zooko

.. _Summit2Day4: https://tahoe-lafs.org/trac/tahoe-lafs/wiki/Summit2Day4
.. _Bibliography: https://tahoe-lafs.org/trac/tahoe-lafs/wiki/Bibliography
.. _`#1382`: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1382