[tahoe-lafs-trac-stream] [tahoe-lafs] #2107: don't place shares on servers that already have shares
tahoe-lafs
trac at tahoe-lafs.org
Tue Dec 3 01:53:11 UTC 2013
#2107: don't place shares on servers that already have shares
-------------------------------+-------------------------------------------
Reporter: zooko                | Owner:
Type: enhancement              | Status: new
Priority: normal               | Milestone: undecided
Component: code-peerselection  | Version: 1.10.0
Resolution:                    | Keywords: upload servers-of-happiness
Launchpad Bug:                 |   brians-opinion-needed
-------------------------------+-------------------------------------------
Comment (by gdt):
(I've been too busy to pay attention, so apologies if my throwing out a
random comment is not helpful.)
It seems clear that "servers of happiness" is an approximation to a richer
property that cannot be described so simply. So a rule written in terms
of SoH is going to be an approximation to the correct behavior.
The real question is how to maximize the ability to retrieve the file, given
some assumed probability distribution of a) a server losing a share but
staying up and b) a server going away, traded off somehow against storage
cost. In particular, I'm not sure how often a) happens, or whether we
want to support the notion of assigning different probabilities to
different servers, or correlated probabilities. So two placements with
the same SoH can still have different probabilities of successful
retrieval, and the one with the higher probability is still preferred.
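As a rough illustration of that last point, here is a minimal Python sketch
(not Tahoe-LAFS code; the function name and the independent-failure model are
assumptions made for this example) that brute-forces the retrieval
probability of a placement. It only models case b), a whole server going
away, and treats the shares on different servers as distinct.

```python
from itertools import product


def retrieval_probability(placement, k, p_fail):
    """Probability that at least k shares survive.

    placement: number of shares held by each server, e.g. [2, 3, 2, 3]
    k:         shares needed to reconstruct the file
    p_fail:    probability that each server goes away (one value per server)

    Assumptions (not from the ticket): failures are independent, and no
    share is duplicated across servers.
    """
    total = 0.0
    # Enumerate every up/down combination of the servers.
    for alive in product([False, True], repeat=len(placement)):
        prob = 1.0
        shares = 0
        for up, count, p in zip(alive, placement, p_fail):
            prob *= (1.0 - p) if up else p
            if up:
                shares += count
        if shares >= k:
            total += prob
    return total
```

With every server going away independently half the time
(p_fail = [0.5] * 4, k = 3), this gives 0.875 for [1, 3, 3, 3] and 0.8125
for [2, 3, 2, 3], even though both placements put distinct shares on all
four servers.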
A key issue is whether the number of servers is less than the number of
shares or more. I suspect the right behavior is quite different in the two
cases. We've been talking about that, but perhaps that great divide
should be more front and center in the discussion. It's also not
reasonable to expect people to tweak encoding as the number of servers
changes. It should be sane to run 3/10 all the time, and just place more
shares per server if there are only 4 servers.
It seems obvious to me that better balance is better. But what about S=4
with 3/10 encoding? Is 1,3,3,3 really worse than 2,3,2,3? Or is it better,
because if 3 out of 4 servers are lost, there are 3 ways to win and one to
lose, vs 2 and 2 with 2,3,2,3? So it seems that regardless of probabilities,
1,3,3,3 is more robust. I'm not sure how to generalize this, except that
for k=3 one should not put more than k shares on any server until all
servers have k.
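The counting argument above can be checked by enumeration. This is a small
Python sketch, with a hypothetical helper name, that tries every set of lost
servers of a given size and counts how many of them leave at least k shares
retrievable. It assumes the shares on different servers are distinct.

```python
from itertools import combinations


def ways_to_win(placement, k, lost):
    """Return (wins, losses) over every choice of `lost` servers going away.

    placement: number of shares held by each server
    k:         shares needed to reconstruct the file
    lost:      how many servers are lost at once
    """
    servers = range(len(placement))
    wins = losses = 0
    for dead in combinations(servers, lost):
        # Shares still reachable on the servers that were not lost.
        surviving = sum(placement[s] for s in servers if s not in dead)
        if surviving >= k:
            wins += 1
        else:
            losses += 1
    return wins, losses


for placement in ([1, 3, 3, 3], [2, 3, 2, 3]):
    print(placement, ways_to_win(placement, k=3, lost=3))
```

For k=3 with 3 servers lost, this gives (3, 1) for 1,3,3,3 and (2, 2) for
2,3,2,3. With only 2 servers lost, both placements survive all 6 cases, so
the difference shows up only in the worst case.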
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2107#comment:23>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage