[tahoe-lafs-trac-stream] [tahoe-lafs] #2107: don't place shares on servers that already have shares

tahoe-lafs trac at tahoe-lafs.org
Thu Nov 14 22:50:04 UTC 2013


#2107: don't place shares on servers that already have shares
-------------------------------------------------+-------------------------
 Reporter:  zooko                                |          Owner:
     Type:  enhancement                          |         Status:  new
 Priority:  normal                               |      Milestone:  undecided
Component:  code-peerselection                   |        Version:  1.10.0
 Keywords:  upload servers-of-happiness          |  Launchpad Bug:
            brians-opinion-needed                |
-------------------------------------------------+-------------------------
 I think we should change it so that the Upload Strategy doesn't bother
 uploading a share to a server if that server already has a share of that
 file. I used to think it was worth uploading "extra" shares to a server
 that already had a share, because there are some (probably rare) cases
 where it can help with file recovery, and it never *hurts* file recovery,
 but I've changed my mind.

 My main reason for changing my mind about this is that Diego "sickness"
 Righi is confused and dismayed by this behavior (e.g. #2106), and if a
 feature confuses and dismays users, then it is probably not worth it.
 Consider also
 [https://zooko.com/uri/URI%3ADIR2-CHK%3Aooyppj6eshxwmweeelqm3x54nq%3Au5pauln65blikfn5peq7e4s7x5fwdvvhvsklmfmwbjxlvlosldcq%3A1%3A1%3A105588/Carstensen-2011-Robust_Resource_Allocation_In_Distributed_Filesystem.pdf
 Kevan Carstensen's master's thesis on Upload Strategy Of Happiness], which
 says:

     This means we are not allowed to store any erasure coded share more
 than once.

     The reasoning for this requirement is less straightforward than that
 for spreading shares out amongst many storage servers. Note that we can
 achieve an availability improvement by storing more data. If r = 4, and we
 store five replicas on five distinct servers, then the file can tolerate
 the loss of four servers instead of three before read availability is
 compromised. Using selective double placement of shares in an F_e-style
 filesystem allows us to tolerate the failure of n − k + 1 or more storage
 servers.

     This requirement is more for usability and consistency than any clear
 availability criteria. Space utilization in distributed filesystems is an
 important issue. Many commodity computing services charge based on the
 amount of space used. So, in a practical distributed system, it is
 important for the user to be able to reason about space usage in a precise
 way. Explicit erasure-coding or replication parameters provided to the
 user allow the user to do this. We argue that it is not appropriate for an
 algorithm to second-guess the user’s choices, and say instead that the
 user will increase n, k, or r if they want more data stored on the
 filesystem.

 That's a pretty good argument! Especially the part about "You're paying
 for that space, and if you upload 2 or 3 shares to one server, you're
 paying 2 or 3 times as much, but not getting much fault-tolerance in
 return for it."
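
 To make "not much fault-tolerance in return" concrete, here is a small
 brute-force check (my own sketch; the parameters k = 3, n = 10 and the
 `worst_case_tolerance` helper are illustrative, not Tahoe code):

 {{{
 from itertools import combinations

 k, n = 3, 10                         # illustrative, not Tahoe defaults
 layout = {i: {i} for i in range(n)}  # server -> shares it holds

 def worst_case_tolerance(layout, k):
     """Largest f such that EVERY loss of f servers still leaves at
     least k distinct shares, i.e. guaranteed recoverability."""
     servers = list(layout)
     for f in range(len(servers), -1, -1):
         if all(len(set().union(*(layout[s] for s in kept))) >= k
                for kept in combinations(servers, len(servers) - f)):
             return f
     return 0

 print(worst_case_tolerance(layout, k))   # 7, i.e. n - k

 # Double share 0 onto server 1, which already holds share 1.  We now
 # pay for an eleventh share, but the guaranteed tolerance is
 # unchanged: losing 8 servers can still leave only 2 distinct shares.
 layout[1].add(0)
 print(worst_case_tolerance(layout, k))   # still 7
 }}}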

 I would add that when a user is diagnosing the state of their grid, or
 reasoning about its possible future behavior, it will be more intuitive
 and easier (at least for many people) if they can assume that shares will
 never be intentionally doubled up.
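
 The space-budgeting half of Kevan's argument is also easy to quantify:
 with never-doubled-up shares, a user can predict a file's grid footprint
 from n, k, and the file size alone, while every doubled-up share silently
 adds another ~1/k of the file size (made-up numbers, not Tahoe defaults):

 {{{
 k, n = 3, 10            # illustrative erasure-coding parameters
 S = 100.0               # file size in MiB
 share = S / k           # each share is ~S/k (ignoring small overhead)
 print(n * share)        # ~333 MiB: the footprint the user budgeted for
 print((n + 2) * share)  # ~400 MiB if two shares get doubled up
 }}}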

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2107>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

