[tahoe-lafs-trac-stream] [Tahoe-LAFS] #3022: Servers of happiness share placement distributes storage load unevenly in small grids

Thu Apr 4 03:00:17 UTC 2019

#3022: Servers of happiness share placement distributes storage load unevenly in
small grids
-------------------------+------------------------------------------
     Reporter:  exarkun  |      Owner:
         Type:  defect   |     Status:  new
     Priority:  normal   |  Milestone:  undecided
    Component:  unknown  |    Version:  1.12.1
   Resolution:           |   Keywords:  servers-of-happiness, upload
Launchpad Bug:           |
-------------------------+------------------------------------------

Comment (by ccx):

 To me there seems to be conflict of requirements, or maybe just
 underspecification.
 The configuration documentation about [#pref peers.preferred] outlines
 behavior and some use-cases of putting some peers on top of list of those
 considered for reading and writing.
 The current algorithm, as described informally above and in length in
 [#placements SoH document], selects 2*N top peers of the list
 and makes arbitrary (likely pythonhash-based) selection from them as the
 desired share to peer mapping.
 In ticket:610#comment:16 it's suggested that window for considering peers
 for
 holding data should be even larger if existing shares are found.
 There is obvious issue of the ordering of peers as returned from
 get_servers_for_psi not being respected, but I'd like us to pause for a
 moment
 and look at whether fixing that actually addresses underlying issues.

 The Servers of Happiness was devised as a metric of healthiness of a file,
 namely for the purposes of "upload".
 It consists of the number of peers that have distinct shares which are in
 the top 2*N
 of the peer list for given Storage Index.
 Furthermore a Servers of Happiness Placement Algorithm was designed that
 uses
 this metric internally as a constraint on where to upload shares.
 From reading the relevant tickets and paper, I gather there were two
 goals:
 * maximize the redundancy for the cases of peers becoming unavailable
 * minimize the amount of shares in need of uploading should some be
 already
   available

 Not only does the design of that algorithm ignore the preferred ordering
 of
 peers for assigning new uploads, but I find the second goal is in conflict
 of
 the use-case described in [#pref peers.preferred] of "… prefer their local
 servers so that they can maintain access to all of their uploads without
 using
 the internet."

 It's worth noting that the "upload" mechanism is used for a lot of things
 other
 than fresh upload to a grid.
 Notably the check-and-repair mechanism from what I understood hooks
 download
 and upload component together in a way that relies on the upload mechanism
 to
 reuse shares already accessible in the top 2*N (or should it be 4*N?)
 peers for given Storage Index.
 This possibly achieves, given low enough peer turnover and frequent
 repairs,
 moving the shares to the servers that are among the ones that are searched
 first but not necessarily the actual top of the list.
 Garbage collection could potentially remove the old shares on peers down
 the
 list, but the current checker can only renew leases on all reachable peers
 or
 not at all.
 The comments in happiness_upload suggest that the attempted reupload
 actually
 refreshes the leases, so counterintuitively you should not specify that
 you
 want leases refreshed on repair unless you want redundant placement!
 But as far as I understood the reupload is not attempted unless the file
 is
 considered unhealthy by some metric so there doesn't seem to be a
 mechanism for
 letting only redundant shares be collected.

 Also of note is that various checkers use it's own implementation of
 Servers of
 Happiness metric calculation in src/allmydata/util/happinessutil.py that
 only
 partly reuses the happiness_upload code.
 I assume this is to eliminate the 2*N limit that is imposed on uploads.

 The deeper I dove in the related tickets (there are many!) the more of ad-
 hoc
 unspecified requirement discussion I discovered.
 Here I try to distill those that I've found that may affect desired
 placement
 of shares over peers:

 * resilience (happiness)
   - The happiness number (cardinality of maximum match) presents a metric
 of
     resiliency against peers becoming unavailable.
   - Peers are considered atomic units of failure, which while
 oversimplified is
     pretty much the underlying abstraction for Tahoe-LAFS as it is now.
   - In face of more shares on a single peer the happiness number is not
 actual
     resiliency metric but a lower bound of it. (The spec discusses this)
 * permutation distance
   - Distance of available shares from the peers chosen by Storage Index
 based
     permutation.
 * locality
   - Ability to reconstruct files from local peers only (possibly specified
 by
     peers.preferred or some other mechanism).
   - This means having k shares available locally, even if there is less
 than k
     local peers.
 * localized redundancy
   - Amount of shares of single file placed on a single peer.
   - If this is larger than k then it's superfluous according to the
 failure
     model as we can reconstruct all shares from k.
 * global redundancy
   - Amount of shares duplicated across the grid.
 * bandwidth efficiency
   - There is incentive to not waste bandwidth by uploading shares that are
     already found on some peers.
 * balance
   - Even spread of shares across peers. The issue at the origin of this
 ticket.
   - Should by achievable by minimizing redundancy and permutation
 distance.

 The ticket:1212#comment:26 discusses the difference of requirements
 between
 user-initiated upload and a repair.
 While there seemed to be an agreement to split them, I don't think that
 got
 worked on.

 There are quite a few tickets for rebalancing files after initial upload,
 e.g.:
 ticket:699, ticket:232 and even ticket:543 which proposes central
 rebalancing
 manager.
 I should add that the concept of central balancer is incompatible with the
 feature of locality, while relocating check-and-repair would support it.

 While extreme imbalance should be less of an issue with Servers of
 Happiness
 Share Placement than it was with previous algorithm, the share_placement
 function will try to always allocate all of the shares, which in absence
 of
 enough connected peers can mean they all end up clustered on one or few
 peers, possibly with superfluous redundancy (>k).
 This can happen both when having very low required happiness value and on
 repair where that value is artificially set to zero.

 There were some proposals to disallow multiple shares per peer
 (e.g. ticket:1212#comment:35) or to allow capping their amount to supplied
 number.
 This would make the behavior closer to RAID and make the space usage
 predictable despite shaky interconnection.

 There is a lot of requests for allowing uploader select peers
 specifically,
 usually for purposes of locality (ticket:573, ticket:467) but curiously
 also
 for anti-locality (making sure of having shares non-locally) in ticket:481
 for
 purpose of migrating data from dying drive.

 I didn't dive deeply into garbage collection tickets, but I presume the
 above mentioned issues with rebalancing and removing redundant shares
 would be
 well represented.

 The upload.py API is mostly described in ticket:1382, while bulk of
 rationale
 for Servers of Happiness as a metric is in ticket:778.

 The share placement is determined once, at allocation time, and does not
 reflect possibility of upload failing (ticket:2108)

 That's about what I managed to discover while spelunking the code and
 tickets
 that may affect how to actually define the requirements (and their
 relative
 precedence) for the share placement function(s) that we want.

 ----

 * [=#pref peers.preferred] https://tahoe-lafs.org/trac/tahoe-
 lafs/browser/trunk/docs/configuration.rst?rev=fad4ffe37edaf7ca7497f08bc543062d93ca4f99#L662
 * [=#placements SoH document]:: https://tahoe-lafs.org/trac/tahoe-
 lafs/browser/trunk/docs/specifications/servers-of-
 happiness.rst?rev=fad4ffe37edaf7ca7497f08bc543062d93ca4f99#L106

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3022#comment:1>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage