[tahoe-lafs-trac-stream] [Tahoe-LAFS] #3022: Servers of happiness share placement distributes storage load unevenly in small grids
Tahoe-LAFS
trac at tahoe-lafs.org
Wed Apr 3 12:12:36 UTC 2019
#3022: Servers of happiness share placement distributes storage load unevenly in
small grids
------------------------------------------+---------------------------
Reporter: exarkun | Owner:
Type: defect | Status: new
Priority: normal | Milestone: undecided
Component: unknown | Version: 1.12.1
Keywords: servers-of-happiness, upload | Launchpad Bug:
------------------------------------------+---------------------------
Originally posted to tahoe-dev:
https://tahoe-lafs.org/pipermail/tahoe-dev/2019-April/009937.html
Imagine a grid of 4 storage nodes and a client using parameters of
needed=2 total=3 happy=3. Given 4 different uploads, I propose that an
optimal distribution of shares is:
* Server 0: uploadA-shareX, uploadB-shareY, uploadC-shareZ
* Server 1: uploadA-shareY, uploadB-shareZ, uploadD-shareX
* Server 2: uploadA-shareZ, uploadC-shareX, uploadD-shareY
* Server 3: uploadB-shareX, uploadC-shareY, uploadD-shareZ
Happiness is maximized because no server has more than one share from any
single upload. Additionally, storage utilization is optimized because
each server holds the same number of shares (of course, different uploads
need not be the same size, but by spreading each upload's shares across
different servers we also do about the best we can with the current
encoding scheme at sharing the load of large uploads equally).
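As I understand it, "happiness" is the size of a maximum matching between
servers and the shares they hold. Here is a minimal sketch of that
computation (my own illustration in Python, not Tahoe's code), applied to
uploadA from the layout above:

    # Happiness as the size of a maximum bipartite matching between servers
    # and shares (Kuhn's augmenting-path algorithm). Illustrative only.
    def happiness(placement):
        """placement maps server -> set of shares held on that server."""
        match = {}  # share -> server in the current matching

        def augment(server, seen):
            for share in placement[server]:
                if share in seen:
                    continue
                seen.add(share)
                if share not in match or augment(match[share], seen):
                    match[share] = server
                    return True
            return False

        for server in placement:
            augment(server, set())
        return len(match)

    upload_a = {
        "server0": {"shareX"},
        "server1": {"shareY"},
        "server2": {"shareZ"},
    }
    print(happiness(upload_a))  # 3, which meets happy=3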
However, this is not the distribution of shares that I observe in practice,
and I believe I can see the cause of this in the implementation of servers
of happiness.
First, the distribution of shares that I actually observe in practice is:
* Server 0: uploadA-shareX, uploadB-shareX, uploadC-shareX, uploadD-shareX
* Server 1: uploadA-shareY, uploadB-shareY, uploadC-shareY, uploadD-shareY
* Server 2: uploadA-shareZ, uploadB-shareZ, uploadC-shareZ, uploadD-shareZ
* Server 3: <empty>
Happiness has still been maximized because it is still the case that no
server has more than one share from any single upload. However, it's
clear that storage utilization has not been optimized because Server 3 has
taken on no storage responsibilities at all.
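To put numbers on "storage utilization has not been optimized", here is a
small comparison of per-server share counts for the two layouts (server and
share names come from the examples above, not from a real grid):

    optimal = {
        "server0": ["A-X", "B-Y", "C-Z"],
        "server1": ["A-Y", "B-Z", "D-X"],
        "server2": ["A-Z", "C-X", "D-Y"],
        "server3": ["B-X", "C-Y", "D-Z"],
    }
    observed = {
        "server0": ["A-X", "B-X", "C-X", "D-X"],
        "server1": ["A-Y", "B-Y", "C-Y", "D-Y"],
        "server2": ["A-Z", "B-Z", "C-Z", "D-Z"],
        "server3": [],
    }
    for name, layout in [("optimal", optimal), ("observed", observed)]:
        print(name, {server: len(shares) for server, shares in layout.items()})
    # optimal:  every server holds 3 shares
    # observed: servers 0-2 hold 4 shares each, server 3 holds none

Both layouts achieve happiness=3 for every upload, so the happiness metric
alone cannot tell them apart.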
Regarding the cause of this, here's my analysis.
Before "servers of happiness" starts, 2N storage servers are selected.
Tahoe2ServerSelector.get_shareholders is mostly responsible for this, with
the help of StorageFarmBroker.get_servers_for_psi. get_servers_for_psi
produces a permuted list of all connected storage servers. I think
that it is the permutation of this list that is intended to provide
storage load balancing across the grid. get_shareholders then grabs the
first 2N of these for its purposes.
At this point, I think everything is still fine. In the scenario I
outlined above, there are only 4 storage servers and 2N is 6. This means
the same storage servers are always going to be returned by
get_servers_for_psi - but they're not always going to be in the same
order. Likewise, taking the first 2N of them always yields the same set of
servers, with the same qualification about ordering.
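To illustrate the idea (a hedged sketch, not get_servers_for_psi itself -
the real code hashes server ids together with the upload's peer selection
index, and the exact hashing details don't matter here):

    from hashlib import sha256

    def permuted_servers(server_ids, peer_selection_index):
        # Order servers by a hash that mixes the server id with the upload's
        # peer selection index, so each upload sees a different ordering of
        # the same set of servers.
        return sorted(server_ids,
                      key=lambda sid: sha256(peer_selection_index + sid).digest())

    servers = [b"server0", b"server1", b"server2", b"server3"]
    for psi in (b"uploadA", b"uploadB", b"uploadC", b"uploadD"):
        # With only 4 servers and 2N = 6, the "first 2N" is always the whole
        # list, just in a per-upload order.
        print(psi, [s.decode() for s in permuted_servers(servers, psi)[:6]])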
get_shareholders does a little more work to make sure the storage servers
are actually available and splits them into read/write and read-only
servers. None of this makes a difference to the outcome of the scenario
described above, though.
The next relevant thing that happens is that get_shareholders uses
PeerSelector.get_share_placements to compute a placement map that achieves
the happiness target.
I think PeerSelector.get_share_placements is where things go wrong. The
PeerSelector does not have the ordered list of 2N servers; instead, it has
some sets of servers. It passes those sets along to the servers of
happiness algorithm (happiness_upload.share_placement), but by that point
the load-balancing problem is no longer solvable. With only sets of servers
and the original permuted-list ordering lost, share_placement makes up an
order of its own (lexicographical on server id, I think). This is where the
preference for servers 0-2 comes in and server 3 gets left out - every time.
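Here is a toy illustration of that failure mode (hypothetical function, not
happiness_upload.share_placement itself; the point is only what happens once
the permuted order has been collapsed into a set):

    def place_shares(servers, shares):
        # The permutation is gone; all we have is an unordered set of servers,
        # so any deterministic tie-break (here: sorting by server id) picks
        # the same N servers for every upload.
        ordered = sorted(servers)
        return dict(zip(shares, ordered))

    servers = {"server2", "server0", "server3", "server1"}
    for upload in ("A", "B", "C", "D"):
        print(upload, place_shares(servers, ["shareX", "shareY", "shareZ"]))
    # Every upload lands on server0, server1, server2; server3 never gets
    # a share.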
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3022>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage