id summary reporter owner description type status priority milestone component version resolution keywords cc launchpad_bug 3022 Servers of happiness share placement distributes storage load unevenly in small grids exarkun "Originally posted to tahoe-dev: https://tahoe-lafs.org/pipermail/tahoe-dev/2019-April/009937.html

Imagine a grid of 4 storage nodes and a client using parameters of needed=2, total=3, happy=3. Given 4 different uploads, I propose that an optimal distribution of shares is:

* Server 0: uploadA-shareX, uploadB-shareY, uploadC-shareZ
* Server 1: uploadA-shareY, uploadB-shareZ, uploadD-shareX
* Server 2: uploadA-shareZ, uploadC-shareX, uploadD-shareY
* Server 3: uploadB-shareX, uploadC-shareY, uploadD-shareZ

Happiness is maximized because no server holds more than one share from any single upload. Storage utilization is also optimized because each server holds the same number of shares. (Of course, different uploads need not be the same size, but by spreading each upload's shares across different servers we also do about the best we can, under the current encoding scheme, to share the load of large uploads equally.)

However, this is not the distribution of shares that I observe in practice, and I believe I can see the cause in the implementation of servers of happiness. The distribution of shares that I actually observe is:

* Server 0: uploadA-shareX, uploadB-shareX, uploadC-shareX, uploadD-shareX
* Server 1: uploadA-shareY, uploadB-shareY, uploadC-shareY, uploadD-shareY
* Server 2: uploadA-shareZ, uploadB-shareZ, uploadC-shareZ, uploadD-shareZ
* Server 3: (no shares)

Happiness has still been maximized, because it is still the case that no server holds more than one share from any single upload. However, storage utilization has clearly not been optimized: Server 3 has taken on no storage responsibilities at all.

Regarding the cause of this, here is my analysis.
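As a sketch of the distinction between the two layouts, the following toy model (my own illustrative code, not Tahoe-LAFS internals; the names `happiness` and `load` are made up) treats happiness for one upload as the size of a maximum bipartite matching between servers and that upload's shares, which both layouts satisfy, while only one balances the per-server share counts:

```python
# Illustrative sketch, not Tahoe-LAFS code. Happiness for a single upload
# is modeled as the size of a maximum bipartite matching between servers
# and that upload's shares (Kuhn's augmenting-path algorithm).

def happiness(placement):
    """placement: dict of server -> set of share ids, for one upload."""
    match = {}  # share id -> server currently matched to it

    def augment(server, seen):
        for share in placement.get(server, ()):
            if share not in seen:
                seen.add(share)
                # Take a free share, or evict its holder if it can rematch.
                if share not in match or augment(match[share], seen):
                    match[share] = server
                    return True
        return False

    return sum(1 for server in placement if augment(server, set()))

def load(grid):
    """Total shares per server across every upload in the grid."""
    counts = {}
    for placement in grid.values():
        for server, shares in placement.items():
            counts[server] = counts.get(server, 0) + len(shares)
    return counts

# The proposed balanced layout from above (per upload: server -> shares).
balanced = {
    "A": {0: {"X"}, 1: {"Y"}, 2: {"Z"}},
    "B": {0: {"Y"}, 1: {"Z"}, 3: {"X"}},
    "C": {0: {"Z"}, 2: {"X"}, 3: {"Y"}},
    "D": {1: {"X"}, 2: {"Y"}, 3: {"Z"}},
}

# The skewed layout observed in practice: server 3 never holds a share.
skewed = {u: {0: {"X"}, 1: {"Y"}, 2: {"Z"}} for u in "ABCD"}

# Both layouts reach happiness 3 for every upload, but only the balanced
# one spreads the load: 3 shares per server versus 4/4/4/0.
```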
Before ""servers of happiness"" starts, 2N storage servers are selected. Tahoe2ServerSelector.get_shareholders is mostly responsible for this, with the help of StorageFarmBroker.get_servers_for_psi. get_servers_for_psi produces a permuted list of all all connected storage servers. I think that it is the permutation of this list that is intended to provide storage load balancing across the grid. get_shareholders then grabs the first 2N of these for its purposes. At this point, I think everything is still fine. In the scenario I outlined above, there are only 4 storage servers and 2N is 6. This means the same storage servers are always going to be returned by get_servers_for_psi - but they're not always going to be in the same order. Likewise, the first 2N of them is exactly the same thing and the same qualifications apply. get_shareholders does a little more work to make sure the storage servers are actually available, splits them into read/write and read-only servers. None of this really makes a difference to the outcome of the scenario described above though. The next relevant thing that happens is that get_shareholders uses PeerSelector.get_share_placements to compute a placement map that achieves the happiness target. I think PeerSelector.get_share_placements is where things go wrong. The PeerSelector does not have the ordered list of 2N servers. Instead, it has some sets of servers. It passes those sets along to the servers of happiness algorithm (happiness_upload.share_placement) but the problem isn't solveable at this point. With only sets of servers and the original permuted list ordering lost, share_placement makes up an order (lexicographical on server id, I think). This is where the preference for servers 0-2 comes in and server 3 gets left out - every time. " defect new normal undecided unknown 1.12.1 servers-of-happiness, upload