Concentrated storage load from servers of happiness placement

Jean-Paul Calderone jean-paul+tahoe-dev at leastauthority.com
Tue Apr 2 16:46:03 UTC 2019


Hello,

Imagine a grid of 4 storage nodes and a client using parameters of needed=2
total=3 happy=3.  Given 4 different uploads, I propose that an optimal
distribution of shares is:

Server 0: uploadA-shareX, uploadB-shareY, uploadC-shareZ
Server 1: uploadA-shareY, uploadB-shareZ, uploadD-shareX
Server 2: uploadA-shareZ, uploadC-shareX, uploadD-shareY
Server 3: uploadB-shareX, uploadC-shareY, uploadD-shareZ

Happiness is maximized because no server holds more than one share from any
single upload.  Storage utilization is also optimized because every server
holds the same number of shares.  (Of course, different uploads need not be
the same size, but by spreading each upload's shares across different
servers we do about the best we can, given the current encoding scheme, to
share the load of large uploads equally.)
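For concreteness, here is a small Python sketch of the happiness metric as
I understand it: the size of a maximum matching between servers and the
shares of a single upload.  The function name and the dict layout are mine,
not Tahoe's API.  Applied per upload to the table above, every upload
scores 3, which satisfies happy=3:

    def happiness(placement):
        # placement maps a server id to the set of shares (of one
        # upload) it holds.  Happiness is the size of a maximum
        # matching between servers and shares, found here with a
        # simple augmenting-path search.
        match = {}  # share -> server it is currently matched to

        def augment(server, visited):
            for share in placement.get(server, ()):
                if share in visited:
                    continue
                visited.add(share)
                if share not in match or augment(match[share], visited):
                    match[share] = server
                    return True
            return False

        return sum(1 for server in placement if augment(server, set()))

    # Upload A in the proposed distribution:
    print(happiness({"server0": {"X"},
                     "server1": {"Y"},
                     "server2": {"Z"}}))  # 3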

However, this is not the distribution of shares that I observe in practice,
and I believe I can see the cause in the implementation of servers of
happiness.

First, the distribution of shares that I actually observe in practice is:

Server 0: uploadA-shareX, uploadB-shareX, uploadC-shareX, uploadD-shareX
Server 1: uploadA-shareY, uploadB-shareY, uploadC-shareY, uploadD-shareY
Server 2: uploadA-shareZ, uploadB-shareZ, uploadC-shareZ, uploadD-shareZ
Server 3: <empty>

Happiness is still maximized, because it remains true that no server holds
more than one share from any single upload.  However, storage utilization
is clearly not optimized, because Server 3 has taken on no storage
responsibilities at all.
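A trivial tally makes the imbalance explicit (server and share names as in
the tables above): the observed distribution puts 4, 4, 4, 0 shares on the
four servers, while the proposed one puts 3, 3, 3, 3.

    observed = {
        "server0": ["A-X", "B-X", "C-X", "D-X"],
        "server1": ["A-Y", "B-Y", "C-Y", "D-Y"],
        "server2": ["A-Z", "B-Z", "C-Z", "D-Z"],
        "server3": [],
    }
    proposed = {
        "server0": ["A-X", "B-Y", "C-Z"],
        "server1": ["A-Y", "B-Z", "D-X"],
        "server2": ["A-Z", "C-X", "D-Y"],
        "server3": ["B-X", "C-Y", "D-Z"],
    }
    for name, dist in (("observed", observed), ("proposed", proposed)):
        print(name, {server: len(shares) for server, shares in dist.items()})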

My first question is whether anyone can confirm this behavior.  My second
question is whether anyone agrees that this behavior is not desirable.

Regarding the cause of this, here's my analysis.

Before "servers of happiness" starts, 2N storage servers are
selected.  Tahoe2ServerSelector.get_shareholders is mostly responsible for
this, with the help of StorageFarmBroker.get_servers_for_psi.
get_servers_for_psi produces a permuted list of all all connected storage
servers.  I think that it is the permutation of this list that is intended
to provide storage load balancing across the grid.  get_shareholders then
grabs the first 2N of these for its purposes.
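Roughly, the permutation works by hashing each server against the upload's
storage index, so each upload sees its own pseudo-random ordering of the
same servers.  The sketch below is only an approximation of that idea (the
real code hashes a per-server permutation seed, and the exact hash
construction differs), but it captures why different uploads should see
different orderings:

    import hashlib

    def permuted_servers(storage_index, server_ids):
        # Order servers by a hash that mixes in the upload's storage
        # index; a different storage index gives a different ordering
        # of the same servers.  (Illustrative only, not the real
        # construction.)
        def key(server_id):
            return hashlib.sha256(storage_index + server_id).digest()
        return sorted(server_ids, key=key)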

At this point, I think everything is still fine.  In the scenario I
outlined above, there are only 4 storage servers and 2N is 6.  This means
the same storage servers are always going to be returned by
get_servers_for_psi - but they're not always going to be in the same
order.  Likewise, taking the first 2N of them yields exactly the same set,
and the same qualification about ordering applies.
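Illustrating with the sketch above (the server ids and storage indexes here
are made up):

    servers = [b"server0", b"server1", b"server2", b"server3"]
    for si in (b"storage-index-A", b"storage-index-B"):
        first_2n = permuted_servers(si, servers)[:6]  # 2N = 6 > 4 servers
        assert set(first_2n) == set(servers)  # always the same four servers
        print(si, first_2n)                   # ...in a per-upload order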

get_shareholders does a little more work to make sure the storage servers
are actually available and splits them into read/write and read-only
servers.  None of this really makes a difference to the outcome of the
scenario described above, though.

The next relevant thing that happens is that get_shareholders
uses PeerSelector.get_share_placements to compute a placement map that
achieves the happiness target.

I think PeerSelector.get_share_placements is where things go wrong.  The
PeerSelector does not have the ordered list of 2N servers.  Instead, it has
some sets of servers.  It passes those sets along to the servers of
happiness algorithm (happiness_upload.share_placement), but the problem
isn't solvable at that point.  With only sets of servers, and the original
permuted ordering lost, share_placement makes up an order (lexicographical
on server id, I think).  This is where the preference for servers 0-2
comes in and server 3 gets left out - every time.
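A toy placement loop makes the effect of the lost ordering concrete.  This
is not share_placement itself, just an illustration of why a fixed server
order starves the fourth server:

    def toy_placement(ordered_servers, shares):
        # Assign shares to servers round-robin in whatever order the
        # servers arrive.  With a fixed order and only three shares
        # per upload, the fourth server is never reached.
        return {share: ordered_servers[i % len(ordered_servers)]
                for i, share in enumerate(shares)}

    fixed_order = sorted(["server0", "server1", "server2", "server3"])
    for upload in ("A", "B", "C", "D"):
        print(upload, toy_placement(fixed_order, ["X", "Y", "Z"]))
    # Every upload lands on server0/server1/server2; server3 is never
    # used.  If the permuted order from get_servers_for_psi were
    # preserved, the starting point would differ per upload and the
    # load would rotate across the grid.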

Thoughts?

Jean-Paul