[tahoe-lafs-trac-stream] [Tahoe-LAFS] #3022: Servers of happiness share placement distributes storage load unevenly in small grids
Tahoe-LAFS
trac at tahoe-lafs.org
Wed Apr 3 12:12:36 UTC 2019
#3022: Servers of happiness share placement distributes storage load unevenly in
small grids
------------------------------------------+---------------------------
Reporter: exarkun | Owner:
Type: defect | Status: new
Priority: normal | Milestone: undecided
Component: unknown | Version: 1.12.1
Keywords: servers-of-happiness, upload | Launchpad Bug:
------------------------------------------+---------------------------
Originally posted to tahoe-dev:
https://tahoe-lafs.org/pipermail/tahoe-dev/2019-April/009937.html
Imagine a grid of 4 storage nodes and a client using parameters of
needed=2 total=3 happy=3. Given 4 different uploads, I propose that an
optimal distribution of shares is:
* Server 0: uploadA-shareX, uploadB-shareY, uploadC-shareZ
* Server 1: uploadA-shareY, uploadB-shareZ, uploadD-shareX
* Server 2: uploadA-shareZ, uploadC-shareX, uploadD-shareY
* Server 3: uploadB-shareX, uploadC-shareY, uploadD-shareZ
Happiness is maximized because no server has more than one share from any
single upload. Additionally, storage utilization is optimized because
each server holds the same number of shares (of course, different uploads
need not be the same size, but by spreading each upload's shares across
different servers we also do about the best we can with the current
encoding scheme at sharing the load of large uploads equally).
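As I understand it, "happiness" is the size of a maximum matching between
servers and the shares they hold. Here is a minimal sketch of that
computation (my own illustration in Python, not Tahoe's code), applied to
uploadA from the layout above:

    # Happiness as the size of a maximum bipartite matching between servers
    # and shares (Kuhn's augmenting-path algorithm). Illustrative only.
    def happiness(placement):
        """placement maps server -> set of shares held on that server."""
        match = {}  # share -> server in the current matching

        def augment(server, seen):
            for share in placement[server]:
                if share in seen:
                    continue
                seen.add(share)
                if share not in match or augment(match[share], seen):
                    match[share] = server
                    return True
            return False

        for server in placement:
            augment(server, set())
        return len(match)

    upload_a = {
        "server0": {"shareX"},
        "server1": {"shareY"},
        "server2": {"shareZ"},
    }
    print(happiness(upload_a))  # 3, which meets happy=3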
However, this is not the distribution of shares that I observe in practice,
and I believe I can see the cause of this in the implementation of servers
of happiness.
First, the distribution of shares that I actually observe in practice is:
* Server 0: uploadA-shareX, uploadB-shareX, uploadC-shareX, uploadD-shareX
* Server 1: uploadA-shareY, uploadB-shareY, uploadC-shareY, uploadD-shareY
* Server 2: uploadA-shareZ, uploadB-shareZ, uploadC-shareZ, uploadD-shareZ
* Server 3: <empty>
Happiness has still been maximized because it is still the case that no
server has more than one share from any single upload. However, it's
clear that storage utilization has not been optimized because Server 3 has
taken on no storage responsibilities at all.
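To put numbers on "storage utilization has not been optimized", here is a
small comparison of per-server share counts for the two layouts (server and
share names come from the examples above, not from a real grid):

    optimal = {
        "server0": ["A-X", "B-Y", "C-Z"],
        "server1": ["A-Y", "B-Z", "D-X"],
        "server2": ["A-Z", "C-X", "D-Y"],
        "server3": ["B-X", "C-Y", "D-Z"],
    }
    observed = {
        "server0": ["A-X", "B-X", "C-X", "D-X"],
        "server1": ["A-Y", "B-Y", "C-Y", "D-Y"],
        "server2": ["A-Z", "B-Z", "C-Z", "D-Z"],
        "server3": [],
    }
    for name, layout in [("optimal", optimal), ("observed", observed)]:
        print(name, {server: len(shares) for server, shares in layout.items()})
    # optimal:  every server holds 3 shares
    # observed: servers 0-2 hold 4 shares each, server 3 holds none

Both layouts achieve happiness=3 for every upload, so the happiness metric
alone cannot tell them apart.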
Regarding the cause of this, here's my analysis.
Before "servers of happiness" starts, 2N storage servers are selected.
Tahoe2ServerSelector.get_shareholders is mostly responsible for this, with
the help of StorageFarmBroker.get_servers_for_psi. get_servers_for_psi
produces a permuted list of all connected storage servers. I think
that it is the permutation of this list that is intended to provide
storage load balancing across the grid. get_shareholders then grabs the
first 2N of these for its purposes.
At this point, I think everything is still fine. In the scenario I
outlined above, there are only 4 storage servers and 2N is 6. This means
the same storage servers are always going to be returned by
get_servers_for_psi - but they're not always going to be in the same
order. Likewise, taking the first 2N of them always yields the same set of
servers, with the same qualification about ordering.
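To illustrate the idea (a hedged sketch, not get_servers_for_psi itself -
the real code hashes server ids together with the upload's peer selection
index, and the exact hashing details don't matter here):

    from hashlib import sha256

    def permuted_servers(server_ids, peer_selection_index):
        # Order servers by a hash that mixes the server id with the upload's
        # peer selection index, so each upload sees a different ordering of
        # the same set of servers.
        return sorted(server_ids,
                      key=lambda sid: sha256(peer_selection_index + sid).digest())

    servers = [b"server0", b"server1", b"server2", b"server3"]
    for psi in (b"uploadA", b"uploadB", b"uploadC", b"uploadD"):
        # With only 4 servers and 2N = 6, the "first 2N" is always the whole
        # list, just in a per-upload order.
        print(psi, [s.decode() for s in permuted_servers(servers, psi)[:6]])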
get_shareholders does a little more work to make sure the storage servers
are actually available and splits them into read/write and read-only
servers. None of this makes a difference to the outcome of the scenario
described above, though.
The next relevant thing that happens is that get_shareholders uses
PeerSelector.get_share_placements to compute a placement map that achieves
the happiness target.
I think PeerSelector.get_share_placements is where things go wrong. The
PeerSelector does not have the ordered list of 2N servers; instead, it has
some sets of servers. It passes those sets along to the servers of
happiness algorithm (happiness_upload.share_placement), but by that point
the load-balancing problem is no longer solvable. With only sets of servers
and the original permuted-list ordering lost, share_placement makes up an
order of its own (lexicographical on server id, I think). This is where the
preference for servers 0-2 comes in and server 3 gets left out - every time.
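Here is a toy illustration of that failure mode (hypothetical function, not
happiness_upload.share_placement itself; the point is only what happens once
the permuted order has been collapsed into a set):

    def place_shares(servers, shares):
        # The permutation is gone; all we have is an unordered set of servers,
        # so any deterministic tie-break (here: sorting by server id) picks
        # the same N servers for every upload.
        ordered = sorted(servers)
        return dict(zip(shares, ordered))

    servers = {"server2", "server0", "server3", "server1"}
    for upload in ("A", "B", "C", "D"):
        print(upload, place_shares(servers, ["shareX", "shareY", "shareZ"]))
    # Every upload lands on server0, server1, server2; server3 never gets
    # a share.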
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3022>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage