[tahoe-lafs-trac-stream] [Tahoe-LAFS] #3022: Servers of happiness share placement distributes storage load unevenly in small grids
Tahoe-LAFS
trac at tahoe-lafs.org
Thu Apr 4 03:00:17 UTC 2019
#3022: Servers of happiness share placement distributes storage load unevenly in
small grids
-------------------------+------------------------------------------
Reporter: exarkun | Owner:
Type: defect | Status: new
Priority: normal | Milestone: undecided
Component: unknown | Version: 1.12.1
Resolution: | Keywords: servers-of-happiness, upload
Launchpad Bug: |
-------------------------+------------------------------------------
Comment (by ccx):
To me there seems to be conflict of requirements, or maybe just
underspecification.
The configuration documentation about [#pref peers.preferred] outlines
behavior and some use-cases of putting some peers on top of list of those
considered for reading and writing.
The current algorithm, as described informally above and in length in
[#placements SoH document], selects 2*N top peers of the list
and makes arbitrary (likely pythonhash-based) selection from them as the
desired share to peer mapping.
In ticket:610#comment:16 it's suggested that window for considering peers
for
holding data should be even larger if existing shares are found.
There is obvious issue of the ordering of peers as returned from
get_servers_for_psi not being respected, but I'd like us to pause for a
moment
and look at whether fixing that actually addresses underlying issues.
The Servers of Happiness was devised as a metric of healthiness of a file,
namely for the purposes of "upload".
It consists of the number of peers that have distinct shares which are in
the top 2*N
of the peer list for given Storage Index.
Furthermore a Servers of Happiness Placement Algorithm was designed that
uses
this metric internally as a constraint on where to upload shares.
From reading the relevant tickets and paper, I gather there were two
goals:
* maximize the redundancy for the cases of peers becoming unavailable
* minimize the amount of shares in need of uploading should some be
already
available
Not only does the design of that algorithm ignore the preferred ordering
of
peers for assigning new uploads, but I find the second goal is in conflict
of
the use-case described in [#pref peers.preferred] of "… prefer their local
servers so that they can maintain access to all of their uploads without
using
the internet."
It's worth noting that the "upload" mechanism is used for a lot of things
other
than fresh upload to a grid.
Notably the check-and-repair mechanism from what I understood hooks
download
and upload component together in a way that relies on the upload mechanism
to
reuse shares already accessible in the top 2*N (or should it be 4*N?)
peers for given Storage Index.
This possibly achieves, given low enough peer turnover and frequent
repairs,
moving the shares to the servers that are among the ones that are searched
first but not necessarily the actual top of the list.
Garbage collection could potentially remove the old shares on peers down
the
list, but the current checker can only renew leases on all reachable peers
or
not at all.
The comments in happiness_upload suggest that the attempted reupload
actually
refreshes the leases, so counterintuitively you should not specify that
you
want leases refreshed on repair unless you want redundant placement!
But as far as I understood the reupload is not attempted unless the file
is
considered unhealthy by some metric so there doesn't seem to be a
mechanism for
letting only redundant shares be collected.
Also of note is that various checkers use it's own implementation of
Servers of
Happiness metric calculation in src/allmydata/util/happinessutil.py that
only
partly reuses the happiness_upload code.
I assume this is to eliminate the 2*N limit that is imposed on uploads.
The deeper I dove in the related tickets (there are many!) the more of ad-
hoc
unspecified requirement discussion I discovered.
Here I try to distill those that I've found that may affect desired
placement
of shares over peers:
* resilience (happiness)
- The happiness number (cardinality of maximum match) presents a metric
of
resiliency against peers becoming unavailable.
- Peers are considered atomic units of failure, which while
oversimplified is
pretty much the underlying abstraction for Tahoe-LAFS as it is now.
- In face of more shares on a single peer the happiness number is not
actual
resiliency metric but a lower bound of it. (The spec discusses this)
* permutation distance
- Distance of available shares from the peers chosen by Storage Index
based
permutation.
* locality
- Ability to reconstruct files from local peers only (possibly specified
by
peers.preferred or some other mechanism).
- This means having k shares available locally, even if there is less
than k
local peers.
* localized redundancy
- Amount of shares of single file placed on a single peer.
- If this is larger than k then it's superfluous according to the
failure
model as we can reconstruct all shares from k.
* global redundancy
- Amount of shares duplicated across the grid.
* bandwidth efficiency
- There is incentive to not waste bandwidth by uploading shares that are
already found on some peers.
* balance
- Even spread of shares across peers. The issue at the origin of this
ticket.
- Should by achievable by minimizing redundancy and permutation
distance.
The ticket:1212#comment:26 discusses the difference of requirements
between
user-initiated upload and a repair.
While there seemed to be an agreement to split them, I don't think that
got
worked on.
There are quite a few tickets for rebalancing files after initial upload,
e.g.:
ticket:699, ticket:232 and even ticket:543 which proposes central
rebalancing
manager.
I should add that the concept of central balancer is incompatible with the
feature of locality, while relocating check-and-repair would support it.
While extreme imbalance should be less of an issue with Servers of
Happiness
Share Placement than it was with previous algorithm, the share_placement
function will try to always allocate all of the shares, which in absence
of
enough connected peers can mean they all end up clustered on one or few
peers, possibly with superfluous redundancy (>k).
This can happen both when having very low required happiness value and on
repair where that value is artificially set to zero.
There were some proposals to disallow multiple shares per peer
(e.g. ticket:1212#comment:35) or to allow capping their amount to supplied
number.
This would make the behavior closer to RAID and make the space usage
predictable despite shaky interconnection.
There is a lot of requests for allowing uploader select peers
specifically,
usually for purposes of locality (ticket:573, ticket:467) but curiously
also
for anti-locality (making sure of having shares non-locally) in ticket:481
for
purpose of migrating data from dying drive.
I didn't dive deeply into garbage collection tickets, but I presume the
above mentioned issues with rebalancing and removing redundant shares
would be
well represented.
The upload.py API is mostly described in ticket:1382, while bulk of
rationale
for Servers of Happiness as a metric is in ticket:778.
The share placement is determined once, at allocation time, and does not
reflect possibility of upload failing (ticket:2108)
That's about what I managed to discover while spelunking the code and
tickets
that may affect how to actually define the requirements (and their
relative
precedence) for the share placement function(s) that we want.
----
* [=#pref peers.preferred] https://tahoe-lafs.org/trac/tahoe-
lafs/browser/trunk/docs/configuration.rst?rev=fad4ffe37edaf7ca7497f08bc543062d93ca4f99#L662
* [=#placements SoH document]:: https://tahoe-lafs.org/trac/tahoe-
lafs/browser/trunk/docs/specifications/servers-of-
happiness.rst?rev=fad4ffe37edaf7ca7497f08bc543062d93ca4f99#L106
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3022#comment:1>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage
More information about the tahoe-lafs-trac-stream
mailing list