.. -*- coding: utf-8-with-signature -*-

====================
Servers of Happiness
====================

When you upload a file to a Tahoe-LAFS grid, you expect that it will
stay there for a while, and that it will do so even if a few of the
peers on the grid stop working, or if something else goes wrong. An
upload health metric helps to make sure that this actually happens.
An upload health metric is a test that looks at a file on a Tahoe-LAFS
grid and says whether or not that file is healthy; that is, whether it
is distributed on the grid in such a way as to ensure that it will
probably survive in good enough shape to be recoverable, even if a few
things go wrong between the time of the test and the time that it is
recovered. Our current upload health metric for immutable files is called
'servers-of-happiness'; its predecessor was called 'shares-of-happiness'.

shares-of-happiness used the number of encoded shares generated by a
file upload to say whether or not it was healthy. If there were more
shares than a user-configurable threshold, the file was reported to be
healthy; otherwise, it was reported to be unhealthy. In normal
situations, the upload process would distribute shares fairly evenly
over the peers in the grid, and in that case shares-of-happiness
worked fine. However, because it only considered the number of shares,
and not where they were on the grid, it could not detect situations
where a file was unhealthy because most or all of the shares generated
from the file were stored on one or two peers.

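In other words, the old check amounted to little more than counting
shares. A minimal illustrative sketch in Python (the names here are
hypothetical, not Tahoe-LAFS APIs)::

    def shares_of_happiness_healthy(placed_shares, threshold):
        # Old metric: only how many distinct shares were placed matters,
        # not which peers ended up holding them.
        return len(set(placed_shares)) >= threshold

    # All ten shares crammed onto a single peer still looked "healthy":
    assert shares_of_happiness_healthy(range(10), threshold=7)
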
servers-of-happiness addresses this by extending the share-focused
upload health metric to also consider the location of the shares on
the grid. servers-of-happiness looks at the mapping of peers to the
shares that they hold, and compares the cardinality of the largest
happy subset of those to a user-configurable threshold. A happy subset
of peers has the property that any k (where k is as in k-of-n encoding)
peers within the subset can reconstruct the source file. This definition
of file health provides a stronger assurance of file availability over
time; with 3-of-10 encoding, and happy=7, a healthy file is still
guaranteed to be available even if 4 peers fail.

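The failure-tolerance arithmetic behind that guarantee is simply
happy - k: after any happy - k peers of a happy subset fail, at least k
peers of the subset remain, and they can reconstruct the file. A toy
check of the numbers used above (illustrative only)::

    k, happy = 3, 7                  # 3-of-10 encoding, happiness threshold 7
    failures_tolerated = happy - k   # peers of the happy subset that may fail
    assert failures_tolerated == 4   # matches the 4-peer guarantee above
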
Measuring Servers of Happiness
==============================

We calculate servers-of-happiness by computing a matching on a
bipartite graph that is related to the layout of shares on the grid.
One set of vertices is the peers on the grid, and the other set of
vertices is the shares. An edge connects a peer and a share if the
peer will (or does, for existing shares) hold the share. The size of
a maximum matching on this graph is the servers-of-happiness value of
the upload; as explained below, it is a lower bound on the size of the
largest happy peer set that exists for the upload.

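For concreteness, the matching computation can be sketched with a
standard augmenting-path maximum bipartite matching. The following
Python sketch is illustrative only; the names and data layout are
assumptions for the example, not the actual Tahoe-LAFS implementation::

    def servers_of_happiness(peers_to_shares):
        """Return the size of a maximum matching between peers and shares.

        ``peers_to_shares`` maps a peer identifier to the set of share
        numbers that the peer holds (or will hold).
        """
        share_to_peer = {}

        def augment(peer, seen_shares):
            # Try to give ``peer`` a share of its own, possibly re-matching
            # the current holder of that share along an augmenting path.
            for share in peers_to_shares[peer]:
                if share in seen_shares:
                    continue
                seen_shares.add(share)
                holder = share_to_peer.get(share)
                if holder is None or augment(holder, seen_shares):
                    share_to_peer[share] = peer
                    return True
            return False

        return sum(1 for peer in peers_to_shares if augment(peer, set()))

    # Ten peers, one distinct share each: happiness is 10.
    assert servers_of_happiness({peer: {peer} for peer in range(10)}) == 10
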
First, note that a bipartite matching of size n corresponds to a happy
subset of size n. This is because a bipartite matching of size n implies
that there are n peers such that each peer holds a share that no other
peer holds. Then any k of those peers collectively hold k distinct
shares, and can restore the file.

A bipartite matching of size n is not necessary for a happy subset of
size n, however (so it is not correct to say that the size of the
maximum matching on this graph is the size of the largest happy subset
of peers that exists for the upload). For example, consider a file with
k = 3, and suppose that every peer holds the same three distinct shares.
Then, since any peer from the original upload can restore the file, if
there are 10 peers holding shares, and the happiness threshold is 7, the
upload should be declared happy, because there is a happy subset of size
10, and 10 > 7. However, since a maximum matching on the bipartite graph
related to this layout has only 3 edges, Tahoe-LAFS declares the upload
unhealthy. Though it is not unhealthy, a share layout like this example
is inefficient; with k = 3 and 10 peers each holding all three shares,
it corresponds to an expansion factor of 10x (each peer stores roughly a
full copy's worth of data). Layouts that are declared healthy by the
bipartite graph matching approach have the property that they correspond
to uploads that are either already relatively efficient in their
utilization of space, or can be made to be so by deleting shares; and
that place all of the shares that they generate, enabling redistribution
of shares later without having to re-encode the file. Also, it is
computationally reasonable to compute a maximum matching in a bipartite
graph, and there are well-studied algorithms to do that.

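Running the illustrative servers_of_happiness sketch from above on this
degenerate layout shows the mismatch: every peer alone can rebuild the
file, yet the matching has only three edges::

    # Ten peers, each holding the same three shares {0, 1, 2}.
    degenerate = {peer: {0, 1, 2} for peer in range(10)}
    assert servers_of_happiness(degenerate) == 3   # below a happy threshold of 7
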
Issues
======

The uploader is good at detecting unhealthy upload layouts, but it
doesn't always know how to make an unhealthy upload into a healthy
upload if it is possible to do so; it attempts to redistribute shares to
achieve happiness, but only in certain circumstances. The redistribution
algorithm isn't optimal, either, so even in these cases it will not
always find a happy layout if one can be arrived at through
redistribution. We are investigating improvements to address these
issues.

We don't use servers-of-happiness for mutable files yet; this fix will
likely come in Tahoe-LAFS version 1.13.


============================
Upload Strategy of Happiness
============================

As mentioned above, the uploader is good at detecting layouts that do
not pass the servers-of-happiness test, but the share distribution
algorithm is not always successful in cases where happiness can be
achieved. A new placement algorithm designed to pass the
servers-of-happiness test, called 'Upload Strategy of Happiness',
addresses these cases where the current uploader is unable to achieve
happiness.

Calculating Share Placements
============================

We calculate share placement like so (an illustrative sketch of the
whole procedure appears after the rationale below):

0. Start with an ordered list of servers. Maybe *2N* of them.

1. Query all servers for existing shares.

   1a. Query remaining space from all servers. Every server that has
   enough free space is considered "readwrite" and every server with too
   little space is "readonly".

2. Construct a bipartite graph G1 of *readonly* servers to pre-existing
   shares, where an edge exists between an arbitrary readonly server S and an
   arbitrary share T if and only if S currently holds T.

3. Calculate a maximum matching graph of G1 (a set of S->T edges that has or
   is-tied-for the highest "happiness score"). There is a clever efficient
   algorithm for this, named "Ford-Fulkerson". There may be more than one
   maximum matching for this graph; we choose one of them arbitrarily, but
   prefer earlier servers. Call this particular placement M1. The placement
   maps shares to servers, where each share appears at most once, and each
   server appears at most once.

4. Construct a bipartite graph G2 of readwrite servers to pre-existing
   shares. Then remove any edge (from G2) that uses a server or a share found
   in M1. Let an edge exist between server S and share T if and only if S
   already holds T.

5. Calculate a maximum matching graph of G2, call this M2, again preferring
   earlier servers.

6. Construct a bipartite graph G3 of (only readwrite) servers to
   shares (some shares may already exist on a server). Then remove
   (from G3) any servers and shares used in M1 or M2 (note that we
   retain servers/shares that were in G1/G2 but *not* in the M1/M2
   subsets).

7. Calculate a maximum matching graph of G3, call this M3, preferring earlier
   servers. The final placement table is the union of M1+M2+M3.

8. Renew the shares on their respective servers from M1 and M2.

9. Upload share T to server S if an edge exists between S and T in M3.

10. If any placements from step 9 fail, mark the server as read-only. Go back
    to step 2 (since we may discover a server is/has-become read-only, or has
    failed, during step 9).

Rationale (Step 4): when we see pre-existing shares on read-only servers, we
prefer to rely upon those (rather than the ones on read-write servers), so we
can maybe use the read-write servers for new shares. If we picked the
read-write server's share, then we couldn't re-use that server for new ones
(we only rely upon each server for one share, more or less).

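To make the three matching passes concrete, here is a compact
illustrative sketch of steps 2 through 7 in Python. Every name in it is
an assumption made for the example, and it reuses the same
augmenting-path matching idea as the earlier sketch; the real uploader
differs in its details::

    def max_matching(edges):
        """Maximum bipartite matching over ``{server: set_of_shares}``,
        returned as a ``{share: server}`` dict."""
        share_to_server = {}

        def augment(server, seen):
            for share in edges.get(server, ()):
                if share in seen:
                    continue
                seen.add(share)
                other = share_to_server.get(share)
                if other is None or augment(other, seen):
                    share_to_server[share] = server
                    return True
            return False

        for server in edges:   # dict order stands in for "prefer earlier servers"
            augment(server, set())
        return share_to_server

    def calculate_placements(readonly_has, readwrite_has,
                             readwrite_servers, all_shares):
        """Steps 2-7: three matching passes over progressively newer edges."""
        # Steps 2-3: match pre-existing shares held by readonly servers (M1).
        m1 = max_matching(readonly_has)
        used_servers, used_shares = set(m1.values()), set(m1)

        # Steps 4-5: match remaining pre-existing shares on readwrite
        # servers, skipping anything already used in M1 (M2).
        g2 = {s: shares - used_shares
              for s, shares in readwrite_has.items() if s not in used_servers}
        m2 = max_matching(g2)
        used_servers |= set(m2.values())
        used_shares |= set(m2)

        # Steps 6-7: match still-unplaced shares to the remaining readwrite
        # servers (M3); these are the shares that actually get uploaded.
        remaining_shares = set(all_shares) - used_shares
        g3 = {s: remaining_shares
              for s in readwrite_servers if s not in used_servers}
        m3 = max_matching(g3)

        # The final placement table is the union of M1, M2 and M3.
        return {**m1, **m2, **m3}, m3

Steps 8-10 (renewing the existing shares, uploading the M3 placements,
and retrying after demoting failed servers to read-only) would wrap a
loop around a function like this, re-running it whenever a placement
fails.
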
Properties of Upload Strategy of Happiness
==========================================

The size of a maximum bipartite matching is bounded by the size of the
smaller set of vertices. Therefore, when there are fewer servers than
shares, the matching leaves some shares without a placement. In this
case the remaining shares are distributed as evenly as possible across
the set of writable servers.

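One minimal way to realise that even spreading, purely as an
illustration (round-robin over the writable servers; the helper below is
hypothetical, not the actual Tahoe-LAFS code)::

    from itertools import cycle

    def spread_evenly(leftover_shares, writable_servers):
        """Assign each share that received no placement to writable
        servers in round-robin order, so share counts differ by at most one."""
        extra = {}
        for share, server in zip(sorted(leftover_shares), cycle(writable_servers)):
            extra.setdefault(server, set()).add(share)
        return extra
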
If the servers-of-happiness criteria can be met, the upload strategy of
happiness guarantees that H shares will be placed on the network. During
file repair, if the set of servers is larger than N, the algorithm will
only attempt to spread shares over N distinct servers. For both initial
file upload and file repair, N should be viewed as the maximum number of
distinct servers that shares can be placed on, and H as the minimum
number. The uploader will fail if the number of distinct servers is less
than H, and it will never attempt to exceed N.