Ticket #911: 911specv2.dpatch

File 911specv2.dpatch, 18.7 KB (added by kevan, at 2010-05-24T00:46:18Z)

revised specification that mentions that servers of happiness only applies to immutable files for now

Line 
1Sun May 23 17:35:08 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
2  * Add a specification for servers of happiness.
3
4New patches:
5
6[Add a specification for servers of happiness.
7Kevan Carstensen <kevan@isnotajoke.com>**20100524003508
8 Ignore-this: 982e2be8a411be5beaf3582bdfde6151
9] {
10addfile ./docs/specifications/servers-of-happiness.txt
11hunk ./docs/specifications/servers-of-happiness.txt 1
12+= Servers of Happiness =
13+
14+When you upload a file to a Tahoe-LAFS grid, you expect that it will
15+stay there for a while, and that it will do so even if a few of the
16+peers on the grid stop working, or if something else goes wrong. An
17+upload health metric helps to make sure that this actually happens. An
18+upload health metric is essentially a test that looks at a file on a
19+Tahoe-LAFS grid and says whether or not that file is healthy; that is,
20+whether it is distributed on the grid in such a way as to ensure that it
21+will probably survive in good enough shape to be recoverable even if a
22+few things go wrong between the time of the test and the time that it is
23+recovered. Our current upload health metric for immutable files is called
24+'servers-of-happiness'; its predecessor was called 'shares-of-happiness'.
25+
26+shares-of-happiness used the number of encoded shares generated by a
27+file upload to say whether or not it was healthy. If there were more
28+shares than a user-configurable threshold, the file was reported to be
29+healthy; otherwise, it was reported to be unhealthy. In normal
30+situations, the upload process would distribute shares fairly evenly
31+over the peers in the grid, and in that case shares-of-happiness
32+worked fine. However, because it only considered the number of shares,
33+and not where they were on the grid, it could not detect situations
34+where a file was unhealthy because most or all of the shares generated
35+from the file were stored on one or two peers.
36+
37+servers-of-happiness addresses this by extending the share-focused
38+upload health metric to also consider the location of the shares on
39+grid. servers-of-happiness looks at the mapping of peers to the shares
40+that they hold, and compares the cardinality of the largest happy subset
41+of those with a user-configurable threshold (A happy subset of peers has
42+the property that any k (where k is as in k-of-n encoding) peers within
43+the subset can reconstruct the source file). This definition of file
44+health provides a stronger assurance of file availability over time;
45+with 3-of-10 encoding, and happy=7, a healthy file is still guaranteed
46+to be available even if 4 peers fail.
47+
48+== Measuring Servers of Happiness ==
49+
50+We calculate servers-of-happiness by computing a matching on a
51+bipartite graph that is related to the layout of shares on the grid.
52+One set of vertices is the peers on the grid, and one set of vertices is
53+the shares. An edge connects a peer and a share if the peer will (or
54+does, for existing shares) hold the share. The size of the maximum
55+matching on this graph is the size of the largest happy peer set that
56+exists for the upload.
57+
58+First, note that a bipartite matching of size n corresponds to a happy
59+subset of size n. This is because a bipartite matching of size n implies
60+that there are n peers such that each peer holds a share that no other
61+peer holds. Then any k of those peers collectively hold k distinct
62+shares, and can restore the file.
63+
64+A bipartite matching of size n is not necessary for a happy subset of
65+size n, however (so it is not correct to say that the size of the
66+maximum matching on this graph is the size of the largest happy subset
67+of peers that exists for the upload). For example, consider a file with
68+k = 3, and suppose that each peer has all three of those pieces.  Then,
69+since any peer from the original upload can restore the file, if there
70+are 10 peers holding shares, and the happiness threshold is 7, the
71+upload should be declared happy, because there is a happy subset of size
72+10, and 10 > 7. However, since a maximum matching on the bipartite graph
73+related to this layout has only 3 edges, Tahoe-LAFS declares the upload
74+unhealthy. Though it is not unhealthy, a share layout like this example
75+is inefficient; for k = 3, and if there are n peers, it corresponds to
76+an expansion factor of 10x. Layouts that are declared healthy by the
77+bipartite graph matching approach have the property that they correspond
78+to uploads that are either already relatively efficient in their
79+utilization of space, or can be made to be so by deleting shares, and
80+that place all of the shares that they generate, enabling redistribution
81+of shares later without having to re-encode the file.  Also, it is
82+computationally reasonable to compute a maximum matching in a bipartite
83+graph, and there are well-studied algorithms to do that.
84+
85+== Issues ==
86+
87+The uploader is good at detecting unhealthy upload layouts, but it
88+doesn't always know how to make an unhealthy upload into a healthy
89+upload if it is possible to do so; it attempts to redistribute shares to
90+achieve happiness, but only in certain circumstances. The redistribution
91+algorithm isn't optimal, either, so even in these cases it will not
92+always find a happy layout if one can be arrived at through
93+redistribution. We are investigating improvements to address these
94+issues.
95+
96+We don't use servers-of-happiness for mutable files yet; this fix will
97+likely come in Tahoe-LAFS version 1.8.
98}
99
100Context:
101
102[Improve code coverage of the Tahoe2PeerSelector tests.
103Kevan Carstensen <kevan@isnotajoke.com>**20100515032913
104 Ignore-this: 793151b63ffa65fdae6915db22d9924a
105] 
106[Remove a comment that no longer makes sense.
107Kevan Carstensen <kevan@isnotajoke.com>**20100514203516
108 Ignore-this: 956983c7e7c7e4477215494dfce8f058
109] 
110[docs: update docs/architecture.txt to more fully and correctly explain the upload procedure
111zooko@zooko.com**20100514043458
112 Ignore-this: 538b6ea256a49fed837500342092efa3
113] 
114[Fix up the behavior of #778, per reviewers' comments
115Kevan Carstensen <kevan@isnotajoke.com>**20100514004917
116 Ignore-this: 9c20b60716125278b5456e8feb396bff
117 
118   - Make some important utility functions clearer and more thoroughly
119     documented.
120   - Assert in upload.servers_of_happiness that the buckets attributes
121     of PeerTrackers passed to it are mutually disjoint.
122   - Get rid of some silly non-Pythonisms that I didn't see when I first
123     wrote these patches.
124   - Make sure that should_add_server returns true when queried about a
125     shnum that it doesn't know about yet.
126   - Change Tahoe2PeerSelector.preexisting_shares to map a shareid to a set
127     of peerids, alter dependencies to deal with that.
128   - Remove upload.should_add_servers, because it is no longer necessary
129   - Move upload.shares_of_happiness and upload.shares_by_server to a utility
130     file.
131   - Change some points in Tahoe2PeerSelector.
132   - Compute servers_of_happiness using a bipartite matching algorithm that
133     we know is optimal instead of an ad-hoc greedy algorithm that isn't.
134   - Change servers_of_happiness to just take a sharemap as an argument,
135     change its callers to merge existing_shares and used_peers before
136     calling it.
137   - Change an error message in the encoder to be more appropriate for
138     servers of happiness.
139   - Clarify the wording of an error message in immutable/upload.py
140   - Refactor a happiness failure message to happinessutil.py, and make
141     immutable/upload.py and immutable/encode.py use it.
142   - Move the word "only" as far to the right as possible in failure
143     messages.
144   - Use a better definition of progress during peer selection.
145   - Do read-only peer share detection queries in parallel, not sequentially.
146   - Clean up logging semantics; print the query statistics whenever an
147     upload is unsuccessful, not just in one case.
148 
149] 
150[Alter the error message when an upload fails, per some comments in #778.
151Kevan Carstensen <kevan@isnotajoke.com>**20091230210344
152 Ignore-this: ba97422b2f9737c46abeb828727beb1
153 
154 When I first implemented #778, I just altered the error messages to refer to
155 servers where they referred to shares. The resulting error messages weren't
156 very good. These are a bit better.
157] 
158[Change "UploadHappinessError" to "UploadUnhappinessError"
159Kevan Carstensen <kevan@isnotajoke.com>**20091205043037
160 Ignore-this: 236b64ab19836854af4993bb5c1b221a
161] 
162[Alter the error message returned when peer selection fails
163Kevan Carstensen <kevan@isnotajoke.com>**20091123002405
164 Ignore-this: b2a7dc163edcab8d9613bfd6907e5166
165 
166 The Tahoe2PeerSelector returned either NoSharesError or NotEnoughSharesError
167 for a variety of error conditions that weren't informatively described by them.
168 This patch creates a new error, UploadHappinessError, replaces uses of
169 NoSharesError and NotEnoughSharesError with it, and alters the error message
170 raised with the errors to be more in line with the new servers_of_happiness
171 behavior. See ticket #834 for more information.
172] 
173[Eliminate overcounting iof servers_of_happiness in Tahoe2PeerSelector; also reorganize some things.
174Kevan Carstensen <kevan@isnotajoke.com>**20091118014542
175 Ignore-this: a6cb032cbff74f4f9d4238faebd99868
176] 
177[Change stray "shares_of_happiness" to "servers_of_happiness"
178Kevan Carstensen <kevan@isnotajoke.com>**20091116212459
179 Ignore-this: 1c971ba8c3c4d2e7ba9f020577b28b73
180] 
181[Alter Tahoe2PeerSelector to make sure that it recognizes existing shares on readonly servers, fixing an issue in #778
182Kevan Carstensen <kevan@isnotajoke.com>**20091116192805
183 Ignore-this: 15289f4d709e03851ed0587b286fd955
184] 
185[Alter 'immutable/encode.py' and 'immutable/upload.py' to use servers_of_happiness instead of shares_of_happiness.
186Kevan Carstensen <kevan@isnotajoke.com>**20091104111222
187 Ignore-this: abb3283314820a8bbf9b5d0cbfbb57c8
188] 
189[Alter the signature of set_shareholders in IEncoder to add a 'servermap' parameter, which gives IEncoders enough information to perform a sane check for servers_of_happiness.
190Kevan Carstensen <kevan@isnotajoke.com>**20091104033241
191 Ignore-this: b3a6649a8ac66431beca1026a31fed94
192] 
193[Alter CiphertextDownloader to work with servers_of_happiness
194Kevan Carstensen <kevan@isnotajoke.com>**20090924041932
195 Ignore-this: e81edccf0308c2d3bedbc4cf217da197
196] 
197[Revisions of the #778 tests, per reviewers' comments
198Kevan Carstensen <kevan@isnotajoke.com>**20100514012542
199 Ignore-this: 735bbc7f663dce633caeb3b66a53cf6e
200 
201 - Fix comments and confusing naming.
202 - Add tests for the new error messages suggested by David-Sarah
203   and Zooko.
204 - Alter existing tests for new error messages.
205 - Make sure that the tests continue to work with the trunk.
206 - Add a test for a mutual disjointedness assertion that I added to
207   upload.servers_of_happiness.
208 - Fix the comments to correctly reflect read-onlyness
209 - Add a test for an edge case in should_add_server
210 - Add an assertion to make sure that share redistribution works as it
211   should
212 - Alter tests to work with revised servers_of_happiness semantics
213 - Remove tests for should_add_server, since that function no longer exists.
214 - Alter tests to know about merge_peers, and to use it before calling
215   servers_of_happiness.
216 - Add tests for merge_peers.
217 - Add Zooko's puzzles to the tests.
218 - Edit encoding tests to expect the new kind of failure message.
219 - Edit tests to expect error messages with the word "only" moved as far
220   to the right as possible.
221 - Extended and cleaned up some helper functions.
222 - Changed some tests to call more appropriate helper functions.
223 - Added a test for the failing redistribution algorithm
224 - Added a test for the progress message
225 - Added a test for the upper bound on readonly peer share discovery.
226 
227] 
228[Alter various unit tests to work with the new happy behavior
229Kevan Carstensen <kevan@isnotajoke.com>**20100107181325
230 Ignore-this: 132032bbf865e63a079f869b663be34a
231] 
232[Replace "UploadHappinessError" with "UploadUnhappinessError" in tests.
233Kevan Carstensen <kevan@isnotajoke.com>**20091205043453
234 Ignore-this: 83f4bc50c697d21b5f4e2a4cd91862ca
235] 
236[Add tests for the behavior described in #834.
237Kevan Carstensen <kevan@isnotajoke.com>**20091123012008
238 Ignore-this: d8e0aa0f3f7965ce9b5cea843c6d6f9f
239] 
240[Re-work 'test_upload.py' to be more readable; add more tests for #778
241Kevan Carstensen <kevan@isnotajoke.com>**20091116192334
242 Ignore-this: 7e8565f92fe51dece5ae28daf442d659
243] 
244[Test Tahoe2PeerSelector to make sure that it recognizeses existing shares on readonly servers
245Kevan Carstensen <kevan@isnotajoke.com>**20091109003735
246 Ignore-this: 12f9b4cff5752fca7ed32a6ebcff6446
247] 
248[Add more tests for comment:53 in ticket #778
249Kevan Carstensen <kevan@isnotajoke.com>**20091104112849
250 Ignore-this: 3bb2edd299a944cc9586e14d5d83ec8c
251] 
252[Add a test for upload.shares_by_server
253Kevan Carstensen <kevan@isnotajoke.com>**20091104111324
254 Ignore-this: f9802e82d6982a93e00f92e0b276f018
255] 
256[Minor tweak to an existing test -- make the first server read-write, instead of read-only
257Kevan Carstensen <kevan@isnotajoke.com>**20091104034232
258 Ignore-this: a951a46c93f7f58dd44d93d8623b2aee
259] 
260[Alter tests to use the new form of set_shareholders
261Kevan Carstensen <kevan@isnotajoke.com>**20091104033602
262 Ignore-this: 3deac11fc831618d11441317463ef830
263] 
264[Refactor some behavior into a mixin, and add tests for the behavior described in #778
265"Kevan Carstensen" <kevan@isnotajoke.com>**20091030091908
266 Ignore-this: a6f9797057ca135579b249af3b2b66ac
267] 
268[Alter NoNetworkGrid to allow the creation of readonly servers for testing purposes.
269Kevan Carstensen <kevan@isnotajoke.com>**20091018013013
270 Ignore-this: e12cd7c4ddeb65305c5a7e08df57c754
271] 
272[Update 'docs/architecture.txt' to reflect readonly share discovery
273kevan@isnotajoke.com**20100514003852
274 Ignore-this: 7ead71b34df3b1ecfdcfd3cb2882e4f9
275] 
276[Alter the wording in docs/architecture.txt to more accurately describe the servers_of_happiness behavior.
277Kevan Carstensen <kevan@isnotajoke.com>**20100428002455
278 Ignore-this: 6eff7fa756858a1c6f73728d989544cc
279] 
280[Alter wording in 'interfaces.py' to be correct wrt #778
281"Kevan Carstensen" <kevan@isnotajoke.com>**20091205034005
282 Ignore-this: c9913c700ac14e7a63569458b06980e0
283] 
284[Update 'docs/configuration.txt' to reflect the servers_of_happiness behavior.
285Kevan Carstensen <kevan@isnotajoke.com>**20091205033813
286 Ignore-this: 5e1cb171f8239bfb5b565d73c75ac2b8
287] 
288[Clarify quickstart instructions for installing pywin32
289david-sarah@jacaranda.org**20100511180300
290 Ignore-this: d4668359673600d2acbc7cd8dd44b93c
291] 
292[web: add a simple test that you can load directory.xhtml
293zooko@zooko.com**20100510063729
294 Ignore-this: e49b25fa3c67b3c7a56c8b1ae01bb463
295] 
296[setup: fix typos in misc/show-tool-versions.py
297zooko@zooko.com**20100510063615
298 Ignore-this: 2181b1303a0e288e7a9ebd4c4855628
299] 
300[setup: show code-coverage tool versions in show-tools-versions.py
301zooko@zooko.com**20100510062955
302 Ignore-this: 4b4c68eb3780b762c8dbbd22b39df7cf
303] 
304[docs: update README, mv it to README.txt, update setup.py
305zooko@zooko.com**20100504094340
306 Ignore-this: 40e28ca36c299ea1fd12d3b91e5b421c
307] 
308[Dependency on Windmill test framework is not needed yet.
309david-sarah@jacaranda.org**20100504161043
310 Ignore-this: be088712bec650d4ef24766c0026ebc8
311] 
312[tests: pass z to tar so that BSD tar will know to ungzip
313zooko@zooko.com**20100504090628
314 Ignore-this: 1339e493f255e8fc0b01b70478f23a09
315] 
316[setup: update comments and URLs in setup.cfg
317zooko@zooko.com**20100504061653
318 Ignore-this: f97692807c74bcab56d33100c899f829
319] 
320[setup: reorder and extend the show-tool-versions script, the better to glean information about our new buildslaves
321zooko@zooko.com**20100504045643
322 Ignore-this: 836084b56b8d4ee8f1de1f4efb706d36
323] 
324[CLI: Support for https url in option --node-url
325Francois Deppierraz <francois@ctrlaltdel.ch>**20100430185609
326 Ignore-this: 1717176b4d27c877e6bc67a944d9bf34
327 
328 This patch modifies the regular expression used for verifying of '--node-url'
329 parameter.  Support for accessing a Tahoe gateway over HTTPS was already
330 present, thanks to Python's urllib.
331 
332] 
333[backupdb.did_create_directory: use REPLACE INTO, not INSERT INTO + ignore error
334Brian Warner <warner@lothar.com>**20100428050803
335 Ignore-this: 1fca7b8f364a21ae413be8767161e32f
336 
337 This handles the case where we upload a new tahoe directory for a
338 previously-processed local directory, possibly creating a new dircap (if the
339 metadata had changed). Now we replace the old dirhash->dircap record. The
340 previous behavior left the old record in place (with the old dircap and
341 timestamps), so we'd never stop creating new directories and never converge
342 on a null backup.
343] 
344["tahoe webopen": add --info flag, to get ?t=info
345Brian Warner <warner@lothar.com>**20100424233003
346 Ignore-this: 126b0bb6db340fabacb623d295eb45fa
347 
348 Also fix some trailing whitespace.
349] 
350[docs: install.html http-equiv refresh to quickstart.html
351zooko@zooko.com**20100421165708
352 Ignore-this: 52b4b619f9dde5886ae2cd7f1f3b734b
353] 
354[docs: install.html -> quickstart.html
355zooko@zooko.com**20100421155757
356 Ignore-this: 6084e203909306bed93efb09d0e6181d
357 It is not called "installing" because that implies that it is going to change the configuration of your operating system. It is not called "building" because that implies that you need developer tools like a compiler. Also I added a stern warning against looking at the "InstallDetails" wiki page, which I have renamed to "AdvancedInstall".
358] 
359[Fix another typo in tahoe_storagespace munin plugin
360david-sarah@jacaranda.org**20100416220935
361 Ignore-this: ad1f7aa66b554174f91dfb2b7a3ea5f3
362] 
363[Add dependency on windmill >= 1.3
364david-sarah@jacaranda.org**20100416190404
365 Ignore-this: 4437a7a464e92d6c9012926b18676211
366] 
367[licensing: phrase the OpenSSL-exemption in the vocabulary of copyright instead of computer technology, and replicate the exemption from the GPL to the TGPPL
368zooko@zooko.com**20100414232521
369 Ignore-this: a5494b2f582a295544c6cad3f245e91
370] 
371[munin-tahoe_storagespace
372freestorm77@gmail.com**20100221203626
373 Ignore-this: 14d6d6a587afe1f8883152bf2e46b4aa
374 
375 Plugin configuration rename
376 
377] 
378[setup: add licensing declaration for setuptools (noticed by the FSF compliance folks)
379zooko@zooko.com**20100309184415
380 Ignore-this: 2dfa7d812d65fec7c72ddbf0de609ccb
381] 
382[setup: fix error in licensing declaration from Shawn Willden, as noted by the FSF compliance division
383zooko@zooko.com**20100309163736
384 Ignore-this: c0623d27e469799d86cabf67921a13f8
385] 
386[CREDITS to Jacob Appelbaum
387zooko@zooko.com**20100304015616
388 Ignore-this: 70db493abbc23968fcc8db93f386ea54
389] 
390[desert-island-build-with-proper-versions
391jacob@appelbaum.net**20100304013858] 
392[docs: a few small edits to try to guide newcomers through the docs
393zooko@zooko.com**20100303231902
394 Ignore-this: a6aab44f5bf5ad97ea73e6976bc4042d
395 These edits were suggested by my watching over Jake Appelbaum's shoulder as he completely ignored/skipped/missed install.html and also as he decided that debian.txt wouldn't help him with basic installation. Then I threw in a few docs edits that have been sitting around in my sandbox asking to be committed for months.
396] 
397[TAG allmydata-tahoe-1.6.1
398david-sarah@jacaranda.org**20100228062314
399 Ignore-this: eb5f03ada8ea953ee7780e7fe068539
400] 
401Patch bundle hash:
402bc5771f7bd7c00e695fc44a2311cc4bb880a4a91