Ticket #911: 911.darcspatch.txt

File 911.darcspatch.txt, 18.6 KB (added by kevan, at 2010-05-19T00:59:20Z)
Line 
1Tue May 18 17:46:22 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
2  * Add a specification for servers of happiness.
3
4New patches:
5
6[Add a specification for servers of happiness.
7Kevan Carstensen <kevan@isnotajoke.com>**20100519004622
8 Ignore-this: 34a025523b7b776e7e45d69cfbe236f9
9] {
10addfile ./docs/specifications/servers-of-happiness.txt
11hunk ./docs/specifications/servers-of-happiness.txt 1
12+= Servers of Happiness =
13+
14+When you upload a file to a Tahoe-LAFS grid, you expect that it will
15+stay there for a while, and that it will do so even if a few of the
16+peers on the grid stop working, or if something else goes wrong. An
17+upload health metric helps to make sure that this actually happens. An
18+upload health metric is essentially a test that looks at a file on a
19+Tahoe-LAFS grid and says whether or not that file is healthy; that is,
20+whether it is distributed on the grid in such a way as to ensure that it
21+will probably survive in good enough shape to be recoverable even if a
22+few things go wrong between the time of the test and the time that it is
23+recovered. Our current upload health metric is called
24+'servers-of-happiness'; its predecessor was called 'shares-of-happiness'.
25+
26+shares-of-happiness used the number of encoded shares generated by a
27+file upload to say whether or not it was healthy. If there were more
28+shares than a user-configurable threshold, the file was reported to be
29+healthy; otherwise, it was reported to be unhealthy. In normal
30+situations, the upload process would distribute shares fairly evenly
31+over the peers in the grid, and in that case shares-of-happiness
32+worked fine. However, because it only considered the number of shares,
33+and not where they were on the grid, it could not detect situations
34+where a file was unhealthy because most or all of the shares generated
35+from the file were stored on one or two peers.
36+
37+servers-of-happiness addresses this by extending the share-focused
38+upload health metric to also consider the location of the shares on
39+grid. servers-of-happiness looks at the mapping of peers to the shares
40+that they hold, and compares the cardinality of the largest happy subset
41+of those with a user-configurable threshold (A happy subset of peers has
42+the property that any k (where k is as in k-of-n encoding) peers within
43+the subset can reconstruct the source file). This definition of file
44+health provides a stronger assurance of file availability over time;
45+with 3-of-10 encoding, and happy=7, a healthy file is still guaranteed
46+to be available even if 4 peers fail.
47+
48+== Measuring Servers of Happiness ==
49+
50+We calculate servers-of-happiness by computing a matching on a
51+bipartite graph that is related to the layout of shares on the grid.
52+One set of vertices is the peers on the grid, and one set of vertices is
53+the shares. An edge connects a peer and a share if the peer will (or
54+does, for existing shares) hold the share. The size of the maximum
55+matching on this graph is the size of the largest happy peer set that
56+exists for the upload.
57+
58+First, note that a bipartite matching of size n corresponds to a happy
59+subset of size n. This is because a bipartite matching of size n implies
60+that there are n peers such that each peer holds a share that no other
61+peer holds. Then any k of those peers collectively hold k distinct
62+shares, and can restore the file.
63+
64+A bipartite matching of size n is not necessary for a happy subset of
65+size n, however (so it is not correct to say that the size of the
66+maximum matching on this graph is the size of the largest happy subset
67+of peers that exists for the upload). For example, consider a file with
68+k = 3, and suppose that each peer has all three of those pieces.  Then,
69+since any peer from the original upload can restore the file, if there
70+are 10 peers holding shares, and the happiness threshold is 7, the
71+upload should be declared happy, because there is a happy subset of size
72+10, and 10 > 7. However, since a maximum matching on the bipartite graph
73+related to this layout has only 3 edges, Tahoe-LAFS declares the upload
74+unhealthy. Though it is not unhealthy, a share layout like this example
75+is inefficient; for k = 3, and if there are n peers, it corresponds to
76+an expansion factor of 10x. Layouts that are declared healthy by the
77+bipartite graph matching approach have the property that they correspond
78+to uploads that are either already relatively efficient in their
79+utilization of space, or can be made to be so by deleting shares, and
80+that place all of the shares that they generate, enabling redistribution
81+of shares later without having to re-encode the file.  Also, it is
82+computationally reasonable to compute a maximum matching in a bipartite
83+graph, and there are well-studied algorithms to do that.
84+
85+== Issues ==
86+
87+The uploader is good at detecting unhealthy upload layouts, but it
88+doesn't always know how to make an unhealthy upload into a healthy
89+upload if it is possible to do so; it attempts to redistribute shares to
90+achieve happiness, but only in certain circumstances. The redistribution
91+algorithm isn't optimal, either, so even in these cases it will not
92+always find a happy layout if one can be arrived at through
93+redistribution. We are investigating improvements to address these
94+issues.
95}
96
97Context:
98
99[Improve code coverage of the Tahoe2PeerSelector tests.
100Kevan Carstensen <kevan@isnotajoke.com>**20100515032913
101 Ignore-this: 793151b63ffa65fdae6915db22d9924a
102]
103[Remove a comment that no longer makes sense.
104Kevan Carstensen <kevan@isnotajoke.com>**20100514203516
105 Ignore-this: 956983c7e7c7e4477215494dfce8f058
106]
107[docs: update docs/architecture.txt to more fully and correctly explain the upload procedure
108zooko@zooko.com**20100514043458
109 Ignore-this: 538b6ea256a49fed837500342092efa3
110]
111[Fix up the behavior of #778, per reviewers' comments
112Kevan Carstensen <kevan@isnotajoke.com>**20100514004917
113 Ignore-this: 9c20b60716125278b5456e8feb396bff
114 
115   - Make some important utility functions clearer and more thoroughly
116     documented.
117   - Assert in upload.servers_of_happiness that the buckets attributes
118     of PeerTrackers passed to it are mutually disjoint.
119   - Get rid of some silly non-Pythonisms that I didn't see when I first
120     wrote these patches.
121   - Make sure that should_add_server returns true when queried about a
122     shnum that it doesn't know about yet.
123   - Change Tahoe2PeerSelector.preexisting_shares to map a shareid to a set
124     of peerids, alter dependencies to deal with that.
125   - Remove upload.should_add_servers, because it is no longer necessary
126   - Move upload.shares_of_happiness and upload.shares_by_server to a utility
127     file.
128   - Change some points in Tahoe2PeerSelector.
129   - Compute servers_of_happiness using a bipartite matching algorithm that
130     we know is optimal instead of an ad-hoc greedy algorithm that isn't.
131   - Change servers_of_happiness to just take a sharemap as an argument,
132     change its callers to merge existing_shares and used_peers before
133     calling it.
134   - Change an error message in the encoder to be more appropriate for
135     servers of happiness.
136   - Clarify the wording of an error message in immutable/upload.py
137   - Refactor a happiness failure message to happinessutil.py, and make
138     immutable/upload.py and immutable/encode.py use it.
139   - Move the word "only" as far to the right as possible in failure
140     messages.
141   - Use a better definition of progress during peer selection.
142   - Do read-only peer share detection queries in parallel, not sequentially.
143   - Clean up logging semantics; print the query statistics whenever an
144     upload is unsuccessful, not just in one case.
145 
146]
147[Alter the error message when an upload fails, per some comments in #778.
148Kevan Carstensen <kevan@isnotajoke.com>**20091230210344
149 Ignore-this: ba97422b2f9737c46abeb828727beb1
150 
151 When I first implemented #778, I just altered the error messages to refer to
152 servers where they referred to shares. The resulting error messages weren't
153 very good. These are a bit better.
154]
155[Change "UploadHappinessError" to "UploadUnhappinessError"
156Kevan Carstensen <kevan@isnotajoke.com>**20091205043037
157 Ignore-this: 236b64ab19836854af4993bb5c1b221a
158]
159[Alter the error message returned when peer selection fails
160Kevan Carstensen <kevan@isnotajoke.com>**20091123002405
161 Ignore-this: b2a7dc163edcab8d9613bfd6907e5166
162 
163 The Tahoe2PeerSelector returned either NoSharesError or NotEnoughSharesError
164 for a variety of error conditions that weren't informatively described by them.
165 This patch creates a new error, UploadHappinessError, replaces uses of
166 NoSharesError and NotEnoughSharesError with it, and alters the error message
167 raised with the errors to be more in line with the new servers_of_happiness
168 behavior. See ticket #834 for more information.
169]
170[Eliminate overcounting iof servers_of_happiness in Tahoe2PeerSelector; also reorganize some things.
171Kevan Carstensen <kevan@isnotajoke.com>**20091118014542
172 Ignore-this: a6cb032cbff74f4f9d4238faebd99868
173]
174[Change stray "shares_of_happiness" to "servers_of_happiness"
175Kevan Carstensen <kevan@isnotajoke.com>**20091116212459
176 Ignore-this: 1c971ba8c3c4d2e7ba9f020577b28b73
177]
178[Alter Tahoe2PeerSelector to make sure that it recognizes existing shares on readonly servers, fixing an issue in #778
179Kevan Carstensen <kevan@isnotajoke.com>**20091116192805
180 Ignore-this: 15289f4d709e03851ed0587b286fd955
181]
182[Alter 'immutable/encode.py' and 'immutable/upload.py' to use servers_of_happiness instead of shares_of_happiness.
183Kevan Carstensen <kevan@isnotajoke.com>**20091104111222
184 Ignore-this: abb3283314820a8bbf9b5d0cbfbb57c8
185]
186[Alter the signature of set_shareholders in IEncoder to add a 'servermap' parameter, which gives IEncoders enough information to perform a sane check for servers_of_happiness.
187Kevan Carstensen <kevan@isnotajoke.com>**20091104033241
188 Ignore-this: b3a6649a8ac66431beca1026a31fed94
189]
190[Alter CiphertextDownloader to work with servers_of_happiness
191Kevan Carstensen <kevan@isnotajoke.com>**20090924041932
192 Ignore-this: e81edccf0308c2d3bedbc4cf217da197
193]
194[Revisions of the #778 tests, per reviewers' comments
195Kevan Carstensen <kevan@isnotajoke.com>**20100514012542
196 Ignore-this: 735bbc7f663dce633caeb3b66a53cf6e
197 
198 - Fix comments and confusing naming.
199 - Add tests for the new error messages suggested by David-Sarah
200   and Zooko.
201 - Alter existing tests for new error messages.
202 - Make sure that the tests continue to work with the trunk.
203 - Add a test for a mutual disjointedness assertion that I added to
204   upload.servers_of_happiness.
205 - Fix the comments to correctly reflect read-onlyness
206 - Add a test for an edge case in should_add_server
207 - Add an assertion to make sure that share redistribution works as it
208   should
209 - Alter tests to work with revised servers_of_happiness semantics
210 - Remove tests for should_add_server, since that function no longer exists.
211 - Alter tests to know about merge_peers, and to use it before calling
212   servers_of_happiness.
213 - Add tests for merge_peers.
214 - Add Zooko's puzzles to the tests.
215 - Edit encoding tests to expect the new kind of failure message.
216 - Edit tests to expect error messages with the word "only" moved as far
217   to the right as possible.
218 - Extended and cleaned up some helper functions.
219 - Changed some tests to call more appropriate helper functions.
220 - Added a test for the failing redistribution algorithm
221 - Added a test for the progress message
222 - Added a test for the upper bound on readonly peer share discovery.
223 
224]
225[Alter various unit tests to work with the new happy behavior
226Kevan Carstensen <kevan@isnotajoke.com>**20100107181325
227 Ignore-this: 132032bbf865e63a079f869b663be34a
228]
229[Replace "UploadHappinessError" with "UploadUnhappinessError" in tests.
230Kevan Carstensen <kevan@isnotajoke.com>**20091205043453
231 Ignore-this: 83f4bc50c697d21b5f4e2a4cd91862ca
232]
233[Add tests for the behavior described in #834.
234Kevan Carstensen <kevan@isnotajoke.com>**20091123012008
235 Ignore-this: d8e0aa0f3f7965ce9b5cea843c6d6f9f
236]
237[Re-work 'test_upload.py' to be more readable; add more tests for #778
238Kevan Carstensen <kevan@isnotajoke.com>**20091116192334
239 Ignore-this: 7e8565f92fe51dece5ae28daf442d659
240]
241[Test Tahoe2PeerSelector to make sure that it recognizeses existing shares on readonly servers
242Kevan Carstensen <kevan@isnotajoke.com>**20091109003735
243 Ignore-this: 12f9b4cff5752fca7ed32a6ebcff6446
244]
245[Add more tests for comment:53 in ticket #778
246Kevan Carstensen <kevan@isnotajoke.com>**20091104112849
247 Ignore-this: 3bb2edd299a944cc9586e14d5d83ec8c
248]
249[Add a test for upload.shares_by_server
250Kevan Carstensen <kevan@isnotajoke.com>**20091104111324
251 Ignore-this: f9802e82d6982a93e00f92e0b276f018
252]
253[Minor tweak to an existing test -- make the first server read-write, instead of read-only
254Kevan Carstensen <kevan@isnotajoke.com>**20091104034232
255 Ignore-this: a951a46c93f7f58dd44d93d8623b2aee
256]
257[Alter tests to use the new form of set_shareholders
258Kevan Carstensen <kevan@isnotajoke.com>**20091104033602
259 Ignore-this: 3deac11fc831618d11441317463ef830
260]
261[Refactor some behavior into a mixin, and add tests for the behavior described in #778
262"Kevan Carstensen" <kevan@isnotajoke.com>**20091030091908
263 Ignore-this: a6f9797057ca135579b249af3b2b66ac
264]
265[Alter NoNetworkGrid to allow the creation of readonly servers for testing purposes.
266Kevan Carstensen <kevan@isnotajoke.com>**20091018013013
267 Ignore-this: e12cd7c4ddeb65305c5a7e08df57c754
268]
269[Update 'docs/architecture.txt' to reflect readonly share discovery
270kevan@isnotajoke.com**20100514003852
271 Ignore-this: 7ead71b34df3b1ecfdcfd3cb2882e4f9
272]
273[Alter the wording in docs/architecture.txt to more accurately describe the servers_of_happiness behavior.
274Kevan Carstensen <kevan@isnotajoke.com>**20100428002455
275 Ignore-this: 6eff7fa756858a1c6f73728d989544cc
276]
277[Alter wording in 'interfaces.py' to be correct wrt #778
278"Kevan Carstensen" <kevan@isnotajoke.com>**20091205034005
279 Ignore-this: c9913c700ac14e7a63569458b06980e0
280]
281[Update 'docs/configuration.txt' to reflect the servers_of_happiness behavior.
282Kevan Carstensen <kevan@isnotajoke.com>**20091205033813
283 Ignore-this: 5e1cb171f8239bfb5b565d73c75ac2b8
284]
285[Clarify quickstart instructions for installing pywin32
286david-sarah@jacaranda.org**20100511180300
287 Ignore-this: d4668359673600d2acbc7cd8dd44b93c
288]
289[web: add a simple test that you can load directory.xhtml
290zooko@zooko.com**20100510063729
291 Ignore-this: e49b25fa3c67b3c7a56c8b1ae01bb463
292]
293[setup: fix typos in misc/show-tool-versions.py
294zooko@zooko.com**20100510063615
295 Ignore-this: 2181b1303a0e288e7a9ebd4c4855628
296]
297[setup: show code-coverage tool versions in show-tools-versions.py
298zooko@zooko.com**20100510062955
299 Ignore-this: 4b4c68eb3780b762c8dbbd22b39df7cf
300]
301[docs: update README, mv it to README.txt, update setup.py
302zooko@zooko.com**20100504094340
303 Ignore-this: 40e28ca36c299ea1fd12d3b91e5b421c
304]
305[Dependency on Windmill test framework is not needed yet.
306david-sarah@jacaranda.org**20100504161043
307 Ignore-this: be088712bec650d4ef24766c0026ebc8
308]
309[tests: pass z to tar so that BSD tar will know to ungzip
310zooko@zooko.com**20100504090628
311 Ignore-this: 1339e493f255e8fc0b01b70478f23a09
312]
313[setup: update comments and URLs in setup.cfg
314zooko@zooko.com**20100504061653
315 Ignore-this: f97692807c74bcab56d33100c899f829
316]
317[setup: reorder and extend the show-tool-versions script, the better to glean information about our new buildslaves
318zooko@zooko.com**20100504045643
319 Ignore-this: 836084b56b8d4ee8f1de1f4efb706d36
320]
321[CLI: Support for https url in option --node-url
322Francois Deppierraz <francois@ctrlaltdel.ch>**20100430185609
323 Ignore-this: 1717176b4d27c877e6bc67a944d9bf34
324 
325 This patch modifies the regular expression used for verifying of '--node-url'
326 parameter.  Support for accessing a Tahoe gateway over HTTPS was already
327 present, thanks to Python's urllib.
328 
329]
330[backupdb.did_create_directory: use REPLACE INTO, not INSERT INTO + ignore error
331Brian Warner <warner@lothar.com>**20100428050803
332 Ignore-this: 1fca7b8f364a21ae413be8767161e32f
333 
334 This handles the case where we upload a new tahoe directory for a
335 previously-processed local directory, possibly creating a new dircap (if the
336 metadata had changed). Now we replace the old dirhash->dircap record. The
337 previous behavior left the old record in place (with the old dircap and
338 timestamps), so we'd never stop creating new directories and never converge
339 on a null backup.
340]
341["tahoe webopen": add --info flag, to get ?t=info
342Brian Warner <warner@lothar.com>**20100424233003
343 Ignore-this: 126b0bb6db340fabacb623d295eb45fa
344 
345 Also fix some trailing whitespace.
346]
347[docs: install.html http-equiv refresh to quickstart.html
348zooko@zooko.com**20100421165708
349 Ignore-this: 52b4b619f9dde5886ae2cd7f1f3b734b
350]
351[docs: install.html -> quickstart.html
352zooko@zooko.com**20100421155757
353 Ignore-this: 6084e203909306bed93efb09d0e6181d
354 It is not called "installing" because that implies that it is going to change the configuration of your operating system. It is not called "building" because that implies that you need developer tools like a compiler. Also I added a stern warning against looking at the "InstallDetails" wiki page, which I have renamed to "AdvancedInstall".
355]
356[Fix another typo in tahoe_storagespace munin plugin
357david-sarah@jacaranda.org**20100416220935
358 Ignore-this: ad1f7aa66b554174f91dfb2b7a3ea5f3
359]
360[Add dependency on windmill >= 1.3
361david-sarah@jacaranda.org**20100416190404
362 Ignore-this: 4437a7a464e92d6c9012926b18676211
363]
364[licensing: phrase the OpenSSL-exemption in the vocabulary of copyright instead of computer technology, and replicate the exemption from the GPL to the TGPPL
365zooko@zooko.com**20100414232521
366 Ignore-this: a5494b2f582a295544c6cad3f245e91
367]
368[munin-tahoe_storagespace
369freestorm77@gmail.com**20100221203626
370 Ignore-this: 14d6d6a587afe1f8883152bf2e46b4aa
371 
372 Plugin configuration rename
373 
374]
375[setup: add licensing declaration for setuptools (noticed by the FSF compliance folks)
376zooko@zooko.com**20100309184415
377 Ignore-this: 2dfa7d812d65fec7c72ddbf0de609ccb
378]
379[setup: fix error in licensing declaration from Shawn Willden, as noted by the FSF compliance division
380zooko@zooko.com**20100309163736
381 Ignore-this: c0623d27e469799d86cabf67921a13f8
382]
383[CREDITS to Jacob Appelbaum
384zooko@zooko.com**20100304015616
385 Ignore-this: 70db493abbc23968fcc8db93f386ea54
386]
387[desert-island-build-with-proper-versions
388jacob@appelbaum.net**20100304013858]
389[docs: a few small edits to try to guide newcomers through the docs
390zooko@zooko.com**20100303231902
391 Ignore-this: a6aab44f5bf5ad97ea73e6976bc4042d
392 These edits were suggested by my watching over Jake Appelbaum's shoulder as he completely ignored/skipped/missed install.html and also as he decided that debian.txt wouldn't help him with basic installation. Then I threw in a few docs edits that have been sitting around in my sandbox asking to be committed for months.
393]
394[TAG allmydata-tahoe-1.6.1
395david-sarah@jacaranda.org**20100228062314
396 Ignore-this: eb5f03ada8ea953ee7780e7fe068539
397]
398Patch bundle hash:
3998e2062439a75ecfa0a1303013c60ca4ac2b876ac