Tue May 18 17:46:22 PDT 2010 Kevan Carstensen * Add a specification for servers of happiness. New patches: [Add a specification for servers of happiness. Kevan Carstensen **20100519004622 Ignore-this: 34a025523b7b776e7e45d69cfbe236f9 ] { addfile ./docs/specifications/servers-of-happiness.txt hunk ./docs/specifications/servers-of-happiness.txt 1 += Servers of Happiness = + +When you upload a file to a Tahoe-LAFS grid, you expect that it will +stay there for a while, and that it will do so even if a few of the +peers on the grid stop working, or if something else goes wrong. An +upload health metric helps to make sure that this actually happens. An +upload health metric is essentially a test that looks at a file on a +Tahoe-LAFS grid and says whether or not that file is healthy; that is, +whether it is distributed on the grid in such a way as to ensure that it +will probably survive in good enough shape to be recoverable even if a +few things go wrong between the time of the test and the time that it is +recovered. Our current upload health metric is called +'servers-of-happiness'; its predecessor was called 'shares-of-happiness'. + +shares-of-happiness used the number of encoded shares generated by a +file upload to say whether or not it was healthy. If there were more +shares than a user-configurable threshold, the file was reported to be +healthy; otherwise, it was reported to be unhealthy. In normal +situations, the upload process would distribute shares fairly evenly +over the peers in the grid, and in that case shares-of-happiness +worked fine. However, because it only considered the number of shares, +and not where they were on the grid, it could not detect situations +where a file was unhealthy because most or all of the shares generated +from the file were stored on one or two peers. + +servers-of-happiness addresses this by extending the share-focused +upload health metric to also consider the location of the shares on +grid. servers-of-happiness looks at the mapping of peers to the shares +that they hold, and compares the cardinality of the largest happy subset +of those with a user-configurable threshold (A happy subset of peers has +the property that any k (where k is as in k-of-n encoding) peers within +the subset can reconstruct the source file). This definition of file +health provides a stronger assurance of file availability over time; +with 3-of-10 encoding, and happy=7, a healthy file is still guaranteed +to be available even if 4 peers fail. + +== Measuring Servers of Happiness == + +We calculate servers-of-happiness by computing a matching on a +bipartite graph that is related to the layout of shares on the grid. +One set of vertices is the peers on the grid, and one set of vertices is +the shares. An edge connects a peer and a share if the peer will (or +does, for existing shares) hold the share. The size of the maximum +matching on this graph is the size of the largest happy peer set that +exists for the upload. + +First, note that a bipartite matching of size n corresponds to a happy +subset of size n. This is because a bipartite matching of size n implies +that there are n peers such that each peer holds a share that no other +peer holds. Then any k of those peers collectively hold k distinct +shares, and can restore the file. + +A bipartite matching of size n is not necessary for a happy subset of +size n, however (so it is not correct to say that the size of the +maximum matching on this graph is the size of the largest happy subset +of peers that exists for the upload). For example, consider a file with +k = 3, and suppose that each peer has all three of those pieces. Then, +since any peer from the original upload can restore the file, if there +are 10 peers holding shares, and the happiness threshold is 7, the +upload should be declared happy, because there is a happy subset of size +10, and 10 > 7. However, since a maximum matching on the bipartite graph +related to this layout has only 3 edges, Tahoe-LAFS declares the upload +unhealthy. Though it is not unhealthy, a share layout like this example +is inefficient; for k = 3, and if there are n peers, it corresponds to +an expansion factor of 10x. Layouts that are declared healthy by the +bipartite graph matching approach have the property that they correspond +to uploads that are either already relatively efficient in their +utilization of space, or can be made to be so by deleting shares, and +that place all of the shares that they generate, enabling redistribution +of shares later without having to re-encode the file. Also, it is +computationally reasonable to compute a maximum matching in a bipartite +graph, and there are well-studied algorithms to do that. + +== Issues == + +The uploader is good at detecting unhealthy upload layouts, but it +doesn't always know how to make an unhealthy upload into a healthy +upload if it is possible to do so; it attempts to redistribute shares to +achieve happiness, but only in certain circumstances. The redistribution +algorithm isn't optimal, either, so even in these cases it will not +always find a happy layout if one can be arrived at through +redistribution. We are investigating improvements to address these +issues. } Context: [Improve code coverage of the Tahoe2PeerSelector tests. Kevan Carstensen **20100515032913 Ignore-this: 793151b63ffa65fdae6915db22d9924a ] [Remove a comment that no longer makes sense. Kevan Carstensen **20100514203516 Ignore-this: 956983c7e7c7e4477215494dfce8f058 ] [docs: update docs/architecture.txt to more fully and correctly explain the upload procedure zooko@zooko.com**20100514043458 Ignore-this: 538b6ea256a49fed837500342092efa3 ] [Fix up the behavior of #778, per reviewers' comments Kevan Carstensen **20100514004917 Ignore-this: 9c20b60716125278b5456e8feb396bff - Make some important utility functions clearer and more thoroughly documented. - Assert in upload.servers_of_happiness that the buckets attributes of PeerTrackers passed to it are mutually disjoint. - Get rid of some silly non-Pythonisms that I didn't see when I first wrote these patches. - Make sure that should_add_server returns true when queried about a shnum that it doesn't know about yet. - Change Tahoe2PeerSelector.preexisting_shares to map a shareid to a set of peerids, alter dependencies to deal with that. - Remove upload.should_add_servers, because it is no longer necessary - Move upload.shares_of_happiness and upload.shares_by_server to a utility file. - Change some points in Tahoe2PeerSelector. - Compute servers_of_happiness using a bipartite matching algorithm that we know is optimal instead of an ad-hoc greedy algorithm that isn't. - Change servers_of_happiness to just take a sharemap as an argument, change its callers to merge existing_shares and used_peers before calling it. - Change an error message in the encoder to be more appropriate for servers of happiness. - Clarify the wording of an error message in immutable/upload.py - Refactor a happiness failure message to happinessutil.py, and make immutable/upload.py and immutable/encode.py use it. - Move the word "only" as far to the right as possible in failure messages. - Use a better definition of progress during peer selection. - Do read-only peer share detection queries in parallel, not sequentially. - Clean up logging semantics; print the query statistics whenever an upload is unsuccessful, not just in one case. ] [Alter the error message when an upload fails, per some comments in #778. Kevan Carstensen **20091230210344 Ignore-this: ba97422b2f9737c46abeb828727beb1 When I first implemented #778, I just altered the error messages to refer to servers where they referred to shares. The resulting error messages weren't very good. These are a bit better. ] [Change "UploadHappinessError" to "UploadUnhappinessError" Kevan Carstensen **20091205043037 Ignore-this: 236b64ab19836854af4993bb5c1b221a ] [Alter the error message returned when peer selection fails Kevan Carstensen **20091123002405 Ignore-this: b2a7dc163edcab8d9613bfd6907e5166 The Tahoe2PeerSelector returned either NoSharesError or NotEnoughSharesError for a variety of error conditions that weren't informatively described by them. This patch creates a new error, UploadHappinessError, replaces uses of NoSharesError and NotEnoughSharesError with it, and alters the error message raised with the errors to be more in line with the new servers_of_happiness behavior. See ticket #834 for more information. ] [Eliminate overcounting iof servers_of_happiness in Tahoe2PeerSelector; also reorganize some things. Kevan Carstensen **20091118014542 Ignore-this: a6cb032cbff74f4f9d4238faebd99868 ] [Change stray "shares_of_happiness" to "servers_of_happiness" Kevan Carstensen **20091116212459 Ignore-this: 1c971ba8c3c4d2e7ba9f020577b28b73 ] [Alter Tahoe2PeerSelector to make sure that it recognizes existing shares on readonly servers, fixing an issue in #778 Kevan Carstensen **20091116192805 Ignore-this: 15289f4d709e03851ed0587b286fd955 ] [Alter 'immutable/encode.py' and 'immutable/upload.py' to use servers_of_happiness instead of shares_of_happiness. Kevan Carstensen **20091104111222 Ignore-this: abb3283314820a8bbf9b5d0cbfbb57c8 ] [Alter the signature of set_shareholders in IEncoder to add a 'servermap' parameter, which gives IEncoders enough information to perform a sane check for servers_of_happiness. Kevan Carstensen **20091104033241 Ignore-this: b3a6649a8ac66431beca1026a31fed94 ] [Alter CiphertextDownloader to work with servers_of_happiness Kevan Carstensen **20090924041932 Ignore-this: e81edccf0308c2d3bedbc4cf217da197 ] [Revisions of the #778 tests, per reviewers' comments Kevan Carstensen **20100514012542 Ignore-this: 735bbc7f663dce633caeb3b66a53cf6e - Fix comments and confusing naming. - Add tests for the new error messages suggested by David-Sarah and Zooko. - Alter existing tests for new error messages. - Make sure that the tests continue to work with the trunk. - Add a test for a mutual disjointedness assertion that I added to upload.servers_of_happiness. - Fix the comments to correctly reflect read-onlyness - Add a test for an edge case in should_add_server - Add an assertion to make sure that share redistribution works as it should - Alter tests to work with revised servers_of_happiness semantics - Remove tests for should_add_server, since that function no longer exists. - Alter tests to know about merge_peers, and to use it before calling servers_of_happiness. - Add tests for merge_peers. - Add Zooko's puzzles to the tests. - Edit encoding tests to expect the new kind of failure message. - Edit tests to expect error messages with the word "only" moved as far to the right as possible. - Extended and cleaned up some helper functions. - Changed some tests to call more appropriate helper functions. - Added a test for the failing redistribution algorithm - Added a test for the progress message - Added a test for the upper bound on readonly peer share discovery. ] [Alter various unit tests to work with the new happy behavior Kevan Carstensen **20100107181325 Ignore-this: 132032bbf865e63a079f869b663be34a ] [Replace "UploadHappinessError" with "UploadUnhappinessError" in tests. Kevan Carstensen **20091205043453 Ignore-this: 83f4bc50c697d21b5f4e2a4cd91862ca ] [Add tests for the behavior described in #834. Kevan Carstensen **20091123012008 Ignore-this: d8e0aa0f3f7965ce9b5cea843c6d6f9f ] [Re-work 'test_upload.py' to be more readable; add more tests for #778 Kevan Carstensen **20091116192334 Ignore-this: 7e8565f92fe51dece5ae28daf442d659 ] [Test Tahoe2PeerSelector to make sure that it recognizeses existing shares on readonly servers Kevan Carstensen **20091109003735 Ignore-this: 12f9b4cff5752fca7ed32a6ebcff6446 ] [Add more tests for comment:53 in ticket #778 Kevan Carstensen **20091104112849 Ignore-this: 3bb2edd299a944cc9586e14d5d83ec8c ] [Add a test for upload.shares_by_server Kevan Carstensen **20091104111324 Ignore-this: f9802e82d6982a93e00f92e0b276f018 ] [Minor tweak to an existing test -- make the first server read-write, instead of read-only Kevan Carstensen **20091104034232 Ignore-this: a951a46c93f7f58dd44d93d8623b2aee ] [Alter tests to use the new form of set_shareholders Kevan Carstensen **20091104033602 Ignore-this: 3deac11fc831618d11441317463ef830 ] [Refactor some behavior into a mixin, and add tests for the behavior described in #778 "Kevan Carstensen" **20091030091908 Ignore-this: a6f9797057ca135579b249af3b2b66ac ] [Alter NoNetworkGrid to allow the creation of readonly servers for testing purposes. Kevan Carstensen **20091018013013 Ignore-this: e12cd7c4ddeb65305c5a7e08df57c754 ] [Update 'docs/architecture.txt' to reflect readonly share discovery kevan@isnotajoke.com**20100514003852 Ignore-this: 7ead71b34df3b1ecfdcfd3cb2882e4f9 ] [Alter the wording in docs/architecture.txt to more accurately describe the servers_of_happiness behavior. Kevan Carstensen **20100428002455 Ignore-this: 6eff7fa756858a1c6f73728d989544cc ] [Alter wording in 'interfaces.py' to be correct wrt #778 "Kevan Carstensen" **20091205034005 Ignore-this: c9913c700ac14e7a63569458b06980e0 ] [Update 'docs/configuration.txt' to reflect the servers_of_happiness behavior. Kevan Carstensen **20091205033813 Ignore-this: 5e1cb171f8239bfb5b565d73c75ac2b8 ] [Clarify quickstart instructions for installing pywin32 david-sarah@jacaranda.org**20100511180300 Ignore-this: d4668359673600d2acbc7cd8dd44b93c ] [web: add a simple test that you can load directory.xhtml zooko@zooko.com**20100510063729 Ignore-this: e49b25fa3c67b3c7a56c8b1ae01bb463 ] [setup: fix typos in misc/show-tool-versions.py zooko@zooko.com**20100510063615 Ignore-this: 2181b1303a0e288e7a9ebd4c4855628 ] [setup: show code-coverage tool versions in show-tools-versions.py zooko@zooko.com**20100510062955 Ignore-this: 4b4c68eb3780b762c8dbbd22b39df7cf ] [docs: update README, mv it to README.txt, update setup.py zooko@zooko.com**20100504094340 Ignore-this: 40e28ca36c299ea1fd12d3b91e5b421c ] [Dependency on Windmill test framework is not needed yet. david-sarah@jacaranda.org**20100504161043 Ignore-this: be088712bec650d4ef24766c0026ebc8 ] [tests: pass z to tar so that BSD tar will know to ungzip zooko@zooko.com**20100504090628 Ignore-this: 1339e493f255e8fc0b01b70478f23a09 ] [setup: update comments and URLs in setup.cfg zooko@zooko.com**20100504061653 Ignore-this: f97692807c74bcab56d33100c899f829 ] [setup: reorder and extend the show-tool-versions script, the better to glean information about our new buildslaves zooko@zooko.com**20100504045643 Ignore-this: 836084b56b8d4ee8f1de1f4efb706d36 ] [CLI: Support for https url in option --node-url Francois Deppierraz **20100430185609 Ignore-this: 1717176b4d27c877e6bc67a944d9bf34 This patch modifies the regular expression used for verifying of '--node-url' parameter. Support for accessing a Tahoe gateway over HTTPS was already present, thanks to Python's urllib. ] [backupdb.did_create_directory: use REPLACE INTO, not INSERT INTO + ignore error Brian Warner **20100428050803 Ignore-this: 1fca7b8f364a21ae413be8767161e32f This handles the case where we upload a new tahoe directory for a previously-processed local directory, possibly creating a new dircap (if the metadata had changed). Now we replace the old dirhash->dircap record. The previous behavior left the old record in place (with the old dircap and timestamps), so we'd never stop creating new directories and never converge on a null backup. ] ["tahoe webopen": add --info flag, to get ?t=info Brian Warner **20100424233003 Ignore-this: 126b0bb6db340fabacb623d295eb45fa Also fix some trailing whitespace. ] [docs: install.html http-equiv refresh to quickstart.html zooko@zooko.com**20100421165708 Ignore-this: 52b4b619f9dde5886ae2cd7f1f3b734b ] [docs: install.html -> quickstart.html zooko@zooko.com**20100421155757 Ignore-this: 6084e203909306bed93efb09d0e6181d It is not called "installing" because that implies that it is going to change the configuration of your operating system. It is not called "building" because that implies that you need developer tools like a compiler. Also I added a stern warning against looking at the "InstallDetails" wiki page, which I have renamed to "AdvancedInstall". ] [Fix another typo in tahoe_storagespace munin plugin david-sarah@jacaranda.org**20100416220935 Ignore-this: ad1f7aa66b554174f91dfb2b7a3ea5f3 ] [Add dependency on windmill >= 1.3 david-sarah@jacaranda.org**20100416190404 Ignore-this: 4437a7a464e92d6c9012926b18676211 ] [licensing: phrase the OpenSSL-exemption in the vocabulary of copyright instead of computer technology, and replicate the exemption from the GPL to the TGPPL zooko@zooko.com**20100414232521 Ignore-this: a5494b2f582a295544c6cad3f245e91 ] [munin-tahoe_storagespace freestorm77@gmail.com**20100221203626 Ignore-this: 14d6d6a587afe1f8883152bf2e46b4aa Plugin configuration rename ] [setup: add licensing declaration for setuptools (noticed by the FSF compliance folks) zooko@zooko.com**20100309184415 Ignore-this: 2dfa7d812d65fec7c72ddbf0de609ccb ] [setup: fix error in licensing declaration from Shawn Willden, as noted by the FSF compliance division zooko@zooko.com**20100309163736 Ignore-this: c0623d27e469799d86cabf67921a13f8 ] [CREDITS to Jacob Appelbaum zooko@zooko.com**20100304015616 Ignore-this: 70db493abbc23968fcc8db93f386ea54 ] [desert-island-build-with-proper-versions jacob@appelbaum.net**20100304013858] [docs: a few small edits to try to guide newcomers through the docs zooko@zooko.com**20100303231902 Ignore-this: a6aab44f5bf5ad97ea73e6976bc4042d These edits were suggested by my watching over Jake Appelbaum's shoulder as he completely ignored/skipped/missed install.html and also as he decided that debian.txt wouldn't help him with basic installation. Then I threw in a few docs edits that have been sitting around in my sandbox asking to be committed for months. ] [TAG allmydata-tahoe-1.6.1 david-sarah@jacaranda.org**20100228062314 Ignore-this: eb5f03ada8ea953ee7780e7fe068539 ] Patch bundle hash: 8e2062439a75ecfa0a1303013c60ca4ac2b876ac