#1382 closed enhancement (fixed)

immutable peer selection refactoring and enhancements

Reported by: kevan Owned by: warner
Priority: major Milestone: 1.13.0
Component: code-peerselection Version: 1.8.2
Keywords: servers-of-happiness blocks-release Cc: zooko, mjberger@…, kyle@…, peter@…, vladimir@…
Launchpad Bug:

Description (last modified by zooko)

I've been working on refactoring and improving immutable peer selection. I have several immediate goals for this project.

  • Decouple servers of happiness specific peer selection from the more mechanical aspects of peer selection so that it can be easily integrated into SDMF and MDMF mutable files now, and other formats later;
  • Address the shortcomings in the servers of happiness file health measure and its current implementation;
  • Extend servers of happiness to file check, verify, and repair operations;
  • Improve the quality and clarity of the peer selection code, easing future maintenance and experimentation.

These improvements correspond roughly to issues presented in tickets #610, #614, #1115, #1124, #1130, and #1293. Unifying mutable and immutable peer selection is ticket #1057, though I don't expect to address that until after MDMF (#393) is merged and until we're done with this ticket.

Attachments (3)

interface.dpatch (13.5 KB) - added by kevan at 2011-03-29T04:06:09Z.
initial implementation of IPeerSelector interface
interfaces.dpatch (35.0 KB) - added by kevan at 2011-04-24T02:38:27Z.
update IPeerSelector definition
1382.dpatch (151.5 KB) - added by kevan at 2011-05-03T04:08:16Z.
snapshot of #1382; includes new, simplified uploader, refactored server selection logic, tests

Download all attachments as: .zip

Change History (100)

comment:1 Changed at 2011-03-29T04:05:29Z by kevan

First up is a draft of of IPeerSelector, an abstract peer selection class. The draft is contained in attachment:interface.dpatch.

A significant annoyance with servers of happiness (and a source of many of its shortcomings) is the fact that it wasn't used to motivate share placement or server selection in a well thought out way; rather, it was just used to evaluate the job that the existing share placement strategy did. So, with IPeerSelector (and with the implicit assumption that the peer selection and share placement strategies you'd want to use when uploading a file are probably generally motivated by how you define file health), I wanted to combine both file health and share placement in one abstraction. I still have a few small issues to fix with the uploader changes that actually use the interface, so you'll have to wait to see the code that exercises and implements the interface, but the interaction is basically:

  • Initialize upload, upload initializes peer selector.
  • Upload tries to find existing shares as appropriate. For each existing share:
    • Call add_peer_with_share to tell the peer selector about that relationship.
  • Upload tells the peer selector all of the peers that it can write to (the peer selector doesn't assume that it can write to the peers it is told about for existing shares).
  • do:
    • Upload asks the peer selector for share assignments, checks their health. If unhealthy, the upload fails with an error as appropriate (so we can account for file repair edge cases here).
    • Upload tries to allocate buckets for the assigned share placements, reporting to the peer selector its successes (via add_peer_with_share) and failures (via mark_full_peer and mark_bad_peer) as appropriate.
  • while the peer selector needs recomputation.

I have an immutable uploader that works pretty well with this sort of interaction.

Things to look for if you're reviewing:

  1. Leaky abstractions; particularly leaky method names. I tried to avoid leaking graph theory into my interface definition. Did I succeed? If not, what should be improved?
  2. Does this seem like a generally useful interface? Could you imagine plugging it into an existing uploader without too much trouble? Could you imagine writing a new uploader that used it without too much trouble? How about a peer selector for an entirely different upload health metric?

(of course, you're welcome to look for whatever you want -- that's just what I think I need feedback on :-)

Changed at 2011-03-29T04:06:09Z by kevan

initial implementation of IPeerSelector interface

comment:2 Changed at 2011-03-29T04:22:37Z by kevan

  • Keywords design-review-needed added

comment:3 Changed at 2011-04-23T19:57:49Z by davidsarah

  • Owner changed from kevan to davidsarah
  • Status changed from new to assigned

Changed at 2011-04-24T02:38:27Z by kevan

update IPeerSelector definition

comment:4 Changed at 2011-04-24T04:08:03Z by zooko

  • Milestone changed from undecided to 1.9.0

Changed at 2011-05-03T04:08:16Z by kevan

snapshot of #1382; includes new, simplified uploader, refactored server selection logic, tests

comment:5 Changed at 2011-07-16T20:27:48Z by zooko

  • Milestone changed from 1.9.0 to soon

On the Tahoe-LAFS Weekly Call, we agreed that this isn't likely to be ready for 1.9.0.

comment:6 Changed at 2011-07-16T20:33:07Z by zooko

Kevan says there isn't any specific thing that he knows is wrong with this patch, but he thinks of it as an experiment that would need more consideration and hunting for edge cases before it should go into trunk.

comment:7 Changed at 2011-07-22T13:41:48Z by zooko

There are a few other tickets that may be affected by this one: #362, #1394, #873, #610, at least.

comment:8 Changed at 2011-11-17T03:50:06Z by kevan

We discussed #1382 at the second Tahoe-LAFS summit. We concluded that this approach (splitting peer selection into two parts along a well-defined and generally useful interface) was a good approach, and that a more complete and correct share allocation algorithm (share director? share overlord? I don't think we agreed on a name for the object) was a good idea. I think we concluded that the right path to bring #1382 into trunk was something like:

  • Split the current peer selection/share allocation algorithm out into a separate class, implementing the interface described in this ticket. I guess we won't have any functional changes at this step, but it will help us design an interface that will be useful in general.
  • Solicit feedback on the interface.
  • Develop an implementation of the revised share placement algorithm using the interface that meets Brian's performance desiderata (200 shares, 1000 servers, 1 second).

Once that's done, we'll have an easy way for people to add new share placement algorithms to Tahoe-LAFS, and a share placement algorithm that fixes the corner cases users currently experience with servers of happiness.

I should have more free time in December than I've had lately, so I'll be happy to work on #1382. I don't know if we want to plan on #1382 for 1.10, or defer it until a future release; if we discussed that at the summit, I don't remember what we concluded.

comment:9 Changed at 2011-11-17T06:12:43Z by zooko

Kevan: excellent! I'm looking forward to progress on this. Your summary of the Summit discussion matches my recollections. We did not talk about whether it would go into 1.10 or not. Lets decide later. I don't even know who is going to be Release Wrangler for 1.10. (Hopefully Brian. He had some good ideas about "more automation, less process".)

comment:10 Changed at 2011-11-17T06:18:47Z by zooko

  • Cc zooko added

comment:11 Changed at 2012-01-14T22:57:55Z by kevan

You can follow my progress on this project by watching the ticket1382 branch on my GitHub page. I'm trying to assemble a series of small deltas that take us from our current immutable peer selector to something that can be easily modified to use a pluggable peer selector, and would be particularly interested in knowing whether the deltas that are there now (and show up there as I continue to work on it) are small enough to be easily reviewed.

comment:12 Changed at 2012-03-29T22:24:23Z by davidsarah

  • Milestone changed from soon to 1.10.0

As Release Manager for 1.10, I say this goes in.

comment:13 Changed at 2012-04-01T02:16:24Z by zooko

  • Keywords review-needed added
  • Milestone changed from 1.11.0 to soon

comment:14 Changed at 2012-04-01T02:43:33Z by davidsarah

  • Milestone changed from soon to 1.11.0

(I effectively renamed 1.10 to 1.11, so that milestone is correct.)

comment:15 Changed at 2013-05-15T17:20:05Z by zooko

  • Description modified (diff)
  • Milestone changed from 1.11.0 to eventually

I'm moving this to Milestone: "eventually", which means that we agree it ought to be fixed, but we don't agree that it is going to get fixed in 1.11.

If someone (e.g. kevan or markberger) wants to prioritize this, then I suppose they might finish it in time for Milestone 1.11. In that case they can move it back into this milestone when they start working on it.

comment:16 follow-up: Changed at 2013-05-17T00:26:45Z by daira

Huh. I thought this was the main ticket we wanted to get fixed for 1.11.

comment:17 in reply to: ↑ 16 Changed at 2013-05-22T23:23:28Z by zooko

Replying to daira:

Huh. I thought this was the main ticket we wanted to get fixed for 1.11.

Oh! Well, it is the main ticket that I want to get fixed for markberger's Google Summer of Code project, but I hope we release Tahoe-LAFS v1.11 before that project is complete...

comment:18 Changed at 2013-06-06T21:13:14Z by zooko

  • Keywords review-needed removed

comment:20 Changed at 2013-06-17T18:14:59Z by markberger

  • Cc mjberger@… added

comment:21 Changed at 2013-06-18T01:33:42Z by daira

  • Owner changed from davidsarah to markberger
  • Status changed from assigned to new

comment:22 follow-up: Changed at 2013-06-25T20:40:26Z by markberger

I'm currently figuring out the last few tests that fail with the new uploader. One of them is test_good_share_hosts in src/allmydata/test/test_checker.py. Here is a summary of the test:

N = 4
k = 3
happy = 1

Initial sharemap:
    share -> [servers]
    0:[A] 1:[A] 2:[A] 3:[A,B,C,D,E]
    4 good shares, but 5 good hosts
After deleting all instances of share #3 and repairing:
    0:[A,B], 1:[A,C], 2:[A,D], 3:[E]
    Still 4 good shares and 5 good hosts

However, with upload of happiness, repair produces

0:[A] 1:[A, B] 2:[C, A] 3:[E]

Is this behavior acceptable? The new algorithm views any additional share placements over N as redundant and does not attempt to place any more shares. However, this appears to be different from the old algorithm, which looks like it tried to place shares on as many servers as possible, regardless of N (at least from this test).

comment:23 in reply to: ↑ 22 Changed at 2013-06-26T16:39:04Z by zooko

Yes, I believe the new behavior is okay. The real question should be: does it match the specification? Well, what is the specification? Could you please make sure it is written in a precise and succinct form in a .rst file under source:docs/ ? (It can refer to Kevan's thesis, but the reader should be able to tell exactly what the specification is without reading Kevan's thesis.)

I'm pretty sure that the result you show in comment:22 does meet the spec.

comment:24 Changed at 2013-06-27T17:03:08Z by daira

It seems non-optimal that the new algorithm does not place share D. The algorithm in ticket:1130#comment:12 does place share D, in step 5.

comment:25 Changed at 2013-06-27T17:22:17Z by markberger

Zooko (comment:23), I have added a document under source:docs/specifications that you should be able to see on the github branch.

Daira (comment:24), the map is shares to servers, so the algorithm is placing share 0 on server A, share 1 on servers A and B, etc. So all of the shares are being placed, but not all of the servers get a share because the set of servers is larger than the set of shares.

comment:26 Changed at 2013-06-27T17:44:56Z by daira

Oh I see, I misread it.

I think this behaviour is completely acceptable.

Does the new code also decide which leases need to be renewed?

comment:27 Changed at 2013-06-27T17:59:06Z by markberger

Yes. The uploader attempts to place the existing shares on their respective servers and this causes the servers to renew their shares instead of uploading them again.

comment:28 Changed at 2013-07-09T17:43:47Z by markberger

In 7/9/13's weekly dev chat, it was agreed upon between Brian, Daira, and Zooko that the upload algorithm associated with this ticket should contact 2n servers. There was also discussion on what to do in the case when existing shares are found. Refer to ticket #610.

comment:29 Changed at 2013-07-16T20:35:21Z by markberger

I think I've found the error for why allmydata.test.test_download.Corruption.test_each_byte fails. Unlike the previous tests I needed to modify for this patch, the downloader is not deterministic. It will download the same shares each time, but not in the same order. Therefore different corruption offsets are not detected each time the test is run. In order for allmydata.test.test_download. Corruption.test_each_byte to pass, something must be modified in order to ensure the downloader is deterministic.

comment:30 Changed at 2013-07-17T13:32:49Z by markberger

It turns out what I perceived to be download order was not the case. Please ignore comment:29.

comment:31 Changed at 2013-07-18T14:41:29Z by markberger

  • Keywords review-needed added; design-review-needed removed

comment:32 Changed at 2013-07-22T18:53:47Z by markberger

The above branch imposes the 2n server limit and a test has been added to check whether 2n servers are contacted.

comment:33 Changed at 2013-07-22T19:12:15Z by markberger

  • Milestone changed from eventually to 1.11.0

comment:34 Changed at 2013-08-19T13:33:27Z by markberger

  • Keywords servers-of-happiness added

comment:35 Changed at 2013-08-21T20:36:44Z by markberger

I rewrote my messy commit history and rebased the branch to trunk. You can find the new branch here: https://github.com/markberger/tahoe-lafs/tree/1382-rewrite

For anyone who is going to review the patch, I suggest reviewing each group in the same sitting.

  • Mostly concerned with the new class associated with the algorithm (first 3 commits)
  • Implementing the algorithm into the uploader (next 3 commits)
  • Adding and fixing unit tests (following 4 commits)
  • Document about the new upload algorithm (final commit)

comment:36 Changed at 2013-08-28T15:33:13Z by markberger

  • Milestone changed from soon to 1.11.0

comment:37 Changed at 2013-08-28T16:02:31Z by daira

  • Keywords blocks-release added

comment:38 Changed at 2013-09-01T13:24:18Z by zooko

This would fix #1814.

comment:39 Changed at 2013-09-01T13:36:14Z by zooko

I believe this would fix #1130.

comment:40 Changed at 2013-09-02T18:11:29Z by kmarkley86

  • Cc kyle@… added

comment:41 follow-up: Changed at 2013-09-03T03:50:55Z by zooko

Mark: I started doing real code review while on a plane to Washington D.C. today. Here are my notes. I apologize if they are confusing or incomplete. I'm deciding I should send you these notes now instead of sleeping on it, in order to not delay.

Overall, these patches are really great! You've done a fantastic job of satisfying all the requirements, recording the patches into coherent chunks, etc. I'm making a lot of requests of you in the following review notes, and I hope it doesn't discourage you! I'm excited about getting #1382 fixed in trunk ASAP!

regarding https://github.com/markberger/tahoe-lafs/commit/739468e69676d30353814248f1fd7dd4b7b9d8f9:

  • "servers- of" → "servers-of"
  • "1. Query all servers for existing shares." — actually only 2*N servers?
  • s/an arbitrary/a/g
  • define (explain) "maximum matching graph" the first time it occurs
  • link from servers-of-happiness.rst to upload-of-happiness.rst, or actually better to combine the two documents together.
  • link to Kevan's thesis from this doc
  • Hey, I think a series of diagrams showing an example graph would help a lot.
  • In step 2, "a server s" → "a readonly server s"
  • Shouldn't step 8 be broken into two steps?
  • Shouldn't failures of renewals in step 8 be handled by removing the server and starting over, just a failures of placements are in step 9?
  • "placement is not generated for a subset of shares" ; Hey! This is Diego "sickness" Righi's long-standing request (comment:12:ticket699).
  • "guarantees that N shares will be placed" → "guarantees that at least H shares will be placed"

regarding https://github.com/markberger/tahoe-lafs/commit/ff1a77924b03d286477496c0e0a3ddfe45353c61:

  • Move the explanation about what a "flow network" is and what a "maximum spanning" is up to before the first occurrence of those terms.
  • No wait, instead move the explanation about what a "flow network" is into upload-of-happiness.rst and link to that document!
  • Also, explain if the flow network in this case is going to just be an integral flow of 0 or 1, in which case… maybe it doesn't matter that it is a flow network?
  • I'm confused about whether generate_mappings() is analyzing only the existing allocations, as the docstring seems to indicate, or additionally proposing new allocations, as the last comment within it indicates.
  • Are "readonly_peers" and "peerids" the same type of thing? Maybe named the former "readonly_peerids" then.
  • "[key for key in servermap]" → "servermap.keys()"
  • change:
        self.servermap_shareids = set()
        for key in servermap:
            for share in servermap[key]:


    self.servermap_shareids = set(servermap.values())
  • No wait, actually remove self.servermap_shareids and self.servermap_peerids entirely and use the appropriate expression instead of those member variables where they are currently used.
  • For _flow_network():
    • "Given set of peerids and shareids" → "Given a set of peerids and a set of shareids"
    • Explain what the type of the return value is. It is a list of… lists… of something? Explain the data structure — what shape it has. I see that over in happinessutil.py it is mentioned "graph is a flow network in adjacency-list form". Is this the same? Maybe just put that in the docstring of all functions that take or return that type. However, I would also like more explanation of what adjacency-list form is, perhaps in happinessutil.py somewhere, and the other uses of it can say "As defined in happinessutil.py" ?
    • Write a unit test showing that _flow_network() returns the right answer for a range of inputs.
    • Okay, okay, hold on. allmydata.immutable.happiness_upload.Happiness_Upload._flow_network is almost identical to allmydata.util.happinessutil.flow_network_for. And, the latter looks like it has more of the kind of documentation that I'm asking for here. (But not yet the unit test that I asked for here.) Also allmydata.immutable.happiness_upload.Happiness_Upload._servermap_flow_graph. So: can we refactor these three functions into one function? And write unit tests for it?

Okay, I'm stopping reviewing [ff1a77924b03d286477496c0e0a3ddfe45353c61] for now because I don't know how much of it is duplicated or near-duplicated code…

regarding https://github.com/markberger/tahoe-lafs/commit/e7af5f7995bc661d19d5c903359813fa91880966 and https://github.com/markberger/tahoe-lafs/commit/5875c31d32ef50467c572736599066fffe8b73ac:

  • In the docstring for get_tasks, "allocations" → "placements", "allocation" → "placement", "reliably or correctly placed" → "well allocated"
  • Add to the confirm_share_allocation docstring "i.e. the share has been successfully uploaded to the specified server" (if that is correct).
  • add_peers isn't used, but an "add_peer" method that isn't included in this interface is. (Perhaps this would have been detected by some kind of interface-checking tool? That Daira might have written? I don't remember…)
  • Likewise with "add_peer_with_share" and "add_peer_with_shares".
  • mark_full_peer is used on each of the servers detected as full by examining their announcements of themselves, but is not used elsewhere. This makes me wonder what happens if a server is not full according to its announcements, but it has become full since the creation of that announcement and before this upload begins, or it becomes full during this upload. Should such servers be marked with "mark_full_peer"? If so, then we should write a unit test that is red with the current code and goes green when we change the code to mark such servers as full.
  • I would prefer if the tasks object returned from get_tasks in upload.py were passed around as arguments instead of stored in a member variable. Member variables can be updated or read from more places, and can persist longer, so I have to look at a wider scope of code when wondering what could be effected by or effecting it. Arguments have narrower scope, and so are preferable except when you actually do want the state to be reachable from all (or at least most) of the methods.
  • I don't think is_healthy is used. There is a different "is_healthy" defined in checker results that is used a lot, but as far as I could see from grep none of the uses of is_healthy are the one from this interface. So, good! Maybe it is unnecessary and can be removed.
  • I don't think needs_recomputation is necessary or used.
  • change:
        def add_peer_with_shares(self, peerid, shnum):
            if peerid in self.existing_shares.keys():
                self.existing_shares[peerid] = set([shnum])


    def add_peer_with_share(self, peerid, shnum):
        self.existing_shares.setdefault(peerid, set()).add(shnum)
  • for mark_bad_peer, you can use the set ".discard()" method to simplify the code

comment:42 follow-up: Changed at 2013-09-03T20:28:51Z by markberger

zooko: Thanks for reviewing my patch! I will start working on these changes. I'm hesitant to implement some of the changes you suggested for https://github.com/markberger/tahoe-lafs/commit/5875c31d32ef50467c572736599066fffe8b73ac because the next commit makes some changes to the same lines. If you also reviewed the next commit and those comments apply to both, please let me know and I'll go ahead with your changes.

comment:43 in reply to: ↑ 42 Changed at 2013-09-04T01:13:08Z by zooko

Replying to markberger:

zooko: Thanks for reviewing my patch! I will start working on these changes. I'm hesitant to implement some of the changes you suggested for https://github.com/markberger/tahoe-lafs/commit/5875c31d32ef50467c572736599066fffe8b73ac because the next commit makes some changes to the same lines. If you also reviewed the next commit and those comments apply to both, please let me know and I'll go ahead with your changes.

Will do.

comment:44 in reply to: ↑ 41 Changed at 2013-09-09T22:16:49Z by markberger

I've pushed some commits which fix the issues with the documentation and the happiness_upload class.

  • Write a unit test showing that _flow_network() returns the right answer for a range of inputs.
  • Okay, okay, hold on. allmydata.immutable.happiness_upload.Happiness_Upload._flow_network is almost identical to allmydata.util.happinessutil.flow_network_for. And, the latter looks like it has more of the kind of documentation that I'm asking for here. (But not yet the unit test that I asked for here.) Also allmydata.immutable.happiness_upload.Happiness_Upload._servermap_flow_graph. So: can we refactor these three functions into one function? And write unit tests for it?

I tried to refactor those two functions into one, but servers are indexed differently between the two and I couldn't figure out how to normalize the input. I will try again later when I have more time.

comment:45 Changed at 2013-09-11T17:08:33Z by markberger

During dev chat today (September 11th, 2013), Zooko and I talked about some modifications to happiness_upload.py that would make it easier to understand.

  • Add notes about when to use _server_flow_graph and when to use _flow_network.
  • Explain the graph data structure we use to represent a flow network.
  • Either summarize why we use flow networks to find the maximum spanning, or link to a good source instead of just referring to Introduction to Algorithms.
  • Combine index_peers and index_shares into one function.
  • Rename peerids and shareids to something else which makes it clear that they are indices used by the flow network.
  • Remove the repeated pattern in generate_mappings to its own function for clarity. That way reindexing can occur in the function and not distract the reader from what is occurring.

comment:46 Changed at 2013-09-13T19:18:45Z by markberger

Besides explaining why flow networks are used in the Edmond-Karp algorithm, all of the above points have been addressed on 1382-rewrite.

comment:47 Changed at 2013-09-13T19:22:20Z by markberger

I've opened a pull request for my branch here.

comment:48 Changed at 2013-10-20T00:39:12Z by zooko

  • Cc peter@… added

Mark: I looked at the pull request linked from comment:47 again just now (with Peter Stuge's) help, and I recognized most of the commits as things I've already read, and I see a lot of comments I've already left. Did you already take into account all previous comments? And if you did, did you put your latest code into new commits at the end of the 1382-rewrite branch, or a new branch, or what? Sorry if I'm being dumb, but please hold my hand and tell me where your latest code is now. Thanks! ☺ —Z

comment:49 Changed at 2013-10-22T03:33:42Z by markberger

Hi Zooko! Sorry for the late response.

After the dev chat we had, I forgot to go and implement the changes we discussed. I quickly added the basic changes yesterday, and they can be found on the pull request. I investigated combining the loop over read only servers and the loop over write servers, but I did not have enough time to commit anything that works.

Sadly I don't have as much time to hack on tahoe as I had expected, but I will try to revisit the loop issue this weekend.

comment:50 Changed at 2013-11-04T22:44:30Z by markberger

I've been working on the suggestions we discussed during last week's dev chat.

It turns out a lot of the query counting mechanics in place on my branch are incorrect. I'm shocked that it actually passes all of the tests in its current state. Also the current version tries to place shares on servers even when we have enough information to determine that a happy upload cannot occur.

I've started to correct these issues, as well as remove the redundant loop, in a branch here.

For the above branch, when tracker.ask_about_existing_shares() is called on FakeStorageServer, I receive the following error for each tracker:

[Failure instance: Traceback: <type 'exceptions.AttributeError'>: FakeStorageServer instance has no attribute 'get_buckets'
--- <exception caught here> ---

Sadly, this error does not appear when you run python setup.py trial -s allmydata.test.test_upload. It simply fails, stating that the happiness error messages are wrong. To reproduce this error message, import pdb and call pdb.set_trace() in _handle_existing_response within immutable/upload.py.

I'm not sure why I'm receiving this error because the same function is called in 1.10. Additionally, there is some code that should give get_buckets a proper response here.

If anyone has some advice on how to fix this error, I would appreciate the help. I haven't been able to figure this bug out.

comment:51 Changed at 2013-11-07T08:03:43Z by markberger

To correct the issue I was having in my previous comment, I added a get_buckets method to FakeStorageServer. I'm not sure why this is necessary since it seemed to work before, but it appears to fix my problem.

comment:52 Changed at 2013-11-12T04:29:30Z by markberger

Today during the summit, we addressed the error messages which occur on the the 1382-rewrite-2 branch. Daira raised some concerns that the uploader might terminate too early because the upload code is tied to repair. I haven't looked into that issue yet.

comment:53 Changed at 2013-11-14T17:34:39Z by daira

At the summit we also discussed the issue that repair should be a best-effort operation and should not exit early, even if it can tell that it isn't going to achieve the happiness set by shared.happy. However I think that should already be addressed by the fixed ticket #1212, which set the happiness threshold to 0 when invoking the repairer. On the other hand, there should be an explicit test of this behaviour, and I'm not sure there is one already.

Adding the resulting happiness to repair and check reports is part of #1784.

comment:54 Changed at 2013-11-14T22:15:18Z by zooko

Once we land this ticket, let's go revisit #614. I am inspired to say that by the discussion on #2106.

comment:55 Changed at 2013-11-15T06:33:19Z by zooko


Here are some more review comments. These are from looking at upload.py on the current version of your 1382-rewrite-2 branch.

  • This comment on line 372 is totally vestigial, right? It is describing the old upload algorithm that you've now replaced.
  • architecture.rst#server-selection is describing the old algorithm. It should probably be replaced with a one-sentence description and a link to servers-of-happiness.rst#upload-strategy-of-happiness.
  • I'm thinking about the constant 2 at line 340. A reader might wonder why that is there, and why it isn't 3 or 2.71828 or something. We should add a comment next to it saying something like "If there are a large number of servers, we don't want to spend a lot of time and network traffic talking to all of them before deciding which ones to upload to. Here we limit our consideration to the first 2*N servers. That ought to be enough. See docs/specifications/servers-of-happiness.rst for details.". Then add to servers-of-happiness.rst a discussion about that constant 2.
  • Hm…in fact, now that I think about it, I'm not happy about this part of the behavior. Actually, I think I'll save that for a separate comment!
  • I find self.tasks confusing. The following things would probably make it easier for me to follow:
    • change the name to something more specific than "tasks". Is it…an "upload_plan"?
    • add a comment next to it saying what it is, possibly: "a map from shnum to a set of servers that the share should be uploaded to"
    • actually you can skip the comment if the renaming above makes it obvious to the reader that it is the return value from Happiness_Upload.generate_mappings, especially if you rename generate_mappings to calculate_upload_plan. ☺
  • Relatedly, the word servermap is used in this file on line 54 to denote a data structure which is shaped like {serverid: set(shnum)} , but here a "servermap" is a datastructure which is shaped like {shnum: serverid} , and then further down in the file "servermap" is used to mean something which is shaped like {shnum: set(serverids)} . Confusing! (Even though "servermap" is a reasonable word for any of these three shapes.) Could we think of different local names for the latter two? Better not touch the first one since it is transmitted over the wire in the helper protocol, apparently.) Oh, here's a partial solution: the one on line 482 doesn't need to be named at all — just inline it instead of using a variable to hold it between line 482 and 483. ☺
  • Please rename _isHappinessPossible to is_happiness_possible and Happiness_Upload to HappinessUpload, to match our conventions.

Okay, I didn't really finish studying the entire upload.py (this version of it), but I'm stopping here for tonight. ☺

comment:56 Changed at 2013-11-15T07:10:59Z by zooko

Okay, so about the "try the first 2*N servers" part (line 340) that I mentioned in comment:55.

There are two limitations of this. The first one relatively simple to solve and I'd like to do it in this branch. The second one is more invasive and I'd like to open a separate ticket for it.

The first one is that all 2*N of the first servers might be in read-only mode (and not already have shares of this file). Then the upload would fail. This is actually fixable without waiting for further network results, because whether the servers were already announced to be in read-only mode is already known (through the introduction protocol) by the client, so we could for example do something like the following. Replace:

    candidate_servers = all_servers[:2*total_shares]
    for server in candidate_servers:
    writeable_servers = [server for server in candidate_servers
                        if _get_maxsize(server) >= allocated_size]
    readonly_servers = set(candidate_servers) - set(writeable_servers)
    for server in readonly_servers:


    num_of_writeable_servers = 0
    for server in all_servers:
        if _get_maxsize(server) >= allocated_size:
            num_of_writeable_servers += 1
            if num_of_writeable_servers >= 2*total_shares:

I guess we ought to write a unit test of this! Demonstrating a situation where the first 2*N servers are all read-only so the upload fails with the old code but succeeds with the new code.

The second limitation is the "deeper" one. Even if we pick 2*N writeable servers at this stage, there's no guarantee that any of those 2*N writeable servers will actually accept our subsequent writes. It could be that those 2*N of them are all full or read-only-mode now (even though they weren't back when the introduction protocol ran and informed our client about those servers), or they might all fail at this moment, or something. The current behavior is to give up on the upload if this happens. It would almost always be better to instead go ahead and ask the *next* few servers if they would hold shares.

I'm going to open a separate ticket for that…

comment:57 Changed at 2013-11-15T07:25:02Z by zooko

Okay, I filed #2108 for the deeper issue raised in comment:56.

comment:58 Changed at 2013-11-16T04:45:01Z by zooko

Here is another review of upload.py from the current tip of 1382-rewrite-2. I'm looking at _get_next_allocation() on line 466, and it seems like it is doing some complicated computations in order to yield just a simple data structure — the ordered list of [(server, set(shnums))] that constitute the upload plan. It seems like the "calculate upload plan" (a.k.a. _calculate_tasks() method could just generate that list and return it and _get_next_allocation() could disappear. Am I missing anything?

comment:59 Changed at 2013-12-11T22:45:17Z by markberger

I've push patches that address what zooko mentioned in comment:55. I have not done anything in response to ticket #2108. I also removed _get_next_allocation.

comment:60 follow-up: Changed at 2013-12-22T00:26:48Z by markberger

I have a lot of free time to work on the patch for the next couple of weeks. If we could clear up what is necessary to get this merged, I would be able to work on this ticket.

comment:61 in reply to: ↑ 60 Changed at 2013-12-26T05:07:57Z by zooko

Replying to markberger:

I have a lot of free time to work on the patch for the next couple of weeks. If we could clear up what is necessary to get this merged, I would be able to work on this ticket.

Hi Mark! Thanks for posting this. I just saw it (I've been spending all day on family Xmas stuff until now), and I'll see if I can answer any questions you have. I have a patch myself somewhere around here…

comment:62 follow-up: Changed at 2013-12-31T00:50:12Z by zooko

transcribed from IRC:

So, now I'm still confused about https://github.com/markberger/tahoe-lafs/blob/6f5a125283494d3bbf1d2fa143eddd3d5183c7d7/src/allmydata/test/test_upload.py#L734 is_happy_enough() It uses k. Why does it do that? The (modern) definition of happiness doesn't include mention of K, right? (It's just that nobody ever sets their H requirement to less than their K.) So I suspect that is_happy_enough() isn't calculating exactly th right function, but we don't notice because it gets called only for well-distributed things, by test_upload, so "return True" also works for the current tests. (I tried this, and return True does indeed give the correct answer for all queries that test_upload.py sends to the is_happy_enough function.) But it seems like we ought to replace it with a function that calculates the correct thing, to avoid confusing future test-writers.

So Mark: please write a new is_happy_enough, or explain to me why I'm wrong, or replace the call to is_happy_enough with a call to allmydata.util.happinessutil.servers_of_happiness.

comment:63 Changed at 2013-12-31T01:24:05Z by zooko

So, I was trying to update docs/specifications/servers-of-happiness.rst to explain the algorithm and provide an "intuition" of it for readers, and look what happened: I confused myself again! I apparently do not have an "intuition" that matches the behavior of servers-of-happiness:

diff --git a/docs/specifications/servers-of-happiness.rst b/docs/specifications/servers-of-happiness.rst
index d18432c..6a335a7 100644
--- a/docs/specifications/servers-of-happiness.rst
+++ b/docs/specifications/servers-of-happiness.rst
@@ -14,29 +14,85 @@ is distributed on the grid in such a way as to ensure that it will
 probably survive in good enough shape to be recoverable, even if a few
 things go wrong between the time of the test and the time that it is
 recovered. Our current upload health metric for immutable files is called
-'servers-of-happiness'; its predecessor was called 'shares-of-happiness'.
-shares-of-happiness used the number of encoded shares generated by a
-file upload to say whether or not it was healthy. If there were more
-shares than a user-configurable threshold, the file was reported to be
-healthy; otherwise, it was reported to be unhealthy. In normal
-situations, the upload process would distribute shares fairly evenly
-over the peers in the grid, and in that case shares-of-happiness
-worked fine. However, because it only considered the number of shares,
-and not where they were on the grid, it could not detect situations
-where a file was unhealthy because most or all of the shares generated
-from the file were stored on one or two peers.
-servers-of-happiness addresses this by extending the share-focused
-upload health metric to also consider the location of the shares on
-grid. servers-of-happiness looks at the mapping of peers to the shares
-that they hold, and compares the cardinality of the largest happy subset
-of those to a user-configurable threshold. A happy subset of peers has
-the property that any k (where k is as in k-of-n encoding) peers within
-the subset can reconstruct the source file. This definition of file
-health provides a stronger assurance of file availability over time;
-with 3-of-10 encoding, and happy=7, a healthy file is still guaranteed
-to be available even if 4 peers fail.
+The servers-of-happiness upload health metric considers the location of the
+shares on grid. servers-of-happiness looks at the mapping of peers to the
+shares that they hold, and considers the size of the largest set of (server,
+share) pairs such that no server appears more than once in the set and no
+share appears more than once in the set. The size of the set is called the
+Happiness value.
+The Happiness value can be understood as the number of servers that are
+"independently" contributing to the health of the file. For example, if
+server A is holding share 0, and server B is holding share 1, then the
+Happiness value is 2.::
+  example 1
+  ---------
+  server A → share 0
+  server B → share 1
+  Happiness value = 2
+In this case, adding server C holding share 0 would not increase the
+Happiness value.::
+  example 2
+  ---------
+  server A → share 0
+  server B → share 1
+  server C → share 1
+  Happiness value = 2
+You can understand this intuitively as being that server C doesn't maximally
+increase the robustness of the file. Server C will help if server B
+disappears, but server C will not be any added help beyond what server B
+provided, if server A disappears.
+But if the added server C held a new share, then it would increase the
+Happiness value.::
+  example 3
+  ---------
+  server A → share 0
+  server B → share 1
+  server C → share 2
+  Happiness value = 3
+For another example, if you have this distribution::
+  example 4
+  ---------
+  server A → share 0, share 1
+  server B → share 1, share 2
+  Happiness value = 2
+And you add a server C which holds share 0 and share 1, then you increase the
+Happiness level to 3.::
+  example 5
+  ---------
+  server A → share 0, share 1
+  server B → share 1, share 2
+  server C → share 1, share 2
+  Happiness value = 3
+XXX FIXME: how come going from example 4 to example 5 increases Happiness but going from example 1 to example 2 doesn't ?? —Zooko
+XXX This definition of file health
+provides a stronger assurance of file availability over time; with 3-of-10
+encoding, and happy=7, a healthy file is still guaranteed to be available
+even if 4 peers fail.
 Measuring Servers of Happiness

comment:64 Changed at 2013-12-31T15:54:37Z by markberger

Servers of happiness is limited by the cardinality of the smaller set.

So in example 1 we have:

server A -> share 0
server B -> share 1 

and then for example two we add:

server C -> share 1

This doesn't increase happiness because share 1 is already matched with server B. Matching shares with extra servers doesn't increase happiness. Since we don't have any homeless shares to match with server C, we are limited by the cardinality of the smallest set, and happiness is set to 2.

In example 4 we have

server A -> share 0, share 1
server B -> share 1, share 2

We can match server A with share 0 and server B with share 1. Once again we are limited by the cardinality of the smaller set, so even though we matched server A with share 1 and server B with share 2 as well, those pairs do not increase happiness. Our set of matchings already has servers A and B, so any additional pairs with servers A and B will not increase happiness.

However, when we add:

server C → share 1, share 2

we are no longer limited by the cardinality of 2 because each set has a cardinality of 3. So now we can match server A with share 0, server B with share 1, and server C with share 2. Each of these pairing has a unique server and share that is only used once in our set of matchings, therefore each pair contributes to a happiness of 3.

That explanation would be the intuitive way to think about the maximum matching of bipartite graphs. I think it is clearer if you show pictures of bipartite graphs, but introducing such graphs might be outside the scope of the document.

comment:65 in reply to: ↑ 62 Changed at 2013-12-31T17:37:57Z by daira

Replying to zooko:

So I suspect that is_happy_enough() isn't calculating exactly the right function, but we don't notice because it gets called only for well-distributed things, by test_upload, so "return True" also works for the current tests. (I tried this, and return True does indeed give the correct answer for all queries that test_upload.py sends to the is_happy_enough function.) But it seems like we ought to replace it with a function that calculates the correct thing, to avoid confusing future test-writers.

So Mark: please write a new is_happy_enough, or explain to me why I'm wrong, or replace the call to is_happy_enough with a call to allmydata.util.happinessutil.servers_of_happiness.

Calling allmydata.util.happinessutil.servers_of_happiness from is_happy_enough and trusting the result would not preserve the intent of the test, because there might be a bug in happinessutil that would then propagate to the test and result in a false pass.

Since the test only checks for happiness rather than unhappiness, it would be possible to call a function that returns a purported maximum matching, and check that it actually is a matching. It is not necessary to check that the matching is maximum, because not being maximum could only cause the test to fail; it could not cause a false pass.

Last edited at 2013-12-31T20:56:02Z by daira (previous) (diff)

comment:66 Changed at 2013-12-31T20:54:08Z by daira

Ah, the maximum matching is generated in happiness_upload.py. I was not expecting it to be there; I was expecting it to be in happinessutil.py. Does this mean there is duplication between those two files that should be factored out?

comment:68 Changed at 2014-01-28T20:12:58Z by nsh

comment x-posted from github:

reviewed, clear and descriptive. only one quibble: not explicitly clear at which point(s) the uploader strategy would fail, that is determine it couldn't achieve happiness, either in principle or in practice

comment:69 Changed at 2014-02-13T17:01:41Z by zooko

I worked on this for a few minutes today and posted notes about what I found as I went. It is entirely possible that the problems I found will turn out to be non-problems once I look more closely at them, and if they are problems they are shallow ones and I'll experiment with cleaning them up myself: pipermail/tahoe-dev/2014-February/008908.html.

comment:70 Changed at 2014-03-20T16:59:55Z by zooko

Here are some of my notes from previous sessions of work on my branch:

comment:71 Changed at 2014-04-29T22:44:28Z by zooko

This will also fix #1791.

comment:72 Changed at 2014-04-29T22:47:10Z by zooko

I believe this will fix #1124. We need to confirm that and make sure there's a unit test that demonstrates that.

comment:73 Changed at 2014-05-02T00:19:46Z by dawuud

I'm commenting on "Calculating Share Placements" in here: https://github.com/zooko/tahoe-lafs/blob/1382-rewrite-3/docs/specifications/servers-of-happiness.rst

I might be wrong but... I think step 10 should be written as follows:

If any placements from step 9 fail (generated in step 7), remove the server from the set of possible servers and regenerate the matchings. Go back to step 6.

Last edited at 2014-05-04T16:57:55Z by daira (previous) (diff)

comment:74 Changed at 2014-09-23T16:58:20Z by daira

I will re-read Mark's docs and consider 73.

comment:76 Changed at 2014-09-23T17:28:40Z by warner

  • Owner changed from markberger to warner
  • Status changed from new to assigned

I've agreed (OMG what have I gotten myself into) to try to understand this patch, the math behind it, and try to review it and/or rearrange it into smaller pieces that other folks can help review. We still think of this as a blocker for 1.11 (it's kind of the showcase feature), but it's likely to push the release out way past our target. Oh well.

comment:77 Changed at 2014-11-25T18:14:30Z by zooko

Okay I've updated the docs and included diagrams! https://github.com/zooko/tahoe-lafs/commits/1382-rewrite-4

I intend to squash it into a few coherent commits on a new branch, to be named 1382-rewrite-5.

comment:78 Changed at 2014-12-16T18:01:11Z by zooko

My most recent code is in Zooko's 1382-rewrite-3 (ae77df1297cdd). If you're working on the code, please start there! :-) See also my notes from when I was last working on this: comment:70.

My most recent doc is in Zooko's 1382-rewrite-4 (3cd00fb32c5ca).

I intend to re-edit and rebase the docs into a new branch, which will be named 1382-rewrite-5.

comment:79 Changed at 2014-12-16T19:55:50Z by warner

we hashed out some more docs in an etherpad, results recorded here: https://gist.github.com/warner/66b604617d307374fe17

comment:80 Changed at 2015-01-20T17:25:30Z by warner

  • Milestone changed from 1.10.1 to 1.11.0

comment:81 Changed at 2015-01-20T18:04:10Z by zooko

Okay, I've written a new introductory doc to complement the specification doc. Brian: please read this and tell me if it makes the servers-of-happiness algorithm seem desirable to you. ☺

Here's the patch that adds the docs:


Here's the introductory doc, rendered:


comment:82 Changed at 2015-01-20T18:05:17Z by zooko

  • Status changed from assigned to new

comment:84 Changed at 2015-05-12T17:03:01Z by warner

I've read that, and I like it. Nice explanation!

comment:85 Changed at 2016-03-22T05:02:52Z by warner

  • Milestone changed from 1.11.0 to 1.12.0

Milestone renamed

comment:86 Changed at 2016-06-28T18:20:37Z by warner

  • Milestone changed from 1.12.0 to 1.13.0

moving most tickets from 1.12 to 1.13 so we can release 1.12 with magic-folders

comment:87 Changed at 2016-08-05T22:03:01Z by rvs

  • Cc vladimir@… added

comment:88 Changed at 2016-10-11T22:01:45Z by daira

Mark Berger's code from comment:47 merged into current master: https://github.com/tahoe-lafs/tahoe-lafs/pull/362

comment:89 Changed at 2016-10-18T14:50:20Z by meejah

I have a branch based on rebasing instead, which gets to about the same point, but there are tests failing depending on the value of PYTHONHASHSEED -- it looks like "because it's choosing different shares" and/or the tests are wrong. But, I wouldn't expect the hashseed to affect how we choose share-placement.

comment:90 Changed at 2016-11-23T13:06:26Z by rvs

  • Keywords review-needed removed

Looks like review-needed is not the case at the moment, so I'll remove the tag if nobody minds.

comment:91 Changed at 2016-11-29T16:40:38Z by meejah

At the Tahoe-LAFS summit recently, we clarified and edited the "servers of happiness" document (that's included in this branch, among a bunch of others). So, this should allow us to write proper tests and then move this forward.

comment:92 Changed at 2017-01-20T07:25:59Z by dawuud

meejah and i made some progress in our pairing session today. i continued work in my dev branch: https://github.com/david415/tahoe-lafs/tree/1382.markberger-rewrite-rebase.3

i noticed that happinessutils had a duplicate servers of happiness subroutine, here:


i wrote a wrapper so that servers_of_happiness in happinessutils called our new servers of happiness routine:


is this the right approach?

Last edited at 2017-01-20T07:26:34Z by dawuud (previous) (diff)

comment:93 Changed at 2017-01-30T22:02:30Z by dawuud

here's my current attempts to make all the tests pass: https://github.com/david415/tahoe-lafs/tree/1382.markberger-rewrite-rebase.3

currently some tests are still failing; in particular allmydata.test.test_upload.EncodingParameters.test_exception_messages_during_server_selection

comment:94 Changed at 2017-02-10T21:08:06Z by dawuud

recent progress here: https://github.com/david415/tahoe-lafs/tree/1382.markberger-rewrite-rebase.4

I fixed a few broken unit tests... at least I thought they were broken because they failed depending on the value set in PYTHONHASHSEED.

I'm currently stuck on trying to get this unit test to pass::

PYTHONHASHSEED=1 trial allmydata.test.test_download.Corruption.test_each_byte

comment:95 Changed at 2017-02-15T18:14:22Z by dawuud

meejah's latest dev branch ​https://github.com/meejah/tahoe-lafs/tree/1382.markberger-rewrite-rebase.5

the following tests are failing:

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent allmydata.test.test_system.SystemTest.test_upload_and_download_random_key allmydata.test.test_checker.BalancingAct.test_good_share_hosts

comment:96 Changed at 2017-02-15T19:20:52Z by meejah

I have submitted a pull-request (https://github.com/tahoe-lafs/tahoe-lafs/pull/402) that is a rebased as squashed of the latest stuff with fixes for the last couple tests that failed. One was "not enough sorting", basically (so the answers weren't as stable as they could be) and the other was a test that wasn't quite correct?

comment:97 Changed at 2017-06-05T23:06:30Z by Brian Warner <warner@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 53130c6/trunk:

Merge PR416: immutable upload now uses happiness-metric algorithm

Closes ticket:1382

Note: See TracTickets for help on using tickets.