[tahoe-lafs-trac-stream] [tahoe-lafs] #1382: immutable peer selection refactoring and enhancements

tahoe-lafs trac at tahoe-lafs.org
Tue Sep 3 03:50:57 UTC 2013


#1382: immutable peer selection refactoring and enhancements
-------------------------+-------------------------------------------------
     Reporter:  kevan    |      Owner:  markberger
         Type:           |     Status:  new
  enhancement            |  Milestone:  1.11.0
     Priority:  major    |    Version:  1.8.2
    Component:  code-    |   Keywords:  review-needed servers-of-happiness
  peerselection          |  blocks-release
   Resolution:           |
Launchpad Bug:           |
-------------------------+-------------------------------------------------

Comment (by zooko):

 Mark: I started doing real code review while on a plane to Washington D.C.
 today. Here are my notes. I apologize if they are confusing or incomplete.
 I've decided to send you these notes now rather than sleeping on them, so
 as not to delay.

 Overall, these patches are really great! You've done a fantastic job of
 satisfying all the requirements, recording the patches into coherent
 chunks, etc. I'm making a lot of requests of you in the following review
 notes, and I hope it doesn't discourage you! I'm excited about getting
 #1382 fixed in trunk ASAP!

 regarding https://github.com/markberger/tahoe-lafs/commit/739468e69676d30353814248f1fd7dd4b7b9d8f9:
 * "servers- of" → "servers-of"
 * "1. Query all servers for existing shares." — actually only 2*N servers?
 * s/an arbitrary/a/g
 * define (explain) "maximum matching graph" the first time it occurs (a
 toy sketch follows after this list)
 * link from servers-of-happiness.rst to upload-of-happiness.rst, or
 actually better to combine the two documents together.
 * link to Kevan's thesis from this doc
 * Hey, I think a series of diagrams showing an example graph would help a
 lot.
 * In step 2, "a server s" → "a readonly server s"
 * Shouldn't step 8 be broken into two steps?
 * Shouldn't failures of renewals in step 8 be handled by removing the
 server and starting over, just as failures of placements are in step 9?
 * "placement is not generated for a subset of shares" ; Hey! This is Diego
 "sickness" Righi's long-standing request (comment:12:ticket699).
 * "guarantees that N shares will be placed" → "guarantees that at least H
 shares will be placed"


 regarding https://github.com/markberger/tahoe-lafs/commit/ff1a77924b03d286477496c0e0a3ddfe45353c61:
 * Move the explanation about what a "flow network" is and what a "maximum
 spanning graph" is up to before the first occurrence of those terms.
 * No wait, instead move the explanation about what a "flow network" is
 into upload-of-happiness.rst and link to that document!
 * Also, explain if the flow network in this case is going to just be an
 integral flow of 0 or 1, in which case… maybe it doesn't matter that it is
 a flow network?
 * I'm confused about whether generate_mappings() is analyzing only the
 existing allocations, as the docstring seems to indicate, or additionally
 proposing new allocations, as the last comment within it indicates.
 * Are "readonly_peers" and "peerids" the same type of thing? Maybe named
 the former "readonly_peerids" then.
 * "[key for key in servermap]" → "servermap.keys()"
 * change:
 {{{
     self.servermap_shareids = set()
     for key in servermap:
         for share in servermap[key]:
             self.servermap_shareids.add(share)
 }}}
 to:
 {{{
     self.servermap_shareids = set().union(*servermap.values())
 }}}
 * No wait, actually remove {{{self.servermap_shareids}}} and
 {{{self.servermap_peerids}}} entirely and use the appropriate expression
 instead of those member variables where they are currently used (see the
 snippet just below).
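
 To be concrete, the replacement expressions I have in mind are just
 these (a sketch; the names simply mirror the ones above):
 {{{
 set(servermap)                    # everywhere self.servermap_peerids was used
 set().union(*servermap.values())  # everywhere self.servermap_shareids was used
 }}}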

 * For _flow_network():
    * "Given set of peerids and shareids" → "Given a set of peerids and a
 set of shareids"
    * Explain what the type of the return value is. It is a list of…
 lists…  of something? Explain the data structure — what shape it has. I
 see that over in happinessutil.py it is mentioned "graph is a flow network
 in adjacency-list form". Is this the same? Maybe just put that in the
 docstring of all functions that take or return that type. However, I would
 also like more explanation of what adjacency-list form is, perhaps in
 happinessutil.py somewhere, and the other uses of it can say "As defined
 in happinessutil.py"? (See also the toy example after this sub-list.)
    * Write a unit test showing that _flow_network() returns the right
 answer for a range of inputs (a rough test sketch appears a bit further
 down).
    * Okay, okay, hold on.
 allmydata.immutable.happiness_upload.Happiness_Upload._flow_network is
 almost identical to allmydata.util.happinessutil.flow_network_for. And,
 the latter looks like it has more of the kind of documentation that I'm
 asking for here. (But not yet the unit test that I asked for here.) Also
 allmydata.immutable.happiness_upload.Happiness_Upload._servermap_flow_graph.
 So: can we refactor these three functions into one function? And write
 unit tests for it?
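
 Also, to make the "adjacency-list form" request above concrete, here is a
 toy example of the shape I mean (the node numbering is invented for
 illustration and is not taken from the patch):
 {{{
 # graph[n] is the list of node indices that node n has an edge to.
 #   node 0 = source, nodes 1-2 = shares, nodes 3-4 = servers, node 5 = sink
 graph = [
     [1, 2],    # source -> each share
     [3],       # share 1 -> server A
     [3, 4],    # share 2 -> server A or server B
     [5],       # server A -> sink
     [5],       # server B -> sink
     [],        # sink: no outgoing edges
 ]
 # With unit capacity on every edge, the maximum integral flow through this
 # network equals the size of a maximum matching between shares and servers,
 # which is why a 0/1 flow here is really just bipartite matching.
 }}}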

 Okay, I'm stopping reviewing [ff1a77924b03d286477496c0e0a3ddfe45353c61]
 for now because I don't know how much of it is duplicated or near-
 duplicated code…
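
 One more concrete note before I set this aside: the kind of unit test I
 have in mind for the flow-network builder would look roughly like the
 sketch below. It assumes the refactored function keeps the existing
 allmydata.util.happinessutil.flow_network_for name, takes a servermap
 dict, and returns the adjacency-list shape sketched above; adjust it to
 whatever the refactored helper actually ends up looking like.
 {{{
 import unittest
 from allmydata.util.happinessutil import flow_network_for

 class FlowNetworkTests(unittest.TestCase):
     def test_single_peer_single_share(self):
         # one peer already holding share 0
         servermap = {"peer_a": set([0])}
         graph = flow_network_for(servermap)
         # assumed layout: source + 1 peer + 1 share + sink = 4 nodes
         self.assertEqual(len(graph), 4)
         # every listed neighbor must be a valid node index
         for neighbors in graph:
             for n in neighbors:
                 self.assertTrue(0 <= n < len(graph))
         # the sink should have no outgoing edges
         self.assertEqual(graph[-1], [])
 }}}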

 regarding https://github.com/markberger/tahoe-lafs/commit/e7af5f7995bc661d19d5c903359813fa91880966
 and https://github.com/markberger/tahoe-lafs/commit/5875c31d32ef50467c572736599066fffe8b73ac:

 * In the docstring for get_tasks, "allocations" → "placements",
 "allocation" → "placement", "reliably or correctly placed" → "well
 allocated"
 * Add to the confirm_share_allocation docstring "i.e. the share has been
 successfully uploaded to the specified server" (if that is correct).
 * add_peers isn't used, but an "add_peer" method that isn't included in
 this interface is. (Perhaps this would have been detected by some kind of
 interface-checking tool? That Daira might have written? I don't remember…)
 * Likewise with "add_peer_with_share" and "add_peer_with_shares".
 * mark_full_peer is used on each of the servers detected as full by
 examining their announcements of themselves, but is not used elsewhere.
 This makes me wonder what happens if a server is not full according to its
 announcements, but it has become full since the creation of that
 announcement and before this upload begins, or it becomes full during this
 upload. Should such servers be marked with "mark_full_peer"? If so, then
 we should write a unit test that is red with the current code and goes
 green when we change the code to mark such servers as full.
 * I would prefer if the tasks object returned from get_tasks in upload.py
 were passed around as arguments instead of stored in a member variable.
 Member variables can be updated or read from more places, and can persist
 longer, so I have to look at a wider scope of code when wondering what
 could be effected by or effecting it. Arguments have narrower scope, and
 so are preferable except when you actually do want the state to be
 reachable from all (or at least most) of the methods.
 * I don't think is_healthy is used. There is a different "is_healthy"
 defined in checker results that is used a lot, but as far as I could see
 from grep none of the uses of is_healthy are the one from this interface.
 So, good! Maybe it is unnecessary and can be removed.
 * I don't think needs_recomputation is necessary or used.
 * change:
 {{{
     def add_peer_with_shares(self, peerid, shnum):
         if peerid in self.existing_shares.keys():
             self.existing_shares[peerid].add(shnum)
         else:
             self.existing_shares[peerid] = set([shnum])
 }}}
 to:
 {{{
     def add_peer_with_share(self, peerid, shnum):
         self.existing_shares.setdefault(peerid, set()).add(shnum)
 }}}
 * for mark_bad_peer, you can use the set ".discard()" method to simplify
 the code
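
 For that last point, something like this is what I mean (a sketch only;
 the attribute name is a guess for illustration, not taken from the
 patch):
 {{{
     def mark_bad_peer(self, peerid):
         # set.discard() is a no-op when the element is absent, so no
         # "if peerid in ..." membership check is needed first.
         self.peers.discard(peerid)
 }}}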

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1382#comment:41>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

