#1363 closed task (fixed)
refactor storage_client.py, use IServer objects instead of rrefs
Reported by: | warner | Owned by: | warner |
---|---|---|---|
Priority: | major | Milestone: | 1.9.0 |
Component: | code | Version: | 1.8.2 |
Keywords: | review-needed | Cc: | |
Launchpad Bug: |
Description
There's an internal cleanup I've been meaning to do for a while. I started it in source:src/allmydata/storage_client.py a few years ago (some of the TODO notes at the top indicate my plans), but didn't follow through. The goal is for the client to manage a collection of "IServer" objects, each of which represents a storage server. These objects each hold a RemoteReference and track metadata about the server (like nickname, versions, etc).
As we start to handle other kinds of servers, these objects will be a place to abstract out the common behavior. The change will be to support #466, as signed announcements result in a different notion of "server id" than unsigned ones. The IServer object will get some methods that tell the caller what write-enabler seed or lease-renewal seed or peer-selection seed to use.
The next change will be to move the details of interacting with the share into IServer, such as the actual callRemote method names. Then, when we add an HTTP-based server (which would use GET with a Range: header), the uploader/downloader doesn't need to know quite so much about the server type.
This ticket is to track the refactoring progress and host the patches for review.
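To make the goal above concrete, here is a minimal sketch of what a server object and its collection might look like. It is an illustration only, not the code in storage_client.py: the class names `SketchServer` and `SketchStorageBroker` and the seed accessor names `get_permutation_seed()` and `get_lease_seed()` are assumptions, and the seeds are derived directly from the server id purely for demonstration.

```python
class SketchServer:
    """Hypothetical stand-in for an IServer-style object (not the real class)."""

    def __init__(self, serverid, nickname, rref=None, version=None):
        self._serverid = serverid      # bytes identifying the server
        self._nickname = nickname      # human-readable metadata
        self._rref = rref              # would be a RemoteReference in practice
        self._version = version or {}

    def get_nickname(self):
        return self._nickname

    def get_version(self):
        return self._version

    def get_rref(self):
        return self._rref

    # Seed accessors (names are assumptions): callers ask the server object
    # which seed to use instead of assuming it is always the raw server id.
    def get_permutation_seed(self):
        return self._serverid

    def get_lease_seed(self):
        return self._serverid


class SketchStorageBroker:
    """Hypothetical client-side collection of server objects."""

    def __init__(self):
        self._servers = {}

    def add_server(self, server):
        self._servers[server.get_permutation_seed()] = server

    def get_known_servers(self):
        return frozenset(self._servers.values())


broker = SketchStorageBroker()
alice = SketchServer(b"\x01" * 20, "alice")
broker.add_server(alice)
```

With this shape, a server announced by a signed announcement could return a different seed from the same accessor without the caller changing.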
Attachments (4)
Change History (44)
Changed at 2011-02-20T21:15:00Z by warner
comment:1 Changed at 2011-02-20T21:16:06Z by warner
- Keywords review-needed added
comment:2 Changed at 2011-02-20T21:40:00Z by zooko
- Owner changed from warner to zooko
- Status changed from new to assigned
comment:3 Changed at 2011-02-20T21:45:02Z by zooko
Let's avoid the word "peer". It usually means both more and less than we really mean in this code, and we've already changed most or all of our documentation to use "server" or some other more specific word instead of what we used to call "peer".
So peer_selection_index should probably be renamed server_selection_index. This doesn't have to be a blocker for this patch or for this branch, but please let's get consensus on terminology whenever possible for future convergence.
comment:4 Changed at 2011-02-20T22:07:37Z by zooko
Why are we using sha1 for testing permutation instead of sha256? This patch didn't introduce this behavior so it isn't a blocker for this patch, but it is weird.
comment:5 Changed at 2011-02-20T22:20:59Z by warner
cool, I'll add "server_selection_index" to my TODO list for this ticket. My only problem with it is the acronym collision: we use "storage index" and "SI" in lots of places, and it'd be nice to have something for this that didn't have quite so many Ss and Is in it. But yeah, let's talk it over on the list.
SHA1: hmm, good question. We started out using stdlib sha.new (which is SHA1), and never changed it (because to change it would mess up server-selection ordering for existing filecaps). It doesn't need to be secure, as it's just a load-balancing tool, but it would be nice to use the same hash function everywhere. Maybe when #466 gets us to the point of defining a new server-selection-seed (which could be a new name for peer-selection-index), we could say that old SSSs use SHA1 and new SSSs use SHA256.
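A rough sketch of the permutation idea discussed here, assuming the ordering is produced by hashing the selection index concatenated with each server id and sorting on the digest. The function name `permuted_order`, the concatenation order, and the sample ids are assumptions for illustration. The `hash_name` parameter shows why swapping SHA1 for SHA256 is not a drop-in change: the sort keys are different digests, so existing files would generally see a different server order.

```python
import hashlib


def permuted_order(selection_index, serverids, hash_name="sha1"):
    """Return serverids sorted by hash(selection_index + serverid)."""
    def sort_key(serverid):
        return hashlib.new(hash_name, selection_index + serverid).digest()
    return sorted(serverids, key=sort_key)


servers = [b"server-a", b"server-b", b"server-c"]

# The same index always yields the same ordering of the same servers.
order_sha1 = permuted_order(b"storage-index-1", servers)
order_sha1_again = permuted_order(b"storage-index-1", servers)

# A different hash function still yields a permutation, but its sort keys
# are unrelated to the SHA1 ones, so the order need not match.
order_sha256 = permuted_order(b"storage-index-1", servers, hash_name="sha256")
```

The hash only spreads load across servers, which is why its cryptographic strength is not the concern.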
comment:6 Changed at 2011-02-20T22:56:22Z by zooko
Okay I've now read through all the changes to test files and didn't see anything wrong.
comment:7 Changed at 2011-02-20T23:26:58Z by zooko
get_known_servers() returns a new copy of a list, sorted (using the sorted function), but the only callers of get_known_servers() are:
- get_connected_servers(), which makes a new frozenset containing a copy of the list, and
- the welcome page in web/root.py.
So, I suggest that get_known_servers() should return a list (or a frozenset for added safety -- David-Sarah and I once spent a long, miserable night tracking down an elusive bug just before a major release which turned out to be due to a function having side-effects on a mutable list of servers that had been passed into it), and that web/root.py should sort it itself.
(Also that in the future web/root.py offer the user controls to sort the list of servers by different columns. :-))
This doesn't need to be a blocker for this patch, but it does look like the kind of thing that Brian might want to tweak.
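The suggestion above can be sketched roughly as follows. `SketchBroker` and `welcome_page_rows` are hypothetical names, and representing each server as a `(serverid, nickname)` pair is an assumption made only to keep the example small.

```python
class SketchBroker:
    """Hypothetical stand-in for the storage broker."""

    def __init__(self):
        self._servers = {}  # serverid (bytes) -> nickname (str)

    def add_server(self, serverid, nickname):
        self._servers[serverid] = nickname

    def get_known_servers(self):
        # Return an immutable snapshot: callers cannot mutate our state.
        return frozenset(self._servers.items())


def welcome_page_rows(broker):
    # The display code chooses its own ordering (here: by nickname).
    return sorted(broker.get_known_servers(), key=lambda item: item[1])


broker = SketchBroker()
broker.add_server(b"id-2", "zebra")
broker.add_server(b"id-1", "apple")
rows = welcome_page_rows(broker)
```

Because a frozenset has no `append`, `add`, or `remove`, a callee that tries to modify the collection fails immediately instead of silently changing shared state.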
comment:8 Changed at 2011-02-20T23:38:10Z by zooko
- Keywords reviewed added; review-needed removed
- Owner changed from zooko to warner
- Status changed from assigned to new
Okay, I've reviewed attachment:1363-patch1.diff ! Modulo the comments above, +1.
(If you don't mind, remove the reviewed keyword after you land it. I seem to recall that you had some different protocol for signalling whether a patch was ready to land or not or landed or not -- I forget.)
comment:9 Changed at 2011-02-21T02:07:01Z by warner
- Keywords reviewed removed
- Status changed from new to assigned
Ok, attachment:1363-patch1.diff landed in ffd296fc5ab8007f. Thanks for the quick review!
I'll attach a -patch2 once I'm ready for the next stage of refactoring.
comment:10 Changed at 2011-02-27T01:26:27Z by warner
- Keywords review-needed added
- Owner changed from warner to zooko
- Status changed from assigned to new
ok, -patch2 is ready for review. This one is a darcs patch bundle with 20 individual patches, intended to isolate each change for easier review. Many of them are improving internal names, like referring to "servers" instead of "peers", or fixing the uploader to clearly distinguish between a Server object and a ServerTracker (which were sufficiently confusing before that we had a bunch of assert isinstance(server, ServerTracker) checks). There's also some dead-code removal, which made subsequent refactoring easier.
The bulk of the changes are intended to reduce the use of get_serverid(). Previously, a lot of the code has been passing around (tubid, rref) tuples: the goal is to pass around IServer objects instead. The first step is to replace those tuples with (s.get_serverid(), s.get_rref()), but the second step (which this patch starts to implement) is to push that change further down into the code, delaying the conversion from IServer to serverid until the last possible moment, and in many cases not doing it at all. This means that many data structures which were previously indexed by serverid are now indexed by IServer instance.
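As an illustration of indexing by server object rather than by serverid, here is a small sketch. `SketchServer`, `record_shares`, and `describe` are hypothetical names, not functions from the patch; the only methods borrowed from the discussion are `get_serverid()` and `get_rref()`.

```python
class SketchServer:
    """Hypothetical server object; instances hash by identity by default."""

    def __init__(self, serverid, rref=None):
        self._serverid = serverid
        self._rref = rref

    def get_serverid(self):
        return self._serverid

    def get_rref(self):
        return self._rref


def record_shares(responses):
    """Map each server object to the set of share numbers it reported."""
    shares = {}
    for server, shnums in responses:
        shares.setdefault(server, set()).update(shnums)
    return shares


def describe(shares):
    """Convert server -> serverid only at the point of display."""
    return sorted(
        (server.get_serverid().hex(), sorted(shnums))
        for server, shnums in shares.items()
    )


s1 = SketchServer(b"\x01\x02")
s2 = SketchServer(b"\x0a\x0b")
shares = record_shares([(s1, [0, 1]), (s2, [2]), (s1, [3])])
summary = describe(shares)
```

Only `describe` needs the serverid; everything upstream of it works with the object.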
This patch doesn't complete the job, but it gets a significant amount of the way there. It doesn't touch the mutable code at all: I'm hoping to review and land #393 before attempting any refactoring of mutable/*.py, to make life easier.
The tree should pass all tests and be pyflakes clean after applying each patch in this series.
note to self: I still need to implement zooko's recommendations from comment:7 in a later patch.
comment:11 Changed at 2011-02-27T03:56:37Z by davidsarah
Reviewing:
- +1 on "test_client.py, upload.py:: remove KiB/MiB/etc constants, and other dead code". I like negative-code patches :-)
- +1 on "storage_client.py: clean up test_add_server/test_add_descriptor, remove .test_servers"
- +1 on "upload.py: more tracker-vs-server cleanup", with a nitpick that "to a set of serverids which claim to already have the share" should be "to a set of serverids for servers that claim to already have the share".
"already_servers" should be "already_serverids".
"contacted_servers" and "contacted_servers2" aren't very good variable names. I suggest s/contacted_servers/worth_asking_servers/ and s/contacted_servers2/have_asked_servers/, and similarly for trackers.
Will look at the rest of the patch tomorrow.
comment:12 Changed at 2011-02-27T03:57:13Z by davidsarah
- Owner changed from zooko to davidsarah
- Status changed from new to assigned
comment:13 Changed at 2011-02-28T02:37:46Z by davidsarah
- +1 on "happinessutil.py: server-vs-tracker cleanup", "test_upload.py: server-vs-tracker cleanup", "test_upload.py: factor out FakeServerTracker"
There are still lots of instances of "peer" in the source after applying these patches. Many of these are in the mutable code which I know you haven't got to yet, but some others in the following files look like they might sensibly be changed first:
- interfaces.py
- immutable/{encode.py, layout.py, offloaded.py}
- happinessutil.py
- hashutil.py
- storage_client.py
comment:14 Changed at 2011-02-28T02:49:35Z by davidsarah
- +1 on "happinessutil.py: finally rename merge_peers to merge_servers"
- +1 on "upload.py: rearrange _make_trackers a bit, no behavior changes"
- "add remaining get_* methods to storage_client.Server, NoNetworkServer, and ...":
- I'd use get_name() and get_longname() instead of name() and longname().
- Are there references to the .serverid attribute of NativeStorageServer from elsewhere, or can it be deleted?
comment:15 Changed at 2011-02-28T03:41:47Z by warner
heh, one step at a time. I'll add those other peer->server items to the TODO list, though.
Yeah, I don't particularly like name/longname either. I'm using name() as a short placeholder until I figure out what the method really wants to be called: my goal was to turn base32.b2a(serverid) and idlib.shortnodeid_b2a(serverid) into something like s.name(), and I was previously using s.get_short_description() which didn't exactly roll off the tongue. s.get_name() sounds better, but I'm wondering if something even more descriptive might show up once it's only ever being used in log.msg and webapi-display contexts.
But I'll add a patch to use get_name()/get_longname() for now, I only see about 30 uses of it.
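A sketch of the naming accessors under discussion, using only the standard library. It assumes that the project's base32 encoding is lowercase and unpadded and that the short form is an 8-character prefix of the long form; both are assumptions for illustration, as are the helper `b2a` and the class `SketchServer`.

```python
import base64


def b2a(data):
    """Lowercase, unpadded base32 (assumed to resemble base32.b2a)."""
    return base64.b32encode(data).decode("ascii").lower().rstrip("=")


class SketchServer:
    """Hypothetical server object exposing display names."""

    def __init__(self, serverid):
        self._serverid = serverid

    def get_longname(self):
        return b2a(self._serverid)

    def get_name(self):
        # Assumption: the short name is a prefix of the long name.
        return self.get_longname()[:8]


server = SketchServer(b"\x00" * 20)
log_line = "contacting server %s" % server.get_name()
```

Callers that only log or display a server then never touch the raw serverid.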
And on .serverid, I don't think so, but I'm putting off removing that until I remove get_serverid too, since the goal is to redefine the concept of "serverid" altogether:
- step one: change as much as possible to use more accurate properties like "server permutation seed", "lease secret seed", and human-display-friendly names.
- step two: remove .serverid/.get_serverid() and fix what breaks.
- step three: enjoy brief moment of peace while "serverid" is safely banished
- step four: re-introduce the term to mean "public key which signed the server's Introducer announcement", since I think that's the best claimant to the term "serverid", and I don't want to switch the semantics until I'm sure there are no remaining users of the old form.
comment:16 Changed at 2011-02-28T03:55:50Z by warner
Incidentally, if you use https://github.com/warner/tahoe-lafs to grab a copy of my "pass-server" branch, and compare its tip against the "1363-p2" tag, you'll see the changes I've made since the -p2 attachment which implement your recommendations. I'll hold off on adding another patch bundle to this ticket until you've reviewed the existing one and I've landed it.
I'm trying to not go too crazy with the refactoring/renaming, because that'll induce a lot of merge work with the many other project branches I (and others) have hanging around, but the stuff in this ticket ties directly into the #466 work, so I wanted to do them in the right order. So I'm going to be conservative about what I change until some of that other stuff gets landed (especially including #393).
comment:17 Changed at 2011-02-28T18:43:52Z by davidsarah
https://github.com/warner/tahoe-lafs/commit/cfdbf66ffd28bdd679e6f1fc5caf3385ac5d2385 :
- "We assign each servers/trackers into one three lists." -> "We assign the tracker for each server into one of three lists."
- The doc comment for set_shareholders no longer corresponds to the formal parameters. It should say something like "@param holders: a pair (upload_trackers, already_serverids), where".
comment:18 Changed at 2011-03-10T03:25:02Z by warner
what about -p2? can I land it?
comment:19 Changed at 2011-03-23T22:49:38Z by davidsarah
Yes, fine to land -p2. Some nitpicks:
- Add a blank line between make_server and make_servers in test_download.py
- What are the two XXX's added in allmydata/immutable/downloader/fetcher.py (patch lines 3340 and 3367)?
comment:20 Changed at 2011-03-25T21:11:57Z by warner
- Keywords review-needed removed
- Owner changed from davidsarah to warner
- Status changed from assigned to new
Great, thanks, -p2 has been landed. I'll let you know when I've got a -p3 to review (probably after landing #393 MDMF), and I'll incorporate your suggestions.
comment:21 Changed at 2011-06-15T17:56:32Z by warner
- Keywords review-needed added
ok, here's the next bundle. I'm getting close to the limit of what I can clean up without overlapping with MDMF, but there are a few more I might try to work on. Please review so I can land this puppy.
comment:22 Changed at 2011-06-23T20:47:28Z by zooko
- Owner changed from warner to zooko
- Status changed from new to assigned
comment:23 Changed at 2011-07-16T20:32:54Z by davidsarah
- Milestone changed from undecided to 1.9.0
comment:24 Changed at 2011-07-16T21:08:52Z by zooko
Still working on this! Will prioritize it.
comment:25 Changed at 2011-07-24T04:57:52Z by zooko
Worked on this in the car on the way here last week. Planning to work on this and #1385 on the car ride home tomorrow (about ten hours, with one co-driver and two children in the car), in order to make the deadline for new-feature patches for v1.9, which is tomorrow.
comment:26 Changed at 2011-08-01T17:14:41Z by zooko
attachment:1363-p3.dpatch reviewed. This is all really good stuff—I'm glad to see this sort of clean-up branch. I'm sorry it took me so long to review it. I intend to really elevate the priority of reviewing patches in my day to day life so that whenever Brian posts a new patch review-needed, I drop everything and review it right away.
Patches that get +1 from me and I intend to commit them to trunk soon:
By the way, on my Macbook Pro, allmydata.test.test_immutable.Test takes 8s, not 97s! After the patch "test_immutable.Test: rewrite to use NoNetworkGrid" then it takes about 2s on my system. This isn't an issue with the patch, but it may indicate there is an issue with your system. Improving the speed of the tests from 8s to 2s is valuable even if your system could be changed to run the old tests in a mere 8s. You could look at the timings of the buildslaves for comparison, e.g.: FranXois lenny-armv5tel, Brian ubuntu-linode, Arthur lenny-c7-32bit, FreeStorm WinXP-x86.
- remove now-unused ShareManglingMixin; Would have been better included in the previous patch. I'll rerecord them together.
- apply zooko's advice: storage_client get_known_servers() returns a frozenset, caller sorts
- DownloadStatus.add_known_share wants to be used by Finder, web.status; Wouldn't it be better to open a ticket than to put an XXX comment in the source code? Otherwise +1 on this patch.
- replace IServer.name() with get_name(), and get_longname() and upload.py: apply David-Sarah's advice rename (un)contacted(2) trackers to first_pass/second_pass/next_pass; +1, but I rerecorded them to use darcs replace instead of hunk editing, because that avoided a lot of unnecessary merge conflicts with Kevan's #1382 branch. I'll attach my rerecorded patch to this ticket. I also took notes during the process of merging these with #1382, which notes I'll post to tahoe-dev later, explaining why darcs replace was so valuable in this case. Also in set_shareholders() the docstring is now wrong in describing the "upload_trackers" and "already_serverids". I'll change my rerecord of the patch to fix that.
Patches with issues:
- remove get_serverid from DownloadStatus.add_request_sent and customers and remove get_serverid from DownloadStatus.add_dyhb_sent and customers; These two had lots of merge conflicts with trunk. I rebased these patches onto the current head of trunk and will attach them to this ticket for Brian (if available) to review.
- web/status.py: remove spurious whitespace, no code changes; merge conflicts; an unimportant patch; I ran M-x whitespace-cleanup on this file and emacs made only one change (removing a trailing blank line at end of file).
- remove nodeid from WriteBucketProxy classes and customers; Shouldn't the rref parameter to make_write_bucket_proxy() be removed in favor of using the .get_rref() method of the server argument? Otherwise +1
- remove get_serverid() from ReadBucketProxy and customers, including Checker; likewise, shouldn't the ReadBucketProxy stop accepting an rref argument now that it has a server argument to its constructor? Otherwise +1. If Brian (or someone) tells me that these two patches should go in as-is without removing the rref parameter, I'm okay with that.
Changed at 2011-08-01T19:01:03Z by zooko
comment:27 Changed at 2011-08-01T19:03:32Z by zooko
- Owner changed from zooko to warner
- Status changed from assigned to new
Okay, of the five patches with issues, attachment:remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch is my rebase of the first two, the whitespace we can skip, and I want to hear from Brian or someone that the last two are okay as-is, or else get an updated version that removes the rref param.
review-needed!
comment:28 Changed at 2011-08-01T19:07:10Z by zooko@…
In 880758340fb827f6:
(The changeset message doesn't reference this ticket)
comment:29 Changed at 2011-08-01T19:07:14Z by warner@…
In 0f11d35f855ed7c0:
(The changeset message doesn't reference this ticket)
comment:30 Changed at 2011-08-01T19:07:14Z by warner@…
In b07af5e1a2e35320:
(The changeset message doesn't reference this ticket)
comment:31 Changed at 2011-08-01T19:07:15Z by warner@…
In 0605c77f08fb4b78:
comment:32 Changed at 2011-08-01T19:07:15Z by warner@…
In feca907499070bc1:
(The changeset message doesn't reference this ticket)
comment:33 Changed at 2011-08-01T19:07:16Z by zooko@…
In dc668754793087a9:
comment:34 Changed at 2011-08-01T19:07:16Z by zooko@…
In 6b2e7985955fb312:
comment:35 Changed at 2011-08-01T20:27:46Z by warner
Your modified patches in attachment:remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch look fine, except for two hunks in the second patch that have a typo (possibly one that I introduced in one patch and then fixed in a subsequent one).
    hunk ./src/allmydata/test/test_web.py 88
    - serverid_a = hashutil.tagged_hash("foo", "serverid_a")[:20]
    - serverid_b = hashutil.tagged_hash("foo", "serverid_b")[:20]
    + serverA = FakeIServer(hashutil.tagged_hash("foo", "serverid_a")[:20])
    + serverB = FakeIServer(hashutil.tagged_hash("foo", "serverid_a")[:20])
that last line needs to use serverid_b, not serverid_a.
    hunk ./src/allmydata/test/test_web.py 117
    - e = ds.add_block_request(serverid_a, 1, 120, 30, now+1) # left unfinished
    + e = ds.add_block_request(serverB, 1, 120, 30, now+1) # left unfinished
same issue, it needs to be "serverA".
As for changing make_write_bucket_proxy() to take an IServer instead of a (rref, IServer) pair: nope, the rref passed into make_write_bucket_proxy() is an RIBucketWriter (bound to a specific share), whereas IServer.get_rref() returns the server's RIStorageServer (on which you use allocate_buckets() to get an RIBucketWriter). I suppose it'd have been more obvious if the parameter name was "bucket_rref" instead of just "rref".
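The distinction between the two remote references can be sketched with local stand-ins. Everything here is hypothetical: `FakeBucketWriter` and `FakeStorageServer` only imitate the roles of RIBucketWriter and RIStorageServer, `allocate_buckets()` is given a simplified one-argument signature, and plain method calls replace remote calls. The parameter is named `bucket_rref`, as suggested above.

```python
class FakeBucketWriter:
    """Stand-in for an RIBucketWriter: bound to one specific share."""

    def __init__(self, shnum):
        self.shnum = shnum
        self.data = b""

    def write(self, data):
        self.data += data


class FakeStorageServer:
    """Stand-in for an RIStorageServer: hands out per-share writers."""

    def allocate_buckets(self, shnums):
        # Simplified signature (assumption): returns {shnum: writer}.
        return {shnum: FakeBucketWriter(shnum) for shnum in shnums}


class SketchServer:
    """Hypothetical IServer-like object wrapping the server-level reference."""

    def __init__(self, storage_rref):
        self._rref = storage_rref

    def get_rref(self):
        return self._rref


class SketchWriteBucketProxy:
    def __init__(self, bucket_rref, server):
        self.bucket_rref = bucket_rref  # per-share reference
        self.server = server            # metadata about where it lives

    def put(self, data):
        self.bucket_rref.write(data)


def make_write_bucket_proxy(bucket_rref, server):
    return SketchWriteBucketProxy(bucket_rref, server)


server = SketchServer(FakeStorageServer())
buckets = server.get_rref().allocate_buckets([0, 1])
proxy = make_write_bucket_proxy(buckets[0], server)
proxy.put(b"share data")
```

The proxy needs both arguments because `server.get_rref()` can only produce new writers; it cannot recover the writer already allocated for this share.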
comment:36 Changed at 2011-08-01T23:54:24Z by warner@…
In 550d67f51f7ebd45:
comment:37 Changed at 2011-08-01T23:54:25Z by warner@…
In 3668cb3d068b7f3a:
(The changeset message doesn't reference this ticket)
comment:38 Changed at 2011-08-02T00:00:36Z by zooko
- Resolution set to fixed
- Status changed from new to closed
comment:39 Changed at 2011-11-01T05:15:54Z by warner
note: 5bf1ffbc879cf082 has some more work along these lines
comment:40 Changed at 2016-08-26T21:48:40Z by Brian Warner <warner@…>
In 54f974d/trunk:
first refactoring step