#1363 closed task (fixed)

refactor storage_client.py, use IServer objects instead of rrefs

Reported by: warner Owned by: warner
Priority: major Milestone: 1.9.0
Component: code Version: 1.8.2
Keywords: review-needed Cc:
Launchpad Bug:

Description

There's an internal cleanup I've been meaning to do for a while. I started it in source:src/allmydata/storage_client.py a few years ago (some of the TODO notes at the top indicate my plans), but didn't follow through. The goal is for the client to manage a collection of "IServer" objects, each of which represents a storage server. These objects each hold a RemoteReference and track metadata about the server (like nickname, versions, etc).

As we start to handle other kinds of servers, these objects will be a place to abstract out the common behavior. The first change will be to support #466, as signed announcements result in a different notion of "server id" than unsigned ones. The IServer object will get some methods that tell the caller what write-enabler seed or lease-renewal seed or peer-selection seed to use.

The next change will be to move the details of interacting with the share into IServer, such as the actual callRemote method names. Then, when we add an HTTP-based server (which would use GET with a Range: header), the uploader/downloader doesn't need to know quite so much about the server type.
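As a rough illustration of the shape described above (not the actual storage_client.py code), a server object can hold the remote reference and metadata and hand out per-purpose seeds. The class name SketchServer and the method get_permutation_seed() are assumptions made for this sketch, and it is written for current Python rather than the Python 2 of the time:

```python
class SketchServer:
    """Hypothetical stand-in for an IServer; not the real Tahoe-LAFS class."""

    def __init__(self, serverid, nickname, rref=None):
        self._serverid = serverid  # bytes, e.g. a 20-byte tubid
        self._nickname = nickname
        self._rref = rref          # the RemoteReference, once connected

    def get_nickname(self):
        return self._nickname

    def get_rref(self):
        return self._rref

    # Callers ask for the seed they need instead of deriving it from the
    # serverid themselves. In this sketch every seed is just the serverid;
    # the point is that the derivation could later differ per server type
    # (for example for signed announcements, #466) without touching callers.
    def get_permutation_seed(self):
        return self._serverid

    def get_lease_seed(self):
        return self._serverid

    def get_write_enabler_seed(self):
        return self._serverid
```

With this shape, the uploader and downloader only see the accessor methods, which is what would let a different server type return different seeds.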

This ticket is to track the refactoring progress and host the patches for review.

Attachments (4)

1363-patch1.diff (39.5 KB) - added by warner at 2011-02-20T21:15:00Z.
first refactoring step
1363-patch2.dpatch (164.0 KB) - added by warner at 2011-02-27T01:15:03Z.
bundle of refactoring patches
1363-p3.dpatch (88.1 KB) - added by warner at 2011-06-15T17:54:40Z.
next batch of refactoring patches
remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch (28.4 KB) - added by zooko at 2011-08-01T19:01:03Z.


Change History (44)

Changed at 2011-02-20T21:15:00Z by warner

first refactoring step

comment:1 Changed at 2011-02-20T21:16:06Z by warner

  • Keywords review-needed added

comment:2 Changed at 2011-02-20T21:40:00Z by zooko

  • Owner changed from warner to zooko
  • Status changed from new to assigned

comment:3 Changed at 2011-02-20T21:45:02Z by zooko

Let's avoid the word "peer". It usually means both more and less than we really mean in this code, and we've already changed most or all of our documentation to use "server" or some other more specific word instead of what we used to call "peer".

So peer_selection_index should probably be renamed server_selection_index. This doesn't have to be a blocker for this patch or for this branch, but please let's get consensus on terminology whenever possible for future convergence.

comment:4 Changed at 2011-02-20T22:07:37Z by zooko

Why are we using sha1 for testing permutation instead of sha256? This patch didn't introduce this behavior so it isn't a blocker for this patch, but it is weird.

comment:5 Changed at 2011-02-20T22:20:59Z by warner

cool, I'll add "server_selection_index" to my TODO list for this ticket. My only problem with it is the acronym collision: we use "storage index" and "SI" in lots of places, and it'd be nice to have something for this that didn't have quite so many Ss and Is in it. But yeah, let's talk it over on the list.

SHA1: hmm, good question. We started out using stdlib sha.new (which is SHA1), and never changed it (because to change it would mess up server-selection ordering for existing filecaps). It doesn't need to be secure, as it's just a load-balancing tool, but it would be nice to use the same hash function everywhere. Maybe when #466 gets us to the point of defining a new server-selection-seed (which could be a new name for peer-selection-index), we could say that old SSSs use SHA1 and new SSSs use SHA256.
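For readers unfamiliar with the permutation being discussed, here is a minimal sketch. It assumes the ordering is obtained by hashing the selection index concatenated with each serverid and sorting by the digest; the function name permute_servers and the hash_name parameter are hypothetical, added only to show how an old-vs-new hash split could look:

```python
import hashlib


def permute_servers(serverids, selection_index, hash_name="sha1"):
    """Return serverids ordered by hash(selection_index + serverid).

    Both arguments are bytes. The hash only spreads load across servers;
    it is not relied upon for security.
    """
    def sort_key(serverid):
        return hashlib.new(hash_name, selection_index + serverid).digest()

    return sorted(serverids, key=sort_key)
```

Changing hash_name changes the resulting order for the same selection index, which is why switching existing filecaps from SHA1 to SHA256 would disturb server-selection ordering.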

comment:6 Changed at 2011-02-20T22:56:22Z by zooko

Okay I've now read through all the changes to test files and didn't see anything wrong.

comment:7 Changed at 2011-02-20T23:26:58Z by zooko

get_known_servers() returns a new copy of a list, sorted (using the sorted function), but the only callers of get_known_servers() are:

  • get_connected_servers(), which makes a new frozenset containing a copy of the list, and
  • the welcome page in web/root.py.

So, I suggest that get_known_servers() should return a list (or a frozenset for added safety -- David-Sarah and I once spent a long, miserable night tracking down an elusive bug just before a major release which turned out to be due to a function having side-effects on a mutable list of servers that had been passed into it), and that web/root.py should sort it itself.

(Also that in the future web/root.py offer the user controls to sort the list of servers by different columns. :-))

This doesn't need to be a blocker for this patch, but it does look like the kind of thing that Brian might want to tweak.
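A minimal sketch of that suggestion, using hypothetical names (SketchBroker, KnownServer, servers_for_welcome_page) rather than the real storage_client.py and web/root.py code:

```python
from collections import namedtuple

# Hypothetical immutable record standing in for a server object.
KnownServer = namedtuple("KnownServer", ["serverid", "nickname"])


class SketchBroker:
    def __init__(self):
        self._servers = {}  # serverid -> KnownServer

    def add_server(self, serverid, server):
        self._servers[serverid] = server

    def get_known_servers(self):
        # A frozenset cannot be mutated by callers, so no caller can
        # accidentally modify the broker's collection of servers.
        return frozenset(self._servers.values())


def servers_for_welcome_page(broker):
    # The display code decides on the ordering, so it does the sorting.
    return sorted(broker.get_known_servers(), key=lambda s: s.nickname)
```

Keeping the sort in the display code would also make it straightforward to sort by a different column later.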

comment:8 Changed at 2011-02-20T23:38:10Z by zooko

  • Keywords reviewed added; review-needed removed
  • Owner changed from zooko to warner
  • Status changed from assigned to new

Okay, I've reviewed attachment:1363-patch1.diff ! Modulo the comments above, +1.

(If you don't mind, remove the reviewed keyword after you land it. I seem to recall that you had some different protocol for signalling whether a patch was ready to land or not or landed or not -- I forget.)

comment:9 Changed at 2011-02-21T02:07:01Z by warner

  • Keywords reviewed removed
  • Status changed from new to assigned

Ok, attachment:1363-patch1.diff landed in ffd296fc5ab8007f. Thanks for the quick review!

I'll attach a -patch2 once I'm ready for the next stage of refactoring.

Changed at 2011-02-27T01:15:03Z by warner

bundle of refactoring patches

comment:10 Changed at 2011-02-27T01:26:27Z by warner

  • Keywords review-needed added
  • Owner changed from warner to zooko
  • Status changed from assigned to new

ok, -patch2 is ready for review. This one is a darcs patch bundle with 20 individual patches, intended to isolate each change for easier review. Many of them are improving internal names, like referring to "servers" instead of "peers", or fixing the uploader to clearly distinguish between a Server object and a ServerTracker (which were sufficiently confusing before that we had a bunch of assert isinstance(server, ServerTracker) checks). There's also some dead-code removal, which made subsequent refactoring easier.

The bulk of the changes are intended to reduce the use of get_serverid(). Previously, a lot of the code has been passing around (tubid, rref) tuples: the goal is to pass around IServer objects instead. The first step is to replace those tuples with (s.get_serverid(), s.get_rref()), but the second step (which this patch starts to implement) is to push that change further down into the code, delaying the conversion from IServer to serverid until the last possible moment, and in many cases not doing it at all. This means that many data structures which were previously indexed by serverid are now indexed by IServer instance.
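The two steps can be sketched roughly as follows. TrackedServer, as_tuples and record_share are hypothetical names for illustration; only get_serverid() and get_rref() come from the description above:

```python
class TrackedServer:
    """Hypothetical IServer-like object, hashable by identity."""

    def __init__(self, serverid, rref):
        self._serverid = serverid
        self._rref = rref

    def get_serverid(self):
        return self._serverid

    def get_rref(self):
        return self._rref


def as_tuples(servers):
    # Step one: build the old (serverid, rref) tuples from server objects
    # at the boundary, so that existing consumers keep working.
    return [(s.get_serverid(), s.get_rref()) for s in servers]


def record_share(shares_by_server, server, shnum):
    # Step two: index by the server object itself, so the serverid is
    # never needed in this data structure.
    shares_by_server.setdefault(server, set()).add(shnum)
```

Once a structure is keyed by the server object, code that needs a display name or a seed can ask the object directly instead of carrying a serverid around.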

This patch doesn't complete the job, but it gets a significant amount of the way there. It doesn't touch the mutable code at all: I'm hoping to review and land #393 before attempting any refactoring of mutable/*.py, to make life easier.

The tree should pass all tests and be pyflakes clean after applying each patch in this series.

note to self: I still need to implement zooko's recommendations from comment:7 in a later patch.

comment:11 Changed at 2011-02-27T03:56:37Z by davidsarah

Reviewing:

  • +1 on "test_client.py, upload.py:: remove KiB/MiB/etc constants, and other dead code". I like negative-code patches :-)
  • +1 on "storage_client.py: clean up test_add_server/test_add_descriptor, remove .test_servers"
  • +1 on "upload.py: more tracker-vs-server cleanup", with a nitpick that "to a set of serverids which claim to already have the share" should be "to a set of serverids for servers that claim to already have the share".

"already_servers" should be "already_serverids".

"contacted_servers" and "contacted_servers2" aren't very good variable names. I suggest s/contacted_servers/worth_asking_servers/ and s/contacted_servers2/have_asked_servers/, and similarly for trackers.

Will look at the rest of the patch tomorrow.

comment:12 Changed at 2011-02-27T03:57:13Z by davidsarah

  • Owner changed from zooko to davidsarah
  • Status changed from new to assigned

comment:13 Changed at 2011-02-28T02:37:46Z by davidsarah

  • +1 on "happinessutil.py: server-vs-tracker cleanup", "test_upload.py: server-vs-tracker cleanup", "test_upload.py: factor out FakeServerTracker"

There are still lots of instances of "peer" in the source after applying these patches. Many of these are in the mutable code which I know you haven't got to yet, but some others in the following files look like they might sensibly be changed first:

  • interfaces.py
  • immutable/{encode.py, layout.py, offloaded.py}
  • happinessutil.py
  • hashutil.py
  • storage_client.py

comment:14 Changed at 2011-02-28T02:49:35Z by davidsarah

  • +1 on "happinessutil.py: finally rename merge_peers to merge_servers"
  • +1 on "upload.py: rearrange _make_trackers a bit, no behavior changes"
  • "add remaining get_* methods to storage_client.Server, NoNetworkServer, and ...":
    • I'd use get_name() and get_longname() instead of name() and longname().
    • Are there references to the .serverid attribute of NativeStorageServer from elsewhere, or can it be deleted?

comment:15 Changed at 2011-02-28T03:41:47Z by warner

heh, one step at a time. I'll add those other peer->server items to the TODO list, though.

Yeah, I don't particularly like name/longname either. I'm using name() as a short placeholder until I figure out what the method really wants to be called: my goal was to turn base32.b2a(serverid) and idlib.shortnodeid_b2a(serverid) into something like s.name(), and I was previously using s.get_short_description() which didn't exactly roll off the tongue. s.get_name() sounds better, but I'm wondering if something even more descriptive might show up once it's only ever being used in log.msg and webapi-display contexts.

But I'll add a patch to use get_name()/get_longname() for now; I only see about 30 uses of it.

And on .serverid, I don't think so, but I'm putting off removing that until I remove get_serverid too, since the goal is to redefine the concept of "serverid" altogether:

  • step one: change as much as possible to use more accurate properties like "server permutation seed", "lease secret seed", and human-display-friendly names.
  • step two: remove .serverid/.get_serverid() and fix what breaks.
  • step three: enjoy brief moment of peace while "serverid" is safely banished
  • step four: re-introduce the term to mean "public key which signed the server's Introducer announcement", since I think that's the best claimant to the term "serverid", and I don't want to switch the semantics until I'm sure there are no remaining users of the old form.

comment:16 Changed at 2011-02-28T03:55:50Z by warner

Incidentally, if you use https://github.com/warner/tahoe-lafs to grab a copy of my "pass-server" branch, and compare its tip against the "1363-p2" tag, you'll see the changes I've made since the -p2 attachment which implement your recommendations. I'll hold off on adding another patch bundle to this ticket until you've reviewed the existing one and I've landed it.

I'm trying to not go too crazy with the refactoring/renaming, because that'll induce a lot of merge work with the many other project branches I (and others) have hanging around, but the stuff in this ticket ties directly into the #466 work, so I wanted to do them in the right order. So I'm going to be conservative about what I change until some of that other stuff gets landed (especially including #393).

comment:17 Changed at 2011-02-28T18:43:52Z by davidsarah

https://github.com/warner/tahoe-lafs/commit/cfdbf66ffd28bdd679e6f1fc5caf3385ac5d2385 :

  • "We assign each servers/trackers into one three lists." -> "We assign the tracker for each server into one of three lists."
  • The doc comment for set_shareholders no longer corresponds to the formal parameters. It should say something like "@param holders: a pair (upload_trackers, already_serverids), where".
Last edited at 2011-02-28T18:44:47Z by davidsarah

comment:18 Changed at 2011-03-10T03:25:02Z by warner

what about -p2? can I land it?

comment:19 Changed at 2011-03-23T22:49:38Z by davidsarah

Yes, fine to land -p2. Some nitpicks:

  • Add a blank line between make_server and make_servers in test_download.py
  • What are the two XXX's added in allmydata/immutable/downloader/fetcher.py (patch lines 3340 and 3367)?

comment:20 Changed at 2011-03-25T21:11:57Z by warner

  • Keywords review-needed removed
  • Owner changed from davidsarah to warner
  • Status changed from assigned to new

Great, thanks, -p2 has been landed. I'll let you know when I've got a -p3 to review (probably after landing #393 MDMF), and I'll incorporate your suggestions.

Changed at 2011-06-15T17:54:40Z by warner

next batch of refactoring patches

comment:21 Changed at 2011-06-15T17:56:32Z by warner

  • Keywords review-needed added

ok, here's the next bundle. I'm getting close to the limit of what I can clean up without overlapping with MDMF, but there are a few more I might try to work on. Please review so I can land this puppy.

comment:22 Changed at 2011-06-23T20:47:28Z by zooko

  • Owner changed from warner to zooko
  • Status changed from new to assigned

comment:23 Changed at 2011-07-16T20:32:54Z by davidsarah

  • Milestone changed from undecided to 1.9.0

comment:24 Changed at 2011-07-16T21:08:52Z by zooko

Still working on this! Will prioritize it.

comment:25 Changed at 2011-07-24T04:57:52Z by zooko

Worked on this in the car on the way here last week. Planning to work on this and #1385 on the car ride home tomorrow (about ten hours, with one co-driver and two children in the car), in order to make the deadline for new-feature patches for v1.9, which is tomorrow.

comment:26 Changed at 2011-08-01T17:14:41Z by zooko

attachment:1363-p3.dpatch reviewed. This is all really good stuff—I'm glad to see this sort of clean-up branch. I'm sorry it took me so long to review it. I intend to really elevate the priority of reviewing patches in my day to day life so that whenever Brian posts a new patch review-needed, I drop everything and review it right away.

Patches that get +1 from me and I intend to commit them to trunk soon:

By the way, on my Macbook Pro, allmydata.test.test_immutable.Test takes 8s, not 97s! After the patch "test_immutable.Test: rewrite to use NoNetworkGrid" then it takes about 2s on my system. This isn't an issue with the patch, but it may indicate there is an issue with your system. Improving the speed of the tests from 8s to 2s is valuable even if your system could be changed to run the old tests in a mere 8s. You could look at the timings of the buildslaves for comparison, e.g.: FranXois lenny-armv5tel, Brian ubuntu-linode, Arthur lenny-c7-32bit, FreeStorm WinXP-x86.

Patches with issues:

comment:27 Changed at 2011-08-01T19:03:32Z by zooko

  • Owner changed from zooko to warner
  • Status changed from assigned to new

Okay, of the five patches with issues: attachment:remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch is my rebase of the first two, the whitespace one we can skip, and I want to hear from Brian or someone that the last two are okay as-is, or else get an updated version that removes the rref param.

review-needed!

comment:28 Changed at 2011-08-01T19:07:10Z by zooko@…

In 880758340fb827f6:

(The changeset message doesn't reference this ticket)

comment:29 Changed at 2011-08-01T19:07:14Z by warner@…

In 0f11d35f855ed7c0:

(The changeset message doesn't reference this ticket)

comment:30 Changed at 2011-08-01T19:07:14Z by warner@…

In b07af5e1a2e35320:

(The changeset message doesn't reference this ticket)

comment:31 Changed at 2011-08-01T19:07:15Z by warner@…

In 0605c77f08fb4b78:

test_immutable.Test: rewrite to use NoNetworkGrid, now takes 2.7s not 97s
remove now-unused ShareManglingMixin
refs #1363

comment:32 Changed at 2011-08-01T19:07:15Z by warner@…

In feca907499070bc1:

(The changeset message doesn't reference this ticket)

comment:33 Changed at 2011-08-01T19:07:16Z by zooko@…

In dc668754793087a9:

remove get_serverid from DownloadStatus.add_block_request and customers
This is a rebase of a patch Brian originally wrote. I haven't changed the intent of that patch, just ported it to trunk.
refs #1363

comment:34 Changed at 2011-08-01T19:07:16Z by zooko@…

In 6b2e7985955fb312:

remove get_serverid from DownloadStatus.add_dyhb_request and customers
This patch is a rebase of a patch originally written by Brian. I didn't change any of the intent of Brian's patch, just ported it to current trunk.
refs #1363

comment:35 Changed at 2011-08-01T20:27:46Z by warner

Your modified patches in attachment:remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch look fine, except for two hunks in the second patch that have a typo (possibly one that I introduced in one patch and then fixed in a subsequent one).

hunk ./src/allmydata/test/test_web.py 88
-    serverid_a = hashutil.tagged_hash("foo", "serverid_a")[:20]
-    serverid_b = hashutil.tagged_hash("foo", "serverid_b")[:20]
+    serverA = FakeIServer(hashutil.tagged_hash("foo", "serverid_a")[:20])
+    serverB = FakeIServer(hashutil.tagged_hash("foo", "serverid_a")[:20])

that last line needs to use serverid_b, not serverid_a.

hunk ./src/allmydata/test/test_web.py 117
-    e = ds.add_block_request(serverid_a, 1, 120, 30, now+1) # left unfinished
+    e = ds.add_block_request(serverB, 1, 120, 30, now+1) # left unfinished

same issue, it needs to be "serverA".

As for changing make_write_bucket_proxy() to take an IServer instead of a (rref, IServer) pair: nope, the rref passed into make_write_bucket_proxy() is an RIBucketWriter (bound to a specific share), whereas IServer.get_rref() returns the server's RIStorageServer (on which you use allocate_buckets() to get an RIBucketWriter). I suppose it'd have been more obvious if the parameter name was "bucket_rref" instead of just "rref".

comment:36 Changed at 2011-08-01T23:54:24Z by warner@…

In 550d67f51f7ebd45:

remove get_serverid() from ReadBucketProxy and customers, including Checker
and debug.py dump-share commands
refs #1363

comment:37 Changed at 2011-08-01T23:54:25Z by warner@…

In 3668cb3d068b7f3a:

(The changeset message doesn't reference this ticket)

comment:38 Changed at 2011-08-02T00:00:36Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

comment:39 Changed at 2011-11-01T05:15:54Z by warner

note: 5bf1ffbc879cf082 has some more work along these lines

comment:40 Changed at 2016-08-26T21:48:40Z by Brian Warner <warner@…>

In 54f974d/trunk:

make IServer.get_serverid() use pubkey, not tubid

This is a change I've wanted to make for many years, because when we get
to HTTP-based servers, we won't have tubids for them. What held me back
was that there's code all over the place that uses the serverid for
various purposes, so I wasn't sure it was safe. I did a big push a few
years ago to use IServer instances instead of serverids in most
places (in #1363), and to split out the values that actually depend upon
tubid into separate accessors (like get_lease_seed and
get_foolscap_write_enabler_seed), which I think took care of all the
important uses.

There are a number of places that use get_serverid() as a dictionary key
to track shares (Checker results, mutable servermap). I believe these
are happy to use pubkeys instead of tubids: the only thing they do with
get_serverid() is to compare it to other values obtained from
get_serverid(). A few places in the WUI used serverid to compute display
values: these were fixed.

The main trouble was the Helper: it returns a HelperUploadResults (a
Copyable) with a share->server mapping that's keyed by whatever the
Helper's get_serverid() returns. If the uploader and the helper are on
different sides of this change, the Helper could return values that the
uploader won't recognize. This is cosmetic: that mapping is only used to
display the upload results on the "Recent and Active Operations" page.
I've added code to StorageFarmBroker.get_stub_server() to fall back to
tubids when looking up a server, so this should still work correctly
when the uploader is new and the Helper is old. If the Helper is new and
the uploader is old, the upload results will show unusual server ids.

refs ticket:1363
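The fallback described in that changeset can be pictured with the following sketch. It is not the actual get_stub_server() code: find_server, StubLike and get_tubid() are hypothetical names, and this version simply returns None when nothing matches:

```python
class StubLike:
    """Hypothetical server record with both kinds of identifier."""

    def __init__(self, pubkey, tubid):
        self._pubkey = pubkey
        self._tubid = tubid

    def get_serverid(self):
        return self._pubkey  # new meaning: pubkey-based serverid

    def get_tubid(self):
        return self._tubid   # old meaning, kept for lookups only


def find_server(servers, serverid):
    # Prefer a match on the new pubkey-based serverid.
    for s in servers:
        if s.get_serverid() == serverid:
            return s
    # Fall back to tubids, e.g. for values reported by an old Helper.
    for s in servers:
        if s.get_tubid() == serverid:
            return s
    return None
```

This covers the new-uploader, old-Helper case; an old uploader has no such fallback, which matches the "unusual server ids" caveat in the commit message.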
