#304 closed task (fixed)

analyze mutable-file retrieval performance: count roundtrips

Reported by: warner Owned by:
Priority: minor Milestone: 0.9.0 (Allmydata 3.0 final)
Component: code-performance Version: 0.7.0
Keywords: mutable Cc:
Launchpad Bug:

Description

I'd like to spend some time analyzing the control flow in mutable.Retrieve. My original hope was to pull down the whole contents of the file in a single round-trip, but it turned out to be trickier than I'd thought. I think we're currently doing more like 3 or 4 RTTs.

Directory traversal would be considerably faster with fewer roundtrips.

Or would it? Finding out how much of the time is spent waiting for the network would be a great starting point. The first step would be to implement a "Retrieval Results" object (similar to the immutable-file "Upload Results" object I just wrote this week), into which we could throw timing information during retrieval. Then find a good way to display this information: maybe in a separate box at the bottom of the webish directory display (if some debug checkbox were selected).
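
A minimal sketch of what such a results object might look like (the class and field names here are hypothetical, loosely modeled on the idea of the immutable-file "Upload Results" object rather than its actual API):

    import time

    class RetrieveResults:
        """Hypothetical container for per-retrieval timing data."""
        def __init__(self):
            self.timings = {}          # phase name -> elapsed seconds
            self.server_timings = {}   # peerid -> list of per-query times

        def start(self, phase):
            self.timings[phase] = -time.time()

        def finish(self, phase):
            self.timings[phase] += time.time()

        def add_query(self, peerid, elapsed):
            self.server_timings.setdefault(peerid, []).append(elapsed)

The webish display code could then render self.timings and self.server_timings in the debug box described above.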

Change History (6)

comment:1 Changed at 2008-02-08T03:20:44Z by warner

Note that the only call we ever make to the storage servers is "readv", so the issue is simpler than with immutable files (in which there is a call to get_buckets, followed by some number of 'read' calls, and foolscap does not yet do promise pipelining).
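
As a toy illustration of that vectored-read shape (and of the oversized-read behavior the list below wants to double-check), assuming the server simply truncates a read that runs past the end of the share rather than erroring:

    # Toy model of a vectored read: one call returns, for each requested
    # (offset, length) range, the corresponding slice of the share.  The
    # truncate-instead-of-error behavior is an assumption to be verified,
    # not a statement about the real storage-server code.

    def readv(share_bytes, read_vector):
        return [share_bytes[offset:offset + length]
                for (offset, length) in read_vector]

    share = b"x" * 1500                                  # a 1500-byte share
    print([len(d) for d in readv(share, [(0, 2000)])])   # -> [1500], no error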

One potential improvement is to avoid doing more than one readv per server. This means reading enough data on the first call that we don't need to ask for more. To do this, we must make a guess as to a reasonable amount of data to ask for: we need the pubkey from one peer, the signed version info from k+epsilon peers, and full share data from 'k' peers.

The other potential improvement is to exploit as much parallelism as possible. There are plenty of tradeoffs here. If we knew 'k' ahead of time, we would know how many signed-version-info queries we need.

Hm, maybe something like this:

  • make a guess at 'k', using the same defaults we'd use for publishing a new mutable file. For DirectoryNodes, the container should provide a hint, so that if dirnodes use 1-of-5 instead of 3-of-7, our guess for k is '1'.
  • make sure that readv() correctly handles oversized read requests. I'm 90% sure that it does, but I want to double-check.
  • ask 'k' servers for the full SDMF 1MB limit, and then an extra 'epsilon' servers for just the signed-version-info+pubkey data.
  • The pubkey is checked from whatever share arrives first.
  • If we learn that 'k' is higher, immediately send out additional full-size queries.
  • If we get back NAKs from the servers, send out an extra query for each NAK.

If we're correct (or high) about 'k', and all the servers have shares, then this will do the job in a single RTT. If we underestimated 'k', we'll need 2*RTT. If some servers did not have shares, we'll need about 2*RTT (or perhaps a fraction more, depending upon the spread of response times).
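
A rough sketch of that dispatch plan (all names and sizes here are hypothetical, and the real Retrieve code is organized quite differently):

    SDMF_LIMIT = 1024 * 1024   # "full SDMF 1MB limit"
    INFO_READ = 2000           # enough for pubkey + signed version info (a guess)
    EPSILON = 2                # extra version-info-only queries to send

    def plan_first_round(servers, guessed_k):
        """Return {server: bytes_to_read} for the first round-trip."""
        plan = {}
        for i, server in enumerate(servers[:guessed_k + EPSILON]):
            # full-size reads from the first 'k' servers, small info-only
            # reads from the extra 'epsilon' servers
            plan[server] = SDMF_LIMIT if i < guessed_k else INFO_READ
        return plan

    def plan_second_round(servers, first_plan, naks, real_k, guessed_k):
        """If 'k' was underestimated or some servers NAKed, send additional
        full-size queries to servers not yet contacted; an empty result
        means the retrieval finished in a single RTT."""
        shortfall = max(0, real_k - guessed_k) + len(naks)
        unused = [s for s in servers if s not in first_plan]
        return {s: SDMF_LIMIT for s in unused[:shortfall]}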

comment:2 Changed at 2008-02-13T23:44:25Z by warner

I just did some dirnode-size analysis, and put the results in docs/dirnodes.txt. The summary is that each child adds about 336 bytes to the dirnode, and therefore 112 bytes to a k=3-encoded share. The SDMF format stores about 935 bytes of pubkey+sig+hashes, followed by share data, followed by 1216 bytes of encprivkey. So if we fetch 2kB of data on the initial read, we should be able to retrieve a 9-entry dirnode. If we fetch 3kB, we should be able to get 18 entries, or 7 entries plus the encprivkey. 4kB would get us 27 entries, or 16 plus the encprivkey.
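
The arithmetic behind those numbers, redone as a small script (figures taken from the paragraph above; the read sizes are interpreted as 2000/3000/4000 bytes):

    HEADER = 935                # pubkey + signature + hashes
    ENCPRIVKEY = 1216           # trailing encrypted private key
    PER_CHILD_SHARE = 336 // 3  # 336 bytes per child -> 112 per k=3 share

    def entries_in_first_read(read_size, include_privkey=False):
        budget = read_size - HEADER
        if include_privkey:
            budget -= ENCPRIVKEY
        return max(0, budget // PER_CHILD_SHARE)

    for size in (2000, 3000, 4000):
        print(size,
              entries_in_first_read(size),
              entries_in_first_read(size, include_privkey=True))
    # prints: 2000 9 0 / 3000 18 7 / 4000 27 16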

comment:3 Changed at 2008-02-14T01:16:45Z by warner

I did an experiment to examine the effects of changing the Publish initial read_size. Using 2048-bit keys and the same speed-test setup we use on the buildbot, I got:

  • colo: (2048-bit keys)
    • 1kB (default)
      • create per-file time: 1.772s/2.806s
      • upload: 184ms/185ms
    • 3kB
      • create: 1.705s/2.299s
      • upload: 184ms/185ms
  • DSL:
    • 1kB (default)
      • create: 3.317s/3.117s
      • upload: 338ms/327ms
    • 3kB
      • create: 4.124s/4.151s
      • upload: 324ms/323ms

I was expecting the 3kB case to upload files one RTT faster (about 16ms on DSL, 3ms in colo). I can't see any evidence of such a speedup, although a single RTT is clearly small enough to get lost in the noise.

We need to investigate mutable-file behavior more closely. As pointed out above, the real first step will be to add "Retrieval Results" / "Publish Results" with detailed timing data.

Incidentally (for completeness), an extra experiment I did with 522-bit keys is bogus, because the whole share is smaller, and a 1kB read is probably enough to get everything. The results of that experiment were:

  • colo: (522-bit keys)
    • 1kB (default)
      • create per-file time: 152ms/142ms
      • upload: 119ms/123ms
    • 3kB
      • create: 135ms/136ms
      • upload: 117ms/119ms
  • DSL:
    • 1kB (default)
      • create: 262ms/279ms
      • upload: 210ms/212ms
    • 3kB
      • create: 256ms/292ms
      • upload: 211ms/207ms

comment:4 Changed at 2008-03-05T03:30:19Z by warner

I added some retrieval-side timing data to the new webish "status" page, which now includes how long each query took. Analyzing the results revealed that the Retrieve code was doing a refetch to grab the encrypted private key, which isn't actually useful there (the private key is only needed to create a new version of the file, so Publish wants it, not Retrieve). As a result, the default initial-read size of 2000 bytes was causing us to do two round-trips, even for small directories.

I changed the code (#2245) to have separate unpack_share() and unpack_share_data() methods. The former raises NeedMoreDataError if it can't get the encrypted private key; the latter only raises NeedMoreDataError if it can't get the share data (and is silent about the encprivkey). Retrieve now uses unpack_share_data() exclusively.
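
A simplified sketch of that split (the real methods parse an offsets table out of the raw share string; the dict-of-offsets interface here is illustrative only):

    class NeedMoreDataError(Exception):
        """Raised when the caller should refetch with a larger read size."""

    def unpack_share(data, offsets):
        """Unpack everything, including the trailing encrypted private key
        (which only Publish needs, to create a new version of the file)."""
        if len(data) < offsets["EOF"]:
            raise NeedMoreDataError(offsets["EOF"])
        return (data[offsets["data"]:offsets["enc_privkey"]],
                data[offsets["enc_privkey"]:offsets["EOF"]])

    def unpack_share_data(data, offsets):
        """Unpack only what Retrieve needs: the share data, ignoring the
        encprivkey, so a 2000-byte initial read suffices for small dirnodes."""
        if len(data) < offsets["enc_privkey"]:
            raise NeedMoreDataError(offsets["enc_privkey"])
        return data[offsets["data"]:offsets["enc_privkey"]]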

The result is one round trip instead of two for directories with fewer than about 9 or 10 entries, allowing them to complete in roughly half the time.

comment:5 Changed at 2008-03-13T17:09:56Z by zooko

  • Milestone changed from undecided to 0.9.0 (Allmydata 3.0 final)

comment:6 Changed at 2008-03-13T17:51:31Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed