#1398 closed defect (fixed)

make docs/performance.rst more precise and accurate

Reported by: zooko Owned by: warner
Priority: major Milestone: 1.9.2
Component: documentation Version: 1.8.2
Keywords: scalability docs performance memory large Cc: zooko
Launchpad Bug:

Description

In comment:13:ticket:1395, Brian wrote, concerning the Performing a file-verify on an A-byte file:

to be "N/K*S times a small multiple". I think the multiple is currently about 2 or 3. During encryption, we hold both a plaintext share and a ciphertext share in RAM at the same time (so 2*S), then we drop the plaintext. During erasure-coding, we hold a whole S of ciphertext in memory at the same time as the N/K*S shares, then we drop the ciphertext before pushing. We also pipeline the sends a little bit, I think 10kB or 50kB per server, to get better utilization out of a non-zero-latency wire.

Also Python's memory-management strategy interacts weirdly. Dropping the plaintext segment may not be enough: Python might not re-use that memory space for anything else right away. Although I'd expect it to de-fragment or coalesce free blocks before asking the OS for so much memory that it crashed.

Although I wonder if Brian was thinking of repair rather than verify since he talks about encrypting, which is not done in verify.

Subsequently I reviewed the document and I see a bunch of things I'm not sure are right. (Note that I myself am mostly responsible for the current state of this document.)

  • Publishing an A-byte immutable file" / "when the file is already uploaded: memory footprint: N/K*S. Shouldn't that just be memory footprint: S? All it does is read each S-byte segment in turn and hash it. K and N shouldn't come into it. This is probably just a cut-and-paste error of think-o error on my part originally, so unless someone else knows of a better reason why I wrote that then I'm going to change it to memory footprint: S.
  • Downloading B bytes of an A-byte immutable file: cpu: ~A. What? The CPU usage for downloading B bytes of an A-byte immutable file is ~A? I really hope it is actually ~B (plus an amount of CPU logarithmic in A for Merkle Tree verification, but I'm not sure we should try to include that much precision in this document). Unless someone tells me I'm wrong now and was right then, I'm going to change this to cpu: ~B.

Attachments (1)

update-performance.rst.darcs.patch (12.8 KB) - added by zooko at 2011-05-08T12:44:11Z.

Download all attachments as: .zip

Change History (29)

comment:1 Changed at 2011-05-08T12:46:07Z by zooko

  • Keywords review-needed added

Please review attachment:update-performance.rst.darcs.patch . It fixes all of the issues that I raised in the original ticket description except for the ones that I'm not sure how precise Brian wants it to get or what exact numbers would be correct for the finer precision. It also fixed another issue: the costs of repair have been updated to show both lower bounds (just as cheap as a download) and upper (just as expensive as a full initial upload).

comment:2 Changed at 2011-05-13T14:27:16Z by davidsarah

  • Keywords easy removed

comment:3 Changed at 2011-07-28T23:24:30Z by davidsarah

Should this go into 1.9?

comment:4 Changed at 2011-07-28T23:52:18Z by zooko

+1 from me for this to go into 1.9.

comment:5 Changed at 2011-07-29T02:54:20Z by davidsarah

  • Milestone changed from undecided to 1.9.0
  • Owner changed from somebody to warner

It looks OK to me, but Brian is more familiar with the performance characteristics.

comment:6 Changed at 2011-08-26T15:46:15Z by warner

I think I *was* thinking of encryption/upload in that original description, not verify: good catch.

  • publishing immutable when the file is already uploaded: our Uploader doesn't short-circuit the upload when it discovers that enough shares are already in place. The upload sequence runs (including encryption and erasure-coding of each share), but the results are discarded. (this is arguably an inefficiency, but I'm not convinced the additional code paths would be worth it). So the memory footprint is the same as for not-already-uploaded
  • publishing immutable when the file is not uploaded: do you want to include the constant multiple of memory footprint that results from the breadth/overlap of the pipeline? Specifically, we hold both plaintext and ciphertext in memory briefly, and then hold both ciphertext and encoded shares simultaneously. That makes it more like (1+N/K)*S.
  • publishing mutable file: network is the same as for immutable: N/K*A, not just A (SDMF has one *segment*, not one *share*). Also, this should probably refer specifically to SDMF, since MDMF has different characteristics.

Other than that, it looks accurate. Some entries are missing memory footprints, and most memory footprints could take the constant-multiple (1+N/K)*S overlap into account if you want to get that detailed.

comment:7 Changed at 2011-08-26T16:57:57Z by zooko

  • Owner changed from warner to zooko
  • Status changed from new to assigned

comment:8 follow-up: Changed at 2011-08-30T18:39:54Z by zooko

  • Status changed from assigned to new

Note to self:

  • figure out what to do about Brian's comment in the original description
  • consistentify "Checking", "Verifying", "Repairing"
  • add references for people who wonder "what the heck is checking/verifying/repairing?"
  • separate out mutable and immutable for check/verify/repair
  • separate out (S)FTP access to mutables
  • add note about fix to memory footprint in 1.9.0
  • s/memory footprint/memory high water mark/ ?

comment:9 Changed at 2011-08-30T18:42:47Z by zooko

  • FAQify sickness's question "How big are SDMF's and how big are MDMF's and will there be LDMF's?"
  • reference FAQ from performance.rst?
  • reference FAQ and/or performance.rst from mutable.rst

comment:10 in reply to: ↑ 8 Changed at 2011-08-30T23:20:57Z by davidsarah

Replying to zooko:

  • separate out (S)FTP access to mutables

Note that FTP does not support mutables at all (#680).

comment:11 Changed at 2011-10-13T19:58:53Z by zooko@…

In 3dc491758daad9df:

docs: fix several imprecise or inaccurate values in performance.rst
add cpu values for each operation
sort the list of values into the same order in each operation
refs #1398

comment:12 Changed at 2011-10-13T19:59:43Z by davidsarah

  • Owner changed from zooko to davidsarah
  • Status changed from new to assigned

I will fix the points in comment:6.

comment:13 Changed at 2011-10-20T20:24:29Z by zooko@…

In [5506/ticket999-S3-backend]:

docs: fix several imprecise or inaccurate values in performance.rst
add cpu values for each operation
sort the list of values into the same order in each operation
refs #1398

comment:14 Changed at 2011-10-28T04:44:47Z by warner

  • Milestone changed from 1.9.0 to 1.10.0

the remainder of this won't happen in time for 1.9, bumping

comment:15 Changed at 2011-12-05T23:36:33Z by zooko

  • Cc zooko added

comment:16 Changed at 2012-05-13T02:12:53Z by warner

  • Keywords review-needed removed

looks like this is currently owned by davidsarah, and the next step is to edit the patch (to incorporate both my comments and zooko's additional notes). Removing the review-needed flag.

comment:17 Changed at 2012-05-13T02:16:19Z by zooko

davidsarah: please assign it to me once you've updated the doc or decided that you won't prioritize doing it. Once it is assigned to me, I'll go through all the notes I wrote to myself saying "note to self: do $X; do $Y".

comment:18 Changed at 2012-05-13T04:32:01Z by davidsarah

  • Owner changed from davidsarah to zooko
  • Status changed from assigned to new

I'm not prioritizing this.

comment:19 Changed at 2012-05-13T15:08:24Z by zooko

  • Status changed from new to assigned

comment:20 Changed at 2012-05-13T18:59:04Z by zooko

  • Owner changed from zooko to warner
  • Status changed from assigned to new

Brian:

I don't entirely understand the more precise estimate of memory footprint during repair that you've suggested (original ticket contents), and I'm not sure it is worth trying to document that level of precision if the absolute difference is going to be only a hundred kilobytes or so.

Would you please either post a patch (or just new text suitable for pasting into docs/performance.rst with your suggested improvement, or tell me that the current docs of that are good enough?

comment:21 Changed at 2012-05-13T21:08:06Z by Brian Warner <warner@…>

In e850b54772f4303d:

performance.rst: small updates, mention (lack of) MDMF

refs #1398

comment:22 Changed at 2012-05-13T21:08:32Z by Brian Warner <warner@…>

In e850b54772f4303d:

performance.rst: small updates, mention (lack of) MDMF

refs #1398

comment:23 Changed at 2012-05-13T21:10:16Z by warner

  • Owner changed from warner to zooko

I added a line about constant factors, that should cover what I mentioned without getting into unnecessary detail.

comment:24 Changed at 2012-05-16T22:05:17Z by zooko

  • Milestone changed from 1.11.0 to 1.10.0
  • Status changed from new to assigned

comment:25 Changed at 2012-05-16T22:05:33Z by zooko

  • Milestone changed from 1.10.0 to 1.9.2

comment:26 Changed at 2012-06-12T16:57:48Z by davidsarah

  • Owner changed from zooko to warner
  • Status changed from assigned to new

comment:27 Changed at 2012-06-12T22:46:48Z by warner

  • Resolution set to fixed
  • Status changed from new to closed

No complaints from zooko, so I'm considering this one closed.

comment:28 Changed at 2012-06-13T18:02:14Z by zooko

Looks good to me -- thanks!

Note: See TracTickets for help on using tickets.