Opened at 2008-04-05T01:01:17Z
Last modified at 2010-01-26T16:56:20Z
#377 new enhancement
conditionalize plaintext-hasher in upload
Reported by: | warner | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | eventually |
Component: | code-encoding | Version: | 1.0.0 |
Keywords: | performance easy | Cc: | |
Launchpad Bug: |
Description
I made a change the other week to stop sending plaintext hashes (both the flat hash and the merkle tree) by default, to avoid the same partial-plaintext-guessing attack that prompted us to use a per-client convergence secret. However it looks like the EncryptAnUploadable class is still generating the hashes, even if they aren't sent anywhere. Watching the logs on the migration webapi server, it appears that we're spending a significant amount of time in the loop that does AES encryption and plaintext hashing. Turning off those two hashers might speed up the process.
Change History (6)
comment:1 Changed at 2008-04-24T23:55:42Z by warner
comment:2 Changed at 2009-11-23T03:04:06Z by davidsarah
- Keywords easy added
comment:3 Changed at 2009-12-04T04:52:53Z by davidsarah
- Component changed from code-performance to code-encoding
- Keywords encoding removed
- Milestone changed from eventually to 1.6.0
comment:4 Changed at 2009-12-30T01:34:04Z by davidsarah
So the place where the hashes get updated is source:src/allmydata/immutable/upload.py lines 562 and 563, right? Would commenting both those lines (and adding an explanatory comment) fix this ticket?
comment:5 Changed at 2009-12-30T04:24:28Z by warner
In addition to those, you'd want to either comment out or delete the following:
- call to self._plaintext_hasher.update
- call to self._update_segment_hash
- definitions of _update_segment_hash, get_plaintext_hashtree_leaves, get_plaintext_hash
- initializations (in __init__) of _plaintext_hasher, _plaintext_Segment_hasher, and _plaintext_segment_hashes.
It's a lot of code to turn off, which either makes me feel sad for losing such a hard-to-implement and useful-for-integrity feature, or excited about how much smaller upload.py will be without them :-).
Commenting out just the calls in 562/563 will cause those functions to become dead code, so they'll drop out of the unit test coverage reports. Making it conditional would let us write a unit test that exercises them and keeps that code alive (and functional) until we're ready to re-enable these hashes for real.
We really want to bring this code back some day, but it will probably need to be in a new immutable share format that has a place for an encrypted version of these hashes. (the plaintext hash data should only be visible to readcap holders, not the server). The current share format can be extended with small values in the UEB, so we could conceivably put encrypted-plaintext-hash and encrypted-plaintext-hashtree-root values in there, but the full encrypted-plaintext-hashtree cannot fit in there (to do so would hurt alacrity, because the whole UEB must be fetched before anything else).
Hm, on the other hand, we could repurpose the existing space for the (old, unencrypted) plaintext-hashtree data. I believe that old downloaders (who would normally fetch and check plaintext hashes) will not fetch anything from this space unless they see a "plaintext_root_hash" key in the UEB data. So we could store the encrypted plaintext-hashtree in that space, and put a new "encrypted_plaintext_root_hash" key in the UEB as a signal to new downloaders that they should look for the hashtree there.
Hm, I like that idea. Ok, so let's not do anything drastic right now. A small performance hit is worth keeping this code alive until we can implement the encrypted plaintext hashes.
comment:6 Changed at 2010-01-26T16:56:20Z by zooko
- Milestone changed from 1.6.0 to eventually
also, making it possible to re-enable plaintext hashes in upload would help regain test coverage on those bits of the code.