Context Navigation

← Previous Ticket
Next Ticket →

Opened at 2008-04-24T23:14:31Z

Closed at 2012-05-13T02:02:17Z

#393 closed enhancement (fixed)

mutable: implement MDMF

Reported by:	warner	Owned by:	zooko
Priority:	major	Milestone:	1.10.1
Component:	code-mutable	Version:	1.0.0
Keywords:	newcaps performance random-access privacy gsoc mdmf mutable backward-compatibility forward-compatibility review-needed	Cc:
Launchpad Bug:

Description

Our so-called "Medium-sized Distributed Mutable Files" scheme should be implemented at some point. This will use the same protocol and format as the existing SDMF files, but with the following new features:

efficient random-access reads of arbitrary spans
fairly efficient random-access writes to arbitrary spans
allow more than one segment's worth of data

When we implement this, revisit the notion of 'container size' in the mutable file API.. it's a bit fuzzy right now, and will become more important with the random-access writes. It might be useful to allow the user-facing API to seek relative to the end of the file, in which case a storage API that references the container size might be valuable.

Attachments (64)

393progress.dpatch (110.8 KB) - added by kevan at 2010-05-31T23:43:32Z.: progress on MDMF
393status2.dpatch (194.5 KB) - added by kevan at 2010-06-11T19:40:27Z.
393status3.dpatch (243.6 KB) - added by kevan at 2010-06-14T22:58:49Z.
393status4.dpatch (244.0 KB) - added by kevan at 2010-06-15T01:15:04Z.
393status5.dpatch (338.6 KB) - added by kevan at 2010-06-21T23:43:51Z.
393status6.dpatch (353.7 KB) - added by kevan at 2010-06-23T00:36:22Z.
393status7.dpatch (367.3 KB) - added by kevan at 2010-06-23T23:57:34Z.
393status8.dpatch (259.8 KB) - added by kevan at 2010-06-25T00:15:18Z.
393status9.dpatch (275.0 KB) - added by kevan at 2010-06-26T00:52:38Z.
393status10.dpatch (263.3 KB) - added by kevan at 2010-06-27T00:07:27Z.
393status11.dpatch (291.8 KB) - added by kevan at 2010-06-28T23:17:05Z.
393status13.dpatch (350.3 KB) - added by kevan_ at 2010-07-01T23:47:33Z.
393status14.dpatch (359.7 KB) - added by kevan at 2010-07-02T23:38:26Z.
393status15.dpatch (374.5 KB) - added by kevan at 2010-07-06T23:03:20Z.
393status16.dpatch (430.9 KB) - added by kevan at 2010-07-08T00:32:39Z.
393status17.dpatch (434.2 KB) - added by kevan at 2010-07-08T21:23:27Z.
393status18.dpatch (451.9 KB) - added by kevan at 2010-07-09T23:47:54Z.
393status19.dpatch (482.8 KB) - added by kevan at 2010-07-14T00:11:30Z.
393status20.dpatch (559.3 KB) - added by kevan at 2010-07-17T02:32:06Z.
iwritable.diff.txt (1.6 KB) - added by kevan at 2010-07-19T16:40:33Z.
393status21.dpatch (618.8 KB) - added by kevan at 2010-07-27T23:22:22Z.
393status22.dpatch (626.7 KB) - added by kevan at 2010-07-28T23:44:09Z.
393status23.dpatch (650.4 KB) - added by kevan at 2010-07-31T00:10:47Z.
393status24.dpatch (663.8 KB) - added by kevan at 2010-08-02T23:19:27Z.
393status25.dpatch (671.3 KB) - added by kevan at 2010-08-03T23:45:11Z.
393status26.dpatch (677.6 KB) - added by kevan at 2010-08-05T00:19:11Z.
393status27.dpatch (679.7 KB) - added by kevan at 2010-08-06T00:23:43Z.
393status28.dpatch (685.9 KB) - added by kevan at 2010-08-07T00:14:55Z.
393status29.dpatch (511.3 KB) - added by kevan at 2010-08-10T00:52:37Z.
393status30.dpatch (515.9 KB) - added by kevan at 2010-08-11T00:55:46Z.
393status31.dpatch (518.3 KB) - added by kevan at 2010-08-12T00:15:35Z.
393status32.dpatch (536.5 KB) - added by kevan at 2010-08-12T23:56:57Z.
393status33.dpatch (544.6 KB) - added by kevan at 2010-08-14T00:22:22Z.
393status34.dpatch (555.4 KB) - added by kevan at 2010-08-14T23:31:53Z.
393status35.dpatch (560.1 KB) - added by kevan at 2010-08-19T01:11:00Z.
393status36.dpatch (560.3 KB) - added by warner at 2011-02-22T00:13:24Z.: updated to current trunk, conflicts and test failures fixed
393status37.dpatch (569.7 KB) - added by kevan at 2011-02-26T07:21:35Z.
393status38.dpatch (610.1 KB) - added by kevan at 2011-02-28T01:44:10Z.
fencepost-test.dpatch (12.8 KB) - added by warner at 2011-02-28T02:15:13Z.: exercise fencepost-update bug
393status39.dpatch (618.6 KB) - added by kevan at 2011-03-01T03:25:39Z.: fix fencepost error, improve tahoe put option parsing
393status40.dpatch (626.6 KB) - added by kevan at 2011-03-07T08:43:10Z.: correct poor choice of internal offset representation, add tests, clarify some calculations
more-tests.diff.patch (19.5 KB) - added by zooko at 2011-04-01T23:14:27Z.
393status41.dpatch (700.2 KB) - added by kevan at 2011-05-03T03:17:44Z.: begin work on layout changes, add MDMF caps, tweak nodemaker and filenode to use MDMF caps
393status42.dpatch (720.9 KB) - added by kevan at 2011-05-12T03:24:47Z.: add tests for MDMF caps
393status43.dpatch (743.3 KB) - added by kevan at 2011-05-16T01:16:42Z.: add more tests, fix failing tests, fix broken pausing in mutable downloader
393status44.dpatch (847.9 KB) - added by kevan at 2011-05-31T01:52:19Z.: add more tests; append hints to caps when we have hints to give; initial work on MDMF directories
393status45.dpatch (893.8 KB) - added by kevan at 2011-06-18T02:59:30Z.: teach webapi, WUI, and CLI how to deal with MDMF caps
393status46.dpatch (958.5 KB) - added by kevan at 2011-07-31T22:51:09Z.: add tighter bounds in MDMF shares, resolve bugs
393status47.dpatch (785.8 KB) - added by kevan at 2011-08-02T02:58:13Z.: rework #393 patches to be more comprehensible
fix-test-failures-393.darcs.patch (787.7 KB) - added by davidsarah at 2011-08-02T04:31:40Z.: Fix for test failures in test_immutable.py caused by 393status47.dpatch
393status48.dpatch (791.9 KB) - added by kevan at 2011-08-07T01:18:24Z.: fix broken test, fix pyflakes warnings
fix-393-boundary-issues.darcspatch (48.7 KB) - added by kevan at 2011-08-13T21:11:42Z.: address review comments, test for and fix boundary conditions (meant for application to ticket393-MDMF-2 branch)
131073.html (4.3 KB) - added by davidsarah at 2011-08-17T00:25:21Z.: Check results for an MDMF file on the testgrid that is causing UncoordinatedWriteErrors?
partialoutputfromflogtooltailduringuncoordinatedwriteerror.txt (779.3 KB) - added by zancas at 2011-08-17T04:54:30Z.: The name pretty much says it all. This attachment is associated with the elusive UCWE…
zero-length-tests.darcs.patch (50.1 KB) - added by davidsarah at 2011-08-22T02:51:18Z.: Additional tests for zero-length partial reads and updates to mutable versions. Also refactors some tests to reduce duplicated code.
393-davidsarah.darcs.patch (65.1 KB) - added by davidsarah at 2011-08-23T02:49:11Z.: Additional tests for MDMF. Also, make the immutable/read-only constraint checking for MDMF URIs identical to that for SSK URIs. refs #393
misc-docs-davidsarah.darcs.patch (94.0 KB) - added by davidsarah at 2011-08-23T02:50:48Z.: Miscellaneous documentation updates (including some for MDMF)
393-fix-interfaces.darcs.patch (65.3 KB) - added by davidsarah at 2011-08-25T02:49:08Z.: Fix interfaces related to MDMF. refs #393 Also some grammar corrections, and removal of 'self' from interface method declarations in interfaces.py
393-zero-length-reads-todo.darcs.patch (47.1 KB) - added by davidsarah at 2011-08-25T03:18:33Z.: test_mutable.py: mark zero-length read tests as TODO. refs #393
code-cleanup-and-datalength-fixes.dpatch (10.6 KB) - added by kevan at 2011-08-26T22:19:03Z.: fixes described in comment:139 and comment:152
verifier-uri-rename.darcs.patch (49.6 KB) - added by davidsarah at 2011-08-28T03:03:03Z.: Replace *URIVerifier -> *VerifierURI, which is more logical and consistent with existing class names. refs #393
choose_segs.py (2.8 KB) - added by zooko at 2011-09-03T06:49:15Z.
choose_segs2.py (3.1 KB) - added by zooko at 2011-09-03T07:03:34Z.
choose_segs3.py (3.5 KB) - added by zooko at 2011-09-03T07:07:17Z.

Change History (239)

comment:1 Changed at 2008-04-24T23:45:41Z by warner

Component changed from code-encoding to code-mutable

comment:2 follow-up: ↓ 4 Changed at 2008-09-08T02:46:01Z by warner

I think MDMF is going to require a separate AES key per segment, since it isn't safe to reuse a whole-file AES key when replacing individual segments. That means we'll need to store a table of salts, one per segment, and compute the per-segment AES key by hashing the whole-file key (derived from the writecap) with the salt. We'll need to replace the salt with a new random value each time we overwrite a segment, and we'll need to overwrite a whole segment at a time.

comment:3 Changed at 2009-10-28T04:17:41Z by davidsarah

Keywords newcaps added

Tagging issues relevant to new cap protocol design.

comment:4 in reply to: ↑ 2 Changed at 2009-12-20T17:33:11Z by davidsarah

Keywords performance random-access added

Replying to warner:

I think MDMF is going to require a separate AES key per segment, since it isn't safe to reuse a whole-file AES key when replacing individual segments. That means we'll need to store a table of salts, one per segment, and compute the per-segment AES key by hashing the whole-file key (derived from the writecap) with the salt.

Either that or use a per-segment IV for CTR mode.

(Don't you mean 'derived from the readcap'?)

We'll need to replace the salt with a new random value each time we overwrite a segment, and we'll need to overwrite a whole segment at a time.

Yes, you'd still need to do that with the per-segment IV.

comment:5 Changed at 2010-01-05T06:41:44Z by davidsarah

Keywords privacy added

Random-access writes will leak information about which parts of the file have been updated, to any attacker who sees the ciphertext before and after the update. We already have this issue for random-access reads (#879), but it seems more serious to me in the case of writes.

comment:6 Changed at 2010-02-23T03:07:54Z by zooko

Milestone changed from eventually to 2.0.0

comment:7 Changed at 2010-04-12T19:14:31Z by davidsarah

Keywords gsoc added
Milestone changed from 2.0.0 to 1.8.0
Owner set to kevan
Type changed from task to enhancement

comment:8 Changed at 2010-04-26T22:30:30Z by kevan

Status changed from new to assigned

I posted a copy of my proposal on tahoe-dev. You can read it here.

There are a few design questions outstanding in there, and I'd like to get them dealt with ASAP, so I can hit the ground running and have a better chance of getting MDMF trunk-ready by August.

Since the existing code doesn't quite (AFAICT) know how to process multi-segment mutable files, MDMF introduces forward compatibility issues in that older versions of Tahoe-LAFS won't know how to read MDMF mutable files uploaded by newer versions of Tahoe-LAFS. When I was writing my proposal, I could think of two solutions, which I write about more thoroughly in the proposal:

Bump the mutable file protocol version number, and rely on the existing mutable file code to not process mutable files that it doesn't know how to process.
Make a new cap format for mutable files, and rely on the caps-from-the-future functionality to sensibly limit what can be done with MDMF files from old versions of Tahoe-LAFS.

I favor the first solution.

Caps, as they are now, encapsulate a combination of the authority given by the capability to its holder and some information about the resource designated by the cap (e.g., is it a file? is it a directory? can it be modified?). The difference between MDMF and SDMF is (unless I'm missing something) basically an implementation detail, and not really any of those things. A protocol version number seems like a much better way of signaling a change in implementation than a different cap format.

(this assumes that the existing mutable file caps will be sufficient to ensure us of the authenticity of mutable files using MDMF; I think that they are, but I may have missed something)

comment:9 Changed at 2010-05-21T16:03:52Z by davidsarah

Keywords mdmf mutable added

comment:10 Changed at 2010-05-24T22:06:54Z by kevan

The first hurdle here is to decide what MDMF files will look like at rest on storage servers; we need to do this before we can start writing or reading MDMF files.

Storage servers provide a content-independent container for storing mutable share files; this means that the MutableShareFile? in mutable.py should not need to be changed. Similarly, the idea of test vectors, write vectors, and read vectors to detect and react to uncoordinated writes applies to MDMF just as it did to SDMF. Given this, I think that we can leave storage servers alone.

The code that describes the SDMF layout is in mutable/layout.py . As implemented, they are preceded by this header (byte-indexed):

offset:     size:       name:
0           1           version number (currently 00)
1           8           sequence number 
9           32          share tree root hash
41          16          The AES salt
57          1           The "k" encoding parameter
58          1           The "N" encoding parameter
59          8           The segment size of the uploaded file
67          8           The data length of the uploaded file
75          4           The offset of the signature
79          4           The offset of the signature hash chain
83          4           The offset of the block hash tree
87          4           The offset of the share data
91          8           The offset of the encrypted private key
99          8           EOF

which is followed by a verification key, a signature of everything before the offsets in the header, the share hash chain, the block hash tree, the share data, and the encrypted private key.

Unless there's something I'm missing, we can ditch the per-file salt in favor of per-segment salts. Like the per-file salt, the per segment salt will be 128 bits, unless there's a good argument for something different. I think it makes sense to designate a separate part of the share for the per-segment salts -- we could also store them interleaved into the shares, but that makes access patterns like:

   share_0 = data[o['sharedata']:o['sharedata'] + blocksize]

(where o is a dictionary of offsets and blocksize is the size of blocks generated by the encoder, given the encoding parameters for the upload)

harder and less clear to read, since you would also need to take into account the size of the salt in addressing. The disadvantage of this approach is that we need to add a new offset to the header, but I don't see anything terribly wrong with that.

Then the MDMF on-server file format would have a header like this (byte-indexed again):

offset:     size:       name:
0           1           version number (currently 00; we'll be changing to 01)
1           8           sequence number 
9           32          share tree root hash
41          1           The "k" encoding parameter
42          1           The "N" encoding parameter
43          8           The segment size of the uploaded file
51          8           The data length of the uploaded file
59          4           The offset of the signature
63          4           The offset of the signature hash chain
67          4           The offset of the block hash tree
71          4           The offset of the per-segment AES salts
75          4           The offset of the share data
79          8           The offset of the encrypted private key
87          8           EOF

This would be followed by the verification key, a signature of everything before the offsets in the header, the share hash chain, the block hash tree, the per-segment salts, the share data, and the encrypted private key.

The per-segment salts would be written out with no separators, like the shares are, and you'd access them like:

    salt_0 = data[o['salt']:o['salt'] + saltsize]

(again where o is a dictionary of offsets)

Like the current per-file salt, they could be randomly generated, and updated on every write.

Thoughts? Am I missing anything here?

comment:11 Changed at 2010-05-25T17:17:53Z by kevan

If we do the above, the salts aren't covered by the signature (either directly or otherwise), and there isn't even an unsigned way to detect whether the salts are corrupted.

A really simple way to allow readers to detect corruption would be to hash the concatenation of all of the salts, then stick this in the signed header. We could update a hasher as we processed segments without consuming more than constant memory. This means that to validate the authenticity of one salt, we'd need to download all of them, but salts will be tiny when compared to the data that they key, and we only need to download and validate them once for the whole file, so maybe that's not as big of a deal.

Then the new design would look something like:

offset:     size:       name:
-- signed part --
0           1           version number (currently 00; we'll be changing to 01)
1           8           sequence number 
9           32          share tree root hash
41          32         concatenated salts hash
-- end signed part --
73          1           The "k" encoding parameter
74          1           The "N" encoding parameter
75          8           The segment size of the uploaded file
83          8           The data length of the uploaded file
91          4           The offset of the signature
95          4           The offset of the signature hash chain
99          4           The offset of the block hash tree
103         4           The offset of the per-segment AES salts
107         4           The offset of the share data
115         8           The offset of the encrypted private key
123         8           EOF

comment:12 Changed at 2010-05-25T17:47:10Z by kevan

"signed part" in the diagram above should extend to incorporate everything up to and including the data length, since that's how it is implemented. Also, after the header comes the verification key, signature, share hash chain (not signature hash chain; that's me not proofreading well enough), block hash tree, AES salts, share data, and private key, as before.

Changed at 2010-05-31T23:43:32Z by kevan

Attachment 393progress.dpatch added

progress on MDMF

comment:13 Changed at 2010-05-31T23:45:49Z by kevan

I've attached my progress so far, which includes proxies to write and read the MDMF data format, and tests for those proxies. If you're interested in seeing what the format evolved into, check out line 1375 of the patch.

Next up: reworking the mutable uploader and downloader to use these proxies.

comment:14 Changed at 2010-06-11T19:40:04Z by kevan

attachment:393status2.dpatch contains a snapshot of my work so far.

There is an incomplete segmented uploader. Actually, it handles both SDMF and MDMF files, based on a version number that I hacked into the mutable filenode (for testing, until we get better semantics on what will and won't be MDMF nailed down)
The servermap has been somewhat extensively reworked to tolerate both SDMF and MDMF files. It doesn't quite work 100% yet, but it does okay.
There are tests for these things, though they are probably incomplete.
I've reworked the inner workings of my low-level format wrappers, but the format in the previous patch is basically the same as it was.

Changed at 2010-06-11T19:40:27Z by kevan

Attachment 393status2.dpatch added

comment:15 Changed at 2010-06-14T22:41:10Z by kevan

attachment:393status3.dpatch contains more work on the servermap. The servermap now passes all of the existing tests (and all of my new tests) with the exception of four tests that test how Tahoe-LAFS reacts to hung servers. Once I have that worked out, I'll start work on the downloader.

Changed at 2010-06-14T22:58:49Z by kevan

Attachment 393status3.dpatch added

comment:16 Changed at 2010-06-15T01:14:14Z by kevan

attachment:393status4.dpatch contains a revised servermap that passes the hung server tests. After checking out test coverage reports, I'm going to start working on the downloader next.

Changed at 2010-06-15T01:15:04Z by kevan

Attachment 393status4.dpatch added

comment:17 Changed at 2010-06-21T23:43:16Z by kevan

attachment:393status5.dpatch contains a mostly-complete segmented downloader. It has only a few major regressions, and lacks tests. The new downloader is much prettier and more complete than the new uploader, which is much more of a hack and which I will probably rewrite after I'm done with the downloader.

Changed at 2010-06-21T23:43:51Z by kevan

Attachment 393status5.dpatch added

comment:18 Changed at 2010-06-23T00:34:59Z by kevan

attachment:393status6.dpatch contains an improved segmented downloader. It passes all but 3 regression tests, and is essentially complete aside from that. It still needs tests, which I will write tomorrow.

Changed at 2010-06-23T00:36:22Z by kevan

Attachment 393status6.dpatch added

comment:19 follow-up: ↓ 21 Changed at 2010-06-23T21:56:50Z by warner

as we just discussed briefly on IRC, I'd like to avoid O(N) transfers, so I'd rather go with a merkle structure of salts rather than a flat hash. I think the best approach would be to put the salt in front of each encrypted block, and compute the merkle tree over the (salt+ciphertext) pairs. The one block-tree remains the same size, but is computed over slightly different data. Downloading a single block requires downloading the (salt+ciphertext) for that one block, plus the merkle hash chain to validate it, plus the (constant-length) top-level UEB-like structure (which includes the root hash of the merkle chain, and the version number), and the signature of that structure.

Using version=2 in the share to indicate MDMF sounds fine. readcaps/writecaps will remain the same as for SDMF: clients will need to fetch a share before they can discover whether the file is SDMF or MDMF.

One thing to keep in mind, which I learned while writing the #798 new immutable downloader, is to try and structure the fields so that downloaders can fetch everything in a single request. This means avoiding as many round-trip-inducing decisions as possible. Having an offset table makes the share nicely flexible, but it also means that you have to fetch it (and burn a roundtrip) before you can precisely fetch anything else. The new-downloader makes a best-guess about the offset table, and optimistically fetches both the table and the data it points to at the same time, with fallbacks in case it guessed wrong.

One down side of moving anything in the share format, or of increasing the size of the data blocks, is that these sorts of guesses are more likely to be wrong, or that there will be less overlap between the wrong guess and the real location of the data. The #798 downloader unconditionally fetches the first few kilobytes of the file, without being too precise about what it grabs, because that can save a roundtrip in some cases of bad guesses.

I'm not suggesting that we do a lot of changes to the mutable downloader, but it might be worth walking through what a #798-style mutable downloader would look like, and see how many roundtrips are required, and if there's something simple we can do to the share format now, it might give us room to improve performance in the future.

comment:20 Changed at 2010-06-23T23:57:10Z by kevan

attachment:393status7.dpatch contains minor modifications to the downloader that cause the last two seemingly related tests to pass. For whatever reason, tests that test the bucket counter and lease crawler sporadically fail -- I didn't touch that part of the code, and I'm not sure why the parts of the code that I did touch would cause those failures, but I guess that's software. I'll look at that after I get some tests written. Also, the mutable slot proxy now allows callers to tell it how to batch requests. This is helpful in the case where we want to fetch a block, salt, and the parts of the block hash and share hash trees necessary to validate those things from the remote storage server, since it allows for us to do that in one remote read operation with four read vectors. There is also a test for this behavior.

Changed at 2010-06-23T23:57:34Z by kevan

Attachment 393status7.dpatch added

comment:21 in reply to: ↑ 19 Changed at 2010-06-24T00:02:54Z by kevan

Replying to warner:

as we just discussed briefly on IRC, I'd like to avoid O(N) transfers, so I'd rather go with a merkle structure of salts rather than a flat hash. I think the best approach would be to put the salt in front of each encrypted block, and compute the merkle tree over the (salt+ciphertext) pairs. The one block-tree remains the same size, but is computed over slightly different data. Downloading a single block requires downloading the (salt+ciphertext) for that one block, plus the merkle hash chain to validate it, plus the (constant-length) top-level UEB-like structure (which includes the root hash of the merkle chain, and the version number), and the signature of that structure.

I like that; it's much nicer than having a separate salt block and a merkle tree over just the salt block.

Using version=2 in the share to indicate MDMF sounds fine. readcaps/writecaps will remain the same as for SDMF: clients will need to fetch a share before they can discover whether the file is SDMF or MDMF.

I'm using version=1, since SDMF is version 0, IIRC.

One thing to keep in mind, which I learned while writing the #798 new immutable downloader, is to try and structure the fields so that downloaders can fetch everything in a single request. This means avoiding as many round-trip-inducing decisions as possible. Having an offset table makes the share nicely flexible, but it also means that you have to fetch it (and burn a roundtrip) before you can precisely fetch anything else. The new-downloader makes a best-guess about the offset table, and optimistically fetches both the table and the data it points to at the same time, with fallbacks in case it guessed wrong.

One down side of moving anything in the share format, or of increasing the size of the data blocks, is that these sorts of guesses are more likely to be wrong, or that there will be less overlap between the wrong guess and the real location of the data. The #798 downloader unconditionally fetches the first few kilobytes of the file, without being too precise about what it grabs, because that can save a roundtrip in some cases of bad guesses.

I'm not suggesting that we do a lot of changes to the mutable downloader, but it might be worth walking through what a #798-style mutable downloader would look like, and see how many roundtrips are required, and if there's something simple we can do to the share format now, it might give us room to improve performance in the future.

Looking at the #798 downloader and figuring out how the #393 downloader could borrow some of its ideas would definitely be time well spent.

comment:22 Changed at 2010-06-25T00:14:40Z by kevan

attachment:393status8.dpatch contains improvements to code coverage, and tests for MDMF files. Aside from the need to extend tests for various sorts of corruption to specifically test MDMF files, the downloader is done.

Note that I unrecorded and re-recorded my changes into a more coherent set of patches. If you've applied any of my earlier bundles, you will want to apply this bundle in a new tree to avoid issues caused by that.

On my system, I have noticed intermittent test failures in a few places that shouldn't be failing as a result of my modifications -- notably the bucket and lease crawlers (allmydata.test.test_storage.BucketCrawler? and LeaseCrawler?), and, more recently, the immutable test test code (allmydata.test.test_immutable.Test). These typically fail with DirtyReactorErrors?. I'm not sure why anything I've done would cause these to fail, since, to my knowledge, none of the code I have modified relates to any of these areas, and I can't reliably duplicate the failures. Setting t.i.b.DelayedCall?.debug = True doesn't reveal anything of interest. If you do happen to apply these patches, and do happen to run the tests, and notice that these tests either fail or don't fail for you, please post a note here -- I would appreciate it. There should be no failing tests at this point.

Changed at 2010-06-25T00:15:18Z by kevan

Attachment 393status8.dpatch added

comment:23 Changed at 2010-06-26T00:52:13Z by kevan

attachment:393status9.dpatch adds the tests that I talked about yesterday -- there is now 98% test coverage in mutable/retrieve.py, with the only uncovered statements in the Retrieve class being related to looking for keywords in logging statements. Over the weekend, I'm going to change the salts to be stored alongside the share data, then I'll get to work finishing up the uploader, and looking at the checker/repairer code to see what will be involved in changing that.

Changed at 2010-06-26T00:52:38Z by kevan

Attachment 393status9.dpatch added

comment:24 Changed at 2010-06-27T00:03:44Z by kevan

attachment:393status10.dpatch modifies the salts to be stored alongside the share data and integrity checked as part of the block hash tree, as suggested by Brian. I will be moving on to the uploader, checker, and repairer logic come Monday.

Changed at 2010-06-27T00:07:27Z by kevan

Attachment 393status10.dpatch added

comment:25 Changed at 2010-06-28T23:16:44Z by kevan

attachment:393status11.dpatch modifies the checker/verifier and repairer code to work with MDMF files. This turned out to be very straightforward -- I tweaked the new downloader to know how to perform the verification, which eliminated a bunch of duplicate code from the verifier itself. I also added tests to ensure that checking, verifying, and repairing an MDMF file works as expected. I'll start polishing the uploader tomorrow.

Changed at 2010-06-28T23:17:05Z by kevan

Attachment 393status11.dpatch added

comment:26 Changed at 2010-07-01T23:46:16Z by kevan_

attachment:393status13.dpatch cleans up the uploader. It is now fairly free of special-case logic (that is, SDMF versus MDMF), and is easier to read and shorter as a result. There are still a couple of test failures in checking, though.

(393status12.dpatch got skipped because it was a bunch of very minor changes)

comment:27 Changed at 2010-07-01T23:46:54Z by kevan_

(also, kevan_ == kevan -- Trac has convinced itself that the latter account needs email address verification, but fails to send mail to allow that)

Changed at 2010-07-01T23:47:33Z by kevan_

Attachment 393status13.dpatch added

comment:28 Changed at 2010-07-02T23:37:42Z by kevan

attachment:393status14.dpatch revises the cleaned-up uploader to pass all of the tests, and makes some minor code coverage improvements. It needs more tests, but I'm going to wait until I get started on IVersion before I write them; otherwise, I'll have to rewrite them in a week anyway.

Changed at 2010-07-02T23:38:26Z by kevan

Attachment 393status14.dpatch added

comment:29 Changed at 2010-07-06T23:02:45Z by kevan

attachment:393status15.dpatch lays the groundwork for not converting every mutable file uploaded to the web gateway into a string before processing it. This was a major memory pain point with SDMF, and is something that MDMF will address.

Changed at 2010-07-06T23:03:20Z by kevan

Attachment 393status15.dpatch added

comment:30 Changed at 2010-07-08T00:32:13Z by kevan

attachment:393status16.dpatch alters various mutable filenode APIs to have callers use MutableFileHandles? and MutableData? objects instead of strings when uploading things to the grid. This allows the mutable file publishing code to use memory-saving measures that Nevow performs when dealing with a lot of data (writing it to a tempfile, if I understand correctly), and is modeled on the FileHandle? and Data classes that immutable files use for the same purpose. Almost all tests pass right now; the SFTP frontend tests are failing, as I need to go in there and see what (if anything) needs to be done to fix that. After that, I'll address memory concerns that crop up when downloading huge files by cloning the download target pattern used by immutable files -- as a side effect, this will give users streaming downloads of mutable files.

Changed at 2010-07-08T00:32:39Z by kevan

Attachment 393status16.dpatch added

comment:31 follow-up: ↓ 32 Changed at 2010-07-08T21:23:01Z by kevan

attachment:393status17.dpatch alters the SFTP frontend to work with yesterday's API changes. I'm going to start working on the streaming download later today/tomorrow, and then I'll be releasing a beta with the memory consumption pain point fixed. In testing, uploading a large (~700MiB) video file uses about the same amount of memory with the mutable upload code as it does with the immutable upload code, which is a good thing.

Changed at 2010-07-08T21:23:27Z by kevan

Attachment 393status17.dpatch added

comment:32 in reply to: ↑ 31 Changed at 2010-07-09T19:19:20Z by davidsarah

Replying to kevan:

attachment:393status17.dpatch alters the SFTP frontend to work with yesterday's API changes.

-    d2.addCallback(lambda ign: self.consumer.get_current_size())
-    d2.addCallback(lambda size: self.consumer.read(0, size))
-    d2.addCallback(lambda new_contents: self.filenode.overwrite(new_contents))

+    u = MutableFileHandle(self.consumer.get_file())
+    d2.addCallback(lambda ign: self.filenode.overwrite(u))

Because self.consumer.get_file() always returns the same file object that is set in the consumer's __init__, the fact that it is called synchronously (rather than in a callback for d2) does not cause a problem -- but that's dependent on the implementation of OverwriteableFileConsumer and is not immediately obvious. I would suggest changing it to

d2.addCallback(lambda ign:
               self.filenode.overwrite(MutableFileHandle(self.consumer.get_file())))

I'm going to start working on the streaming download later today/tomorrow, and then I'll be releasing a beta with the memory consumption pain point fixed. In testing, uploading a large (~700MiB) video file uses about the same amount of memory with the mutable upload code as it does with the immutable upload code, which is a good thing.

Cool -- and nice that this improvement should also be inherited by the SFTP frontend.

comment:33 Changed at 2010-07-09T19:46:11Z by kevan

Will do -- thanks for that. :-)

comment:34 Changed at 2010-07-09T23:47:29Z by kevan

attachment:393status18.dpatch addresses davidsarahs' comment, and starts laying the groundwork for the download changes, which will be based on the ideas and suggestions in #993 (I will post feedback to that ticket as I get farther along). The changes I made today make immutable files (both literal and CHK) conform to the interface changes; I've also added tests for the modifications. Next will be changing mutable files to also conform to the new interfaces.

Changed at 2010-07-09T23:47:54Z by kevan

Attachment 393status18.dpatch added

comment:35 Changed at 2010-07-10T01:15:22Z by davidsarah

attachment:393status18.dpatch addresses davidsarahs' comment ...

(We like your placement of the apostrophe ;-)

MutableFileHandle's close method closes its underlying file, whereas FileHandle doesn't . Is this difference intentional?

MutableFileHandle's read method says that:

        In most cases, I return length bytes. If I don't, it is because
        length is longer than the distance between my current position
        in the file that I represent and its end. In that case, I return
        as many bytes as I can before going over the EOF.

but it just delegates to the underlying file's read method, which might not uphold this contract if the underlying file is not a regular file. For example, the Python docs say that read for a stdlib file object "is simply a wrapper for the underlying fread() C function", so it will return partial reads for stream-like files. (FileHandle also has this problem, if it is one.)

comment:36 Changed at 2010-07-10T01:24:16Z by davidsarah

attachment:393status18.dpatch#L8640 :

+            self._filehandle.seek(0, os.SEEK_END)

The os.SEEK_* constants were added in Python 2.5, but we still support 2.4.x. Use

+            self._filehandle.seek(0, 2)  # 2 = SEEK_END

comment:37 Changed at 2010-07-10T01:39:26Z by davidsarah

Keywords review-needed added

Review needed for GSoC mid-term evaluations.

comment:38 Changed at 2010-07-13T04:22:29Z by kevan

Brian pointed out on IRC today that writing a mutable file segment-by-segment leads to regressions relative to SDMF behavior. If a writer dies in the middle of a mutable file update (i.e.: before writing all of the blocks), we can have a situation where there are no recoverable versions of the mutable file on the grid. In SDMF, each share is written all at once, so we have the property that each storage server with a share for a particular mutable file has a complete share (barring disk errors, network problems, etc), which makes SDMF more resilient to writer death than MDMF as currently implemented.

Brian suggested that this could be resolved by making MDMF write shares all at once, as SDMF does. This preserves the semantics described above from SDMF, but means that modifying a mutable file would consume memory proportional to the size of the modification.

Most of the user-level tools (at least the ones I'm familiar with) only modify mutable files by completely overwriting them, so many/most user-level mutable file operations will still consume memory proportional to the size of the file, as with SDMF. We can still use (unless I'm missing another regression) the block-by-block downloader that I'm working on now, so MDMF would give users streaming and memory-friendly downloads of large files, just not uploads.

(part of MDMF is writing webapi functions to allow outside code to easily modify portions of mutable files, which would be easier on memory than modifying all of the mutable file, but many command-line tools have no easy way of taking advantage of such functionality)

I'd really like to keep the memory efficient uploads, but I can't see a way to do it without that regression. Any ideas, audience at home?

comment:39 Changed at 2010-07-14T00:10:40Z by kevan

attachment:393status19.dpatch includes work on IVersion (ticket #993). IVersion is about 75% complete. Existing tests pass, though I need to write some additional tests to test more specific aspects that the existing tests miss. I also need to make some improvements to POLA as applied to MutableFileVersion?. Finally, I need to implement the read call. I'm hoping to have all of that done by tomorrow, along with davidsarahs' suggestions.

Changed at 2010-07-14T00:11:30Z by kevan

Attachment 393status19.dpatch added

comment:40 Changed at 2010-07-16T00:56:48Z by kevan

no patch today, but I have a rough and mostly complete take on IVersion implemented and working, and have the webapi and sftpd happily using the uniform download interface. As a pleasant side effect of this, downloads of MDMF mutable files through the webapi are now streaming, and at least fast enough to stream a FLAC file from shares stored on a remote storage server so that VLC can play it. They also use constant memory space -- a benefit which, unlike the constant memory space uploads, will probably stick around. I need to do some cleanup, write some tests, and fix some odds and ends tomorrow, then I'll update this ticket with a fresh patch, and post feedback to #993.

comment:41 follow-up: ↓ 43 Changed at 2010-07-17T02:31:06Z by kevan

attachment:393status20.dpatch contributes a working and reasonably well-tested IVersion implementation, a streaming downloader, and some other things. I still have some cleanup to do and some tests to write, but it seems to work without issue. In this patch, I've altered the mutable publisher to upload files larger than 128 KiB as MDMF. For the moment, this will use constant memory, though that will go away soon as I change the uploader to write MDMF files in a single write. Random-access reads are implemented, but not used by anything yet. The webapi knows how to use the streaming downloader, though.

There are some failing tests in the webapi, because I removed some restrictions in the test code that incorrectly raised a FileTooLargeError when a test attempted to upload a mutable file larger than 3.5 MiB (an SDMF restriction that was eliminated a while ago). There are also some failing tests in test_mutable.py, because I need to tweak the Version class to publish multiple versions of files to the grid.

davidsarah: Could you look at my changes to sftpd.py, specifically at the area where you left a reference to #993 as a TODO, and see if you agree with them? I think I understand self.async and think that I am not appropriating it for something bad, but I may be wrong.

Next up: More cleanup, then partial updating of MDMF files as they sit on the grid.

Changed at 2010-07-17T02:32:06Z by kevan

Attachment 393status20.dpatch added

comment:42 Changed at 2010-07-17T03:44:07Z by davidsarah

Milestone changed from 1.8.0 to 1.8β

comment:43 in reply to: ↑ 41 Changed at 2010-07-17T07:14:55Z by davidsarah

Replying to kevan:

davidsarah: Could you look at my changes to sftpd.py, specifically at the area where you left a reference to #993 as a TODO, and see if you agree with them? I think I understand self.async and think that I am not appropriating it for something bad, but I may be wrong.

They look correct as far as I can see from the darcs patch -- I'll need to apply them later to check more thoroughly. Very minor issue: the logging for _read should probably be at the start of the callback.

comment:44 Changed at 2010-07-19T16:39:59Z by kevan

(people who have been diligent in following this ticket -- if there are any such people -- will be disappointed to know that I will be at a security training seminar for the week of July 19th through July 23rd. I'll resume progress on this after that)

I've been thinking about how we want the partial-file upload and modification interface to look like. The one that I'm leaning toward looks something like this:

class IWritable(Interface):
    """
    I define methods that callers can use to update SDMF and MDMF
    mutable files on a Tahoe-LAFS grid.
    """
    # XXX: For the moment, we have only this. It is possible that we
    #      want to move overwrite() and modify() in here too.
    def update(data=IMutableUploadable, offset):
        """
        I write the data from my data argument to the MDMF file,
        starting at offset. I continue writing data until my data
        argument is exhausted, appending data to the file as necessary.
        """
        # to append data: offset=node.get_size_of_best_version()
        # do we want to support compacting MDMF?
        # for an MDMF file, this can be done with O(data.get_size())
        # memory. For an SDMF file, any modification takes
        # O(node.get_size_of_best_version()).

(iwritable.diff is this, essentially)

There were suggestions from (IIRC) Brian that we should make the interface encapsulate some notion of segments -- like def set_segments({1: "some data", 2:"some other data"}). I like the idea of having something like this, but I worry that an abstraction that makes callers worry about segments is either leaky or too low-level, while treating modifications as data to be appended at an offset in the file is more intuitive and less leaky. Further, it is much easier to map (offset, data) semantics to SDMF in a meaningful and useful way. Thoughts?

We may also want a way to delete segments from the middle of an MDMF file, which that function doesn't yet provide.

Changed at 2010-07-19T16:40:33Z by kevan

Attachment iwritable.diff.txt added

comment:45 Changed at 2010-07-23T05:35:28Z by zooko

Keywords review-needed removed

Unsetting review-needed. This patch is not ready to be reviewed and then applied to trunk. However, it would probably be a good help and encouragement to Kevan if anyone would look at his code, docs, or comments and give him your thoughts. :-)

comment:46 Changed at 2010-07-27T00:01:51Z by kevan

No patch today, but I've been working out how a partial file update will work.

The mutable filenode code, if we use a (data, offset) change representation like the one I suggest above, will be responsible for mapping the new data into segments, and then replacing those segments. For segments which can be replaced in whole, this isn't difficult -- we just upload new segments in place of old segments, and then update the integrity checks to match the new data. But it is likely that the new data will not start or end on precise segment boundaries. In these cases, we will have to fetch two segments of data in order to do an update, so we can pad the existing data appropriately. In addition, we'll need parts of the block hash trees for the shares that we're updating, so that they can be regenerated to reflect the new content. At the moment, I'm thinking about fetching this information (boundary segments + block hash tree information) during the server map update step, then presenting it to the uploader. This yields a nice separation between parts of the code that upload files and parts of the code that download files. I haven't quite worked out all the details yet, but I plan to start working on it tomorrow and see what more ideas come from that.

I'm hoping to have the low level update functionality done by the end of this week, along with the restoration of the single-write uploads that we have with SDMF. After that, I can work out how we tie all of that into the WebAPI so client code can use it.

comment:47 Changed at 2010-07-27T23:21:25Z by kevan

attachment:393status21.dpatch includes some initial work on the modifications to the servermap update that I described yesterday. It also changes MDMF publishing to write each share all at once, as SDMF currently does (and includes a test to verify that this is indeed the case).

Changed at 2010-07-27T23:22:22Z by kevan

Attachment 393status21.dpatch added

comment:48 Changed at 2010-07-28T23:43:29Z by kevan

attachment:393status22.dpatch adds a mostly complete but subject to change implementation of the servermap update modifications I talked about earlier. It also fixes a sporadic test failure that I inadvertently introduced with yesterday's changes. Tomorrow, I'll work on altering the uploader to use the information provided by the servermap to perform in-place file updates.

Changed at 2010-07-28T23:44:09Z by kevan

Attachment 393status22.dpatch added

comment:49 Changed at 2010-07-31T00:07:38Z by kevan

attachment:393status23.dpatch implements most of the rest of the changes needed for partial file update. Some of the knobs still need to be connected, and it doesn't handle the edge case where enough segments are being added to push the segment count over a power-of-two boundary, but it should hopefully work aside from that. To come are thorough tests, and edge case coverage. Then I'll hook it up to the WebAPI somehow. I'm really hoping I can have something more or less complete by next friday, so I can spend the last week of the official GSoC period reviewing my older code and tying up loose ends. We'll see how that goes, though.

Changed at 2010-07-31T00:10:47Z by kevan

Attachment 393status23.dpatch added

comment:50 Changed at 2010-08-02T23:18:51Z by kevan

attachment:393status24.dpatch includes tests for the update behavior (none of which pass at the moment) and some modifications to fix minor bugs that were causing unrelated test failures.

Changed at 2010-08-02T23:19:27Z by kevan

Attachment 393status24.dpatch added

comment:51 Changed at 2010-08-03T23:44:36Z by kevan

attachment:393status25.dpatch fixes many bugs in the update behavior; 3 of the 5 tests I wrote yesterday pass now.

Changed at 2010-08-03T23:45:11Z by kevan

Attachment 393status25.dpatch added

comment:52 Changed at 2010-08-05T00:17:38Z by kevan

attachment:393status26.dpatch fixes more bugs in the update behavior; all of the tests I wrote on Monday now pass, though I'm going to write more tests tomorrow to see if some edge cases that have been bothering me are going to be problematic. After that, it'll be time to figure out a way to integrate update into the rest of Tahoe-LAFS; into the WebAPI, at least.

It also occurs to me that we could get rid of the re-encoding update case (where we need to re-encode and upload the file because the block hash tree grows, shifting everything else forward into where the block and salt data was before the update) by juggling the layout of MDMF files. If we put the block and salt data after the offsets and signature but before the integrity data, then we no longer have to re-upload the file when the block hash tree grows larger than it was originally, since there wouldn't be any share data to be upset beyond the block hash tree. The disadvantage of this approach is that reading the first 1000 or so bytes of an MDMF file will have a smaller chance of fetching the encrypted private key and verification key, which would make the servermap update step use more roundtrips, but this could be addressed by putting those (and whatever else the servermap update step is likely to want) before the blocks and salts. We'd probably be just fine if we stuck only the block and share hash trees after the block and salt data, since that gets re-written on an update anyway. Then we'd have O(change_size) updates in general without any real regression with what is there now.

If testing goes well tomorrow, I'll start working on that.

Changed at 2010-08-05T00:19:11Z by kevan

Attachment 393status26.dpatch added

comment:53 Changed at 2010-08-06T00:23:10Z by kevan

So it turns out that the files are already implemented in a way that allows power-of-two updates like I described above. attachment:393status27.dpatch fixes the update behavior to take advantage of this, includes some new tests, and fixes some bugs revealed by those new tests. The mutable filenode code can now do O(update size) updates to MDMF mutable files in general.

Changed at 2010-08-06T00:23:43Z by kevan

Attachment 393status27.dpatch added

comment:54 Changed at 2010-08-07T00:14:07Z by kevan

attachment:393status38.dpatch hooks the webapi up to the update behavior by adding an optional offset parameter to PUT requests to a filecap. If the filecap is mutable, it will append the content of the HTTP request to the file at the given offset. I'm not sure if this is the best way to go about this, and I'm not sure if it makes sense to add a corresponding option to the corresponding POST handler (which I think is used by the WUI, which might not have much use for the ability to do this). It also adds tests for these behaviors.

I'm going to spend next week tidying up the code and polishing up some odds and ends. Before I start working on that, I'll re-record my patchset into something more coherent for review purposes. Once I've done that, I'll set the review-needed keyword, so people can hopefully review my changes. I also need to sync these patches with the trunk so that they apply cleanly.

Changed at 2010-08-07T00:14:55Z by kevan

Attachment 393status28.dpatch added

comment:55 Changed at 2010-08-09T22:13:51Z by zooko

Milestone changed from 1.8β to 1.9.0

comment:56 follow-up: ↓ 58 Changed at 2010-08-10T00:51:49Z by kevan

attachment:393status29.dpatch fixes all of the failing tests, so all unit tests should pass now. It syncs up my changes with trunk. It also represents a fairly substantial reorganization of the patch set, which will (hopefully!) make it a lot easier to review. I'm going to be reviewing it myself and fixing a few rough edges this week, but I'm going to set the review-needed keyword so that others can start reviewing it too.

Rough areas that I know about:

_loop in the mutable file downloader isn't hooked up to anything. I took it out of the path of execution when we were still doing multiple-write uploads, but it makes sense to use it again now that we're back to single-write uploads, as in SDMF, since it will allow us to place more shares than we would otherwise in certain circumstances. I'm going to hook it back up tomorrow.
Neither the uploader nor the downloader are very good about keeping their status objects updated. I'm going to try to address this tomorrow.
The way that you tell a mutable file whether it is SDMF or MDMF is poorly specified and kind of a kludge that I made a while ago to help me test things. It probably needs to be at least looked at, specified more precisely, and maybe changed. Right now, if you upload a mutable file that is larger than 128KiB, it is automatically made to be MDMF; otherwise, it is SDMF. These semantics are probably not what we want for backward compatibility/interoperating with older clients.
I need to look at the deferred chains in the publisher and downloader to make sure that I'm not going to be causing ballooning memory usage due to tail recursion.

Changed at 2010-08-10T00:52:37Z by kevan

Attachment 393status29.dpatch added

comment:57 Changed at 2010-08-10T00:54:48Z by kevan

Keywords review-needed added

comment:58 in reply to: ↑ 56 Changed at 2010-08-10T01:58:25Z by davidsarah

Replying to kevan:

The way that you tell a mutable file whether it is SDMF or MDMF is poorly specified and kind of a kludge that I made a while ago to help me test things. It probably needs to be at least looked at, specified more precisely, and maybe changed. Right now, if you upload a mutable file that is larger than 128KiB, it is automatically made to be MDMF; otherwise, it is SDMF. These semantics are probably not what we want for backward compatibility/interoperating with older clients.

We want to be able to test uploading of MDMF files, and enable their use on grids where all clients are known to be of a version that understands them, without breaking backward compatibility for grids with a mix of client versions.

I can think of several ways to do that:

Use a query parameter to webapi upload requests. This would default to SDMF in 1.9.0, and the default could be changed in some future version.
Use a parameter in tahoe.cfg.
A combination of 1 and 2, i.e. use a query parameter to the webapi request, with the default specified in tahoe.cfg.

Option 1. is okay for testing, but then there's no obvious way to configure the WUI and CLI (or other frontends) to say that you only have newer clients and so want to use MDMF. That problem is solved by either 2. or 3.; the choice between them depends on whether we want to be able to specify this on a per-file basis.

comment:59 Changed at 2010-08-10T02:00:01Z by davidsarah

Keywords backward-compatibility forward-compatibility added

comment:60 Changed at 2010-08-11T00:55:12Z by kevan

attachment:393status30.dpatch deals with a few things.

_loop turned out not to do anything that wasn't done elsewhere (which explains why the tests passed with it removed), so I just removed it.
Both the publisher and the downloader now do a better job of keeping their status objects informed about what they're doing.
A test that I wrote yesterday was buggy, so I fixed that. I also fixed a bug in the filenode code that the test exposed.

Like the patch yesterday, it should apply cleanly and the tests should run cleanly (i.e.: they should pass).

I'll be working on the compatibility issues tomorrow.

Changed at 2010-08-11T00:55:46Z by kevan

Attachment 393status30.dpatch added

comment:61 Changed at 2010-08-12T00:15:04Z by kevan

attachment:393status31.dpatch introduces the _turn_barrier style fix of #237 into the deferred chains in the publisher, downloader, and servermap updater to avoid running into recursion limit errors. It also removes the size-based MDMF vs. SDMF criterion, and lays the groundwork for hooking that instead up to the webapi, something I'll be looking at tomorrow. I like option 3 in davidsarahs' comment, since it is fairly simple, but also flexible. I'm hoping that I'll have time to alter the WUI and the CLI to support that feature in an intelligent way after I make that change, but we'll see.

Changed at 2010-08-12T00:15:35Z by kevan

Attachment 393status31.dpatch added

comment:62 Changed at 2010-08-12T23:56:18Z by kevan

attachment:393status32.dpatch teaches the webapi how to upload new files as MDMF and SDMF, based on a new mutable-type parameter. Thinking about it, I'm fairly sure that I don't like the way I chose to do that for the POST verb, so I'll probably change that tomorrow. I also introduced a mutable.format configuration knob in tahoe.cfg, which can be set to sdmf or mdmf and configures the default behavior of new mutable file creation as suggested by davidsarah in comment:58. Once I rework the POST verb, or convince myself that the implementation won't make the WUI-side part of this as clunky as I think it will now, I'll start working on hooking up the WUI and the CLI to this behavior. I'm really hoping to have that done tomorrow. I'll then work on documentation over the weekend.

Changed at 2010-08-12T23:56:57Z by kevan

Attachment 393status32.dpatch added

comment:63 follow-up: ↓ 64 Changed at 2010-08-13T02:29:47Z by zooko

Kevan: by the way you are doing a great job of keeping us updated on the progress on this work. Thank you very much!

comment:64 in reply to: ↑ 63 Changed at 2010-08-13T23:59:44Z by kevan

Replying to zooko:

Kevan: by the way you are doing a great job of keeping us updated on the progress on this work. Thank you very much!

I'm glad you find the obsessive updates helpful. :-) Talking about what I do and posting frequent updates helps me make the most of my time, which is why I try to update this ticket frequently.

comment:65 Changed at 2010-08-14T00:21:51Z by kevan

attachment:393status33.dpatch hooks the WUI and the CLI up to the webapi behavior that I worked on earlier. You can now upload MDMF mutable files on the root page of the WUI, the directory page of the WUI, and using the tahoe put command; the first two of those should be pretty straightforward despite my brilliant web design skills, while the third is documented in tahoe put --help. On my to-do list for the weekend are documentation, code review, making it so that the default from tahoe.cfg is also the default on the WUI MDMF radio buttons, and making sure that I haven't introduced any regressions in the JSON representation of things from the webapi.

Changed at 2010-08-14T00:22:22Z by kevan

Attachment 393status33.dpatch added

comment:66 Changed at 2010-08-14T23:31:14Z by kevan

attachment:393status34.dpatch documents the new webapi functions, the new tahoe.cfg parameter, fixes a few annoyances in the JSON representation of things, improves code coverage, improves the WUI as I wanted to yesterday, and adds some other tests that I wanted to add. I need to write up a better spec of MDMF at some point, need to distill my work on #993 into a coherent comment on that ticket, and need to see what other tickets this patch may have solved, but there aren't any major TODOs that I'm aware of left in the code, so it should be pretty safe to test.

Changed at 2010-08-14T23:31:53Z by kevan

Attachment 393status34.dpatch added

comment:67 Changed at 2010-08-19T01:09:59Z by kevan

attachment:393status35.dpatch fixes some little things revealed by pyflakes, and removes some dead code. It is functionally equivalent to the previous patch.

Changed at 2010-08-19T01:11:00Z by kevan

Attachment 393status35.dpatch added

comment:68 Changed at 2010-08-25T00:24:59Z by kevan

I'm on vacation (and reconditioning my new old car to not break down on said vacation) until early September, so if you leave a review comment, I probably won't get to it until then. :-)

comment:69 Changed at 2010-12-13T01:28:11Z by kevan

#393 now has a branch; you can browse it through Trac, or check it out with darcs:

darcs get --lazy http://tahoe-lafs.org/source/tahoe-lafs/ticket393

I've pushed a slightly updated version of attachment:393status35.dpatch to this ticket to that branch. It is functionally identical, save for that it fixes a few places where the original patchset had bitrotted.

comment:70 follow-up: ↓ 74 Changed at 2011-02-21T17:33:10Z by warner

I'm bringing this up-to-date for trunk and (finally!) reviewing it. Amazing work! I'm building up a list of minor cleanups that can wait until after we land it, but one item came up that I wanted to ask about.

The big new webapi method is a PUT/POST operation to a mutable file that includes an offset=X argument, allowing you to modify the middle of an existing mutable file. But it looks like negative offsets are used to mean "do the old-style replace-the-whole-file behavior". Should we reserve negative offsets for something more useful in the future (by having them raise an error for now)? I'm thinking that it'd be nice to have an easy API to append to a mutable file, and right now it seems like the only way to accomplish that is to first do a t=json query to learn the size of the file, then do a PUT with offset=size. I can imagine wanting to express an append with offset=-1 or offset=append or something like that, and I don't want to teach webapi users that using offset=-X means "replace", making it unsafe for us to reclaim that portion of the argument space later.

thoughts?

comment:71 follow-up: ↓ 77 Changed at 2011-02-21T22:03:34Z by warner

Another issue: in MutableFileVersion._modify_once, the old code repeated the servermap update step at the beginning of each loop, whereas the new code appears to rely upon a mapupdate being done earlier (and doesn't repeat it). The original reason for re-updating was to handle the UCWE-triggered retry case: if we're racing with somebody else to modify the file, and detected contention, then we need to update our servermap to properly update on top of what somebody else finished writing. If we don't update our servermap, we'll be using a stale one, and we'll probably keep hitting UCWE forever (or until the retry counter hits the limit).

There might be some other way your code deals with this that I'm not seeing, though.. for example, in some sense, if we observe UCWE, then we should really create a new MutableFileVersion instance for the new state of the file, and then delegate to .modify on it. Except we need to impose the retry limit (maybe add a retries_permitted= argument to .modify, and when we delegate, we pass in a value that's one lower than the one we were given? limited recursion?).

Changed at 2011-02-22T00:13:24Z by warner

Attachment 393status36.dpatch added

updated to current trunk, conflicts and test failures fixed

comment:72 follow-up: ↓ 73 Changed at 2011-02-22T00:16:44Z by warner

I got about halfway through the review this weekend. Nice work! I've uploaded the merge work that I did, that -36 dpatch should apply cleanly to trunk and pass all tests. I have a bunch of tiny cleanup issues that can wait until after we land this. Apart from the two things mentioned above, I haven't seen any major blockers. Really, this is an excellent patch.

I can't quite figure out how the MDMF salts are handled, though. I can see that they're stored right in front of each data block, but the code (at least layout.py, and the parts of publish.py that I've read so far) don't make it clear how their validated. There's talk of a "salt hash tree", but I can't find any code that manages it. Any hints?

comment:73 in reply to: ↑ 72 Changed at 2011-02-22T00:40:25Z by kevan

Replying to warner:

I can't quite figure out how the MDMF salts are handled, though. I can see that they're stored right in front of each data block, but the code (at least layout.py, and the parts of publish.py that I've read so far) don't make it clear how their validated. There's talk of a "salt hash tree", but I can't find any code that manages it. Any hints?

They're validated alongside the block text that they correspond to. The leaves of the block hash tree are of the form hash(salt + block). See around line 854 of mutable/retrieve.py to see that in action.

(the idea for that is due to your comment:21. Before that, there was a salt hash tree. I saw some references to it when I was looking at the code to answer your question, which are definitely confusing and should be changed.)

comment:74 in reply to: ↑ 70 ; follow-up: ↓ 76 Changed at 2011-02-22T03:49:47Z by kevan

Replying to warner:

The big new webapi method is a PUT/POST operation to a mutable file that includes an offset=X argument, allowing you to modify the middle of an existing mutable file. But it looks like negative offsets are used to mean "do the old-style replace-the-whole-file behavior". Should we reserve negative offsets for something more useful in the future (by having them raise an error for now)? I'm thinking that it'd be nice to have an easy API to append to a mutable file, and right now it seems like the only way to accomplish that is to first do a t=json query to learn the size of the file, then do a PUT with offset=size. I can imagine wanting to express an append with offset=-1 or offset=append or something like that, and I don't want to teach webapi users that using offset=-X means "replace", making it unsafe for us to reclaim that portion of the argument space later.

thoughts?

Agreed; we shouldn't deny ourselves that part of the argument space, especially not for the sake of making the fairly unintuitive "offset=-1 means replace the whole thing" behavior work. How about offset=replace for the current whole file replacement behavior, offset >= 0 to replace part of the file, and offset < 0 throws an exception for now? We'll say that offset=replace by default (i.e.: if the caller doesn't specify their own offset) so that callers get the whole file replacement behavior that they're accustomed to by default.

comment:75 Changed at 2011-02-23T05:34:42Z by warner

Ah, ok, so the block-hash-tree is really a (salt+block)-hash-tree? That sounds fine. Yeah, if you could scan through for references to block-hash-tree or salt-hash-tree and make everything consistent, that'll help future readers. Maybe use saltandblock_hash_tree for an attribute name.

That offset= behavior sounds great!

comment:76 in reply to: ↑ 74 ; follow-up: ↓ 78 Changed at 2011-02-24T01:25:47Z by davidsarah

Replying to kevan:

How about offset=replace for the current whole file replacement behavior, offset >= 0 to replace part of the file, and offset < 0 throws an exception for now? We'll say that offset=replace by default (i.e.: if the caller doesn't specify their own offset) so that callers get the whole file replacement behavior that they're accustomed to by default.

Why have two ways to express that? I'd suggest just requiring the offset parameter to be omitted to get whole-file replacement.

comment:77 in reply to: ↑ 71 Changed at 2011-02-26T02:11:43Z by kevan

Replying to warner:

Another issue: in MutableFileVersion._modify_once, the old code repeated the servermap update step at the beginning of each loop, whereas the new code appears to rely upon a mapupdate being done earlier (and doesn't repeat it). The original reason for re-updating was to handle the UCWE-triggered retry case: if we're racing with somebody else to modify the file, and detected contention, then we need to update our servermap to properly update on top of what somebody else finished writing. If we don't update our servermap, we'll be using a stale one, and we'll probably keep hitting UCWE forever (or until the retry counter hits the limit).

There might be some other way your code deals with this that I'm not seeing, though.. for example, in some sense, if we observe UCWE, then we should really create a new MutableFileVersion instance for the new state of the file, and then delegate to .modify on it. Except we need to impose the retry limit (maybe add a retries_permitted= argument to .modify, and when we delegate, we pass in a value that's one lower than the one we were given? limited recursion?).

Good catch. My code doesn't do anything tricky there, so it needs the servermap update step. I'll post a patch that includes that modification.

Changed at 2011-02-26T07:21:35Z by kevan

Attachment 393status37.dpatch added

comment:78 in reply to: ↑ 76 Changed at 2011-02-26T07:37:24Z by kevan

Replying to davidsarah:

Why have two ways to express that? I'd suggest just requiring the offset parameter to be omitted to get whole-file replacement.

I worry about inconsistent behavior if we later allow offset=append or offset=some-other-thing-that-isn't-really-an-offset-but-a-behavior, since replacement would then behave differently than other operations that would otherwise be treated similarly by the API. That's not really well-founded, though, since we can always start to support offset=replace at that point (and in any case would want to think carefully about allowing offset to take strings that describe behaviors and have little to do with offsets as values). At the moment, what you suggest is a consistency improvement, and gives us more freedom in the future to add other parameters for high-level behaviors instead of overloading the offset parameter. Good suggestion; I'll work it into a patch. Thanks.

comment:79 Changed at 2011-02-26T07:41:16Z by kevan

attachment:393status37.dpatch changes the mutable file modification code to do a servermap update before attempting (or retrying) file updates, increasing the robustness of the process in the presence of uncoordinated writes, per comment:71. It also changes the webapi as discussed in comment:70 and comment:74, but I find myself agreeing with davidsarahs' comment:76 after replying, so that behavior will be changed in a future patch.

comment:80 follow-up: ↓ 82 Changed at 2011-02-27T23:33:47Z by warner

I noticed another curiosity, in the ServermapUpdater(update_range=) code (which fetches the segments that live on the two ends of the span being modified, which are only partially changed and thus need to copy the old contents). There are lots of situations which cause start_segment and end_segment to be the same, such as modifying a small range in the middle of a segment. In those cases, it looks like Servermapupdater._got_results will request the same data twice, and the cache in MDMFSlotReadProxy doesn't help unless the second request arrives after the first has been resolved. So I think we're fetching twice as much data as we need to.

Also, if the span we're replacing exactly aligns with the start or end of a segment (which is very common if we're modifying the start or end of the file), then we don't need to fetch the old segment at all. I think (but haven't confirmed) that the current code always fetches two segments, even in the aligned case.

If so, then after we land this stuff, let's do a cleanup pass. I think the update_range= code could changed from thinking about start/end segments to just getting a set of segnums, of length 0 (if the span aligns on both ends), length 1 (if only one end is aligned, or if the span falls within a single segment) or length 2 (if the two ends of the span fall in different segments). Then the fetch_update_data code can make exactly as many get_block_and_salt calls as necessary, instead of always making 2.

Changed at 2011-02-28T01:44:10Z by kevan

Attachment 393status38.dpatch added

comment:81 Changed at 2011-02-28T01:46:13Z by kevan

attachment:393status38.dpatch alters the offset behavior to be similar to that described in comment:76 and comment:78 and removes misleading comments about the salt hash tree from the mutable file layout code.

comment:82 in reply to: ↑ 80 Changed at 2011-02-28T01:53:49Z by kevan

Replying to warner:

I noticed another curiosity, in the ServermapUpdater(update_range=) code (which fetches the segments that live on the two ends of the span being modified, which are only partially changed and thus need to copy the old contents). There are lots of situations which cause start_segment and end_segment to be the same, such as modifying a small range in the middle of a segment. In those cases, it looks like Servermapupdater._got_results will request the same data twice, and the cache in MDMFSlotReadProxy doesn't help unless the second request arrives after the first has been resolved. So I think we're fetching twice as much data as we need to.

Also, if the span we're replacing exactly aligns with the start or end of a segment (which is very common if we're modifying the start or end of the file), then we don't need to fetch the old segment at all. I think (but haven't confirmed) that the current code always fetches two segments, even in the aligned case.

I think you're right on that; I don't remember designing for either of those scenarios (I was apparently firmly in the two-segment mindset when I wrote that), and a quick glance at the code doesn't reveal that they're given special consideration.

If so, then after we land this stuff, let's do a cleanup pass. I think the update_range= code could changed from thinking about start/end segments to just getting a set of segnums, of length 0 (if the span aligns on both ends), length 1 (if only one end is aligned, or if the span falls within a single segment) or length 2 (if the two ends of the span fall in different segments). Then the fetch_update_data code can make exactly as many get_block_and_salt calls as necessary, instead of always making 2.

Sounds good.

Changed at 2011-02-28T02:15:13Z by warner

Attachment fencepost-test.dpatch added

exercise fencepost-update bug

comment:83 Changed at 2011-02-28T02:18:17Z by warner

The patch I just attached is against -36, I think, but should still apply. Something funny is going on in the update() method when it decides which segments to modify: I'm seeing file corruption when small updates are made near the start of the second or thing segment. The test should trigger the problem I'm seeing, but you can also hit it manually by using 'curl' to PUT a few bytes into a large file (I'd suggest 3 or more segments) with offset=128*1024+1 . What I'm seeing is that the bytes wind up at the start of the file instead.

comment:84 Changed at 2011-03-01T00:28:34Z by davidsarah

In the patch for tahoe put, throw twisted.python.usage.UsageError to indicate that the mutable-type argument is wrong. (This will print a usage string as well as the error message.)

Changed at 2011-03-01T03:25:39Z by kevan

Attachment 393status39.dpatch added

fix fencepost error, improve tahoe put option parsing

comment:85 Changed at 2011-03-01T04:17:37Z by kevan

attachment:393status39.dpatch fixes the fencepost error exposed in Brian's tests (which was the publisher not accounting for offsets that lie on segment boundaries when interpreting the results of div_ceil combined with the filenode code assuming that the segment size would be exactly DEFAULT_MAX_SEGMENT_SIZE, instead of the actual segment size, which is possibly a little larger so that it is a multiple of k). I also moved the mutable-type check to the PutOptions class (since the option parser seems like a good place for option parsing and validation :-) and caused invalid errors to throw a usage.UsageError, as suggested by davidsarah in comment:84.

comment:86 Changed at 2011-03-03T18:25:29Z by warner

I found another bug in the handling of offset=0 which, when used to replace the first few bytes of a file through the webapi, caused the file to be truncated instead. Here's a test:

--- old-trunk-393-s39/src/allmydata/test/test_mutable.py	2011-03-03 10:17:21.000000000 -0800
+++ new-trunk-393-s39/src/allmydata/test/test_mutable.py	2011-03-03 10:17:21.000000000 -0800
@@ -2978,6 +2978,17 @@
             self.failUnlessEqual(results, new_data))
         return d
 
+    def test_replace_beginning(self):
+        # We should be able to replace data at the beginning of the file
+        # without truncating the file
+        B = "beginning"
+        new_data = B + self.data[len(B):]
+        d = self.mdmf_node.get_best_mutable_version()
+        d.addCallback(lambda mv: mv.update(MutableData(B), 0))
+        d.addCallback(lambda ignored: self.mdmf_node.download_best_version())
+        d.addCallback(lambda results: self.failUnlessEqual(results, new_data))
+        return d
+
     def test_replace_segstart1(self):
         offset = 128*1024+1
         new_data = "NNNN"
--- old-trunk-393-s39/src/allmydata/test/test_web.py	2011-03-03 10:17:27.000000000 -0800
+++ new-trunk-393-s39/src/allmydata/test/test_web.py	2011-03-03 10:17:27.000000000 -0800
@@ -3182,6 +3182,12 @@
         d.addCallback(_get_data)
         d.addCallback(lambda results:
             self.failUnlessEqual(results, self.new_data + ("puppies" * 100)))
+        # and try replacing the beginning of the file
+        d.addCallback(lambda ignored:
+            self.PUT("/uri/%s?offset=0" % self.filecap, "begin"))
+        d.addCallback(_get_data)
+        d.addCallback(lambda results:
+            self.failUnlessEqual(results, "begin"+self.new_data[len("begin"):]+("puppies"*100)))
         return d
 
     def test_PUT_update_at_invalid_offset(self):

In web/filenode.py I'd suggest using "None" instead of "False" to mean "no offset= argument was provided", and I'd strongly recommend using "x is None" instead of e.g. "x == None". I'm pretty sure the bug hit by that test_web.py test is the result of comparing "0 == False" (which is true, whereas "0 is False" is not). The test_mutable code passes, but the test_web code fails, so I think the problem is in the webapi layer and not in the MutableFileNode layer.

Also, the code in Publish.setup_encoding_parameters which calculates self.starting_segment .. could you replace that div_ceil+corrections with a simple integer divide? self.starting_segment = offset // segmentsize ? I think that's equivalent to what it's currently trying to do, and probably easier to maintain.

comment:87 Changed at 2011-03-07T08:42:09Z by kevan

Done and done. I replaced what was essentially the same calculation in mutable/filenode.py with integer division as well. In the case where div_ceil is appropriate, I added a comment explaining why for future maintainability. I also removed a comment about the power-of-two boundary that wasn't relevant anymore; these changes (along with your tests) are in attachment:393status40.dpatch .

Changed at 2011-03-07T08:43:10Z by kevan

Attachment 393status40.dpatch added

correct poor choice of internal offset representation, add tests, clarify some calculations

comment:88 Changed at 2011-03-10T23:14:39Z by zooko

Hm, so I'd like to jump in and look at the "latest version" of TransformingUploadable because Brian said he was having trouble digesting it. If we were still maintaining the ticket393 branch then I could darcs get that and read the current version. Since we're apparently not, then do I need to apply all the patches attached to this ticket in order to see what Brian is seeing? Or apply only some of them? Or get a git (ugh) branch from Brian? :-)

comment:89 Changed at 2011-03-11T00:00:10Z by kevan

The easiest way (where easiest is defined to be "the fewest steps" :-) is probably to pull a fresh copy of trunk and apply attachment:393status40.dpatch . There weren't any conflicts when I made that patch, and it's the latest copy of the code.

comment:90 follow-up: ↓ 91 Changed at 2011-03-11T00:40:07Z by zooko

Cool, thanks!

comment:91 in reply to: ↑ 90 Changed at 2011-03-11T00:44:38Z by zooko

That was easy.

Changed at 2011-04-01T23:14:27Z by zooko

Attachment more-tests.diff.patch added

comment:92 Changed at 2011-04-01T23:14:41Z by zooko

Here is a patch that Brian and I worked on at PyCon to do tests of boundary conditions in mutable filenode update(). I think the boundary conditions are not thoroughly exhausted by these tests and Brian and I were thinking about other boundary conditions to test, but at the moment I don't remember what they were. Maybe they had to do with whether the end of the *new* file fell on or near a segment boundary and whether the end of the *old* file fell on or near a segment boundary.

This also contains a whole bunch of incomplete hacks, debug printouts, and "notes to self". I intend to clean this up later tonight or tomorrow, but just in case someone else wants it, here it is.

Changed at 2011-05-03T03:17:44Z by kevan

Attachment 393status41.dpatch added

begin work on layout changes, add MDMF caps, tweak nodemaker and filenode to use MDMF caps

comment:93 Changed at 2011-05-03T03:34:41Z by kevan

attachment:393status41.dpatch contains the current state of my work on the layout changes, cap changes, and other improvements we've come up with. In particular:

The on-disk share format was changed to place data of unknown size after data whose size is predictable to a downloader. This change helps enable smart downloaders that can satisfy a read request in one round-trip. For development purposes, we pad some parameters quite generously; we'll shrink this before it hits trunk as we decide on better bounds.
An MDMF cap was added. In order to enable a smart downloader, we need to stick information about encoding parameters (specifically k and the segment size) in the cap. Current mutable file caps don't do this, so we needed to add an MDMF cap for the purpose in order to not break older clients.
The nodemaker and mutable filenode classes have been modified to deal with MDMF caps appropriately, and tests have been added to ensure that they do.

There are a few stub tests that I wrote as a sort of executable design decision todo list; these fail at the moment. We still need to decide on whether it makes sense to make MDMF directory caps, and to write more tests to ensure that other aspects of Tahoe-LAFS interact in a sane way with MDMF caps (particularly the WUI, webapi, and cli). We also need to write some tests to ensure that pausing an MDMF downloader works correctly, and to ensure interoperability with older clients, and to examine some edge cases and questionable error handling in the uploader and downloader, as noted by Brian, Zooko, and other reviewers.

comment:94 Changed at 2011-05-12T03:22:55Z by kevan

attachment:393status42.dpatch adds several webapi and WUI tests for MDMF caps (not all of which pass at the moment). It also changes some CLI tests to test for MDMF caps. It applies cleanly against trunk as of about 20 minutes ago. I still need to assess the various CLI commands to find other places that need testing. Then I'll work on making all of the failing tests pass.

Changed at 2011-05-12T03:24:47Z by kevan

Attachment 393status42.dpatch added

add tests for MDMF caps

Changed at 2011-05-16T01:16:42Z by kevan

Attachment 393status43.dpatch added

add more tests, fix failing tests, fix broken pausing in mutable downloader

comment:95 Changed at 2011-05-16T02:56:56Z by kevan

attachment:393status43.dpatch includes yet more tests for MDMF caps. Most tests (with the exception of a test that measures the number of writes performed by MDMF operations, which Brian is looking at, and three tests that I still need to write) pass. I've also fixed an error in the mutable downloader that Brian pointed out: without the fix, pausing a download would break the downloader. There's a test for that, also.

comment:96 Changed at 2011-05-31T01:50:59Z by kevan

attachment:393status44.dpatch includes still more tests. Functional changes included with this patch include some internal tweaks to make MDMF caps come with downloader hints by default (so our future smart downloader will have a large number of MDMF caps to handle intelligently when it gets written :-), and a core implementation of MDMF directories (which turned out to be very easy to add). Next will come even more tests, as well as frontends that know how to create MDMF directories when asked to do so.

Changed at 2011-05-31T01:52:19Z by kevan

Attachment 393status44.dpatch added

add more tests; append hints to caps when we have hints to give; initial work on MDMF directories

Changed at 2011-06-18T02:59:30Z by kevan

Attachment 393status45.dpatch added

teach webapi, WUI, and CLI how to deal with MDMF caps

comment:97 Changed at 2011-07-16T20:59:35Z by zooko

Someone needs to review this for Tahoe-LAFS v1.9.0! This is one of the major features that would make 1.9 shine. We think that the deadline for getting patches into trunk for v1.9.0 is July 21 -- Thursday. Or maybe we should move the deadline to the following weekend, say Sunday, July 24. When Brian returns from Paris (tonight or tomorrow), he will hopefully weigh in on that.

comment:98 Changed at 2011-07-28T23:52:28Z by davidsarah

There's a trivial conflict with trunk: the line that starts "if self['mutable-type']" in class MakeDirectoryOptions of scripts/cli.py should be part of the parseArgs method, not the getSynopsis method.

comment:99 Changed at 2011-07-31T22:49:59Z by kevan

attachment:393status46.dpatch sets tighter bounds on signature key, verification key, share hash chain and signature size, fixes a bug I found in situations where n = k (and adds a test for that bug), resolves the conflict with trunk identified by davidsarah, fixes a test from #1304 that breaks when run against MDMF, and adds tests to ensure that the new downloader can handle old SDMF shares. I've also performed other interoperability testing between the old and new mutable file code, confirming that the old downloader can read SDMF files written by the new publisher, that the new downloader can read files published by the new publisher and modified by the old code, and that the old downloader can read files modified by the new publisher.

warner: During one of the first phonecalls, you mentioned that you were worried that the update process could fetch more data than necessary to update a segment near the end of the file, I think. My notes on that issue are sketchy; could you elaborate on your concerns there, if you remember what they are?

The signature, verification key, and signing key bounds are kind of iffy, in that they come from simulation (repeating the key generation and signature operations as the publisher would perform them thousands of times and observing the size of the output) rather than something more concrete (e.g., a property of the cryptosystem, or something in cryptopp that we don't think will change). Unfortunately, I haven't had time to dig deeply into the issue to find more reassuring explanations for those bounds yet. I'll probably have some time over the next week for that, but if someone else wants to give it a try in the meantime, feel free.

Changed at 2011-07-31T22:51:09Z by kevan

Attachment 393status46.dpatch added

add tighter bounds in MDMF shares, resolve bugs

comment:100 Changed at 2011-08-01T16:56:20Z by nejucomo

Owner changed from kevan to nejucomo
Status changed from assigned to new

I'm going to start looking at this, but I suspect it will take me a while to get enough context for a proper review. Fortunately zooko is sitting next to me, so I'll try to finish this before he gets away.

comment:101 Changed at 2011-08-02T02:53:06Z by kevan

attachment:393status47.dpatch is equivalent to attachment:393status46.dpatch , but I've refactored the patches to be a little more coherent, so attachment:393status47.dpatch should hopefully be easier to review.

Changed at 2011-08-02T02:58:13Z by kevan

Attachment 393status47.dpatch added

rework #393 patches to be more comprehensible

comment:102 Changed at 2011-08-02T03:35:33Z by davidsarah

Thanks for refactoring the patches, that's really helpful.

Applying 393status47.dpatch to trunk (r5103) gives these test failures:

[ERROR]: allmydata.test.test_immutable.Test.test_download_best_version

Traceback (most recent call last):
  File "/home/davidsarah/tahoe/sparkly/src/allmydata/test/test_immutable.py", line 298, in test_download_best_version
    d = self.n.download_best_version()
exceptions.AttributeError: 'Test' object has no attribute 'n'
===============================================================================
[ERROR]: allmydata.test.test_immutable.Test.test_download_to_data

Traceback (most recent call last):
  File "/home/davidsarah/tahoe/sparkly/src/allmydata/test/test_immutable.py", line 291, in test_download_to_data
    d = self.n.download_to_data()
exceptions.AttributeError: 'Test' object has no attribute 'n'
===============================================================================
[ERROR]: allmydata.test.test_immutable.Test.test_get_best_readable_version

Traceback (most recent call last):
  File "/home/davidsarah/tahoe/sparkly/src/allmydata/test/test_immutable.py", line 305, in test_get_best_readable_version
    d = self.n.get_best_readable_version()
exceptions.AttributeError: 'Test' object has no attribute 'n'
===============================================================================
[ERROR]: allmydata.test.test_immutable.Test.test_get_size_of_best_version

Traceback (most recent call last):
  File "/home/davidsarah/tahoe/sparkly/src/allmydata/test/test_immutable.py", line 311, in test_get_size_of_best_version
    d = self.n.get_size_of_best_version()
exceptions.AttributeError: 'Test' object has no attribute 'n'
===============================================================================
[ERROR]: allmydata.test.test_mutable.Filenode.test_mdmf_write_count

Traceback (most recent call last):
  File "/home/davidsarah/tahoe/sparkly/src/allmydata/test/test_mutable.py", line 622, in _check_server_write_counts
    peers = sb.test_servers.values()
exceptions.AttributeError: StorageFarmBroker instance has no attribute 'test_servers'
-------------------------------------------------------------------------------

Changed at 2011-08-02T04:31:40Z by davidsarah

Attachment fix-test-failures-393.darcs.patch added

Fix for test failures in test_immutable.py caused by 393status47.dpatch

comment:103 Changed at 2011-08-02T04:39:32Z by davidsarah

The last patch in fix-test-failures-393.darcs.patch fixes the failures in test_immutable, but not the one in test_mutable.Filenode.test_mdmf_write_count. Maybe the tests touched by that patch should be combined so that they're not doing the setup multiple times, for speed (they're not slow tests, but every little helps with #20).

comment:104 Changed at 2011-08-02T06:48:14Z by nejucomo

I applied attachment:393status47.dpatch and attachment:fix-test-failures-393.darcs.patch and now have only test_mutable.Filenode.test_mdmf_write_count as the only error/failure with the same error as comment:102.

Here's my analysis so far: It appears as if this one unittest is relying on a unittest-specific interface to allmydata.storage_client.StorageFarmBroker which is not present in the trunk state with the above patches. Specifically, StorageFarmBroker *does* provide a unittest-specific interface with methods called test_add_rref and test_add_server. These modify a member called servers.

I suspect that the version of StorageFarmBroker which the test was written against had separate servers and test_servers members, whereas the version in this repo now only has servers.

I attempted to change the attribute test_servers to servers in the test, and it proceeds to iterate over the values which are instances of NativeStorageServer in this loop:

            for peer in peers:
                self.failUnlessEqual(peer.queries, 1)

However, the test then fails when evaluating the assertion because the NativeStorageServer instances do not have .queries members.

The interface changes between the test expectation and the actual code are larger than my guess-and-check approach accommodates for now.

I'm going to sleep on this. My next step will be to see if I can get darcs to show me which patches have modified the storage_client.py APIs away from the unittest code in order to see how to bring the test up-to-date.

comment:105 Changed at 2011-08-03T00:30:29Z by davidsarah

I thought the test failures might have been introduced by the patch refactoring, but no, they were present in 393status46.dpatch .

comment:106 Changed at 2011-08-03T01:26:18Z by kevan

It looks like 0605c77f08fb4b78 removed common.ShareManglingMixin. The broken tests relied on a side effect of common.ShareManglingMixin's setup functionality (the self.n variable) to work, which is why they broke when it was removed.

comment:107 Changed at 2011-08-04T20:21:29Z by zooko

I've created a branch hosted on http://tahoe-lafs.org named ticket393-MDMF. With this you can browse the patches through trac and browse builds of that branch. To trigger builds of that branch, enter ticket393-MDMF as the "branch name" on the "Force Build" form.

One thing I've learned already is that the current ticket393-MDMF branch isn't pyflakes-clean.

comment:108 Changed at 2011-08-07T01:16:30Z by kevan

attachment:393status48.dpatch fixes the pyflakes warnings and the mdmf write count test.

Changed at 2011-08-07T01:18:24Z by kevan

Attachment 393status48.dpatch added

fix broken test, fix pyflakes warnings

comment:109 Changed at 2011-08-07T04:06:15Z by nejucomo

I just applied attachment:393status48.dpatch successfully to trunk. Running python ./setup.py test succeeds, ending with this output:

PASSED (skips=6, expectedFailures=3, successes=1061)

comment:110 Changed at 2011-08-07T04:34:50Z by nejucomo

I'm about to embark on an international flight, and the state of my internet access tomorrow will be unknown. Therefore, if anyone wants to increase the chance of MDMF successfully landing in the 1.9 release, feel free to review this ticket in parallel!

Here is what I'd like to accomplish on the flight:

Learn the format and cryptographic derivations of the MDMF cap. There is a good deal of discussion in these comments about the share serialization, but not the MDMF cap. I'll first check for documentation in the repository or the wiki for this, otherwise I'll revert to code.

Think carefully about the share file format. I have only skimmed over the proposed layout without reviewing which fields are authenticated and how, as well as what the authentication tells the reader of a share file.

Brainstorm how I would implement the protocol, given what I know about the MDMF cap and the share file format. Jot down some notes. I'll use this as a reference point in the code review to notice divergences between my brain-stormed implementation and the real one.

Read the actual code, noticing any confusing bits.

After I land, I'll attempt to get internet access to dump my feedback, especially an approval for release or please-fix-X comment.

comment:111 Changed at 2011-08-07T06:01:35Z by nejucomo

Owner changed from nejucomo to zooko

I'm reassigning this ticket to Zooko. I still plan to work on this review, but I want to explicitly get more review coverage on this feature, both because of my schedule/connectivity uncertainty and "more eyes, shallower bugs".

comment:112 Changed at 2011-08-08T18:20:55Z by zooko

I applied attachment:393status48.dpatch to ticket393-MDMF, but there were conflicts. Apparently Kevan re-recorded one of the patches and changed nothing:

-Mon Aug  1 20:05:11 MDT 2011  Kevan Carstensen <kevan@isnotajoke.com>
+Sat Aug  6 18:42:24 MDT 2011  Kevan Carstensen <kevan@isnotajoke.com>
   * dirnode: teach dirnode to make MDMF directories

I created a new branch named ticket393-MDMF-2. Browse the source; view the builds. It now passes pyflakes and unit tests!

comment:113 follow-up: ↓ 114 Changed at 2011-08-08T20:01:30Z by zooko

Oh, actually there is a unit test failure on the Windows buildslave. Looks like a URI in the test is a Python string instead of a Python unicode. I guess our caps should always be Python unicode objects, even though they don't use non-ASCII characters. (Or actually I guess they could use non-ASCII characters in the path part after a dir cap. But that doesn't matter -- they should always be unicode objects and never strs regardless of what characters they use.)

I wonder why this happens only on Windows?

comment:114 in reply to: ↑ 113 Changed at 2011-08-09T00:34:48Z by davidsarah

Replying to zooko:

Oh, actually there is a unit test failure on the Windows buildslave.

Fixed in [5147/ticket393-MDMF-2].

Looks like a URI in the test is a Python string instead of a Python unicode. I guess our caps should always be Python unicode objects, even though they don't use non-ASCII characters.

No, they are UTF-8 str objects (when not decoded into a *URI object).

I wonder why this happens only on Windows?

Versions of the simplejson library differ in whether simplejson.loads always uses unicode objects, or sometimes uses unicode and sometimes str (the latter for strings that are representable as ASCII). The version installed on the Windows buildslave returns unicode more often. Using to_str on each string in the output makes the behaviour consistent. (Note that this isn't necessary for keys, only values.)

comment:115 Changed at 2011-08-10T17:36:45Z by warner

Applying status48 to current trunk, fixing up the conflicts that resulted (including deleting an extra call to d.addCallback(_created)), then attempting a simple WUI MDMF upload, I got an exception. After adding a line to dump the self._offsets table, here's what it looked like:

11:10:21.946 L20 []#335 {'EOF': 7156,
11:10:21.946 L20 []#336  'block_hash_tree': 7124,
11:10:21.946 L20 []#337  'enc_privkey': 123,
11:10:21.946 L20 []#338  'share_data': 1905,
11:10:21.946 L20 []#339  'share_hash_chain': 1337,
11:10:21.946 L20 []#340  'signature': 1473,
11:10:21.947 L20 []#341  'verification_key': 1729,
11:10:21.947 L20 []#342  'verification_key_end': 2021}
11:10:21.947 L20 []#343 Publish ran out of good servers, last failure was: [Failure instance: Traceback: <type 'exceptions.AssertionError'>: 
/Library/Python/2.6/site-packages/Twisted-11.0.0-py2.6-macosx-10.6-universal.egg/twisted/internet/base.py:793:runUntilCurrent
/Library/Python/2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn
/Library/Python/2.6/site-packages/Twisted-11.0.0-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:361:callback
/Library/Python/2.6/site-packages/Twisted-11.0.0-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:455:_startRunCallbacks
--- <exception caught here> ---
/Library/Python/2.6/site-packages/Twisted-11.0.0-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:542:_runCallbacks
/Users/warner2/stuff/tahoe/trunk-393-3/src/allmydata/mutable/publish.py:634:_push
/Users/warner2/stuff/tahoe/trunk-393-3/src/allmydata/mutable/publish.py:651:push_segment
/Users/warner2/stuff/tahoe/trunk-393-3/src/allmydata/mutable/publish.py:637:_push
/Users/warner2/stuff/tahoe/trunk-393-3/src/allmydata/mutable/publish.py:771:push_everything_else
/Users/warner2/stuff/tahoe/trunk-393-3/src/allmydata/mutable/publish.py:862:finish_publishing
/Users/warner2/stuff/tahoe/trunk-393-s48/src/allmydata/mutable/layout.py:1080:put_verification_key
]

It looks like the pubkey is overrunning the subsequent share data, as if it was larger than the space originally reserved for it.

comment:116 Changed at 2011-08-10T18:27:57Z by warner

Ah. The actual problem is in the code which pre-allocates space in the share for the various fields. In particular, layout.py line 554 is counting hashes, not bytes:

SHARE_HASH_CHAIN_SIZE = int(math.log(HASH_SIZE * 256)) + 1

This is meant to reserve space for a worst-case N=256 share hash chain, which would require 8 nodes (the root is stored separately, so we don't actually need the +1).

In my 3-of-10 upload, the hash chain is length 4 (there are 16 leaves on the bottom layer {one per share}, then 8, then 4, then 2). Each node is stored with a two-byte node number and a 32-byte SHA256d output. So the hash chain is a total of 136 bytes long.

Huh, looking more carefully at that equation, there are other problems:

math.log is base e, not base 2
the HASH_SIZE* should go outside the int(), not inside
because of the extra two-byte node-number, it should be (HASH_SIZE+2)*
the +1 is unnecessary because the roothash is stored elsewhere
- although it'd be a conservative replacement for the missing ceil() that'd be necessary if the 256 weren't hard-coded,

(this sort of problem reinforces my belief that it's better to add up the sizes of the actual fields than to attempt to predict them or calculate maximum values. Kevan pointed out that he might have been advised to do it this way to enable parallelism between RSA key generation and encryption/FEC/upload of share data, but I think I'd have advised a more conservative approach)

So I think this ought to be:

SHARE_HASH_CHAIN_SIZE = (HASH_SIZE+2)*mathutil.log_ceil(256, 2)

at least that gives me (32+2)*8, which feels right.

Kevan: could you update this (and add a new patch with both this and the merge conflicts resolved)?

Now, the *real* question is why the unit tests didn't catch this. There's a little bit of slack between the estimated maximums and the actual RSA field sizes, but only about 9 bytes, not enough to store an entire hash. So at first glace it seems like this could only have worked when N=1 (i.e. the share_hash_chain was empty).

comment:117 follow-up: ↓ 118 Changed at 2011-08-10T18:55:11Z by warner

Oh, unit tests tend to use artifically small RSA keys (522 bits) for speed. That might create enough slack (in the encprivkey, pubkey, and signature) to allow a short share_hash_chain to still fit.. maybe that's why the unit tests were passing.

I think the high-level test_system uses full-sized keys.. maybe we need an MDMF file-creation step in there. (in general, we should have a single high-level non-detailed non-exhaustive test that behaves as much as possible like normal human-initiated operations).

comment:118 in reply to: ↑ 117 ; follow-up: ↓ 122 Changed at 2011-08-10T21:13:35Z by davidsarah

Replying to warner:

Oh, unit tests tend to use artifically small RSA keys (522 bits) for speed. That might create enough slack (in the encprivkey, pubkey, and signature) to allow a short share_hash_chain to still fit.. maybe that's why the unit tests were passing.

Yes -- many tests fail if the test key size is 2048 bits (but all tests still pass if it is 1024 bits). I'll attach a patch that makes the test key size easily changeable.

comment:119 Changed at 2011-08-10T21:36:32Z by david-sarah@…

In [5171/ticket393-MDMF-2]:

Replace the hard-coded 522-bit RSA key size used for tests with a TEST_RSA_KEY_SIZE constant defined in test/common.py (part 1). refs #393

comment:120 Changed at 2011-08-10T21:36:37Z by david-sarah@…

In [5172/ticket393-MDMF-2]:

Replace the hard-coded 522-bit RSA key size used for tests with a TEST_RSA_KEY_SIZE constant defined in test/common.py (part 2). refs #393

comment:121 Changed at 2011-08-10T21:36:38Z by david-sarah@…

In [5173/ticket393-MDMF-2]:

More idiomatic resolution of the conflict between ticket393-MDMF-2 and trunk. refs #393

comment:122 in reply to: ↑ 118 Changed at 2011-08-10T21:39:16Z by davidsarah

Replying to davidsarah:

I'll attach a patch that makes the test key size easily changeable.

Oops, I said I was going to attach it but I actually committed it to the branch. Sorry about that.

comment:123 Changed at 2011-08-10T21:41:44Z by davidsarah

Although it's not obvious from the trac view of the changeset, [5171/ticket393-MDMF-2] actually uses darcs replace.

comment:124 follow-up: ↓ 126 Changed at 2011-08-12T18:24:24Z by davidsarah

Notes on [5124/ticket393-MDMF-2]:

This:

self._need_privkey = fetch_privkey or verify
if self._node.get_privkey() and not verify: 
    self._need_privkey = False

could be simplified to

self._need_privkey = verify or (fetch_privkey
                                and not self._node.get_privkey())

This:

self._verify = False 
if verify: 
    self._verify = True

could be

self._verify = verify

self._paused_deferred = None is a typo, it should be _pause_deferred.

self._paused seems to be True whenever self._pause_deferred is not None, except that there is one place in the download method that sets self._paused to False without setting self._pause_deferred to None. I don't think that the self._paused = ~~None~~False there is actually necessary; everything will work if the download is started while paused. That would allow the code to be simplified by removing self._paused.

Is this calculation correct?

# our start segment is the first segment containing the 
# offset we were given.  
start = mathutil.div_ceil(self._offset, 
                          self._segment_size) 
# this gets us the first segment after self._offset. Then 
# our start segment is the one before it. 
start -= 1

I would have thought that the segment containing offset was the floor of offset / segment_size.

Unusued -> Unused at retrieve.py line 305

set self._block_hash_trees to None in decode(), and make its initialization in _setup_encoding_parameters conditional on if self._block_hash_trees is not None:.

My preferred whitespace conventions:

one blank line before a block comment if the preceding line is code at the same indent level (this makes it harder to mistake the code as part of the comment, and makes the comment easier to read IMHO). examples are lines 235 and 285 of retrieve.py
two blank lines between top-level classes, or groups of methods
otherwise only one blank line between methods

Last edited at 2011-08-13T20:34:48Z by davidsarah (previous) (diff)

comment:125 Changed at 2011-08-13T21:03:54Z by davidsarah

A couple of possible refactorings in [5124/ticket393-MDMF-2] (these don't need to be done for the beta):

Rather than using pairs such as (success, message) or (seqnum, root_hash, IV, segsize, datalength, k, N, known_prefix, offsets_tuple), it might be clearer to use struct-like objects:
```
class Result(object):
    def __init__(self, success, message):
        self.success = success
        self.message = message
    def __repr__(self):
        return "Result(success=%r, message=%r)" % (self.success, self.message)
```
(This might be an unpleasant reminder of Java, but actually, making the code that uses a struct less ugly and easier to read is more important than the overhead of the struct-like class definition, IME.)

_remove_reader in Retrieve is only called from _mark_bad_share, so it could be inlined. There's a comment about this, but it's asking about whether there might be a reason to use _remove_reader independently in future. My advice would be: don't worry about trying to predict the future, just simplify the current code quite aggressively.

Last edited at 2011-08-13T21:57:27Z by davidsarah (previous) (diff)

comment:126 in reply to: ↑ 124 Changed at 2011-08-13T21:10:07Z by kevan

Replying to davidsarah:

Notes on [5124/ticket393-MDMF-2]:
This:
self._need_privkey = fetch_privkey or verify
if self._node.get_privkey() and not verify: 
    self._need_privkey = False
could be simplified to
self._need_privkey = verify or (fetch_privkey
                                and not self._node.get_privkey())
This:
self._verify = False 
if verify: 
    self._verify = True
could be
self._verify = verify
self._paused_deferred = None is a typo, it should be _pause_deferred.
self._paused seems to be True whenever self._pause_deferred is not None, except that there is one place in the download method that sets self._paused to False without setting self._pause_deferred to None. I don't think that the self._paused = None there is actually necessary; everything will work if the download is started while paused. That would allow the code to be simplified by removing self._paused.

Good observations.

Is this calculation correct?

# our start segment is the first segment containing the 
# offset we were given.  
start = mathutil.div_ceil(self._offset, 
                          self._segment_size) 
# this gets us the first segment after self._offset. Then 
# our start segment is the one before it. 
start -= 1

I would have thought that the segment containing offset was the floor of offset / segment_size.

Indeed. Both this and the similar end segment calculation are wrong.

This code gets exercised for partial reads, and doesn't behave correctly for reads that start at an offset that is evenly divisible by the segment size. For example, suppose that the segment size is 131073, and that we want to read starting at offset 131073. Since bytes are zero-indexed, we actually want to read the 131074th byte, which is the first byte in the second segment (or segment 1, since segments are also zero-indexed). div_ceil(131073, 131073) = 1, but the unconditional start -= 1 means that we'll actually start reading from segment 0, which is wrong. Floor division handles this case correctly.

The end segment calculation is wrong in a similar way, but manages to work anyway because of a different bug. Specifically, it expects to read one more byte of data than is actually necessary to fulfill the read. For example, suppose that we want to read 5 bytes starting at offset 131073. Then we'll read 131073, 131074, 131075, 131076, and 131077, or the range 131073-131077 inclusive. The code doesn't calculate this range correctly, because it uses offset + read_size to calculate the endpoint; in this case, it produces the range 131073 - 131078.

I've attached a patch that adds tests for both of these issues. The case where the read starts on a segment boundary fails without modification on the ticket393-MDMF-2 branch. To see the test where a read ends on a segment boundary fail, you need to add end_data -= 1 after end_data is first calculated around line 400 of retrieve.py to fix the off-by-one error there.

Both tests pass with floor division. I've also fixed similar equations for partial updates of MDMF files in filenode.py; these already used floor division, but still read more data than necessary to perform an update.

Unusued -> Unused at retrieve.py line 305
set self._block_hash_trees to None in decode(), and make its initialization in _setup_encoding_parameters conditional on if self._block_hash_trees is not None:.

Fixed in the updated patch.

My preferred whitespace conventions:

one blank line before a block comment if the preceding line is code at the same indent level (this makes it harder to mistake the code as part of the comment, and makes the comment easier to read IMHO). examples are lines 235 and 285 of retrieve.py
two blank lines between top-level classes, or groups of methods
otherwise only one blank line between methods

Noted; I'll start fixing this.

Replying to warner:

So I think this ought to be:
SHARE_HASH_CHAIN_SIZE = (HASH_SIZE+2)*mathutil.log_ceil(256, 2)
at least that gives me (32+2)*8, which feels right.

Good catch; thanks.

Kevan: could you update this (and add a new patch with both this and the merge conflicts resolved)?

Is there a compelling reason to keep posting patches that encompass the entire MDMF feature instead of using the ticket393-MDMF-2 branch? I find the latter much easier to work with.

Also, would it be worthwhile to make it so the buildbots can easily run tests with 2048-bit keys? Or to add a target to the Makefile to do that? Might make these sorts of issues easier to find in the future.

I'll attach a patch against ticket393-MDMF-2 that addresses davidsarahs' review comments (it looks like warner and davidsarah have addressed warner's comments in comment:116, at least in that branch). If anyone wants one, I can also make a big patch bundle against trunk that contains the whole MDMF feature.

Changed at 2011-08-13T21:11:42Z by kevan

Attachment fix-393-boundary-issues.darcspatch added

address review comments, test for and fix boundary conditions (meant for application to ticket393-MDMF-2 branch)

comment:127 Changed at 2011-08-13T22:40:26Z by davidsarah

+1 on fix-393-boundary-issues.darcspatch . The "# XXX: Make it so that it won't set this if we're just decoding." comment is no longer necessary, though.

It's fine by me if you just post patches against the ticket393-MDMF-2 branch.

Is there a ticket for the "robust, configurable tahoe-lafs simulator" mentioned in this comment? I don't think there is.

Last edited at 2011-08-13T22:43:58Z by davidsarah (previous) (diff)

comment:128 Changed at 2011-08-13T22:54:31Z by davidsarah

Also delete the commented-out "#needed.discard(0)" here.

comment:129 Changed at 2011-08-13T23:02:24Z by davidsarah

In the _failed method here, the "eventually(self._done_deferred.callback, ret)" on the last line looks wrong. You have to use .callback in the if self._verify: branch and .errback in the else: branch, I think.

comment:130 Changed at 2011-08-14T01:45:30Z by davidsarah

At ticket393-MDMF-2/src/allmydata/interfaces.py@5126#L452, put_encprivey -> put_encprivkey (or put_encrypted_private_key might be better).

comment:131 Changed at 2011-08-16T22:28:14Z by davidsarah

tahoe debug dump-cap does not seem to understand MDMF cap URIs:

bin/tahoe debug dump-cap URI:MDMF:ft6h6t4r3fszqsmabux7t356ke:xtpsvpmvusfn5eqbkltjbpqvjrauh3zlcdkvj4qctg3vjo2x72xq:3:131073

unknown cap type

comment:132 Changed at 2011-08-16T22:53:45Z by david-sarah@…

In [5188/ticket393-MDMF-2]:

mutable/retrieve.py: cosmetics and remove a stale comment. refs #393

comment:133 Changed at 2011-08-16T23:57:11Z by david-sarah@…

In [5191/ticket393-MDMF-2]:

mutable/layout.py: fix unused import. refs #393

Changed at 2011-08-17T00:25:21Z by davidsarah

Attachment 131073.html added

Check results for an MDMF file on the testgrid that is causing UncoordinatedWriteErrors?

comment:134 follow-up: ↓ 135 Changed at 2011-08-17T04:09:38Z by davidsarah

I'm reviewing changeset 5125. It's a very substantial rewrite of the original code, which feels risky -- can you explain why this much of a rewrite was needed to support multiple segments? The new code is used for both SDMF and MDMF, which means there is a significant risk of regressions in SDMF support.

Also, it seems like there's a lot of near-duplicated code between the update and publish methods of the Publish class (update is new).

Changed at 2011-08-17T04:54:30Z by zancas

Attachment partialoutputfromflogtooltailduringuncoordinatedwriteerror.txt added

The name pretty much says it all. This attachment is associated with the elusive UCWE...

comment:135 in reply to: ↑ 134 Changed at 2011-08-17T15:31:57Z by kevan

Replying to davidsarah:

I'm reviewing changeset 5125. It's a very substantial rewrite of the original code, which feels risky -- can you explain why this much of a rewrite was needed to support multiple segments? The new code is used for both SDMF and MDMF, which means there is a significant risk of regressions in SDMF support.

The short answer is that it probably isn't necessary to support multiple segments (and that I agree with you about the riskiness of a rewrite).

When I wrote the publisher, part of the goal of MDMF was to publish MDMF files segment-by-segment (that is, to write the encrypted+encoded blocks to the grid as they are produced on the client), as immutable files are published. The patch attached to comment:39 contains a publisher that handles this sort of publishing, which is heavier on deferreds/callbacks and coordination logic than the current mutable publisher. I can't speak precisely to my reasoning at the time I wrote that; probably I felt that there wasn't a good/clean/maintainable way to add that sort of coordination logic to the current uploader without a substantial rewrite (maybe that was hasty; I recall exploring smaller, less invasive changes before the rewrite and not liking them for various reasons, but it's possible that I missed something simple). Anyway, comment:38 describes an issue with segment-by-segment publishing for mutable files, and is ultimately why we don't publish segment-by-segment now. I guess at that point I felt that it was easier to tweak the revised publisher to work with the relatively minor layout.py changes induced by comment:38 than to go back to the drawing board and re-evaluate the existing publish code in the context of the modified publishing semantics (a significant oversight, in retrospect, since so many of the publisher changes were due to the segment-by-segment publishing semantics).

I can take another look at the existing publisher to see if it can be made to handle MDMF with fewer/smaller changes. I can't say how long that would take (or whether I'd be done by the 09/01/2011 deadline for 1.9). It does seem like a good idea, though.

Also, it seems like there's a lot of near-duplicated code between the update and publish methods of the Publish class (update is new).

Yes, that setup logic seems like a good target for refactoring.

comment:136 Changed at 2011-08-18T03:25:12Z by davidsarah

In [5126/ticket393-MDMF-2], interface IWritable, the comment for update should just say "mutable file" rather than "MDMF file". Also we spell "writeable" with an "e" (that's not the most common spelling, but we use it consistently elsewhere).

comment:137 Changed at 2011-08-18T03:32:01Z by davidsarah

IWritable doesn't support truncating a file to a given length. I don't think this is needed immediately, but most file APIs provide this functionality.

comment:138 Changed at 2011-08-21T04:36:49Z by davidsarah

In MutableFileNode:

    def _upload(self, new_contents, servermap):
        """
        A !MutableFileNode still has to have some way of getting
        published initially, which is what I am here for. After that,
        all publishing, updating, modifying and so on happens through
        !MutableFileVersions.
        """
        assert self._pubkey, "update_servermap must be called before publish"

Is this comment correct? I thought upload (which calls _upload) was supposed to work whether or not there is an existing version.

Also, the message in the assertion should be "_update_servermap must be called before _upload", I think.

comment:139 Changed at 2011-08-22T00:32:00Z by kevan

I think I agree with comment:128, comment:129, comment:130 and comment:136; I've made those fixes in my tree.

That comment on _upload is a bit confusing. You're correct that there's no restriction (either in the code or in the semantics prescribed by IMutableFileNode) that upload be used only for new mutable files. I'll remove that. I probably put it there because the only internal caller of _upload is associated with mutable file creation.

I'll attach a patch with those fixes and a few others (removing some TODOs in the mutable filenode code that no longer apply, removing SDMF-specific comments from interfaces.py, and removing a commented-out method from the mutable filenode code).

On the call yesterday, we had some concerns about fencepost errors in segment boundary calculations, and in particular their potential impact on SDMF mutable files. In the case of SDMF files, most of these calculations are easy: there's only one segment, and we always want to fetch or upload it. It would be fairly easy to use that knowledge to short-circuit the possibly wrong segment calculations so that they're only used for MDMF mutable files, which would reduce the potential for regression if they are in fact wrong. Does that seem like a good idea? I can prepare a patch with those changes if so.

comment:140 Changed at 2011-08-22T01:16:50Z by davidsarah

In [5187/ticket393-MDMF-2], is it possible that offset and data.get_size() are both zero, in which case end_segment will end up as -1?

I looked for other possible fencepost errors and couldn't see any. I don't think that we need to short-circuit the segment calculations for SDMF.

Last edited at 2011-08-22T01:33:34Z by davidsarah (previous) (diff)

comment:141 follow-ups: ↓ 153 ↓ 163 Changed at 2011-08-22T01:31:21Z by davidsarah

Minor nit: the CLI accepts --mutable-type=mdmf (for example), but not --mutable-type=MDMF. It should probably be case-insensitive.

Also, specifying --mutable-type should probably imply --mutable. Currently there's an assert if the former is given without the latter; asserts should not be used to report usage errors. (Also, usage errors should be reported first, before other possible errors -- but I'm suggesting that this shouldn't be an error in any case.)

These changes aren't needed for the alpha.

Last edited at 2011-08-22T01:32:22Z by davidsarah (previous) (diff)

comment:142 Changed at 2011-08-22T01:50:13Z by davidsarah

I added the following tests to test_mutable.py, to try to catch fencepost errors with zero-length reads. They both fail, although I think they're correct:

    def test_partial_read_zero_length_at_start(self):
        d = self.mdmf_node.get_best_readable_version()
        c = consumer.MemoryConsumer()
        d.addCallback(lambda version:
            version.read(c, 0, 0))
        d.addCallback(lambda ignored:
            self.failUnlessEqual("", "".join(c.chunks)))
        return d

    def test_partial_read_zero_length_at_segment_boundary(self):
        d = self.mdmf_node.get_best_readable_version()
        c = consumer.MemoryConsumer()
        offset = mathutil.next_multiple(128 * 1024, 3)
        d.addCallback(lambda version:
            version.read(c, offset, 0))
        d.addCallback(lambda ignored:
            self.failUnlessEqual("", "".join(c.chunks)))
        return d

Also, the test named test_partial_read_ending_on_segment_boundary is actually testing a read ending one byte after a segment boundary. I don't know whether that's intentional; if it is then the test should be renamed.

Changed at 2011-08-22T02:51:18Z by davidsarah

Attachment zero-length-tests.darcs.patch added

Additional tests for zero-length partial reads and updates to mutable versions. Also refactors some tests to reduce duplicated code.

comment:143 Changed at 2011-08-22T04:56:17Z by zooko

I've been searching for any critical bugs in SDMF writes in the branch. I haven't found any yet. I've found some minor bugs that shouldn't be blockers for the 1.9 alpha release.

comment:144 Changed at 2011-08-22T05:37:03Z by zooko

Opened #1496 about an issue that I think should be a blocker for 1.9 final but possibly not for 1.9 alpha.

comment:145 Changed at 2011-08-22T05:39:51Z by zooko

Opened #1497 about an issue that I think should be a blocker for 1.9 final but not for 1.9 alpha.

comment:146 Changed at 2011-08-22T05:59:29Z by zooko

It is bed time and unfortunately I haven't finished examining the patch in search of ways that it could make us write out ill-formatted SDMF files, or leak the write-authorization secrets, or succumb to an attacker's attempt to make us write out (and digitally sign) an SDMF file that we didn't mean to (including making us use a wrong sequence number). Those are the three possibilities that I consider to be the "worst case scenarios", so they are what I started looking for first. I'll resume looking for those possibilities tomorrow.

Last edited at 2011-08-22T06:01:19Z by zooko (previous) (diff)

comment:147 follow-up: ↓ 148 Changed at 2011-08-22T06:20:12Z by zooko

Kevan has caught some places which say "if the file is mutable, load all of its contents into RAM", for example in 20110802021207-b8d28-b0c006c43f8efc1c3d89484ed7d6ff037f07774a GeneralSFTPFile._close(). This worked in Tahoe-LAFS <= v1.8, because mutable files were necessarily smaller than RAM, but this assumption is no longer valid in Tahoe-LAFS >= v1.9.

If he didn't catch one, then the fact that a file is mutable but can have large contents (even larger than RAM) would be a critical bug. How can we be sure we've caught all of them? Can we write unit tests which detect if the entire file has been loaded into RAM? (Somehow... by examining the change in the RAM usage? By scanning the Python strings for a special pattern which we've put into the MDMF file? Or better: by using a mock object that observes how many reads of what size the client performs and when the resulting objects get garbage collected and adds up the high water mark.)

Failing that, we should have someone else do what Kevan has presumably already done and search for all uses of mutable files and examine each use to see if it reads the entire contents into RAM.

(Should this comment turn into a new ticket?)

Changed at 2011-08-23T02:49:11Z by davidsarah

Attachment 393-davidsarah.darcs.patch added

Additional tests for MDMF. Also, make the immutable/read-only constraint checking for MDMF URIs identical to that for SSK URIs. refs #393

Changed at 2011-08-23T02:50:48Z by davidsarah

Attachment misc-docs-davidsarah.darcs.patch added

Miscellaneous documentation updates (including some for MDMF)

comment:148 in reply to: ↑ 147 ; follow-up: ↓ 150 Changed at 2011-08-23T16:38:42Z by davidsarah

Replying to zooko:

Kevan has caught some places which say "if the file is mutable, load all of its contents into RAM", for example in 20110802021207-b8d28-b0c006c43f8efc1c3d89484ed7d6ff037f07774a GeneralSFTPFile._close(). This worked in Tahoe-LAFS <= v1.8, because mutable files were necessarily smaller than RAM, but this assumption is no longer valid in Tahoe-LAFS >= v1.9.

GeneralSFTPFile._close does:

self.filenode.overwrite(MutableFileHandle(self.consumer.get_file()))

where self.consumer.get_file() is a file object for an open temporary file.

Does this read the whole file into memory? If it does, I think the bug is in MutableFileHandle or MutableFileVersion.overwrite. The SFTP frontend itself is careful not to assume that files fit into memory.

(I agree with the more general point that we need tests to detect whether we're making that assumption. Manual smoke-testing with large files would also help, and could be done with the alpha.)

comment:149 Changed at 2011-08-23T17:02:59Z by davidsarah

At src/allmydata/mutable/publish.py@5168#L342, ConsistencyError -> UncoordinatedWriteError. (This bug predates MDMF.)

comment:150 in reply to: ↑ 148 Changed at 2011-08-23T18:22:28Z by davidsarah

Replying to davidsarah:

Replying to zooko:

Kevan has caught some places which say "if the file is mutable, load all of its contents into RAM", for example in 20110802021207-b8d28-b0c006c43f8efc1c3d89484ed7d6ff037f07774a GeneralSFTPFile._close(). This worked in Tahoe-LAFS <= v1.8, because mutable files were necessarily smaller than RAM, but this assumption is no longer valid in Tahoe-LAFS >= v1.9.

GeneralSFTPFile._close does:
self.filenode.overwrite(MutableFileHandle(self.consumer.get_file()))

Oh, you meant that Kevan fixed it in this case. Sorry for being dense.

comment:151 follow-up: ↓ 152 Changed at 2011-08-23T18:53:04Z by davidsarah

I'm confused by this code in publish.py Publish.update :

    def update(self, data, offset, blockhashes, version):
        [...]
        self.datalength = self._node.get_size()
        if data.get_size() > self.datalength:
            self.datalength = data.get_size()

If I understand correctly, self.datalength is the length that we're going to update. Because of the issue in comment:38, we don't do partial updates yet, which might explain why this code sets self.datalength to at least self._node.get_size() (the previous size of the file). However, I don't see how this works when offset > 0. If (offset, self.datalength) is supposed to be the offset and length of the share contents we're going to write, then I would expect:

        self.datalength = max(self._node.get_size(), offset + data.get_size())
        offset = 0

If (offset, self.datalength) is supposed to be only the offset and length of the data the client provided, I would expect just:

        self.datalength = data.get_size()

Changed at 2011-08-25T02:49:08Z by davidsarah

Attachment 393-fix-interfaces.darcs.patch added

Fix interfaces related to MDMF. refs #393 Also some grammar corrections, and removal of 'self' from interface method declarations in interfaces.py

Changed at 2011-08-25T03:18:33Z by davidsarah

Attachment 393-zero-length-reads-todo.darcs.patch added

test_mutable.py: mark zero-length read tests as TODO. refs #393

comment:152 in reply to: ↑ 151 ; follow-up: ↓ 154 Changed at 2011-08-26T17:13:31Z by kevan

Replying to davidsarah:

If I understand correctly, self.datalength is the length that we're going to update. Because of the issue in comment:38, we don't do partial updates yet, which might explain why this code sets self.datalength to at least self._node.get_size() (the previous size of the file). However, I don't see how this works when offset > 0. If (offset, self.datalength) is supposed to be the offset and length of the share contents we're going to write, then I would expect:
        self.datalength = max(self._node.get_size(), offset + data.get_size())
        offset = 0

self.datalength is perhaps confusingly named -- it is the size of the MDMF file if the update operation completes successfully, and is used to calculate segment boundaries and to initialize the tail segment encoder (if the tail segment differs in length from the other segments).

self.datalength should be calculated as in your first example. Actually, due to a bug in TransformingUploadable, it is calculated that way -- for whatever reason, I wrote TransformingUploadable.get_size to return offset + get_size. That's inconsistent with MutableFileHandle.get_size, and not what I would expect a method named get_size to do; I'll fix that. AFAICT, the update method works despite relying on buggy behavior because it is always passed a TransformingUploadable -- full-file updates use MutableFileHandles or MutableData instances.

As an aside, we still do partial updates -- we just do them in such a way as to ensure that all of the changes are written at once, in that we retain all of the updated segments in memory as write vectors, then send those all at once to the storage server. So, IIUC, it isn't necessarily correct to just set offset to zero.

comment:153 in reply to: ↑ 141 Changed at 2011-08-26T20:38:19Z by davidsarah

Replying to davidsarah:

Minor nit: the CLI accepts --mutable-type=mdmf (for example), but not --mutable-type=MDMF. It should probably be case-insensitive.

Also, specifying --mutable-type should probably imply --mutable. Currently there's an assert if the former is given without the latter; asserts should not be used to report usage errors.

Actually the assert at tahoe_put.py line 72 never triggers because the mutable_type variable is only set if mutable is set. So as #1506 says, --mutable-type is silently ignored if --mutable is not present.

comment:154 in reply to: ↑ 152 Changed at 2011-08-26T20:48:51Z by davidsarah

Replying to kevan:

self.datalength is perhaps confusingly named -- it is the size of the MDMF file if the update operation completes successfully, and is used to calculate segment boundaries and to initialize the tail segment encoder (if the tail segment differs in length from the other segments).

I suggest new_length for this.

self.datalength should be calculated as in your first example. Actually, due to a bug in TransformingUploadable, it is calculated that way -- for whatever reason, I wrote TransformingUploadable.get_size to return offset + get_size.

I'm not surprised I was confused! ;-) Thanks for the explanation.

Changed at 2011-08-26T22:19:03Z by kevan

Attachment code-cleanup-and-datalength-fixes.dpatch added

fixes described in comment:139 and comment:152

comment:155 Changed at 2011-08-28T00:14:17Z by david-sarah@…

In ac7b8400d4680a64:

Additional tests for zero-length partial reads and updates to mutable versions. refs #393

comment:156 Changed at 2011-08-28T00:14:21Z by david-sarah@…

In 88989a4ea260dec9:

Additional tests for MDMF URIs and for zero-length files. refs #393

comment:157 Changed at 2011-08-28T00:14:22Z by david-sarah@…

In 3c92b832f2958ba9:

Make the immutable/read-only constraint checking for MDMF URIs identical to that for SSK URIs. refs #393

comment:158 Changed at 2011-08-28T00:15:17Z by warner

I've landed attachment:393-davidsarah.darcs.patch and attachment:393-zero-length-reads-todo.darcs.patch .

Changed at 2011-08-28T03:03:03Z by davidsarah

Attachment verifier-uri-rename.darcs.patch added

Replace *URIVerifier -> *VerifierURI, which is more logical and consistent with existing class names. refs #393

comment:159 Changed at 2011-08-28T03:12:03Z by zooko

I reviewed attachment:verifier-uri-rename.darcs.patch and it looks good to me. By the way, I'm glad this patch is a set of darcs replace instructions and not a much larger set of hunk edits. That made it easier to review, and easier to be sure that applying this patch wouldn't accidentally omit a necessary change. However, I would still be just as happy leaving this patch out of trunk until after 1.9, in order to minimize disruption to the code review and testing work that Zancas and I (hopefully among others) are doing for 1.9.

comment:160 Changed at 2011-08-28T03:35:02Z by davidsarah

In src/allmydata/test/test_cli.py, test_dump_cap_sdmf_directory and test_dump_cap_mdmf_directory are fairly long test methods that differ only in which classes they use. Most of their code should probably be factored out into a helper method.

comment:161 Changed at 2011-09-03T06:48:36Z by zooko

Brian asked me on IRC to investigate the first-segment/last-segment logic to make sure we don't have off-by-one errors, etc., in there. I started reading TransformingUploadable . Then I decided that I needed to understand what the behavior should be, so I stopped reading TransformingUploadable and started writing down what I thought the result should be for the the first-segment/last-segment logic. These notes turned into Python code, which I will attach under the name "choose_segs.py"...

I'll come back and look at this some more tomorrow.

Changed at 2011-09-03T06:49:15Z by zooko

Attachment choose_segs.py added

Changed at 2011-09-03T07:03:34Z by zooko

Attachment choose_segs2.py added

Changed at 2011-09-03T07:07:17Z by zooko

Attachment choose_segs3.py added

comment:162 Changed at 2011-09-03T07:14:15Z by zooko

No sooner had I posted attachment:choose_segs.py than I realized I had forgotten an important case—the case where your write doesn't overwrite the entire last segment of your write, but it is also the last segment of the current version of the file and your write overwrites all of the bytes in the current version of the file.

So then no sooner had I posted attachment:choose_segs2.py , which fixed that, that I realized I had also forgotten the case that your write of the first segment is not overwriting the entire segment but is overwriting all of the bytes in that segment in the current version of the file (because it is also the last segment of your write and the last segment of the current version of the file). So here is attachment:choose_segs3.py . This sort of arithmetic is annoyingly hard, and I'm sleepy, so I wouldn't be surprised if there are more mistakes like that in there.

Still no tests. Perhaps attachment:choose_segs3.py could be the tests for TransformingUploadable ...

comment:163 in reply to: ↑ 141 Changed at 2011-09-05T02:20:02Z by davidsarah

Replying to davidsarah:

Minor nit: the CLI accepts --mutable-type=mdmf (for example), but not --mutable-type=MDMF. It should probably be case-insensitive.

This is #1527.

comment:164 Changed at 2011-09-20T18:34:31Z by david-sarah@…

In 5d3d0dc33656c0ad:

test_mutable.py: skip test_publish_surprise_mdmf, which is causing an error. refs #1534, #393

comment:165 follow-up: ↓ 169 Changed at 2011-09-24T20:16:08Z by davidsarah

Version changed from 1.0.0 to 1.1.0

At src/allmydata/mutable/publish.py@5280#L428, there is a comment about the self.outstanding set: "the second table is our list [sic] of outstanding queries: those which are in flight and may or may not be delivered, accepted, or acknowledged. Items are added to this table when the request is sent, and removed when the response returns (or errbacks)."

However nothing in publish.py removes entries from self.outstanding. In finish_publishing at src/allmydata/mutable/publish.py@5280#L855, the loop adds entries to self.outstanding, and registers an errback to self._connection_problem if writer.finish_publishing fails. But self._connection_problem removes the writer from self.writers, it doesn't remove (writer.peerid, writer.shnum) from self.outstanding. So either the comment or the logic is wrong. (Also there's a reference to _loop at line 872, and I don't see any _loop in that file.)

comment:166 Changed at 2011-09-24T20:17:19Z by davidsarah

Version changed from 1.1.0 to 1.0.0

(Don't kow how that version field got changed.)

comment:167 Changed at 2011-09-24T21:14:14Z by david-sarah@…

In f94eb86fc9232ce6:

mutable/publish.py: simplify by refactoring self.outstanding to self.num_outstanding. refs #393

comment:168 Changed at 2011-09-24T21:14:19Z by david-sarah@…

In 1fa5c729b758776b:

mutable/publish.py: copy the self.writers dict before iterating over it, since we remove elements from it during the iteration. refs #393

comment:169 in reply to: ↑ 165 Changed at 2011-09-25T05:17:14Z by davidsarah

Replying to davidsarah:

However nothing in publish.py removes entries from self.outstanding. In finish_publishing at src/allmydata/mutable/publish.py@5280#L855, the loop adds entries to self.outstanding, and registers an errback to self._connection_problem if writer.finish_publishing fails. But self._connection_problem removes the writer from self.writers, it doesn't remove (writer.peerid, writer.shnum) from self.outstanding. So either the comment or the logic is wrong. (Also there's a reference to _loop at line 872, and I don't see any _loop in that file.)

These were fixed in f94eb86fc9232ce6.

Last edited at 2011-09-25T05:18:43Z by davidsarah (previous) (diff)

comment:170 Changed at 2011-10-10T19:53:52Z by david-sarah@…

In bbb6e5d25e62eaed:

(The changeset message doesn't reference this ticket)

comment:171 Changed at 2011-10-10T19:53:54Z by david-sarah@…

In de00b277cc9adfb0:

interfaces.py: remove get_extension_params and set_extension_params methods from IMutableFileURI. refs #393, #1526

comment:13 Changed at 2011-10-10T20:07:47Z by david-sarah@…

In [5418/ticket999-S3-backend]:

interfaces.py: remove get_extension_params and set_extension_params methods from IMutableFileURI. refs #393, #1526

comment:14 Changed at 2011-10-10T20:10:56Z by david-sarah@…

In [5420/ticket999-S3-backend]:

interfaces.py: fix a typo in the name of IMutableSlotWriter.put_encprivkey. refs #393

comment:15 Changed at 2011-10-13T17:09:08Z by warner

I need this ticket to be closed.. it's too big to be useful, and MDMF landed weeks ago. Can the interested parties create (a finite number of) new tickets for specific issues? Like, one for docs, one for the segmentation question, and one for recommended refactoring/code-cleanups?

comment:16 Changed at 2012-05-13T02:02:17Z by warner

Resolution set to fixed
Status changed from new to closed

I'm closing this one, the code landed forever ago, and I've seen at least a few specific tickets for MDMF issues found since then. If there are comments here that describe problems that are still relevant, please go ahead and file additional bugs on them.

Note: See TracTickets for help on using tickets.

Download in other formats: