Opened at 2011-08-22T05:35:34Z
Last modified at 2011-08-26T20:54:37Z
#1496 assigned defect
make SFTP frontend handle updates to MDMFs without downloading and uploading the entire file
Reported by: | zooko | Owned by: | davidsarah |
---|---|---|---|
Priority: | major | Milestone: | soon |
Component: | code-mutable | Version: | 1.8.2 |
Keywords: | sftp performance mdmf | Cc: | kevan, warner |
Launchpad Bug: |
Description
It appears that the current version of the #393 branch, in the SFTPD frontend, downloads the entire MDMF file and then uploads the entire new version of it, even if the SFTP client has overwritten only a portion of it. This isn't a regression—Tahoe-LAFS v1.8 didn't have MDMF's at all, but did the same download-entire-file-and-upload-entire-new-version in order to let an SFTP client appear to "overwrite" a portion of an immutable file.
However, I think this should probably be considered a blocker for 1.9 final. Users could legitimately expect that the performance benefits of MDMFs -- namely spending only approximately A network usage to overwrite A bytes out of a B-byte MDMF -- will apply when the edit the file through SFTPD as well as when they edit it through the WAPI.
We should update performance.rst to state what the performance of MDMFs is in addition to the performance of SDMFs and immutables. If we were going to ship Tahoe-LAFS v1.9 with the current behavior (which seems like a bad idea to me at the moment), then we would need to add another section of MDMF as edited through SFTPD in addition to MDMF as edited through the WAPI.
Change History (4)
comment:1 Changed at 2011-08-22T05:36:49Z by zooko
- Cc kevan warner added
comment:2 in reply to: ↑ description ; follow-up: ↓ 3 Changed at 2011-08-23T16:06:02Z by davidsarah
- Keywords sftp performance mdmf added
- Priority changed from critical to major
comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed at 2011-08-23T19:27:31Z by davidsarah
- Milestone changed from 1.9.0 to 1.10.0
- Owner set to davidsarah
- Status changed from new to assigned
Replying to davidsarah:
Replying to zooko:
It appears that the current version of the #393 branch, in the SFTPD frontend, downloads the entire MDMF file and then uploads the entire new version of it, even if the SFTP client has overwritten only a portion of it. This isn't a regression—Tahoe-LAFS v1.8 didn't have MDMF's at all, but did the same download-entire-file-and-upload-entire-new-version in order to let an SFTP client appear to "overwrite" a portion of an immutable file.
However, I think this should probably be considered a blocker for 1.9 final.
b) for MDMF files: when the SFTP file handle is closed, overwrite only segments that have changed.
As explained in ticket:393#comment:38 and ticket:393#comment:135, the MDMF uploader always uploads whole shares, even if its client tells it the regions that have changed. So making the SFTP frontend tell it which regions have changed should not be a blocker for 1.9.
(I'm not sure I agree with the reasoning in ticket:393#comment:38, but that's a separate issue.)
If we don't implement this optimization for 1.9, we would just need to add a note that the SFTP frontend does not have any MDMF-specific optimizations, so its performance for MDMF is the same as for SDMF.
Actually, the memory usage for downloads should be better than for SDMF.
comment:4 in reply to: ↑ 3 Changed at 2011-08-26T20:54:37Z by davidsarah
Replying to davidsarah:
Replying to davidsarah:
Replying to zooko:
However, I think this should probably be considered a blocker for 1.9 final.
b) for MDMF files: when the SFTP file handle is closed, overwrite only segments that have changed.
As explained in ticket:393#comment:38 and ticket:393#comment:135, the MDMF uploader always uploads whole shares, even if its client tells it the regions that have changed.
I was mistaken about that; Kevan clarified in ticket:393#comment:152 :
As an aside, we still do partial updates -- we just do them in such a way as to ensure that all of the changes are written at once, in that we retain all of the updated segments in memory as write vectors, then send those all at once to the storage server.
I still think this should not be a blocker for 1.9, though, since it's too big a change.
Replying to zooko:
There are two applicable optimizations.
a) for immutable and MDMF files: download segments out-of-order, i.e. if the client tries to read from a segment beyond the last downloaded segment so far, schedule that segment to be downloaded next.
b) for MDMF files: when the SFTP file handle is closed, overwrite only segments that have changed.
I think you're talking about b). An OverwriteableFileConsumer instance already keeps track of regions that have been overwritten, but it currently discards information about regions that have also been fully downloaded, and it's slightly inconvenient to change that (because we use a heap to provide efficient access to the first remaining region that has not yet been downloaded). It's feasible to implement b) within the 1.9 schedule, but it does require some non-trivial code changes, so we'd probably want to do it before the beta.
I don't think this should be considered a blocker, though. Remember that the SFTP frontend never creates mutable files, even though it can read and write existing ones. So someone using SFTP as their main interface would rarely, if ever, be affected by the performance of MDMF as seen through SFTP.
Also, OverwriteableFileConsumer already has a fairly complicated implementation. I had planned to improve its test coverage before making any further optimizations. Currently it is not as well-tested as the rest of sftpd.py, partly because its behaviour depends nondeterministically on the timing of the download relative to the timing of requests from the SFTP client, which is more difficult to test (although it's possible to make the test deterministic by mocking the downloader).
If we don't implement this optimization for 1.9, we would just need to add a note that the SFTP frontend does not have any MDMF-specific optimizations, so its performance for MDMF is the same as for SDMF.