[tahoe-lafs-trac-stream] [tahoe-lafs] #1496: make SFTP frontend handle updates to MDMFs without downloading and uploading the entire file
tahoe-lafs
trac at tahoe-lafs.org
Tue Aug 23 09:06:02 PDT 2011
#1496: make SFTP frontend handle updates to MDMFs without downloading and
uploading the entire file
------------------------------+-----------------------------------
Reporter: zooko | Owner:
Type: defect | Status: new
Priority: major | Milestone: 1.9.0
Component: code-mutable | Version: 1.8.2
Resolution: | Keywords: sftp performance mdmf
Launchpad Bug: |
------------------------------+-----------------------------------
Changes (by davidsarah):
* keywords: => sftp performance mdmf
* priority: critical => major
Comment:
Replying to [ticket:1496 zooko]:
> It appears that the current version of the #393 branch, in the SFTPD
> frontend,
> [source:ticket393-MDMF-2/src/allmydata/frontends/sftpd.py?annotate=blame&rev=5151#L815
> downloads the entire MDMF file and then uploads the entire new version of
> it], even if the SFTP client has overwritten only a portion of it. This
> isn't a regression: Tahoe-LAFS v1.8 didn't have MDMFs at all, but did
> [source:trunk/src/allmydata/frontends/sftpd.py?annotate=blame&rev=5127#L828
> the same download-entire-file-and-upload-entire-new-version] in order to
> let an SFTP client appear to "overwrite" a portion of an immutable file.
>
> However, I think this should probably be considered a blocker for 1.9
> final.
There are two applicable optimizations.
a) for immutable and MDMF files: download segments out-of-order, i.e. if
the client tries to read from a segment beyond the last downloaded segment
so far, schedule that segment to be downloaded next.
b) for MDMF files: when the SFTP file handle is closed, overwrite only
segments that have changed.
I think you're talking about b). An
[source:src/allmydata/frontends/sftpd.py@5179#L294
OverwriteableFileConsumer] instance already keeps track of regions that
have been overwritten, but it currently discards information about regions
that have also been fully downloaded, and it's slightly inconvenient to
change that (because we use a heap to provide efficient access to the
first remaining region that has not yet been downloaded). It's feasible to
implement b) within the 1.9 schedule, but it does require some non-trivial
code changes, so we'd probably want to do it before the beta.
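
For illustration only, here is a minimal sketch of the bookkeeping that b) needs: mapping the byte ranges a client has overwritten onto the segments that would have to be rewritten on close. The function name, the half-open `(start, end)` range representation, and the fixed segment size are all assumptions made for this example; none of them is taken from sftpd.py or the MDMF code.

```python
def dirty_segments(overwrites, segment_size):
    """Return the sorted indices of segments touched by any overwritten range.

    overwrites: iterable of (start, end) half-open byte ranges (hypothetical
    representation; the real consumer tracks regions differently).
    segment_size: assumed fixed size of every segment, in bytes.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    dirty = set()
    for start, end in overwrites:
        if start >= end:
            continue  # empty range touches nothing
        first = start // segment_size
        last = (end - 1) // segment_size  # end is exclusive
        dirty.update(range(first, last + 1))
    return sorted(dirty)


# A write straddling a segment boundary dirties both neighbours.
print(dirty_segments([(250, 260), (95, 105)], 100))
```

Only the segments in the returned list would need to be re-uploaded; a write that stays inside one segment leaves every other segment untouched.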
I don't think this should be considered a blocker, though. Remember that
the SFTP frontend never creates mutable files, even though it can read and
write existing ones. So someone using SFTP as their main interface would
rarely, if ever, be affected by the performance of MDMF as seen through
SFTP.
Also, !OverwriteableFileConsumer already has a fairly complicated
implementation. I had planned to improve its test coverage before making
any further optimizations. Currently it is not as well-tested as the rest
of sftpd.py, partly because its behaviour depends nondeterministically on
the timing of the download relative to the timing of requests from the
SFTP client, which is more difficult to test (although it's possible to
make the test deterministic by mocking the downloader).
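
As a rough sketch of what mocking the downloader could look like, the toy model below replaces the real downloader with one that delivers a segment only when the test asks for it, so each interleaving of download and client write can be exercised explicitly. `FakeDownloader`, `ToyConsumer` and `run_interleaving` are hypothetical names, and the rule that client writes take precedence over downloaded bytes is an assumption of this toy model rather than a description of the real class.

```python
class FakeDownloader:
    """Hypothetical stand-in for the downloader: delivers data only on request."""

    def __init__(self, data, segment_size):
        self.data = data
        self.segment_size = segment_size
        self.consumer = None

    def start(self, consumer):
        self.consumer = consumer

    def deliver_segment(self, index):
        start = index * self.segment_size
        chunk = self.data[start:start + self.segment_size]
        self.consumer.download_write(start, chunk)


class ToyConsumer:
    """Toy model: bytes written by the client are never clobbered by the download."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.overwritten = [False] * size

    def client_write(self, offset, data):
        # This sketch assumes the write stays within the existing file size.
        self.buf[offset:offset + len(data)] = data
        for i in range(offset, offset + len(data)):
            self.overwritten[i] = True

    def download_write(self, offset, data):
        for i, byte in enumerate(data):
            if not self.overwritten[offset + i]:
                self.buf[offset + i] = byte


def run_interleaving(write_first):
    """Drive one fixed ordering of client write versus segment arrival."""
    downloader = FakeDownloader(b"aaaabbbb", segment_size=4)
    consumer = ToyConsumer(8)
    downloader.start(consumer)
    if write_first:
        consumer.client_write(2, b"XY")
        downloader.deliver_segment(0)
    else:
        downloader.deliver_segment(0)
        consumer.client_write(2, b"XY")
    downloader.deliver_segment(1)
    return bytes(consumer.buf)


# Both orderings should produce the same final contents.
assert run_interleaving(True) == run_interleaving(False)
```

Because the test decides when each segment arrives, the timing dependence becomes an explicit parameter instead of a source of nondeterminism.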
> We should update [source:ticket393-MDMF-2/docs/performance.rst
> performance.rst] to state what the performance of MDMFs is in addition to
> the performance of SDMFs and immutables. If we were going to ship
> Tahoe-LAFS v1.9 with the current behavior (which seems like a bad idea to
> me at the moment), then we would need to add another section of MDMF as
> edited through SFTPD in addition to MDMF as edited through the WAPI.
If we don't implement this optimization for 1.9, we would just need to add
a note that the SFTP frontend does not have any MDMF-specific
optimizations, so its performance for MDMF is the same as for SDMF.
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1496#comment:2>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage