[tahoe-lafs-trac-stream] [tahoe-lafs] #1496: make SFTP frontend handle updates to MDMFs without downloading and uploading the entire file

tahoe-lafs trac at tahoe-lafs.org
Tue Aug 23 09:06:02 PDT 2011


#1496: make SFTP frontend handle updates to MDMFs without downloading and
uploading the entire file
------------------------------+-----------------------------------
     Reporter:  zooko         |      Owner:
         Type:  defect        |     Status:  new
     Priority:  major         |  Milestone:  1.9.0
    Component:  code-mutable  |    Version:  1.8.2
   Resolution:                |   Keywords:  sftp performance mdmf
Launchpad Bug:                |
------------------------------+-----------------------------------
Changes (by davidsarah):

 * keywords:   => sftp performance mdmf
 * priority:  critical => major


Comment:

 Replying to [ticket:1496 zooko]:
 > It appears that the current version of the #393 branch, in the SFTPD
 frontend,
 [source:ticket393-MDMF-2/src/allmydata/frontends/sftpd.py?annotate=blame&rev=5151#L815
 downloads the entire MDMF file and then uploads the entire new version of
 it], even if the SFTP client has overwritten only a portion of it. This
 isn't a regression—Tahoe-LAFS v1.8 didn't have MDMF's at all, but did
 [source:trunk/src/allmydata/frontends/sftpd.py?annotate=blame&rev=5127#L828
 the same download-entire-file-and-upload-entire-new-version] in order to
 let an SFTP client appear to "overwrite" a portion of an immutable file.
 >
 > However, I think this should probably be considered a blocker for 1.9
 final.

 There are two applicable optimizations.

 a) for immutable and MDMF files: download segments out-of-order, i.e. if
 the client tries to read from a segment beyond the last downloaded segment
 so far, schedule that segment to be downloaded next.

 b) for MDMF files: when the SFTP file handle is closed, overwrite only
 segments that have changed.
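
 As a rough illustration of a), the scheduling rule could look something
 like the sketch below. The SegmentScheduler class and its method names are
 hypothetical and are not part of sftpd.py or the downloader; the sketch
 only shows the idea of letting a client read jump the download queue.

```python
class SegmentScheduler:
    """Hypothetical helper: decides which segment to download next."""

    def __init__(self, num_segments, segment_size):
        self.num_segments = num_segments
        self.segment_size = segment_size
        self.downloaded = set()
        self.requested = []  # segments a client read is waiting for, oldest first

    def request_read(self, offset, length):
        """Record that the client wants bytes [offset, offset + length)."""
        if length <= 0:
            return
        first = offset // self.segment_size
        last = min((offset + length - 1) // self.segment_size,
                   self.num_segments - 1)
        for seg in range(first, last + 1):
            if seg not in self.downloaded and seg not in self.requested:
                self.requested.append(seg)

    def next_segment(self):
        """Return the next segment to fetch, or None when all are done."""
        # Segments that a client read is blocked on go first.
        while self.requested:
            seg = self.requested.pop(0)
            if seg not in self.downloaded:
                return seg
        # Otherwise fall back to the lowest segment not yet downloaded.
        for seg in range(self.num_segments):
            if seg not in self.downloaded:
                return seg
        return None

    def mark_downloaded(self, seg):
        self.downloaded.add(seg)
```

 The caller is assumed to call mark_downloaded() for each segment before
 asking for the next one.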

 I think you're talking about b). An
 [source:src/allmydata/frontends/sftpd.py@5179#L294
 OverwriteableFileConsumer] instance already keeps track of regions that
 have been overwritten, but it currently discards the record of an
 overwritten region once that region has also been fully downloaded.
 Changing that is slightly inconvenient, because we use a heap to provide
 efficient access to the first remaining region that has not yet been
 downloaded. It's feasible to implement b) within the 1.9 schedule, but it
 does require some non-trivial code changes, so we'd probably want to do it
 before the beta.
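
 To make b) concrete: if the overwritten byte ranges were retained until
 close, working out which segments to rewrite would be a simple mapping. The
 function below is a hypothetical sketch, not existing code; it assumes the
 ranges are kept as (start, end) pairs with an exclusive end, and that the
 segment size is known.

```python
def changed_segments(overwritten, segment_size):
    """Return the sorted indices of segments touched by any overwritten range.

    overwritten: iterable of (start, end) byte offsets, end exclusive.
    """
    segments = set()
    for start, end in overwritten:
        if start >= end:
            continue  # empty range, nothing was written
        first = start // segment_size
        last = (end - 1) // segment_size
        segments.update(range(first, last + 1))
    return sorted(segments)
```

 Only the segments returned here would need to be re-uploaded when the file
 handle is closed; every other segment could be left as it is.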

 I don't think this should be considered a blocker, though. Remember that
 the SFTP frontend never creates mutable files, even though it can read and
 write existing ones. So someone using SFTP as their main interface would
 rarely, if ever, be affected by the performance of MDMF as seen through
 SFTP.

 Also, OverwriteableFileConsumer already has a fairly complicated
 implementation. I had planned to improve its test coverage before making
 any further optimizations. Currently it is not as well-tested as the rest
 of sftpd.py, partly because its behaviour depends nondeterministically on
 the timing of the download relative to the timing of requests from the
 SFTP client, which is more difficult to test (although it's possible to
 make the test deterministic by mocking the downloader).
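
 A sketch of what mocking the downloader might look like: the test, not a
 timer or the network, decides when each chunk of downloaded data arrives,
 so the interleaving with client writes is fixed. Both classes below are
 hypothetical; TinyConsumer is a deliberately simplified stand-in and does
 not reflect the real OverwriteableFileConsumer interface.

```python
class TinyConsumer:
    """Hypothetical stand-in for the consumer, just enough to test against."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.downloaded = 0
        self.overwritten = set()  # byte offsets already written by the client

    def write(self, chunk):
        # Downloaded bytes must not clobber bytes the client has overwritten.
        for i, byte in enumerate(chunk):
            pos = self.downloaded + i
            if pos not in self.overwritten:
                self.buf[pos] = byte
        self.downloaded += len(chunk)

    def overwrite(self, offset, data):
        self.buf[offset:offset + len(data)] = data
        self.overwritten.update(range(offset, offset + len(data)))


class FakeDownloader:
    """Test double: delivers file contents only when the test asks for them."""

    def __init__(self, data, consumer):
        self.data = data
        self.consumer = consumer
        self.pos = 0

    def deliver(self, nbytes):
        chunk = self.data[self.pos:self.pos + nbytes]
        self.pos += len(chunk)
        self.consumer.write(chunk)


def run_interleaved_case():
    consumer = TinyConsumer(8)
    downloader = FakeDownloader(b"abcdefgh", consumer)
    downloader.deliver(2)             # first two bytes arrive
    consumer.overwrite(1, b"XYZ")     # client write lands mid-download
    downloader.deliver(6)             # remainder arrives afterwards
    return bytes(consumer.buf)
```

 Each interesting ordering (write before download, write during download,
 write after download) can then be written as its own fixed sequence of
 deliver() and overwrite() calls.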

 > We should update [source:ticket393-MDMF-2/docs/performance.rst
 performance.rst] to state what the performance of MDMFs is in addition to
 the performance of SDMFs and immutables. If we were going to ship Tahoe-
 LAFS v1.9 with the current behavior (which seems like a bad idea to me at
 the moment), then we would need to add another section of MDMF as edited
 through SFTPD in addition to MDMF as edited through the WAPI.

 If we don't implement this optimization for 1.9, we would just need to add
 a note that the SFTP frontend does not have any MDMF-specific
 optimizations, so its performance for MDMF is the same as for SDMF.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1496#comment:2>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

