[tahoe-lafs-trac-stream] [tahoe-lafs] #1513: memory usage in MDMF publish

tahoe-lafs trac at tahoe-lafs.org
Sun Aug 28 15:41:19 PDT 2011


#1513: memory usage in MDMF publish
------------------------------+--------------------------
     Reporter:  warner        |      Owner:
         Type:  defect        |     Status:  new
     Priority:  minor         |  Milestone:  1.9.0
    Component:  code-mutable  |    Version:  1.9.0a1
   Resolution:                |   Keywords:  mutable mdmf
Launchpad Bug:                |
------------------------------+--------------------------

Comment (by warner):

 Hm, there's a tension between reliability and memory footprint here. When
 making changes, we want each share to jump atomically from version1 to
 version2, without being left in any intermediate state. But that means all
 of the changes need to be held in memory and applied at the same time.

 When we're jumping from "no such share" to version1, those changes are the
 entire file. The data needs to be buffered *somewhere*. If we were allowed
 to write one segment at a time to the server's disk, then a server failure
 or lost connection would leave us in an intermediate state, where the
 share only had a portion of version1, which would effectively be a corrupt
 share.

 I can think of a couple of ways to improve this:

  * special-case the initial share creation: give the client an API to
 incrementally write blocks to the new share, and either allow the world to
 see the incomplete share early, or put the partial share in a separate
 incoming/ directory and figure out a way to make it visible only to the
 client that's building it.
  * create an API to build a new version of the share one change at a time,
 then a second API call to finalize the change (and make the new version
 visible to the world). It might look something like the immutable share-
 building API (a rough sketch follows after this list):
    * edithandle = share.start_editing()
    * edithandle.apply_delta(offset, newdata)
    * edithandle.finish()
    * edithandle.abort()
    * finish() is the test-and-set operation: it might fail if some other
 writer has completed their own start_editing()/apply_delta()/finish()
 sequence faster.

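 For concreteness, here is a minimal client-side sketch of how that edit-
 handle API might be driven. Everything in it is hypothetical: the method
 names come straight from the bullet list above, and the error handling is
 just a guess at what a failed test-and-set would look like.

    def publish_new_version(share, deltas):
        """Apply (offset, newdata) deltas to 'share' as one atomic edit."""
        edithandle = share.start_editing()
        try:
            for offset, newdata in deltas:
                # Each delta can travel in its own message, so the client
                # only needs to hold one segment in memory at a time.
                edithandle.apply_delta(offset, newdata)
            # finish() is the test-and-set step: it fails if some other
            # writer completed their own edit of this share first.
            edithandle.finish()
        except Exception:
            # Abandon the half-built new version; the old version stays
            # visible and intact on the server.
            edithandle.abort()
            raise
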
 If we're willing to tolerate the disk-footprint, we could increase
 reliability against server crashes by making start_editing() create a full
 copy of the old share in a sibling directory (like incoming/, not visible
 to anyone but the edithandle). Then apply_delta() would do normal write()s
 to the copy, and finish() would atomically move the copy back into place.
 Everything in the incoming/ directory would be deleted at startup, and the
 temp copies would also be deleted when the connection to the client was
 lost. This would slow down the updates for large files (since a lot of
 data would need to be shuffled around before the edit could begin), and
 would consume more disk (twice the size of the share), but would allow
 edits to be spread across separate messages, which reduces the client's
 memory requirements. It would also reduce share corruption caused by the
 server being bounced during a mutable write.
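
 To make the copy-then-rename variant concrete, here is a rough server-side
 sketch. The EditHandle class, the directory layout, and the absence of any
 locking or test-and-set check are simplifications for illustration, not
 the real storage-server code.

    import os, shutil

    class EditHandle:
        def __init__(self, sharefile, incomingdir):
            # start_editing(): copy the current share into incoming/,
            # where nothing but this handle will look at it.
            self.final = sharefile
            self.temp = os.path.join(incomingdir,
                                     os.path.basename(sharefile))
            shutil.copyfile(sharefile, self.temp)

        def apply_delta(self, offset, newdata):
            # Normal write()s against the private copy; these can arrive
            # in as many separate messages as the client likes.
            with open(self.temp, "r+b") as f:
                f.seek(offset)
                f.write(newdata)

        def finish(self):
            # Atomic rename (same filesystem, since incoming/ is a
            # sibling directory): readers see either the old version or
            # the complete new one, never a partial edit.
            os.rename(self.temp, self.final)

        def abort(self):
            # Discard the private copy; the old share is untouched.
            os.remove(self.temp)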

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1513#comment:1>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

