Ticket #393: 393status39.dpatch

File 393status39.dpatch, 618.6 KB (added by kevan at 2011-03-01T03:25:39Z)

fix fencepost error, improve tahoe put option parsing

1Mon Aug  9 16:32:44 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
2  * interfaces.py: Add #993 interfaces
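  A minimal sketch of how the new IReadable interface is meant to be driven
  (the filenode variable and the helper name are illustrative, not part of
  the patch):

      def show_best_contents(filenode):
          # Works for immutable and mutable filenodes alike: ask for the best
          # readable version, then download its full contents as a byte string.
          d = filenode.get_best_readable_version()
          d.addCallback(lambda version: version.download_to_data())
          return d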
3
4Mon Aug  9 16:35:35 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
5  * frontends/sftpd.py: Modify the sftp frontend to work with the MDMF changes
6
7Mon Aug  9 17:06:19 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
8  * immutable/filenode.py: Make the immutable file node implement the same interfaces as the mutable one
9
10Mon Aug  9 17:06:33 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
11  * immutable/literal.py: implement the same interfaces as other filenodes
12
13Fri Aug 13 16:49:57 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
14  * scripts: tell 'tahoe put' about MDMF
15
16Sat Aug 14 01:10:12 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
17  * web: Alter the webapi to get along with and take advantage of the MDMF changes
18 
19  The main benefit that the webapi gets from MDMF, at least initially, is
20  the ability to do a streaming download of an MDMF mutable file. It also
21  exposes a way (through the PUT verb) to append to or otherwise modify
22  (in-place) an MDMF mutable file.
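  A rough sketch of what such an append could look like over HTTP, using
  Python's httplib (the gateway address, port, and cap are placeholders, and
  the response handling is illustrative):

      import httplib

      def append_to_mutable(cap, data, offset, host="127.0.0.1", port=3456):
          # PUT to the file's URI with an offset parameter asks the gateway
          # to update the mutable file in place, starting at that offset.
          conn = httplib.HTTPConnection(host, port)
          conn.request("PUT", "/uri/%s?offset=%d" % (cap, offset), data)
          resp = conn.getresponse()
          return resp.status, resp.read()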
23
24Sat Aug 14 15:57:11 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
25  * client.py: learn how to create different kinds of mutable files
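  In other words, callers can now do something along these lines (client
  stands for an allmydata.client.Client instance; the contents are
  illustrative):

      from allmydata.interfaces import MDMF_VERSION

      # Explicitly ask for an MDMF file; omitting version= (or passing None)
      # falls back to the default configured in tahoe.cfg.
      d = client.create_mutable_file("initial contents", version=MDMF_VERSION)
      d.addCallback(lambda node: node.get_uri())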
26
27Wed Aug 18 17:32:16 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
28  * mutable/checker.py and mutable/repair.py: Modify checker and repairer to work with MDMF
29 
30  The checker and repairer required minimal changes to work with the MDMF
31  modifications made elsewhere. The checker duplicated a lot of the code
32  that was already in the downloader, so I modified the downloader
33  slightly to expose this functionality to the checker and removed the
34  duplicated code. The repairer only required a minor change to deal with
35  data representation.
36
37Wed Aug 18 17:32:31 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
38  * mutable/filenode.py: add versions and partial-file updates to the mutable file node
39 
40  One of the goals of MDMF as a GSoC project is to lay the groundwork for
41  LDMF, a format that will allow Tahoe-LAFS to deal with and encourage
42  multiple versions of a single cap on the grid. In line with this, there
43  is now a distinction between an overriding mutable file (which can be
44  thought of as corresponding to the cap/unique identifier for that mutable
45  file) and versions of the mutable file (which we can download, update,
46  and so on). All download, upload, and modification operations end up
47  happening on a particular version of a mutable file, but there are
48  shortcut methods on the object representing the overriding mutable file
49  that perform these operations on the best version of the mutable file
50  (which is what code should be doing until we have LDMF and better
51  support for other paradigms).
52 
53  Another goal of MDMF was to take advantage of segmentation to give
54  callers more efficient partial file updates or appends. This patch
55  implements methods that do that, too.
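  A sketch of the intended calling pattern for an append (the node and the
  appended data are illustrative; MutableFileHandle comes from the publish
  changes later in this bundle):

      from StringIO import StringIO
      from allmydata.mutable.publish import MutableFileHandle

      def append(node, more_data):
          # Fetch the best recoverable version, then write starting at its
          # current end, which has the effect of appending.
          d = node.get_best_mutable_version()
          def _got_version(version):
              offset = version.get_size()
              return version.update(MutableFileHandle(StringIO(more_data)), offset)
          d.addCallback(_got_version)
          return d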
56 
57
58Wed Aug 18 17:33:42 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
59  * mutable/publish.py: Modify the publish process to support MDMF
60 
61  The inner workings of the publishing process needed to be reworked to a
62  large extent to cope with segmented mutable files, and to cope with
63  partial-file updates of mutable files. This patch does that. It also
64  introduces wrappers for uploadable data, allowing the use of
65  filehandle-like objects as data sources, in addition to strings. This
66  reduces memory usage when dealing with large files through the
67  webapi, and clarifies the update code there.
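  For example, overwriting a mutable file from an open file object rather
  than from one big in-memory string looks roughly like this (the node
  variable and the filename are placeholders):

      from allmydata.mutable.publish import MutableFileHandle

      # The publish machinery reads from the handle as it needs data, instead
      # of requiring the caller to hold the whole file in memory as a string.
      d = node.overwrite(MutableFileHandle(open("big-local-file.bin", "rb")))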
68
69Wed Aug 18 17:35:09 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
70  * nodemaker.py: Make nodemaker expose a way to create MDMF files
71
72Sat Aug 14 15:56:44 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
73  * docs: update docs to mention MDMF
74
75Wed Aug 18 17:33:04 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
76  * mutable/layout.py and interfaces.py: add MDMF writer and reader
77 
78  The MDMF writer is responsible for keeping state as plaintext is
79  gradually processed into share data by the upload process. When the
80  upload finishes, it will write all of its share data to a remote server,
81  reporting its status back to the publisher.
82 
83  The MDMF reader is responsible for abstracting away, from the downloader,
84  how an MDMF file is laid out on the grid; specifically, it receives and
85  responds to requests for arbitrary data within the MDMF file.
86 
87  The interfaces.py file has also been modified to contain an interface
88  for the writer.
89
90Wed Aug 18 17:34:09 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
91  * mutable/retrieve.py: Modify the retrieval process to support MDMF
92 
93  The logic behind a mutable file download had to be adapted to work with
94  segmented mutable files; this patch performs those adaptations. It also
95  exposes some decoding and decrypting functionality to make partial-file
96  updates a little easier, and supports efficient random-access downloads
97  of parts of an MDMF file.
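  A sketch of a random-access read against the best readable version
  (MemoryConsumer is the simple download-to-memory consumer from
  allmydata.util.consumer; the node and the byte range are illustrative):

      from allmydata.util.consumer import MemoryConsumer

      def read_range(node, offset, size):
          # Download only `size` bytes starting at `offset`, rather than
          # the whole file.
          d = node.get_best_readable_version()
          d.addCallback(lambda version: version.read(MemoryConsumer(), offset, size))
          d.addCallback(lambda mc: "".join(mc.chunks))
          return d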
98
99Wed Aug 18 17:34:39 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
100  * mutable/servermap.py: Alter the servermap updater to work with MDMF files
101 
102  These modifications mostly serve to have the
103  servermap updater use the unified MDMF + SDMF read interface whenever
104  possible -- this reduces the complexity of the code, making it easier to
105  read and maintain. To do this, I needed to modify the process of
106  updating the servermap a little bit.
107 
108  To support partial-file updates, I also modified the servermap updater
109  to fetch the block hash trees and certain segments of files while it
110  performed a servermap update (this can be done without adding any new
111  roundtrips because of batch-read functionality that the read proxy has).
112 
113
114Wed Aug 18 17:35:31 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
115  * tests:
116 
117      - A lot of existing tests relied on aspects of the mutable file
118        implementation that were changed. This patch updates those tests
119        to work with the changes.
120      - This patch also adds tests for new features.
121
122Sun Feb 20 15:02:01 PST 2011  "Brian Warner <warner@lothar.com>"
123  * resolve conflicts between 393-MDMF patches and trunk as of 1.8.2
124
125Sun Feb 20 17:46:59 PST 2011  "Brian Warner <warner@lothar.com>"
126  * mutable/filenode.py: fix create_mutable_file('string')
127
128Sun Feb 20 21:56:00 PST 2011  "Brian Warner <warner@lothar.com>"
129  * resolve more conflicts with current trunk
130
131Sun Feb 20 22:10:04 PST 2011  "Brian Warner <warner@lothar.com>"
132  * update MDMF code with StorageFarmBroker changes
133
134Fri Feb 25 17:04:33 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
135  * mutable/filenode: Clean up servermap handling in MutableFileVersion
136 
137  We want to update the servermap before attempting to modify a file,
138  which we now do. This introduced code duplication, which was addressed
139  by refactoring the servermap update into its own method, and then
140  eliminating duplicate servermap updates throughout the
141  MutableFileVersion.
142
143Sun Feb 27 15:16:43 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
144  * web: Use the string "replace" to trigger whole-file replacement when processing an offset parameter.
145
146Sun Feb 27 16:34:26 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
147  * docs/configuration.rst: fix more conflicts between #393 and trunk
148
149Sun Feb 27 17:06:37 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
150  * mutable/layout: remove references to the salt hash tree.
151
152Sun Feb 27 18:10:56 PST 2011  warner@lothar.com
153  * test_mutable.py: add test to exercise fencepost bug
154
155Mon Feb 28 00:33:27 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
156  * mutable/publish: account for offsets on segment boundaries.
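  The general shape of the off-by-one being guarded against (illustrative
  numbers only, not necessarily the exact expression that was wrong):

      SEGSIZE = 131072                        # e.g. 128 KiB segments
      offset = 2 * SEGSIZE                    # update starts exactly on a boundary
      first_segment = offset // SEGSIZE       # correct: segment 2
      off_by_one = (offset - 1) // SEGSIZE    # a typical fencepost slip: segment 1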
157
158Mon Feb 28 19:08:07 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
159  * tahoe-put: raise UsageError when given a nonsensical mutable type, move option validation code to the option parser.
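  With this change the intended invocations look like the following (local
  and remote paths are placeholders); any --mutable-type value other than
  sdmf or mdmf is rejected by the option parser with a usage error:

      tahoe put --mutable localfile.txt tahoe:remote.txt
      tahoe put --mutable --mutable-type=mdmf localfile.txt tahoe:remote.txt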
160
161New patches:
162
163[interfaces.py: Add #993 interfaces
164Kevan Carstensen <kevan@isnotajoke.com>**20100809233244
165 Ignore-this: b58621ac5cc86f1b4b4149f9e6c6a1ce
166] {
167hunk ./src/allmydata/interfaces.py 499
168 class MustNotBeUnknownRWError(CapConstraintError):
169     """Cannot add an unknown child cap specified in a rw_uri field."""
170 
171+
172+class IReadable(Interface):
173+    """I represent a readable object -- either an immutable file, or a
174+    specific version of a mutable file.
175+    """
176+
177+    def is_readonly():
178+        """Return True if this reference provides mutable access to the given
179+        file or directory (i.e. if you can modify it), or False if not. Note
180+        that even if this reference is read-only, someone else may hold a
181+        read-write reference to it.
182+
183+        For an IReadable returned by get_best_readable_version(), this will
184+        always return True, but for instances of subinterfaces such as
185+        IMutableFileVersion, it may return False."""
186+
187+    def is_mutable():
188+        """Return True if this file or directory is mutable (by *somebody*,
189+    not necessarily you), False if it is immutable. Note that a file
190+        might be mutable overall, but your reference to it might be
191+        read-only. On the other hand, all references to an immutable file
192+        will be read-only; there are no read-write references to an immutable
193+        file."""
194+
195+    def get_storage_index():
196+        """Return the storage index of the file."""
197+
198+    def get_size():
199+        """Return the length (in bytes) of this readable object."""
200+
201+    def download_to_data():
202+        """Download all of the file contents. I return a Deferred that fires
203+        with the contents as a byte string."""
204+
205+    def read(consumer, offset=0, size=None):
206+        """Download a portion (possibly all) of the file's contents, making
207+        them available to the given IConsumer. Return a Deferred that fires
208+        (with the consumer) when the consumer is unregistered (either because
209+        the last byte has been given to it, or because the consumer threw an
210+        exception during write(), possibly because it no longer wants to
211+        receive data). The portion downloaded will start at 'offset' and
212+        contain 'size' bytes (or the remainder of the file if size==None).
213+
214+        The consumer will be used in non-streaming mode: an IPullProducer
215+        will be attached to it.
216+
217+        The consumer will not receive data right away: several network trips
218+        must occur first. The order of events will be::
219+
220+         consumer.registerProducer(p, streaming)
221+          (if streaming == False)::
222+           consumer does p.resumeProducing()
223+            consumer.write(data)
224+           consumer does p.resumeProducing()
225+            consumer.write(data).. (repeat until all data is written)
226+         consumer.unregisterProducer()
227+         deferred.callback(consumer)
228+
229+        If a download error occurs, or an exception is raised by
230+        consumer.registerProducer() or consumer.write(), I will call
231+        consumer.unregisterProducer() and then deliver the exception via
232+        deferred.errback(). To cancel the download, the consumer should call
233+        p.stopProducing(), which will result in an exception being delivered
234+        via deferred.errback().
235+
236+        See src/allmydata/util/consumer.py for an example of a simple
237+        download-to-memory consumer.
238+        """
239+
240+
241+class IWritable(Interface):
242+    """
243+    I define methods that callers can use to update SDMF and MDMF
244+    mutable files on a Tahoe-LAFS grid.
245+    """
246+    # XXX: For the moment, we have only this. It is possible that we
247+    #      want to move overwrite() and modify() in here too.
248+    def update(data, offset):
249+        """
250+        I write the data from my data argument to the MDMF file,
251+        starting at offset. I continue writing data until my data
252+        argument is exhausted, appending data to the file as necessary.
253+        """
254+        # assert IMutableUploadable.providedBy(data)
255+        # to append data: offset=node.get_size_of_best_version()
256+        # do we want to support compacting MDMF?
257+        # for an MDMF file, this can be done with O(data.get_size())
258+        # memory. For an SDMF file, any modification takes
259+        # O(node.get_size_of_best_version()).
260+
261+
262+class IMutableFileVersion(IReadable):
263+    """I provide access to a particular version of a mutable file. The
264+    access is read/write if I was obtained from a filenode derived from
265+    a write cap, or read-only if the filenode was derived from a read cap.
266+    """
267+
268+    def get_sequence_number():
269+        """Return the sequence number of this version."""
270+
271+    def get_servermap():
272+        """Return the IMutableFileServerMap instance that was used to create
273+        this object.
274+        """
275+
276+    def get_writekey():
277+        """Return this filenode's writekey, or None if the node does not have
278+        write-capability. This may be used to assist with data structures
279+        that need to make certain data available only to writers, such as the
280+        read-write child caps in dirnodes. The recommended process is to have
281+        reader-visible data be submitted to the filenode in the clear (where
282+        it will be encrypted by the filenode using the readkey), but encrypt
283+        writer-visible data using this writekey.
284+        """
285+
286+    # TODO: Can this be overwrite instead of replace?
287+    def replace(new_contents):
288+        """Replace the contents of the mutable file, provided that no other
289+        node has published (or is attempting to publish, concurrently) a
290+        newer version of the file than this one.
291+
292+        I will avoid modifying any share that is different than the version
293+        given by get_sequence_number(). However, if another node is writing
294+        to the file at the same time as me, I may manage to update some shares
295+        while they update others. If I see any evidence of this, I will signal
296+        UncoordinatedWriteError, and the file will be left in an inconsistent
297+        state (possibly the version you provided, possibly the old version,
298+        possibly somebody else's version, and possibly a mix of shares from
299+        all of these).
300+
301+        The recommended response to UncoordinatedWriteError is to either
302+        return it to the caller (since they failed to coordinate their
303+        writes), or to attempt some sort of recovery. It may be sufficient to
304+        wait a random interval (with exponential backoff) and repeat your
305+        operation. If I do not signal UncoordinatedWriteError, then I was
306+        able to write the new version without incident.
307+
308+        I return a Deferred that fires (with a PublishStatus object) when the
309+        update has completed.
310+        """
311+
312+    def modify(modifier_cb):
313+        """Modify the contents of the file, by downloading this version,
314+        applying the modifier function (or bound method), then uploading
315+        the new version. This will succeed as long as no other node
316+        publishes a version between the download and the upload.
317+        I return a Deferred that fires (with a PublishStatus object) when
318+        the update is complete.
319+
320+        The modifier callable will be given three arguments: a string (with
321+        the old contents), a 'first_time' boolean, and a servermap. As with
322+        download_to_data(), the old contents will be from this version,
323+        but the modifier can use the servermap to make other decisions
324+        (such as refusing to apply the delta if there are multiple parallel
325+        versions, or if there is evidence of a newer unrecoverable version).
326+        'first_time' will be True the first time the modifier is called,
327+        and False on any subsequent calls.
328+
329+        The callable should return a string with the new contents. The
330+        callable must be prepared to be called multiple times, and must
331+        examine the input string to see if the change that it wants to make
332+        is already present in the old version. If it does not need to make
333+        any changes, it can either return None, or return its input string.
334+
335+        If the modifier raises an exception, it will be returned in the
336+        errback.
337+        """
338+
339+
340 # The hierarchy looks like this:
341 #  IFilesystemNode
342 #   IFileNode
343hunk ./src/allmydata/interfaces.py 758
344     def raise_error():
345         """Raise any error associated with this node."""
346 
347+    # XXX: These may not be appropriate outside the context of an IReadable.
348     def get_size():
349         """Return the length (in bytes) of the data this node represents. For
350         directory nodes, I return the size of the backing store. I return
351hunk ./src/allmydata/interfaces.py 775
352 class IFileNode(IFilesystemNode):
353     """I am a node which represents a file: a sequence of bytes. I am not a
354     container, like IDirectoryNode."""
355+    def get_best_readable_version():
356+        """Return a Deferred that fires with an IReadable for the 'best'
357+        available version of the file. The IReadable provides only read
358+        access, even if this filenode was derived from a write cap.
359 
360hunk ./src/allmydata/interfaces.py 780
361-class IImmutableFileNode(IFileNode):
362-    def read(consumer, offset=0, size=None):
363-        """Download a portion (possibly all) of the file's contents, making
364-        them available to the given IConsumer. Return a Deferred that fires
365-        (with the consumer) when the consumer is unregistered (either because
366-        the last byte has been given to it, or because the consumer threw an
367-        exception during write(), possibly because it no longer wants to
368-        receive data). The portion downloaded will start at 'offset' and
369-        contain 'size' bytes (or the remainder of the file if size==None).
370-
371-        The consumer will be used in non-streaming mode: an IPullProducer
372-        will be attached to it.
373+        For an immutable file, there is only one version. For a mutable
374+        file, the 'best' version is the recoverable version with the
375+        highest sequence number. If no uncoordinated writes have occurred,
376+        and if enough shares are available, then this will be the most
377+        recent version that has been uploaded. If no version is recoverable,
378+        the Deferred will errback with an UnrecoverableFileError.
379+        """
380 
381hunk ./src/allmydata/interfaces.py 788
382-        The consumer will not receive data right away: several network trips
383-        must occur first. The order of events will be::
384+    def download_best_version():
385+        """Download the contents of the version that would be returned
386+        by get_best_readable_version(). This is equivalent to calling
387+        download_to_data() on the IReadable given by that method.
388 
389hunk ./src/allmydata/interfaces.py 793
390-         consumer.registerProducer(p, streaming)
391-          (if streaming == False)::
392-           consumer does p.resumeProducing()
393-            consumer.write(data)
394-           consumer does p.resumeProducing()
395-            consumer.write(data).. (repeat until all data is written)
396-         consumer.unregisterProducer()
397-         deferred.callback(consumer)
398+        I return a Deferred that fires with a byte string when the file
399+        has been fully downloaded. To support streaming download, use
400+        the 'read' method of IReadable. If no version is recoverable,
401+        the Deferred will errback with an UnrecoverableFileError.
402+        """
403 
404hunk ./src/allmydata/interfaces.py 799
405-        If a download error occurs, or an exception is raised by
406-        consumer.registerProducer() or consumer.write(), I will call
407-        consumer.unregisterProducer() and then deliver the exception via
408-        deferred.errback(). To cancel the download, the consumer should call
409-        p.stopProducing(), which will result in an exception being delivered
410-        via deferred.errback().
411+    def get_size_of_best_version():
412+        """Find the size of the version that would be returned by
413+        get_best_readable_version().
414 
415hunk ./src/allmydata/interfaces.py 803
416-        See src/allmydata/util/consumer.py for an example of a simple
417-        download-to-memory consumer.
418+        I return a Deferred that fires with an integer. If no version
419+        is recoverable, the Deferred will errback with an
420+        UnrecoverableFileError.
421         """
422 
423hunk ./src/allmydata/interfaces.py 808
424+
425+class IImmutableFileNode(IFileNode, IReadable):
426+    """I am a node representing an immutable file. Immutable files have
427+    only one version"""
428+
429+
430 class IMutableFileNode(IFileNode):
431     """I provide access to a 'mutable file', which retains its identity
432     regardless of what contents are put in it.
433hunk ./src/allmydata/interfaces.py 873
434     only be retrieved and updated all-at-once, as a single big string. Future
435     versions of our mutable files will remove this restriction.
436     """
437-
438-    def download_best_version():
439-        """Download the 'best' available version of the file, meaning one of
440-        the recoverable versions with the highest sequence number. If no
441+    def get_best_mutable_version():
442+        """Return a Deferred that fires with an IMutableFileVersion for
443+        the 'best' available version of the file. The best version is
444+        the recoverable version with the highest sequence number. If no
445         uncoordinated writes have occurred, and if enough shares are
446hunk ./src/allmydata/interfaces.py 878
447-        available, then this will be the most recent version that has been
448-        uploaded.
449+        available, then this will be the most recent version that has
450+        been uploaded.
451 
452hunk ./src/allmydata/interfaces.py 881
453-        I update an internal servermap with MODE_READ, determine which
454-        version of the file is indicated by
455-        servermap.best_recoverable_version(), and return a Deferred that
456-        fires with its contents. If no version is recoverable, the Deferred
457-        will errback with UnrecoverableFileError.
458-        """
459-
460-    def get_size_of_best_version():
461-        """Find the size of the version that would be downloaded with
462-        download_best_version(), without actually downloading the whole file.
463-
464-        I return a Deferred that fires with an integer.
465+        If no version is recoverable, the Deferred will errback with an
466+        UnrecoverableFileError.
467         """
468 
469     def overwrite(new_contents):
470hunk ./src/allmydata/interfaces.py 921
471         errback.
472         """
473 
474-
475     def get_servermap(mode):
476         """Return a Deferred that fires with an IMutableFileServerMap
477         instance, updated using the given mode.
478hunk ./src/allmydata/interfaces.py 974
479         writer-visible data using this writekey.
480         """
481 
482+    def set_version(version):
483+        """Tahoe-LAFS supports SDMF and MDMF mutable files. By default,
484+        we upload in SDMF for reasons of compatibility. If you want to
485+        change this, set_version will let you do that.
486+
487+        To say that this file should be uploaded in SDMF, pass in a 0. To
488+        say that the file should be uploaded as MDMF, pass in a 1.
489+        """
490+
491+    def get_version():
492+        """Returns the mutable file protocol version."""
493+
494 class NotEnoughSharesError(Exception):
495     """Download was unable to get enough shares"""
496 
497hunk ./src/allmydata/interfaces.py 1822
498         """The upload is finished, and whatever filehandle was in use may be
499         closed."""
500 
501+
502+class IMutableUploadable(Interface):
503+    """
504+    I represent content that is due to be uploaded to a mutable filecap.
505+    """
506+    # This is somewhat simpler than the IUploadable interface above
507+    # because mutable files do not need to be concerned with possibly
508+    # generating a CHK, nor with per-file keys. It is a subset of the
509+    # methods in IUploadable, though, so we could just as well implement
510+    # the mutable uploadables as IUploadables that don't happen to use
511+    # those methods (with the understanding that the unused methods will
512+    # never be called on such objects)
513+    def get_size():
514+        """
515+        Returns a Deferred that fires with the size of the content held
516+        by the uploadable.
517+        """
518+
519+    def read(length):
520+        """
521+        Returns a list of strings which, when concatenated, are the next
522+        length bytes of the file, or fewer if there are fewer bytes
523+        between the current location and the end of the file.
524+        """
525+
526+    def close():
527+        """
528+        The process that used the Uploadable is finished using it, so
529+        the uploadable may be closed.
530+        """
531+
532 class IUploadResults(Interface):
533     """I am returned by upload() methods. I contain a number of public
534     attributes which can be read to determine the results of the upload. Some
535}
536[frontends/sftpd.py: Modify the sftp frontend to work with the MDMF changes
537Kevan Carstensen <kevan@isnotajoke.com>**20100809233535
538 Ignore-this: 2d25e2cfcd0d7bbcbba660c7e1da12f
539] {
540hunk ./src/allmydata/frontends/sftpd.py 33
541 from allmydata.interfaces import IFileNode, IDirectoryNode, ExistingChildError, \
542      NoSuchChildError, ChildOfWrongTypeError
543 from allmydata.mutable.common import NotWriteableError
544+from allmydata.mutable.publish import MutableFileHandle
545 from allmydata.immutable.upload import FileHandle
546 from allmydata.dirnode import update_metadata
547 from allmydata.util.fileutil import EncryptedTemporaryFile
548hunk ./src/allmydata/frontends/sftpd.py 667
549         else:
550             assert IFileNode.providedBy(filenode), filenode
551 
552-            if filenode.is_mutable():
553-                self.async.addCallback(lambda ign: filenode.download_best_version())
554-                def _downloaded(data):
555-                    self.consumer = OverwriteableFileConsumer(len(data), tempfile_maker)
556-                    self.consumer.write(data)
557-                    self.consumer.finish()
558-                    return None
559-                self.async.addCallback(_downloaded)
560-            else:
561-                download_size = filenode.get_size()
562-                assert download_size is not None, "download_size is None"
563+            self.async.addCallback(lambda ignored: filenode.get_best_readable_version())
564+
565+            def _read(version):
566+                if noisy: self.log("_read", level=NOISY)
567+                download_size = version.get_size()
568+                assert download_size is not None
569+
570                 self.consumer = OverwriteableFileConsumer(download_size, tempfile_maker)
571hunk ./src/allmydata/frontends/sftpd.py 675
572-                def _read(ign):
573-                    if noisy: self.log("_read immutable", level=NOISY)
574-                    filenode.read(self.consumer, 0, None)
575-                self.async.addCallback(_read)
576+
577+                version.read(self.consumer, 0, None)
578+            self.async.addCallback(_read)
579 
580         eventually(self.async.callback, None)
581 
582hunk ./src/allmydata/frontends/sftpd.py 821
583                     assert parent and childname, (parent, childname, self.metadata)
584                     d2.addCallback(lambda ign: parent.set_metadata_for(childname, self.metadata))
585 
586-                d2.addCallback(lambda ign: self.consumer.get_current_size())
587-                d2.addCallback(lambda size: self.consumer.read(0, size))
588-                d2.addCallback(lambda new_contents: self.filenode.overwrite(new_contents))
589+                d2.addCallback(lambda ign: self.filenode.overwrite(MutableFileHandle(self.consumer.get_file())))
590             else:
591                 def _add_file(ign):
592                     self.log("_add_file childname=%r" % (childname,), level=OPERATIONAL)
593}
594[immutable/filenode.py: Make the immutable file node implement the same interfaces as the mutable one
595Kevan Carstensen <kevan@isnotajoke.com>**20100810000619
596 Ignore-this: 93e536c0f8efb705310f13ff64621527
597] {
598hunk ./src/allmydata/immutable/filenode.py 8
599 now = time.time
600 from zope.interface import implements, Interface
601 from twisted.internet import defer
602-from twisted.internet.interfaces import IConsumer
603 
604hunk ./src/allmydata/immutable/filenode.py 9
605-from allmydata.interfaces import IImmutableFileNode, IUploadResults
606 from allmydata import uri
607hunk ./src/allmydata/immutable/filenode.py 10
608+from twisted.internet.interfaces import IConsumer
609+from twisted.protocols import basic
610+from foolscap.api import eventually
611+from allmydata.interfaces import IImmutableFileNode, ICheckable, \
612+     IDownloadTarget, IUploadResults
613+from allmydata.util import dictutil, log, base32, consumer
614+from allmydata.immutable.checker import Checker
615 from allmydata.check_results import CheckResults, CheckAndRepairResults
616 from allmydata.util.dictutil import DictOfSets
617 from pycryptopp.cipher.aes import AES
618hunk ./src/allmydata/immutable/filenode.py 296
619         return self._cnode.check_and_repair(monitor, verify, add_lease)
620     def check(self, monitor, verify=False, add_lease=False):
621         return self._cnode.check(monitor, verify, add_lease)
622+
623+    def get_best_readable_version(self):
624+        """
625+        Return an IReadable of the best version of this file. Since
626+        immutable files can have only one version, we just return the
627+        current filenode.
628+        """
629+        return defer.succeed(self)
630+
631+
632+    def download_best_version(self):
633+        """
634+        Download the best version of this file, returning its contents
635+        as a bytestring. Since there is only one version of an immutable
636+        file, we download and return the contents of this file.
637+        """
638+        d = consumer.download_to_data(self)
639+        return d
640+
641+    # for an immutable file, download_to_data (specified in IReadable)
642+    # is the same as download_best_version (specified in IFileNode). For
643+    # mutable files, the difference is more meaningful, since they can
644+    # have multiple versions.
645+    download_to_data = download_best_version
646+
647+
648+    # get_size() (IReadable), get_current_size() (IFilesystemNode), and
649+    # get_size_of_best_version(IFileNode) are all the same for immutable
650+    # files.
651+    get_size_of_best_version = get_current_size
652}
653[immutable/literal.py: implement the same interfaces as other filenodes
654Kevan Carstensen <kevan@isnotajoke.com>**20100810000633
655 Ignore-this: b50dd5df2d34ecd6477b8499a27aef13
656] hunk ./src/allmydata/immutable/literal.py 106
657         d.addCallback(lambda lastSent: consumer)
658         return d
659 
660+    # IReadable, IFileNode, IFilesystemNode
661+    def get_best_readable_version(self):
662+        return defer.succeed(self)
663+
664+
665+    def download_best_version(self):
666+        return defer.succeed(self.u.data)
667+
668+
669+    download_to_data = download_best_version
670+    get_size_of_best_version = get_current_size
671+
672[scripts: tell 'tahoe put' about MDMF
673Kevan Carstensen <kevan@isnotajoke.com>**20100813234957
674 Ignore-this: c106b3384fc676bd3c0fb466d2a52b1b
675] {
676hunk ./src/allmydata/scripts/cli.py 160
677     optFlags = [
678         ("mutable", "m", "Create a mutable file instead of an immutable one."),
679         ]
680+    optParameters = [
681+        ("mutable-type", None, False, "Create a mutable file in the given format. Valid formats are 'sdmf' for SDMF and 'mdmf' for MDMF"),
682+        ]
683 
684     def parseArgs(self, arg1=None, arg2=None):
685         # see Examples below
686hunk ./src/allmydata/scripts/tahoe_put.py 21
687     from_file = options.from_file
688     to_file = options.to_file
689     mutable = options['mutable']
690+    mutable_type = False
691+
692+    if mutable:
693+        mutable_type = options['mutable-type']
694     if options['quiet']:
695         verbosity = 0
696     else:
697hunk ./src/allmydata/scripts/tahoe_put.py 33
698     stdout = options.stdout
699     stderr = options.stderr
700 
701+    if mutable_type and mutable_type not in ('sdmf', 'mdmf'):
702+        # Don't try to pass unsupported types to the webapi
703+        print >>stderr, "error: %s is an invalid format" % mutable_type
704+        return 1
705+
706     if nodeurl[-1] != "/":
707         nodeurl += "/"
708     if to_file:
709hunk ./src/allmydata/scripts/tahoe_put.py 76
710         url = nodeurl + "uri"
711     if mutable:
712         url += "?mutable=true"
713+    if mutable_type:
714+        assert mutable
715+        url += "&mutable-type=%s" % mutable_type
716+
717     if from_file:
718         infileobj = open(os.path.expanduser(from_file), "rb")
719     else:
720}
721[web: Alter the webapi to get along with and take advantage of the MDMF changes
722Kevan Carstensen <kevan@isnotajoke.com>**20100814081012
723 Ignore-this: 96c2ed4e4a9f450fb84db5d711d10bd6
724 
725 The main benefit that the webapi gets from MDMF, at least initially, is
726 the ability to do a streaming download of an MDMF mutable file. It also
727 exposes a way (through the PUT verb) to append to or otherwise modify
728 (in-place) an MDMF mutable file.
729] {
730hunk ./src/allmydata/web/common.py 12
731 from allmydata.interfaces import ExistingChildError, NoSuchChildError, \
732      FileTooLargeError, NotEnoughSharesError, NoSharesError, \
733      EmptyPathnameComponentError, MustBeDeepImmutableError, \
734-     MustBeReadonlyError, MustNotBeUnknownRWError
735+     MustBeReadonlyError, MustNotBeUnknownRWError, SDMF_VERSION, MDMF_VERSION
736 from allmydata.mutable.common import UnrecoverableFileError
737 from allmydata.util import abbreviate
738 from allmydata.util.encodingutil import to_str, quote_output
739hunk ./src/allmydata/web/common.py 35
740     else:
741         return boolean_of_arg(replace)
742 
743+
744+def parse_mutable_type_arg(arg):
745+    if not arg:
746+        return None # interpreted by the caller as "let the nodemaker decide"
747+
748+    arg = arg.lower()
749+    assert arg in ("mdmf", "sdmf")
750+
751+    if arg == "mdmf":
752+        return MDMF_VERSION
753+
754+    return SDMF_VERSION
755+
756+
757+def parse_offset_arg(offset):
758+    # XXX: This will raise a ValueError when invoked on something that
759+    # is not an integer. Is that okay? Or do we want a better error
760+    # message? Since this call is going to be used by programmers and
761+    # their tools rather than users (through the wui), it is not
762+    # inconsistent to return that, I guess.
763+    offset = int(offset)
764+    return offset
765+
766+
767 def get_root(ctx_or_req):
768     req = IRequest(ctx_or_req)
769     # the addSlash=True gives us one extra (empty) segment
770hunk ./src/allmydata/web/directory.py 19
771 from allmydata.uri import from_string_dirnode
772 from allmydata.interfaces import IDirectoryNode, IFileNode, IFilesystemNode, \
773      IImmutableFileNode, IMutableFileNode, ExistingChildError, \
774-     NoSuchChildError, EmptyPathnameComponentError
775+     NoSuchChildError, EmptyPathnameComponentError, SDMF_VERSION, MDMF_VERSION
776 from allmydata.monitor import Monitor, OperationCancelledError
777 from allmydata import dirnode
778 from allmydata.web.common import text_plain, WebError, \
779hunk ./src/allmydata/web/directory.py 153
780         if not t:
781             # render the directory as HTML, using the docFactory and Nevow's
782             # whole templating thing.
783-            return DirectoryAsHTML(self.node)
784+            return DirectoryAsHTML(self.node,
785+                                   self.client.mutable_file_default)
786 
787         if t == "json":
788             return DirectoryJSONMetadata(ctx, self.node)
789hunk ./src/allmydata/web/directory.py 556
790     docFactory = getxmlfile("directory.xhtml")
791     addSlash = True
792 
793-    def __init__(self, node):
794+    def __init__(self, node, default_mutable_format):
795         rend.Page.__init__(self)
796         self.node = node
797 
798hunk ./src/allmydata/web/directory.py 560
799+        assert default_mutable_format in (MDMF_VERSION, SDMF_VERSION)
800+        self.default_mutable_format = default_mutable_format
801+
802     def beforeRender(self, ctx):
803         # attempt to get the dirnode's children, stashing them (or the
804         # failure that results) for later use
805hunk ./src/allmydata/web/directory.py 780
806             ]]
807         forms.append(T.div(class_="freeform-form")[mkdir])
808 
809+        # Build input elements for mutable file type. We do this outside
810+        # of the list so we can check the appropriate format, based on
811+        # the default configured in the client (which reflects the
812+        # default configured in tahoe.cfg)
813+        if self.default_mutable_format == MDMF_VERSION:
814+            mdmf_input = T.input(type='radio', name='mutable-type',
815+                                 id='mutable-type-mdmf', value='mdmf',
816+                                 checked='checked')
817+        else:
818+            mdmf_input = T.input(type='radio', name='mutable-type',
819+                                 id='mutable-type-mdmf', value='mdmf')
820+
821+        if self.default_mutable_format == SDMF_VERSION:
822+            sdmf_input = T.input(type='radio', name='mutable-type',
823+                                 id='mutable-type-sdmf', value='sdmf',
824+                                 checked="checked")
825+        else:
826+            sdmf_input = T.input(type='radio', name='mutable-type',
827+                                 id='mutable-type-sdmf', value='sdmf')
828+
829         upload = T.form(action=".", method="post",
830                         enctype="multipart/form-data")[
831             T.fieldset[
832hunk ./src/allmydata/web/directory.py 812
833             T.input(type="submit", value="Upload"),
834             " Mutable?:",
835             T.input(type="checkbox", name="mutable"),
836+            sdmf_input, T.label(for_="mutable-type-sdmf")["SDMF"],
837+            mdmf_input,
838+            T.label(for_="mutable-type-mdmf")["MDMF (experimental)"],
839             ]]
840         forms.append(T.div(class_="freeform-form")[upload])
841 
842hunk ./src/allmydata/web/directory.py 850
843                 kiddata = ("filenode", {'size': childnode.get_size(),
844                                         'mutable': childnode.is_mutable(),
845                                         })
846+                if childnode.is_mutable() and \
847+                    childnode.get_version() is not None:
848+                    mutable_type = childnode.get_version()
849+                    assert mutable_type in (SDMF_VERSION, MDMF_VERSION)
850+
851+                    if mutable_type == MDMF_VERSION:
852+                        mutable_type = "mdmf"
853+                    else:
854+                        mutable_type = "sdmf"
855+                    kiddata[1]['mutable-type'] = mutable_type
856+
857             elif IDirectoryNode.providedBy(childnode):
858                 kiddata = ("dirnode", {'mutable': childnode.is_mutable()})
859             else:
860hunk ./src/allmydata/web/filenode.py 9
861 from nevow import url, rend
862 from nevow.inevow import IRequest
863 
864-from allmydata.interfaces import ExistingChildError
865+from allmydata.interfaces import ExistingChildError, SDMF_VERSION, MDMF_VERSION
866 from allmydata.monitor import Monitor
867 from allmydata.immutable.upload import FileHandle
868hunk ./src/allmydata/web/filenode.py 12
869+from allmydata.mutable.publish import MutableFileHandle
870+from allmydata.mutable.common import MODE_READ
871 from allmydata.util import log, base32
872 
873 from allmydata.web.common import text_plain, WebError, RenderMixin, \
874hunk ./src/allmydata/web/filenode.py 18
875      boolean_of_arg, get_arg, should_create_intermediate_directories, \
876-     MyExceptionHandler, parse_replace_arg
877+     MyExceptionHandler, parse_replace_arg, parse_offset_arg, \
878+     parse_mutable_type_arg
879 from allmydata.web.check_results import CheckResults, \
880      CheckAndRepairResults, LiteralCheckResults
881 from allmydata.web.info import MoreInfo
882hunk ./src/allmydata/web/filenode.py 29
883         # a new file is being uploaded in our place.
884         mutable = boolean_of_arg(get_arg(req, "mutable", "false"))
885         if mutable:
886-            req.content.seek(0)
887-            data = req.content.read()
888-            d = client.create_mutable_file(data)
889+            mutable_type = parse_mutable_type_arg(get_arg(req,
890+                                                          "mutable-type",
891+                                                          None))
892+            data = MutableFileHandle(req.content)
893+            d = client.create_mutable_file(data, version=mutable_type)
894             def _uploaded(newnode):
895                 d2 = self.parentnode.set_node(self.name, newnode,
896                                               overwrite=replace)
897hunk ./src/allmydata/web/filenode.py 66
898         d.addCallback(lambda res: childnode.get_uri())
899         return d
900 
901-    def _read_data_from_formpost(self, req):
902-        # SDMF: files are small, and we can only upload data, so we read
903-        # the whole file into memory before uploading.
904-        contents = req.fields["file"]
905-        contents.file.seek(0)
906-        data = contents.file.read()
907-        return data
908 
909     def replace_me_with_a_formpost(self, req, client, replace):
910         # create a new file, maybe mutable, maybe immutable
911hunk ./src/allmydata/web/filenode.py 71
912         mutable = boolean_of_arg(get_arg(req, "mutable", "false"))
913 
914+        # create an immutable file
915+        contents = req.fields["file"]
916         if mutable:
917hunk ./src/allmydata/web/filenode.py 74
918-            data = self._read_data_from_formpost(req)
919-            d = client.create_mutable_file(data)
920+            mutable_type = parse_mutable_type_arg(get_arg(req, "mutable-type",
921+                                                          None))
922+            uploadable = MutableFileHandle(contents.file)
923+            d = client.create_mutable_file(uploadable, version=mutable_type)
924             def _uploaded(newnode):
925                 d2 = self.parentnode.set_node(self.name, newnode,
926                                               overwrite=replace)
927hunk ./src/allmydata/web/filenode.py 85
928                 return d2
929             d.addCallback(_uploaded)
930             return d
931-        # create an immutable file
932-        contents = req.fields["file"]
933+
934         uploadable = FileHandle(contents.file, convergence=client.convergence)
935         d = self.parentnode.add_file(self.name, uploadable, overwrite=replace)
936         d.addCallback(lambda newnode: newnode.get_uri())
937hunk ./src/allmydata/web/filenode.py 91
938         return d
939 
940+
941 class PlaceHolderNodeHandler(RenderMixin, rend.Page, ReplaceMeMixin):
942     def __init__(self, client, parentnode, name):
943         rend.Page.__init__(self)
944hunk ./src/allmydata/web/filenode.py 174
945             # properly. So we assume that at least the browser will agree
946             # with itself, and echo back the same bytes that we were given.
947             filename = get_arg(req, "filename", self.name) or "unknown"
948-            if self.node.is_mutable():
949-                # some day: d = self.node.get_best_version()
950-                d = makeMutableDownloadable(self.node)
951-            else:
952-                d = defer.succeed(self.node)
953+            d = self.node.get_best_readable_version()
954             d.addCallback(lambda dn: FileDownloader(dn, filename))
955             return d
956         if t == "json":
957hunk ./src/allmydata/web/filenode.py 178
958-            if self.parentnode and self.name:
959-                d = self.parentnode.get_metadata_for(self.name)
960+            # We do this to make sure that fields like size and
961+            # mutable-type (which depend on the file on the grid and not
962+            # just on the cap) are filled in. The latter gets used in
963+            # tests, in particular.
964+            #
965+            # TODO: Make it so that the servermap knows how to update in
966+            # a mode specifically designed to fill in these fields, and
967+            # then update it in that mode.
968+            if self.node.is_mutable():
969+                d = self.node.get_servermap(MODE_READ)
970             else:
971                 d = defer.succeed(None)
972hunk ./src/allmydata/web/filenode.py 190
973+            if self.parentnode and self.name:
974+                d.addCallback(lambda ignored:
975+                    self.parentnode.get_metadata_for(self.name))
976+            else:
977+                d.addCallback(lambda ignored: None)
978             d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
979             return d
980         if t == "info":
981hunk ./src/allmydata/web/filenode.py 211
982         if t:
983             raise WebError("GET file: bad t=%s" % t)
984         filename = get_arg(req, "filename", self.name) or "unknown"
985-        if self.node.is_mutable():
986-            # some day: d = self.node.get_best_version()
987-            d = makeMutableDownloadable(self.node)
988-        else:
989-            d = defer.succeed(self.node)
990+        d = self.node.get_best_readable_version()
991         d.addCallback(lambda dn: FileDownloader(dn, filename))
992         return d
993 
994hunk ./src/allmydata/web/filenode.py 219
995         req = IRequest(ctx)
996         t = get_arg(req, "t", "").strip()
997         replace = parse_replace_arg(get_arg(req, "replace", "true"))
998+        offset = parse_offset_arg(get_arg(req, "offset", -1))
999 
1000         if not t:
1001hunk ./src/allmydata/web/filenode.py 222
1002-            if self.node.is_mutable():
1003+            if self.node.is_mutable() and offset >= 0:
1004+                return self.update_my_contents(req, offset)
1005+
1006+            elif self.node.is_mutable():
1007                 return self.replace_my_contents(req)
1008             if not replace:
1009                 # this is the early trap: if someone else modifies the
1010hunk ./src/allmydata/web/filenode.py 232
1011                 # directory while we're uploading, the add_file(overwrite=)
1012                 # call in replace_me_with_a_child will do the late trap.
1013                 raise ExistingChildError()
1014+            if offset >= 0:
1015+                raise WebError("PUT to a file: append operation invoked "
1016+                               "on an immutable cap")
1017+
1018+
1019             assert self.parentnode and self.name
1020             return self.replace_me_with_a_child(req, self.client, replace)
1021         if t == "uri":
1022hunk ./src/allmydata/web/filenode.py 299
1023 
1024     def replace_my_contents(self, req):
1025         req.content.seek(0)
1026-        new_contents = req.content.read()
1027+        new_contents = MutableFileHandle(req.content)
1028         d = self.node.overwrite(new_contents)
1029         d.addCallback(lambda res: self.node.get_uri())
1030         return d
1031hunk ./src/allmydata/web/filenode.py 304
1032 
1033+
1034+    def update_my_contents(self, req, offset):
1035+        req.content.seek(0)
1036+        added_contents = MutableFileHandle(req.content)
1037+
1038+        d = self.node.get_best_mutable_version()
1039+        d.addCallback(lambda mv:
1040+            mv.update(added_contents, offset))
1041+        d.addCallback(lambda ignored:
1042+            self.node.get_uri())
1043+        return d
1044+
1045+
1046     def replace_my_contents_with_a_formpost(self, req):
1047         # we have a mutable file. Get the data from the formpost, and replace
1048         # the mutable file's contents with it.
1049hunk ./src/allmydata/web/filenode.py 320
1050-        new_contents = self._read_data_from_formpost(req)
1051+        new_contents = req.fields['file']
1052+        new_contents = MutableFileHandle(new_contents.file)
1053+
1054         d = self.node.overwrite(new_contents)
1055         d.addCallback(lambda res: self.node.get_uri())
1056         return d
1057hunk ./src/allmydata/web/filenode.py 327
1058 
1059-class MutableDownloadable:
1060-    #implements(IDownloadable)
1061-    def __init__(self, size, node):
1062-        self.size = size
1063-        self.node = node
1064-    def get_size(self):
1065-        return self.size
1066-    def is_mutable(self):
1067-        return True
1068-    def read(self, consumer, offset=0, size=None):
1069-        d = self.node.download_best_version()
1070-        d.addCallback(self._got_data, consumer, offset, size)
1071-        return d
1072-    def _got_data(self, contents, consumer, offset, size):
1073-        start = offset
1074-        if size is not None:
1075-            end = offset+size
1076-        else:
1077-            end = self.size
1078-        # SDMF: we can write the whole file in one big chunk
1079-        consumer.write(contents[start:end])
1080-        return consumer
1081-
1082-def makeMutableDownloadable(n):
1083-    d = defer.maybeDeferred(n.get_size_of_best_version)
1084-    d.addCallback(MutableDownloadable, n)
1085-    return d
1086 
1087 class FileDownloader(rend.Page):
1088     # since we override the rendering process (to let the tahoe Downloader
1089hunk ./src/allmydata/web/filenode.py 509
1090     data[1]['mutable'] = filenode.is_mutable()
1091     if edge_metadata is not None:
1092         data[1]['metadata'] = edge_metadata
1093+
1094+    if filenode.is_mutable() and filenode.get_version() is not None:
1095+        mutable_type = filenode.get_version()
1096+        assert mutable_type in (MDMF_VERSION, SDMF_VERSION)
1097+        if mutable_type == MDMF_VERSION:
1098+            mutable_type = "mdmf"
1099+        else:
1100+            mutable_type = "sdmf"
1101+        data[1]['mutable-type'] = mutable_type
1102+
1103     return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx)
1104 
1105 def FileURI(ctx, filenode):
1106hunk ./src/allmydata/web/root.py 15
1107 from allmydata import get_package_versions_string
1108 from allmydata import provisioning
1109 from allmydata.util import idlib, log
1110-from allmydata.interfaces import IFileNode
1111+from allmydata.interfaces import IFileNode, MDMF_VERSION, SDMF_VERSION
1112 from allmydata.web import filenode, directory, unlinked, status, operations
1113 from allmydata.web import reliability, storage
1114 from allmydata.web.common import abbreviate_size, getxmlfile, WebError, \
1115hunk ./src/allmydata/web/root.py 19
1116-     get_arg, RenderMixin, boolean_of_arg
1117+     get_arg, RenderMixin, boolean_of_arg, parse_mutable_type_arg
1118 
1119 
1120 class URIHandler(RenderMixin, rend.Page):
1121hunk ./src/allmydata/web/root.py 50
1122         if t == "":
1123             mutable = boolean_of_arg(get_arg(req, "mutable", "false").strip())
1124             if mutable:
1125-                return unlinked.PUTUnlinkedSSK(req, self.client)
1126+                version = parse_mutable_type_arg(get_arg(req, "mutable-type",
1127+                                                 None))
1128+                return unlinked.PUTUnlinkedSSK(req, self.client, version)
1129             else:
1130                 return unlinked.PUTUnlinkedCHK(req, self.client)
1131         if t == "mkdir":
1132hunk ./src/allmydata/web/root.py 70
1133         if t in ("", "upload"):
1134             mutable = bool(get_arg(req, "mutable", "").strip())
1135             if mutable:
1136-                return unlinked.POSTUnlinkedSSK(req, self.client)
1137+                version = parse_mutable_type_arg(get_arg(req, "mutable-type",
1138+                                                         None))
1139+                return unlinked.POSTUnlinkedSSK(req, self.client, version)
1140             else:
1141                 return unlinked.POSTUnlinkedCHK(req, self.client)
1142         if t == "mkdir":
1143hunk ./src/allmydata/web/root.py 324
1144 
1145     def render_upload_form(self, ctx, data):
1146         # this is a form where users can upload unlinked files
1147+        #
1148+        # for mutable files, users can choose the format by selecting
1149+        # MDMF or SDMF from a radio button. They can also configure a
1150+        # default format in tahoe.cfg, which they rightly expect us to
1151+        # obey. we convey to them that we are obeying their choice by
1152+        # ensuring that the one that they've chosen is selected in the
1153+        # interface.
1154+        if self.client.mutable_file_default == MDMF_VERSION:
1155+            mdmf_input = T.input(type='radio', name='mutable-type',
1156+                                 value='mdmf', id='mutable-type-mdmf',
1157+                                 checked='checked')
1158+        else:
1159+            mdmf_input = T.input(type='radio', name='mutable-type',
1160+                                 value='mdmf', id='mutable-type-mdmf')
1161+
1162+        if self.client.mutable_file_default == SDMF_VERSION:
1163+            sdmf_input = T.input(type='radio', name='mutable-type',
1164+                                 value='sdmf', id='mutable-type-sdmf',
1165+                                 checked='checked')
1166+        else:
1167+            sdmf_input = T.input(type='radio', name='mutable-type',
1168+                                 value='sdmf', id='mutable-type-sdmf')
1169+
1170+
1171         form = T.form(action="uri", method="post",
1172                       enctype="multipart/form-data")[
1173             T.fieldset[
1174hunk ./src/allmydata/web/root.py 356
1175                   T.input(type="file", name="file", class_="freeform-input-file")],
1176             T.input(type="hidden", name="t", value="upload"),
1177             T.div[T.input(type="checkbox", name="mutable"), T.label(for_="mutable")["Create mutable file"],
1178+                  sdmf_input, T.label(for_="mutable-type-sdmf")["SDMF"],
1179+                  mdmf_input,
1180+                  T.label(for_='mutable-type-mdmf')['MDMF (experimental)'],
1181                   " ", T.input(type="submit", value="Upload!")],
1182             ]]
1183         return T.div[form]
1184hunk ./src/allmydata/web/unlinked.py 7
1185 from twisted.internet import defer
1186 from nevow import rend, url, tags as T
1187 from allmydata.immutable.upload import FileHandle
1188+from allmydata.mutable.publish import MutableFileHandle
1189 from allmydata.web.common import getxmlfile, get_arg, boolean_of_arg, \
1190      convert_children_json, WebError
1191 from allmydata.web import status
1192hunk ./src/allmydata/web/unlinked.py 20
1193     # that fires with the URI of the new file
1194     return d
1195 
1196-def PUTUnlinkedSSK(req, client):
1197+def PUTUnlinkedSSK(req, client, version):
1198     # SDMF: files are small, and we can only upload data
1199     req.content.seek(0)
1200hunk ./src/allmydata/web/unlinked.py 23
1201-    data = req.content.read()
1202-    d = client.create_mutable_file(data)
1203+    data = MutableFileHandle(req.content)
1204+    d = client.create_mutable_file(data, version=version)
1205     d.addCallback(lambda n: n.get_uri())
1206     return d
1207 
1208hunk ./src/allmydata/web/unlinked.py 83
1209                       ["/uri/" + res.uri])
1210         return d
1211 
1212-def POSTUnlinkedSSK(req, client):
1213+def POSTUnlinkedSSK(req, client, version):
1214     # "POST /uri", to create an unlinked file.
1215     # SDMF: files are small, and we can only upload data
1216hunk ./src/allmydata/web/unlinked.py 86
1217-    contents = req.fields["file"]
1218-    contents.file.seek(0)
1219-    data = contents.file.read()
1220-    d = client.create_mutable_file(data)
1221+    contents = req.fields["file"].file
1222+    data = MutableFileHandle(contents)
1223+    d = client.create_mutable_file(data, version=version)
1224     d.addCallback(lambda n: n.get_uri())
1225     return d
1226 
1227}
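
The root.py hunk above hands the "mutable-type" form argument to
parse_mutable_type_arg(), a helper defined elsewhere in this patch series and
not shown in this excerpt. A minimal sketch of the behaviour the caller
relies on, assuming the helper does nothing more than map the webapi's
"mutable-type" values onto the protocol-version constants (the helper body
below is hypothetical; only the names that appear in the hunks are taken from
the patch):

    from allmydata.interfaces import SDMF_VERSION, MDMF_VERSION

    def parse_mutable_type_arg(arg):
        # Sketch only: turn the "mutable-type" query/form value into a
        # protocol-version constant. None (argument absent or empty) means
        # "fall back to the client's configured default".
        if not arg:
            return None
        arg = arg.lower()
        if arg == "mdmf":
            return MDMF_VERSION
        if arg == "sdmf":
            return SDMF_VERSION
        # a real implementation would presumably reject anything else
        return None

POSTUnlinkedSSK() then threads the resulting version through
client.create_mutable_file(), so an absent argument falls back to the
tahoe.cfg default introduced in the client.py patch below.
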
1228[client.py: learn how to create different kinds of mutable files
1229Kevan Carstensen <kevan@isnotajoke.com>**20100814225711
1230 Ignore-this: 61ff665bc050cba5f58bf2ed779d692b
1231] {
1232hunk ./src/allmydata/client.py 25
1233 from allmydata.util.time_format import parse_duration, parse_date
1234 from allmydata.stats import StatsProvider
1235 from allmydata.history import History
1236-from allmydata.interfaces import IStatsProducer, RIStubClient
1237+from allmydata.interfaces import IStatsProducer, RIStubClient, \
1238+                                 SDMF_VERSION, MDMF_VERSION
1239 from allmydata.nodemaker import NodeMaker
1240 
1241 
1242hunk ./src/allmydata/client.py 357
1243                                    self.terminator,
1244                                    self.get_encoding_parameters(),
1245                                    self._key_generator)
1246+        default = self.get_config("client", "mutable.format", default="sdmf")
1247+        if default == "mdmf":
1248+            self.mutable_file_default = MDMF_VERSION
1249+        else:
1250+            self.mutable_file_default = SDMF_VERSION
1251 
1252     def get_history(self):
1253         return self.history
1254hunk ./src/allmydata/client.py 500
1255     def create_immutable_dirnode(self, children, convergence=None):
1256         return self.nodemaker.create_immutable_directory(children, convergence)
1257 
1258-    def create_mutable_file(self, contents=None, keysize=None):
1259-        return self.nodemaker.create_mutable_file(contents, keysize)
1260+    def create_mutable_file(self, contents=None, keysize=None, version=None):
1261+        if not version:
1262+            version = self.mutable_file_default
1263+        return self.nodemaker.create_mutable_file(contents, keysize,
1264+                                                  version=version)
1265 
1266     def upload(self, uploadable):
1267         uploader = self.getServiceNamed("uploader")
1268}
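
The hunk above adds a "[client] mutable.format" knob to tahoe.cfg and an
optional version argument to create_mutable_file(). A short usage sketch,
assuming an already-configured Client instance and the MutableData wrapper
introduced by the mutable/publish.py patch later in this series:

    from allmydata.interfaces import SDMF_VERSION
    from allmydata.mutable.publish import MutableData

    def create_two_files(client):
        # `client` is assumed to be an allmydata.client.Client instance.
        # With no explicit version, create_mutable_file() falls back to
        # client.mutable_file_default, which the tahoe.cfg setting
        #
        #   [client]
        #   mutable.format = mdmf
        #
        # turns into MDMF_VERSION (any other value means SDMF_VERSION).
        d1 = client.create_mutable_file(MutableData("made with the default"))

        # An explicit version always overrides the configured default.
        d2 = client.create_mutable_file(MutableData("explicitly SDMF"),
                                        version=SDMF_VERSION)
        return d1, d2
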
1269[mutable/checker.py and mutable/repair.py: Modify checker and repairer to work with MDMF
1270Kevan Carstensen <kevan@isnotajoke.com>**20100819003216
1271 Ignore-this: d3bd3260742be8964877f0a53543b01b
1272 
1273 The checker and repairer required minimal changes to work with the MDMF
1274 modifications made elsewhere. The checker duplicated a lot of the code
1275 that was already in the downloader, so I modified the downloader
1276 slightly to expose this functionality to the checker and removed the
1277 duplicated code. The repairer only required a minor change to deal with
1278 data representation.
1279] {
1280hunk ./src/allmydata/mutable/checker.py 2
1281 
1282-from twisted.internet import defer
1283-from twisted.python import failure
1284-from allmydata import hashtree
1285 from allmydata.uri import from_string
1286hunk ./src/allmydata/mutable/checker.py 3
1287-from allmydata.util import hashutil, base32, idlib, log
1288+from allmydata.util import base32, idlib, log
1289 from allmydata.check_results import CheckAndRepairResults, CheckResults
1290 
1291 from allmydata.mutable.common import MODE_CHECK, CorruptShareError
1292hunk ./src/allmydata/mutable/checker.py 8
1293 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
1294-from allmydata.mutable.layout import unpack_share, SIGNED_PREFIX_LENGTH
1295+from allmydata.mutable.retrieve import Retrieve # for verifying
1296 
1297 class MutableChecker:
1298 
1299hunk ./src/allmydata/mutable/checker.py 25
1300 
1301     def check(self, verify=False, add_lease=False):
1302         servermap = ServerMap()
1303+        # Updating the servermap in MODE_CHECK will stand a good chance
1304+        # of finding all of the shares, and getting a good idea of
1305+        # recoverability, etc, without verifying.
1306         u = ServermapUpdater(self._node, self._storage_broker, self._monitor,
1307                              servermap, MODE_CHECK, add_lease=add_lease)
1308         if self._history:
1309hunk ./src/allmydata/mutable/checker.py 51
1310         if num_recoverable:
1311             self.best_version = servermap.best_recoverable_version()
1312 
1313+        # The file is unhealthy and needs to be repaired if:
1314+        # - There are unrecoverable versions.
1315         if servermap.unrecoverable_versions():
1316             self.need_repair = True
1317hunk ./src/allmydata/mutable/checker.py 55
1318+        # - There isn't a recoverable version.
1319         if num_recoverable != 1:
1320             self.need_repair = True
1321hunk ./src/allmydata/mutable/checker.py 58
1322+        # - The best recoverable version is missing some shares.
1323         if self.best_version:
1324             available_shares = servermap.shares_available()
1325             (num_distinct_shares, k, N) = available_shares[self.best_version]
1326hunk ./src/allmydata/mutable/checker.py 69
1327 
1328     def _verify_all_shares(self, servermap):
1329         # read every byte of each share
1330+        #
1331+        # This logic is going to be very nearly the same as the
1332+        # downloader. I bet we could pass the downloader a flag that
1333+        # makes it do this, and piggyback onto that instead of
1334+        # duplicating a bunch of code.
1335+        #
1336+        # Like:
1337+        #  r = Retrieve(blah, blah, blah, verify=True)
1338+        #  d = r.download()
1339+        #  (wait, wait, wait, d.callback)
1340+        # 
1341+        #  Then, when it has finished, we can check the servermap (which
1342+        #  we provided to Retrieve) to figure out which shares are bad,
1343+        #  since the Retrieve process will have updated the servermap as
1344+        #  it went along.
1345+        #
1346+        #  By passing the verify=True flag to the constructor, we are
1347+        #  telling the downloader a few things.
1348+        #
1349+        #  1. It needs to download all N shares, not just K shares.
1350+        #  2. It doesn't need to decrypt or decode the shares, only
1351+        #     verify them.
1352         if not self.best_version:
1353             return
1354hunk ./src/allmydata/mutable/checker.py 93
1355-        versionmap = servermap.make_versionmap()
1356-        shares = versionmap[self.best_version]
1357-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
1358-         offsets_tuple) = self.best_version
1359-        offsets = dict(offsets_tuple)
1360-        readv = [ (0, offsets["EOF"]) ]
1361-        dl = []
1362-        for (shnum, peerid, timestamp) in shares:
1363-            ss = servermap.connections[peerid]
1364-            d = self._do_read(ss, peerid, self._storage_index, [shnum], readv)
1365-            d.addCallback(self._got_answer, peerid, servermap)
1366-            dl.append(d)
1367-        return defer.DeferredList(dl, fireOnOneErrback=True, consumeErrors=True)
1368 
1369hunk ./src/allmydata/mutable/checker.py 94
1370-    def _do_read(self, ss, peerid, storage_index, shnums, readv):
1371-        # isolate the callRemote to a separate method, so tests can subclass
1372-        # Publish and override it
1373-        d = ss.callRemote("slot_readv", storage_index, shnums, readv)
1374+        r = Retrieve(self._node, servermap, self.best_version, verify=True)
1375+        d = r.download()
1376+        d.addCallback(self._process_bad_shares)
1377         return d
1378 
1379hunk ./src/allmydata/mutable/checker.py 99
1380-    def _got_answer(self, datavs, peerid, servermap):
1381-        for shnum,datav in datavs.items():
1382-            data = datav[0]
1383-            try:
1384-                self._got_results_one_share(shnum, peerid, data)
1385-            except CorruptShareError:
1386-                f = failure.Failure()
1387-                self.need_repair = True
1388-                self.bad_shares.append( (peerid, shnum, f) )
1389-                prefix = data[:SIGNED_PREFIX_LENGTH]
1390-                servermap.mark_bad_share(peerid, shnum, prefix)
1391-                ss = servermap.connections[peerid]
1392-                self.notify_server_corruption(ss, shnum, str(f.value))
1393-
1394-    def check_prefix(self, peerid, shnum, data):
1395-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
1396-         offsets_tuple) = self.best_version
1397-        got_prefix = data[:SIGNED_PREFIX_LENGTH]
1398-        if got_prefix != prefix:
1399-            raise CorruptShareError(peerid, shnum,
1400-                                    "prefix mismatch: share changed while we were reading it")
1401-
1402-    def _got_results_one_share(self, shnum, peerid, data):
1403-        self.check_prefix(peerid, shnum, data)
1404-
1405-        # the [seqnum:signature] pieces are validated by _compare_prefix,
1406-        # which checks their signature against the pubkey known to be
1407-        # associated with this file.
1408 
1409hunk ./src/allmydata/mutable/checker.py 100
1410-        (seqnum, root_hash, IV, k, N, segsize, datalen, pubkey, signature,
1411-         share_hash_chain, block_hash_tree, share_data,
1412-         enc_privkey) = unpack_share(data)
1413-
1414-        # validate [share_hash_chain,block_hash_tree,share_data]
1415-
1416-        leaves = [hashutil.block_hash(share_data)]
1417-        t = hashtree.HashTree(leaves)
1418-        if list(t) != block_hash_tree:
1419-            raise CorruptShareError(peerid, shnum, "block hash tree failure")
1420-        share_hash_leaf = t[0]
1421-        t2 = hashtree.IncompleteHashTree(N)
1422-        # root_hash was checked by the signature
1423-        t2.set_hashes({0: root_hash})
1424-        try:
1425-            t2.set_hashes(hashes=share_hash_chain,
1426-                          leaves={shnum: share_hash_leaf})
1427-        except (hashtree.BadHashError, hashtree.NotEnoughHashesError,
1428-                IndexError), e:
1429-            msg = "corrupt hashes: %s" % (e,)
1430-            raise CorruptShareError(peerid, shnum, msg)
1431-
1432-        # validate enc_privkey: only possible if we have a write-cap
1433-        if not self._node.is_readonly():
1434-            alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
1435-            alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
1436-            if alleged_writekey != self._node.get_writekey():
1437-                raise CorruptShareError(peerid, shnum, "invalid privkey")
1438+    def _process_bad_shares(self, bad_shares):
1439+        if bad_shares:
1440+            self.need_repair = True
1441+        self.bad_shares = bad_shares
1442 
1443hunk ./src/allmydata/mutable/checker.py 105
1444-    def notify_server_corruption(self, ss, shnum, reason):
1445-        ss.callRemoteOnly("advise_corrupt_share",
1446-                          "mutable", self._storage_index, shnum, reason)
1447 
1448     def _count_shares(self, smap, version):
1449         available_shares = smap.shares_available()
1450hunk ./src/allmydata/mutable/repairer.py 5
1451 from zope.interface import implements
1452 from twisted.internet import defer
1453 from allmydata.interfaces import IRepairResults, ICheckResults
1454+from allmydata.mutable.publish import MutableData
1455 
1456 class RepairResults:
1457     implements(IRepairResults)
1458hunk ./src/allmydata/mutable/repairer.py 108
1459             raise RepairRequiresWritecapError("Sorry, repair currently requires a writecap, to set the write-enabler properly.")
1460 
1461         d = self.node.download_version(smap, best_version, fetch_privkey=True)
1462+        d.addCallback(lambda data:
1463+            MutableData(data))
1464         d.addCallback(self.node.upload, smap)
1465         d.addCallback(self.get_results, smap)
1466         return d
1467}
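
The checker hunks replace the hand-rolled share reading and hash checking
with the retriever's new verify mode, and the repairer now wraps the bytes it
downloads in MutableData before re-uploading them. Condensed into a
stand-alone sketch, using the same names as the hunks above (node, servermap
and best_version are assumed to come from a completed MODE_CHECK servermap
update):

    from allmydata.mutable.retrieve import Retrieve

    def verify_all_shares(node, servermap, best_version):
        # verify=True asks the retriever to fetch every share (all N, not
        # just k) and to validate them without decrypting or decoding; the
        # returned Deferred fires with the list of bad shares it found,
        # and the servermap it was given is updated along the way.
        r = Retrieve(node, servermap, best_version, verify=True)
        d = r.download()
        def _process_bad_shares(bad_shares):
            # a non-empty list means the file is unhealthy and needs repair
            return bad_shares
        d.addCallback(_process_bad_shares)
        return d
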
1468[mutable/filenode.py: add versions and partial-file updates to the mutable file node
1469Kevan Carstensen <kevan@isnotajoke.com>**20100819003231
1470 Ignore-this: b7b5434201fdb9b48f902d7ab25ef45c
1471 
1472 One of the goals of MDMF as a GSoC project is to lay the groundwork for
1473 LDMF, a format that will allow Tahoe-LAFS to deal with and encourage
1474 multiple versions of a single cap on the grid. In line with this, there
1475 is a now a distinction between an overriding mutable file (which can be
1476 thought to correspond to the cap/unique identifier for that mutable
1477 file) and versions of the mutable file (which we can download, update,
1478 and so on). All download, upload, and modification operations end up
1479 happening on a particular version of a mutable file, but there are
1480 shortcut methods on the object representing the overriding mutable file
1481 that perform these operations on the best version of the mutable file
1482 (which is what code should be doing until we have LDMF and better
1483 support for other paradigms).
1484 
1485 Another goal of MDMF was to take advantage of segmentation to give
1486 callers more efficient partial file updates or appends. This patch
1487 implements methods that do that, too.
1488 
1489] {
1490hunk ./src/allmydata/mutable/filenode.py 7
1491 from zope.interface import implements
1492 from twisted.internet import defer, reactor
1493 from foolscap.api import eventually
1494-from allmydata.interfaces import IMutableFileNode, \
1495-     ICheckable, ICheckResults, NotEnoughSharesError
1496-from allmydata.util import hashutil, log
1497+from allmydata.interfaces import IMutableFileNode, ICheckable, ICheckResults, \
1498+     NotEnoughSharesError, MDMF_VERSION, SDMF_VERSION, IMutableUploadable, \
1499+     IMutableFileVersion, IWritable
1500+from allmydata.util import hashutil, log, consumer, deferredutil, mathutil
1501 from allmydata.util.assertutil import precondition
1502 from allmydata.uri import WriteableSSKFileURI, ReadonlySSKFileURI
1503 from allmydata.monitor import Monitor
1504hunk ./src/allmydata/mutable/filenode.py 16
1505 from pycryptopp.cipher.aes import AES
1506 
1507-from allmydata.mutable.publish import Publish
1508+from allmydata.mutable.publish import Publish, MutableData,\
1509+                                      DEFAULT_MAX_SEGMENT_SIZE, \
1510+                                      TransformingUploadable
1511 from allmydata.mutable.common import MODE_READ, MODE_WRITE, UnrecoverableFileError, \
1512      ResponseCache, UncoordinatedWriteError
1513 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
1514hunk ./src/allmydata/mutable/filenode.py 70
1515         self._sharemap = {} # known shares, shnum-to-[nodeids]
1516         self._cache = ResponseCache()
1517         self._most_recent_size = None
1518+        # filled in after __init__ if we're being created for the first time;
1519+        # filled in by the servermap updater before publishing, otherwise.
1520+        # set to this default value in case neither of those things happen,
1521+        # or in case the servermap can't find any shares to tell us what
1522+        # to publish as.
1523+        # TODO: this now defaults to None (below); confirm that the tests
1524+        #       which previously failed with None are now passing.
1525+        self._protocol_version = None
1526 
1527         # all users of this MutableFileNode go through the serializer. This
1528         # takes advantage of the fact that Deferreds discard the callbacks
1529hunk ./src/allmydata/mutable/filenode.py 134
1530         return self._upload(initial_contents, None)
1531 
1532     def _get_initial_contents(self, contents):
1533-        if isinstance(contents, str):
1534-            return contents
1535         if contents is None:
1536hunk ./src/allmydata/mutable/filenode.py 135
1537-            return ""
1538+            return MutableData("")
1539+
1540+        if IMutableUploadable.providedBy(contents):
1541+            return contents
1542+
1543         assert callable(contents), "%s should be callable, not %s" % \
1544                (contents, type(contents))
1545         return contents(self)
1546hunk ./src/allmydata/mutable/filenode.py 209
1547 
1548     def get_size(self):
1549         return self._most_recent_size
1550+
1551     def get_current_size(self):
1552         d = self.get_size_of_best_version()
1553         d.addCallback(self._stash_size)
1554hunk ./src/allmydata/mutable/filenode.py 214
1555         return d
1556+
1557     def _stash_size(self, size):
1558         self._most_recent_size = size
1559         return size
1560hunk ./src/allmydata/mutable/filenode.py 273
1561             return cmp(self.__class__, them.__class__)
1562         return cmp(self._uri, them._uri)
1563 
1564-    def _do_serialized(self, cb, *args, **kwargs):
1565-        # note: to avoid deadlock, this callable is *not* allowed to invoke
1566-        # other serialized methods within this (or any other)
1567-        # MutableFileNode. The callable should be a bound method of this same
1568-        # MFN instance.
1569-        d = defer.Deferred()
1570-        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
1571-        # we need to put off d.callback until this Deferred is finished being
1572-        # processed. Otherwise the caller's subsequent activities (like,
1573-        # doing other things with this node) can cause reentrancy problems in
1574-        # the Deferred code itself
1575-        self._serializer.addBoth(lambda res: eventually(d.callback, res))
1576-        # add a log.err just in case something really weird happens, because
1577-        # self._serializer stays around forever, therefore we won't see the
1578-        # usual Unhandled Error in Deferred that would give us a hint.
1579-        self._serializer.addErrback(log.err)
1580-        return d
1581 
1582     #################################
1583     # ICheckable
1584hunk ./src/allmydata/mutable/filenode.py 298
1585 
1586 
1587     #################################
1588-    # IMutableFileNode
1589+    # IFileNode
1590+
1591+    def get_best_readable_version(self):
1592+        """
1593+        I return a Deferred that fires with a MutableFileVersion
1594+        representing the best readable version of the file that I
1595+        represent
1596+        """
1597+        return self.get_readable_version()
1598+
1599+
1600+    def get_readable_version(self, servermap=None, version=None):
1601+        """
1602+        I return a Deferred that fires with an MutableFileVersion for my
1603+        version argument, if there is a recoverable file of that version
1604+        on the grid. If there is no recoverable version, I fire with an
1605+        UnrecoverableFileError.
1606+
1607+        If a servermap is provided, I look in there for the requested
1608+        version. If no servermap is provided, I create and update a new
1609+        one.
1610+
1611+        If no version is provided, then I return a MutableFileVersion
1612+        representing the best recoverable version of the file.
1613+        """
1614+        d = self._get_version_from_servermap(MODE_READ, servermap, version)
1615+        def _build_version((servermap, their_version)):
1616+            assert their_version in servermap.recoverable_versions()
1617+            assert their_version in servermap.make_versionmap()
1618+
1619+            mfv = MutableFileVersion(self,
1620+                                     servermap,
1621+                                     their_version,
1622+                                     self._storage_index,
1623+                                     self._storage_broker,
1624+                                     self._readkey,
1625+                                     history=self._history)
1626+            assert mfv.is_readonly()
1627+            # our caller can use this to download the contents of the
1628+            # mutable file.
1629+            return mfv
1630+        return d.addCallback(_build_version)
1631+
1632+
1633+    def _get_version_from_servermap(self,
1634+                                    mode,
1635+                                    servermap=None,
1636+                                    version=None):
1637+        """
1638+        I return a Deferred that fires with (servermap, version).
1639+
1640+        This function performs validation and a servermap update. If it
1641+        returns (servermap, version), the caller can assume that:
1642+            - servermap was last updated in mode.
1643+            - version is recoverable, and corresponds to the servermap.
1644+
1645+        If version and servermap are provided to me, I will validate
1646+        that version exists in the servermap, and that the servermap was
1647+        updated correctly.
1648+
1649+        If version is not provided, but servermap is, I will validate
1650+        the servermap and return the best recoverable version that I can
1651+        find in the servermap.
1652+
1653+        If the version is provided but the servermap isn't, I will
1654+        obtain a servermap that has been updated in the correct mode and
1655+        validate that version is found and recoverable.
1656+
1657+        If neither servermap nor version are provided, I will obtain a
1658+        servermap updated in the correct mode, and return the best
1659+        recoverable version that I can find in there.
1660+        """
1661+        # XXX: wording ^^^^
1662+        if servermap and servermap.last_update_mode == mode:
1663+            d = defer.succeed(servermap)
1664+        else:
1665+            d = self._get_servermap(mode)
1666+
1667+        def _get_version(servermap, v):
1668+            if v and v not in servermap.recoverable_versions():
1669+                v = None
1670+            elif not v:
1671+                v = servermap.best_recoverable_version()
1672+            if not v:
1673+                raise UnrecoverableFileError("no recoverable versions")
1674+
1675+            return (servermap, v)
1676+        return d.addCallback(_get_version, version)
1677+
1678 
1679     def download_best_version(self):
1680hunk ./src/allmydata/mutable/filenode.py 389
1681+        """
1682+        I return a Deferred that fires with the contents of the best
1683+        version of this mutable file.
1684+        """
1685         return self._do_serialized(self._download_best_version)
1686hunk ./src/allmydata/mutable/filenode.py 394
1687+
1688+
1689     def _download_best_version(self):
1690hunk ./src/allmydata/mutable/filenode.py 397
1691-        servermap = ServerMap()
1692-        d = self._try_once_to_download_best_version(servermap, MODE_READ)
1693-        def _maybe_retry(f):
1694-            f.trap(NotEnoughSharesError)
1695-            # the download is worth retrying once. Make sure to use the
1696-            # old servermap, since it is what remembers the bad shares,
1697-            # but use MODE_WRITE to make it look for even more shares.
1698-            # TODO: consider allowing this to retry multiple times.. this
1699-            # approach will let us tolerate about 8 bad shares, I think.
1700-            return self._try_once_to_download_best_version(servermap,
1701-                                                           MODE_WRITE)
1702+        """
1703+        I am the serialized sibling of download_best_version.
1704+        """
1705+        d = self.get_best_readable_version()
1706+        d.addCallback(self._record_size)
1707+        d.addCallback(lambda version: version.download_to_data())
1708+
1709+        # It is possible that the download will fail because there
1710+        # aren't enough shares to be had. If so, we will try again after
1711+        # updating the servermap in MODE_WRITE, which may find more
1712+        # shares than updating in MODE_READ, as we just did. We can do
1713+        # this by getting the best mutable version and downloading from
1714+        # that -- the best mutable version will be a MutableFileVersion
1715+        # with a servermap that was last updated in MODE_WRITE, as we
1716+        # want. If this fails, then we give up.
1717+        def _maybe_retry(failure):
1718+            failure.trap(NotEnoughSharesError)
1719+
1720+            d = self.get_best_mutable_version()
1721+            d.addCallback(self._record_size)
1722+            d.addCallback(lambda version: version.download_to_data())
1723+            return d
1724+
1725         d.addErrback(_maybe_retry)
1726         return d
1727hunk ./src/allmydata/mutable/filenode.py 422
1728-    def _try_once_to_download_best_version(self, servermap, mode):
1729-        d = self._update_servermap(servermap, mode)
1730-        d.addCallback(self._once_updated_download_best_version, servermap)
1731-        return d
1732-    def _once_updated_download_best_version(self, ignored, servermap):
1733-        goal = servermap.best_recoverable_version()
1734-        if not goal:
1735-            raise UnrecoverableFileError("no recoverable versions")
1736-        return self._try_once_to_download_version(servermap, goal)
1737+
1738+
1739+    def _record_size(self, mfv):
1740+        """
1741+        I record the size of a mutable file version.
1742+        """
1743+        self._most_recent_size = mfv.get_size()
1744+        return mfv
1745+
1746 
1747     def get_size_of_best_version(self):
1748hunk ./src/allmydata/mutable/filenode.py 433
1749-        d = self.get_servermap(MODE_READ)
1750-        def _got_servermap(smap):
1751-            ver = smap.best_recoverable_version()
1752-            if not ver:
1753-                raise UnrecoverableFileError("no recoverable version")
1754-            return smap.size_of_version(ver)
1755-        d.addCallback(_got_servermap)
1756-        return d
1757+        """
1758+        I return the size of the best version of this mutable file.
1759 
1760hunk ./src/allmydata/mutable/filenode.py 436
1761+        This is equivalent to calling get_size() on the result of
1762+        get_best_readable_version().
1763+        """
1764+        d = self.get_best_readable_version()
1765+        return d.addCallback(lambda mfv: mfv.get_size())
1766+
1767+
1768+    #################################
1769+    # IMutableFileNode
1770+
1771+    def get_best_mutable_version(self, servermap=None):
1772+        """
1773+        I return a Deferred that fires with a MutableFileVersion
1774+        representing the best readable version of the file that I
1775+        represent. I am like get_best_readable_version, except that I
1776+        will try to make a writable version if I can.
1777+        """
1778+        return self.get_mutable_version(servermap=servermap)
1779+
1780+
1781+    def get_mutable_version(self, servermap=None, version=None):
1782+        """
1783+        I return a version of this mutable file. I return a Deferred
1784+        that fires with a MutableFileVersion
1785+
1786+        If version is provided, the Deferred will fire with a
1787+        MutableFileVersion initialized with that version. Otherwise, it
1788+        will fire with the best version that I can recover.
1789+
1790+        If servermap is provided, I will use that to find versions
1791+        instead of performing my own servermap update.
1792+        """
1793+        if self.is_readonly():
1794+            return self.get_readable_version(servermap=servermap,
1795+                                             version=version)
1796+
1797+        # get_mutable_version => write intent, so we require that the
1798+        # servermap is updated in MODE_WRITE
1799+        d = self._get_version_from_servermap(MODE_WRITE, servermap, version)
1800+        def _build_version((servermap, smap_version)):
1801+            # these should have been set by the servermap update.
1802+            assert self._secret_holder
1803+            assert self._writekey
1804+
1805+            mfv = MutableFileVersion(self,
1806+                                     servermap,
1807+                                     smap_version,
1808+                                     self._storage_index,
1809+                                     self._storage_broker,
1810+                                     self._readkey,
1811+                                     self._writekey,
1812+                                     self._secret_holder,
1813+                                     history=self._history)
1814+            assert not mfv.is_readonly()
1815+            return mfv
1816+
1817+        return d.addCallback(_build_version)
1818+
1819+
1820+    # XXX: I'm uncomfortable with the difference between upload and
1821+    #      overwrite, which, FWICT, is basically that you don't have to
1822+    #      do a servermap update before you overwrite. We split them up
1823+    #      that way anyway, so I guess there's no real difficulty in
1824+    #      offering both ways to callers, but it also makes the
1825+    #      public-facing API cluttery, and makes it hard to discern the
1826+    #      right way of doing things.
1827+
1828+    # In general, we leave it to callers to ensure that they aren't
1829+    # going to cause UncoordinatedWriteErrors when working with
1830+    # MutableFileVersions. We know that the next three operations
1831+    # (upload, overwrite, and modify) will all operate on the same
1832+    # version, so we say that only one of them can be going on at once,
1833+    # and serialize them to ensure that that actually happens, since as
1834+    # the caller in this situation it is our job to do that.
1835     def overwrite(self, new_contents):
1836hunk ./src/allmydata/mutable/filenode.py 511
1837+        """
1838+        I overwrite the contents of the best recoverable version of this
1839+        mutable file with new_contents. This is equivalent to calling
1840+        overwrite on the result of get_best_mutable_version with
1841+        new_contents as an argument. I return a Deferred that eventually
1842+        fires with the results of my replacement process.
1843+        """
1844         return self._do_serialized(self._overwrite, new_contents)
1845hunk ./src/allmydata/mutable/filenode.py 519
1846+
1847+
1848     def _overwrite(self, new_contents):
1849hunk ./src/allmydata/mutable/filenode.py 522
1850+        """
1851+        I am the serialized sibling of overwrite.
1852+        """
1853+        d = self.get_best_mutable_version()
1854+        d.addCallback(lambda mfv: mfv.overwrite(new_contents))
1855+        d.addCallback(self._did_upload, new_contents.get_size())
1856+        return d
1857+
1858+
1859+
1860+    def upload(self, new_contents, servermap):
1861+        """
1862+        I overwrite the contents of the best recoverable version of this
1863+        mutable file with new_contents, using servermap instead of
1864+        creating/updating our own servermap. I return a Deferred that
1865+        fires with the results of my upload.
1866+        """
1867+        return self._do_serialized(self._upload, new_contents, servermap)
1868+
1869+
1870+    def modify(self, modifier, backoffer=None):
1871+        """
1872+        I modify the contents of the best recoverable version of this
1873+        mutable file with the modifier. This is equivalent to calling
1874+        modify on the result of get_best_mutable_version. I return a
1875+        Deferred that eventually fires with an UploadResults instance
1876+        describing this process.
1877+        """
1878+        return self._do_serialized(self._modify, modifier, backoffer)
1879+
1880+
1881+    def _modify(self, modifier, backoffer):
1882+        """
1883+        I am the serialized sibling of modify.
1884+        """
1885+        d = self.get_best_mutable_version()
1886+        d.addCallback(lambda mfv: mfv.modify(modifier, backoffer))
1887+        return d
1888+
1889+
1890+    def download_version(self, servermap, version, fetch_privkey=False):
1891+        """
1892+        Download the specified version of this mutable file. I return a
1893+        Deferred that fires with the contents of the specified version
1894+        as a bytestring, or errbacks if the file is not recoverable.
1895+        """
1896+        d = self.get_readable_version(servermap, version)
1897+        return d.addCallback(lambda mfv: mfv.download_to_data(fetch_privkey))
1898+
1899+
1900+    def get_servermap(self, mode):
1901+        """
1902+        I return a servermap that has been updated in mode.
1903+
1904+        mode should be one of MODE_READ, MODE_WRITE, MODE_CHECK or
1905+        MODE_ANYTHING. See servermap.py for more on what these mean.
1906+        """
1907+        return self._do_serialized(self._get_servermap, mode)
1908+
1909+
1910+    def _get_servermap(self, mode):
1911+        """
1912+        I am a serialized twin to get_servermap.
1913+        """
1914         servermap = ServerMap()
1915hunk ./src/allmydata/mutable/filenode.py 587
1916-        d = self._update_servermap(servermap, mode=MODE_WRITE)
1917-        d.addCallback(lambda ignored: self._upload(new_contents, servermap))
1918+        d = self._update_servermap(servermap, mode)
1919+        # The servermap will tell us about the most recent size of the
1920+        # file, so we may as well set that so that callers might get
1921+        # more data about us.
1922+        if not self._most_recent_size:
1923+            d.addCallback(self._get_size_from_servermap)
1924+        return d
1925+
1926+
1927+    def _get_size_from_servermap(self, servermap):
1928+        """
1929+        I extract the size of the best version of this file and record
1930+        it in self._most_recent_size. I return the servermap that I was
1931+        given.
1932+        """
1933+        if servermap.recoverable_versions():
1934+            v = servermap.best_recoverable_version()
1935+            size = v[4] # verinfo[4] == size
1936+            self._most_recent_size = size
1937+        return servermap
1938+
1939+
1940+    def _update_servermap(self, servermap, mode):
1941+        u = ServermapUpdater(self, self._storage_broker, Monitor(), servermap,
1942+                             mode)
1943+        if self._history:
1944+            self._history.notify_mapupdate(u.get_status())
1945+        return u.update()
1946+
1947+
1948+    def set_version(self, version):
1949+        # I can be set in two ways:
1950+        #  1. When the node is created.
1951+        #  2. (for an existing share) when the Servermap is updated
1952+        #     before I am read.
1953+        assert version in (MDMF_VERSION, SDMF_VERSION)
1954+        self._protocol_version = version
1955+
1956+
1957+    def get_version(self):
1958+        return self._protocol_version
1959+
1960+
1961+    def _do_serialized(self, cb, *args, **kwargs):
1962+        # note: to avoid deadlock, this callable is *not* allowed to invoke
1963+        # other serialized methods within this (or any other)
1964+        # MutableFileNode. The callable should be a bound method of this same
1965+        # MFN instance.
1966+        d = defer.Deferred()
1967+        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
1968+        # we need to put off d.callback until this Deferred is finished being
1969+        # processed. Otherwise the caller's subsequent activities (like,
1970+        # doing other things with this node) can cause reentrancy problems in
1971+        # the Deferred code itself
1972+        self._serializer.addBoth(lambda res: eventually(d.callback, res))
1973+        # add a log.err just in case something really weird happens, because
1974+        # self._serializer stays around forever, therefore we won't see the
1975+        # usual Unhandled Error in Deferred that would give us a hint.
1976+        self._serializer.addErrback(log.err)
1977         return d
1978 
1979 
1980hunk ./src/allmydata/mutable/filenode.py 649
1981+    def _upload(self, new_contents, servermap):
1982+        """
1983+        A MutableFileNode still has to have some way of getting
1984+        published initially, which is what I am here for. After that,
1985+        all publishing, updating, modifying and so on happens through
1986+        MutableFileVersions.
1987+        """
1988+        assert self._pubkey, "update_servermap must be called before publish"
1989+
1990+        p = Publish(self, self._storage_broker, servermap)
1991+        if self._history:
1992+            self._history.notify_publish(p.get_status(),
1993+                                         new_contents.get_size())
1994+        d = p.publish(new_contents)
1995+        d.addCallback(self._did_upload, new_contents.get_size())
1996+        return d
1997+
1998+
1999+    def _did_upload(self, res, size):
2000+        self._most_recent_size = size
2001+        return res
2002+
2003+
2004+class MutableFileVersion:
2005+    """
2006+    I represent a specific version (most likely the best version) of a
2007+    mutable file.
2008+
2009+    Since I implement IReadable, instances which hold a
2010+    reference to an instance of me are guaranteed the ability (absent
2011+    connection difficulties or unrecoverable versions) to read the file
2012+    that I represent. Depending on whether I was initialized with a
2013+    write capability or not, I may also provide callers the ability to
2014+    overwrite or modify the contents of the mutable file that I
2015+    reference.
2016+    """
2017+    implements(IMutableFileVersion, IWritable)
2018+
2019+    def __init__(self,
2020+                 node,
2021+                 servermap,
2022+                 version,
2023+                 storage_index,
2024+                 storage_broker,
2025+                 readcap,
2026+                 writekey=None,
2027+                 write_secrets=None,
2028+                 history=None):
2029+
2030+        self._node = node
2031+        self._servermap = servermap
2032+        self._version = version
2033+        self._storage_index = storage_index
2034+        self._write_secrets = write_secrets
2035+        self._history = history
2036+        self._storage_broker = storage_broker
2037+
2038+        #assert isinstance(readcap, IURI)
2039+        self._readcap = readcap
2040+
2041+        self._writekey = writekey
2042+        self._serializer = defer.succeed(None)
2043+
2044+
2045+    def get_sequence_number(self):
2046+        """
2047+        Get the sequence number of the mutable version that I represent.
2048+        """
2049+        return self._version[0] # verinfo[0] == the sequence number
2050+
2051+
2052+    # TODO: Terminology?
2053+    def get_writekey(self):
2054+        """
2055+        I return a writekey or None if I don't have a writekey.
2056+        """
2057+        return self._writekey
2058+
2059+
2060+    def overwrite(self, new_contents):
2061+        """
2062+        I overwrite the contents of this mutable file version with the
2063+        data in new_contents.
2064+        """
2065+        assert not self.is_readonly()
2066+
2067+        return self._do_serialized(self._overwrite, new_contents)
2068+
2069+
2070+    def _overwrite(self, new_contents):
2071+        assert IMutableUploadable.providedBy(new_contents)
2072+        assert self._servermap.last_update_mode == MODE_WRITE
2073+
2074+        return self._upload(new_contents)
2075+
2076+
2077     def modify(self, modifier, backoffer=None):
2078         """I use a modifier callback to apply a change to the mutable file.
2079         I implement the following pseudocode::
2080hunk ./src/allmydata/mutable/filenode.py 785
2081         backoffer should not invoke any methods on this MutableFileNode
2082         instance, and it needs to be highly conscious of deadlock issues.
2083         """
2084+        assert not self.is_readonly()
2085+
2086         return self._do_serialized(self._modify, modifier, backoffer)
2087hunk ./src/allmydata/mutable/filenode.py 788
2088+
2089+
2090     def _modify(self, modifier, backoffer):
2091hunk ./src/allmydata/mutable/filenode.py 791
2092-        servermap = ServerMap()
2093         if backoffer is None:
2094             backoffer = BackoffAgent().delay
2095hunk ./src/allmydata/mutable/filenode.py 793
2096-        return self._modify_and_retry(servermap, modifier, backoffer, True)
2097-    def _modify_and_retry(self, servermap, modifier, backoffer, first_time):
2098-        d = self._modify_once(servermap, modifier, first_time)
2099+        return self._modify_and_retry(modifier, backoffer, True)
2100+
2101+
2102+    def _modify_and_retry(self, modifier, backoffer, first_time):
2103+        """
2104+        I try to apply modifier to the contents of this version of the
2105+        mutable file. If I succeed, I return an UploadResults instance
2106+        describing my success. If I fail, I try again after waiting for
2107+        a little bit.
2108+        """
2109+        log.msg("doing modify")
2110+        d = self._modify_once(modifier, first_time)
2111         def _retry(f):
2112             f.trap(UncoordinatedWriteError)
2113             d2 = defer.maybeDeferred(backoffer, self, f)
2114hunk ./src/allmydata/mutable/filenode.py 809
2115             d2.addCallback(lambda ignored:
2116-                           self._modify_and_retry(servermap, modifier,
2117+                           self._modify_and_retry(modifier,
2118                                                   backoffer, False))
2119             return d2
2120         d.addErrback(_retry)
2121hunk ./src/allmydata/mutable/filenode.py 814
2122         return d
2123-    def _modify_once(self, servermap, modifier, first_time):
2124-        d = self._update_servermap(servermap, MODE_WRITE)
2125-        d.addCallback(self._once_updated_download_best_version, servermap)
2126+
2127+
2128+    def _modify_once(self, modifier, first_time):
2129+        """
2130+        I attempt to apply a modifier to the contents of the mutable
2131+        file.
2132+        """
2133+        # XXX: This is wrong -- we could get more servers if we updated
2134+        # in MODE_ANYTHING and possibly MODE_CHECK. Probably we want to
2135+        # assert that the last update wasn't MODE_READ
2136+        assert self._servermap.last_update_mode == MODE_WRITE
2137+
2138+        # download_to_data is serialized, so we have to call this to
2139+        # avoid deadlock.
2140+        d = self._try_to_download_data()
2141         def _apply(old_contents):
2142hunk ./src/allmydata/mutable/filenode.py 830
2143-            new_contents = modifier(old_contents, servermap, first_time)
2144+            new_contents = modifier(old_contents, self._servermap, first_time)
2145+            precondition((isinstance(new_contents, str) or
2146+                          new_contents is None),
2147+                         "Modifier function must return a string "
2148+                         "or None")
2149+
2150             if new_contents is None or new_contents == old_contents:
2151hunk ./src/allmydata/mutable/filenode.py 837
2152+                log.msg("no changes")
2153                 # no changes need to be made
2154                 if first_time:
2155                     return
2156hunk ./src/allmydata/mutable/filenode.py 845
2157                 # recovery when it observes UCWE, we need to do a second
2158                 # publish. See #551 for details. We'll basically loop until
2159                 # we managed an uncontested publish.
2160-                new_contents = old_contents
2161-            precondition(isinstance(new_contents, str),
2162-                         "Modifier function must return a string or None")
2163-            return self._upload(new_contents, servermap)
2164+                old_uploadable = MutableData(old_contents)
2165+                new_contents = old_uploadable
2166+            else:
2167+                new_contents = MutableData(new_contents)
2168+
2169+            return self._upload(new_contents)
2170         d.addCallback(_apply)
2171         return d
2172 
2173hunk ./src/allmydata/mutable/filenode.py 854
2174-    def get_servermap(self, mode):
2175-        return self._do_serialized(self._get_servermap, mode)
2176-    def _get_servermap(self, mode):
2177-        servermap = ServerMap()
2178-        return self._update_servermap(servermap, mode)
2179-    def _update_servermap(self, servermap, mode):
2180-        u = ServermapUpdater(self, self._storage_broker, Monitor(), servermap,
2181-                             mode)
2182-        if self._history:
2183-            self._history.notify_mapupdate(u.get_status())
2184-        return u.update()
2185 
2186hunk ./src/allmydata/mutable/filenode.py 855
2187-    def download_version(self, servermap, version, fetch_privkey=False):
2188-        return self._do_serialized(self._try_once_to_download_version,
2189-                                   servermap, version, fetch_privkey)
2190-    def _try_once_to_download_version(self, servermap, version,
2191-                                      fetch_privkey=False):
2192-        r = Retrieve(self, servermap, version, fetch_privkey)
2193+    def is_readonly(self):
2194+        """
2195+        I return True if this MutableFileVersion provides no write
2196+        access to the file that it encapsulates, and False if it
2197+        provides the ability to modify the file.
2198+        """
2199+        return self._writekey is None
2200+
2201+
2202+    def is_mutable(self):
2203+        """
2204+        I return True, since mutable files are always mutable by
2205+        somebody.
2206+        """
2207+        return True
2208+
2209+
2210+    def get_storage_index(self):
2211+        """
2212+        I return the storage index of the reference that I encapsulate.
2213+        """
2214+        return self._storage_index
2215+
2216+
2217+    def get_size(self):
2218+        """
2219+        I return the length, in bytes, of this readable object.
2220+        """
2221+        return self._servermap.size_of_version(self._version)
2222+
2223+
2224+    def download_to_data(self, fetch_privkey=False):
2225+        """
2226+        I return a Deferred that fires with the contents of this
2227+        readable object as a byte string.
2228+
2229+        """
2230+        c = consumer.MemoryConsumer()
2231+        d = self.read(c, fetch_privkey=fetch_privkey)
2232+        d.addCallback(lambda mc: "".join(mc.chunks))
2233+        return d
2234+
2235+
2236+    def _try_to_download_data(self):
2237+        """
2238+        I am an unserialized cousin of download_to_data; I am called
2239+        from the children of modify() to download the data associated
2240+        with this mutable version.
2241+        """
2242+        c = consumer.MemoryConsumer()
2243+        # modify will almost certainly write, so we need the privkey.
2244+        d = self._read(c, fetch_privkey=True)
2245+        d.addCallback(lambda mc: "".join(mc.chunks))
2246+        return d
2247+
2248+
2249+    def read(self, consumer, offset=0, size=None, fetch_privkey=False):
2250+        """
2251+        I read a portion (possibly all) of the mutable file that I
2252+        reference into consumer.
2253+        """
2254+        return self._do_serialized(self._read, consumer, offset, size,
2255+                                   fetch_privkey)
2256+
2257+
2258+    def _read(self, consumer, offset=0, size=None, fetch_privkey=False):
2259+        """
2260+        I am the serialized companion of read.
2261+        """
2262+        r = Retrieve(self._node, self._servermap, self._version, fetch_privkey)
2263         if self._history:
2264             self._history.notify_retrieve(r.get_status())
2265hunk ./src/allmydata/mutable/filenode.py 927
2266-        d = r.download()
2267-        d.addCallback(self._downloaded_version)
2268+        d = r.download(consumer, offset, size)
2269         return d
2270hunk ./src/allmydata/mutable/filenode.py 929
2271-    def _downloaded_version(self, data):
2272-        self._most_recent_size = len(data)
2273-        return data
2274 
2275hunk ./src/allmydata/mutable/filenode.py 930
2276-    def upload(self, new_contents, servermap):
2277-        return self._do_serialized(self._upload, new_contents, servermap)
2278-    def _upload(self, new_contents, servermap):
2279-        assert self._pubkey, "update_servermap must be called before publish"
2280-        p = Publish(self, self._storage_broker, servermap)
2281+
2282+    def _do_serialized(self, cb, *args, **kwargs):
2283+        # note: to avoid deadlock, this callable is *not* allowed to invoke
2284+        # other serialized methods within this (or any other)
2285+        # MutableFileVersion. The callable should be a bound method of this
2286+        # same MutableFileVersion instance.
2287+        d = defer.Deferred()
2288+        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
2289+        # we need to put off d.callback until this Deferred is finished being
2290+        # processed. Otherwise the caller's subsequent activities (like,
2291+        # doing other things with this node) can cause reentrancy problems in
2292+        # the Deferred code itself
2293+        self._serializer.addBoth(lambda res: eventually(d.callback, res))
2294+        # add a log.err just in case something really weird happens, because
2295+        # self._serializer stays around forever, therefore we won't see the
2296+        # usual Unhandled Error in Deferred that would give us a hint.
2297+        self._serializer.addErrback(log.err)
2298+        return d
2299+
2300+
2301+    def _upload(self, new_contents):
2302+        #assert self._pubkey, "update_servermap must be called before publish"
2303+        p = Publish(self._node, self._storage_broker, self._servermap)
2304         if self._history:
2305hunk ./src/allmydata/mutable/filenode.py 954
2306-            self._history.notify_publish(p.get_status(), len(new_contents))
2307+            self._history.notify_publish(p.get_status(),
2308+                                         new_contents.get_size())
2309         d = p.publish(new_contents)
2310hunk ./src/allmydata/mutable/filenode.py 957
2311-        d.addCallback(self._did_upload, len(new_contents))
2312+        d.addCallback(self._did_upload, new_contents.get_size())
2313         return d
2314hunk ./src/allmydata/mutable/filenode.py 959
2315+
2316+
2317     def _did_upload(self, res, size):
2318         self._most_recent_size = size
2319         return res
2320hunk ./src/allmydata/mutable/filenode.py 964
2321+
2322+    def update(self, data, offset):
2323+        """
2324+        Do an update of this mutable file version by inserting data at
2325+        offset within the file. If offset is the EOF, this is an append
2326+        operation. I return a Deferred that fires with the results of
2327+        the update operation when it has completed.
2328+
2329+        In cases where update does not append any data, or where it does
2330+        not append so many blocks that the block count crosses a
2331+        power-of-two boundary, this operation will use roughly
2332+        O(data.get_size()) memory/bandwidth/CPU to perform the update.
2333+        Otherwise, it must download, re-encode, and upload the entire
2334+        file again, which will use O(filesize) resources.
2335+        """
2336+        return self._do_serialized(self._update, data, offset)
2337+
2338+
2339+    def _update(self, data, offset):
2340+        """
2341+        I update the mutable file version represented by this particular
2342+        IMutableVersion by inserting the data in data at the offset
2343+        offset. I return a Deferred that fires when this has been
2344+        completed.
2345+        """
2346+        # We have two cases here:
2347+        # 1. The new data will add few enough segments so that it does
2348+        #    not cross into the next power-of-two boundary.
2349+        # 2. It doesn't.
2350+        #
2351+        # In the former case, we can modify the file in place. In the
2352+        # latter case, we need to re-encode the file.
2353+        new_size = data.get_size() + offset
2354+        old_size = self.get_size()
2355+        segment_size = self._version[3]
2356+        num_old_segments = mathutil.div_ceil(old_size,
2357+                                             segment_size)
2358+        num_new_segments = mathutil.div_ceil(new_size,
2359+                                             segment_size)
2360+        log.msg("got %d old segments, %d new segments" % \
2361+                        (num_old_segments, num_new_segments))
2362+
2363+        # We also do a whole file re-encode if the file is an SDMF file.
2364+        if self._version[2]: # version[2] == SDMF salt, which MDMF lacks
2365+            log.msg("doing re-encode instead of in-place update")
2366+            return self._do_modify_update(data, offset)
2367+
2368+        log.msg("updating in place")
2369+        d = self._do_update_update(data, offset)
2370+        d.addCallback(self._decode_and_decrypt_segments, data, offset)
2371+        d.addCallback(self._build_uploadable_and_finish, data, offset)
2372+        return d
2373+
2374+
2375+    def _do_modify_update(self, data, offset):
2376+        """
2377+        I perform a file update by modifying the contents of the file
2378+        after downloading it, then reuploading it. I am less efficient
2379+        than _do_update_update, but am necessary for certain updates.
2380+        """
2381+        def m(old, servermap, first_time):
2382+            start = offset
2383+            rest = offset + data.get_size()
2384+            new = old[:start]
2385+            new += "".join(data.read(data.get_size()))
2386+            new += old[rest:]
2387+            return new
2388+        return self._modify(m, None)
2389+
2390+
2391+    def _do_update_update(self, data, offset):
2392+        """
2393+        I start the Servermap update that gets us the data we need to
2394+        continue the update process. I return a Deferred that fires when
2395+        the servermap update is done.
2396+        """
2397+        assert IMutableUploadable.providedBy(data)
2398+        assert self.is_mutable()
2399+        # offset == self.get_size() is valid and means that we are
2400+        # appending data to the file.
2401+        assert offset <= self.get_size()
2402+
2403+        # We'll need the segment that the data starts in, regardless of
2404+        # what we'll do later.
2405+        start_segment = mathutil.div_ceil(offset, DEFAULT_MAX_SEGMENT_SIZE)
2406+        start_segment -= 1
2407+
2408+        # We only need the end segment if the data we append does not go
2409+        # beyond the current end-of-file.
2410+        end_segment = start_segment
2411+        if offset + data.get_size() < self.get_size():
2412+            end_data = offset + data.get_size()
2413+            end_segment = mathutil.div_ceil(end_data, DEFAULT_MAX_SEGMENT_SIZE)
2414+            end_segment -= 1
2415+        self._start_segment = start_segment
2416+        self._end_segment = end_segment
2417+
2418+        # Now ask for the servermap to be updated in MODE_WRITE with
2419+        # this update range.
2420+        u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
2421+                             self._servermap,
2422+                             mode=MODE_WRITE,
2423+                             update_range=(start_segment, end_segment))
2424+        return u.update()
2425+
2426+
2427+    def _decode_and_decrypt_segments(self, ignored, data, offset):
2428+        """
2429+        After the servermap update, I take the encrypted and encoded
2430+        data that the servermap fetched while doing its update and
2431+        transform it into decoded-and-decrypted plaintext that can be
2432+        used by the new uploadable. I return a Deferred that fires with
2433+        the segments.
2434+        """
2435+        r = Retrieve(self._node, self._servermap, self._version)
2436+        # decode: takes in our blocks and salts from the servermap,
2437+        # returns a Deferred that fires with the corresponding plaintext
2438+        # segments. Does not download -- simply takes advantage of
2439+        # existing infrastructure within the Retrieve class to avoid
2440+        # duplicating code.
2441+        sm = self._servermap
2442+        # XXX: If the methods in the servermap don't work as
2443+        # abstractions, you should rewrite them instead of going around
2444+        # them.
2445+        update_data = sm.update_data
2446+        start_segments = {} # shnum -> start segment
2447+        end_segments = {} # shnum -> end segment
2448+        blockhashes = {} # shnum -> blockhash tree
2449+        for (shnum, data) in update_data.iteritems():
2450+            data = [d[1] for d in data if d[0] == self._version]
2451+
2452+            # Every entry in our list should now be the update data
2453+            # for share shnum under a particular version of the mutable
2454+            # file, so all of the entries should be identical.
2455+            datum = data[0]
2456+            assert filter(lambda x: x != datum, data) == []
2457+
2458+            blockhashes[shnum] = datum[0]
2459+            start_segments[shnum] = datum[1]
2460+            end_segments[shnum] = datum[2]
2461+
2462+        d1 = r.decode(start_segments, self._start_segment)
2463+        d2 = r.decode(end_segments, self._end_segment)
2464+        d3 = defer.succeed(blockhashes)
2465+        return deferredutil.gatherResults([d1, d2, d3])
2466+
2467+
2468+    def _build_uploadable_and_finish(self, segments_and_bht, data, offset):
2469+        """
2470+        After the process has the plaintext segments, I build the
2471+        TransformingUploadable that the publisher will eventually
2472+        re-upload to the grid. I then invoke the publisher with that
2473+        uploadable, and return a Deferred that fires when the publish
2474+        operation has completed without issue.
2475+        """
2476+        u = TransformingUploadable(data, offset,
2477+                                   self._version[3],
2478+                                   segments_and_bht[0],
2479+                                   segments_and_bht[1])
2480+        p = Publish(self._node, self._storage_broker, self._servermap)
2481+        return p.update(u, offset, segments_and_bht[2], self._version)
2482}
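To make the segment-range arithmetic in _do_update_update above concrete, here is a small worked example using the same formulas; the div_ceil helper merely restates allmydata.util.mathutil.div_ceil, and the numbers are illustrative rather than taken from the patch:

    def div_ceil(n, d):
        return (n + d - 1) // d                  # ceiling division

    SEGSIZE = 128 * 1024                         # DEFAULT_MAX_SEGMENT_SIZE
    offset, length, filesize = 200000, 10, 400000

    start_segment = div_ceil(offset, SEGSIZE) - 1   # 1: byte 200000 falls in the second segment
    end_segment = start_segment
    if offset + length < filesize:
        # the write ends before the current end-of-file, so the far
        # boundary segment is needed too (also segment 1 here)
        end_segment = div_ceil(offset + length, SEGSIZE) - 1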
2483[mutable/publish.py: Modify the publish process to support MDMF
2484Kevan Carstensen <kevan@isnotajoke.com>**20100819003342
2485 Ignore-this: 2bb379974927e2e20cff75bae8302d1d
2486 
2487 The inner workings of the publishing process needed to be reworked to a
2488 large extent to cope with segmented mutable files, and to cope with
2489 partial-file updates of mutable files. This patch does that. It also
2490 introduces wrappers for uploadable data, allowing the use of
2491 filehandle-like objects as data sources, in addition to strings. This
2492 reduces memory inefficiency when dealing with large files through the
2493 webapi, and clarifies update code there.
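The publish code below relies on just two methods of these uploadable wrappers: get_size() and read(length), where read() returns a list of strings that the caller joins. A minimal sketch of a filehandle-backed wrapper written against only that contract (the class name and details are illustrative assumptions, not the wrapper this patch adds; a real implementation would also declare IMutableUploadable):

    class FileHandleUploadable:
        """Wrap a seekable filehandle so the publisher can read it in chunks."""
        def __init__(self, filehandle):
            self._f = filehandle
            self._f.seek(0, 2)          # seek to the end to learn the size
            self._size = self._f.tell()
            self._f.seek(0)

        def get_size(self):
            return self._size

        def read(self, length):
            # a list of strings, matching the "".join(data.read(...)) idiom below
            return [self._f.read(length)]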
2494] {
2495hunk ./src/allmydata/mutable/publish.py 3
2496 
2497 
2498-import os, struct, time
2499+import os, time
2500+from StringIO import StringIO
2501 from itertools import count
2502 from zope.interface import implements
2503 from twisted.internet import defer
2504hunk ./src/allmydata/mutable/publish.py 9
2505 from twisted.python import failure
2506-from allmydata.interfaces import IPublishStatus
2507+from allmydata.interfaces import IPublishStatus, SDMF_VERSION, MDMF_VERSION, \
2508+                                 IMutableUploadable
2509 from allmydata.util import base32, hashutil, mathutil, idlib, log
2510 from allmydata.util.dictutil import DictOfSets
2511 from allmydata import hashtree, codec
2512hunk ./src/allmydata/mutable/publish.py 21
2513 from allmydata.mutable.common import MODE_WRITE, MODE_CHECK, \
2514      UncoordinatedWriteError, NotEnoughServersError
2515 from allmydata.mutable.servermap import ServerMap
2516-from allmydata.mutable.layout import pack_prefix, pack_share, unpack_header, pack_checkstring, \
2517-     unpack_checkstring, SIGNED_PREFIX
2518+from allmydata.mutable.layout import unpack_checkstring, MDMFSlotWriteProxy, \
2519+                                     SDMFSlotWriteProxy
2520+
2521+KiB = 1024
2522+DEFAULT_MAX_SEGMENT_SIZE = 128 * KiB
2523+PUSHING_BLOCKS_STATE = 0
2524+PUSHING_EVERYTHING_ELSE_STATE = 1
2525+DONE_STATE = 2
2526 
2527 class PublishStatus:
2528     implements(IPublishStatus)
2529hunk ./src/allmydata/mutable/publish.py 118
2530         self._status.set_helper(False)
2531         self._status.set_progress(0.0)
2532         self._status.set_active(True)
2533+        self._version = self._node.get_version()
2534+        assert self._version in (SDMF_VERSION, MDMF_VERSION)
2535+
2536 
2537     def get_status(self):
2538         return self._status
2539hunk ./src/allmydata/mutable/publish.py 132
2540             kwargs["facility"] = "tahoe.mutable.publish"
2541         return log.msg(*args, **kwargs)
2542 
2543+
2544+    def update(self, data, offset, blockhashes, version):
2545+        """
2546+        I replace the contents of this file with the contents of data,
2547+        starting at offset. I return a Deferred that fires with None
2548+        when the replacement has been completed, or with an error if
2549+        something went wrong during the process.
2550+
2551+        Note that this process will not upload new shares. If the file
2552+        being updated is in need of repair, callers will have to repair
2553+        it on their own.
2554+        """
2555+        # How this works:
2556+        # 1: Make peer assignments. We'll assign each share that we know
2557+        # about on the grid to the peer that currently holds that
2558+        # share, and will not place any new shares.
2559+        # 2: Setup encoding parameters. Most of these will stay the same
2560+        # -- datalength will change, as will some of the offsets.
2561+        # 3. Upload the new segments.
2562+        # 4. Be done.
2563+        assert IMutableUploadable.providedBy(data)
2564+
2565+        self.data = data
2566+
2567+        # XXX: Use the MutableFileVersion instead.
2568+        self.datalength = self._node.get_size()
2569+        if data.get_size() > self.datalength:
2570+            self.datalength = data.get_size()
2571+
2572+        self.log("starting update")
2573+        self.log("adding new data of length %d at offset %d" % \
2574+                    (data.get_size(), offset))
2575+        self.log("new data length is %d" % self.datalength)
2576+        self._status.set_size(self.datalength)
2577+        self._status.set_status("Started")
2578+        self._started = time.time()
2579+
2580+        self.done_deferred = defer.Deferred()
2581+
2582+        self._writekey = self._node.get_writekey()
2583+        assert self._writekey, "need write capability to publish"
2584+
2585+        # first, which servers will we publish to? We require that the
2586+        # servermap was updated in MODE_WRITE, so we can depend upon the
2587+        # peerlist computed by that process instead of computing our own.
2588+        assert self._servermap
2589+        assert self._servermap.last_update_mode in (MODE_WRITE, MODE_CHECK)
2590+        # we will push a version that is one larger than anything present
2591+        # in the grid, according to the servermap.
2592+        self._new_seqnum = self._servermap.highest_seqnum() + 1
2593+        self._status.set_servermap(self._servermap)
2594+
2595+        self.log(format="new seqnum will be %(seqnum)d",
2596+                 seqnum=self._new_seqnum, level=log.NOISY)
2597+
2598+        # We're updating an existing file, so all of the following
2599+        # should be available.
2600+        self.readkey = self._node.get_readkey()
2601+        self.required_shares = self._node.get_required_shares()
2602+        assert self.required_shares is not None
2603+        self.total_shares = self._node.get_total_shares()
2604+        assert self.total_shares is not None
2605+        self._status.set_encoding(self.required_shares, self.total_shares)
2606+
2607+        self._pubkey = self._node.get_pubkey()
2608+        assert self._pubkey
2609+        self._privkey = self._node.get_privkey()
2610+        assert self._privkey
2611+        self._encprivkey = self._node.get_encprivkey()
2612+
2613+        sb = self._storage_broker
2614+        full_peerlist = sb.get_servers_for_index(self._storage_index)
2615+        self.full_peerlist = full_peerlist # for use later, immutable
2616+        self.bad_peers = set() # peerids who have errbacked/refused requests
2617+
2618+        # This will set self.segment_size, self.num_segments, and
2619+        # self.fec. TODO: Does it know how to do the offset? Probably
2620+        # not. So do that part next.
2621+        self.setup_encoding_parameters(offset=offset)
2622+
2623+        # if we experience any surprises (writes which were rejected because
2624+        # our test vector did not match, or shares which we didn't expect to
2625+        # see), we set this flag and report an UncoordinatedWriteError at the
2626+        # end of the publish process.
2627+        self.surprised = False
2628+
2629+        # we keep track of three tables. The first is our goal: which share
2630+        # we want to see on which servers. This is initially populated by the
2631+        # existing servermap.
2632+        self.goal = set() # pairs of (peerid, shnum) tuples
2633+
2634+        # the second table is our list of outstanding queries: those which
2635+        # are in flight and may or may not be delivered, accepted, or
2636+        # acknowledged. Items are added to this table when the request is
2637+        # sent, and removed when the response returns (or errbacks).
2638+        self.outstanding = set() # (peerid, shnum) tuples
2639+
2640+        # the third is a table of successes: share which have actually been
2641+        # placed. These are populated when responses come back with success.
2642+        # When self.placed == self.goal, we're done.
2643+        self.placed = set() # (peerid, shnum) tuples
2644+
2645+        # we also keep a mapping from peerid to RemoteReference. Each time we
2646+        # pull a connection out of the full peerlist, we add it to this for
2647+        # use later.
2648+        self.connections = {}
2649+
2650+        self.bad_share_checkstrings = {}
2651+
2652+        # This is set at the last step of the publishing process.
2653+        self.versioninfo = ""
2654+
2655+        # we use the servermap to populate the initial goal: this way we will
2656+        # try to update each existing share in place. Since we're
2657+        # updating, we ignore damaged and missing shares -- callers must
2658+        # run a repair to recreate these.
2659+        for (peerid, shnum) in self._servermap.servermap:
2660+            self.goal.add( (peerid, shnum) )
2661+            self.connections[peerid] = self._servermap.connections[peerid]
2662+        self.writers = {}
2663+
2664+        # SDMF files are updated by re-encoding, so an in-place update is always MDMF.
2665+        self._version = MDMF_VERSION
2666+        writer_class = MDMFSlotWriteProxy
2667+
2668+        # For each (peerid, shnum) in self.goal, we make a
2669+        # write proxy for that peer. We'll use this to write
2670+        # shares to the peer.
2671+        for key in self.goal:
2672+            peerid, shnum = key
2673+            write_enabler = self._node.get_write_enabler(peerid)
2674+            renew_secret = self._node.get_renewal_secret(peerid)
2675+            cancel_secret = self._node.get_cancel_secret(peerid)
2676+            secrets = (write_enabler, renew_secret, cancel_secret)
2677+
2678+            self.writers[shnum] =  writer_class(shnum,
2679+                                                self.connections[peerid],
2680+                                                self._storage_index,
2681+                                                secrets,
2682+                                                self._new_seqnum,
2683+                                                self.required_shares,
2684+                                                self.total_shares,
2685+                                                self.segment_size,
2686+                                                self.datalength)
2687+            self.writers[shnum].peerid = peerid
2688+            assert (peerid, shnum) in self._servermap.servermap
2689+            old_versionid, old_timestamp = self._servermap.servermap[key]
2690+            (old_seqnum, old_root_hash, old_salt, old_segsize,
2691+             old_datalength, old_k, old_N, old_prefix,
2692+             old_offsets_tuple) = old_versionid
2693+            self.writers[shnum].set_checkstring(old_seqnum,
2694+                                                old_root_hash,
2695+                                                old_salt)
2696+
2697+        # Our remote shares will not have a complete checkstring until
2698+        # after we are done writing share data and have started to write
2699+        # blocks. In the meantime, we need to know what to look for when
2700+        # writing, so that we can detect UncoordinatedWriteErrors.
2701+        self._checkstring = self.writers.values()[0].get_checkstring()
2702+
2703+        # Now, we start pushing shares.
2704+        self._status.timings["setup"] = time.time() - self._started
2705+        # First, we encrypt, encode, and publish the shares that we need
2706+        # to encrypt, encode, and publish.
2707+
2708+        # Our update process fetched these for us. We need to update
2709+        # them in place as publishing happens.
2710+        self.blockhashes = {} # shnum -> [blockhashes]
2711+        for (i, bht) in blockhashes.iteritems():
2712+            # We need to extract the leaves from our old hash tree.
2713+            old_segcount = mathutil.div_ceil(version[4],
2714+                                             version[3])
2715+            h = hashtree.IncompleteHashTree(old_segcount)
2716+            bht = dict(enumerate(bht))
2717+            h.set_hashes(bht)
2718+            leaves = h[h.get_leaf_index(0):]
2719+            for j in xrange(self.num_segments - len(leaves)):
2720+                leaves.append(None)
2721+
2722+            assert len(leaves) >= self.num_segments
2723+            self.blockhashes[i] = leaves
2724+            # This list will now be the leaves that were set during the
2725+            # initial upload + enough empty hashes to make it a
2726+            # power-of-two. If we exceed a power of two boundary, we
2727+            # should be encoding the file over again, and should not be
2728+            # here. So, we have
2729+            #assert len(self.blockhashes[i]) == \
2730+            #    hashtree.roundup_pow2(self.num_segments), \
2731+            #        len(self.blockhashes[i])
2732+            # XXX: Except this doesn't work. Figure out why.
2733+
2734+        # These are filled in later, after we've modified the block hash
2735+        # tree suitably.
2736+        self.sharehash_leaves = None # eventually [sharehashes]
2737+        self.sharehashes = {} # shnum -> [sharehash leaves necessary to
2738+                              # validate the share]
2739+
2740+        self.log("Starting push")
2741+
2742+        self._state = PUSHING_BLOCKS_STATE
2743+        self._push()
2744+
2745+        return self.done_deferred
2746+
2747+
2748     def publish(self, newdata):
2749         """Publish the filenode's current contents.  Returns a Deferred that
2750         fires (with None) when the publish has done as much work as it's ever
2751hunk ./src/allmydata/mutable/publish.py 344
2752         simultaneous write.
2753         """
2754 
2755-        # 1: generate shares (SDMF: files are small, so we can do it in RAM)
2756-        # 2: perform peer selection, get candidate servers
2757-        #  2a: send queries to n+epsilon servers, to determine current shares
2758-        #  2b: based upon responses, create target map
2759-        # 3: send slot_testv_and_readv_and_writev messages
2760-        # 4: as responses return, update share-dispatch table
2761-        # 4a: may need to run recovery algorithm
2762-        # 5: when enough responses are back, we're done
2763+        # 0. Setup encoding parameters, encoder, and other such things.
2764+        # 1. Encrypt, encode, and publish segments.
2765+        assert IMutableUploadable.providedBy(newdata)
2766 
2767hunk ./src/allmydata/mutable/publish.py 348
2768-        self.log("starting publish, datalen is %s" % len(newdata))
2769-        self._status.set_size(len(newdata))
2770+        self.data = newdata
2771+        self.datalength = newdata.get_size()
2772+        #if self.datalength >= DEFAULT_MAX_SEGMENT_SIZE:
2773+        #    self._version = MDMF_VERSION
2774+        #else:
2775+        #    self._version = SDMF_VERSION
2776+
2777+        self.log("starting publish, datalen is %s" % self.datalength)
2778+        self._status.set_size(self.datalength)
2779         self._status.set_status("Started")
2780         self._started = time.time()
2781 
2782hunk ./src/allmydata/mutable/publish.py 405
2783         self.full_peerlist = full_peerlist # for use later, immutable
2784         self.bad_peers = set() # peerids who have errbacked/refused requests
2785 
2786-        self.newdata = newdata
2787-        self.salt = os.urandom(16)
2788-
2789+        # This will set self.segment_size, self.num_segments, and
2790+        # self.fec.
2791         self.setup_encoding_parameters()
2792 
2793         # if we experience any surprises (writes which were rejected because
2794hunk ./src/allmydata/mutable/publish.py 415
2795         # end of the publish process.
2796         self.surprised = False
2797 
2798-        # as a failsafe, refuse to iterate through self.loop more than a
2799-        # thousand times.
2800-        self.looplimit = 1000
2801-
2802         # we keep track of three tables. The first is our goal: which share
2803         # we want to see on which servers. This is initially populated by the
2804         # existing servermap.
2805hunk ./src/allmydata/mutable/publish.py 438
2806 
2807         self.bad_share_checkstrings = {}
2808 
2809+        # This is set at the last step of the publishing process.
2810+        self.versioninfo = ""
2811+
2812         # we use the servermap to populate the initial goal: this way we will
2813         # try to update each existing share in place.
2814         for (peerid, shnum) in self._servermap.servermap:
2815hunk ./src/allmydata/mutable/publish.py 454
2816             self.bad_share_checkstrings[key] = old_checkstring
2817             self.connections[peerid] = self._servermap.connections[peerid]
2818 
2819-        # create the shares. We'll discard these as they are delivered. SDMF:
2820-        # we're allowed to hold everything in memory.
2821+        # TODO: Make this part do peer selection.
2822+        self.update_goal()
2823+        self.writers = {}
2824+        if self._version == MDMF_VERSION:
2825+            writer_class = MDMFSlotWriteProxy
2826+        else:
2827+            writer_class = SDMFSlotWriteProxy
2828 
2829hunk ./src/allmydata/mutable/publish.py 462
2830+        # For each (peerid, shnum) in self.goal, we make a
2831+        # write proxy for that peer. We'll use this to write
2832+        # shares to the peer.
2833+        for key in self.goal:
2834+            peerid, shnum = key
2835+            write_enabler = self._node.get_write_enabler(peerid)
2836+            renew_secret = self._node.get_renewal_secret(peerid)
2837+            cancel_secret = self._node.get_cancel_secret(peerid)
2838+            secrets = (write_enabler, renew_secret, cancel_secret)
2839+
2840+            self.writers[shnum] =  writer_class(shnum,
2841+                                                self.connections[peerid],
2842+                                                self._storage_index,
2843+                                                secrets,
2844+                                                self._new_seqnum,
2845+                                                self.required_shares,
2846+                                                self.total_shares,
2847+                                                self.segment_size,
2848+                                                self.datalength)
2849+            self.writers[shnum].peerid = peerid
2850+            if (peerid, shnum) in self._servermap.servermap:
2851+                old_versionid, old_timestamp = self._servermap.servermap[key]
2852+                (old_seqnum, old_root_hash, old_salt, old_segsize,
2853+                 old_datalength, old_k, old_N, old_prefix,
2854+                 old_offsets_tuple) = old_versionid
2855+                self.writers[shnum].set_checkstring(old_seqnum,
2856+                                                    old_root_hash,
2857+                                                    old_salt)
2858+            elif (peerid, shnum) in self.bad_share_checkstrings:
2859+                old_checkstring = self.bad_share_checkstrings[(peerid, shnum)]
2860+                self.writers[shnum].set_checkstring(old_checkstring)
2861+
2862+        # Our remote shares will not have a complete checkstring until
2863+        # after we are done writing share data and have started to write
2864+        # blocks. In the meantime, we need to know what to look for when
2865+        # writing, so that we can detect UncoordinatedWriteErrors.
2866+        self._checkstring = self.writers.values()[0].get_checkstring()
2867+
2868+        # Now, we start pushing shares.
2869         self._status.timings["setup"] = time.time() - self._started
2870hunk ./src/allmydata/mutable/publish.py 502
2871-        d = self._encrypt_and_encode()
2872-        d.addCallback(self._generate_shares)
2873-        def _start_pushing(res):
2874-            self._started_pushing = time.time()
2875-            return res
2876-        d.addCallback(_start_pushing)
2877-        d.addCallback(self.loop) # trigger delivery
2878-        d.addErrback(self._fatal_error)
2879+        # First, we encrypt, encode, and publish the shares that we need
2880+        # to encrypt, encode, and publish.
2881+
2882+        # This will eventually hold the block hash chain for each share
2883+        # that we publish. We define it this way so that empty publishes
2884+        # will still have something to write to the remote slot.
2885+        self.blockhashes = dict([(i, []) for i in xrange(self.total_shares)])
2886+        for i in xrange(self.total_shares):
2887+            blocks = self.blockhashes[i]
2888+            for j in xrange(self.num_segments):
2889+                blocks.append(None)
2890+        self.sharehash_leaves = None # eventually [sharehashes]
2891+        self.sharehashes = {} # shnum -> [sharehash leaves necessary to
2892+                              # validate the share]
2893+
2894+        self.log("Starting push")
2895+
2896+        self._state = PUSHING_BLOCKS_STATE
2897+        self._push()
2898 
2899         return self.done_deferred
2900 
2901hunk ./src/allmydata/mutable/publish.py 524
2902-    def setup_encoding_parameters(self):
2903-        segment_size = len(self.newdata)
2904+
2905+    def _update_status(self):
2906+        self._status.set_status("Sending Shares: %d placed out of %d, "
2907+                                "%d messages outstanding" %
2908+                                (len(self.placed),
2909+                                 len(self.goal),
2910+                                 len(self.outstanding)))
2911+        self._status.set_progress(1.0 * len(self.placed) / len(self.goal))
2912+
2913+
2914+    def setup_encoding_parameters(self, offset=0):
2915+        if self._version == MDMF_VERSION:
2916+            segment_size = DEFAULT_MAX_SEGMENT_SIZE # 128 KiB by default
2917+        else:
2918+            segment_size = self.datalength # SDMF is only one segment
2919         # this must be a multiple of self.required_shares
2920         segment_size = mathutil.next_multiple(segment_size,
2921                                               self.required_shares)
2922hunk ./src/allmydata/mutable/publish.py 543
2923         self.segment_size = segment_size
2924+
2925+        # Calculate the starting segment for the upload.
2926         if segment_size:
2927hunk ./src/allmydata/mutable/publish.py 546
2928-            self.num_segments = mathutil.div_ceil(len(self.newdata),
2929+            self.num_segments = mathutil.div_ceil(self.datalength,
2930                                                   segment_size)
2931hunk ./src/allmydata/mutable/publish.py 548
2932+            self.starting_segment = mathutil.div_ceil(offset,
2933+                                                      segment_size)
2934+            self.starting_segment -= 1
2935+            if offset == 0:
2936+                self.starting_segment = 0
2937+
2938         else:
2939             self.num_segments = 0
2940hunk ./src/allmydata/mutable/publish.py 556
2941-        assert self.num_segments in [0, 1,] # SDMF restrictions
2942+            self.starting_segment = 0
2943+
2944+
2945+        self.log("building encoding parameters for file")
2946+        self.log("got segsize %d" % self.segment_size)
2947+        self.log("got %d segments" % self.num_segments)
2948+
2949+        if self._version == SDMF_VERSION:
2950+            assert self.num_segments in (0, 1) # SDMF
2951+        # calculate the tail segment size.
2952+
2953+        if segment_size and self.datalength:
2954+            self.tail_segment_size = self.datalength % segment_size
2955+            self.log("got tail segment size %d" % self.tail_segment_size)
2956+        else:
2957+            self.tail_segment_size = 0
2958+
2959+        if self.tail_segment_size == 0 and segment_size:
2960+            # The tail segment is the same size as the other segments.
2961+            self.tail_segment_size = segment_size
2962+
2963+        # Make FEC encoders
2964+        fec = codec.CRSEncoder()
2965+        fec.set_params(self.segment_size,
2966+                       self.required_shares, self.total_shares)
2967+        self.piece_size = fec.get_block_size()
2968+        self.fec = fec
2969+
2970+        if self.tail_segment_size == self.segment_size:
2971+            self.tail_fec = self.fec
2972+        else:
2973+            tail_fec = codec.CRSEncoder()
2974+            tail_fec.set_params(self.tail_segment_size,
2975+                                self.required_shares,
2976+                                self.total_shares)
2977+            self.tail_fec = tail_fec
2978+
2979+        self._current_segment = self.starting_segment
2980+        self.end_segment = self.num_segments - 1
2981+        # Now figure out where the last segment should be.
2982+        if self.data.get_size() != self.datalength:
2983+            end = self.data.get_size()
2984+            self.end_segment = mathutil.div_ceil(end,
2985+                                                 segment_size)
2986+            self.end_segment -= 1
2987+        self.log("got start segment %d" % self.starting_segment)
2988+        self.log("got end segment %d" % self.end_segment)
2989+
2990+
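As a sanity check on setup_encoding_parameters above, here is the same arithmetic carried out for an illustrative 1,000,000-byte MDMF file with k = 3 required shares; the helpers restate allmydata.util.mathutil, and none of these numbers come from the patch:

    def div_ceil(n, d):
        return (n + d - 1) // d          # ceiling division

    def next_multiple(n, k):
        return div_ceil(n, k) * k        # smallest multiple of k that is >= n

    k = 3
    datalength = 1000000
    segment_size = next_multiple(128 * 1024, k)         # 131073, now a multiple of k
    num_segments = div_ceil(datalength, segment_size)   # 8
    tail_segment_size = datalength % segment_size       # 82489: the short final segment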
2991+    def _push(self, ignored=None):
2992+        """
2993+        I manage state transitions. In particular, I check that we
2994+        still have enough writers to complete the upload
2995+        successfully.
2996+        """
2997+        # Can we still successfully publish this file?
2998+        # TODO: Keep track of outstanding queries before aborting the
2999+        #       process.
3000+        if len(self.writers) <= self.required_shares or self.surprised:
3001+            return self._failure()
3002+
3003+        # Figure out what we need to do next. Each of these needs to
3004+        # return a deferred so that we don't block execution when this
3005+        # is first called in the upload method.
3006+        if self._state == PUSHING_BLOCKS_STATE:
3007+            return self.push_segment(self._current_segment)
3008+
3009+        elif self._state == PUSHING_EVERYTHING_ELSE_STATE:
3010+            return self.push_everything_else()
3011+
3012+        # If we make it to this point, we were successful in placing the
3013+        # file.
3014+        return self._done(None)
3015+
3016+
3017+    def push_segment(self, segnum):
3018+        if self.num_segments == 0 and self._version == SDMF_VERSION:
3019+            self._add_dummy_salts()
3020 
3021hunk ./src/allmydata/mutable/publish.py 635
3022-    def _fatal_error(self, f):
3023-        self.log("error during loop", failure=f, level=log.UNUSUAL)
3024-        self._done(f)
3025+        if segnum > self.end_segment:
3026+            # We don't have any more segments to push.
3027+            self._state = PUSHING_EVERYTHING_ELSE_STATE
3028+            return self._push()
3029+
3030+        d = self._encode_segment(segnum)
3031+        d.addCallback(self._push_segment, segnum)
3032+        def _increment_segnum(ign):
3033+            self._current_segment += 1
3034+        # XXX: I don't think we need to do addBoth here -- any errBacks
3035+        # should be handled within push_segment.
3036+        d.addBoth(_increment_segnum)
3037+        d.addBoth(self._turn_barrier)
3038+        d.addBoth(self._push)
3039+
3040+
3041+    def _turn_barrier(self, result):
3042+        """
3043+        I help the publish process avoid the recursion limit issues
3044+        described in #237.
3045+        """
3046+        return fireEventually(result)
3047+
3048+
3049+    def _add_dummy_salts(self):
3050+        """
3051+        SDMF files need a salt even if they're empty, or the signature
3052+        won't make sense. This method adds a dummy salt to each of our
3053+        SDMF writers so that they can write the signature later.
3054+        """
3055+        salt = os.urandom(16)
3056+        assert self._version == SDMF_VERSION
3057+
3058+        for writer in self.writers.itervalues():
3059+            writer.put_salt(salt)
3060+
3061+
3062+    def _encode_segment(self, segnum):
3063+        """
3064+        I encrypt and encode the segment segnum.
3065+        """
3066+        started = time.time()
3067+
3068+        if segnum + 1 == self.num_segments:
3069+            segsize = self.tail_segment_size
3070+        else:
3071+            segsize = self.segment_size
3072+
3073+
3074+        self.log("Pushing segment %d of %d" % (segnum + 1, self.num_segments))
3075+        data = self.data.read(segsize)
3076+        # XXX: This is dumb. Why return a list?
3077+        data = "".join(data)
3078+
3079+        assert len(data) == segsize, len(data)
3080+
3081+        salt = os.urandom(16)
3082+
3083+        key = hashutil.ssk_readkey_data_hash(salt, self.readkey)
3084+        self._status.set_status("Encrypting")
3085+        enc = AES(key)
3086+        crypttext = enc.process(data)
3087+        assert len(crypttext) == len(data)
3088+
3089+        now = time.time()
3090+        self._status.timings["encrypt"] = now - started
3091+        started = now
3092+
3093+        # now apply FEC
3094+        if segnum + 1 == self.num_segments:
3095+            fec = self.tail_fec
3096+        else:
3097+            fec = self.fec
3098+
3099+        self._status.set_status("Encoding")
3100+        crypttext_pieces = [None] * self.required_shares
3101+        piece_size = fec.get_block_size()
3102+        for i in range(len(crypttext_pieces)):
3103+            offset = i * piece_size
3104+            piece = crypttext[offset:offset+piece_size]
3105+            piece = piece + "\x00"*(piece_size - len(piece)) # padding
3106+            crypttext_pieces[i] = piece
3107+            assert len(piece) == piece_size
3108+        d = fec.encode(crypttext_pieces)
3109+        def _done_encoding(res):
3110+            elapsed = time.time() - started
3111+            self._status.timings["encode"] = elapsed
3112+            return (res, salt)
3113+        d.addCallback(_done_encoding)
3114+        return d
3115+
3116+
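The split-and-pad step in _encode_segment above is easiest to see with toy numbers. In this sketch a 10-byte "segment" and k = 3 give a piece size of 4 and a zero-padded final piece; it assumes, as the padding logic above does, that fec.get_block_size() is the ceiling of the segment size over k:

    crypttext = "0123456789"
    k = 3
    piece_size = (len(crypttext) + k - 1) // k           # 4
    crypttext_pieces = []
    for i in range(k):
        piece = crypttext[i * piece_size:(i + 1) * piece_size]
        piece = piece + "\x00" * (piece_size - len(piece))   # pad the short tail piece
        crypttext_pieces.append(piece)
    assert crypttext_pieces == ["0123", "4567", "89\x00\x00"]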
3117+    def _push_segment(self, encoded_and_salt, segnum):
3118+        """
3119+        I push (data, salt) as segment number segnum.
3120+        """
3121+        results, salt = encoded_and_salt
3122+        shares, shareids = results
3123+        self._status.set_status("Pushing segment")
3124+        for i in xrange(len(shares)):
3125+            sharedata = shares[i]
3126+            shareid = shareids[i]
3127+            if self._version == MDMF_VERSION:
3128+                hashed = salt + sharedata
3129+            else:
3130+                hashed = sharedata
3131+            block_hash = hashutil.block_hash(hashed)
3132+            self.blockhashes[shareid][segnum] = block_hash
3133+            # find the writer for this share
3134+            writer = self.writers[shareid]
3135+            writer.put_block(sharedata, segnum, salt)
3136+
3137+
3138+    def push_everything_else(self):
3139+        """
3140+        I put everything else associated with a share.
3141+        """
3142+        self._pack_started = time.time()
3143+        self.push_encprivkey()
3144+        self.push_blockhashes()
3145+        self.push_sharehashes()
3146+        self.push_toplevel_hashes_and_signature()
3147+        d = self.finish_publishing()
3148+        def _change_state(ignored):
3149+            self._state = DONE_STATE
3150+        d.addCallback(_change_state)
3151+        d.addCallback(self._push)
3152+        return d
3153+
3154+
3155+    def push_encprivkey(self):
3156+        encprivkey = self._encprivkey
3157+        self._status.set_status("Pushing encrypted private key")
3158+        for writer in self.writers.itervalues():
3159+            writer.put_encprivkey(encprivkey)
3160+
3161+
3162+    def push_blockhashes(self):
3163+        self.sharehash_leaves = [None] * len(self.blockhashes)
3164+        self._status.set_status("Building and pushing block hash tree")
3165+        for shnum, blockhashes in self.blockhashes.iteritems():
3166+            t = hashtree.HashTree(blockhashes)
3167+            self.blockhashes[shnum] = list(t)
3168+            # set the leaf for future use.
3169+            self.sharehash_leaves[shnum] = t[0]
3170+
3171+            writer = self.writers[shnum]
3172+            writer.put_blockhashes(self.blockhashes[shnum])
3173+
3174+
3175+    def push_sharehashes(self):
3176+        self._status.set_status("Building and pushing share hash chain")
3177+        share_hash_tree = hashtree.HashTree(self.sharehash_leaves)
3178+        for shnum in xrange(len(self.sharehash_leaves)):
3179+            needed_indices = share_hash_tree.needed_hashes(shnum)
3180+            self.sharehashes[shnum] = dict( [ (i, share_hash_tree[i])
3181+                                             for i in needed_indices] )
3182+            writer = self.writers[shnum]
3183+            writer.put_sharehashes(self.sharehashes[shnum])
3184+        self.root_hash = share_hash_tree[0]
3185+
3186+
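push_blockhashes and push_sharehashes above build a two-level Merkle structure: each share gets a block hash tree over its blocks, and the root of each block tree becomes one leaf of a single share hash tree, whose root is then signed. A condensed sketch of that relationship using the same hashtree and hashutil helpers the patch uses (illustrative only; a real publish builds one block tree per share rather than the single one shown here):

    from allmydata import hashtree
    from allmydata.util import hashutil

    # block hash tree for one share: one leaf per segment
    blockhashes = [hashutil.block_hash("share 0, block %d" % i) for i in range(4)]
    block_tree = hashtree.HashTree(blockhashes)

    # the block tree's root (block_tree[0]) is this share's leaf in the
    # share hash tree
    sharehash_leaves = [block_tree[0]]
    share_hash_tree = hashtree.HashTree(sharehash_leaves)
    root_hash = share_hash_tree[0]
    needed = share_hash_tree.needed_hashes(0)   # hashes needed to verify share 0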
3187+    def push_toplevel_hashes_and_signature(self):
3188+        # We need to do three things here:
3189+        #   - Push the root hash and salt hash
3190+        #   - Get the checkstring of the resulting layout; sign that.
3191+        #   - Push the signature
3192+        self._status.set_status("Pushing root hashes and signature")
3193+        for shnum in xrange(self.total_shares):
3194+            writer = self.writers[shnum]
3195+            writer.put_root_hash(self.root_hash)
3196+        self._update_checkstring()
3197+        self._make_and_place_signature()
3198+
3199+
3200+    def _update_checkstring(self):
3201+        """
3202+        After putting the root hash, MDMF files will have the
3203+        checkstring written to the storage server. This means that we
3204+        can update our copy of the checkstring so we can detect
3205+        uncoordinated writes. SDMF files will have the same checkstring,
3206+        so we need not do anything.
3207+        """
3208+        self._checkstring = self.writers.values()[0].get_checkstring()
3209+
3210+
3211+    def _make_and_place_signature(self):
3212+        """
3213+        I create and place the signature.
3214+        """
3215+        started = time.time()
3216+        self._status.set_status("Signing prefix")
3217+        signable = self.writers[0].get_signable()
3218+        self.signature = self._privkey.sign(signable)
3219+
3220+        for (shnum, writer) in self.writers.iteritems():
3221+            writer.put_signature(self.signature)
3222+        self._status.timings['sign'] = time.time() - started
3223+
3224+
3225+    def finish_publishing(self):
3226+        # We're almost done -- we just need to put the verification key
3227+        # and the offsets
3228+        started = time.time()
3229+        self._status.set_status("Pushing shares")
3230+        self._started_pushing = started
3231+        ds = []
3232+        verification_key = self._pubkey.serialize()
3233+
3234+
3235+        # TODO: Bad, since we remove from this same dict. We need to
3236+        # make a copy, or just use a non-iterated value.
3237+        for (shnum, writer) in self.writers.iteritems():
3238+            writer.put_verification_key(verification_key)
3239+            d = writer.finish_publishing()
3240+            # Add the (peerid, shnum) tuple to our list of outstanding
3241+            # queries. This gets used by _loop if some of our queries
3242+            # fail to place shares.
3243+            self.outstanding.add((writer.peerid, writer.shnum))
3244+            d.addCallback(self._got_write_answer, writer, started)
3245+            d.addErrback(self._connection_problem, writer)
3246+            ds.append(d)
3247+        self._record_verinfo()
3248+        self._status.timings['pack'] = time.time() - started
3249+        return defer.DeferredList(ds)
3250+
3251+
3252+    def _record_verinfo(self):
3253+        self.versioninfo = self.writers.values()[0].get_verinfo()
3254+
3255+
3256+    def _connection_problem(self, f, writer):
3257+        """
3258+        We ran into a connection problem while working with writer, and
3259+        need to deal with that.
3260+        """
3261+        self.log("found problem: %s" % str(f))
3262+        self._last_failure = f
3263+        del(self.writers[writer.shnum])
3264 
3265hunk ./src/allmydata/mutable/publish.py 875
3266-    def _update_status(self):
3267-        self._status.set_status("Sending Shares: %d placed out of %d, "
3268-                                "%d messages outstanding" %
3269-                                (len(self.placed),
3270-                                 len(self.goal),
3271-                                 len(self.outstanding)))
3272-        self._status.set_progress(1.0 * len(self.placed) / len(self.goal))
3273 
3274hunk ./src/allmydata/mutable/publish.py 876
3275-    def loop(self, ignored=None):
3276-        self.log("entering loop", level=log.NOISY)
3277-        if not self._running:
3278-            return
3279-
3280-        self.looplimit -= 1
3281-        if self.looplimit <= 0:
3282-            raise LoopLimitExceededError("loop limit exceeded")
3283-
3284-        if self.surprised:
3285-            # don't send out any new shares, just wait for the outstanding
3286-            # ones to be retired.
3287-            self.log("currently surprised, so don't send any new shares",
3288-                     level=log.NOISY)
3289-        else:
3290-            self.update_goal()
3291-            # how far are we from our goal?
3292-            needed = self.goal - self.placed - self.outstanding
3293-            self._update_status()
3294-
3295-            if needed:
3296-                # we need to send out new shares
3297-                self.log(format="need to send %(needed)d new shares",
3298-                         needed=len(needed), level=log.NOISY)
3299-                self._send_shares(needed)
3300-                return
3301-
3302-        if self.outstanding:
3303-            # queries are still pending, keep waiting
3304-            self.log(format="%(outstanding)d queries still outstanding",
3305-                     outstanding=len(self.outstanding),
3306-                     level=log.NOISY)
3307-            return
3308-
3309-        # no queries outstanding, no placements needed: we're done
3310-        self.log("no queries outstanding, no placements needed: done",
3311-                 level=log.OPERATIONAL)
3312-        now = time.time()
3313-        elapsed = now - self._started_pushing
3314-        self._status.timings["push"] = elapsed
3315-        return self._done(None)
3316-
3317     def log_goal(self, goal, message=""):
3318         logmsg = [message]
3319         for (shnum, peerid) in sorted([(s,p) for (p,s) in goal]):
3320hunk ./src/allmydata/mutable/publish.py 957
3321             self.log_goal(self.goal, "after update: ")
3322 
3323 
3324+    def _got_write_answer(self, answer, writer, started):
3325+        if not answer:
3326+            # SDMF writers only pretend to write when callers set their
3327+            # blocks, salts, and so on -- they actually just write once,
3328+            # at the end of the upload process. In fake writes, they
3329+            # return defer.succeed(None). If we see that, we shouldn't
3330+            # bother checking it.
3331+            return
3332 
3333hunk ./src/allmydata/mutable/publish.py 966
3334-    def _encrypt_and_encode(self):
3335-        # this returns a Deferred that fires with a list of (sharedata,
3336-        # sharenum) tuples. TODO: cache the ciphertext, only produce the
3337-        # shares that we care about.
3338-        self.log("_encrypt_and_encode")
3339-
3340-        self._status.set_status("Encrypting")
3341-        started = time.time()
3342-
3343-        key = hashutil.ssk_readkey_data_hash(self.salt, self.readkey)
3344-        enc = AES(key)
3345-        crypttext = enc.process(self.newdata)
3346-        assert len(crypttext) == len(self.newdata)
3347+        peerid = writer.peerid
3348+        lp = self.log("_got_write_answer from %s, share %d" %
3349+                      (idlib.shortnodeid_b2a(peerid), writer.shnum))
3350 
3351         now = time.time()
3352hunk ./src/allmydata/mutable/publish.py 971
3353-        self._status.timings["encrypt"] = now - started
3354-        started = now
3355-
3356-        # now apply FEC
3357-
3358-        self._status.set_status("Encoding")
3359-        fec = codec.CRSEncoder()
3360-        fec.set_params(self.segment_size,
3361-                       self.required_shares, self.total_shares)
3362-        piece_size = fec.get_block_size()
3363-        crypttext_pieces = [None] * self.required_shares
3364-        for i in range(len(crypttext_pieces)):
3365-            offset = i * piece_size
3366-            piece = crypttext[offset:offset+piece_size]
3367-            piece = piece + "\x00"*(piece_size - len(piece)) # padding
3368-            crypttext_pieces[i] = piece
3369-            assert len(piece) == piece_size
3370-
3371-        d = fec.encode(crypttext_pieces)
3372-        def _done_encoding(res):
3373-            elapsed = time.time() - started
3374-            self._status.timings["encode"] = elapsed
3375-            return res
3376-        d.addCallback(_done_encoding)
3377-        return d
3378-
3379-    def _generate_shares(self, shares_and_shareids):
3380-        # this sets self.shares and self.root_hash
3381-        self.log("_generate_shares")
3382-        self._status.set_status("Generating Shares")
3383-        started = time.time()
3384-
3385-        # we should know these by now
3386-        privkey = self._privkey
3387-        encprivkey = self._encprivkey
3388-        pubkey = self._pubkey
3389-
3390-        (shares, share_ids) = shares_and_shareids
3391-
3392-        assert len(shares) == len(share_ids)
3393-        assert len(shares) == self.total_shares
3394-        all_shares = {}
3395-        block_hash_trees = {}
3396-        share_hash_leaves = [None] * len(shares)
3397-        for i in range(len(shares)):
3398-            share_data = shares[i]
3399-            shnum = share_ids[i]
3400-            all_shares[shnum] = share_data
3401-
3402-            # build the block hash tree. SDMF has only one leaf.
3403-            leaves = [hashutil.block_hash(share_data)]
3404-            t = hashtree.HashTree(leaves)
3405-            block_hash_trees[shnum] = list(t)
3406-            share_hash_leaves[shnum] = t[0]
3407-        for leaf in share_hash_leaves:
3408-            assert leaf is not None
3409-        share_hash_tree = hashtree.HashTree(share_hash_leaves)
3410-        share_hash_chain = {}
3411-        for shnum in range(self.total_shares):
3412-            needed_hashes = share_hash_tree.needed_hashes(shnum)
3413-            share_hash_chain[shnum] = dict( [ (i, share_hash_tree[i])
3414-                                              for i in needed_hashes ] )
3415-        root_hash = share_hash_tree[0]
3416-        assert len(root_hash) == 32
3417-        self.log("my new root_hash is %s" % base32.b2a(root_hash))
3418-        self._new_version_info = (self._new_seqnum, root_hash, self.salt)
3419-
3420-        prefix = pack_prefix(self._new_seqnum, root_hash, self.salt,
3421-                             self.required_shares, self.total_shares,
3422-                             self.segment_size, len(self.newdata))
3423-
3424-        # now pack the beginning of the share. All shares are the same up
3425-        # to the signature, then they have divergent share hash chains,
3426-        # then completely different block hash trees + salt + share data,
3427-        # then they all share the same encprivkey at the end. The sizes
3428-        # of everything are the same for all shares.
3429-
3430-        sign_started = time.time()
3431-        signature = privkey.sign(prefix)
3432-        self._status.timings["sign"] = time.time() - sign_started
3433-
3434-        verification_key = pubkey.serialize()
3435-
3436-        final_shares = {}
3437-        for shnum in range(self.total_shares):
3438-            final_share = pack_share(prefix,
3439-                                     verification_key,
3440-                                     signature,
3441-                                     share_hash_chain[shnum],
3442-                                     block_hash_trees[shnum],
3443-                                     all_shares[shnum],
3444-                                     encprivkey)
3445-            final_shares[shnum] = final_share
3446-        elapsed = time.time() - started
3447-        self._status.timings["pack"] = elapsed
3448-        self.shares = final_shares
3449-        self.root_hash = root_hash
3450-
3451-        # we also need to build up the version identifier for what we're
3452-        # pushing. Extract the offsets from one of our shares.
3453-        assert final_shares
3454-        offsets = unpack_header(final_shares.values()[0])[-1]
3455-        offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
3456-        verinfo = (self._new_seqnum, root_hash, self.salt,
3457-                   self.segment_size, len(self.newdata),
3458-                   self.required_shares, self.total_shares,
3459-                   prefix, offsets_tuple)
3460-        self.versioninfo = verinfo
3461-
3462-
3463-
3464-    def _send_shares(self, needed):
3465-        self.log("_send_shares")
3466-
3467-        # we're finally ready to send out our shares. If we encounter any
3468-        # surprises here, it's because somebody else is writing at the same
3469-        # time. (Note: in the future, when we remove the _query_peers() step
3470-        # and instead speculate about [or remember] which shares are where,
3471-        # surprises here are *not* indications of UncoordinatedWriteError,
3472-        # and we'll need to respond to them more gracefully.)
3473-
3474-        # needed is a set of (peerid, shnum) tuples. The first thing we do is
3475-        # organize it by peerid.
3476-
3477-        peermap = DictOfSets()
3478-        for (peerid, shnum) in needed:
3479-            peermap.add(peerid, shnum)
3480-
3481-        # the next thing is to build up a bunch of test vectors. The
3482-        # semantics of Publish are that we perform the operation if the world
3483-        # hasn't changed since the ServerMap was constructed (more or less).
3484-        # For every share we're trying to place, we create a test vector that
3485-        # tests to see if the server*share still corresponds to the
3486-        # map.
3487-
3488-        all_tw_vectors = {} # maps peerid to tw_vectors
3489-        sm = self._servermap.servermap
3490-
3491-        for key in needed:
3492-            (peerid, shnum) = key
3493-
3494-            if key in sm:
3495-                # an old version of that share already exists on the
3496-                # server, according to our servermap. We will create a
3497-                # request that attempts to replace it.
3498-                old_versionid, old_timestamp = sm[key]
3499-                (old_seqnum, old_root_hash, old_salt, old_segsize,
3500-                 old_datalength, old_k, old_N, old_prefix,
3501-                 old_offsets_tuple) = old_versionid
3502-                old_checkstring = pack_checkstring(old_seqnum,
3503-                                                   old_root_hash,
3504-                                                   old_salt)
3505-                testv = (0, len(old_checkstring), "eq", old_checkstring)
3506-
3507-            elif key in self.bad_share_checkstrings:
3508-                old_checkstring = self.bad_share_checkstrings[key]
3509-                testv = (0, len(old_checkstring), "eq", old_checkstring)
3510-
3511-            else:
3512-                # add a testv that requires the share not exist
3513-
3514-                # Unfortunately, foolscap-0.2.5 has a bug in the way inbound
3515-                # constraints are handled. If the same object is referenced
3516-                # multiple times inside the arguments, foolscap emits a
3517-                # 'reference' token instead of a distinct copy of the
3518-                # argument. The bug is that these 'reference' tokens are not
3519-                # accepted by the inbound constraint code. To work around
3520-                # this, we need to prevent python from interning the
3521-                # (constant) tuple, by creating a new copy of this vector
3522-                # each time.
3523-
3524-                # This bug is fixed in foolscap-0.2.6, and even though this
3525-                # version of Tahoe requires foolscap-0.3.1 or newer, we are
3526-                # supposed to be able to interoperate with older versions of
3527-                # Tahoe which are allowed to use older versions of foolscap,
3528-                # including foolscap-0.2.5 . In addition, I've seen other
3529-                # foolscap problems triggered by 'reference' tokens (see #541
3530-                # for details). So we must keep this workaround in place.
3531-
3532-                #testv = (0, 1, 'eq', "")
3533-                testv = tuple([0, 1, 'eq', ""])
3534-
3535-            testvs = [testv]
3536-            # the write vector is simply the share
3537-            writev = [(0, self.shares[shnum])]
3538-
3539-            if peerid not in all_tw_vectors:
3540-                all_tw_vectors[peerid] = {}
3541-                # maps shnum to (testvs, writevs, new_length)
3542-            assert shnum not in all_tw_vectors[peerid]
3543-
3544-            all_tw_vectors[peerid][shnum] = (testvs, writev, None)
3545-
3546-        # we read the checkstring back from each share, however we only use
3547-        # it to detect whether there was a new share that we didn't know
3548-        # about. The success or failure of the write will tell us whether
3549-        # there was a collision or not. If there is a collision, the first
3550-        # thing we'll do is update the servermap, which will find out what
3551-        # happened. We could conceivably reduce a roundtrip by using the
3552-        # readv checkstring to populate the servermap, but really we'd have
3553-        # to read enough data to validate the signatures too, so it wouldn't
3554-        # be an overall win.
3555-        read_vector = [(0, struct.calcsize(SIGNED_PREFIX))]
3556-
3557-        # ok, send the messages!
3558-        self.log("sending %d shares" % len(all_tw_vectors), level=log.NOISY)
3559-        started = time.time()
3560-        for (peerid, tw_vectors) in all_tw_vectors.items():
3561-
3562-            write_enabler = self._node.get_write_enabler(peerid)
3563-            renew_secret = self._node.get_renewal_secret(peerid)
3564-            cancel_secret = self._node.get_cancel_secret(peerid)
3565-            secrets = (write_enabler, renew_secret, cancel_secret)
3566-            shnums = tw_vectors.keys()
3567-
3568-            for shnum in shnums:
3569-                self.outstanding.add( (peerid, shnum) )
3570+        elapsed = now - started
3571 
3572hunk ./src/allmydata/mutable/publish.py 973
3573-            d = self._do_testreadwrite(peerid, secrets,
3574-                                       tw_vectors, read_vector)
3575-            d.addCallbacks(self._got_write_answer, self._got_write_error,
3576-                           callbackArgs=(peerid, shnums, started),
3577-                           errbackArgs=(peerid, shnums, started))
3578-            # tolerate immediate errback, like with DeadReferenceError
3579-            d.addBoth(fireEventually)
3580-            d.addCallback(self.loop)
3581-            d.addErrback(self._fatal_error)
3582+        self._status.add_per_server_time(peerid, elapsed)
3583 
3584hunk ./src/allmydata/mutable/publish.py 975
3585-        self._update_status()
3586-        self.log("%d shares sent" % len(all_tw_vectors), level=log.NOISY)
3587+        wrote, read_data = answer
3588 
3589hunk ./src/allmydata/mutable/publish.py 977
3590-    def _do_testreadwrite(self, peerid, secrets,
3591-                          tw_vectors, read_vector):
3592-        storage_index = self._storage_index
3593-        ss = self.connections[peerid]
3594+        surprise_shares = set(read_data.keys()) - set([writer.shnum])
3595 
3596hunk ./src/allmydata/mutable/publish.py 979
3597-        #print "SS[%s] is %s" % (idlib.shortnodeid_b2a(peerid), ss), ss.tracker.interfaceName
3598-        d = ss.callRemote("slot_testv_and_readv_and_writev",
3599-                          storage_index,
3600-                          secrets,
3601-                          tw_vectors,
3602-                          read_vector)
3603-        return d
3604+        # We need to remove from surprise_shares any shares that we are
3605+        # knowingly also writing to that peer from other writers.
3606 
3607hunk ./src/allmydata/mutable/publish.py 982
3608-    def _got_write_answer(self, answer, peerid, shnums, started):
3609-        lp = self.log("_got_write_answer from %s" %
3610-                      idlib.shortnodeid_b2a(peerid))
3611-        for shnum in shnums:
3612-            self.outstanding.discard( (peerid, shnum) )
3613+        # TODO: Precompute this.
3614+        known_shnums = [x.shnum for x in self.writers.values()
3615+                        if x.peerid == peerid]
3616+        surprise_shares -= set(known_shnums)
3617+        self.log("found the following surprise shares: %s" %
3618+                 str(surprise_shares))
3619 
3620hunk ./src/allmydata/mutable/publish.py 989
3621-        now = time.time()
3622-        elapsed = now - started
3623-        self._status.add_per_server_time(peerid, elapsed)
3624-
3625-        wrote, read_data = answer
3626-
3627-        surprise_shares = set(read_data.keys()) - set(shnums)
3628+        # Now surprise shares contains all of the shares that we did not
3629+        # expect to be there.
3630 
3631         surprised = False
3632         for shnum in surprise_shares:
3633hunk ./src/allmydata/mutable/publish.py 996
3634             # read_data is a dict mapping shnum to checkstring (SIGNED_PREFIX)
3635             checkstring = read_data[shnum][0]
3636-            their_version_info = unpack_checkstring(checkstring)
3637-            if their_version_info == self._new_version_info:
3638+            # What we want to do here is to see if their (seqnum,
3639+            # roothash, salt) is the same as our (seqnum, roothash,
3640+            # salt), or the equivalent for MDMF. The best way to do this
3641+            # is to store a packed representation of our checkstring
3642+            # somewhere, then not bother unpacking the other
3643+            # checkstring.
3644+            if checkstring == self._checkstring:
3645                 # they have the right share, somehow
3646 
3647                 if (peerid,shnum) in self.goal:
3648hunk ./src/allmydata/mutable/publish.py 1081
3649             self.log("our testv failed, so the write did not happen",
3650                      parent=lp, level=log.WEIRD, umid="8sc26g")
3651             self.surprised = True
3652-            self.bad_peers.add(peerid) # don't ask them again
3653+            self.bad_peers.add(writer) # don't ask them again
3654             # use the checkstring to add information to the log message
3655             for (shnum,readv) in read_data.items():
3656                 checkstring = readv[0]
3657hunk ./src/allmydata/mutable/publish.py 1103
3658                 # if expected_version==None, then we didn't expect to see a
3659                 # share on that peer, and the 'surprise_shares' clause above
3660                 # will have logged it.
3661-            # self.loop() will take care of finding new homes
3662             return
3663 
3664hunk ./src/allmydata/mutable/publish.py 1105
3665-        for shnum in shnums:
3666-            self.placed.add( (peerid, shnum) )
3667-            # and update the servermap
3668-            self._servermap.add_new_share(peerid, shnum,
3669+        # and update the servermap
3670+        # self.versioninfo is set during the last phase of publishing.
3671+        # If we get there, we know that responses correspond to placed
3672+        # shares, and can safely execute these statements.
3673+        if self.versioninfo:
3674+            self.log("wrote successfully: adding new share to servermap")
3675+            self._servermap.add_new_share(peerid, writer.shnum,
3676                                           self.versioninfo, started)
3677hunk ./src/allmydata/mutable/publish.py 1113
3678-
3679-        # self.loop() will take care of checking to see if we're done
3680+            self.placed.add( (peerid, writer.shnum) )
3681+        self._update_status()
3682+        # the next method in the deferred chain will check to see if
3683+        # we're done and successful.
3684         return
3685 
3686hunk ./src/allmydata/mutable/publish.py 1119
3687-    def _got_write_error(self, f, peerid, shnums, started):
3688-        for shnum in shnums:
3689-            self.outstanding.discard( (peerid, shnum) )
3690-        self.bad_peers.add(peerid)
3691-        if self._first_write_error is None:
3692-            self._first_write_error = f
3693-        self.log(format="error while writing shares %(shnums)s to peerid %(peerid)s",
3694-                 shnums=list(shnums), peerid=idlib.shortnodeid_b2a(peerid),
3695-                 failure=f,
3696-                 level=log.UNUSUAL)
3697-        # self.loop() will take care of checking to see if we're done
3698-        return
3699-
3700 
3701     def _done(self, res):
3702         if not self._running:
3703hunk ./src/allmydata/mutable/publish.py 1126
3704         self._running = False
3705         now = time.time()
3706         self._status.timings["total"] = now - self._started
3707+
3708+        elapsed = now - self._started_pushing
3709+        self._status.timings['push'] = elapsed
3710+
3711         self._status.set_active(False)
3712hunk ./src/allmydata/mutable/publish.py 1131
3713-        if isinstance(res, failure.Failure):
3714-            self.log("Publish done, with failure", failure=res,
3715-                     level=log.WEIRD, umid="nRsR9Q")
3716-            self._status.set_status("Failed")
3717-        elif self.surprised:
3718-            self.log("Publish done, UncoordinatedWriteError", level=log.UNUSUAL)
3719-            self._status.set_status("UncoordinatedWriteError")
3720-            # deliver a failure
3721-            res = failure.Failure(UncoordinatedWriteError())
3722-            # TODO: recovery
3723-        else:
3724-            self.log("Publish done, success")
3725-            self._status.set_status("Finished")
3726-            self._status.set_progress(1.0)
3727+        self.log("Publish done, success")
3728+        self._status.set_status("Finished")
3729+        self._status.set_progress(1.0)
3730         eventually(self.done_deferred.callback, res)
3731 
3732hunk ./src/allmydata/mutable/publish.py 1136
3733+    def _failure(self):
3734+
3735+        if not self.surprised:
3736+            # We ran out of servers
3737+            self.log("Publish ran out of good servers, "
3738+                     "last failure was: %s" % str(self._last_failure))
3739+            e = NotEnoughServersError("Ran out of non-bad servers, "
3740+                                      "last failure was %s" %
3741+                                      str(self._last_failure))
3742+        else:
3743+            # We ran into shares that we didn't recognize, which means
3744+            # that we need to return an UncoordinatedWriteError.
3745+            self.log("Publish failed with UncoordinatedWriteError")
3746+            e = UncoordinatedWriteError()
3747+        f = failure.Failure(e)
3748+        eventually(self.done_deferred.callback, f)
3749+
3750+
3751+class MutableFileHandle:
3752+    """
3753+    I am a mutable uploadable built around a filehandle-like object,
3754+    usually either a StringIO instance or a handle to an actual file.
3755+    """
3756+    implements(IMutableUploadable)
3757+
3758+    def __init__(self, filehandle):
3759+        # The filehandle is defined as a generally file-like object that
3760+        # has these two methods. We don't care beyond that.
3761+        assert hasattr(filehandle, "read")
3762+        assert hasattr(filehandle, "close")
3763+
3764+        self._filehandle = filehandle
3765+        # We must start reading at the beginning of the file, or we risk
3766+        # encountering errors when the data read does not match the size
3767+        # reported to the uploader.
3768+        self._filehandle.seek(0)
3769+
3770+        # We have not yet read anything, so our position is 0.
3771+        self._marker = 0
3772+
3773+
3774+    def get_size(self):
3775+        """
3776+        I return the amount of data in my filehandle.
3777+        """
3778+        if not hasattr(self, "_size"):
3779+            old_position = self._filehandle.tell()
3780+            # Seek to the end of the file by seeking 0 bytes from the
3781+            # file's end
3782+            self._filehandle.seek(0, 2) # 2 == os.SEEK_END in 2.5+
3783+            self._size = self._filehandle.tell()
3784+            # Restore the previous position, in case this was called
3785+            # after a read.
3786+            self._filehandle.seek(old_position)
3787+            assert self._filehandle.tell() == old_position
3788+
3789+        assert hasattr(self, "_size")
3790+        return self._size
3791+
3792+
3793+    def pos(self):
3794+        """
3795+        I return the position of my read marker -- i.e., how much data I
3796+        have already read and returned to callers.
3797+        """
3798+        return self._marker
3799+
3800+
3801+    def read(self, length):
3802+        """
3803+        I return some data (up to length bytes) from my filehandle.
3804+
3805+        In most cases, I return length bytes, but sometimes I won't --
3806+        for example, if I am asked to read beyond the end of a file, or
3807+        an error occurs.
3808+        """
3809+        results = self._filehandle.read(length)
3810+        self._marker += len(results)
3811+        return [results]
3812+
3813+
3814+    def close(self):
3815+        """
3816+        I close the underlying filehandle. Any further operations on the
3817+        filehandle fail at this point.
3818+        """
3819+        self._filehandle.close()
3820+
3821+
3822+class MutableData(MutableFileHandle):
3823+    """
3824+    I am a mutable uploadable built around a string, which I then cast
3825+    into a StringIO and treat as a filehandle.
3826+    """
3827+
3828+    def __init__(self, s):
3829+        # Take a string and return a file-like uploadable.
3830+        assert isinstance(s, str)
3831+
3832+        MutableFileHandle.__init__(self, StringIO(s))
3833+
3834+
3835+class TransformingUploadable:
3836+    """
3837+    I am an IMutableUploadable that wraps another IMutableUploadable,
3838+    and some segments that are already on the grid. When I am called to
3839+    read, I handle merging of boundary segments.
3840+    """
3841+    implements(IMutableUploadable)
3842+
3843+
3844+    def __init__(self, data, offset, segment_size, start, end):
3845+        assert IMutableUploadable.providedBy(data)
3846+
3847+        self._newdata = data
3848+        self._offset = offset
3849+        self._segment_size = segment_size
3850+        self._start = start
3851+        self._end = end
3852+
3853+        self._read_marker = 0
3854+
3855+        self._first_segment_offset = offset % segment_size
3856+
3857+        num = self.log("TransformingUploadable: starting", parent=None)
3858+        self._log_number = num
3859+        self.log("got fso: %d" % self._first_segment_offset)
3860+        self.log("got offset: %d" % self._offset)
3861+
3862+
3863+    def log(self, *args, **kwargs):
3864+        if 'parent' not in kwargs:
3865+            kwargs['parent'] = self._log_number
3866+        if "facility" not in kwargs:
3867+            kwargs["facility"] = "tahoe.mutable.transforminguploadable"
3868+        return log.msg(*args, **kwargs)
3869+
3870+
3871+    def get_size(self):
3872+        return self._offset + self._newdata.get_size()
3873+
3874+
3875+    def read(self, length):
3876+        # We can get data from 3 sources here.
3877+        #   1. The first of the segments provided to us.
3878+        #   2. The data that we're replacing things with.
3879+        #   3. The last of the segments provided to us.
3880+
3881+        # Are we still returning old data from the first of the old segments (source 1)?
3882+        self.log("reading %d bytes" % length)
3883+
3884+        old_start_data = ""
3885+        old_data_length = self._first_segment_offset - self._read_marker
3886+        if old_data_length > 0:
3887+            if old_data_length > length:
3888+                old_data_length = length
3889+            self.log("returning %d bytes of old start data" % old_data_length)
3890+
3891+            old_data_end = old_data_length + self._read_marker
3892+            old_start_data = self._start[self._read_marker:old_data_end]
3893+            length -= old_data_length
3894+        else:
3895+            # otherwise calculations later get screwed up.
3896+            old_data_length = 0
3897+
3898+        # Is there enough new data to satisfy this read? If not, we need
3899+        # to pad the end of the data with data from our last segment.
3900+        old_end_length = length - \
3901+            (self._newdata.get_size() - self._newdata.pos())
3902+        old_end_data = ""
3903+        if old_end_length > 0:
3904+            self.log("reading %d bytes of old end data" % old_end_length)
3905+
3906+            # TODO: We're not explicitly checking for tail segment size
3907+            # here. Is that a problem?
3908+            old_data_offset = (length - old_end_length + \
3909+                               old_data_length) % self._segment_size
3910+            self.log("reading at offset %d" % old_data_offset)
3911+            old_end = old_data_offset + old_end_length
3912+            old_end_data = self._end[old_data_offset:old_end]
3913+            length -= old_end_length
3914+            assert length == self._newdata.get_size() - self._newdata.pos()
3915+
3916+        self.log("reading %d bytes of new data" % length)
3917+        new_data = self._newdata.read(length)
3918+        new_data = "".join(new_data)
3919+
3920+        self._read_marker += len(old_start_data + new_data + old_end_data)
3921+
3922+        return old_start_data + new_data + old_end_data
3923 
3924hunk ./src/allmydata/mutable/publish.py 1327
3925+    def close(self):
3926+        pass
3927}
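
The hunks above also introduce two uploadable wrappers, MutableFileHandle and MutableData. As a rough, hypothetical usage sketch (not part of the patch; it assumes the patched allmydata.mutable.publish module is importable, and the file path is made up), a caller would wrap its data like this:

    # Illustrative only: wrap data for the new mutable upload path.
    from allmydata.mutable.publish import MutableData, MutableFileHandle

    # MutableData casts a string into a StringIO and exposes it as an
    # IMutableUploadable.
    data = MutableData("new contents for the mutable file")
    print data.get_size()            # number of bytes available
    chunk = "".join(data.read(4))    # read() returns a list of strings
    print chunk, data.pos()          # pos() reports how much has been read

    # MutableFileHandle wraps an already-open file object; it seeks back
    # to offset 0 so that get_size() and read() see the whole file.
    fh = open("/tmp/example.txt", "rb")    # hypothetical path
    uploadable = MutableFileHandle(fh)
    print uploadable.get_size()
    uploadable.close()
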
3928[nodemaker.py: Make nodemaker expose a way to create MDMF files
3929Kevan Carstensen <kevan@isnotajoke.com>**20100819003509
3930 Ignore-this: a6701746d6b992fc07bc0556a2b4a61d
3931] {
3932hunk ./src/allmydata/nodemaker.py 3
3933 import weakref
3934 from zope.interface import implements
3935-from allmydata.interfaces import INodeMaker
3936+from allmydata.util.assertutil import precondition
3937+from allmydata.interfaces import INodeMaker, SDMF_VERSION
3938 from allmydata.immutable.literal import LiteralFileNode
3939 from allmydata.immutable.filenode import ImmutableFileNode, CiphertextFileNode
3940 from allmydata.immutable.upload import Data
3941hunk ./src/allmydata/nodemaker.py 9
3942 from allmydata.mutable.filenode import MutableFileNode
3943+from allmydata.mutable.publish import MutableData
3944 from allmydata.dirnode import DirectoryNode, pack_children
3945 from allmydata.unknown import UnknownNode
3946 from allmydata import uri
3947hunk ./src/allmydata/nodemaker.py 92
3948             return self._create_dirnode(filenode)
3949         return None
3950 
3951-    def create_mutable_file(self, contents=None, keysize=None):
3952+    def create_mutable_file(self, contents=None, keysize=None,
3953+                            version=SDMF_VERSION):
3954         n = MutableFileNode(self.storage_broker, self.secret_holder,
3955                             self.default_encoding_parameters, self.history)
3956hunk ./src/allmydata/nodemaker.py 96
3957+        n.set_version(version)
3958         d = self.key_generator.generate(keysize)
3959         d.addCallback(n.create_with_keys, contents)
3960         d.addCallback(lambda res: n)
3961hunk ./src/allmydata/nodemaker.py 103
3962         return d
3963 
3964     def create_new_mutable_directory(self, initial_children={}):
3965+        # mutable directories will always be SDMF for now, to help
3966+        # compatibility with older clients.
3967+        version = SDMF_VERSION
3968+        # initial_children must have metadata (i.e. {} instead of None)
3969+        for (name, (node, metadata)) in initial_children.iteritems():
3970+            precondition(isinstance(metadata, dict),
3971+                         "create_new_mutable_directory requires metadata to be a dict, not None", metadata)
3972+            node.raise_error()
3973         d = self.create_mutable_file(lambda n:
3974hunk ./src/allmydata/nodemaker.py 112
3975-                                     pack_children(initial_children, n.get_writekey()))
3976+                                     MutableData(pack_children(initial_children,
3977+                                                    n.get_writekey())),
3978+                                     version=version)
3979         d.addCallback(self._create_dirnode)
3980         return d
3981 
3982}
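
As a hypothetical sketch of how this fits together (not part of the patch), creating an MDMF mutable file through the patched NodeMaker might look like the following. Here `nodemaker` stands for an already-configured NodeMaker instance (for example, the one attached to a running client), and the sketch assumes the accompanying filenode changes in this series, which accept an IMutableUploadable as initial contents:

    from allmydata.interfaces import MDMF_VERSION
    from allmydata.mutable.publish import MutableData

    d = nodemaker.create_mutable_file(MutableData("hello MDMF"),
                                      version=MDMF_VERSION)
    def _created(node):
        # node is a MutableFileNode whose shares were written in MDMF format
        print node.get_uri()
    d.addCallback(_created)
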
3983[docs: update docs to mention MDMF
3984Kevan Carstensen <kevan@isnotajoke.com>**20100814225644
3985 Ignore-this: 1c3caa3cd44831007dcfbef297814308
3986] {
3987merger 0.0 (
3988hunk ./docs/configuration.rst 324
3989+Frontend Configuration
3990+======================
3991+
3992+The Tahoe client process can run a variety of frontend file-access protocols.
3993+You will use these to create and retrieve files from the virtual filesystem.
3994+Configuration details for each are documented in the following
3995+protocol-specific guides:
3996+
3997+HTTP
3998+
3999+    Tahoe runs a webserver by default on port 3456. This interface provides a
4000+    human-oriented "WUI", with pages to create, modify, and browse
4001+    directories and files, as well as a number of pages to check on the
4002+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
4003+    with a REST-ful HTTP interface that can be used by other programs
4004+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
4005+    details, and the ``web.port`` and ``web.static`` config variables above.
4006+    The `<frontends/download-status.rst>`_ document also describes a few WUI
4007+    status pages.
4008+
4009+CLI
4010+
4011+    The main "bin/tahoe" executable includes subcommands for manipulating the
4012+    filesystem, uploading/downloading files, and creating/running Tahoe
4013+    nodes. See `<frontends/CLI.rst>`_ for details.
4014+
4015+FTP, SFTP
4016+
4017+    Tahoe can also run both FTP and SFTP servers, and map a username/password
4018+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
4019+    for instructions on configuring these services, and the ``[ftpd]`` and
4020+    ``[sftpd]`` sections of ``tahoe.cfg``.
4021+
4022merger 0.0 (
4023replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
4024merger 0.0 (
4025hunk ./docs/configuration.rst 384
4026-shares.needed = (int, optional) aka "k", default 3
4027-shares.total = (int, optional) aka "N", N >= k, default 10
4028-shares.happy = (int, optional) 1 <= happy <= N, default 7
4029-
4030- These three values set the default encoding parameters. Each time a new file
4031- is uploaded, erasure-coding is used to break the ciphertext into separate
4032- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
4033- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
4034- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
4035- Setting k to 1 is equivalent to simple replication (uploading N copies of
4036- the file).
4037-
4038- These values control the tradeoff between storage overhead, performance, and
4039- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
4040- backend storage space (the actual value will be a bit more, because of other
4041- forms of overhead). Up to N-k shares can be lost before the file becomes
4042- unrecoverable, so assuming there are at least N servers, up to N-k servers
4043- can be offline without losing the file. So large N/k ratios are more
4044- reliable, and small N/k ratios use less disk space. Clearly, k must never be
4045- smaller than N.
4046-
4047- Large values of N will slow down upload operations slightly, since more
4048- servers must be involved, and will slightly increase storage overhead due to
4049- the hash trees that are created. Large values of k will cause downloads to
4050- be marginally slower, because more servers must be involved. N cannot be
4051- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
4052- uses.
4053-
4054- shares.happy allows you control over the distribution of your immutable file.
4055- For a successful upload, shares are guaranteed to be initially placed on
4056- at least 'shares.happy' distinct servers, the correct functioning of any
4057- k of which is sufficient to guarantee the availability of the uploaded file.
4058- This value should not be larger than the number of servers on your grid.
4059-
4060- A value of shares.happy <= k is allowed, but does not provide any redundancy
4061- if some servers fail or lose shares.
4062-
4063- (Mutable files use a different share placement algorithm that does not
4064-  consider this parameter.)
4065-
4066-
4067-== Storage Server Configuration ==
4068-
4069-[storage]
4070-enabled = (boolean, optional)
4071-
4072- If this is True, the node will run a storage server, offering space to other
4073- clients. If it is False, the node will not run a storage server, meaning
4074- that no shares will be stored on this node. Use False this for clients who
4075- do not wish to provide storage service. The default value is True.
4076-
4077-readonly = (boolean, optional)
4078-
4079- If True, the node will run a storage server but will not accept any shares,
4080- making it effectively read-only. Use this for storage servers which are
4081- being decommissioned: the storage/ directory could be mounted read-only,
4082- while shares are moved to other servers. Note that this currently only
4083- affects immutable shares. Mutable shares (used for directories) will be
4084- written and modified anyway. See ticket #390 for the current status of this
4085- bug. The default value is False.
4086-
4087-reserved_space = (str, optional)
4088-
4089- If provided, this value defines how much disk space is reserved: the storage
4090- server will not accept any share which causes the amount of free disk space
4091- to drop below this value. (The free space is measured by a call to statvfs(2)
4092- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
4093- user account under which the storage server runs.)
4094-
4095- This string contains a number, with an optional case-insensitive scale
4096- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
4097- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
4098- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
4099-
4100-expire.enabled =
4101-expire.mode =
4102-expire.override_lease_duration =
4103-expire.cutoff_date =
4104-expire.immutable =
4105-expire.mutable =
4106-
4107- These settings control garbage-collection, in which the server will delete
4108- shares that no longer have an up-to-date lease on them. Please see the
4109- neighboring "garbage-collection.txt" document for full details.
4110-
4111-
4112-== Running A Helper ==
4113+Running A Helper
4114+================
4115hunk ./docs/configuration.rst 424
4116+mutable.format = sdmf or mdmf
4117+
4118+ This value tells Tahoe-LAFS what the default mutable file format should
4119+ be. If mutable.format=sdmf, then newly created mutable files will be in
4120+ the old SDMF format. This is desirable for clients that operate on
4121+ grids where some peers run older versions of Tahoe-LAFS, as these older
4122+ versions cannot read the new MDMF mutable file format. If
4123+ mutable.format = mdmf, then newly created mutable files will use the
4124+ new MDMF format, which supports efficient in-place modification and
4125+ streaming downloads. You can override this value using a special
4126+ mutable-type parameter in the webapi. If you do not specify a value
4127+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
4128+
4129+ Note that this parameter only applies to mutable files. Mutable
4130+ directories, which are stored as mutable files, are not controlled by
4131+ this parameter and will always use SDMF. We may revisit this decision
4132+ in future versions of Tahoe-LAFS.
4133)
4134)
4135)
4136hunk ./docs/frontends/webapi.rst 363
4137  writeable mutable file, that file's contents will be overwritten in-place. If
4138  it is a read-cap for a mutable file, an error will occur. If it is an
4139  immutable file, the old file will be discarded, and a new one will be put in
4140- its place.
4141+ its place. If the target file is a writable mutable file, you may also
4142+ specify an "offset" parameter -- a byte offset that determines where in
4143+ the mutable file the data from the HTTP request body is placed. This
4144+ operation is relatively efficient for MDMF mutable files, and is
4145+ relatively inefficient (but still supported) for SDMF mutable files.
4146 
4147  When creating a new file, if "mutable=true" is in the query arguments, the
4148  operation will create a mutable file instead of an immutable one.
4149hunk ./docs/frontends/webapi.rst 388
4150 
4151  If "mutable=true" is in the query arguments, the operation will create a
4152  mutable file, and return its write-cap in the HTTP response. The default is
4153- to create an immutable file, returning the read-cap as a response.
4154+ to create an immutable file, returning the read-cap as a response. If
4155+ you create a mutable file, you can also use the "mutable-type" query
4156+ parameter. If "mutable-type=sdmf", then the mutable file will be created
4157+ in the old SDMF mutable file format. This is desirable for files that
4158+ need to be read by old clients. If "mutable-type=mdmf", then the file
4159+ will be created in the new MDMF mutable file format. MDMF mutable files
4160+ can be downloaded more efficiently, and modified in-place efficiently,
4161+ but are not compatible with older versions of Tahoe-LAFS. If no
4162+ "mutable-type" argument is given, the file is created in whatever
4163+ format was configured in tahoe.cfg.
4164 
4165 Creating A New Directory
4166 ------------------------
4167hunk ./docs/frontends/webapi.rst 1082
4168  If a "mutable=true" argument is provided, the operation will create a
4169  mutable file, and the response body will contain the write-cap instead of
4170  the upload results page. The default is to create an immutable file,
4171- returning the upload results page as a response.
4172+ returning the upload results page as a response. If you create a
4173+ mutable file, you may choose to specify the format of that mutable file
4174+ with the "mutable-type" parameter. If "mutable-type=mdmf", then the
4175+ file will be created as an MDMF mutable file. If "mutable-type=sdmf",
4176+ then the file will be created as an SDMF mutable file. If no value is
4177+ specified, the file will be created in whatever format is specified in
4178+ tahoe.cfg.
4179 
4180 
4181 ``POST /uri/$DIRCAP/[SUBDIRS../]?t=upload``
4182}
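
The mutable-type argument described in the webapi hunks above can be illustrated with a hypothetical sketch (not part of the patch) against the default gateway on localhost:3456; the same argument applies to the t=upload POST form described just above:

    import httplib

    conn = httplib.HTTPConnection("127.0.0.1", 3456)
    conn.request("PUT", "/uri?mutable=true&mutable-type=mdmf",
                 "initial contents of the new MDMF file")
    resp = conn.getresponse()
    writecap = resp.read()    # the response body is the new file's write-cap
    print resp.status, writecap
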
4183[mutable/layout.py and interfaces.py: add MDMF writer and reader
4184Kevan Carstensen <kevan@isnotajoke.com>**20100819003304
4185 Ignore-this: 44400fec923987b62830da2ed5075fb4
4186 
4187 The MDMF writer is responsible for keeping state as plaintext is
4188 gradually processed into share data by the upload process. When the
4189 upload finishes, it will write all of its share data to a remote server,
4190 reporting its status back to the publisher.
4191 
4192 The MDMF reader is responsible for abstracting an MDMF file as it sits
4193 on the grid from the downloader; specifically, by receiving and
4194 responding to requests for arbitrary data within the MDMF file.
4195 
4196 The interfaces.py file has also been modified to contain an interface
4197 for the writer.
4198] {
4199hunk ./src/allmydata/interfaces.py 7
4200      ChoiceOf, IntegerConstraint, Any, RemoteInterface, Referenceable
4201 
4202 HASH_SIZE=32
4203+SALT_SIZE=16
4204+
4205+SDMF_VERSION=0
4206+MDMF_VERSION=1
4207 
4208 Hash = StringConstraint(maxLength=HASH_SIZE,
4209                         minLength=HASH_SIZE)# binary format 32-byte SHA256 hash
4210hunk ./src/allmydata/interfaces.py 424
4211         """
4212 
4213 
4214+class IMutableSlotWriter(Interface):
4215+    """
4216+    The interface for a writer around a mutable slot on a remote server.
4217+    """
4218+    def set_checkstring(checkstring, *args):
4219+        """
4220+        Set the checkstring that I will pass to the remote server when
4221+        writing.
4222+
4223+            @param checkstring A packed checkstring to use.
4224+
4225+        Note that implementations can differ in which semantics they
4226+        wish to support for set_checkstring -- they can, for example,
4227+        build the checkstring themselves from its constituents, or
4228+        some other thing.
4229+        """
4230+
4231+    def get_checkstring():
4232+        """
4233+        Get the checkstring that I think currently exists on the remote
4234+        server.
4235+        """
4236+
4237+    def put_block(data, segnum, salt):
4238+        """
4239+        Add a block and salt to the share.
4240+        """
4241+
4242+    def put_encprivkey(encprivkey):
4243+        """
4244+        Add the encrypted private key to the share.
4245+        """
4246+
4247+    def put_blockhashes(blockhashes=list):
4248+        """
4249+        Add the block hash tree to the share.
4250+        """
4251+
4252+    def put_sharehashes(sharehashes=dict):
4253+        """
4254+        Add the share hash chain to the share.
4255+        """
4256+
4257+    def get_signable():
4258+        """
4259+        Return the part of the share that needs to be signed.
4260+        """
4261+
4262+    def put_signature(signature):
4263+        """
4264+        Add the signature to the share.
4265+        """
4266+
4267+    def put_verification_key(verification_key):
4268+        """
4269+        Add the verification key to the share.
4270+        """
4271+
4272+    def finish_publishing():
4273+        """
4274+        Do anything necessary to finish writing the share to a remote
4275+        server. I require that no further publishing needs to take place
4276+        after this method has been called.
4277+        """
4278+
4279+
4280 class IURI(Interface):
4281     def init_from_string(uri):
4282         """Accept a string (as created by my to_string() method) and populate
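
To make the intended call sequence of IMutableSlotWriter concrete, here is a rough, hypothetical sketch (not part of the patch) of how a publisher might drive one writer per (server, share). The helper name push_one_share and its arguments are made up; blocks, salts, hashes, the sign callable, and the verification key are assumed to come from the encoding and signing steps, and the concrete proxies added later in this patch also expose put_root_hash(), which must happen before signing:

    def push_one_share(writer, expected_checkstring, blocks_and_salts,
                       encprivkey, blockhashes, sharehashes,
                       sign, verification_key):
        # Guard against uncoordinated writes from other writers.
        writer.set_checkstring(expected_checkstring)
        for segnum, (block, salt) in enumerate(blocks_and_salts):
            writer.put_block(block, segnum, salt)
        writer.put_encprivkey(encprivkey)
        writer.put_blockhashes(blockhashes)     # list of 32-byte hashes
        writer.put_sharehashes(sharehashes)     # dict: shnum -> 32-byte hash
        # The concrete proxies also expose put_root_hash(); it has to happen
        # before signing so that get_signable() covers the final root hash.
        writer.put_signature(sign(writer.get_signable()))
        writer.put_verification_key(verification_key)
        return writer.finish_publishing()       # one write to the server
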
4283hunk ./src/allmydata/mutable/layout.py 4
4284 
4285 import struct
4286 from allmydata.mutable.common import NeedMoreDataError, UnknownVersionError
4287+from allmydata.interfaces import HASH_SIZE, SALT_SIZE, SDMF_VERSION, \
4288+                                 MDMF_VERSION, IMutableSlotWriter
4289+from allmydata.util import mathutil, observer
4290+from twisted.python import failure
4291+from twisted.internet import defer
4292+from zope.interface import implements
4293+
4294+
4295+# These strings describe the format of the packed structs they help process
4296+# Here's what they mean:
4297+#
4298+#  PREFIX:
4299+#    >: Big-endian byte order; the most significant byte is first (leftmost).
4300+#    B: The version information; an 8 bit version identifier. Stored as
4301+#       an unsigned char. This is currently 00 00 00 00; our modifications
4302+#       will turn it into 00 00 00 01.
4303+#    Q: The sequence number; this is sort of like a revision history for
4304+#       mutable files; they start at 1 and increase as they are changed after
4305+#       being uploaded. Stored as an unsigned long long, which is 8 bytes in
4306+#       length.
4307+#  32s: The root hash of the share hash tree. We use sha-256d, so we use 32
4308+#       characters = 32 bytes to store the value.
4309+#  16s: The salt for the readkey. This is a 16-byte random value, stored as
4310+#       16 characters.
4311+#
4312+#  SIGNED_PREFIX additions, things that are covered by the signature:
4313+#    B: The "k" encoding parameter. We store this as an 8-bit character,
4314+#       which is convenient because our erasure coding scheme cannot
4315+#       encode if you ask for more than 255 pieces.
4316+#    B: The "N" encoding parameter. Stored as an 8-bit character for the
4317+#       same reasons as above.
4318+#    Q: The segment size of the uploaded file. This will essentially be the
4319+#       length of the file in SDMF. An unsigned long long, so we can store
4320+#       files of quite large size.
4321+#    Q: The data length of the uploaded file. Modulo padding, this will be
4322+#       the same as the segment size field. Like the segment size field, it is
4323+#       an unsigned long long and can be quite large.
4324+#
4325+#   HEADER additions:
4326+#     L: The offset of the signature of this. An unsigned long.
4327+#     L: The offset of the share hash chain. An unsigned long.
4328+#     L: The offset of the block hash tree. An unsigned long.
4329+#     L: The offset of the share data. An unsigned long.
4330+#     Q: The offset of the encrypted private key. An unsigned long long, to
4331+#        account for the possibility of a lot of share data.
4332+#     Q: The offset of the EOF. An unsigned long long, to account for the
4333+#        possibility of a lot of share data.
4334+#
4335+#  After all of these, we have the following:
4336+#    - The verification key: Occupies the space between the end of the header
4337+#      and the start of the signature (i.e.: data[HEADER_LENGTH:o['signature']]).
4338+#    - The signature, which goes from the signature offset to the share hash
4339+#      chain offset.
4340+#    - The share hash chain, which goes from the share hash chain offset to
4341+#      the block hash tree offset.
4342+#    - The share data, which goes from the share data offset to the encrypted
4343+#      private key offset.
4344+#    - The encrypted private key offset, which goes until the end of the file.
4345+#
4346+#  The block hash tree in this encoding has only one share, so the offset of
4347+#  the share data will be 32 bits more than the offset of the block hash tree.
4348+#  Given this, we may need to check to see how many bytes a reasonably sized
4349+#  block hash tree will take up.
4350 
4351 PREFIX = ">BQ32s16s" # each version has a different prefix
4352 SIGNED_PREFIX = ">BQ32s16s BBQQ" # this is covered by the signature
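
As a small, self-contained illustration of the packed formats documented in the comment above (not part of the patch; throwaway values only):

    import struct

    PREFIX = ">BQ32s16s"               # version, seqnum, root_hash, IV/salt
    SIGNED_PREFIX = ">BQ32s16s BBQQ"   # plus k, N, segment size, data length

    checkstring = struct.pack(PREFIX, 0, 3, "\x11" * 32, "\x22" * 16)
    assert len(checkstring) == struct.calcsize(PREFIX) == 57

    version, seqnum, root_hash, salt = struct.unpack(PREFIX, checkstring)
    print version, seqnum              # 0 3

    signed_prefix = struct.pack(SIGNED_PREFIX, 0, 3, "\x11" * 32, "\x22" * 16,
                                3, 10, 1000, 1000)
    assert len(signed_prefix) == struct.calcsize(SIGNED_PREFIX) == 75
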
4353hunk ./src/allmydata/mutable/layout.py 73
4354 SIGNED_PREFIX_LENGTH = struct.calcsize(SIGNED_PREFIX)
4355 HEADER = ">BQ32s16s BBQQ LLLLQQ" # includes offsets
4356 HEADER_LENGTH = struct.calcsize(HEADER)
4357+OFFSETS = ">LLLLQQ"
4358+OFFSETS_LENGTH = struct.calcsize(OFFSETS)
4359 
4360hunk ./src/allmydata/mutable/layout.py 76
4361+# These are still used for some tests.
4362 def unpack_header(data):
4363     o = {}
4364     (version,
4365hunk ./src/allmydata/mutable/layout.py 92
4366      o['EOF']) = struct.unpack(HEADER, data[:HEADER_LENGTH])
4367     return (version, seqnum, root_hash, IV, k, N, segsize, datalen, o)
4368 
4369-def unpack_prefix_and_signature(data):
4370-    assert len(data) >= HEADER_LENGTH, len(data)
4371-    prefix = data[:SIGNED_PREFIX_LENGTH]
4372-
4373-    (version,
4374-     seqnum,
4375-     root_hash,
4376-     IV,
4377-     k, N, segsize, datalen,
4378-     o) = unpack_header(data)
4379-
4380-    if version != 0:
4381-        raise UnknownVersionError("got mutable share version %d, but I only understand version 0" % version)
4382-
4383-    if len(data) < o['share_hash_chain']:
4384-        raise NeedMoreDataError(o['share_hash_chain'],
4385-                                o['enc_privkey'], o['EOF']-o['enc_privkey'])
4386-
4387-    pubkey_s = data[HEADER_LENGTH:o['signature']]
4388-    signature = data[o['signature']:o['share_hash_chain']]
4389-
4390-    return (seqnum, root_hash, IV, k, N, segsize, datalen,
4391-            pubkey_s, signature, prefix)
4392-
4393 def unpack_share(data):
4394     assert len(data) >= HEADER_LENGTH
4395     o = {}
4396hunk ./src/allmydata/mutable/layout.py 139
4397             pubkey, signature, share_hash_chain, block_hash_tree,
4398             share_data, enc_privkey)
4399 
4400-def unpack_share_data(verinfo, hash_and_data):
4401-    (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, o_t) = verinfo
4402-
4403-    # hash_and_data starts with the share_hash_chain, so figure out what the
4404-    # offsets really are
4405-    o = dict(o_t)
4406-    o_share_hash_chain = 0
4407-    o_block_hash_tree = o['block_hash_tree'] - o['share_hash_chain']
4408-    o_share_data = o['share_data'] - o['share_hash_chain']
4409-    o_enc_privkey = o['enc_privkey'] - o['share_hash_chain']
4410-
4411-    share_hash_chain_s = hash_and_data[o_share_hash_chain:o_block_hash_tree]
4412-    share_hash_format = ">H32s"
4413-    hsize = struct.calcsize(share_hash_format)
4414-    assert len(share_hash_chain_s) % hsize == 0, len(share_hash_chain_s)
4415-    share_hash_chain = []
4416-    for i in range(0, len(share_hash_chain_s), hsize):
4417-        chunk = share_hash_chain_s[i:i+hsize]
4418-        (hid, h) = struct.unpack(share_hash_format, chunk)
4419-        share_hash_chain.append( (hid, h) )
4420-    share_hash_chain = dict(share_hash_chain)
4421-    block_hash_tree_s = hash_and_data[o_block_hash_tree:o_share_data]
4422-    assert len(block_hash_tree_s) % 32 == 0, len(block_hash_tree_s)
4423-    block_hash_tree = []
4424-    for i in range(0, len(block_hash_tree_s), 32):
4425-        block_hash_tree.append(block_hash_tree_s[i:i+32])
4426-
4427-    share_data = hash_and_data[o_share_data:o_enc_privkey]
4428-
4429-    return (share_hash_chain, block_hash_tree, share_data)
4430-
4431-
4432-def pack_checkstring(seqnum, root_hash, IV):
4433-    return struct.pack(PREFIX,
4434-                       0, # version,
4435-                       seqnum,
4436-                       root_hash,
4437-                       IV)
4438-
4439 def unpack_checkstring(checkstring):
4440     cs_len = struct.calcsize(PREFIX)
4441     version, seqnum, root_hash, IV = struct.unpack(PREFIX, checkstring[:cs_len])
4442hunk ./src/allmydata/mutable/layout.py 146
4443         raise UnknownVersionError("got mutable share version %d, but I only understand version 0" % version)
4444     return (seqnum, root_hash, IV)
4445 
4446-def pack_prefix(seqnum, root_hash, IV,
4447-                required_shares, total_shares,
4448-                segment_size, data_length):
4449-    prefix = struct.pack(SIGNED_PREFIX,
4450-                         0, # version,
4451-                         seqnum,
4452-                         root_hash,
4453-                         IV,
4454-
4455-                         required_shares,
4456-                         total_shares,
4457-                         segment_size,
4458-                         data_length,
4459-                         )
4460-    return prefix
4461 
4462 def pack_offsets(verification_key_length, signature_length,
4463                  share_hash_chain_length, block_hash_tree_length,
4464hunk ./src/allmydata/mutable/layout.py 192
4465                            encprivkey])
4466     return final_share
4467 
4468+def pack_prefix(seqnum, root_hash, IV,
4469+                required_shares, total_shares,
4470+                segment_size, data_length):
4471+    prefix = struct.pack(SIGNED_PREFIX,
4472+                         0, # version,
4473+                         seqnum,
4474+                         root_hash,
4475+                         IV,
4476+                         required_shares,
4477+                         total_shares,
4478+                         segment_size,
4479+                         data_length,
4480+                         )
4481+    return prefix
4482+
4483+
4484+class SDMFSlotWriteProxy:
4485+    implements(IMutableSlotWriter)
4486+    """
4487+    I represent a remote write slot for an SDMF mutable file. I build a
4488+    share in memory, and then write it in one piece to the remote
4489+    server. This mimics how SDMF shares were built before MDMF (and the
4490+    new MDMF uploader), but provides that functionality in a way that
4491+    allows the MDMF uploader to be built without much special-casing for
4492+    file format, which makes the uploader code more readable.
4493+    """
4494+    def __init__(self,
4495+                 shnum,
4496+                 rref, # a remote reference to a storage server
4497+                 storage_index,
4498+                 secrets, # (write_enabler, renew_secret, cancel_secret)
4499+                 seqnum, # the sequence number of the mutable file
4500+                 required_shares,
4501+                 total_shares,
4502+                 segment_size,
4503+                 data_length): # the length of the original file
4504+        self.shnum = shnum
4505+        self._rref = rref
4506+        self._storage_index = storage_index
4507+        self._secrets = secrets
4508+        self._seqnum = seqnum
4509+        self._required_shares = required_shares
4510+        self._total_shares = total_shares
4511+        self._segment_size = segment_size
4512+        self._data_length = data_length
4513+
4514+        # This is an SDMF file, so it should have only one segment, so,
4515+        # modulo padding of the data length, the segment size and the
4516+        # data length should be the same.
4517+        expected_segment_size = mathutil.next_multiple(data_length,
4518+                                                       self._required_shares)
4519+        assert expected_segment_size == segment_size
4520+
4521+        self._block_size = self._segment_size / self._required_shares
4522+
4523+        # This is meant to mimic how SDMF files were built before MDMF
4524+        # entered the picture: we generate each share in its entirety,
4525+        # then push it off to the storage server in one write. When
4526+        # callers call set_*, they are just populating this dict.
4527+        # finish_publishing will stitch these pieces together into a
4528+        # coherent share, and then write the coherent share to the
4529+        # storage server.
4530+        self._share_pieces = {}
4531+
4532+        # This tells the write logic what checkstring to use when
4533+        # writing remote shares.
4534+        self._testvs = []
4535+
4536+        self._readvs = [(0, struct.calcsize(PREFIX))]
4537+
4538+
4539+    def set_checkstring(self, checkstring_or_seqnum,
4540+                              root_hash=None,
4541+                              salt=None):
4542+        """
4543+        Set the checkstring that I will pass to the remote server when
4544+        writing.
4545+
4546+            @param checkstring_or_seqnum: A packed checkstring to use,
4547+                   or a sequence number; if root_hash and salt are given, I build the checkstring from all three, otherwise I treat this as a packed checkstring.
4548+
4549+        Note that implementations can differ in which semantics they
4550+        wish to support for set_checkstring -- they can, for example,
4551+        build the checkstring themselves from its constituents, or
4552+        some other thing.
4553+        """
4554+        if root_hash and salt:
4555+            checkstring = struct.pack(PREFIX,
4556+                                      0,
4557+                                      checkstring_or_seqnum,
4558+                                      root_hash,
4559+                                      salt)
4560+        else:
4561+            checkstring = checkstring_or_seqnum
4562+        self._testvs = [(0, len(checkstring), "eq", checkstring)]
4563+
4564+
4565+    def get_checkstring(self):
4566+        """
4567+        Get the checkstring that I think currently exists on the remote
4568+        server.
4569+        """
4570+        if self._testvs:
4571+            return self._testvs[0][3]
4572+        return ""
4573+
4574+
4575+    def put_block(self, data, segnum, salt):
4576+        """
4577+        Add a block and salt to the share.
4578+        """
4579+        # SDMF files have only one segment
4580+        assert segnum == 0
4581+        assert len(data) == self._block_size
4582+        assert len(salt) == SALT_SIZE
4583+
4584+        self._share_pieces['sharedata'] = data
4585+        self._share_pieces['salt'] = salt
4586+
4587+        # TODO: Figure out something intelligent to return.
4588+        return defer.succeed(None)
4589+
4590+
4591+    def put_encprivkey(self, encprivkey):
4592+        """
4593+        Add the encrypted private key to the share.
4594+        """
4595+        self._share_pieces['encprivkey'] = encprivkey
4596+
4597+        return defer.succeed(None)
4598+
4599+
4600+    def put_blockhashes(self, blockhashes):
4601+        """
4602+        Add the block hash tree to the share.
4603+        """
4604+        assert isinstance(blockhashes, list)
4605+        for h in blockhashes:
4606+            assert len(h) == HASH_SIZE
4607+
4608+        # serialize the blockhashes, then set them.
4609+        blockhashes_s = "".join(blockhashes)
4610+        self._share_pieces['block_hash_tree'] = blockhashes_s
4611+
4612+        return defer.succeed(None)
4613+
4614+
4615+    def put_sharehashes(self, sharehashes):
4616+        """
4617+        Add the share hash chain to the share.
4618+        """
4619+        assert isinstance(sharehashes, dict)
4620+        for h in sharehashes.itervalues():
4621+            assert len(h) == HASH_SIZE
4622+
4623+        # serialize the sharehashes, then set them.
4624+        sharehashes_s = "".join([struct.pack(">H32s", i, sharehashes[i])
4625+                                 for i in sorted(sharehashes.keys())])
4626+        self._share_pieces['share_hash_chain'] = sharehashes_s
4627+
4628+        return defer.succeed(None)
4629+
4630+
4631+    def put_root_hash(self, root_hash):
4632+        """
4633+        Add the root hash to the share.
4634+        """
4635+        assert len(root_hash) == HASH_SIZE
4636+
4637+        self._share_pieces['root_hash'] = root_hash
4638+
4639+        return defer.succeed(None)
4640+
4641+
4642+    def put_salt(self, salt):
4643+        """
4644+        Add a salt to an empty SDMF file.
4645+        """
4646+        assert len(salt) == SALT_SIZE
4647+
4648+        self._share_pieces['salt'] = salt
4649+        self._share_pieces['sharedata'] = ""
4650+
4651+
4652+    def get_signable(self):
4653+        """
4654+        Return the part of the share that needs to be signed.
4655+
4656+        SDMF writers need to sign the packed representation of the
4657+        first eight fields of the remote share, that is:
4658+            - version number (0)
4659+            - sequence number
4660+            - root of the share hash tree
4661+            - salt
4662+            - k
4663+            - n
4664+            - segsize
4665+            - datalen
4666+
4667+        This method is responsible for returning that to callers.
4668+        """
4669+        return struct.pack(SIGNED_PREFIX,
4670+                           0,
4671+                           self._seqnum,
4672+                           self._share_pieces['root_hash'],
4673+                           self._share_pieces['salt'],
4674+                           self._required_shares,
4675+                           self._total_shares,
4676+                           self._segment_size,
4677+                           self._data_length)
4678+
4679+
4680+    def put_signature(self, signature):
4681+        """
4682+        Add the signature to the share.
4683+        """
4684+        self._share_pieces['signature'] = signature
4685+
4686+        return defer.succeed(None)
4687+
4688+
4689+    def put_verification_key(self, verification_key):
4690+        """
4691+        Add the verification key to the share.
4692+        """
4693+        self._share_pieces['verification_key'] = verification_key
4694+
4695+        return defer.succeed(None)
4696+
4697+
4698+    def get_verinfo(self):
4699+        """
4700+        I return my verinfo tuple. This is used by the ServermapUpdater
4701+        to keep track of versions of mutable files.
4702+
4703+        The verinfo tuple for MDMF files contains:
4704+            - seqnum
4705+            - root hash
4706+            - a blank (nothing)
4707+            - segsize
4708+            - datalen
4709+            - k
4710+            - n
4711+            - prefix (the thing that you sign)
4712+            - a tuple of offsets
4713+
4714+        We include the nonce in MDMF to simplify processing of version
4715+        information tuples.
4716+
4717+        The verinfo tuple for SDMF files is the same, but contains a
4718+        16-byte IV instead of a hash of salts.
4719+        """
4720+        return (self._seqnum,
4721+                self._share_pieces['root_hash'],
4722+                self._share_pieces['salt'],
4723+                self._segment_size,
4724+                self._data_length,
4725+                self._required_shares,
4726+                self._total_shares,
4727+                self.get_signable(),
4728+                self._get_offsets_tuple())
4729+
4730+    def _get_offsets_dict(self):
4731+        post_offset = HEADER_LENGTH
4732+        offsets = {}
4733+
4734+        verification_key_length = len(self._share_pieces['verification_key'])
4735+        o1 = offsets['signature'] = post_offset + verification_key_length
4736+
4737+        signature_length = len(self._share_pieces['signature'])
4738+        o2 = offsets['share_hash_chain'] = o1 + signature_length
4739+
4740+        share_hash_chain_length = len(self._share_pieces['share_hash_chain'])
4741+        o3 = offsets['block_hash_tree'] = o2 + share_hash_chain_length
4742+
4743+        block_hash_tree_length = len(self._share_pieces['block_hash_tree'])
4744+        o4 = offsets['share_data'] = o3 + block_hash_tree_length
4745+
4746+        share_data_length = len(self._share_pieces['sharedata'])
4747+        o5 = offsets['enc_privkey'] = o4 + share_data_length
4748+
4749+        encprivkey_length = len(self._share_pieces['encprivkey'])
4750+        offsets['EOF'] = o5 + encprivkey_length
4751+        return offsets
4752+
4753+
4754+    def _get_offsets_tuple(self):
4755+        offsets = self._get_offsets_dict()
4756+        return tuple([(key, value) for key, value in offsets.items()])
4757+
4758+
4759+    def _pack_offsets(self):
4760+        offsets = self._get_offsets_dict()
4761+        return struct.pack(">LLLLQQ",
4762+                           offsets['signature'],
4763+                           offsets['share_hash_chain'],
4764+                           offsets['block_hash_tree'],
4765+                           offsets['share_data'],
4766+                           offsets['enc_privkey'],
4767+                           offsets['EOF'])
4768+
4769+
4770+    def finish_publishing(self):
4771+        """
4772+        Do anything necessary to finish writing the share to a remote
4773+        server. I require that no further publishing needs to take place
4774+        after this method has been called.
4775+        """
4776+        for k in ["sharedata", "encprivkey", "signature", "verification_key",
4777+                  "share_hash_chain", "block_hash_tree"]:
4778+            assert k in self._share_pieces
4779+        # This is the only method that actually writes something to the
4780+        # remote server.
4781+        # First, we need to pack the share into data that we can write
4782+        # to the remote server in one write.
4783+        offsets = self._pack_offsets()
4784+        prefix = self.get_signable()
4785+        final_share = "".join([prefix,
4786+                               offsets,
4787+                               self._share_pieces['verification_key'],
4788+                               self._share_pieces['signature'],
4789+                               self._share_pieces['share_hash_chain'],
4790+                               self._share_pieces['block_hash_tree'],
4791+                               self._share_pieces['sharedata'],
4792+                               self._share_pieces['encprivkey']])
4793+
4794+        # Our only data vector is going to be writing the final share,
4795+        # in its entirety.
4796+        datavs = [(0, final_share)]
4797+
4798+        if not self._testvs:
4799+            # Our caller has not provided us with another checkstring
4800+            # yet, so we assume that we are writing a new share, and set
4801+            # a test vector that will allow a new share to be written.
4802+            self._testvs = []
4803+            self._testvs.append(tuple([0, 1, "eq", ""]))
4804+
4805+        tw_vectors = {}
4806+        tw_vectors[self.shnum] = (self._testvs, datavs, None)
4807+        return self._rref.callRemote("slot_testv_and_readv_and_writev",
4808+                                     self._storage_index,
4809+                                     self._secrets,
4810+                                     tw_vectors,
4811+                                     # TODO is it useful to read something?
4812+                                     self._readvs)
4813+
4814+
4815+MDMFHEADER = ">BQ32sBBQQ QQQQQQ"
4816+MDMFHEADERWITHOUTOFFSETS = ">BQ32sBBQQ"
4817+MDMFHEADERSIZE = struct.calcsize(MDMFHEADER)
4818+MDMFHEADERWITHOUTOFFSETSSIZE = struct.calcsize(MDMFHEADERWITHOUTOFFSETS)
4819+MDMFCHECKSTRING = ">BQ32s"
4820+MDMFSIGNABLEHEADER = ">BQ32sBBQQ"
4821+MDMFOFFSETS = ">QQQQQQ"
4822+MDMFOFFSETS_LENGTH = struct.calcsize(MDMFOFFSETS)
4823+
4824+class MDMFSlotWriteProxy:
4825+    implements(IMutableSlotWriter)
4826+
4827+    """
4828+    I represent a remote write slot for an MDMF mutable file.
4829+
4830+    I abstract away from my caller the details of block and salt
4831+    management, and the implementation of the on-disk format for MDMF
4832+    shares.
4833+    """
4834+    # Expected layout, MDMF:
4835+    # offset:     size:       name:
4836+    #-- signed part --
4837+    # 0           1           version number (01)
4838+    # 1           8           sequence number
4839+    # 9           32          share tree root hash
4840+    # 41          1           The "k" encoding parameter
4841+    # 42          1           The "N" encoding parameter
4842+    # 43          8           The segment size of the uploaded file
4843+    # 51          8           The data length of the original plaintext
4844+    #-- end signed part --
4845+    # 59          8           The offset of the encrypted private key
4846+    # 83          8           The offset of the signature
4847+    # 91          8           The offset of the verification key
4848+    # 67          8           The offset of the block hash tree
4849+    # 75          8           The offset of the share hash chain
4850+    # 99          8           The offset of the EOF
4851+    #
4852+    # followed by salts and share data, the encrypted private key, the
4853+    # block hash tree, the salt hash tree, the share hash chain, a
4854+    # signature over the first eight fields, and a verification key.
4855+    #
4856+    # The checkstring is the first three fields -- the version number,
4857+    # sequence number, root hash and root salt hash. This is consistent
4858+    # in meaning to what we have with SDMF files, except now instead of
4859+    # using the literal salt, we use a value derived from all of the
4860+    # salts -- the share hash root.
4861+    #
4862+    # The salt is stored before the block for each segment. The block
4863+    # hash tree is computed over the combination of block and salt for
4864+    # each segment. In this way, we get integrity checking for both
4865+    # block and salt with the current block hash tree arrangement.
4866+    #
4867+    # The ordering of the offsets is different to reflect the dependencies
4868+    # that we'll run into with an MDMF file. The expected write flow is
4869+    # something like this:
4870+    #
4871+    #   0: Initialize with the sequence number, encoding parameters and
4872+    #      data length. From this, we can deduce the number of segments,
4873+    #      and where they should go.. We can also figure out where the
4874+    #      encrypted private key should go, because we can figure out how
4875+    #      big the share data will be.
4876+    #
4877+    #   1: Encrypt, encode, and upload the file in chunks. Do something
4878+    #      like
4879+    #
4880+    #       put_block(data, segnum, salt)
4881+    #
4882+    #      to write a block and a salt to the disk. We can do both of
4883+    #      these operations now because we have enough of the offsets to
4884+    #      know where to put them.
4885+    #
4886+    #   2: Put the encrypted private key. Use:
4887+    #
4888+    #        put_encprivkey(encprivkey)
4889+    #
4890+    #      Now that we know the length of the private key, we can fill
4891+    #      in the offset for the block hash tree.
4892+    #
4893+    #   3: We're now in a position to upload the block hash tree for
4894+    #      a share. Put that using something like:
4895+    #       
4896+    #        put_blockhashes(block_hash_tree)
4897+    #
4898+    #      Note that block_hash_tree is a list of hashes -- we'll take
4899+    #      care of the details of serializing that appropriately. When
4900+    #      we get the block hash tree, we are also in a position to
4901+    #      calculate the offset for the share hash chain, and fill that
4902+    #      into the offsets table.
4903+    #
4904+    #   4: At the same time, we're in a position to upload the salt hash
4905+    #      tree. This is a Merkle tree over all of the salts. We use a
4906+    #      Merkle tree so that we can validate each block,salt pair as
4907+    #      we download them later. We do this using
4908+    #
4909+    #        put_salthashes(salt_hash_tree)
4910+    #
4911+    #      When you do this, I automatically put the root of the tree
4912+    #      (the hash at index 0 of the list) in its appropriate slot in
4913+    #      the signed prefix of the share.
4914+    #
4915+    #   5: We're now in a position to upload the share hash chain for
4916+    #      a share. Do that with something like:
4917+    #     
4918+    #        put_sharehashes(share_hash_chain)
4919+    #
4920+    #      share_hash_chain should be a dictionary mapping shnums to
4921+    #      32-byte hashes -- the wrapper handles serialization.
4922+    #      We'll know where to put the signature at this point, also.
4923+    #      The root of this tree will be put explicitly in the next
4924+    #      step.
4925+    #
4926+    #      TODO: Why? Why not just include it in the tree here?
4927+    #
4928+    #   6: Before putting the signature, we must first put the
4929+    #      root_hash. Do this with:
4930+    #
4931+    #        put_root_hash(root_hash).
4932+    #     
4933+    #      In terms of knowing where to put this value, it was always
4934+    #      possible to place it, but it makes sense semantically to
4935+    #      place it after the share hash tree, so that's why you do it
4936+    #      in this order.
4937+    #
4938+    #   6: With the root hash put, we can now sign the header. Use:
4939+    #
4940+    #        get_signable()
4941+    #
4942+    #      to get the part of the header that you want to sign, and use:
4943+    #       
4944+    #        put_signature(signature)
4945+    #
4946+    #      to write your signature to the remote server.
4947+    #
4948+    #   6: Add the verification key, and finish. Do:
4949+    #
4950+    #        put_verification_key(key)
4951+    #
4952+    #      and
4953+    #
4954+    #        finish_publish()
4955+    #
4956+    # Checkstring management:
4957+    #
4958+    # To write to a mutable slot, we have to provide test vectors to ensure
4959+    # that we are writing to the same data that we think we are. These
4960+    # vectors allow us to detect uncoordinated writes; that is, writes
4961+    # where both we and some other shareholder are writing to the
4962+    # mutable slot, and to report those back to the parts of the program
4963+    # doing the writing.
4964+    #
4965+    # With SDMF, this was easy -- all of the share data was written in
4966+    # one go, so it was easy to detect uncoordinated writes, and we only
4967+    # had to do it once. With MDMF, not all of the file is written at
4968+    # once.
4969+    #
4970+    # If a share is new, we write out as much of the header as we can
4971+    # before writing out anything else. This gives other writers a
4972+    # canary that they can use to detect uncoordinated writes, and, if
4973+    # they do the same thing, gives us the same canary. We them update
4974+    # the share. We won't be able to write out two fields of the header
4975+    # -- the share tree hash and the salt hash -- until we finish
4976+    # writing out the share. We only require the writer to provide the
4977+    # initial checkstring, and keep track of what it should be after
4978+    # updates ourselves.
4979+    #
4980+    # If we haven't written anything yet, then on the first write (which
4981+    # will probably be a block + salt of a share), we'll also write out
4982+    # the header. On subsequent passes, we'll expect to see the header.
4983+    # This changes in two places:
4984+    #
4985+    #   - When we write out the salt hash
4986+    #   - When we write out the root of the share hash tree
4987+    #
4988+    # since these values will change the header. It is possible that we
4989+    # can just make those be written in one operation to minimize
4990+    # disruption.
4991+    def __init__(self,
4992+                 shnum,
4993+                 rref, # a remote reference to a storage server
4994+                 storage_index,
4995+                 secrets, # (write_enabler, renew_secret, cancel_secret)
4996+                 seqnum, # the sequence number of the mutable file
4997+                 required_shares,
4998+                 total_shares,
4999+                 segment_size,
5000+                 data_length): # the length of the original file
5001+        self.shnum = shnum
5002+        self._rref = rref
5003+        self._storage_index = storage_index
5004+        self._seqnum = seqnum
5005+        self._required_shares = required_shares
5006+        assert self.shnum >= 0 and self.shnum < total_shares
5007+        self._total_shares = total_shares
5008+        # We build up the offset table as we write things. It is the
5009+        # last thing we write to the remote server.
5010+        self._offsets = {}
5011+        self._testvs = []
5012+        # This is a list of write vectors that will be sent to our
5013+        # remote server once we are directed to write things there.
5014+        self._writevs = []
5015+        self._secrets = secrets
5016+        # The segment size needs to be a multiple of the k parameter --
5017+        # any padding should have been carried out by the publisher
5018+        # already.
5019+        assert segment_size % required_shares == 0
5020+        self._segment_size = segment_size
5021+        self._data_length = data_length
5022+
5023+        # These are set later -- we define them here so that we can
5024+        # check for their existence easily
5025+
5026+        # This is the root of the share hash tree -- the Merkle tree
5027+        # over the roots of the block hash trees computed for shares in
5028+        # this upload.
5029+        self._root_hash = None
5030+
5031+        # We haven't yet written anything to the remote bucket. By
5032+        # setting this, we tell the _write method as much. The write
5033+        # method will then know that it also needs to add a write vector
5034+        # for the checkstring (or what we have of it) to the first write
5035+        # request. We'll then record that value for future use.  If
5036+        # we're expecting something to be there already, we need to call
5037+        # set_checkstring before we write anything to tell the first
5038+        # write about that.
5039+        self._written = False
5040+
5041+        # When writing data to the storage servers, we get a read vector
5042+        # for free. We'll read the checkstring, which will help us
5043+        # figure out what's gone wrong if a write fails.
5044+        self._readv = [(0, struct.calcsize(MDMFCHECKSTRING))]
5045+
5046+        # We calculate the number of segments because it tells us
5047+        # where the salt part of the file ends/share segment begins,
5048+        # and also because it provides a useful amount of bounds checking.
5049+        self._num_segments = mathutil.div_ceil(self._data_length,
5050+                                               self._segment_size)
5051+        self._block_size = self._segment_size / self._required_shares
5052+        # We also calculate the share size, to help us with block
5053+        # constraints later.
5054+        tail_size = self._data_length % self._segment_size
5055+        if not tail_size:
5056+            self._tail_block_size = self._block_size
5057+        else:
5058+            self._tail_block_size = mathutil.next_multiple(tail_size,
5059+                                                           self._required_shares)
5060+            self._tail_block_size /= self._required_shares
5061+
5062+        # We already know where the sharedata starts; right after the end
5063+        # of the header (which is defined as the signable part + the offsets)
5064+        # We can also calculate where the encrypted private key begins
5065+        # from what we know know.
5066+        self._actual_block_size = self._block_size + SALT_SIZE
5067+        data_size = self._actual_block_size * (self._num_segments - 1)
5068+        data_size += self._tail_block_size
5069+        data_size += SALT_SIZE
5070+        self._offsets['enc_privkey'] = MDMFHEADERSIZE
5071+        self._offsets['enc_privkey'] += data_size
5072+        # We'll wait for the rest. Callers can now call my "put_block" and
5073+        # "set_checkstring" methods.
5074+
5075+
5076+    def set_checkstring(self,
5077+                        seqnum_or_checkstring,
5078+                        root_hash=None,
5079+                        salt=None):
5080+        """
5081+        Set checkstring checkstring for the given shnum.
5082+
5083+        This can be invoked in one of two ways.
5084+
5085+        With one argument, I assume that you are giving me a literal
5086+        checkstring -- e.g., the output of get_checkstring. I will then
5087+        set that checkstring as it is. This form is used by unit tests.
5088+
5089+        With two arguments, I assume that you are giving me a sequence
5090+        number and root hash to make a checkstring from. In that case, I
5091+        will build a checkstring and set it for you. This form is used
5092+        by the publisher.
5093+
5094+        By default, I assume that I am writing new shares to the grid.
5095+        If you don't explcitly set your own checkstring, I will use
5096+        one that requires that the remote share not exist. You will want
5097+        to use this method if you are updating a share in-place;
5098+        otherwise, writes will fail.
5099+        """
5100+        # You're allowed to overwrite checkstrings with this method;
5101+        # I assume that users know what they are doing when they call
5102+        # it.
5103+        if root_hash:
5104+            checkstring = struct.pack(MDMFCHECKSTRING,
5105+                                      1,
5106+                                      seqnum_or_checkstring,
5107+                                      root_hash)
5108+        else:
5109+            checkstring = seqnum_or_checkstring
5110+
5111+        if checkstring == "":
5112+            # We special-case this, since len("") = 0, but we need
5113+            # length of 1 for the case of an empty share to work on the
5114+            # storage server, which is what a checkstring that is the
5115+            # empty string means.
5116+            self._testvs = []
5117+        else:
5118+            self._testvs = []
5119+            self._testvs.append((0, len(checkstring), "eq", checkstring))
5120+
5121+
5122+    def __repr__(self):
5123+        return "MDMFSlotWriteProxy for share %d" % self.shnum
5124+
5125+
5126+    def get_checkstring(self):
5127+        """
5128+        Given a share number, I return a representation of what the
5129+        checkstring for that share on the server will look like.
5130+
5131+        I am mostly used for tests.
5132+        """
5133+        if self._root_hash:
5134+            roothash = self._root_hash
5135+        else:
5136+            roothash = "\x00" * 32
5137+        return struct.pack(MDMFCHECKSTRING,
5138+                           1,
5139+                           self._seqnum,
5140+                           roothash)
5141+
5142+
5143+    def put_block(self, data, segnum, salt):
5144+        """
5145+        I queue a write vector for the data, salt, and segment number
5146+        provided to me. I return None, as I do not actually cause
5147+        anything to be written yet.
5148+        """
5149+        if segnum >= self._num_segments:
5150+            raise LayoutInvalid("I won't overwrite the private key")
5151+        if len(salt) != SALT_SIZE:
5152+            raise LayoutInvalid("I was given a salt of size %d, but "
5153+                                "I wanted a salt of size %d")
5154+        if segnum + 1 == self._num_segments:
5155+            if len(data) != self._tail_block_size:
5156+                raise LayoutInvalid("I was given the wrong size block to write")
5157+        elif len(data) != self._block_size:
5158+            raise LayoutInvalid("I was given the wrong size block to write")
5159+
5160+        # We want to write at len(MDMFHEADER) + segnum * block_size.
5161+
5162+        offset = MDMFHEADERSIZE + (self._actual_block_size * segnum)
5163+        data = salt + data
5164+
5165+        self._writevs.append(tuple([offset, data]))
5166+
5167+
5168+    def put_encprivkey(self, encprivkey):
5169+        """
5170+        I queue a write vector for the encrypted private key provided to
5171+        me.
5172+        """
5173+        assert self._offsets
5174+        assert self._offsets['enc_privkey']
5175+        # You shouldn't re-write the encprivkey after the block hash
5176+        # tree is written, since that could cause the private key to run
5177+        # into the block hash tree. Before it writes the block hash
5178+        # tree, the block hash tree writing method writes the offset of
5179+        # the salt hash tree. So that's a good indicator of whether or
5180+        # not the block hash tree has been written.
5181+        if "share_hash_chain" in self._offsets:
5182+            raise LayoutInvalid("You must write this before the block hash tree")
5183+
5184+        self._offsets['block_hash_tree'] = self._offsets['enc_privkey'] + \
5185+            len(encprivkey)
5186+        self._writevs.append(tuple([self._offsets['enc_privkey'], encprivkey]))
5187+
5188+
5189+    def put_blockhashes(self, blockhashes):
5190+        """
5191+        I queue a write vector to put the block hash tree in blockhashes
5192+        onto the remote server.
5193+
5194+        The encrypted private key must be queued before the block hash
5195+        tree, since we need to know how large it is to know where the
5196+        block hash tree should go. The block hash tree must be put
5197+        before the salt hash tree, since its size determines the
5198+        offset of the share hash chain.
5199+        """
5200+        assert self._offsets
5201+        assert isinstance(blockhashes, list)
5202+        if "block_hash_tree" not in self._offsets:
5203+            raise LayoutInvalid("You must put the encrypted private key "
5204+                                "before you put the block hash tree")
5205+        # If written, the share hash chain causes the signature offset
5206+        # to be defined.
5207+        if "signature" in self._offsets:
5208+            raise LayoutInvalid("You must put the block hash tree before "
5209+                                "you put the share hash chain")
5210+        blockhashes_s = "".join(blockhashes)
5211+        self._offsets['share_hash_chain'] = self._offsets['block_hash_tree'] + len(blockhashes_s)
5212+
5213+        self._writevs.append(tuple([self._offsets['block_hash_tree'],
5214+                                  blockhashes_s]))
5215+
5216+
5217+    def put_sharehashes(self, sharehashes):
5218+        """
5219+        I queue a write vector to put the share hash chain in my
5220+        argument onto the remote server.
5221+
5222+        The salt hash tree must be queued before the share hash chain,
5223+        since we need to know where the salt hash tree ends before we
5224+        can know where the share hash chain starts. The share hash chain
5225+        must be put before the signature, since the length of the packed
5226+        share hash chain determines the offset of the signature. Also,
5227+        semantically, you must know what the root of the salt hash tree
5228+        is before you can generate a valid signature.
5229+        """
5230+        assert isinstance(sharehashes, dict)
5231+        if "share_hash_chain" not in self._offsets:
5232+            raise LayoutInvalid("You need to put the salt hash tree before "
5233+                                "you can put the share hash chain")
5234+        # The signature comes after the share hash chain. If the
5235+        # signature has already been written, we must not write another
5236+        # share hash chain. The signature writes the verification key
5237+        # offset when it gets sent to the remote server, so we look for
5238+        # that.
5239+        if "verification_key" in self._offsets:
5240+            raise LayoutInvalid("You must write the share hash chain "
5241+                                "before you write the signature")
5242+        sharehashes_s = "".join([struct.pack(">H32s", i, sharehashes[i])
5243+                                  for i in sorted(sharehashes.keys())])
5244+        self._offsets['signature'] = self._offsets['share_hash_chain'] + len(sharehashes_s)
5245+        self._writevs.append(tuple([self._offsets['share_hash_chain'],
5246+                            sharehashes_s]))
5247+
5248+
5249+    def put_root_hash(self, roothash):
5250+        """
5251+        Put the root hash (the root of the share hash tree) in the
5252+        remote slot.
5253+        """
5254+        # It does not make sense to be able to put the root
5255+        # hash without first putting the share hashes, since you need
5256+        # the share hashes to generate the root hash.
5257+        #
5258+        # Signature is defined by the routine that places the share hash
5259+        # chain, so it's a good thing to look for in finding out whether
5260+        # or not the share hash chain exists on the remote server.
5261+        if "signature" not in self._offsets:
5262+            raise LayoutInvalid("You need to put the share hash chain "
5263+                                "before you can put the root share hash")
5264+        if len(roothash) != HASH_SIZE:
5265+            raise LayoutInvalid("hashes and salts must be exactly %d bytes"
5266+                                 % HASH_SIZE)
5267+        self._root_hash = roothash
5268+        # To write both of these values, we update the checkstring on
5269+        # the remote server, which includes them
5270+        checkstring = self.get_checkstring()
5271+        self._writevs.append(tuple([0, checkstring]))
5272+        # This write, if successful, changes the checkstring, so we need
5273+        # to update our internal checkstring to be consistent with the
5274+        # one on the server.
5275+
5276+
5277+    def get_signable(self):
5278+        """
5279+        Get the first seven fields of the mutable file; the parts that
5280+        are signed.
5281+        """
5282+        if not self._root_hash:
5283+            raise LayoutInvalid("You need to set the root hash "
5284+                                "before getting something to "
5285+                                "sign")
5286+        return struct.pack(MDMFSIGNABLEHEADER,
5287+                           1,
5288+                           self._seqnum,
5289+                           self._root_hash,
5290+                           self._required_shares,
5291+                           self._total_shares,
5292+                           self._segment_size,
5293+                           self._data_length)
5294+
5295+
5296+    def put_signature(self, signature):
5297+        """
5298+        I queue a write vector for the signature of the MDMF share.
5299+
5300+        I require that the root hash and share hash chain have been put
5301+        to the grid before I will write the signature to the grid.
5302+        """
5303+        if "signature" not in self._offsets:
5304+            raise LayoutInvalid("You must put the share hash chain "
5305+        # It does not make sense to put a signature without first
5306+        # putting the root hash and the salt hash (since otherwise
5307+        # the signature would be incomplete), so we don't allow that.
5308+                       "before putting the signature")
5309+        if not self._root_hash:
5310+            raise LayoutInvalid("You must complete the signed prefix "
5311+                                "before computing a signature")
5312+        # If we put the signature after we put the verification key, we
5313+        # could end up running into the verification key, and will
5314+        # probably screw up the offsets as well. So we don't allow that.
5315+        # The method that writes the verification key defines the EOF
5316+        # offset before writing the verification key, so look for that.
5317+        if "EOF" in self._offsets:
5318+            raise LayoutInvalid("You must write the signature before the verification key")
5319+
5320+        self._offsets['verification_key'] = self._offsets['signature'] + len(signature)
5321+        self._writevs.append(tuple([self._offsets['signature'], signature]))
5322+
5323+
5324+    def put_verification_key(self, verification_key):
5325+        """
5326+        I queue a write vector for the verification key.
5327+
5328+        I require that the signature have been written to the storage
5329+        server before I allow the verification key to be written to the
5330+        remote server.
5331+        """
5332+        if "verification_key" not in self._offsets:
5333+            raise LayoutInvalid("You must put the signature before you "
5334+                                "can put the verification key")
5335+        self._offsets['EOF'] = self._offsets['verification_key'] + len(verification_key)
5336+        self._writevs.append(tuple([self._offsets['verification_key'],
5337+                            verification_key]))
5338+
5339+
5340+    def _get_offsets_tuple(self):
5341+        return tuple([(key, value) for key, value in self._offsets.items()])
5342+
5343+
5344+    def get_verinfo(self):
5345+        return (self._seqnum,
5346+                self._root_hash,
5347+                self._required_shares,
5348+                self._total_shares,
5349+                self._segment_size,
5350+                self._data_length,
5351+                self.get_signable(),
5352+                self._get_offsets_tuple())
5353+
5354+
5355+    def finish_publishing(self):
5356+        """
5357+        I add a write vector for the offsets table, and then cause all
5358+        of the write vectors that I've dealt with so far to be published
5359+        to the remote server, ending the write process.
5360+        """
5361+        if "EOF" not in self._offsets:
5362+            raise LayoutInvalid("You must put the verification key before "
5363+                                "you can publish the offsets")
5364+        offsets_offset = struct.calcsize(MDMFHEADERWITHOUTOFFSETS)
5365+        offsets = struct.pack(MDMFOFFSETS,
5366+                              self._offsets['enc_privkey'],
5367+                              self._offsets['block_hash_tree'],
5368+                              self._offsets['share_hash_chain'],
5369+                              self._offsets['signature'],
5370+                              self._offsets['verification_key'],
5371+                              self._offsets['EOF'])
5372+        self._writevs.append(tuple([offsets_offset, offsets]))
5373+        encoding_parameters_offset = struct.calcsize(MDMFCHECKSTRING)
5374+        params = struct.pack(">BBQQ",
5375+                             self._required_shares,
5376+                             self._total_shares,
5377+                             self._segment_size,
5378+                             self._data_length)
5379+        self._writevs.append(tuple([encoding_parameters_offset, params]))
5380+        return self._write(self._writevs)
5381+
5382+
5383+    def _write(self, datavs, on_failure=None, on_success=None):
5384+        """I write the data vectors in datavs to the remote slot."""
5385+        tw_vectors = {}
5386+        if not self._testvs:
5387+            self._testvs = []
5388+            self._testvs.append(tuple([0, 1, "eq", ""]))
5389+        if not self._written:
5390+            # Write a new checkstring to the share when we write it, so
5391+            # that we have something to check later.
5392+            new_checkstring = self.get_checkstring()
5393+            datavs.append((0, new_checkstring))
5394+            def _first_write():
5395+                self._written = True
5396+                self._testvs = [(0, len(new_checkstring), "eq", new_checkstring)]
5397+            on_success = _first_write
5398+        tw_vectors[self.shnum] = (self._testvs, datavs, None)
5399+        d = self._rref.callRemote("slot_testv_and_readv_and_writev",
5400+                                  self._storage_index,
5401+                                  self._secrets,
5402+                                  tw_vectors,
5403+                                  self._readv)
5404+        def _result(results):
5405+            if isinstance(results, failure.Failure) or not results[0]:
5406+                # Do nothing; the write was unsuccessful.
5407+                if on_failure: on_failure()
5408+            else:
5409+                if on_success: on_success()
5410+            return results
5411+        d.addCallback(_result)
5412+        return d
5413+
5414+
5415+class MDMFSlotReadProxy:
5416+    """
5417+    I read from a mutable slot filled with data written in the MDMF data
5418+    format (which is described above).
5419+
5420+    I can be initialized with some amount of data, which I will use (if
5421+    it is valid) to eliminate some of the need to fetch it from servers.
5422+    """
5423+    def __init__(self,
5424+                 rref,
5425+                 storage_index,
5426+                 shnum,
5427+                 data=""):
5428+        # Start the initialization process.
5429+        self._rref = rref
5430+        self._storage_index = storage_index
5431+        self.shnum = shnum
5432+
5433+        # Before doing anything, the reader is probably going to want to
5434+        # verify that the signature is correct. To do that, they'll need
5435+        # the verification key, and the signature. To get those, we'll
5436+        # need the offset table. So fetch the offset table on the
5437+        # assumption that that will be the first thing that a reader is
5438+        # going to do.
5439+
5440+        # The fact that these encoding parameters are None tells us
5441+        # that we haven't yet fetched them from the remote share, so we
5442+        # should. We could just not set them, but the checks will be
5443+        # easier to read if we don't have to use hasattr.
5444+        self._version_number = None
5445+        self._sequence_number = None
5446+        self._root_hash = None
5447+        # Filled in if we're dealing with an SDMF file. Unused
5448+        # otherwise.
5449+        self._salt = None
5450+        self._required_shares = None
5451+        self._total_shares = None
5452+        self._segment_size = None
5453+        self._data_length = None
5454+        self._offsets = None
5455+
5456+        # If the user has chosen to initialize us with some data, we'll
5457+        # try to satisfy subsequent data requests with that data before
5458+        # asking the storage server for it. If
5459+        self._data = data
5460+        # The way callers interact with cache in the filenode returns
5461+        # None if there isn't any cached data, but the way we index the
5462+        # cached data requires a string, so convert None to "".
5463+        if self._data == None:
5464+            self._data = ""
5465+
5466+        self._queue_observers = observer.ObserverList()
5467+        self._queue_errbacks = observer.ObserverList()
5468+        self._readvs = []
5469+
5470+
5471+    def _maybe_fetch_offsets_and_header(self, force_remote=False):
5472+        """
5473+        I fetch the offset table and the header from the remote slot if
5474+        I don't already have them. If I do have them, I do nothing and
5475+        return an empty Deferred.
5476+        """
5477+        if self._offsets:
5478+            return defer.succeed(None)
5479+        # At this point, we may be either SDMF or MDMF. Fetching 107
5480+        # bytes will be enough to get header and offsets for both SDMF and
5481+        # MDMF, though we'll be left with 4 more bytes than we
5482+        # need if this ends up being MDMF. This is probably less
5483+        # expensive than the cost of a second roundtrip.
5484+        readvs = [(0, 107)]
5485+        d = self._read(readvs, force_remote)
5486+        d.addCallback(self._process_encoding_parameters)
5487+        d.addCallback(self._process_offsets)
5488+        return d
5489+
5490+
5491+    def _process_encoding_parameters(self, encoding_parameters):
5492+        assert self.shnum in encoding_parameters
5493+        encoding_parameters = encoding_parameters[self.shnum][0]
5494+        # The first byte is the version number. It will tell us what
5495+        # to do next.
5496+        (verno,) = struct.unpack(">B", encoding_parameters[:1])
5497+        if verno == MDMF_VERSION:
5498+            read_size = MDMFHEADERWITHOUTOFFSETSSIZE
5499+            (verno,
5500+             seqnum,
5501+             root_hash,
5502+             k,
5503+             n,
5504+             segsize,
5505+             datalen) = struct.unpack(MDMFHEADERWITHOUTOFFSETS,
5506+                                      encoding_parameters[:read_size])
5507+            if segsize == 0 and datalen == 0:
5508+                # Empty file, no segments.
5509+                self._num_segments = 0
5510+            else:
5511+                self._num_segments = mathutil.div_ceil(datalen, segsize)
5512+
5513+        elif verno == SDMF_VERSION:
5514+            read_size = SIGNED_PREFIX_LENGTH
5515+            (verno,
5516+             seqnum,
5517+             root_hash,
5518+             salt,
5519+             k,
5520+             n,
5521+             segsize,
5522+             datalen) = struct.unpack(">BQ32s16s BBQQ",
5523+                                encoding_parameters[:SIGNED_PREFIX_LENGTH])
5524+            self._salt = salt
5525+            if segsize == 0 and datalen == 0:
5526+                # empty file
5527+                self._num_segments = 0
5528+            else:
5529+                # non-empty SDMF files have one segment.
5530+                self._num_segments = 1
5531+        else:
5532+            raise UnknownVersionError("You asked me to read mutable file "
5533+                                      "version %d, but I only understand "
5534+                                      "%d and %d" % (verno, SDMF_VERSION,
5535+                                                     MDMF_VERSION))
5536+
5537+        self._version_number = verno
5538+        self._sequence_number = seqnum
5539+        self._root_hash = root_hash
5540+        self._required_shares = k
5541+        self._total_shares = n
5542+        self._segment_size = segsize
5543+        self._data_length = datalen
5544+
5545+        self._block_size = self._segment_size / self._required_shares
5546+        # We can upload empty files, and need to account for this fact
5547+        # so as to avoid zero-division and zero-modulo errors.
5548+        if datalen > 0:
5549+            tail_size = self._data_length % self._segment_size
5550+        else:
5551+            tail_size = 0
5552+        if not tail_size:
5553+            self._tail_block_size = self._block_size
5554+        else:
5555+            self._tail_block_size = mathutil.next_multiple(tail_size,
5556+                                                    self._required_shares)
5557+            self._tail_block_size /= self._required_shares
5558+
5559+        return encoding_parameters
5560+
5561+
5562+    def _process_offsets(self, offsets):
5563+        if self._version_number == 0:
5564+            read_size = OFFSETS_LENGTH
5565+            read_offset = SIGNED_PREFIX_LENGTH
5566+            end = read_size + read_offset
5567+            (signature,
5568+             share_hash_chain,
5569+             block_hash_tree,
5570+             share_data,
5571+             enc_privkey,
5572+             EOF) = struct.unpack(">LLLLQQ",
5573+                                  offsets[read_offset:end])
5574+            self._offsets = {}
5575+            self._offsets['signature'] = signature
5576+            self._offsets['share_data'] = share_data
5577+            self._offsets['block_hash_tree'] = block_hash_tree
5578+            self._offsets['share_hash_chain'] = share_hash_chain
5579+            self._offsets['enc_privkey'] = enc_privkey
5580+            self._offsets['EOF'] = EOF
5581+
5582+        elif self._version_number == 1:
5583+            read_offset = MDMFHEADERWITHOUTOFFSETSSIZE
5584+            read_length = MDMFOFFSETS_LENGTH
5585+            end = read_offset + read_length
5586+            (encprivkey,
5587+             blockhashes,
5588+             sharehashes,
5589+             signature,
5590+             verification_key,
5591+             eof) = struct.unpack(MDMFOFFSETS,
5592+                                  offsets[read_offset:end])
5593+            self._offsets = {}
5594+            self._offsets['enc_privkey'] = encprivkey
5595+            self._offsets['block_hash_tree'] = blockhashes
5596+            self._offsets['share_hash_chain'] = sharehashes
5597+            self._offsets['signature'] = signature
5598+            self._offsets['verification_key'] = verification_key
5599+            self._offsets['EOF'] = eof
5600+
5601+
5602+    def get_block_and_salt(self, segnum, queue=False):
5603+        """
5604+        I return (block, salt), where block is the block data and
5605+        salt is the salt used to encrypt that segment.
5606+        """
5607+        d = self._maybe_fetch_offsets_and_header()
5608+        def _then(ignored):
5609+            if self._version_number == 1:
5610+                base_share_offset = MDMFHEADERSIZE
5611+            else:
5612+                base_share_offset = self._offsets['share_data']
5613+
5614+            if segnum + 1 > self._num_segments:
5615+                raise LayoutInvalid("Not a valid segment number")
5616+
5617+            if self._version_number == 0:
5618+                share_offset = base_share_offset + self._block_size * segnum
5619+            else:
5620+                share_offset = base_share_offset + (self._block_size + \
5621+                                                    SALT_SIZE) * segnum
5622+            if segnum + 1 == self._num_segments:
5623+                data = self._tail_block_size
5624+            else:
5625+                data = self._block_size
5626+
5627+            if self._version_number == 1:
5628+                data += SALT_SIZE
5629+
5630+            readvs = [(share_offset, data)]
5631+            return readvs
5632+        d.addCallback(_then)
5633+        d.addCallback(lambda readvs:
5634+            self._read(readvs, queue=queue))
5635+        def _process_results(results):
5636+            assert self.shnum in results
5637+            if self._version_number == 0:
5638+                # We only read the share data, but we know the salt from
5639+                # when we fetched the header
5640+                data = results[self.shnum]
5641+                if not data:
5642+                    data = ""
5643+                else:
5644+                    assert len(data) == 1
5645+                    data = data[0]
5646+                salt = self._salt
5647+            else:
5648+                data = results[self.shnum]
5649+                if not data:
5650+                    salt = data = ""
5651+                else:
5652+                    salt_and_data = results[self.shnum][0]
5653+                    salt = salt_and_data[:SALT_SIZE]
5654+                    data = salt_and_data[SALT_SIZE:]
5655+            return data, salt
5656+        d.addCallback(_process_results)
5657+        return d
5658+
5659+
5660+    def get_blockhashes(self, needed=None, queue=False, force_remote=False):
5661+        """
5662+        I return the block hash tree
5663+
5664+        I take an optional argument, needed, which is a set of indices
5665+        correspond to hashes that I should fetch. If this argument is
5666+        missing, I will fetch the entire block hash tree; otherwise, I
5667+        may attempt to fetch fewer hashes, based on what needed says
5668+        that I should do. Note that I may fetch as many hashes as I
5669+        want, so long as the set of hashes that I do fetch is a superset
5670+        of the ones that I am asked for, so callers should be prepared
5671+        to tolerate additional hashes.
5672+        """
5673+        # TODO: Return only the parts of the block hash tree necessary
5674+        # to validate the blocknum provided?
5675+        # This is a good idea, but it is hard to implement correctly. It
5676+        # is bad to fetch any one block hash more than once, so we
5677+        # probably just want to fetch the whole thing at once and then
5678+        # serve it.
5679+        if needed == set([]):
5680+            return defer.succeed([])
5681+        d = self._maybe_fetch_offsets_and_header()
5682+        def _then(ignored):
5683+            blockhashes_offset = self._offsets['block_hash_tree']
5684+            if self._version_number == 1:
5685+                blockhashes_length = self._offsets['share_hash_chain'] - blockhashes_offset
5686+            else:
5687+                blockhashes_length = self._offsets['share_data'] - blockhashes_offset
5688+            readvs = [(blockhashes_offset, blockhashes_length)]
5689+            return readvs
5690+        d.addCallback(_then)
5691+        d.addCallback(lambda readvs:
5692+            self._read(readvs, queue=queue, force_remote=force_remote))
5693+        def _build_block_hash_tree(results):
5694+            assert self.shnum in results
5695+
5696+            rawhashes = results[self.shnum][0]
5697+            results = [rawhashes[i:i+HASH_SIZE]
5698+                       for i in range(0, len(rawhashes), HASH_SIZE)]
5699+            return results
5700+        d.addCallback(_build_block_hash_tree)
5701+        return d
5702+
5703+
5704+    def get_sharehashes(self, needed=None, queue=False, force_remote=False):
5705+        """
5706+        I return the part of the share hash chain placed to validate
5707+        this share.
5708+
5709+        I take an optional argument, needed. Needed is a set of indices
5710+        that correspond to the hashes that I should fetch. If needed is
5711+        not present, I will fetch and return the entire share hash
5712+        chain. Otherwise, I may fetch and return any part of the share
5713+        hash chain that is a superset of the part that I am asked to
5714+        fetch. Callers should be prepared to deal with more hashes than
5715+        they've asked for.
5716+        """
5717+        if needed == set([]):
5718+            return defer.succeed([])
5719+        d = self._maybe_fetch_offsets_and_header()
5720+
5721+        def _make_readvs(ignored):
5722+            sharehashes_offset = self._offsets['share_hash_chain']
5723+            if self._version_number == 0:
5724+                sharehashes_length = self._offsets['block_hash_tree'] - sharehashes_offset
5725+            else:
5726+                sharehashes_length = self._offsets['signature'] - sharehashes_offset
5727+            readvs = [(sharehashes_offset, sharehashes_length)]
5728+            return readvs
5729+        d.addCallback(_make_readvs)
5730+        d.addCallback(lambda readvs:
5731+            self._read(readvs, queue=queue, force_remote=force_remote))
5732+        def _build_share_hash_chain(results):
5733+            assert self.shnum in results
5734+
5735+            sharehashes = results[self.shnum][0]
5736+            results = [sharehashes[i:i+(HASH_SIZE + 2)]
5737+                       for i in range(0, len(sharehashes), HASH_SIZE + 2)]
5738+            results = dict([struct.unpack(">H32s", data)
5739+                            for data in results])
5740+            return results
5741+        d.addCallback(_build_share_hash_chain)
5742+        return d
5743+
5744+
5745+    def get_encprivkey(self, queue=False):
5746+        """
5747+        I return the encrypted private key.
5748+        """
5749+        d = self._maybe_fetch_offsets_and_header()
5750+
5751+        def _make_readvs(ignored):
5752+            privkey_offset = self._offsets['enc_privkey']
5753+            if self._version_number == 0:
5754+                privkey_length = self._offsets['EOF'] - privkey_offset
5755+            else:
5756+                privkey_length = self._offsets['block_hash_tree'] - privkey_offset
5757+            readvs = [(privkey_offset, privkey_length)]
5758+            return readvs
5759+        d.addCallback(_make_readvs)
5760+        d.addCallback(lambda readvs:
5761+            self._read(readvs, queue=queue))
5762+        def _process_results(results):
5763+            assert self.shnum in results
5764+            privkey = results[self.shnum][0]
5765+            return privkey
5766+        d.addCallback(_process_results)
5767+        return d
5768+
5769+
5770+    def get_signature(self, queue=False):
5771+        """
5772+        I return the signature of my share.
5773+        """
5774+        d = self._maybe_fetch_offsets_and_header()
5775+
5776+        def _make_readvs(ignored):
5777+            signature_offset = self._offsets['signature']
5778+            if self._version_number == 1:
5779+                signature_length = self._offsets['verification_key'] - signature_offset
5780+            else:
5781+                signature_length = self._offsets['share_hash_chain'] - signature_offset
5782+            readvs = [(signature_offset, signature_length)]
5783+            return readvs
5784+        d.addCallback(_make_readvs)
5785+        d.addCallback(lambda readvs:
5786+            self._read(readvs, queue=queue))
5787+        def _process_results(results):
5788+            assert self.shnum in results
5789+            signature = results[self.shnum][0]
5790+            return signature
5791+        d.addCallback(_process_results)
5792+        return d
5793+
5794+
5795+    def get_verification_key(self, queue=False):
5796+        """
5797+        I return the verification key.
5798+        """
5799+        d = self._maybe_fetch_offsets_and_header()
5800+
5801+        def _make_readvs(ignored):
5802+            if self._version_number == 1:
5803+                vk_offset = self._offsets['verification_key']
5804+                vk_length = self._offsets['EOF'] - vk_offset
5805+            else:
5806+                vk_offset = struct.calcsize(">BQ32s16sBBQQLLLLQQ")
5807+                vk_length = self._offsets['signature'] - vk_offset
5808+            readvs = [(vk_offset, vk_length)]
5809+            return readvs
5810+        d.addCallback(_make_readvs)
5811+        d.addCallback(lambda readvs:
5812+            self._read(readvs, queue=queue))
5813+        def _process_results(results):
5814+            assert self.shnum in results
5815+            verification_key = results[self.shnum][0]
5816+            return verification_key
5817+        d.addCallback(_process_results)
5818+        return d
5819+
5820+
5821+    def get_encoding_parameters(self):
5822+        """
5823+        I return (k, n, segsize, datalen)
5824+        """
5825+        d = self._maybe_fetch_offsets_and_header()
5826+        d.addCallback(lambda ignored:
5827+            (self._required_shares,
5828+             self._total_shares,
5829+             self._segment_size,
5830+             self._data_length))
5831+        return d
5832+
5833+
5834+    def get_seqnum(self):
5835+        """
5836+        I return the sequence number for this share.
5837+        """
5838+        d = self._maybe_fetch_offsets_and_header()
5839+        d.addCallback(lambda ignored:
5840+            self._sequence_number)
5841+        return d
5842+
5843+
5844+    def get_root_hash(self):
5845+        """
5846+        I return the root of the block hash tree
5847+        """
5848+        d = self._maybe_fetch_offsets_and_header()
5849+        d.addCallback(lambda ignored: self._root_hash)
5850+        return d
5851+
5852+
5853+    def get_checkstring(self):
5854+        """
5855+        I return the packed representation of the following:
5856+
5857+            - version number
5858+            - sequence number
5859+            - root hash
5860+            - salt hash
5861+
5862+        which my users use as a checkstring to detect other writers.
5863+        """
5864+        d = self._maybe_fetch_offsets_and_header()
5865+        def _build_checkstring(ignored):
5866+            if self._salt:
5867+                checkstring = struct.pack(PREFIX,
5868+                                          self._version_number,
5869+                                          self._sequence_number,
5870+                                          self._root_hash,
5871+                                          self._salt)
5872+            else:
5873+                checkstring = struct.pack(MDMFCHECKSTRING,
5874+                                          self._version_number,
5875+                                          self._sequence_number,
5876+                                          self._root_hash)
5877+
5878+            return checkstring
5879+        d.addCallback(_build_checkstring)
5880+        return d
5881+
5882+
5883+    def get_prefix(self, force_remote):
5884+        d = self._maybe_fetch_offsets_and_header(force_remote)
5885+        d.addCallback(lambda ignored:
5886+            self._build_prefix())
5887+        return d
5888+
5889+
5890+    def _build_prefix(self):
5891+        # The prefix is another name for the part of the remote share
5892+        # that gets signed. It consists of everything up to and
5893+        # including the datalength, packed by struct.
5894+        if self._version_number == SDMF_VERSION:
5895+            return struct.pack(SIGNED_PREFIX,
5896+                           self._version_number,
5897+                           self._sequence_number,
5898+                           self._root_hash,
5899+                           self._salt,
5900+                           self._required_shares,
5901+                           self._total_shares,
5902+                           self._segment_size,
5903+                           self._data_length)
5904+
5905+        else:
5906+            return struct.pack(MDMFSIGNABLEHEADER,
5907+                           self._version_number,
5908+                           self._sequence_number,
5909+                           self._root_hash,
5910+                           self._required_shares,
5911+                           self._total_shares,
5912+                           self._segment_size,
5913+                           self._data_length)
5914+
5915+
5916+    def _get_offsets_tuple(self):
5917+        # The offsets tuple is another component of the version
5918+        # information tuple. It is basically our offsets dictionary,
5919+        # itemized and in a tuple.
5920+        return self._offsets.copy()
5921+
5922+
5923+    def get_verinfo(self):
5924+        """
5925+        I return my verinfo tuple. This is used by the ServermapUpdater
5926+        to keep track of versions of mutable files.
5927+
5928+        The verinfo tuple for MDMF files contains:
5929+            - seqnum
5930+            - root hash
5931+            - a blank (nothing)
5932+            - segsize
5933+            - datalen
5934+            - k
5935+            - n
5936+            - prefix (the thing that you sign)
5937+            - a tuple of offsets
5938+
5939+        We include the nonce in MDMF to simplify processing of version
5940+        information tuples.
5941+
5942+        The verinfo tuple for SDMF files is the same, but contains a
5943+        16-byte IV instead of a hash of salts.
5944+        """
5945+        d = self._maybe_fetch_offsets_and_header()
5946+        def _build_verinfo(ignored):
5947+            if self._version_number == SDMF_VERSION:
5948+                salt_to_use = self._salt
5949+            else:
5950+                salt_to_use = None
5951+            return (self._sequence_number,
5952+                    self._root_hash,
5953+                    salt_to_use,
5954+                    self._segment_size,
5955+                    self._data_length,
5956+                    self._required_shares,
5957+                    self._total_shares,
5958+                    self._build_prefix(),
5959+                    self._get_offsets_tuple())
5960+        d.addCallback(_build_verinfo)
5961+        return d
5962+
5963+
5964+    def flush(self):
5965+        """
5966+        I flush my queue of read vectors.
5967+        """
5968+        d = self._read(self._readvs)
5969+        def _then(results):
5970+            self._readvs = []
5971+            if isinstance(results, failure.Failure):
5972+                self._queue_errbacks.notify(results)
5973+            else:
5974+                self._queue_observers.notify(results)
5975+            self._queue_observers = observer.ObserverList()
5976+            self._queue_errbacks = observer.ObserverList()
5977+        d.addBoth(_then)
5978+
5979+
5980+    def _read(self, readvs, force_remote=False, queue=False):
5981+        unsatisfiable = filter(lambda x: x[0] + x[1] > len(self._data), readvs)
5982+        # TODO: It's entirely possible to tweak this so that it just
5983+        # fulfills the requests that it can, and not demand that all
5984+        # requests are satisfiable before running it.
5985+        if not unsatisfiable and not force_remote:
5986+            results = [self._data[offset:offset+length]
5987+                       for (offset, length) in readvs]
5988+            results = {self.shnum: results}
5989+            return defer.succeed(results)
5990+        else:
5991+            if queue:
5992+                start = len(self._readvs)
5993+                self._readvs += readvs
5994+                end = len(self._readvs)
5995+                def _get_results(results, start, end):
5996+                    if not self.shnum in results:
5997+                        return {self._shnum: [""]}
5998+                    return {self.shnum: results[self.shnum][start:end]}
5999+                d = defer.Deferred()
6000+                d.addCallback(_get_results, start, end)
6001+                self._queue_observers.subscribe(d.callback)
6002+                self._queue_errbacks.subscribe(d.errback)
6003+                return d
6004+            return self._rref.callRemote("slot_readv",
6005+                                         self._storage_index,
6006+                                         [self.shnum],
6007+                                         readvs)
6008+
6009+
6010+    def is_sdmf(self):
6011+        """I tell my caller whether or not my remote file is SDMF or MDMF
6012+        """
6013+        d = self._maybe_fetch_offsets_and_header()
6014+        d.addCallback(lambda ignored:
6015+            self._version_number == 0)
6016+        return d
6017+
6018+
6019+class LayoutInvalid(Exception):
6020+    """
6021+    This isn't a valid MDMF mutable file
6022+    """
6023merger 0.0 (
6024hunk ./src/allmydata/test/test_storage.py 3
6025-from allmydata.util import log
6026-
6027merger 0.0 (
6028hunk ./src/allmydata/test/test_storage.py 3
6029-import time, os.path, stat, re, simplejson, struct
6030+from allmydata.util import log
6031+
6032+import mock
6033hunk ./src/allmydata/test/test_storage.py 3
6034-import time, os.path, stat, re, simplejson, struct
6035+import time, os.path, stat, re, simplejson, struct, shutil
6036)
6037)
6038hunk ./src/allmydata/test/test_storage.py 23
6039 from allmydata.storage.expirer import LeaseCheckingCrawler
6040 from allmydata.immutable.layout import WriteBucketProxy, WriteBucketProxy_v2, \
6041      ReadBucketProxy
6042-from allmydata.interfaces import BadWriteEnablerError
6043-from allmydata.test.common import LoggingServiceParent
6044+from allmydata.mutable.layout import MDMFSlotWriteProxy, MDMFSlotReadProxy, \
6045+                                     LayoutInvalid, MDMFSIGNABLEHEADER, \
6046+                                     SIGNED_PREFIX, MDMFHEADER, \
6047+                                     MDMFOFFSETS, SDMFSlotWriteProxy
6048+from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \
6049+                                 SDMF_VERSION
6050+from allmydata.test.common import LoggingServiceParent, ShouldFailMixin
6051 from allmydata.test.common_web import WebRenderingMixin
6052 from allmydata.web.storage import StorageStatus, remove_prefix
6053 
6054hunk ./src/allmydata/test/test_storage.py 107
6055 
6056 class RemoteBucket:
6057 
6058+    def __init__(self):
6059+        self.read_count = 0
6060+        self.write_count = 0
6061+
6062     def callRemote(self, methname, *args, **kwargs):
6063         def _call():
6064             meth = getattr(self.target, "remote_" + methname)
6065hunk ./src/allmydata/test/test_storage.py 115
6066             return meth(*args, **kwargs)
6067+
6068+        if methname == "slot_readv":
6069+            self.read_count += 1
6070+        if "writev" in methname:
6071+            self.write_count += 1
6072+
6073         return defer.maybeDeferred(_call)
6074 
6075hunk ./src/allmydata/test/test_storage.py 123
6076+
6077 class BucketProxy(unittest.TestCase):
6078     def make_bucket(self, name, size):
6079         basedir = os.path.join("storage", "BucketProxy", name)
6080hunk ./src/allmydata/test/test_storage.py 1306
6081         self.failUnless(os.path.exists(prefixdir), prefixdir)
6082         self.failIf(os.path.exists(bucketdir), bucketdir)
6083 
6084+
6085+class MDMFProxies(unittest.TestCase, ShouldFailMixin):
6086+    def setUp(self):
6087+        self.sparent = LoggingServiceParent()
6088+        self._lease_secret = itertools.count()
6089+        self.ss = self.create("MDMFProxies storage test server")
6090+        self.rref = RemoteBucket()
6091+        self.rref.target = self.ss
6092+        self.secrets = (self.write_enabler("we_secret"),
6093+                        self.renew_secret("renew_secret"),
6094+                        self.cancel_secret("cancel_secret"))
6095+        self.segment = "aaaaaa"
6096+        self.block = "aa"
6097+        self.salt = "a" * 16
6098+        self.block_hash = "a" * 32
6099+        self.block_hash_tree = [self.block_hash for i in xrange(6)]
6100+        self.share_hash = self.block_hash
6101+        self.share_hash_chain = dict([(i, self.share_hash) for i in xrange(6)])
6102+        self.signature = "foobarbaz"
6103+        self.verification_key = "vvvvvv"
6104+        self.encprivkey = "private"
6105+        self.root_hash = self.block_hash
6106+        self.salt_hash = self.root_hash
6107+        self.salt_hash_tree = [self.salt_hash for i in xrange(6)]
6108+        self.block_hash_tree_s = self.serialize_blockhashes(self.block_hash_tree)
6109+        self.share_hash_chain_s = self.serialize_sharehashes(self.share_hash_chain)
6110+        # blockhashes and salt hashes are serialized in the same way,
6111+        # only we lop off the first element and store that in the
6112+        # header.
6113+        self.salt_hash_tree_s = self.serialize_blockhashes(self.salt_hash_tree[1:])
6114+
6115+
6116+    def tearDown(self):
6117+        self.sparent.stopService()
6118+        shutil.rmtree(self.workdir("MDMFProxies storage test server"))
6119+
6120+
6121+    def write_enabler(self, we_tag):
6122+        return hashutil.tagged_hash("we_blah", we_tag)
6123+
6124+
6125+    def renew_secret(self, tag):
6126+        return hashutil.tagged_hash("renew_blah", str(tag))
6127+
6128+
6129+    def cancel_secret(self, tag):
6130+        return hashutil.tagged_hash("cancel_blah", str(tag))
6131+
6132+
6133+    def workdir(self, name):
6134+        basedir = os.path.join("storage", "MutableServer", name)
6135+        return basedir
6136+
6137+
6138+    def create(self, name):
6139+        workdir = self.workdir(name)
6140+        ss = StorageServer(workdir, "\x00" * 20)
6141+        ss.setServiceParent(self.sparent)
6142+        return ss
6143+
6144+
6145+    def build_test_mdmf_share(self, tail_segment=False, empty=False):
6146+        # Start with the checkstring
6147+        data = struct.pack(">BQ32s",
6148+                           1,
6149+                           0,
6150+                           self.root_hash)
6151+        self.checkstring = data
6152+        # Next, the encoding parameters
6153+        if tail_segment:
6154+            data += struct.pack(">BBQQ",
6155+                                3,
6156+                                10,
6157+                                6,
6158+                                33)
6159+        elif empty:
6160+            data += struct.pack(">BBQQ",
6161+                                3,
6162+                                10,
6163+                                0,
6164+                                0)
6165+        else:
6166+            data += struct.pack(">BBQQ",
6167+                                3,
6168+                                10,
6169+                                6,
6170+                                36)
6171+        # Now we'll build the offsets.
6172+        sharedata = ""
6173+        if not tail_segment and not empty:
6174+            for i in xrange(6):
6175+                sharedata += self.salt + self.block
6176+        elif tail_segment:
6177+            for i in xrange(5):
6178+                sharedata += self.salt + self.block
6179+            sharedata += self.salt + "a"
6180+
6181+        # The encrypted private key comes after the shares + salts
6182+        offset_size = struct.calcsize(MDMFOFFSETS)
6183+        encrypted_private_key_offset = len(data) + offset_size + len(sharedata)
6184+        # The blockhashes come after the private key
6185+        blockhashes_offset = encrypted_private_key_offset + len(self.encprivkey)
6186+        # The sharehashes come after the salt hashes
6187+        sharehashes_offset = blockhashes_offset + len(self.block_hash_tree_s)
6188+        # The signature comes after the share hash chain
6189+        signature_offset = sharehashes_offset + len(self.share_hash_chain_s)
6190+        # The verification key comes after the signature
6191+        verification_offset = signature_offset + len(self.signature)
6192+        # The EOF comes after the verification key
6193+        eof_offset = verification_offset + len(self.verification_key)
6194+        data += struct.pack(MDMFOFFSETS,
6195+                            encrypted_private_key_offset,
6196+                            blockhashes_offset,
6197+                            sharehashes_offset,
6198+                            signature_offset,
6199+                            verification_offset,
6200+                            eof_offset)
6201+        self.offsets = {}
6202+        self.offsets['enc_privkey'] = encrypted_private_key_offset
6203+        self.offsets['block_hash_tree'] = blockhashes_offset
6204+        self.offsets['share_hash_chain'] = sharehashes_offset
6205+        self.offsets['signature'] = signature_offset
6206+        self.offsets['verification_key'] = verification_offset
6207+        self.offsets['EOF'] = eof_offset
6208+        # Next, we'll add in the salts and share data,
6209+        data += sharedata
6210+        # the private key,
6211+        data += self.encprivkey
6212+        # the block hash tree,
6213+        data += self.block_hash_tree_s
6214+        # the share hash chain,
6215+        data += self.share_hash_chain_s
6216+        # the signature,
6217+        data += self.signature
6218+        # and the verification key
6219+        data += self.verification_key
6220+        return data
6221+
6222+
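+    # A layout sketch of the MDMF share returned by build_test_mdmf_share
+    # in the default (non-tail, non-empty) case. The byte counts are derived
+    # from the fixtures assumed to be set up in setUp (2-byte blocks, 16-byte
+    # salts, the 7-byte "private" key, a 9-byte signature, and the 6-byte
+    # verification key); they are illustrative rather than asserted here,
+    # but test_write below checks the same arithmetic against the offset
+    # table that the write proxy actually stores.
+    #
+    #   offset  field                                            length
+    #        0  checkstring (version byte, seqnum, root hash)        41
+    #       41  encoding parameters (k, N, segsize, datalen)         18
+    #       59  offset table (six big-endian Q fields)               48
+    #      107  share data: 6 * (16-byte salt + 2-byte block)       108
+    #      215  encrypted private key                                 7
+    #      222  block hash tree (6 * 32 bytes)                      192
+    #      414  share hash chain (6 * (2 + 32) bytes)               204
+    #      618  signature                                             9
+    #      627  verification key                                      6
+    #      633  EOF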
6223+    def write_test_share_to_server(self,
6224+                                   storage_index,
6225+                                   tail_segment=False,
6226+                                   empty=False):
6227+        """
6228+        I write some data for the read tests to read to self.ss
6229+
6230+        If tail_segment=True, then I will write a share that has a
6231+        smaller tail segment than other segments.
6232+        """
6233+        write = self.ss.remote_slot_testv_and_readv_and_writev
6234+        data = self.build_test_mdmf_share(tail_segment, empty)
6235+        # Finally, we write the whole thing to the storage server in one
6236+        # pass.
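+        # (For reference: each test-and-write vector passed to
+        # remote_slot_testv_and_readv_and_writev maps a share number to a
+        # (test vector, write vector, new_length) tuple. A test vector
+        # entry is (offset, length, operator, specimen); reading one byte
+        # at offset 0 of a slot that does not exist yet returns "", so the
+        # "eq" test below only lets the write proceed on a fresh slot. A
+        # write vector entry is (offset, data).)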
6237+        testvs = [(0, 1, "eq", "")]
6238+        tws = {}
6239+        tws[0] = (testvs, [(0, data)], None)
6240+        readv = [(0, 1)]
6241+        results = write(storage_index, self.secrets, tws, readv)
6242+        self.failUnless(results[0])
6243+
6244+
6245+    def build_test_sdmf_share(self, empty=False):
6246+        if empty:
6247+            sharedata = ""
6248+        else:
6249+            sharedata = self.segment * 6
6250+        self.sharedata = sharedata
6251+        blocksize = len(sharedata) / 3
6252+        block = sharedata[:blocksize]
6253+        self.blockdata = block
6254+        prefix = struct.pack(">BQ32s16s BBQQ",
6255+                             0, # version,
6256+                             0,
6257+                             self.root_hash,
6258+                             self.salt,
6259+                             3,
6260+                             10,
6261+                             len(sharedata),
6262+                             len(sharedata),
6263+                            )
6264+        post_offset = struct.calcsize(">BQ32s16sBBQQLLLLQQ")
6265+        signature_offset = post_offset + len(self.verification_key)
6266+        sharehashes_offset = signature_offset + len(self.signature)
6267+        blockhashes_offset = sharehashes_offset + len(self.share_hash_chain_s)
6268+        sharedata_offset = blockhashes_offset + len(self.block_hash_tree_s)
6269+        encprivkey_offset = sharedata_offset + len(block)
6270+        eof_offset = encprivkey_offset + len(self.encprivkey)
6271+        offsets = struct.pack(">LLLLQQ",
6272+                              signature_offset,
6273+                              sharehashes_offset,
6274+                              blockhashes_offset,
6275+                              sharedata_offset,
6276+                              encprivkey_offset,
6277+                              eof_offset)
6278+        final_share = "".join([prefix,
6279+                           offsets,
6280+                           self.verification_key,
6281+                           self.signature,
6282+                           self.share_hash_chain_s,
6283+                           self.block_hash_tree_s,
6284+                           block,
6285+                           self.encprivkey])
6286+        self.offsets = {}
6287+        self.offsets['signature'] = signature_offset
6288+        self.offsets['share_hash_chain'] = sharehashes_offset
6289+        self.offsets['block_hash_tree'] = blockhashes_offset
6290+        self.offsets['share_data'] = sharedata_offset
6291+        self.offsets['enc_privkey'] = encprivkey_offset
6292+        self.offsets['EOF'] = eof_offset
6293+        return final_share
6294+
6295+
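+    # Note on build_test_sdmf_share above: SDMF orders the fields
+    # differently from MDMF. The verification key comes right after the
+    # offset table and the encrypted private key is stored last, whereas
+    # in the MDMF share built above the private key immediately follows
+    # the share data.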
6296+    def write_sdmf_share_to_server(self,
6297+                                   storage_index,
6298+                                   empty=False):
6299+        # Some tests need SDMF shares to verify that we can still
6300+        # read them. This method writes one, which resembles (but is not
+        # necessarily identical to) what a real SDMF publisher would write.
6301+        assert self.rref
6302+        write = self.ss.remote_slot_testv_and_readv_and_writev
6303+        share = self.build_test_sdmf_share(empty)
6304+        testvs = [(0, 1, "eq", "")]
6305+        tws = {}
6306+        tws[0] = (testvs, [(0, share)], None)
6307+        readv = []
6308+        results = write(storage_index, self.secrets, tws, readv)
6309+        self.failUnless(results[0])
6310+
6311+
6312+    def test_read(self):
6313+        self.write_test_share_to_server("si1")
6314+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6315+        # Check that every method equals what we expect it to.
6316+        d = defer.succeed(None)
6317+        def _check_block_and_salt((block, salt)):
6318+            self.failUnlessEqual(block, self.block)
6319+            self.failUnlessEqual(salt, self.salt)
6320+
6321+        for i in xrange(6):
6322+            d.addCallback(lambda ignored, i=i:
6323+                mr.get_block_and_salt(i))
6324+            d.addCallback(_check_block_and_salt)
6325+
6326+        d.addCallback(lambda ignored:
6327+            mr.get_encprivkey())
6328+        d.addCallback(lambda encprivkey:
6329+            self.failUnlessEqual(self.encprivkey, encprivkey))
6330+
6331+        d.addCallback(lambda ignored:
6332+            mr.get_blockhashes())
6333+        d.addCallback(lambda blockhashes:
6334+            self.failUnlessEqual(self.block_hash_tree, blockhashes))
6335+
6336+        d.addCallback(lambda ignored:
6337+            mr.get_sharehashes())
6338+        d.addCallback(lambda sharehashes:
6339+            self.failUnlessEqual(self.share_hash_chain, sharehashes))
6340+
6341+        d.addCallback(lambda ignored:
6342+            mr.get_signature())
6343+        d.addCallback(lambda signature:
6344+            self.failUnlessEqual(signature, self.signature))
6345+
6346+        d.addCallback(lambda ignored:
6347+            mr.get_verification_key())
6348+        d.addCallback(lambda verification_key:
6349+            self.failUnlessEqual(verification_key, self.verification_key))
6350+
6351+        d.addCallback(lambda ignored:
6352+            mr.get_seqnum())
6353+        d.addCallback(lambda seqnum:
6354+            self.failUnlessEqual(seqnum, 0))
6355+
6356+        d.addCallback(lambda ignored:
6357+            mr.get_root_hash())
6358+        d.addCallback(lambda root_hash:
6359+            self.failUnlessEqual(self.root_hash, root_hash))
6360+
6361+        d.addCallback(lambda ignored:
6362+            mr.get_seqnum())
6363+        d.addCallback(lambda seqnum:
6364+            self.failUnlessEqual(0, seqnum))
6365+
6366+        d.addCallback(lambda ignored:
6367+            mr.get_encoding_parameters())
6368+        def _check_encoding_parameters((k, n, segsize, datalen)):
6369+            self.failUnlessEqual(k, 3)
6370+            self.failUnlessEqual(n, 10)
6371+            self.failUnlessEqual(segsize, 6)
6372+            self.failUnlessEqual(datalen, 36)
6373+        d.addCallback(_check_encoding_parameters)
6374+
6375+        d.addCallback(lambda ignored:
6376+            mr.get_checkstring())
6377+        d.addCallback(lambda checkstring:
6378+            self.failUnlessEqual(checkstring, self.checkstring))
6379+        return d
6380+
6381+
6382+    def test_read_with_different_tail_segment_size(self):
6383+        self.write_test_share_to_server("si1", tail_segment=True)
6384+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6385+        d = mr.get_block_and_salt(5)
6386+        def _check_tail_segment(results):
6387+            block, salt = results
6388+            self.failUnlessEqual(len(block), 1)
6389+            self.failUnlessEqual(block, "a")
6390+        d.addCallback(_check_tail_segment)
6391+        return d
6392+
6393+
6394+    def test_get_block_with_invalid_segnum(self):
6395+        self.write_test_share_to_server("si1")
6396+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6397+        d = defer.succeed(None)
6398+        d.addCallback(lambda ignored:
6399+            self.shouldFail(LayoutInvalid, "test invalid segnum",
6400+                            None,
6401+                            mr.get_block_and_salt, 7))
6402+        return d
6403+
6404+
6405+    def test_get_encoding_parameters_first(self):
6406+        self.write_test_share_to_server("si1")
6407+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6408+        d = mr.get_encoding_parameters()
6409+        def _check_encoding_parameters((k, n, segment_size, datalen)):
6410+            self.failUnlessEqual(k, 3)
6411+            self.failUnlessEqual(n, 10)
6412+            self.failUnlessEqual(segment_size, 6)
6413+            self.failUnlessEqual(datalen, 36)
6414+        d.addCallback(_check_encoding_parameters)
6415+        return d
6416+
6417+
6418+    def test_get_seqnum_first(self):
6419+        self.write_test_share_to_server("si1")
6420+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6421+        d = mr.get_seqnum()
6422+        d.addCallback(lambda seqnum:
6423+            self.failUnlessEqual(seqnum, 0))
6424+        return d
6425+
6426+
6427+    def test_get_root_hash_first(self):
6428+        self.write_test_share_to_server("si1")
6429+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6430+        d = mr.get_root_hash()
6431+        d.addCallback(lambda root_hash:
6432+            self.failUnlessEqual(root_hash, self.root_hash))
6433+        return d
6434+
6435+
6436+    def test_get_checkstring_first(self):
6437+        self.write_test_share_to_server("si1")
6438+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6439+        d = mr.get_checkstring()
6440+        d.addCallback(lambda checkstring:
6441+            self.failUnlessEqual(checkstring, self.checkstring))
6442+        return d
6443+
6444+
6445+    def test_write_read_vectors(self):
6446+        # When writing for us, the storage server will return to us a
6447+        # read vector, along with its result. If a write fails because
6448+        # the test vectors failed, this read vector can help us to
6449+        # diagnose the problem. This test ensures that the read vector
6450+        # is working appropriately.
6451+        mw = self._make_new_mw("si1", 0)
6452+
6453+        for i in xrange(6):
6454+            mw.put_block(self.block, i, self.salt)
6455+        mw.put_encprivkey(self.encprivkey)
6456+        mw.put_blockhashes(self.block_hash_tree)
6457+        mw.put_sharehashes(self.share_hash_chain)
6458+        mw.put_root_hash(self.root_hash)
6459+        mw.put_signature(self.signature)
6460+        mw.put_verification_key(self.verification_key)
6461+        d = mw.finish_publishing()
6462+        def _then(results):
6463+            self.failUnlessEqual(len(results), 2)
6464+            result, readv = results
6465+            self.failUnless(result)
6466+            self.failIf(readv)
6467+            self.old_checkstring = mw.get_checkstring()
6468+            mw.set_checkstring("")
6469+        d.addCallback(_then)
6470+        d.addCallback(lambda ignored:
6471+            mw.finish_publishing())
6472+        def _then_again(results):
6473+            self.failUnlessEqual(len(results), 2)
6474+            result, readvs = results
6475+            self.failIf(result)
6476+            self.failUnlessIn(0, readvs)
6477+            readv = readvs[0][0]
6478+            self.failUnlessEqual(readv, self.old_checkstring)
6479+        d.addCallback(_then_again)
6480+        # The checkstring remains the same for the rest of the process.
6481+        return d
6482+
6483+
6484+    def test_blockhashes_after_share_hash_chain(self):
6485+        mw = self._make_new_mw("si1", 0)
6486+        d = defer.succeed(None)
6487+        # Put everything up to and including the share hash chain
6488+        for i in xrange(6):
6489+            d.addCallback(lambda ignored, i=i:
6490+                mw.put_block(self.block, i, self.salt))
6491+        d.addCallback(lambda ignored:
6492+            mw.put_encprivkey(self.encprivkey))
6493+        d.addCallback(lambda ignored:
6494+            mw.put_blockhashes(self.block_hash_tree))
6495+        d.addCallback(lambda ignored:
6496+            mw.put_sharehashes(self.share_hash_chain))
6497+
6498+        # Now try to put the block hash tree again.
6499+        d.addCallback(lambda ignored:
6500+            self.shouldFail(LayoutInvalid, "test repeat blockhashes",
6501+                            None,
6502+                            mw.put_blockhashes, self.block_hash_tree))
6503+        return d
6504+
6505+
6506+    def test_encprivkey_after_blockhashes(self):
6507+        mw = self._make_new_mw("si1", 0)
6508+        d = defer.succeed(None)
6509+        # Put everything up to and including the block hash tree
6510+        for i in xrange(6):
6511+            d.addCallback(lambda ignored, i=i:
6512+                mw.put_block(self.block, i, self.salt))
6513+        d.addCallback(lambda ignored:
6514+            mw.put_encprivkey(self.encprivkey))
6515+        d.addCallback(lambda ignored:
6516+            mw.put_blockhashes(self.block_hash_tree))
6517+        d.addCallback(lambda ignored:
6518+            self.shouldFail(LayoutInvalid, "out of order private key",
6519+                            None,
6520+                            mw.put_encprivkey, self.encprivkey))
6521+        return d
6522+
6523+
6524+    def test_share_hash_chain_after_signature(self):
6525+        mw = self._make_new_mw("si1", 0)
6526+        d = defer.succeed(None)
6527+        # Put everything up to and including the signature
6528+        for i in xrange(6):
6529+            d.addCallback(lambda ignored, i=i:
6530+                mw.put_block(self.block, i, self.salt))
6531+        d.addCallback(lambda ignored:
6532+            mw.put_encprivkey(self.encprivkey))
6533+        d.addCallback(lambda ignored:
6534+            mw.put_blockhashes(self.block_hash_tree))
6535+        d.addCallback(lambda ignored:
6536+            mw.put_sharehashes(self.share_hash_chain))
6537+        d.addCallback(lambda ignored:
6538+            mw.put_root_hash(self.root_hash))
6539+        d.addCallback(lambda ignored:
6540+            mw.put_signature(self.signature))
6541+        # Now try to put the share hash chain again. This should fail
6542+        d.addCallback(lambda ignored:
6543+            self.shouldFail(LayoutInvalid, "out of order share hash chain",
6544+                            None,
6545+                            mw.put_sharehashes, self.share_hash_chain))
6546+        return d
6547+
6548+
6549+    def test_signature_after_verification_key(self):
6550+        mw = self._make_new_mw("si1", 0)
6551+        d = defer.succeed(None)
6552+        # Put everything up to and including the verification key.
6553+        for i in xrange(6):
6554+            d.addCallback(lambda ignored, i=i:
6555+                mw.put_block(self.block, i, self.salt))
6556+        d.addCallback(lambda ignored:
6557+            mw.put_encprivkey(self.encprivkey))
6558+        d.addCallback(lambda ignored:
6559+            mw.put_blockhashes(self.block_hash_tree))
6560+        d.addCallback(lambda ignored:
6561+            mw.put_sharehashes(self.share_hash_chain))
6562+        d.addCallback(lambda ignored:
6563+            mw.put_root_hash(self.root_hash))
6564+        d.addCallback(lambda ignored:
6565+            mw.put_signature(self.signature))
6566+        d.addCallback(lambda ignored:
6567+            mw.put_verification_key(self.verification_key))
6568+        # Now try to put the signature again. This should fail
6569+        d.addCallback(lambda ignored:
6570+            self.shouldFail(LayoutInvalid, "signature after verification",
6571+                            None,
6572+                            mw.put_signature, self.signature))
6573+        return d
6574+
6575+
6576+    def test_uncoordinated_write(self):
6577+        # Make two mutable writers, both pointing to the same storage
6578+        # server, both at the same storage index, and try writing to the
6579+        # same share.
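+        # The second write should fail: the first write changes the
+        # checkstring stored in the slot, so the second writer's test
+        # vector (which still expects an empty slot) no longer matches.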
6580+        mw1 = self._make_new_mw("si1", 0)
6581+        mw2 = self._make_new_mw("si1", 0)
6582+
6583+        def _check_success(results):
6584+            result, readvs = results
6585+            self.failUnless(result)
6586+
6587+        def _check_failure(results):
6588+            result, readvs = results
6589+            self.failIf(result)
6590+
6591+        def _write_share(mw):
6592+            for i in xrange(6):
6593+                mw.put_block(self.block, i, self.salt)
6594+            mw.put_encprivkey(self.encprivkey)
6595+            mw.put_blockhashes(self.block_hash_tree)
6596+            mw.put_sharehashes(self.share_hash_chain)
6597+            mw.put_root_hash(self.root_hash)
6598+            mw.put_signature(self.signature)
6599+            mw.put_verification_key(self.verification_key)
6600+            return mw.finish_publishing()
6601+        d = _write_share(mw1)
6602+        d.addCallback(_check_success)
6603+        d.addCallback(lambda ignored:
6604+            _write_share(mw2))
6605+        d.addCallback(_check_failure)
6606+        return d
6607+
6608+
6609+    def test_invalid_salt_size(self):
6610+        # Salts need to be 16 bytes in size. Writes that attempt to
6611+        # write more or less than this should be rejected.
6612+        mw = self._make_new_mw("si1", 0)
6613+        invalid_salt = "a" * 17 # 17 bytes
6614+        another_invalid_salt = "b" * 15 # 15 bytes
6615+        d = defer.succeed(None)
6616+        d.addCallback(lambda ignored:
6617+            self.shouldFail(LayoutInvalid, "salt too big",
6618+                            None,
6619+                            mw.put_block, self.block, 0, invalid_salt))
6620+        d.addCallback(lambda ignored:
6621+            self.shouldFail(LayoutInvalid, "salt too small",
6622+                            None,
6623+                            mw.put_block, self.block, 0,
6624+                            another_invalid_salt))
6625+        return d
6626+
6627+
6628+    def test_write_test_vectors(self):
6629+        # If we give the write proxy a bogus test vector at
6630+        # any point during the process, it should fail to write when we
6631+        # tell it to write.
6632+        def _check_failure(results):
6633+            self.failUnlessEqual(len(results), 2)
6634+            res, d = results
6635+            self.failIf(res)
6636+
6637+        def _check_success(results):
6638+            self.failUnlessEqual(len(results), 2)
6639+            res, d = results
6640+            self.failUnless(res)
6641+
6642+        mw = self._make_new_mw("si1", 0)
6643+        mw.set_checkstring("this is a lie")
6644+        for i in xrange(6):
6645+            mw.put_block(self.block, i, self.salt)
6646+        mw.put_encprivkey(self.encprivkey)
6647+        mw.put_blockhashes(self.block_hash_tree)
6648+        mw.put_sharehashes(self.share_hash_chain)
6649+        mw.put_root_hash(self.root_hash)
6650+        mw.put_signature(self.signature)
6651+        mw.put_verification_key(self.verification_key)
6652+        d = mw.finish_publishing()
6653+        d.addCallback(_check_failure)
6654+        d.addCallback(lambda ignored:
6655+            mw.set_checkstring(""))
6656+        d.addCallback(lambda ignored:
6657+            mw.finish_publishing())
6658+        d.addCallback(_check_success)
6659+        return d
6660+
6661+
6662+    def serialize_blockhashes(self, blockhashes):
6663+        return "".join(blockhashes)
6664+
6665+
6666+    def serialize_sharehashes(self, sharehashes):
6667+        ret = "".join([struct.pack(">H32s", i, sharehashes[i])
6668+                        for i in sorted(sharehashes.keys())])
6669+        return ret
6670+
6671+
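+    # serialize_sharehashes above emits the share hash chain as a sequence
+    # of (uint16 node number, 32-byte hash) pairs, 34 bytes per entry --
+    # which is why test_write below expects the serialized chain to occupy
+    # (32 + 2) * 6 bytes.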
6672+    def test_write(self):
6673+        # This translates to a file with 6 6-byte segments, and with 2-byte
6674+        # blocks.
6675+        mw = self._make_new_mw("si1", 0)
6676+        # Test writing some blocks.
6677+        read = self.ss.remote_slot_readv
6678+        expected_sharedata_offset = struct.calcsize(MDMFHEADER)
6679+        written_block_size = 2 + len(self.salt)
6680+        written_block = self.block + self.salt
6681+        for i in xrange(6):
6682+            mw.put_block(self.block, i, self.salt)
6683+
6684+        mw.put_encprivkey(self.encprivkey)
6685+        mw.put_blockhashes(self.block_hash_tree)
6686+        mw.put_sharehashes(self.share_hash_chain)
6687+        mw.put_root_hash(self.root_hash)
6688+        mw.put_signature(self.signature)
6689+        mw.put_verification_key(self.verification_key)
6690+        d = mw.finish_publishing()
6691+        def _check_publish(results):
6692+            self.failUnlessEqual(len(results), 2)
6693+            result, ign = results
6694+            self.failUnless(result, "publish failed")
6695+            for i in xrange(6):
6696+                self.failUnlessEqual(read("si1", [0], [(expected_sharedata_offset + (i * written_block_size), written_block_size)]),
6697+                                {0: [written_block]})
6698+
6699+            expected_private_key_offset = expected_sharedata_offset + \
6700+                                      len(written_block) * 6
6701+            self.failUnlessEqual(len(self.encprivkey), 7)
6702+            self.failUnlessEqual(read("si1", [0], [(expected_private_key_offset, 7)]),
6703+                                 {0: [self.encprivkey]})
6704+
6705+            expected_block_hash_offset = expected_private_key_offset + len(self.encprivkey)
6706+            self.failUnlessEqual(len(self.block_hash_tree_s), 32 * 6)
6707+            self.failUnlessEqual(read("si1", [0], [(expected_block_hash_offset, 32 * 6)]),
6708+                                 {0: [self.block_hash_tree_s]})
6709+
6710+            expected_share_hash_offset = expected_block_hash_offset + len(self.block_hash_tree_s)
6711+            self.failUnlessEqual(read("si1", [0],[(expected_share_hash_offset, (32 + 2) * 6)]),
6712+                                 {0: [self.share_hash_chain_s]})
6713+
6714+            self.failUnlessEqual(read("si1", [0], [(9, 32)]),
6715+                                 {0: [self.root_hash]})
6716+            expected_signature_offset = expected_share_hash_offset + len(self.share_hash_chain_s)
6717+            self.failUnlessEqual(len(self.signature), 9)
6718+            self.failUnlessEqual(read("si1", [0], [(expected_signature_offset, 9)]),
6719+                                 {0: [self.signature]})
6720+
6721+            expected_verification_key_offset = expected_signature_offset + len(self.signature)
6722+            self.failUnlessEqual(len(self.verification_key), 6)
6723+            self.failUnlessEqual(read("si1", [0], [(expected_verification_key_offset, 6)]),
6724+                                 {0: [self.verification_key]})
6725+
6726+            signable = mw.get_signable()
6727+            verno, seq, roothash, k, n, segsize, datalen = \
6728+                                            struct.unpack(">BQ32sBBQQ",
6729+                                                          signable)
6730+            self.failUnlessEqual(verno, 1)
6731+            self.failUnlessEqual(seq, 0)
6732+            self.failUnlessEqual(roothash, self.root_hash)
6733+            self.failUnlessEqual(k, 3)
6734+            self.failUnlessEqual(n, 10)
6735+            self.failUnlessEqual(segsize, 6)
6736+            self.failUnlessEqual(datalen, 36)
6737+            expected_eof_offset = expected_verification_key_offset + len(self.verification_key)
6738+
6739+            # Check the version number to make sure that it is correct.
6740+            expected_version_number = struct.pack(">B", 1)
6741+            self.failUnlessEqual(read("si1", [0], [(0, 1)]),
6742+                                 {0: [expected_version_number]})
6743+            # Check the sequence number to make sure that it is correct
6744+            expected_sequence_number = struct.pack(">Q", 0)
6745+            self.failUnlessEqual(read("si1", [0], [(1, 8)]),
6746+                                 {0: [expected_sequence_number]})
6747+            # Check that the encoding parameters (k, N, segment size, data
6748+            # length) are what they should be. These are 3, 10, 6, and 36.
6749+            expected_k = struct.pack(">B", 3)
6750+            self.failUnlessEqual(read("si1", [0], [(41, 1)]),
6751+                                 {0: [expected_k]})
6752+            expected_n = struct.pack(">B", 10)
6753+            self.failUnlessEqual(read("si1", [0], [(42, 1)]),
6754+                                 {0: [expected_n]})
6755+            expected_segment_size = struct.pack(">Q", 6)
6756+            self.failUnlessEqual(read("si1", [0], [(43, 8)]),
6757+                                 {0: [expected_segment_size]})
6758+            expected_data_length = struct.pack(">Q", 36)
6759+            self.failUnlessEqual(read("si1", [0], [(51, 8)]),
6760+                                 {0: [expected_data_length]})
6761+            expected_offset = struct.pack(">Q", expected_private_key_offset)
6762+            self.failUnlessEqual(read("si1", [0], [(59, 8)]),
6763+                                 {0: [expected_offset]})
6764+            expected_offset = struct.pack(">Q", expected_block_hash_offset)
6765+            self.failUnlessEqual(read("si1", [0], [(67, 8)]),
6766+                                 {0: [expected_offset]})
6767+            expected_offset = struct.pack(">Q", expected_share_hash_offset)
6768+            self.failUnlessEqual(read("si1", [0], [(75, 8)]),
6769+                                 {0: [expected_offset]})
6770+            expected_offset = struct.pack(">Q", expected_signature_offset)
6771+            self.failUnlessEqual(read("si1", [0], [(83, 8)]),
6772+                                 {0: [expected_offset]})
6773+            expected_offset = struct.pack(">Q", expected_verification_key_offset)
6774+            self.failUnlessEqual(read("si1", [0], [(91, 8)]),
6775+                                 {0: [expected_offset]})
6776+            expected_offset = struct.pack(">Q", expected_eof_offset)
6777+            self.failUnlessEqual(read("si1", [0], [(99, 8)]),
6778+                                 {0: [expected_offset]})
6779+        d.addCallback(_check_publish)
6780+        return d
6781+
6782+    def _make_new_mw(self, si, share, datalength=36):
6783+        # This is a file of size 36 bytes. Since it has a segment
6784+        # size of 6, we know that it has six 6-byte segments, which
6785+        # will be split into 2-byte blocks because our FEC k
6786+        # parameter is 3.
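+        # (Reading the positional arguments off the call below: share
+        # number, remote reference, storage index, write secrets, then
+        # seqnum=0, k=3, N=10, segment size 6, and the data length. This
+        # is inferred from how these tests use the proxy rather than from
+        # its docstring.)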
6787+        mw = MDMFSlotWriteProxy(share, self.rref, si, self.secrets, 0, 3, 10,
6788+                                6, datalength)
6789+        return mw
6790+
6791+
6792+    def test_write_rejected_with_too_many_blocks(self):
6793+        mw = self._make_new_mw("si0", 0)
6794+
6795+        # Try writing too many blocks. We should not be able to write
6796+        # more than 6 blocks into each share.
6798+        d = defer.succeed(None)
6799+        for i in xrange(6):
6800+            d.addCallback(lambda ignored, i=i:
6801+                mw.put_block(self.block, i, self.salt))
6802+        d.addCallback(lambda ignored:
6803+            self.shouldFail(LayoutInvalid, "too many blocks",
6804+                            None,
6805+                            mw.put_block, self.block, 7, self.salt))
6806+        return d
6807+
6808+
6809+    def test_write_rejected_with_invalid_salt(self):
6810+        # Try writing an invalid salt. Salts are 16 bytes -- any more or
6811+        # less should cause an error.
6812+        mw = self._make_new_mw("si1", 0)
6813+        bad_salt = "a" * 17 # 17 bytes
6814+        d = defer.succeed(None)
6815+        d.addCallback(lambda ignored:
6816+            self.shouldFail(LayoutInvalid, "test_invalid_salt",
6817+                            None, mw.put_block, self.block, 0, bad_salt))
6818+        return d
6819+
6820+
6821+    def test_write_rejected_with_invalid_root_hash(self):
6822+        # Try writing an invalid root hash. This should be SHA256d, and
6823+        # 32 bytes long as a result.
6824+        mw = self._make_new_mw("si2", 0)
6825+        # 17 bytes != 32 bytes
6826+        invalid_root_hash = "a" * 17
6827+        d = defer.succeed(None)
6828+        # Before this test can work, we need to put some blocks + salts,
6829+        # a block hash tree, and a share hash tree. Otherwise, we'll see
6830+        # failures that match what we are looking for, but are caused by
6831+        # the constraints imposed on operation ordering.
6832+        for i in xrange(6):
6833+            d.addCallback(lambda ignored, i=i:
6834+                mw.put_block(self.block, i, self.salt))
6835+        d.addCallback(lambda ignored:
6836+            mw.put_encprivkey(self.encprivkey))
6837+        d.addCallback(lambda ignored:
6838+            mw.put_blockhashes(self.block_hash_tree))
6839+        d.addCallback(lambda ignored:
6840+            mw.put_sharehashes(self.share_hash_chain))
6841+        d.addCallback(lambda ignored:
6842+            self.shouldFail(LayoutInvalid, "invalid root hash",
6843+                            None, mw.put_root_hash, invalid_root_hash))
6844+        return d
6845+
6846+
6847+    def test_write_rejected_with_invalid_blocksize(self):
6848+        # The blocksize implied by the writer that we get from
6849+        # _make_new_mw is 2 bytes -- any more or any less than this
6850+        # should be cause for failure, unless it is the tail segment, in
6851+        # which case it need not be.
6852+        invalid_block = "a"
6853+        mw = self._make_new_mw("si3", 0, 33) # implies a tail segment with
6854+                                             # one byte blocks
6855+        # 1 bytes != 2 bytes
6856+        d = defer.succeed(None)
6857+        d.addCallback(lambda ignored, invalid_block=invalid_block:
6858+            self.shouldFail(LayoutInvalid, "test blocksize too small",
6859+                            None, mw.put_block, invalid_block, 0,
6860+                            self.salt))
6861+        invalid_block = invalid_block * 3
6862+        # 3 bytes != 2 bytes
6863+        d.addCallback(lambda ignored:
6864+            self.shouldFail(LayoutInvalid, "test blocksize too large",
6865+                            None,
6866+                            mw.put_block, invalid_block, 0, self.salt))
6867+        for i in xrange(5):
6868+            d.addCallback(lambda ignored, i=i:
6869+                mw.put_block(self.block, i, self.salt))
6870+        # Try to put an invalid tail segment
6871+        d.addCallback(lambda ignored:
6872+            self.shouldFail(LayoutInvalid, "test invalid tail segment",
6873+                            None,
6874+                            mw.put_block, self.block, 5, self.salt))
6875+        valid_block = "a"
6876+        d.addCallback(lambda ignored:
6877+            mw.put_block(valid_block, 5, self.salt))
6878+        return d
6879+
6880+
6881+    def test_write_enforces_order_constraints(self):
6882+        # We require that the MDMFSlotWriteProxy be interacted with in a
6883+        # specific way.
6884+        # That way is:
6885+        # 0: __init__
6886+        # 1: write blocks and salts
6887+        # 2: Write the encrypted private key
6888+        # 3: Write the block hashes
6889+        # 4: Write the share hashes
6890+        # 5: Write the root hash and salt hash
6891+        # 6: Write the signature and verification key
6892+        # 7: Write the file.
6893+        #
6894+        # Some of these can be performed out-of-order, and some can't.
6895+        # The dependencies that I want to test here are:
6896+        #  - Private key before block hashes
6897+        #  - share hashes and block hashes before root hash
6898+        #  - root hash before signature
6899+        #  - signature before verification key
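+        # In other words, the order this test expects the write proxy to
+        # enforce is roughly: blocks/salts -> encprivkey -> blockhashes ->
+        # sharehashes -> root hash -> signature -> verification key ->
+        # finish_publishing.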
6900+        mw0 = self._make_new_mw("si0", 0)
6901+        # Write some shares
6902+        d = defer.succeed(None)
6903+        for i in xrange(6):
6904+            d.addCallback(lambda ignored, i=i:
6905+                mw0.put_block(self.block, i, self.salt))
6906+        # Try to write the block hashes before writing the encrypted
6907+        # private key
6908+        d.addCallback(lambda ignored:
6909+            self.shouldFail(LayoutInvalid, "block hashes before key",
6910+                            None, mw0.put_blockhashes,
6911+                            self.block_hash_tree))
6912+
6913+        # Write the private key.
6914+        d.addCallback(lambda ignored:
6915+            mw0.put_encprivkey(self.encprivkey))
6916+
6917+
6918+        # Try to write the share hash chain without writing the block
6919+        # hash tree
6920+        d.addCallback(lambda ignored:
6921+            self.shouldFail(LayoutInvalid, "share hash chain before "
6922+                                           "block hash tree",
6923+                            None,
6924+                            mw0.put_sharehashes, self.share_hash_chain))
6925+
6926+        # Try to write the root hash without writing either the
6927+        # block hashes or the share hashes
6928+        d.addCallback(lambda ignored:
6929+            self.shouldFail(LayoutInvalid, "root hash before share hashes",
6930+                            None,
6931+                            mw0.put_root_hash, self.root_hash))
6932+
6933+        # Now write the block hashes and try again
6934+        d.addCallback(lambda ignored:
6935+            mw0.put_blockhashes(self.block_hash_tree))
6936+
6937+        d.addCallback(lambda ignored:
6938+            self.shouldFail(LayoutInvalid, "root hash before share hashes",
6939+                            None, mw0.put_root_hash, self.root_hash))
6940+
6941+        # We haven't yet put the root hash on the share, so we shouldn't
6942+        # be able to sign it.
6943+        d.addCallback(lambda ignored:
6944+            self.shouldFail(LayoutInvalid, "signature before root hash",
6945+                            None, mw0.put_signature, self.signature))
6946+
6947+        d.addCallback(lambda ignored:
6948+            self.failUnlessRaises(LayoutInvalid, mw0.get_signable))
6949+
6950+        # ..and, since that fails, we also shouldn't be able to put the
6951+        # verification key.
6952+        d.addCallback(lambda ignored:
6953+            self.shouldFail(LayoutInvalid, "key before signature",
6954+                            None, mw0.put_verification_key,
6955+                            self.verification_key))
6956+
6957+        # Now write the share hashes.
6958+        d.addCallback(lambda ignored:
6959+            mw0.put_sharehashes(self.share_hash_chain))
6960+        # We should be able to write the root hash now too
6961+        d.addCallback(lambda ignored:
6962+            mw0.put_root_hash(self.root_hash))
6963+
6964+        # We should still be unable to put the verification key
6965+        d.addCallback(lambda ignored:
6966+            self.shouldFail(LayoutInvalid, "key before signature",
6967+                            None, mw0.put_verification_key,
6968+                            self.verification_key))
6969+
6970+        d.addCallback(lambda ignored:
6971+            mw0.put_signature(self.signature))
6972+
6973+        # We shouldn't be able to write the offsets to the remote server
6974+        # until the offset table is finished; IOW, until we have written
6975+        # the verification key.
6976+        d.addCallback(lambda ignored:
6977+            self.shouldFail(LayoutInvalid, "offsets before verification key",
6978+                            None,
6979+                            mw0.finish_publishing))
6980+
6981+        d.addCallback(lambda ignored:
6982+            mw0.put_verification_key(self.verification_key))
6983+        return d
6984+
6985+
6986+    def test_end_to_end(self):
6987+        mw = self._make_new_mw("si1", 0)
6988+        # Write a share using the mutable writer, and make sure that the
6989+        # reader knows how to read everything back to us.
6990+        d = defer.succeed(None)
6991+        for i in xrange(6):
6992+            d.addCallback(lambda ignored, i=i:
6993+                mw.put_block(self.block, i, self.salt))
6994+        d.addCallback(lambda ignored:
6995+            mw.put_encprivkey(self.encprivkey))
6996+        d.addCallback(lambda ignored:
6997+            mw.put_blockhashes(self.block_hash_tree))
6998+        d.addCallback(lambda ignored:
6999+            mw.put_sharehashes(self.share_hash_chain))
7000+        d.addCallback(lambda ignored:
7001+            mw.put_root_hash(self.root_hash))
7002+        d.addCallback(lambda ignored:
7003+            mw.put_signature(self.signature))
7004+        d.addCallback(lambda ignored:
7005+            mw.put_verification_key(self.verification_key))
7006+        d.addCallback(lambda ignored:
7007+            mw.finish_publishing())
7008+
7009+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7010+        def _check_block_and_salt((block, salt)):
7011+            self.failUnlessEqual(block, self.block)
7012+            self.failUnlessEqual(salt, self.salt)
7013+
7014+        for i in xrange(6):
7015+            d.addCallback(lambda ignored, i=i:
7016+                mr.get_block_and_salt(i))
7017+            d.addCallback(_check_block_and_salt)
7018+
7019+        d.addCallback(lambda ignored:
7020+            mr.get_encprivkey())
7021+        d.addCallback(lambda encprivkey:
7022+            self.failUnlessEqual(self.encprivkey, encprivkey))
7023+
7024+        d.addCallback(lambda ignored:
7025+            mr.get_blockhashes())
7026+        d.addCallback(lambda blockhashes:
7027+            self.failUnlessEqual(self.block_hash_tree, blockhashes))
7028+
7029+        d.addCallback(lambda ignored:
7030+            mr.get_sharehashes())
7031+        d.addCallback(lambda sharehashes:
7032+            self.failUnlessEqual(self.share_hash_chain, sharehashes))
7033+
7034+        d.addCallback(lambda ignored:
7035+            mr.get_signature())
7036+        d.addCallback(lambda signature:
7037+            self.failUnlessEqual(signature, self.signature))
7038+
7039+        d.addCallback(lambda ignored:
7040+            mr.get_verification_key())
7041+        d.addCallback(lambda verification_key:
7042+            self.failUnlessEqual(verification_key, self.verification_key))
7043+
7044+        d.addCallback(lambda ignored:
7045+            mr.get_seqnum())
7046+        d.addCallback(lambda seqnum:
7047+            self.failUnlessEqual(seqnum, 0))
7048+
7049+        d.addCallback(lambda ignored:
7050+            mr.get_root_hash())
7051+        d.addCallback(lambda root_hash:
7052+            self.failUnlessEqual(self.root_hash, root_hash))
7053+
7054+        d.addCallback(lambda ignored:
7055+            mr.get_encoding_parameters())
7056+        def _check_encoding_parameters((k, n, segsize, datalen)):
7057+            self.failUnlessEqual(k, 3)
7058+            self.failUnlessEqual(n, 10)
7059+            self.failUnlessEqual(segsize, 6)
7060+            self.failUnlessEqual(datalen, 36)
7061+        d.addCallback(_check_encoding_parameters)
7062+
7063+        d.addCallback(lambda ignored:
7064+            mr.get_checkstring())
7065+        d.addCallback(lambda checkstring:
7066+            self.failUnlessEqual(checkstring, mw.get_checkstring()))
7067+        return d
7068+
7069+
7070+    def test_is_sdmf(self):
7071+        # The MDMFSlotReadProxy should also know how to read SDMF files,
7072+        # since it will encounter them on the grid. Callers use the
7073+        # is_sdmf method to test this.
7074+        self.write_sdmf_share_to_server("si1")
7075+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7076+        d = mr.is_sdmf()
7077+        d.addCallback(lambda issdmf:
7078+            self.failUnless(issdmf))
7079+        return d
7080+
7081+
7082+    def test_reads_sdmf(self):
7083+        # The slot read proxy should, naturally, know how to tell us
7084+        # about data in the SDMF format
7085+        self.write_sdmf_share_to_server("si1")
7086+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7087+        d = defer.succeed(None)
7088+        d.addCallback(lambda ignored:
7089+            mr.is_sdmf())
7090+        d.addCallback(lambda issdmf:
7091+            self.failUnless(issdmf))
7092+
7093+        # What do we need to read?
7094+        #  - The sharedata
7095+        #  - The salt
7096+        d.addCallback(lambda ignored:
7097+            mr.get_block_and_salt(0))
7098+        def _check_block_and_salt(results):
7099+            block, salt = results
7100+            # Our original file is 36 bytes long, so each share is 12
7101+            # bytes in size. The share is composed entirely of the
7102+            # letter a. self.block contains two 'a's, so 6 * self.block is
7103+            # what we are looking for.
7104+            self.failUnlessEqual(block, self.block * 6)
7105+            self.failUnlessEqual(salt, self.salt)
7106+        d.addCallback(_check_block_and_salt)
7107+
7108+        #  - The blockhashes
7109+        d.addCallback(lambda ignored:
7110+            mr.get_blockhashes())
7111+        d.addCallback(lambda blockhashes:
7112+            self.failUnlessEqual(self.block_hash_tree,
7113+                                 blockhashes,
7114+                                 blockhashes))
7115+        #  - The sharehashes
7116+        d.addCallback(lambda ignored:
7117+            mr.get_sharehashes())
7118+        d.addCallback(lambda sharehashes:
7119+            self.failUnlessEqual(self.share_hash_chain,
7120+                                 sharehashes))
7121+        #  - The keys
7122+        d.addCallback(lambda ignored:
7123+            mr.get_encprivkey())
7124+        d.addCallback(lambda encprivkey:
7125+            self.failUnlessEqual(encprivkey, self.encprivkey, encprivkey))
7126+        d.addCallback(lambda ignored:
7127+            mr.get_verification_key())
7128+        d.addCallback(lambda verification_key:
7129+            self.failUnlessEqual(verification_key,
7130+                                 self.verification_key,
7131+                                 verification_key))
7132+        #  - The signature
7133+        d.addCallback(lambda ignored:
7134+            mr.get_signature())
7135+        d.addCallback(lambda signature:
7136+            self.failUnlessEqual(signature, self.signature, signature))
7137+
7138+        #  - The sequence number
7139+        d.addCallback(lambda ignored:
7140+            mr.get_seqnum())
7141+        d.addCallback(lambda seqnum:
7142+            self.failUnlessEqual(seqnum, 0, seqnum))
7143+
7144+        #  - The root hash
7145+        d.addCallback(lambda ignored:
7146+            mr.get_root_hash())
7147+        d.addCallback(lambda root_hash:
7148+            self.failUnlessEqual(root_hash, self.root_hash, root_hash))
7149+        return d
7150+
7151+
7152+    def test_only_reads_one_segment_sdmf(self):
7153+        # SDMF shares have only one segment, so it doesn't make sense to
7154+        # read more segments than that. The reader should know this and
7155+        # complain if we try to do that.
7156+        self.write_sdmf_share_to_server("si1")
7157+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7158+        d = defer.succeed(None)
7159+        d.addCallback(lambda ignored:
7160+            mr.is_sdmf())
7161+        d.addCallback(lambda issdmf:
7162+            self.failUnless(issdmf))
7163+        d.addCallback(lambda ignored:
7164+            self.shouldFail(LayoutInvalid, "test bad segment",
7165+                            None,
7166+                            mr.get_block_and_salt, 1))
7167+        return d
7168+
7169+
7170+    def test_read_with_prefetched_mdmf_data(self):
7171+        # The MDMFSlotReadProxy will prefill certain fields if you pass
7172+        # it data that you have already fetched. This is useful for
7173+        # cases like the Servermap, which prefetches ~2kb of data while
7174+        # finding out which shares are on the remote peer so that it
7175+        # doesn't waste round trips.
7176+        mdmf_data = self.build_test_mdmf_share()
7177+        self.write_test_share_to_server("si1")
7178+        def _make_mr(ignored, length):
7179+            mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:length])
7180+            return mr
7181+
7182+        d = defer.succeed(None)
7183+        # This should be enough to fill in both the encoding parameters
7184+        # and the table of offsets, which will complete the version
7185+        # information tuple.
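+        # (107 bytes = the 41-byte checkstring + 18 bytes of encoding
+        # parameters + the 48-byte offset table -- the full MDMF header,
+        # per the layout sketch above.)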
7186+        d.addCallback(_make_mr, 107)
7187+        d.addCallback(lambda mr:
7188+            mr.get_verinfo())
7189+        def _check_verinfo(verinfo):
7190+            self.failUnless(verinfo)
7191+            self.failUnlessEqual(len(verinfo), 9)
7192+            (seqnum,
7193+             root_hash,
7194+             salt_hash,
7195+             segsize,
7196+             datalen,
7197+             k,
7198+             n,
7199+             prefix,
7200+             offsets) = verinfo
7201+            self.failUnlessEqual(seqnum, 0)
7202+            self.failUnlessEqual(root_hash, self.root_hash)
7203+            self.failUnlessEqual(segsize, 6)
7204+            self.failUnlessEqual(datalen, 36)
7205+            self.failUnlessEqual(k, 3)
7206+            self.failUnlessEqual(n, 10)
7207+            expected_prefix = struct.pack(MDMFSIGNABLEHEADER,
7208+                                          1,
7209+                                          seqnum,
7210+                                          root_hash,
7211+                                          k,
7212+                                          n,
7213+                                          segsize,
7214+                                          datalen)
7215+            self.failUnlessEqual(expected_prefix, prefix)
7216+            self.failUnlessEqual(self.rref.read_count, 0)
7217+        d.addCallback(_check_verinfo)
7218+        # This is not enough data to read a block and a share, so the
7219+        # wrapper should attempt to read this from the remote server.
7220+        d.addCallback(_make_mr, 107)
7221+        d.addCallback(lambda mr:
7222+            mr.get_block_and_salt(0))
7223+        def _check_block_and_salt((block, salt)):
7224+            self.failUnlessEqual(block, self.block)
7225+            self.failUnlessEqual(salt, self.salt)
7226+            self.failUnlessEqual(self.rref.read_count, 1)
+        d.addCallback(_check_block_and_salt)
7227+        # This should be enough data to read one block.
7228+        d.addCallback(_make_mr, 249)
7229+        d.addCallback(lambda mr:
7230+            mr.get_block_and_salt(0))
7231+        d.addCallback(_check_block_and_salt)
7232+        return d
7233+
7234+
7235+    def test_read_with_prefetched_sdmf_data(self):
7236+        sdmf_data = self.build_test_sdmf_share()
7237+        self.write_sdmf_share_to_server("si1")
7238+        def _make_mr(ignored, length):
7239+            mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:length])
7240+            return mr
7241+
7242+        d = defer.succeed(None)
7243+        # This should be enough to get us the encoding parameters,
7244+        # offset table, and everything else we need to build a verinfo
7245+        # string.
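+        # (107 bytes = the 75-byte SDMF signed prefix + the 32-byte offset
+        # table -- coincidentally the same length as the MDMF header.)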
7246+        d.addCallback(_make_mr, 107)
7247+        d.addCallback(lambda mr:
7248+            mr.get_verinfo())
7249+        def _check_verinfo(verinfo):
7250+            self.failUnless(verinfo)
7251+            self.failUnlessEqual(len(verinfo), 9)
7252+            (seqnum,
7253+             root_hash,
7254+             salt,
7255+             segsize,
7256+             datalen,
7257+             k,
7258+             n,
7259+             prefix,
7260+             offsets) = verinfo
7261+            self.failUnlessEqual(seqnum, 0)
7262+            self.failUnlessEqual(root_hash, self.root_hash)
7263+            self.failUnlessEqual(salt, self.salt)
7264+            self.failUnlessEqual(segsize, 36)
7265+            self.failUnlessEqual(datalen, 36)
7266+            self.failUnlessEqual(k, 3)
7267+            self.failUnlessEqual(n, 10)
7268+            expected_prefix = struct.pack(SIGNED_PREFIX,
7269+                                          0,
7270+                                          seqnum,
7271+                                          root_hash,
7272+                                          salt,
7273+                                          k,
7274+                                          n,
7275+                                          segsize,
7276+                                          datalen)
7277+            self.failUnlessEqual(expected_prefix, prefix)
7278+            self.failUnlessEqual(self.rref.read_count, 0)
7279+        d.addCallback(_check_verinfo)
7280+        # This shouldn't be enough to read any share data.
7281+        d.addCallback(_make_mr, 107)
7282+        d.addCallback(lambda mr:
7283+            mr.get_block_and_salt(0))
7284+        def _check_block_and_salt((block, salt)):
7285+            self.failUnlessEqual(block, self.block * 6)
7286+            self.failUnlessEqual(salt, self.salt)
7287+            # TODO: Fix the read routine so that it reads only the data
7288+            #       that it has cached if it can't read all of it.
7289+            self.failUnlessEqual(self.rref.read_count, 2)
7290+
7291+        # This should be enough to read share data.
7292+        d.addCallback(_make_mr, self.offsets['share_data'])
7293+        d.addCallback(lambda mr:
7294+            mr.get_block_and_salt(0))
7295+        d.addCallback(_check_block_and_salt)
7296+        return d
7297+
7298+
7299+    def test_read_with_empty_mdmf_file(self):
7300+        # Some tests upload a file with no contents to test things
7301+        # unrelated to the actual handling of the content of the file.
7302+        # The reader should behave intelligently in these cases.
7303+        self.write_test_share_to_server("si1", empty=True)
7304+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7305+        # We should be able to get the encoding parameters, and they
7306+        # should be correct.
7307+        d = defer.succeed(None)
7308+        d.addCallback(lambda ignored:
7309+            mr.get_encoding_parameters())
7310+        def _check_encoding_parameters(params):
7311+            self.failUnlessEqual(len(params), 4)
7312+            k, n, segsize, datalen = params
7313+            self.failUnlessEqual(k, 3)
7314+            self.failUnlessEqual(n, 10)
7315+            self.failUnlessEqual(segsize, 0)
7316+            self.failUnlessEqual(datalen, 0)
7317+        d.addCallback(_check_encoding_parameters)
7318+
7319+        # We should not be able to fetch a block, since there are no
7320+        # blocks to fetch
7321+        d.addCallback(lambda ignored:
7322+            self.shouldFail(LayoutInvalid, "get block on empty file",
7323+                            None,
7324+                            mr.get_block_and_salt, 0))
7325+        return d
7326+
7327+
7328+    def test_read_with_empty_sdmf_file(self):
7329+        self.write_sdmf_share_to_server("si1", empty=True)
7330+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7331+        # We should be able to get the encoding parameters, and they
7332+        # should be correct
7333+        d = defer.succeed(None)
7334+        d.addCallback(lambda ignored:
7335+            mr.get_encoding_parameters())
7336+        def _check_encoding_parameters(params):
7337+            self.failUnlessEqual(len(params), 4)
7338+            k, n, segsize, datalen = params
7339+            self.failUnlessEqual(k, 3)
7340+            self.failUnlessEqual(n, 10)
7341+            self.failUnlessEqual(segsize, 0)
7342+            self.failUnlessEqual(datalen, 0)
7343+        d.addCallback(_check_encoding_parameters)
7344+
7345+        # It does not make sense to get a block in this format, so we
7346+        # should not be able to.
7347+        d.addCallback(lambda ignored:
7348+            self.shouldFail(LayoutInvalid, "get block on an empty file",
7349+                            None,
7350+                            mr.get_block_and_salt, 0))
7351+        return d
7352+
7353+
7354+    def test_verinfo_with_sdmf_file(self):
7355+        self.write_sdmf_share_to_server("si1")
7356+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7357+        # We should be able to get the version information.
7358+        d = defer.succeed(None)
7359+        d.addCallback(lambda ignored:
7360+            mr.get_verinfo())
7361+        def _check_verinfo(verinfo):
7362+            self.failUnless(verinfo)
7363+            self.failUnlessEqual(len(verinfo), 9)
7364+            (seqnum,
7365+             root_hash,
7366+             salt,
7367+             segsize,
7368+             datalen,
7369+             k,
7370+             n,
7371+             prefix,
7372+             offsets) = verinfo
7373+            self.failUnlessEqual(seqnum, 0)
7374+            self.failUnlessEqual(root_hash, self.root_hash)
7375+            self.failUnlessEqual(salt, self.salt)
7376+            self.failUnlessEqual(segsize, 36)
7377+            self.failUnlessEqual(datalen, 36)
7378+            self.failUnlessEqual(k, 3)
7379+            self.failUnlessEqual(n, 10)
7380+            expected_prefix = struct.pack(">BQ32s16s BBQQ",
7381+                                          0,
7382+                                          seqnum,
7383+                                          root_hash,
7384+                                          salt,
7385+                                          k,
7386+                                          n,
7387+                                          segsize,
7388+                                          datalen)
7389+            self.failUnlessEqual(prefix, expected_prefix)
7390+            self.failUnlessEqual(offsets, self.offsets)
7391+        d.addCallback(_check_verinfo)
7392+        return d
7393+
7394+
7395+    def test_verinfo_with_mdmf_file(self):
7396+        self.write_test_share_to_server("si1")
7397+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7398+        d = defer.succeed(None)
7399+        d.addCallback(lambda ignored:
7400+            mr.get_verinfo())
7401+        def _check_verinfo(verinfo):
7402+            self.failUnless(verinfo)
7403+            self.failUnlessEqual(len(verinfo), 9)
7404+            (seqnum,
7405+             root_hash,
7406+             IV,
7407+             segsize,
7408+             datalen,
7409+             k,
7410+             n,
7411+             prefix,
7412+             offsets) = verinfo
7413+            self.failUnlessEqual(seqnum, 0)
7414+            self.failUnlessEqual(root_hash, self.root_hash)
7415+            self.failIf(IV)
7416+            self.failUnlessEqual(segsize, 6)
7417+            self.failUnlessEqual(datalen, 36)
7418+            self.failUnlessEqual(k, 3)
7419+            self.failUnlessEqual(n, 10)
7420+            expected_prefix = struct.pack(">BQ32s BBQQ",
7421+                                          1,
7422+                                          seqnum,
7423+                                          root_hash,
7424+                                          k,
7425+                                          n,
7426+                                          segsize,
7427+                                          datalen)
7428+            self.failUnlessEqual(prefix, expected_prefix)
7429+            self.failUnlessEqual(offsets, self.offsets)
7430+        d.addCallback(_check_verinfo)
7431+        return d
7432+
7433+
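        # (Aside, not from the patch: the two prefix layouts checked above
        # differ only in the version byte (0 for SDMF, 1 for MDMF) and in the
        # MDMF prefix dropping the 16-byte per-file salt/IV. Their packed
        # sizes can be verified directly, e.g.:
        #
        #   struct.calcsize(">BQ32s16s BBQQ")   # SDMF signed prefix -> 75 bytes
        #   struct.calcsize(">BQ32s BBQQ")      # MDMF signed prefix -> 59 bytes
        # )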
7434+    def test_reader_queue(self):
7435+        self.write_test_share_to_server('si1')
7436+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7437+        d1 = mr.get_block_and_salt(0, queue=True)
7438+        d2 = mr.get_blockhashes(queue=True)
7439+        d3 = mr.get_sharehashes(queue=True)
7440+        d4 = mr.get_signature(queue=True)
7441+        d5 = mr.get_verification_key(queue=True)
7442+        dl = defer.DeferredList([d1, d2, d3, d4, d5])
7443+        mr.flush()
7444+        def _print(results):
7445+            self.failUnlessEqual(len(results), 5)
7446+            # We have one read for version information and offsets, and
7447+            # one for everything else.
7448+            self.failUnlessEqual(self.rref.read_count, 2)
7449+            block, salt = results[0][1] # each result is a (success, value)
7450+                                        # pair; the boolean says whether
7451+                                        # the operation worked.
7452+            self.failUnlessEqual(self.block, block)
7453+            self.failUnlessEqual(self.salt, salt)
7454+
7455+            blockhashes = results[1][1]
7456+            self.failUnlessEqual(self.block_hash_tree, blockhashes)
7457+
7458+            sharehashes = results[2][1]
7459+            self.failUnlessEqual(self.share_hash_chain, sharehashes)
7460+
7461+            signature = results[3][1]
7462+            self.failUnlessEqual(self.signature, signature)
7463+
7464+            verification_key = results[4][1]
7465+            self.failUnlessEqual(self.verification_key, verification_key)
7466+        dl.addCallback(_print)
7467+        return dl
7468+
7469+
7470+    def test_sdmf_writer(self):
7471+        # Go through the motions of writing an SDMF share to the storage
7472+        # server. Then read the storage server to see that the share got
7473+        # written in the way that we think it should have.
7474+
7475+        # We do this first so that the necessary instance variables get
7476+        # set the way we want them for the tests below.
7477+        data = self.build_test_sdmf_share()
7478+        sdmfr = SDMFSlotWriteProxy(0,
7479+                                   self.rref,
7480+                                   "si1",
7481+                                   self.secrets,
7482+                                   0, 3, 10, 36, 36)
7483+        # Put the block and salt.
7484+        sdmfr.put_block(self.blockdata, 0, self.salt)
7485+
7486+        # Put the encprivkey
7487+        sdmfr.put_encprivkey(self.encprivkey)
7488+
7489+        # Put the block and share hash chains
7490+        sdmfr.put_blockhashes(self.block_hash_tree)
7491+        sdmfr.put_sharehashes(self.share_hash_chain)
7492+        sdmfr.put_root_hash(self.root_hash)
7493+
7494+        # Put the signature
7495+        sdmfr.put_signature(self.signature)
7496+
7497+        # Put the verification key
7498+        sdmfr.put_verification_key(self.verification_key)
7499+
7500+        # Now check to make sure that nothing has been written yet.
7501+        self.failUnlessEqual(self.rref.write_count, 0)
7502+
7503+        # Now finish publishing
7504+        d = sdmfr.finish_publishing()
7505+        def _then(ignored):
7506+            self.failUnlessEqual(self.rref.write_count, 1)
7507+            read = self.ss.remote_slot_readv
7508+            self.failUnlessEqual(read("si1", [0], [(0, len(data))]),
7509+                                 {0: [data]})
7510+        d.addCallback(_then)
7511+        return d
7512+
7513+
7514+    def test_sdmf_writer_preexisting_share(self):
7515+        data = self.build_test_sdmf_share()
7516+        self.write_sdmf_share_to_server("si1")
7517+
7518+        # Now there is a share on the storage server. To successfully
7519+        # write, we need to set the checkstring correctly. When we
7520+        # don't, no write should occur.
7521+        sdmfw = SDMFSlotWriteProxy(0,
7522+                                   self.rref,
7523+                                   "si1",
7524+                                   self.secrets,
7525+                                   1, 3, 10, 36, 36)
7526+        sdmfw.put_block(self.blockdata, 0, self.salt)
7527+
7528+        # Put the encprivkey
7529+        sdmfw.put_encprivkey(self.encprivkey)
7530+
7531+        # Put the block and share hash chains
7532+        sdmfw.put_blockhashes(self.block_hash_tree)
7533+        sdmfw.put_sharehashes(self.share_hash_chain)
7534+
7535+        # Put the root hash
7536+        sdmfw.put_root_hash(self.root_hash)
7537+
7538+        # Put the signature
7539+        sdmfw.put_signature(self.signature)
7540+
7541+        # Put the verification key
7542+        sdmfw.put_verification_key(self.verification_key)
7543+
7544+        # We shouldn't have a checkstring yet
7545+        self.failUnlessEqual(sdmfw.get_checkstring(), "")
7546+
7547+        d = sdmfw.finish_publishing()
7548+        def _then(results):
7549+            self.failIf(results[0])
7550+            # this is the correct checkstring
7551+            self._expected_checkstring = results[1][0][0]
7552+            return self._expected_checkstring
7553+
7554+        d.addCallback(_then)
7555+        d.addCallback(sdmfw.set_checkstring)
7556+        d.addCallback(lambda ignored:
7557+            sdmfw.get_checkstring())
7558+        d.addCallback(lambda checkstring:
7559+            self.failUnlessEqual(checkstring, self._expected_checkstring))
7560+        d.addCallback(lambda ignored:
7561+            sdmfw.finish_publishing())
7562+        def _then_again(results):
7563+            self.failUnless(results[0])
7564+            read = self.ss.remote_slot_readv
7565+            self.failUnlessEqual(read("si1", [0], [(1, 8)]),
7566+                                 {0: [struct.pack(">Q", 1)]})
7567+            self.failUnlessEqual(read("si1", [0], [(9, len(data) - 9)]),
7568+                                 {0: [data[9:]]})
7569+        d.addCallback(_then_again)
7570+        return d
7571+
7572+
7573 class Stats(unittest.TestCase):
7574 
7575     def setUp(self):
7576}
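The SDMF writer tests above capture the intended calling pattern for SDMFSlotWriteProxy: every put_* call is buffered locally, a single remote write happens in finish_publishing(), and a writer aimed at an existing share must first learn the server's checkstring and set it before retrying. A condensed sketch of that flow (the helper name and the rref/secrets/field values are placeholders taken from the tests, not part of the patch):

    def publish_sdmf_share(rref, secrets, fields):
        # seqnum=0, k=3, N=10, segsize=36, datalen=36, as in the tests above
        w = SDMFSlotWriteProxy(0, rref, "si1", secrets, 0, 3, 10, 36, 36)
        w.put_block(fields["blockdata"], 0, fields["salt"])   # buffered only
        w.put_encprivkey(fields["encprivkey"])
        w.put_blockhashes(fields["block_hash_tree"])
        w.put_sharehashes(fields["share_hash_chain"])
        w.put_root_hash(fields["root_hash"])
        w.put_signature(fields["signature"])
        w.put_verification_key(fields["verification_key"])
        # The single remote write happens here. The Deferred fires with
        # (success, readv results); a False success means the checkstring
        # must be set from the readv results and the publish retried, as
        # test_sdmf_writer_preexisting_share does.
        return w.finish_publishing()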
7577[mutable/retrieve.py: Modify the retrieval process to support MDMF
7578Kevan Carstensen <kevan@isnotajoke.com>**20100819003409
7579 Ignore-this: c03f4e41aaa0366a9bf44847f2caf9db
7580 
7581 The logic behind a mutable file download had to be adapted to work with
7582 segmented mutable files; this patch performs those adaptations. It also
7583 exposes some decoding and decrypting functionality to make partial-file
7584 updates a little easier, and supports efficient random-access downloads
7585 of parts of an MDMF file.
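 
 In terms of call shape, the changes below boil down to roughly the
 following (the local variable names are illustrative; the method
 signatures are the ones introduced in this patch):
 
     r = Retrieve(filenode, servermap, verinfo)        # one version of the file
     d = r.download(consumer, offset=1000, size=4096)  # streaming, random-access
     # for the partial-update path, a single fetched segment can also be
     # decoded and decrypted directly:
     #   d2 = r.decode(blocks_and_salts, segnum)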
7586] {
7587hunk ./src/allmydata/mutable/retrieve.py 2
7588 
7589-import struct, time
7590+import time
7591 from itertools import count
7592 from zope.interface import implements
7593 from twisted.internet import defer
7594merger 0.0 (
7595hunk ./src/allmydata/mutable/retrieve.py 10
7596+from allmydata.util.dictutil import DictOfSets
7597hunk ./src/allmydata/mutable/retrieve.py 7
7598-from foolscap.api import DeadReferenceError, eventually, fireEventually
7599-from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError
7600-from allmydata.util import hashutil, idlib, log
7601+from twisted.internet.interfaces import IPushProducer, IConsumer
7602+from foolscap.api import eventually, fireEventually
7603+from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError, \
7604+                                 MDMF_VERSION, SDMF_VERSION
7605+from allmydata.util import hashutil, log, mathutil
7606)
7607hunk ./src/allmydata/mutable/retrieve.py 16
7608 from pycryptopp.publickey import rsa
7609 
7610 from allmydata.mutable.common import CorruptShareError, UncoordinatedWriteError
7611-from allmydata.mutable.layout import SIGNED_PREFIX, unpack_share_data
7612+from allmydata.mutable.layout import MDMFSlotReadProxy
7613 
7614 class RetrieveStatus:
7615     implements(IRetrieveStatus)
7616hunk ./src/allmydata/mutable/retrieve.py 83
7617     # times, and each will have a separate response chain. However the
7618     # Retrieve object will remain tied to a specific version of the file, and
7619     # will use a single ServerMap instance.
7620+    implements(IPushProducer)
7621 
7622hunk ./src/allmydata/mutable/retrieve.py 85
7623-    def __init__(self, filenode, servermap, verinfo, fetch_privkey=False):
7624+    def __init__(self, filenode, servermap, verinfo, fetch_privkey=False,
7625+                 verify=False):
7626         self._node = filenode
7627         assert self._node.get_pubkey()
7628         self._storage_index = filenode.get_storage_index()
7629hunk ./src/allmydata/mutable/retrieve.py 104
7630         self.verinfo = verinfo
7631         # during repair, we may be called upon to grab the private key, since
7632         # it wasn't picked up during a verify=False checker run, and we'll
7633-        # need it for repair to generate the a new version.
7634-        self._need_privkey = fetch_privkey
7635-        if self._node.get_privkey():
7636+        # need it for repair to generate a new version.
7637+        self._need_privkey = fetch_privkey or verify
7638+        if self._node.get_privkey() and not verify:
7639             self._need_privkey = False
7640 
7641hunk ./src/allmydata/mutable/retrieve.py 109
7642+        if self._need_privkey:
7643+            # TODO: Evaluate the need for this. We'll use it if we want
7644+            # to limit how many queries are on the wire for the privkey
7645+            # at once.
7646+            self._privkey_query_markers = [] # one Marker for each time we've
7647+                                             # tried to get the privkey.
7648+
7649+        # verify means that we are using the downloader logic to verify all
7650+        # of our shares. This tells the downloader a few things.
7651+        #
7652+        # 1. We need to download all of the shares.
7653+        # 2. We don't need to decode or decrypt the shares, since our
7654+        #    caller doesn't care about the plaintext, only the
7655+        #    information about which shares are or are not valid.
7656+        # 3. When we are validating readers, we need to validate the
7657+        #    signature on the prefix. Do we? We already do this in the
7658+        #    servermap update?
7659+        self._verify = False
7660+        if verify:
7661+            self._verify = True
7662+
7663         self._status = RetrieveStatus()
7664         self._status.set_storage_index(self._storage_index)
7665         self._status.set_helper(False)
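As a reading aid, the verify path added here can be driven roughly like this (the caller name is assumed, not part of the patch):

    def verify_version(filenode, servermap, verinfo):
        # verify=True makes the downloader fetch and check every share of
        # this version (including the encrypted private key) and discard
        # the plaintext, so no consumer is required.
        r = Retrieve(filenode, servermap, verinfo, verify=True)
        return r.download()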
7666hunk ./src/allmydata/mutable/retrieve.py 139
7667          offsets_tuple) = self.verinfo
7668         self._status.set_size(datalength)
7669         self._status.set_encoding(k, N)
7670+        self.readers = {}
7671+        self._paused = False
7672+        self._pause_deferred = None
7673+        self._offset = None
7674+        self._read_length = None
7675+        self.log("got seqnum %d" % self.verinfo[0])
7676+
7677 
7678     def get_status(self):
7679         return self._status
7680hunk ./src/allmydata/mutable/retrieve.py 157
7681             kwargs["facility"] = "tahoe.mutable.retrieve"
7682         return log.msg(*args, **kwargs)
7683 
7684-    def download(self):
7685+
7686+    ###################
7687+    # IPushProducer
7688+
7689+    def pauseProducing(self):
7690+        """
7691+        I am called by my download target if we have produced too much
7692+        data for it to handle. I make the downloader stop producing new
7693+        data until my resumeProducing method is called.
7694+        """
7695+        if self._paused:
7696+            return
7697+
7698+        # fired when the download is unpaused.
7699+        self._old_status = self._status.get_status()
7700+        self._status.set_status("Paused")
7701+
7702+        self._pause_deferred = defer.Deferred()
7703+        self._paused = True
7704+
7705+
7706+    def resumeProducing(self):
7707+        """
7708+        I am called by my download target once it is ready to begin
7709+        receiving data again.
7710+        """
7711+        if not self._paused:
7712+            return
7713+
7714+        self._paused = False
7715+        p = self._pause_deferred
7716+        self._pause_deferred = None
7717+        self._status.set_status(self._old_status)
7718+
7719+        eventually(p.callback, None)
7720+
7721+
7722+    def _check_for_paused(self, res):
7723+        """
7724+        I am called just before a write to the consumer. I return a
7725+        Deferred that eventually fires with the data that is to be
7726+        written to the consumer. If the download has not been paused,
7727+        the Deferred fires immediately. Otherwise, the Deferred fires
7728+        when the downloader is unpaused.
7729+        """
7730+        if self._paused:
7731+            d = defer.Deferred()
7732+            self._pause_deferred.addCallback(lambda ignored: d.callback(res))
7733+            return d
7734+        return defer.succeed(res)
7735+
7736+
7737+    def download(self, consumer=None, offset=0, size=None):
7738+        assert IConsumer.providedBy(consumer) or self._verify
7739+
7740+        if consumer:
7741+            self._consumer = consumer
7742+            # we provide IPushProducer, so streaming=True, per
7743+            # IConsumer.
7744+            self._consumer.registerProducer(self, streaming=True)
7745+
7746         self._done_deferred = defer.Deferred()
7747         self._started = time.time()
7748         self._status.set_status("Retrieving Shares")
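To illustrate the producer/consumer contract described in the docstrings above, here is a minimal consumer sketch (an assumed class, not part of the patch). Retrieve registers itself with streaming=True, so the consumer may pause and resume it between writes; _check_for_paused then holds back the next segment until the pause is lifted:

    from zope.interface import implements
    from twisted.internet.interfaces import IConsumer

    class BoundedConsumer:
        implements(IConsumer)
        def __init__(self, limit):
            self.limit = limit
            self.buffered = []
            self.producer = None
        def registerProducer(self, producer, streaming):
            self.producer = producer       # the Retrieve instance
        def unregisterProducer(self):
            self.producer = None
        def write(self, data):
            self.buffered.append(data)
            if sum(map(len, self.buffered)) > self.limit:
                # tell the downloader to stop producing segments; once the
                # buffer has been drained elsewhere, calling
                # self.producer.resumeProducing() lets the next segment through
                self.producer.pauseProducing()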
7749hunk ./src/allmydata/mutable/retrieve.py 222
7750 
7751+        self._offset = offset
7752+        self._read_length = size
7753+
7754         # first, which servers can we use?
7755         versionmap = self.servermap.make_versionmap()
7756         shares = versionmap[self.verinfo]
7757hunk ./src/allmydata/mutable/retrieve.py 232
7758         self.remaining_sharemap = DictOfSets()
7759         for (shnum, peerid, timestamp) in shares:
7760             self.remaining_sharemap.add(shnum, peerid)
7761+            # If the servermap update fetched anything, it fetched at least 1
7762+            # KiB, so we ask for that much.
7763+            # TODO: Change the cache methods to allow us to fetch all of the
7764+            # data that they have, then change this method to do that.
7765+            any_cache, timestamp = self._node._read_from_cache(self.verinfo,
7766+                                                               shnum,
7767+                                                               0,
7768+                                                               1000)
7769+            ss = self.servermap.connections[peerid]
7770+            reader = MDMFSlotReadProxy(ss,
7771+                                       self._storage_index,
7772+                                       shnum,
7773+                                       any_cache)
7774+            reader.peerid = peerid
7775+            self.readers[shnum] = reader
7776+
7777 
7778         self.shares = {} # maps shnum to validated blocks
7779hunk ./src/allmydata/mutable/retrieve.py 250
7780+        self._active_readers = [] # list of active readers for this dl.
7781+        self._validated_readers = set() # set of readers that we have
7782+                                        # validated the prefix of
7783+        self._block_hash_trees = {} # shnum => hashtree
7784 
7785         # how many shares do we need?
7786hunk ./src/allmydata/mutable/retrieve.py 256
7787-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
7788+        (seqnum,
7789+         root_hash,
7790+         IV,
7791+         segsize,
7792+         datalength,
7793+         k,
7794+         N,
7795+         prefix,
7796          offsets_tuple) = self.verinfo
7797hunk ./src/allmydata/mutable/retrieve.py 265
7798-        assert len(self.remaining_sharemap) >= k
7799-        # we start with the lowest shnums we have available, since FEC is
7800-        # faster if we're using "primary shares"
7801-        self.active_shnums = set(sorted(self.remaining_sharemap.keys())[:k])
7802-        for shnum in self.active_shnums:
7803-            # we use an arbitrary peer who has the share. If shares are
7804-            # doubled up (more than one share per peer), we could make this
7805-            # run faster by spreading the load among multiple peers. But the
7806-            # algorithm to do that is more complicated than I want to write
7807-            # right now, and a well-provisioned grid shouldn't have multiple
7808-            # shares per peer.
7809-            peerid = list(self.remaining_sharemap[shnum])[0]
7810-            self.get_data(shnum, peerid)
7811 
7812hunk ./src/allmydata/mutable/retrieve.py 266
7813-        # control flow beyond this point: state machine. Receiving responses
7814-        # from queries is the input. We might send out more queries, or we
7815-        # might produce a result.
7816 
7817hunk ./src/allmydata/mutable/retrieve.py 267
7818+        # We need one share hash tree for the entire file; its leaves
7819+        # are the roots of the block hash trees for the shares that
7820+        # comprise it, and its root is in the verinfo.
7821+        self.share_hash_tree = hashtree.IncompleteHashTree(N)
7822+        self.share_hash_tree.set_hashes({0: root_hash})
7823+
7824+        # This will set up both the segment decoder and the tail segment
7825+        # decoder, as well as a variety of other instance variables that
7826+        # the download process will use.
7827+        self._setup_encoding_parameters()
7828+        assert len(self.remaining_sharemap) >= k
7829+
7830+        self.log("starting download")
7831+        self._paused = False
7832+        self._started_fetching = time.time()
7833+
7834+        self._add_active_peers()
7835+        # The download process beyond this is a state machine.
7836+        # _add_active_peers will select the peers that we want to use
7837+        # for the download, and then attempt to start downloading. After
7838+        # each segment, it will check for doneness, reacting to broken
7839+        # peers and corrupt shares as necessary. If it runs out of good
7840+        # peers before downloading all of the segments, _done_deferred
7841+        # will errback.  Otherwise, it will eventually callback with the
7842+        # contents of the mutable file.
7843         return self._done_deferred
7844 
7845hunk ./src/allmydata/mutable/retrieve.py 294
7846-    def get_data(self, shnum, peerid):
7847-        self.log(format="sending sh#%(shnum)d request to [%(peerid)s]",
7848-                 shnum=shnum,
7849-                 peerid=idlib.shortnodeid_b2a(peerid),
7850-                 level=log.NOISY)
7851-        ss = self.servermap.connections[peerid]
7852-        started = time.time()
7853-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
7854+
7855+    def decode(self, blocks_and_salts, segnum):
7856+        """
7857+        I am a helper method that the mutable file update process uses
7858+        as a shortcut to decode and decrypt the segments that it needs
7859+        to fetch in order to perform a file update. I take in a
7860+        collection of blocks and salts, and pick some of those to make a
7861+        segment with. I return the plaintext associated with that
7862+        segment.
7863+        """
7864+        # shnum => block hash tree. Unused, but _setup_encoding_parameters will
7865+        # want to set this.
7866+        # XXX: Make it so that it won't set this if we're just decoding.
7867+        self._block_hash_trees = {}
7868+        self._setup_encoding_parameters()
7869+        # This is the form expected by decode.
7870+        blocks_and_salts = blocks_and_salts.items()
7871+        blocks_and_salts = [(True, [d]) for d in blocks_and_salts]
7872+
7873+        d = self._decode_blocks(blocks_and_salts, segnum)
7874+        d.addCallback(self._decrypt_segment)
7875+        return d
7876+
7877+
7878+    def _setup_encoding_parameters(self):
7879+        """
7880+        I set up the encoding parameters, including k, n, the number
7881+        of segments associated with this file, and the segment decoder.
7882+        """
7883+        (seqnum,
7884+         root_hash,
7885+         IV,
7886+         segsize,
7887+         datalength,
7888+         k,
7889+         n,
7890+         known_prefix,
7891          offsets_tuple) = self.verinfo
7892hunk ./src/allmydata/mutable/retrieve.py 332
7893-        offsets = dict(offsets_tuple)
7894+        self._required_shares = k
7895+        self._total_shares = n
7896+        self._segment_size = segsize
7897+        self._data_length = datalength
7898 
7899hunk ./src/allmydata/mutable/retrieve.py 337
7900-        # we read the checkstring, to make sure that the data we grab is from
7901-        # the right version.
7902-        readv = [ (0, struct.calcsize(SIGNED_PREFIX)) ]
7903+        if not IV:
7904+            self._version = MDMF_VERSION
7905+        else:
7906+            self._version = SDMF_VERSION
7907 
7908hunk ./src/allmydata/mutable/retrieve.py 342
7909-        # We also read the data, and the hashes necessary to validate them
7910-        # (share_hash_chain, block_hash_tree, share_data). We don't read the
7911-        # signature or the pubkey, since that was handled during the
7912-        # servermap phase, and we'll be comparing the share hash chain
7913-        # against the roothash that was validated back then.
7914+        if datalength and segsize:
7915+            self._num_segments = mathutil.div_ceil(datalength, segsize)
7916+            self._tail_data_size = datalength % segsize
7917+        else:
7918+            self._num_segments = 0
7919+            self._tail_data_size = 0
7920 
7921hunk ./src/allmydata/mutable/retrieve.py 349
7922-        readv.append( (offsets['share_hash_chain'],
7923-                       offsets['enc_privkey'] - offsets['share_hash_chain'] ) )
7924+        self._segment_decoder = codec.CRSDecoder()
7925+        self._segment_decoder.set_params(segsize, k, n)
7926 
7927hunk ./src/allmydata/mutable/retrieve.py 352
7928-        # if we need the private key (for repair), we also fetch that
7929-        if self._need_privkey:
7930-            readv.append( (offsets['enc_privkey'],
7931-                           offsets['EOF'] - offsets['enc_privkey']) )
7932+        if not self._tail_data_size:
7933+            self._tail_data_size = segsize
7934+
7935+        self._tail_segment_size = mathutil.next_multiple(self._tail_data_size,
7936+                                                         self._required_shares)
7937+        if self._tail_segment_size == self._segment_size:
7938+            self._tail_decoder = self._segment_decoder
7939+        else:
7940+            self._tail_decoder = codec.CRSDecoder()
7941+            self._tail_decoder.set_params(self._tail_segment_size,
7942+                                          self._required_shares,
7943+                                          self._total_shares)
7944 
7945hunk ./src/allmydata/mutable/retrieve.py 365
7946-        m = Marker()
7947-        self._outstanding_queries[m] = (peerid, shnum, started)
7948+        self.log("got encoding parameters: "
7949+                 "k: %d "
7950+                 "n: %d "
7951+                 "%d segments of %d bytes each (%d byte tail segment)" % \
7952+                 (k, n, self._num_segments, self._segment_size,
7953+                  self._tail_segment_size))
7954 
7955         # ask the cache first
7956         got_from_cache = False
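Worked through with the parameters used by the MDMF tests earlier in this bundle (datalength=36, segsize=6, k=3), the computation above comes out as follows; plain integer arithmetic stands in for mathutil.div_ceil and mathutil.next_multiple:

    datalength, segsize, k = 36, 6, 3
    num_segments = -(-datalength // segsize)            # div_ceil -> 6 segments
    tail_data_size = datalength % segsize or segsize    # 0, so a full 6 bytes
    tail_segment_size = ((tail_data_size + k - 1) // k) * k  # next_multiple -> 6
    # tail_segment_size equals segsize here, so the tail reuses the same CRSDecoder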
7957merger 0.0 (
7958hunk ./src/allmydata/mutable/retrieve.py 376
7959-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
7960-                                                            offset, length)
7961+            data = self._node._read_from_cache(self.verinfo, shnum, offset, length)
7962hunk ./src/allmydata/mutable/retrieve.py 372
7963-        # ask the cache first
7964-        got_from_cache = False
7965-        datavs = []
7966-        for (offset, length) in readv:
7967-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
7968-                                                            offset, length)
7969-            if data is not None:
7970-                datavs.append(data)
7971-        if len(datavs) == len(readv):
7972-            self.log("got data from cache")
7973-            got_from_cache = True
7974-            d = fireEventually({shnum: datavs})
7975-            # datavs is a dict mapping shnum to a pair of strings
7976+        for i in xrange(self._total_shares):
7977+            # So we don't have to do this later.
7978+            self._block_hash_trees[i] = hashtree.IncompleteHashTree(self._num_segments)
7979+
7980+        # Our last task is to tell the downloader where to start and
7981+        # where to stop. We use three parameters for that:
7982+        #   - self._start_segment: the segment that we need to start
7983+        #     downloading from.
7984+        #   - self._current_segment: the next segment that we need to
7985+        #     download.
7986+        #   - self._last_segment: The last segment that we were asked to
7987+        #     download.
7988+        #
7989+        #  We say that the download is complete when
7990+        #  self._current_segment > self._last_segment. We use
7991+        #  self._start_segment and self._last_segment to know when to
7992+        #  strip things off of segments, and how much to strip.
7993+        if self._offset:
7994+            self.log("got offset: %d" % self._offset)
7995+            # our start segment is the first segment containing the
7996+            # offset we were given.
7997+            start = mathutil.div_ceil(self._offset,
7998+                                      self._segment_size)
7999+            # this gets us the first segment after self._offset. Then
8000+            # our start segment is the one before it.
8001+            start -= 1
8002+
8003+            assert start < self._num_segments
8004+            self._start_segment = start
8005+            self.log("got start segment: %d" % self._start_segment)
8006)
8007hunk ./src/allmydata/mutable/retrieve.py 386
8008             d = fireEventually({shnum: datavs})
8009             # datavs is a dict mapping shnum to a pair of strings
8010         else:
8011-            d = self._do_read(ss, peerid, self._storage_index, [shnum], readv)
8012-        self.remaining_sharemap.discard(shnum, peerid)
8013+            self._start_segment = 0
8014 
8015hunk ./src/allmydata/mutable/retrieve.py 388
8016-        d.addCallback(self._got_results, m, peerid, started, got_from_cache)
8017-        d.addErrback(self._query_failed, m, peerid)
8018-        # errors that aren't handled by _query_failed (and errors caused by
8019-        # _query_failed) get logged, but we still want to check for doneness.
8020-        def _oops(f):
8021-            self.log(format="problem in _query_failed for sh#%(shnum)d to %(peerid)s",
8022-                     shnum=shnum,
8023-                     peerid=idlib.shortnodeid_b2a(peerid),
8024-                     failure=f,
8025-                     level=log.WEIRD, umid="W0xnQA")
8026-        d.addErrback(_oops)
8027-        d.addBoth(self._check_for_done)
8028-        # any error during _check_for_done means the download fails. If the
8029-        # download is successful, _check_for_done will fire _done by itself.
8030-        d.addErrback(self._done)
8031-        d.addErrback(log.err)
8032-        return d # purely for testing convenience
8033 
8034hunk ./src/allmydata/mutable/retrieve.py 389
8035-    def _do_read(self, ss, peerid, storage_index, shnums, readv):
8036-        # isolate the callRemote to a separate method, so tests can subclass
8037-        # Publish and override it
8038-        d = ss.callRemote("slot_readv", storage_index, shnums, readv)
8039-        return d
8040+        if self._read_length:
8041+            # our end segment is the last segment containing part of the
8042+            # segment that we were asked to read.
8043+            self.log("got read length %d" % self._read_length)
8044+            end_data = self._offset + self._read_length
8045+            end = mathutil.div_ceil(end_data,
8046+                                    self._segment_size)
8047+            end -= 1
8048+            assert end < self._num_segments
8049+            self._last_segment = end
8050+            self.log("got end segment: %d" % self._last_segment)
8051+        else:
8052+            self._last_segment = self._num_segments - 1
8053 
8054hunk ./src/allmydata/mutable/retrieve.py 403
8055-    def remove_peer(self, peerid):
8056-        for shnum in list(self.remaining_sharemap.keys()):
8057-            self.remaining_sharemap.discard(shnum, peerid)
8058+        self._current_segment = self._start_segment
8059 
8060hunk ./src/allmydata/mutable/retrieve.py 405
8061-    def _got_results(self, datavs, marker, peerid, started, got_from_cache):
8062-        now = time.time()
8063-        elapsed = now - started
8064-        if not got_from_cache:
8065-            self._status.add_fetch_timing(peerid, elapsed)
8066-        self.log(format="got results (%(shares)d shares) from [%(peerid)s]",
8067-                 shares=len(datavs),
8068-                 peerid=idlib.shortnodeid_b2a(peerid),
8069-                 level=log.NOISY)
8070-        self._outstanding_queries.pop(marker, None)
8071-        if not self._running:
8072-            return
8073+    def _add_active_peers(self):
8074+        """
8075+        I populate self._active_readers with enough active readers to
8076+        retrieve the contents of this mutable file. I am called before
8077+        downloading starts, and (eventually) after each validation
8078+        error, connection error, or other problem in the download.
8079+        """
8080+        # TODO: It would be cool to investigate other heuristics for
8081+        # reader selection. For instance, the cost (in time the user
8082+        # spends waiting for their file) of selecting a really slow peer
8083+        # that happens to have a primary share is probably more than
8084+        # selecting a really fast peer that doesn't have a primary
8085+        # share. Maybe the servermap could be extended to provide this
8086+        # information; it could keep track of latency information while
8087+        # it gathers more important data, and then this routine could
8088+        # use that to select active readers.
8089+        #
8090+        # (these and other questions would be easier to answer with a
8091+        #  robust, configurable tahoe-lafs simulator, which modeled node
8092+        #  failures, differences in node speed, and other characteristics
8093+        #  that we expect storage servers to have.  You could have
8094+        #  presets for really stable grids (like allmydata.com),
8095+        #  friendnets, make it easy to configure your own settings, and
8096+        #  then simulate the effect of big changes on these use cases
8097+        #  instead of just reasoning about what the effect might be. Out
8098+        #  of scope for MDMF, though.)
8099 
8100hunk ./src/allmydata/mutable/retrieve.py 432
8101-        # note that we only ask for a single share per query, so we only
8102-        # expect a single share back. On the other hand, we use the extra
8103-        # shares if we get them.. seems better than an assert().
8104+        # We need at least self._required_shares readers to download a
8105+        # segment.
8106+        if self._verify:
8107+            needed = self._total_shares
8108+        else:
8109+            needed = self._required_shares - len(self._active_readers)
8110+        # XXX: Why don't format= log messages work here?
8111+        self.log("adding %d peers to the active peers list" % needed)
8112 
8113hunk ./src/allmydata/mutable/retrieve.py 441
8114-        for shnum,datav in datavs.items():
8115-            (prefix, hash_and_data) = datav[:2]
8116-            try:
8117-                self._got_results_one_share(shnum, peerid,
8118-                                            prefix, hash_and_data)
8119-            except CorruptShareError, e:
8120-                # log it and give the other shares a chance to be processed
8121-                f = failure.Failure()
8122-                self.log(format="bad share: %(f_value)s",
8123-                         f_value=str(f.value), failure=f,
8124-                         level=log.WEIRD, umid="7fzWZw")
8125-                self.notify_server_corruption(peerid, shnum, str(e))
8126-                self.remove_peer(peerid)
8127-                self.servermap.mark_bad_share(peerid, shnum, prefix)
8128-                self._bad_shares.add( (peerid, shnum) )
8129-                self._status.problems[peerid] = f
8130-                self._last_failure = f
8131-                pass
8132-            if self._need_privkey and len(datav) > 2:
8133-                lp = None
8134-                self._try_to_validate_privkey(datav[2], peerid, shnum, lp)
8135-        # all done!
8136+        # We favor lower numbered shares, since FEC is faster with
8137+        # primary shares than with other shares, and lower-numbered
8138+        # shares are more likely to be primary than higher numbered
8139+        # shares.
8140+        active_shnums = set(sorted(self.remaining_sharemap.keys()))
8141+        # We shouldn't consider adding shares that we already have; this
8142+        # will cause problems later.
8143+        active_shnums -= set([reader.shnum for reader in self._active_readers])
8144+        active_shnums = sorted(active_shnums)[:needed]
8145+        if len(active_shnums) < needed and not self._verify:
8146+            # We don't have enough readers to retrieve the file; fail.
8147+            return self._failed()
8148 
8149hunk ./src/allmydata/mutable/retrieve.py 454
8150-    def notify_server_corruption(self, peerid, shnum, reason):
8151-        ss = self.servermap.connections[peerid]
8152-        ss.callRemoteOnly("advise_corrupt_share",
8153-                          "mutable", self._storage_index, shnum, reason)
8154+        for shnum in active_shnums:
8155+            self._active_readers.append(self.readers[shnum])
8156+            self.log("added reader for share %d" % shnum)
8157+        assert len(self._active_readers) >= self._required_shares
8158+        # Conceptually, this is part of the _add_active_peers step. It
8159+        # validates the prefixes of newly added readers to make sure
8160+        # that they match what we are expecting for self.verinfo. If
8161+        # validation is successful, _validate_active_prefixes will call
8162+        # _download_current_segment for us. If validation is
8163+        # unsuccessful, then _validate_active_prefixes will remove the peer and
8164+        # call _add_active_peers again, where we will attempt to rectify
8165+        # the problem by choosing another peer.
8166+        return self._validate_active_prefixes()
8167 
8168hunk ./src/allmydata/mutable/retrieve.py 468
8169-    def _got_results_one_share(self, shnum, peerid,
8170-                               got_prefix, got_hash_and_data):
8171-        self.log("_got_results: got shnum #%d from peerid %s"
8172-                 % (shnum, idlib.shortnodeid_b2a(peerid)))
8173-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8174-         offsets_tuple) = self.verinfo
8175-        assert len(got_prefix) == len(prefix), (len(got_prefix), len(prefix))
8176-        if got_prefix != prefix:
8177-            msg = "someone wrote to the data since we read the servermap: prefix changed"
8178-            raise UncoordinatedWriteError(msg)
8179-        (share_hash_chain, block_hash_tree,
8180-         share_data) = unpack_share_data(self.verinfo, got_hash_and_data)
8181 
8182hunk ./src/allmydata/mutable/retrieve.py 469
8183-        assert isinstance(share_data, str)
8184-        # build the block hash tree. SDMF has only one leaf.
8185-        leaves = [hashutil.block_hash(share_data)]
8186-        t = hashtree.HashTree(leaves)
8187-        if list(t) != block_hash_tree:
8188-            raise CorruptShareError(peerid, shnum, "block hash tree failure")
8189-        share_hash_leaf = t[0]
8190-        t2 = hashtree.IncompleteHashTree(N)
8191-        # root_hash was checked by the signature
8192-        t2.set_hashes({0: root_hash})
8193-        try:
8194-            t2.set_hashes(hashes=share_hash_chain,
8195-                          leaves={shnum: share_hash_leaf})
8196-        except (hashtree.BadHashError, hashtree.NotEnoughHashesError,
8197-                IndexError), e:
8198-            msg = "corrupt hashes: %s" % (e,)
8199-            raise CorruptShareError(peerid, shnum, msg)
8200-        self.log(" data valid! len=%d" % len(share_data))
8201-        # each query comes down to this: placing validated share data into
8202-        # self.shares
8203-        self.shares[shnum] = share_data
8204+    def _validate_active_prefixes(self):
8205+        """
8206+        I check to make sure that the prefixes on the peers that I am
8207+        currently reading from match the prefix that we want to see, as
8208+        said in self.verinfo.
8209 
8210hunk ./src/allmydata/mutable/retrieve.py 475
8211-    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
8212+        If I find that all of the active peers have acceptable prefixes,
8213+        I pass control to _download_current_segment, which will use
8214+        those peers to do cool things. If I find that some of the active
8215+        peers have unacceptable prefixes, I will remove them from active
8216+        peers (and from further consideration) and call
8217+        _add_active_peers to attempt to rectify the situation. I keep
8218+        track of which peers I have already validated so that I don't
8219+        need to do so again.
8220+        """
8221+        assert self._active_readers, "No more active readers"
8222 
8223hunk ./src/allmydata/mutable/retrieve.py 486
8224-        alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
8225-        alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
8226-        if alleged_writekey != self._node.get_writekey():
8227-            self.log("invalid privkey from %s shnum %d" %
8228-                     (idlib.nodeid_b2a(peerid)[:8], shnum),
8229-                     parent=lp, level=log.WEIRD, umid="YIw4tA")
8230-            return
8231+        ds = []
8232+        new_readers = set(self._active_readers) - self._validated_readers
8233+        self.log('validating %d newly-added active readers' % len(new_readers))
8234 
8235hunk ./src/allmydata/mutable/retrieve.py 490
8236-        # it's good
8237-        self.log("got valid privkey from shnum %d on peerid %s" %
8238-                 (shnum, idlib.shortnodeid_b2a(peerid)),
8239-                 parent=lp)
8240-        privkey = rsa.create_signing_key_from_string(alleged_privkey_s)
8241-        self._node._populate_encprivkey(enc_privkey)
8242-        self._node._populate_privkey(privkey)
8243-        self._need_privkey = False
8244+        for reader in new_readers:
8245+            # We force a remote read here -- otherwise, we are relying
8246+            # on cached data that we already verified as valid, and we
8247+            # won't detect an uncoordinated write that has occurred
8248+            # since the last servermap update.
8249+            d = reader.get_prefix(force_remote=True)
8250+            d.addCallback(self._try_to_validate_prefix, reader)
8251+            ds.append(d)
8252+        dl = defer.DeferredList(ds, consumeErrors=True)
8253+        def _check_results(results):
8254+            # Each result in results will be of the form (success, msg).
8255+            # We don't care about msg, but success will tell us whether
8256+            # or not the checkstring validated. If it didn't, we need to
8257+            # remove the offending (peer,share) from our active readers,
8258+            # and ensure that active readers is again populated.
8259+            bad_readers = []
8260+            for i, result in enumerate(results):
8261+                if not result[0]:
8262+                    reader = self._active_readers[i]
8263+                    f = result[1]
8264+                    assert isinstance(f, failure.Failure)
8265 
8266hunk ./src/allmydata/mutable/retrieve.py 512
8267-    def _query_failed(self, f, marker, peerid):
8268-        self.log(format="query to [%(peerid)s] failed",
8269-                 peerid=idlib.shortnodeid_b2a(peerid),
8270-                 level=log.NOISY)
8271-        self._status.problems[peerid] = f
8272-        self._outstanding_queries.pop(marker, None)
8273-        if not self._running:
8274-            return
8275-        self._last_failure = f
8276-        self.remove_peer(peerid)
8277-        level = log.WEIRD
8278-        if f.check(DeadReferenceError):
8279-            level = log.UNUSUAL
8280-        self.log(format="error during query: %(f_value)s",
8281-                 f_value=str(f.value), failure=f, level=level, umid="gOJB5g")
8282+                    self.log("The reader %s failed to "
8283+                             "properly validate: %s" % \
8284+                             (reader, str(f.value)))
8285+                    bad_readers.append((reader, f))
8286+                else:
8287+                    reader = self._active_readers[i]
8288+                    self.log("the reader %s checks out, so we'll use it" % \
8289+                             reader)
8290+                    self._validated_readers.add(reader)
8291+                    # Each time we validate a reader, we check to see if
8292+                    # we need the private key. If we do, we politely ask
8293+                    # for it and then continue computing. If we find
8294+                    # that we haven't gotten it at the end of
8295+                    # segment decoding, then we'll take more drastic
8296+                    # measures.
8297+                    if self._need_privkey and not self._node.is_readonly():
8298+                        d = reader.get_encprivkey()
8299+                        d.addCallback(self._try_to_validate_privkey, reader)
8300+            if bad_readers:
8301+                # We do them all at once, or else we screw up list indexing.
8302+                for (reader, f) in bad_readers:
8303+                    self._mark_bad_share(reader, f)
8304+                if self._verify:
8305+                    if len(self._active_readers) >= self._required_shares:
8306+                        return self._download_current_segment()
8307+                    else:
8308+                        return self._failed()
8309+                else:
8310+                    return self._add_active_peers()
8311+            else:
8312+                return self._download_current_segment()
8313+            # The next step will assert that it has enough active
8314+            # readers to fetch shares; we just need to remove it.
8315+        dl.addCallback(_check_results)
8316+        return dl
8317 
8318hunk ./src/allmydata/mutable/retrieve.py 548
8319-    def _check_for_done(self, res):
8320-        # exit paths:
8321-        #  return : keep waiting, no new queries
8322-        #  return self._send_more_queries(outstanding) : send some more queries
8323-        #  fire self._done(plaintext) : download successful
8324-        #  raise exception : download fails
8325 
8326hunk ./src/allmydata/mutable/retrieve.py 549
8327-        self.log(format="_check_for_done: running=%(running)s, decoding=%(decoding)s",
8328-                 running=self._running, decoding=self._decoding,
8329-                 level=log.NOISY)
8330-        if not self._running:
8331-            return
8332-        if self._decoding:
8333-            return
8334-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8335+    def _try_to_validate_prefix(self, prefix, reader):
8336+        """
8337+        I check that the prefix returned by a candidate server for
8338+        retrieval matches the prefix that the servermap knows about
8339+        (and, hence, the prefix that was validated earlier). If it does,
8340+        I return normally, which means that I approve of the use of the
8341+        candidate server for segment retrieval. If it doesn't, I raise
8342+        UncoordinatedWriteError, so that another server will be chosen.
8343+        """
8344+        (seqnum,
8345+         root_hash,
8346+         IV,
8347+         segsize,
8348+         datalength,
8349+         k,
8350+         N,
8351+         known_prefix,
8352          offsets_tuple) = self.verinfo
8353hunk ./src/allmydata/mutable/retrieve.py 567
8354+        if known_prefix != prefix:
8355+            self.log("prefix from share %d doesn't match" % reader.shnum)
8356+            raise UncoordinatedWriteError("Mismatched prefix -- this could "
8357+                                          "indicate an uncoordinated write")
8358+        # Otherwise, we're okay -- no issues.
8359 
8360hunk ./src/allmydata/mutable/retrieve.py 573
8361-        if len(self.shares) < k:
8362-            # we don't have enough shares yet
8363-            return self._maybe_send_more_queries(k)
8364-        if self._need_privkey:
8365-            # we got k shares, but none of them had a valid privkey. TODO:
8366-            # look further. Adding code to do this is a bit complicated, and
8367-            # I want to avoid that complication, and this should be pretty
8368-            # rare (k shares with bitflips in the enc_privkey but not in the
8369-            # data blocks). If we actually do get here, the subsequent repair
8370-            # will fail for lack of a privkey.
8371-            self.log("got k shares but still need_privkey, bummer",
8372-                     level=log.WEIRD, umid="MdRHPA")
8373 
8374hunk ./src/allmydata/mutable/retrieve.py 574
8375-        # we have enough to finish. All the shares have had their hashes
8376-        # checked, so if something fails at this point, we don't know how
8377-        # to fix it, so the download will fail.
8378+    def _remove_reader(self, reader):
8379+        """
8380+        At various points, we will wish to remove a peer from
8381+        consideration and/or use. These include, but are not necessarily
8382+        limited to:
8383 
8384hunk ./src/allmydata/mutable/retrieve.py 580
8385-        self._decoding = True # avoid reentrancy
8386-        self._status.set_status("decoding")
8387-        now = time.time()
8388-        elapsed = now - self._started
8389-        self._status.timings["fetch"] = elapsed
8390+            - A connection error.
8391+            - A mismatched prefix (that is, a prefix that does not match
8392+              our conception of the version information string).
8393+            - A failing block hash, salt hash, or share hash, which can
8394+              indicate disk failure/bit flips, or network trouble.
8395 
8396hunk ./src/allmydata/mutable/retrieve.py 586
8397-        d = defer.maybeDeferred(self._decode)
8398-        d.addCallback(self._decrypt, IV, self._node.get_readkey())
8399-        d.addBoth(self._done)
8400-        return d # purely for test convenience
8401+        This method will do that. I will make sure that the
8402+        (shnum,reader) combination represented by my reader argument is
8403+        not used for anything else during this download. I will not
8404+        advise the reader of any corruption, something that my callers
8405+        may wish to do on their own.
8406+        """
8407+        # TODO: When you're done writing this, see if this is ever
8408+        # actually used for something that _mark_bad_share isn't. I have
8409+        # a feeling that they will be used for very similar things, and
8410+        # that having them both here is just going to be an epic amount
8411+        # of code duplication.
8412+        #
8413+        # (well, okay, not epic, but meaningful)
8414+        self.log("removing reader %s" % reader)
8415+        # Remove the reader from _active_readers
8416+        self._active_readers.remove(reader)
8417+        # TODO: self.readers.remove(reader)?
8418+        for shnum in list(self.remaining_sharemap.keys()):
8419+            self.remaining_sharemap.discard(shnum, reader.peerid)
8420 
8421hunk ./src/allmydata/mutable/retrieve.py 606
8422-    def _maybe_send_more_queries(self, k):
8423-        # we don't have enough shares yet. Should we send out more queries?
8424-        # There are some number of queries outstanding, each for a single
8425-        # share. If we can generate 'needed_shares' additional queries, we do
8426-        # so. If we can't, then we know this file is a goner, and we raise
8427-        # NotEnoughSharesError.
8428-        self.log(format=("_maybe_send_more_queries, have=%(have)d, k=%(k)d, "
8429-                         "outstanding=%(outstanding)d"),
8430-                 have=len(self.shares), k=k,
8431-                 outstanding=len(self._outstanding_queries),
8432-                 level=log.NOISY)
8433 
8434hunk ./src/allmydata/mutable/retrieve.py 607
8435-        remaining_shares = k - len(self.shares)
8436-        needed = remaining_shares - len(self._outstanding_queries)
8437-        if not needed:
8438-            # we have enough queries in flight already
8439+    def _mark_bad_share(self, reader, f):
8440+        """
8441+        I mark the (peerid, shnum) encapsulated by my reader argument as
8442+        a bad share, which means that it will not be used anywhere else.
8443 
8444hunk ./src/allmydata/mutable/retrieve.py 612
8445-            # TODO: but if they've been in flight for a long time, and we
8446-            # have reason to believe that new queries might respond faster
8447-            # (i.e. we've seen other queries come back faster, then consider
8448-            # sending out new queries. This could help with peers which have
8449-            # silently gone away since the servermap was updated, for which
8450-            # we're still waiting for the 15-minute TCP disconnect to happen.
8451-            self.log("enough queries are in flight, no more are needed",
8452-                     level=log.NOISY)
8453-            return
8454+        There are several reasons to want to mark something as a bad
8455+        share. These include:
8456+
8457+            - A connection error to the peer.
8458+            - A mismatched prefix (that is, a prefix that does not match
8459+              our local conception of the version information string).
8460+            - A failing block hash, salt hash, share hash, or other
8461+              integrity check.
8462 
8463hunk ./src/allmydata/mutable/retrieve.py 621
8464-        outstanding_shnums = set([shnum
8465-                                  for (peerid, shnum, started)
8466-                                  in self._outstanding_queries.values()])
8467-        # prefer low-numbered shares, they are more likely to be primary
8468-        available_shnums = sorted(self.remaining_sharemap.keys())
8469-        for shnum in available_shnums:
8470-            if shnum in outstanding_shnums:
8471-                # skip ones that are already in transit
8472-                continue
8473-            if shnum not in self.remaining_sharemap:
8474-                # no servers for that shnum. note that DictOfSets removes
8475-                # empty sets from the dict for us.
8476-                continue
8477-            peerid = list(self.remaining_sharemap[shnum])[0]
8478-            # get_data will remove that peerid from the sharemap, and add the
8479-            # query to self._outstanding_queries
8480-            self._status.set_status("Retrieving More Shares")
8481-            self.get_data(shnum, peerid)
8482-            needed -= 1
8483-            if not needed:
8484+        This method will ensure that readers that we wish to mark bad
8485+        (for these reasons or other reasons) are not used for the rest
8486+        of the download. Additionally, it will attempt to tell the
8487+        remote peer (with no guarantee of success) that its share is
8488+        corrupt.
8489+        """
8490+        self.log("marking share %d on server %s as bad" % \
8491+                 (reader.shnum, reader))
8492+        prefix = self.verinfo[-2]
8493+        self.servermap.mark_bad_share(reader.peerid,
8494+                                      reader.shnum,
8495+                                      prefix)
8496+        self._remove_reader(reader)
8497+        self._bad_shares.add((reader.peerid, reader.shnum, f))
8498+        self._status.problems[reader.peerid] = f
8499+        self._last_failure = f
8500+        self.notify_server_corruption(reader.peerid, reader.shnum,
8501+                                      str(f.value))
8502+
8503+
8504+    def _download_current_segment(self):
8505+        """
8506+        I download, validate, decode, decrypt, and assemble the segment
8507+        that this Retrieve is currently responsible for downloading.
8508+        """
8509+        assert len(self._active_readers) >= self._required_shares
8510+        if self._current_segment <= self._last_segment:
8511+            d = self._process_segment(self._current_segment)
8512+        else:
8513+            d = defer.succeed(None)
8514+        d.addBoth(self._turn_barrier)
8515+        d.addCallback(self._check_for_done)
8516+        return d
8517+
8518+
8519+    def _turn_barrier(self, result):
8520+        """
8521+        I help the download process avoid the recursion limit issues
8522+        discussed in #237.
8523+        """
8524+        return fireEventually(result)
8525+
8526+
8527+    def _process_segment(self, segnum):
8528+        """
8529+        I download, validate, decode, and decrypt one segment of the
8530+        file that this Retrieve is retrieving. This means coordinating
8531+        the process of getting k blocks of that file, validating them,
8532+        assembling them into one segment with the decoder, and then
8533+        decrypting them.
8534+        """
8535+        self.log("processing segment %d" % segnum)
8536+
8537+        # TODO: The old code uses a marker. Should this code do that
8538+        # too? What did the Marker do?
8539+        assert len(self._active_readers) >= self._required_shares
8540+
8541+        # We need to ask each of our active readers for its block and
8542+        # salt. We will then validate those. If validation is
8543+        # successful, we will assemble the results into plaintext.
8544+        ds = []
8545+        for reader in self._active_readers:
8546+            started = time.time()
8547+            d = reader.get_block_and_salt(segnum, queue=True)
8548+            d2 = self._get_needed_hashes(reader, segnum)
8549+            dl = defer.DeferredList([d, d2], consumeErrors=True)
8550+            dl.addCallback(self._validate_block, segnum, reader, started)
8551+            dl.addErrback(self._validation_or_decoding_failed, [reader])
8552+            ds.append(dl)
8553+            reader.flush()
8554+        dl = defer.DeferredList(ds)
8555+        if self._verify:
8556+            dl.addCallback(lambda ignored: "")
8557+            dl.addCallback(self._set_segment)
8558+        else:
8559+            dl.addCallback(self._maybe_decode_and_decrypt_segment, segnum)
8560+        return dl
8561+
8562+
8563+    def _maybe_decode_and_decrypt_segment(self, blocks_and_salts, segnum):
8564+        """
8565+        I take the results of fetching and validating the blocks from a
8566+        callback chain in another method. If the results are such that
8567+        they tell me that validation and fetching succeeded without
8568+        incident, I will proceed with decoding and decryption.
8569+        Otherwise, I will do nothing.
8570+        """
8571+        self.log("trying to decode and decrypt segment %d" % segnum)
8572+        failures = False
8573+        for block_and_salt in blocks_and_salts:
8574+            if not block_and_salt[0] or block_and_salt[1] is None:
8575+                self.log("some validation operations failed; not proceeding")
8576+                failures = True
8577                 break
8578hunk ./src/allmydata/mutable/retrieve.py 715
8579+        if not failures:
8580+            self.log("everything looks ok, building segment %d" % segnum)
8581+            d = self._decode_blocks(blocks_and_salts, segnum)
8582+            d.addCallback(self._decrypt_segment)
8583+            d.addErrback(self._validation_or_decoding_failed,
8584+                         self._active_readers)
8585+            # check to see whether we've been paused before writing
8586+            # anything.
8587+            d.addCallback(self._check_for_paused)
8588+            d.addCallback(self._set_segment)
8589+            return d
8590+        else:
8591+            return defer.succeed(None)
8592+
8593+
8594+    def _set_segment(self, segment):
8595+        """
8596+        Given a plaintext segment, I register that segment with the
8597+        target that is handling the file download.
8598+        """
8599+        self.log("got plaintext for segment %d" % self._current_segment)
8600+        if self._current_segment == self._start_segment:
8601+            # We're on the first segment. It's possible that we want
8602+            # only some part of the end of this segment, and that we
8603+            # just downloaded the whole thing to get that part. If so,
8604+            # we need to account for that and give the reader just the
8605+            # data that they want.
8606+            n = self._offset % self._segment_size
8607+            self.log("stripping %d bytes off of the first segment" % n)
8608+            self.log("original segment length: %d" % len(segment))
8609+            segment = segment[n:]
8610+            self.log("new segment length: %d" % len(segment))
8611+
8612+        if self._current_segment == self._last_segment and self._read_length is not None:
8613+            # We're on the last segment. It's possible that we only want
8614+            # part of the beginning of this segment, and that we
8615+            # downloaded the whole thing anyway. Make sure to give the
8616+            # caller only the portion of the segment that they want to
8617+            # receive.
8618+            extra = self._read_length
8619+            if self._start_segment != self._last_segment:
8620+                extra -= self._segment_size - \
8621+                            (self._offset % self._segment_size)
8622+            extra %= self._segment_size
8623+            self.log("original segment length: %d" % len(segment))
8624+            segment = segment[:extra]
8625+            self.log("new segment length: %d" % len(segment))
8626+            self.log("only taking %d bytes of the last segment" % extra)
8627+
8628+        if not self._verify:
8629+            self._consumer.write(segment)
8630+        else:
8631+            # we don't care about the plaintext if we are doing a verify.
8632+            segment = None
8633+        self._current_segment += 1
8634 
8635hunk ./src/allmydata/mutable/retrieve.py 771
8636-        # at this point, we have as many outstanding queries as we can. If
8637-        # needed!=0 then we might not have enough to recover the file.
8638-        if needed:
8639-            format = ("ran out of peers: "
8640-                      "have %(have)d shares (k=%(k)d), "
8641-                      "%(outstanding)d queries in flight, "
8642-                      "need %(need)d more, "
8643-                      "found %(bad)d bad shares")
8644-            args = {"have": len(self.shares),
8645-                    "k": k,
8646-                    "outstanding": len(self._outstanding_queries),
8647-                    "need": needed,
8648-                    "bad": len(self._bad_shares),
8649-                    }
8650-            self.log(format=format,
8651-                     level=log.WEIRD, umid="ezTfjw", **args)
8652-            err = NotEnoughSharesError("%s, last failure: %s" %
8653-                                      (format % args, self._last_failure))
8654-            if self._bad_shares:
8655-                self.log("We found some bad shares this pass. You should "
8656-                         "update the servermap and try again to check "
8657-                         "more peers",
8658-                         level=log.WEIRD, umid="EFkOlA")
8659-                err.servermap = self.servermap
8660-            raise err
8661 
8662hunk ./src/allmydata/mutable/retrieve.py 772
8663+    def _validation_or_decoding_failed(self, f, readers):
8664+        """
8665+        I am called when a block or a salt fails to correctly validate, or when
8666+        the decryption or decoding operation fails for some reason.  I react to
8667+        this failure by notifying the remote server of corruption, and then
8668+        removing the remote peer from further activity.
8669+        """
8670+        assert isinstance(readers, list)
8671+        bad_shnums = [reader.shnum for reader in readers]
8672+
8673+        self.log("validation or decoding failed on share(s) %s, peer(s) %s"
8674+                 ", segment %d: %s" % \
8675+                 (bad_shnums, readers, self._current_segment, str(f)))
8676+        for reader in readers:
8677+            self._mark_bad_share(reader, f)
8678         return
8679 
8680hunk ./src/allmydata/mutable/retrieve.py 789
8681-    def _decode(self):
8682-        started = time.time()
8683-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8684-         offsets_tuple) = self.verinfo
8685 
8686hunk ./src/allmydata/mutable/retrieve.py 790
8687-        # shares_dict is a dict mapping shnum to share data, but the codec
8688-        # wants two lists.
8689-        shareids = []; shares = []
8690-        for shareid, share in self.shares.items():
8691+    def _validate_block(self, results, segnum, reader, started):
8692+        """
8693+        I validate a block from one share on a remote server.
8694+        """
8695+        # Grab the part of the block hash tree that is necessary to
8696+        # validate this block, then generate the block hash root.
8697+        self.log("validating share %d for segment %d" % (reader.shnum,
8698+                                                             segnum))
8699+        self._status.add_fetch_timing(reader.peerid, started)
8700+        self._status.set_status("Validating blocks for segment %d" % segnum)
8701+        # Did we fail to fetch either of the things that we were
8702+        # supposed to? Fail if so.
8703+        if not results[0][0] or not results[1][0]:
8704+            # These all get batched into one query, so the resulting
8705+            # failure should be the same for all of them; use whichever
8706+            # failure we see first.
8707+            if not results[0][0]:
8708+                f = results[0][1]
8709+            else:
8710+                f = results[1][1]
8711+            assert isinstance(f, failure.Failure)
8712+            raise CorruptShareError(reader.peerid,
8713+                                    reader.shnum,
8714+                                    "Connection error: %s" % str(f))
8715+
8716+        block_and_salt, block_and_sharehashes = results
8717+        block, salt = block_and_salt[1]
8718+        blockhashes, sharehashes = block_and_sharehashes[1]
8719+
8720+        blockhashes = dict(enumerate(blockhashes[1]))
8721+        self.log("the reader gave me the following blockhashes: %s" % \
8722+                 blockhashes.keys())
8723+        self.log("the reader gave me the following sharehashes: %s" % \
8724+                 sharehashes[1].keys())
8725+        bht = self._block_hash_trees[reader.shnum]
8726+
8727+        if bht.needed_hashes(segnum, include_leaf=True):
8728+            try:
8729+                bht.set_hashes(blockhashes)
8730+            except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8731+                    IndexError), e:
8732+                raise CorruptShareError(reader.peerid,
8733+                                        reader.shnum,
8734+                                        "block hash tree failure: %s" % e)
8735+
8736+        if self._version == MDMF_VERSION:
8737+            blockhash = hashutil.block_hash(salt + block)
8738+        else:
8739+            blockhash = hashutil.block_hash(block)
8740+        # If this works without an error, then validation is
8741+        # successful.
8742+        try:
8743+            bht.set_hashes(leaves={segnum: blockhash})
8744+        except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8745+                IndexError), e:
8746+            raise CorruptShareError(reader.peerid,
8747+                                    reader.shnum,
8748+                                    "block hash tree failure: %s" % e)
8749+
8750+        # Reaching this point means that we know that this segment
8751+        # is correct. Now we need to check to see whether the share
8752+        # hash chain is also correct.
8753+        # SDMF wrote share hash chains that didn't contain the
8754+        # leaves, which would be produced from the block hash tree.
8755+        # So we need to validate the block hash tree first. If
8756+        # successful, then bht[0] will contain the root for the
8757+        # shnum, which will be a leaf in the share hash tree, which
8758+        # will allow us to validate the rest of the tree.
8759+        if self.share_hash_tree.needed_hashes(reader.shnum,
8760+                                              include_leaf=True) or \
8761+                                              self._verify:
8762+            try:
8763+                self.share_hash_tree.set_hashes(hashes=sharehashes[1],
8764+                                            leaves={reader.shnum: bht[0]})
8765+            except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8766+                    IndexError), e:
8767+                raise CorruptShareError(reader.peerid,
8768+                                        reader.shnum,
8769+                                        "corrupt hashes: %s" % e)
8770+
8771+        self.log('share %d is valid for segment %d' % (reader.shnum,
8772+                                                       segnum))
8773+        return {reader.shnum: (block, salt)}
8774+
8775+
8776+    def _get_needed_hashes(self, reader, segnum):
8777+        """
8778+        I get the hashes needed to validate segnum from the reader, then return
8779+        to my caller when this is done.
8780+        """
8781+        bht = self._block_hash_trees[reader.shnum]
8782+        needed = bht.needed_hashes(segnum, include_leaf=True)
8783+        # The root of the block hash tree is also a leaf in the share
8784+        # hash tree. So we don't need to fetch it from the remote
8785+        # server. In the case of files with one segment, this means that
8786+        # we won't fetch any block hash tree from the remote server,
8787+        # since the hash of each share of the file is the entire block
8788+        # hash tree, and is a leaf in the share hash tree. This is fine,
8789+        # since any share corruption will be detected in the share hash
8790+        # tree.
8791+        #needed.discard(0)
8792+        self.log("getting blockhashes for segment %d, share %d: %s" % \
8793+                 (segnum, reader.shnum, str(needed)))
8794+        d1 = reader.get_blockhashes(needed, queue=True, force_remote=True)
8795+        if self.share_hash_tree.needed_hashes(reader.shnum):
8796+            need = self.share_hash_tree.needed_hashes(reader.shnum)
8797+            self.log("also need sharehashes for share %d: %s" % (reader.shnum,
8798+                                                                 str(need)))
8799+            d2 = reader.get_sharehashes(need, queue=True, force_remote=True)
8800+        else:
8801+            d2 = defer.succeed({}) # the logic in the next method
8802+                                   # expects a dict
8803+        dl = defer.DeferredList([d1, d2], consumeErrors=True)
8804+        return dl
8805+
8806+
8807+    def _decode_blocks(self, blocks_and_salts, segnum):
8808+        """
8809+        I take a list of k blocks and salts, and decode that into a
8810+        single encrypted segment.
8811+        """
8812+        d = {}
8813+        # We want to merge our dictionaries to the form
8814+        # {shnum: blocks_and_salts}
8815+        #
8816+        # The dictionaries come from validate block that way, so we just
8817+        # need to merge them.
8818+        for block_and_salt in blocks_and_salts:
8819+            d.update(block_and_salt[1])
8820+
8821+        # All of these blocks should have the same salt; in SDMF, it is
8822+        # the file-wide IV, while in MDMF it is the per-segment salt. In
8823+        # either case, we just need to get one of them and use it.
8824+        #
8825+        # d.items()[0] is like (shnum, (block, salt))
8826+        # d.items()[0][1] is like (block, salt)
8827+        # d.items()[0][1][1] is the salt.
8828+        salt = d.items()[0][1][1]
8829+        # Next, extract just the blocks from the dict. We'll use the
8830+        # salt in the next step.
8831+        share_and_shareids = [(k, v[0]) for k, v in d.items()]
8832+        d2 = dict(share_and_shareids)
8833+        shareids = []
8834+        shares = []
8835+        for shareid, share in d2.items():
8836             shareids.append(shareid)
8837             shares.append(share)
8838 
8839hunk ./src/allmydata/mutable/retrieve.py 938
8840-        assert len(shareids) >= k, len(shareids)
8841+        self._status.set_status("Decoding")
8842+        started = time.time()
8843+        assert len(shareids) >= self._required_shares, len(shareids)
8844         # zfec really doesn't want extra shares
8845hunk ./src/allmydata/mutable/retrieve.py 942
8846-        shareids = shareids[:k]
8847-        shares = shares[:k]
8848-
8849-        fec = codec.CRSDecoder()
8850-        fec.set_params(segsize, k, N)
8851-
8852-        self.log("params %s, we have %d shares" % ((segsize, k, N), len(shares)))
8853-        self.log("about to decode, shareids=%s" % (shareids,))
8854-        d = defer.maybeDeferred(fec.decode, shares, shareids)
8855-        def _done(buffers):
8856-            self._status.timings["decode"] = time.time() - started
8857-            self.log(" decode done, %d buffers" % len(buffers))
8858+        shareids = shareids[:self._required_shares]
8859+        shares = shares[:self._required_shares]
8860+        self.log("decoding segment %d" % segnum)
8861+        if segnum == self._num_segments - 1:
8862+            d = defer.maybeDeferred(self._tail_decoder.decode, shares, shareids)
8863+        else:
8864+            d = defer.maybeDeferred(self._segment_decoder.decode, shares, shareids)
8865+        def _process(buffers):
8866             segment = "".join(buffers)
8867hunk ./src/allmydata/mutable/retrieve.py 951
8868+            self.log(format="now decoding segment %(segnum)s of %(numsegs)s",
8869+                     segnum=segnum,
8870+                     numsegs=self._num_segments,
8871+                     level=log.NOISY)
8872             self.log(" joined length %d, datalength %d" %
8873hunk ./src/allmydata/mutable/retrieve.py 956
8874-                     (len(segment), datalength))
8875-            segment = segment[:datalength]
8876+                     (len(segment), self._data_length))
8877+            if segnum == self._num_segments - 1:
8878+                size_to_use = self._tail_data_size
8879+            else:
8880+                size_to_use = self._segment_size
8881+            segment = segment[:size_to_use]
8882             self.log(" segment len=%d" % len(segment))
8883hunk ./src/allmydata/mutable/retrieve.py 963
8884-            return segment
8885-        def _err(f):
8886-            self.log(" decode failed: %s" % f)
8887-            return f
8888-        d.addCallback(_done)
8889-        d.addErrback(_err)
8890+            self._status.timings.setdefault("decode", 0)
8891+            self._status.timings['decode'] = time.time() - started
8892+            return segment, salt
8893+        d.addCallback(_process)
8894         return d
8895 
8896hunk ./src/allmydata/mutable/retrieve.py 969
8897-    def _decrypt(self, crypttext, IV, readkey):
8898+
8899+    def _decrypt_segment(self, segment_and_salt):
8900+        """
8901+        I take a single segment and its salt, and decrypt it. I return
8902+        the plaintext of the segment that is in my argument.
8903+        """
8904+        segment, salt = segment_and_salt
8905         self._status.set_status("decrypting")
8906hunk ./src/allmydata/mutable/retrieve.py 977
8907+        self.log("decrypting segment %d" % self._current_segment)
8908         started = time.time()
8909hunk ./src/allmydata/mutable/retrieve.py 979
8910-        key = hashutil.ssk_readkey_data_hash(IV, readkey)
8911+        key = hashutil.ssk_readkey_data_hash(salt, self._node.get_readkey())
8912         decryptor = AES(key)
8913hunk ./src/allmydata/mutable/retrieve.py 981
8914-        plaintext = decryptor.process(crypttext)
8915-        self._status.timings["decrypt"] = time.time() - started
8916+        plaintext = decryptor.process(segment)
8917+        self._status.timings.setdefault("decrypt", 0)
8918+        self._status.timings['decrypt'] = time.time() - started
8919         return plaintext
8920 
8921hunk ./src/allmydata/mutable/retrieve.py 986
8922-    def _done(self, res):
8923-        if not self._running:
8924+
8925+    def notify_server_corruption(self, peerid, shnum, reason):
8926+        ss = self.servermap.connections[peerid]
8927+        ss.callRemoteOnly("advise_corrupt_share",
8928+                          "mutable", self._storage_index, shnum, reason)
8929+
8930+
8931+    def _try_to_validate_privkey(self, enc_privkey, reader):
8932+        alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
8933+        alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
8934+        if alleged_writekey != self._node.get_writekey():
8935+            self.log("invalid privkey from %s shnum %d" %
8936+                     (reader, reader.shnum),
8937+                     level=log.WEIRD, umid="YIw4tA")
8938+            if self._verify:
8939+                self.servermap.mark_bad_share(reader.peerid, reader.shnum,
8940+                                              self.verinfo[-2])
8941+                e = CorruptShareError(reader.peerid,
8942+                                      reader.shnum,
8943+                                      "invalid privkey")
8944+                f = failure.Failure(e)
8945+                self._bad_shares.add((reader.peerid, reader.shnum, f))
8946             return
8947hunk ./src/allmydata/mutable/retrieve.py 1009
8948+
8949+        # it's good
8950+        self.log("got valid privkey from shnum %d on reader %s" %
8951+                 (reader.shnum, reader))
8952+        privkey = rsa.create_signing_key_from_string(alleged_privkey_s)
8953+        self._node._populate_encprivkey(enc_privkey)
8954+        self._node._populate_privkey(privkey)
8955+        self._need_privkey = False
8956+
8957+
8958+    def _check_for_done(self, res):
8959+        """
8960+        I check to see if this Retrieve object has successfully finished
8961+        its work.
8962+
8963+        I can exit in the following ways:
8964+            - If there are no more segments to download, then I exit by
8965+              causing self._done_deferred to fire with the plaintext
8966+              content requested by the caller.
8967+            - If there are still segments to be downloaded, and there
8968+              are enough active readers (readers which have not broken
8969+              and have not given us corrupt data) to continue
8970+              downloading, I send control back to
8971+              _download_current_segment.
8972+            - If there are still segments to be downloaded but there are
8973+              not enough active peers to download them, I ask
8974+              _add_active_peers to add more peers. If it is successful,
8975+              it will call _download_current_segment. If there are not
8976+              enough peers to retrieve the file, then that will cause
8977+              _done_deferred to errback.
8978+        """
8979+        self.log("checking for doneness")
8980+        if self._current_segment > self._last_segment:
8981+            # No more segments to download, we're done.
8982+            self.log("got plaintext, done")
8983+            return self._done()
8984+
8985+        if len(self._active_readers) >= self._required_shares:
8986+            # More segments to download, but we have enough good peers
8987+            # in self._active_readers that we can do that without issue,
8988+            # so go nab the next segment.
8989+            self.log("not done yet: on segment %d of %d" % \
8990+                     (self._current_segment + 1, self._num_segments))
8991+            return self._download_current_segment()
8992+
8993+        self.log("not done yet: on segment %d of %d, need to add peers" % \
8994+                 (self._current_segment + 1, self._num_segments))
8995+        return self._add_active_peers()
8996+
8997+
8998+    def _done(self):
8999+        """
9000+        I am called by _check_for_done when the download process has
9001+        finished successfully. After making some useful logging
9002+        statements, I return the decrypted contents to the owner of this
9003+        Retrieve object through self._done_deferred.
9004+        """
9005         self._running = False
9006         self._status.set_active(False)
9007hunk ./src/allmydata/mutable/retrieve.py 1068
9008-        self._status.timings["total"] = time.time() - self._started
9009-        # res is either the new contents, or a Failure
9010-        if isinstance(res, failure.Failure):
9011-            self.log("Retrieve done, with failure", failure=res,
9012-                     level=log.UNUSUAL)
9013-            self._status.set_status("Failed")
9014+        now = time.time()
9015+        self._status.timings['total'] = now - self._started
9016+        self._status.timings['fetch'] = now - self._started_fetching
9017+
9018+        if self._verify:
9019+            ret = list(self._bad_shares)
9020+            self.log("done verifying, found %d bad shares" % len(ret))
9021         else:
9022hunk ./src/allmydata/mutable/retrieve.py 1076
9023-            self.log("Retrieve done, success!")
9024-            self._status.set_status("Finished")
9025-            self._status.set_progress(1.0)
9026-            # remember the encoding parameters, use them again next time
9027-            (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9028-             offsets_tuple) = self.verinfo
9029-            self._node._populate_required_shares(k)
9030-            self._node._populate_total_shares(N)
9031-        eventually(self._done_deferred.callback, res)
9032+            # TODO: upload status here?
9033+            ret = self._consumer
9034+            self._consumer.unregisterProducer()
9035+        eventually(self._done_deferred.callback, ret)
9036+
9037 
9038hunk ./src/allmydata/mutable/retrieve.py 1082
9039+    def _failed(self):
9040+        """
9041+        I am called by _add_active_peers when there are not enough
9042+        active peers left to complete the download. After making some
9043+        useful logging statements, I return an exception to that effect
9044+        to the caller of this Retrieve object through
9045+        self._done_deferred.
9046+        """
9047+        self._running = False
9048+        self._status.set_active(False)
9049+        now = time.time()
9050+        self._status.timings['total'] = now - self._started
9051+        self._status.timings['fetch'] = now - self._started_fetching
9052+
9053+        if self._verify:
9054+            ret = list(self._bad_shares)
9055+        else:
9056+            format = ("ran out of peers: "
9057+                      "have %(have)d of %(total)d segments, "
9058+                      "found %(bad)d bad shares, "
9059+                      "encoding %(k)d-of-%(n)d")
9060+            args = {"have": self._current_segment,
9061+                    "total": self._num_segments,
9062+                    "need": self._last_segment,
9063+                    "k": self._required_shares,
9064+                    "n": self._total_shares,
9065+                    "bad": len(self._bad_shares)}
9066+            e = NotEnoughSharesError("%s, last failure: %s" % \
9067+                                     (format % args, str(self._last_failure)))
9068+            f = failure.Failure(e)
9069+            ret = f
9070+        eventually(self._done_deferred.callback, ret)
9071}
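
For orientation, the retrieve changes above replace the old single-pass download with a per-segment loop: _download_current_segment kicks off _process_segment, which fetches k blocks and their hashes from the active readers, validates them against the block and share hash trees, zfec-decodes them into one encrypted segment, decrypts it, hands the plaintext to the consumer, and _check_for_done then either loops to the next segment or fires _done_deferred. The following is only a minimal, self-contained sketch of that deferred-driven loop shape; ToyRetrieve, StubReader and the list-as-consumer are placeholders invented for illustration rather than the patch's classes, only get_block_and_salt echoes the real reader API (minus its queue argument), and the sketch skips hash validation and erasure decoding entirely.

from twisted.internet import defer

class ToyRetrieve:
    def __init__(self, readers, k, num_segments, consumer):
        self.readers = readers            # stand-ins for the real share readers
        self.k = k                        # shares needed per segment
        self.num_segments = num_segments
        self.consumer = consumer          # here: a plain list of segments
        self.current = 0
        self.done = defer.Deferred()

    def start(self):
        self._next_segment()
        return self.done                  # fires with the consumer when done

    def _next_segment(self):
        if self.current >= self.num_segments:
            self.done.callback(self.consumer)
            return
        ds = [r.get_block_and_salt(self.current) for r in self.readers[:self.k]]
        dl = defer.DeferredList(ds, consumeErrors=True)
        dl.addCallback(self._assemble_segment)
        dl.addCallback(lambda ignored: self._next_segment())
        dl.addErrback(self.done.errback)

    def _assemble_segment(self, results):
        # results is a list of (success, (block, salt)) pairs. A real
        # implementation would validate each block against the hash trees
        # and zfec-decode k blocks into the encrypted segment here.
        blocks = [value for (ok, value) in results if ok]
        if len(blocks) < self.k:
            raise RuntimeError("segment %d: not enough valid blocks"
                               % self.current)
        self.consumer.append("".join(block for (block, salt) in blocks))
        self.current += 1

class StubReader:
    def __init__(self, shnum):
        self.shnum = shnum
    def get_block_and_salt(self, segnum):
        return defer.succeed(("blk%d.%d " % (segnum, self.shnum), "salt"))

plaintext_segments = []
ToyRetrieve([StubReader(i) for i in range(5)], k=3, num_segments=2,
            consumer=plaintext_segments).start()
print(plaintext_segments)

The real methods above layer the block/share hash tree checks (_validate_block), the zfec codec (_decode_blocks), AES decryption (_decrypt_segment), and the fireEventually turn barrier on top of the same loop.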
9072[mutable/servermap.py: Alter the servermap updater to work with MDMF files
9073Kevan Carstensen <kevan@isnotajoke.com>**20100819003439
9074 Ignore-this: 7e408303194834bd59a2f27efab3bdb
9075 
9076 These modifications were mostly aimed at having the servermap
9077 updater use the unified MDMF + SDMF read interface whenever
9078 possible -- this reduces the complexity of the code, making it easier
9079 to read and maintain. To do this, I needed to modify the servermap
9080 update process slightly.
9081 
9082 To support partial-file updates, I also modified the servermap updater
9083 to fetch the block hash trees and certain segments of files while it
9084 performed a servermap update (this can be done without adding any new
9085 roundtrips because of batch-read functionality that the read proxy has).
9086 
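As a rough illustration of the update-data hooks added below: during a MODE_WRITE update with an update_range, the updater stashes the block hash tree plus the first and last requested segments per (shnum, verinfo) via set_update_data_for_share_and_verinfo, and a later partial-file update pulls them back out with get_update_data_for_share_and_verinfo. A minimal sketch of that round trip follows; the values are made-up placeholders (a real verinfo is the 9-tuple assembled in _got_signature_one_share), and the snippet assumes a tree with this patch applied.

from allmydata.mutable.servermap import ServerMap

sm = ServerMap()
verinfo = ("placeholder-verinfo",)            # stand-in for the real 9-tuple
blockhashes = {0: "hash-0", 1: "hash-1"}      # placeholder block hash tree
first_seg = ("first-block", "salt")           # placeholder (block, salt)
last_seg = ("last-block", "salt")

sm.set_update_data_for_share_and_verinfo(3, verinfo,
                                         (blockhashes, first_seg, last_seg))
assert sm.get_update_data_for_share_and_verinfo(3, verinfo) == \
       (blockhashes, first_seg, last_seg)
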
9087] {
9088hunk ./src/allmydata/mutable/servermap.py 2
9089 
9090-import sys, time
9091+import sys, time, struct
9092 from zope.interface import implements
9093 from itertools import count
9094 from twisted.internet import defer
9095merger 0.0 (
9096hunk ./src/allmydata/mutable/servermap.py 9
9097+from allmydata.util.dictutil import DictOfSets
9098hunk ./src/allmydata/mutable/servermap.py 7
9099-from foolscap.api import DeadReferenceError, RemoteException, eventually
9100-from allmydata.util import base32, hashutil, idlib, log
9101+from foolscap.api import DeadReferenceError, RemoteException, eventually, \
9102+                         fireEventually
9103+from allmydata.util import base32, hashutil, idlib, log, deferredutil
9104)
9105merger 0.0 (
9106hunk ./src/allmydata/mutable/servermap.py 14
9107-     DictOfSets, CorruptShareError, NeedMoreDataError
9108+     CorruptShareError, NeedMoreDataError
9109hunk ./src/allmydata/mutable/servermap.py 14
9110-     DictOfSets, CorruptShareError, NeedMoreDataError
9111-from allmydata.mutable.layout import unpack_prefix_and_signature, unpack_header, unpack_share, \
9112-     SIGNED_PREFIX_LENGTH
9113+     DictOfSets, CorruptShareError
9114+from allmydata.mutable.layout import SIGNED_PREFIX_LENGTH, MDMFSlotReadProxy
9115)
9116hunk ./src/allmydata/mutable/servermap.py 123
9117         self.bad_shares = {} # maps (peerid,shnum) to old checkstring
9118         self.last_update_mode = None
9119         self.last_update_time = 0
9120+        self.update_data = {} # (verinfo,shnum) => data
9121 
9122     def copy(self):
9123         s = ServerMap()
9124hunk ./src/allmydata/mutable/servermap.py 254
9125         """Return a set of versionids, one for each version that is currently
9126         recoverable."""
9127         versionmap = self.make_versionmap()
9128-
9129         recoverable_versions = set()
9130         for (verinfo, shares) in versionmap.items():
9131             (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9132hunk ./src/allmydata/mutable/servermap.py 339
9133         return False
9134 
9135 
9136+    def get_update_data_for_share_and_verinfo(self, shnum, verinfo):
9137+        """
9138+        I return the update data for the given shnum
9139+        """
9140+        I return the update data recorded for the given shnum and verinfo.
9141+        update_datum = [i[1] for i in update_data if i[0] == verinfo][0]
9142+        return update_datum
9143+
9144+
9145+    def set_update_data_for_share_and_verinfo(self, shnum, verinfo, data):
9146+        """
9147+        I record the update data for the given shnum and verinfo.
9148+        """
9149+        self.update_data.setdefault(shnum, []).append((verinfo, data))
9150+
9151+
9152 class ServermapUpdater:
9153     def __init__(self, filenode, storage_broker, monitor, servermap,
9154hunk ./src/allmydata/mutable/servermap.py 357
9155-                 mode=MODE_READ, add_lease=False):
9156+                 mode=MODE_READ, add_lease=False, update_range=None):
9157         """I update a servermap, locating a sufficient number of useful
9158         shares and remembering where they are located.
9159 
9160hunk ./src/allmydata/mutable/servermap.py 382
9161         self._servers_responded = set()
9162 
9163         # how much data should we read?
9164+        # SDMF:
9165         #  * if we only need the checkstring, then [0:75]
9166         #  * if we need to validate the checkstring sig, then [543ish:799ish]
9167         #  * if we need the verification key, then [107:436ish]
9168merger 0.0 (
9169hunk ./src/allmydata/mutable/servermap.py 392
9170-        # read 2000 bytes, which also happens to read enough actual data to
9171-        # pre-fetch a 9-entry dirnode.
9172+        # read 4000 bytes, which also happens to read enough actual data to
9173+        # pre-fetch an 18-entry dirnode.
9174hunk ./src/allmydata/mutable/servermap.py 390
9175-        # A future version of the SMDF slot format should consider using
9176-        # fixed-size slots so we can retrieve less data. For now, we'll just
9177-        # read 2000 bytes, which also happens to read enough actual data to
9178-        # pre-fetch a 9-entry dirnode.
9179+        # MDMF:
9180+        #  * Checkstring? [0:72]
9181+        #  * If we want to validate the checkstring, then [0:72], [143:?] --
9182+        #    the offset table will tell us for sure.
9183+        #  * If we need the verification key, we have to consult the offset
9184+        #    table as well.
9185+        # At this point, we don't know which we are. Our filenode can
9186+        # tell us, but it might be lying -- in some cases, we're
9187+        # responsible for telling it which kind of file it is.
9188)
9189hunk ./src/allmydata/mutable/servermap.py 399
9190             # we use unpack_prefix_and_signature, so we need 1k
9191             self._read_size = 1000
9192         self._need_privkey = False
9193+
9194         if mode == MODE_WRITE and not self._node.get_privkey():
9195             self._need_privkey = True
9196         # check+repair: repair requires the privkey, so if we didn't happen
9197hunk ./src/allmydata/mutable/servermap.py 406
9198         # to ask for it during the check, we'll have problems doing the
9199         # publish.
9200 
9201+        self.fetch_update_data = False
9202+        if mode == MODE_WRITE and update_range:
9203+            # We're updating the servermap in preparation for an
9204+            # in-place file update, so we need to fetch some additional
9205+            # data from each share that we find.
9206+            assert len(update_range) == 2
9207+
9208+            self.start_segment = update_range[0]
9209+            self.end_segment = update_range[1]
9210+            self.fetch_update_data = True
9211+
9212         prefix = si_b2a(self._storage_index)[:5]
9213         self._log_number = log.msg(format="SharemapUpdater(%(si)s): starting (%(mode)s)",
9214                                    si=prefix, mode=mode)
9215merger 0.0 (
9216hunk ./src/allmydata/mutable/servermap.py 455
9217-        full_peerlist = sb.get_servers_for_index(self._storage_index)
9218+        full_peerlist = [(s.get_serverid(), s.get_rref())
9219+                         for s in sb.get_servers_for_psi(self._storage_index)]
9220hunk ./src/allmydata/mutable/servermap.py 455
9221+        # All of the peers, permuted by the storage index, as usual.
9222)
9223hunk ./src/allmydata/mutable/servermap.py 461
9224         self._good_peers = set() # peers who had some shares
9225         self._empty_peers = set() # peers who don't have any shares
9226         self._bad_peers = set() # peers to whom our queries failed
9227+        self._readers = {} # peerid -> dict(sharewriters), filled in
9228+                           # after responses come in.
9229 
9230         k = self._node.get_required_shares()
9231hunk ./src/allmydata/mutable/servermap.py 465
9232+        # For what cases can these conditions work?
9233         if k is None:
9234             # make a guess
9235             k = 3
9236hunk ./src/allmydata/mutable/servermap.py 478
9237         self.num_peers_to_query = k + self.EPSILON
9238 
9239         if self.mode == MODE_CHECK:
9240+            # We want to query all of the peers.
9241             initial_peers_to_query = dict(full_peerlist)
9242             must_query = set(initial_peers_to_query.keys())
9243             self.extra_peers = []
9244hunk ./src/allmydata/mutable/servermap.py 486
9245             # we're planning to replace all the shares, so we want a good
9246             # chance of finding them all. We will keep searching until we've
9247             # seen epsilon that don't have a share.
9248+            # We don't query all of the peers because that could take a while.
9249             self.num_peers_to_query = N + self.EPSILON
9250             initial_peers_to_query, must_query = self._build_initial_querylist()
9251             self.required_num_empty_peers = self.EPSILON
9252hunk ./src/allmydata/mutable/servermap.py 496
9253             # might also avoid the round trip required to read the encrypted
9254             # private key.
9255 
9256-        else:
9257+        else: # MODE_READ, MODE_ANYTHING
9258+            # 2k peers is good enough.
9259             initial_peers_to_query, must_query = self._build_initial_querylist()
9260 
9261         # this is a set of peers that we are required to get responses from:
9262hunk ./src/allmydata/mutable/servermap.py 512
9263         # before we can consider ourselves finished, and self.extra_peers
9264         # contains the overflow (peers that we should tap if we don't get
9265         # enough responses)
9266+        # I guess that self._must_query is a subset of
9267+        # initial_peers_to_query?
9268+        assert set(must_query).issubset(set(initial_peers_to_query))
9269 
9270         self._send_initial_requests(initial_peers_to_query)
9271         self._status.timings["initial_queries"] = time.time() - self._started
9272hunk ./src/allmydata/mutable/servermap.py 571
9273         # errors that aren't handled by _query_failed (and errors caused by
9274         # _query_failed) get logged, but we still want to check for doneness.
9275         d.addErrback(log.err)
9276-        d.addBoth(self._check_for_done)
9277         d.addErrback(self._fatal_error)
9278hunk ./src/allmydata/mutable/servermap.py 572
9279+        d.addCallback(self._check_for_done)
9280         return d
9281 
9282     def _do_read(self, ss, peerid, storage_index, shnums, readv):
9283hunk ./src/allmydata/mutable/servermap.py 591
9284         d = ss.callRemote("slot_readv", storage_index, shnums, readv)
9285         return d
9286 
9287+
9288+    def _got_corrupt_share(self, e, shnum, peerid, data, lp):
9289+        """
9290+        I am called when a remote server returns a corrupt share in
9291+        response to one of our queries. By corrupt, I mean a share
9292+        without a valid signature. I then record the failure, notify the
9293+        server of the corruption, and record the share as bad.
9294+        """
9295+        f = failure.Failure(e)
9296+        self.log(format="bad share: %(f_value)s", f_value=str(f),
9297+                 failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
9298+        # Notify the server that its share is corrupt.
9299+        self.notify_server_corruption(peerid, shnum, str(e))
9300+        # By flagging this as a bad peer, we won't count any of
9301+        # the other shares on that peer as valid, though if we
9302+        # happen to find a valid version string amongst those
9303+        # shares, we'll keep track of it so that we don't need
9304+        # to validate the signature on those again.
9305+        self._bad_peers.add(peerid)
9306+        self._last_failure = f
9307+        # XXX: Use the reader for this?
9308+        checkstring = data[:SIGNED_PREFIX_LENGTH]
9309+        self._servermap.mark_bad_share(peerid, shnum, checkstring)
9310+        self._servermap.problems.append(f)
9311+
9312+
9313+    def _cache_good_sharedata(self, verinfo, shnum, now, data):
9314+        """
9315+        If one of my queries returns successfully (which means that we
9316+        were able to and successfully did validate the signature), I
9317+        cache the data that we initially fetched from the storage
9318+        server. This will help reduce the number of roundtrips that need
9319+        to occur when the file is downloaded, or when the file is
9320+        updated.
9321+        """
9322+        if verinfo:
9323+            self._node._add_to_cache(verinfo, shnum, 0, data, now)
9324+
9325+
9326     def _got_results(self, datavs, peerid, readsize, stuff, started):
9327         lp = self.log(format="got result from [%(peerid)s], %(numshares)d shares",
9328                       peerid=idlib.shortnodeid_b2a(peerid),
9329hunk ./src/allmydata/mutable/servermap.py 633
9330-                      numshares=len(datavs),
9331-                      level=log.NOISY)
9332+                      numshares=len(datavs))
9333         now = time.time()
9334         elapsed = now - started
9335hunk ./src/allmydata/mutable/servermap.py 636
9336-        self._queries_outstanding.discard(peerid)
9337-        self._servermap.reachable_peers.add(peerid)
9338-        self._must_query.discard(peerid)
9339-        self._queries_completed += 1
9340+        def _done_processing(ignored=None):
9341+            self._queries_outstanding.discard(peerid)
9342+            self._servermap.reachable_peers.add(peerid)
9343+            self._must_query.discard(peerid)
9344+            self._queries_completed += 1
9345         if not self._running:
9346hunk ./src/allmydata/mutable/servermap.py 642
9347-            self.log("but we're not running, so we'll ignore it", parent=lp,
9348-                     level=log.NOISY)
9349+            self.log("but we're not running, so we'll ignore it", parent=lp)
9350+            _done_processing()
9351             self._status.add_per_server_time(peerid, "late", started, elapsed)
9352             return
9353         self._status.add_per_server_time(peerid, "query", started, elapsed)
9354hunk ./src/allmydata/mutable/servermap.py 653
9355         else:
9356             self._empty_peers.add(peerid)
9357 
9358-        last_verinfo = None
9359-        last_shnum = None
9360+        ss, storage_index = stuff
9361+        ds = []
9362+
9363         for shnum,datav in datavs.items():
9364             data = datav[0]
9365             try:
9366merger 0.0 (
9367hunk ./src/allmydata/mutable/servermap.py 662
9368-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
9369+                self._node._add_to_cache(verinfo, shnum, 0, data)
9370hunk ./src/allmydata/mutable/servermap.py 658
9371-            try:
9372-                verinfo = self._got_results_one_share(shnum, data, peerid, lp)
9373-                last_verinfo = verinfo
9374-                last_shnum = shnum
9375-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
9376-            except CorruptShareError, e:
9377-                # log it and give the other shares a chance to be processed
9378-                f = failure.Failure()
9379-                self.log(format="bad share: %(f_value)s", f_value=str(f.value),
9380-                         failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
9381-                self.notify_server_corruption(peerid, shnum, str(e))
9382-                self._bad_peers.add(peerid)
9383-                self._last_failure = f
9384-                checkstring = data[:SIGNED_PREFIX_LENGTH]
9385-                self._servermap.mark_bad_share(peerid, shnum, checkstring)
9386-                self._servermap.problems.append(f)
9387-                pass
9388+            reader = MDMFSlotReadProxy(ss,
9389+                                       storage_index,
9390+                                       shnum,
9391+                                       data)
9392+            self._readers.setdefault(peerid, dict())[shnum] = reader
9393+            # our goal, with each response, is to validate the version
9394+            # information and share data as best we can at this point --
9395+            # we do this by validating the signature. To do this, we
9396+            # need to do the following:
9397+            #   - If we don't already have the public key, fetch the
9398+            #     public key. We use this to validate the signature.
9399+            if not self._node.get_pubkey():
9400+                # fetch and set the public key.
9401+                d = reader.get_verification_key(queue=True)
9402+                d.addCallback(lambda results, shnum=shnum, peerid=peerid:
9403+                    self._try_to_set_pubkey(results, peerid, shnum, lp))
9404+                # XXX: Make self._pubkey_query_failed?
9405+                d.addErrback(lambda error, shnum=shnum, peerid=peerid:
9406+                    self._got_corrupt_share(error, shnum, peerid, data, lp))
9407+            else:
9408+                # we already have the public key.
9409+                d = defer.succeed(None)
9410)
9411hunk ./src/allmydata/mutable/servermap.py 676
9412                 self._servermap.problems.append(f)
9413                 pass
9414 
9415-        self._status.timings["cumulative_verify"] += (time.time() - now)
9416+            # Neither of these two branches returns anything of
9417+            # consequence, so the first entry in our deferredlist will
9418+            # be None.
9419 
9420hunk ./src/allmydata/mutable/servermap.py 680
9421-        if self._need_privkey and last_verinfo:
9422-            # send them a request for the privkey. We send one request per
9423-            # server.
9424-            lp2 = self.log("sending privkey request",
9425-                           parent=lp, level=log.NOISY)
9426-            (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9427-             offsets_tuple) = last_verinfo
9428-            o = dict(offsets_tuple)
9429+            # - Next, we need the version information. We almost
9430+            #   certainly got this by reading the first thousand or so
9431+            #   bytes of the share on the storage server, so we
9432+            #   shouldn't need to fetch anything at this step.
9433+            d2 = reader.get_verinfo()
9434+            d2.addErrback(lambda error, shnum=shnum, peerid=peerid:
9435+                self._got_corrupt_share(error, shnum, peerid, data, lp))
9436+            # - Next, we need the signature. For an SDMF share, it is
9437+            #   likely that we fetched this when doing our initial fetch
9438+            #   to get the version information. In MDMF, this lives at
9439+            #   the end of the share, so unless the file is quite small,
9440+            #   we'll need to do a remote fetch to get it.
9441+            d3 = reader.get_signature(queue=True)
9442+            d3.addErrback(lambda error, shnum=shnum, peerid=peerid:
9443+                self._got_corrupt_share(error, shnum, peerid, data, lp))
9444+            #  Once we have all three of these responses, we can move on
9445+            #  to validating the signature
9446 
9447hunk ./src/allmydata/mutable/servermap.py 698
9448-            self._queries_outstanding.add(peerid)
9449-            readv = [ (o['enc_privkey'], (o['EOF'] - o['enc_privkey'])) ]
9450-            ss = self._servermap.connections[peerid]
9451-            privkey_started = time.time()
9452-            d = self._do_read(ss, peerid, self._storage_index,
9453-                              [last_shnum], readv)
9454-            d.addCallback(self._got_privkey_results, peerid, last_shnum,
9455-                          privkey_started, lp2)
9456-            d.addErrback(self._privkey_query_failed, peerid, last_shnum, lp2)
9457-            d.addErrback(log.err)
9458-            d.addCallback(self._check_for_done)
9459-            d.addErrback(self._fatal_error)
9460+            # Does the node already have a privkey? If not, we'll try to
9461+            # fetch it here.
9462+            if self._need_privkey:
9463+                d4 = reader.get_encprivkey(queue=True)
9464+                d4.addCallback(lambda results, shnum=shnum, peerid=peerid:
9465+                    self._try_to_validate_privkey(results, peerid, shnum, lp))
9466+                d4.addErrback(lambda error, shnum=shnum, peerid=peerid:
9467+                    self._privkey_query_failed(error, peerid, shnum, lp))
9468+            else:
9469+                d4 = defer.succeed(None)
9470+
9471+
9472+            if self.fetch_update_data:
9473+                # fetch the block hash tree and first + last segment, as
9474+                # configured earlier.
9475+                # Then set them in wherever we happen to want to set
9476+                # them.
9477+                update_ds = []
9478+                # XXX: We do this above, too. Is there a good way to
9479+                # make the two routines share the value without
9480+                # introducing more roundtrips?
9481+                update_ds.append(reader.get_verinfo())
9482+                update_ds.append(reader.get_blockhashes(queue=True))
9483+                update_ds.append(reader.get_block_and_salt(self.start_segment,
9484+                                                            queue=True))
9485+                update_ds.append(reader.get_block_and_salt(self.end_segment,
9486+                                                            queue=True))
9487+                d5 = deferredutil.gatherResults(update_ds)
9488+                d5.addCallback(self._got_update_results_one_share, shnum)
9489+            else:
9490+                d5 = defer.succeed(None)
9491 
9492hunk ./src/allmydata/mutable/servermap.py 730
9493+            dl = defer.DeferredList([d, d2, d3, d4, d5])
9494+            dl.addBoth(self._turn_barrier)
9495+            reader.flush()
9496+            dl.addCallback(lambda results, shnum=shnum, peerid=peerid:
9497+                self._got_signature_one_share(results, shnum, peerid, lp))
9498+            dl.addErrback(lambda error, shnum=shnum, data=data:
9499+               self._got_corrupt_share(error, shnum, peerid, data, lp))
9500+            dl.addCallback(lambda verinfo, shnum=shnum, peerid=peerid, data=data:
9501+                self._cache_good_sharedata(verinfo, shnum, now, data))
9502+            ds.append(dl)
9503+        # dl is a deferred list that will fire when all of the shares
9504+        # that we found on this peer are done processing. When dl fires,
9505+        # we know that processing is done, so we can decrement the
9506+        # semaphore-like thing that we incremented earlier.
9507+        dl = defer.DeferredList(ds, fireOnOneErrback=True)
9508+        # Are we done? Done means that there are no more queries to
9509+        # send, that there are no outstanding queries, and that we
9510+        # haven't received any queries that are still processing. If we
9511+        # are done, self._check_for_done will cause the done deferred
9512+        # that we returned to our caller to fire, which tells them that
9513+        # they have a complete servermap, and that we won't be touching
9514+        # the servermap anymore.
9515+        dl.addCallback(_done_processing)
9516+        dl.addCallback(self._check_for_done)
9517+        dl.addErrback(self._fatal_error)
9518         # all done!
9519         self.log("_got_results done", parent=lp, level=log.NOISY)
9520hunk ./src/allmydata/mutable/servermap.py 757
9521+        return dl
9522+
9523+
9524+    def _turn_barrier(self, result):
9525+        """
9526+        I help the servermap updater avoid the recursion limit issues
9527+        discussed in #237.
9528+        """
9529+        return fireEventually(result)
9530+
9531+
9532+    def _try_to_set_pubkey(self, pubkey_s, peerid, shnum, lp):
9533+        if self._node.get_pubkey():
9534+            return # don't go through this again if we don't have to
9535+        fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s)
9536+        assert len(fingerprint) == 32
9537+        if fingerprint != self._node.get_fingerprint():
9538+            raise CorruptShareError(peerid, shnum,
9539+                                "pubkey doesn't match fingerprint")
9540+        self._node._populate_pubkey(self._deserialize_pubkey(pubkey_s))
9541+        assert self._node.get_pubkey()
9542+
9543 
9544     def notify_server_corruption(self, peerid, shnum, reason):
9545         ss = self._servermap.connections[peerid]
9546hunk ./src/allmydata/mutable/servermap.py 785
9547         ss.callRemoteOnly("advise_corrupt_share",
9548                           "mutable", self._storage_index, shnum, reason)
9549 
9550-    def _got_results_one_share(self, shnum, data, peerid, lp):
9551+
9552+    def _got_signature_one_share(self, results, shnum, peerid, lp):
9553+        # It is our job to give versioninfo to our caller. We need to
9554+        # raise CorruptShareError if the share is corrupt for any
9555+        # reason, something that our caller will handle.
9556         self.log(format="_got_results: got shnum #%(shnum)d from peerid %(peerid)s",
9557                  shnum=shnum,
9558                  peerid=idlib.shortnodeid_b2a(peerid),
9559hunk ./src/allmydata/mutable/servermap.py 795
9560                  level=log.NOISY,
9561                  parent=lp)
9562+        if not self._running:
9563+            # We can't process the results, since we can't touch the
9564+            # servermap anymore.
9565+            self.log("but we're not running anymore.")
9566+            return None
9567 
9568hunk ./src/allmydata/mutable/servermap.py 801
9569-        # this might raise NeedMoreDataError, if the pubkey and signature
9570-        # live at some weird offset. That shouldn't happen, so I'm going to
9571-        # treat it as a bad share.
9572-        (seqnum, root_hash, IV, k, N, segsize, datalength,
9573-         pubkey_s, signature, prefix) = unpack_prefix_and_signature(data)
9574-
9575-        if not self._node.get_pubkey():
9576-            fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s)
9577-            assert len(fingerprint) == 32
9578-            if fingerprint != self._node.get_fingerprint():
9579-                raise CorruptShareError(peerid, shnum,
9580-                                        "pubkey doesn't match fingerprint")
9581-            self._node._populate_pubkey(self._deserialize_pubkey(pubkey_s))
9582-
9583-        if self._need_privkey:
9584-            self._try_to_extract_privkey(data, peerid, shnum, lp)
9585-
9586-        (ig_version, ig_seqnum, ig_root_hash, ig_IV, ig_k, ig_N,
9587-         ig_segsize, ig_datalen, offsets) = unpack_header(data)
9588+        _, verinfo, signature, __, ___ = results
9589+        (seqnum,
9590+         root_hash,
9591+         saltish,
9592+         segsize,
9593+         datalen,
9594+         k,
9595+         n,
9596+         prefix,
9597+         offsets) = verinfo[1]
9598         offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
9599 
9600hunk ./src/allmydata/mutable/servermap.py 813
9601-        verinfo = (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9602+        # XXX: This should be done for us in the method, so
9603+        # presumably you can go in there and fix it.
9604+        verinfo = (seqnum,
9605+                   root_hash,
9606+                   saltish,
9607+                   segsize,
9608+                   datalen,
9609+                   k,
9610+                   n,
9611+                   prefix,
9612                    offsets_tuple)
9613hunk ./src/allmydata/mutable/servermap.py 824
9614+        # This tuple uniquely identifies a share on the grid; we use it
9615+        # to keep track of the ones that we've already seen.
9616 
9617         if verinfo not in self._valid_versions:
9618hunk ./src/allmydata/mutable/servermap.py 828
9619-            # it's a new pair. Verify the signature.
9620-            valid = self._node.get_pubkey().verify(prefix, signature)
9621+            # This is a new version tuple, and we need to validate it
9622+            # against the public key before keeping track of it.
9623+            assert self._node.get_pubkey()
9624+            valid = self._node.get_pubkey().verify(prefix, signature[1])
9625             if not valid:
9626hunk ./src/allmydata/mutable/servermap.py 833
9627-                raise CorruptShareError(peerid, shnum, "signature is invalid")
9628+                raise CorruptShareError(peerid, shnum,
9629+                                        "signature is invalid")
9630 
9631hunk ./src/allmydata/mutable/servermap.py 836
9632-            # ok, it's a valid verinfo. Add it to the list of validated
9633-            # versions.
9634-            self.log(" found valid version %d-%s from %s-sh%d: %d-%d/%d/%d"
9635-                     % (seqnum, base32.b2a(root_hash)[:4],
9636-                        idlib.shortnodeid_b2a(peerid), shnum,
9637-                        k, N, segsize, datalength),
9638-                     parent=lp)
9639-            self._valid_versions.add(verinfo)
9640-        # We now know that this is a valid candidate verinfo.
9641+        # ok, it's a valid verinfo. Add it to the list of validated
9642+        # versions.
9643+        self.log(" found valid version %d-%s from %s-sh%d: %d-%d/%d/%d"
9644+                 % (seqnum, base32.b2a(root_hash)[:4],
9645+                    idlib.shortnodeid_b2a(peerid), shnum,
9646+                    k, n, segsize, datalen),
9647+                    parent=lp)
9648+        self._valid_versions.add(verinfo)
9649+        # We now know that this is a valid candidate verinfo. Whether or
9650+        # not this instance of it is valid is a matter for the next
9651+        # statement; at this point, we just know that if we see this
9652+        # version info again, that its signature checks out and that
9653+        # we're okay to skip the signature-checking step.
9654 
9655hunk ./src/allmydata/mutable/servermap.py 850
9656+        # (peerid, shnum) are bound in the method invocation.
9657         if (peerid, shnum) in self._servermap.bad_shares:
9658             # we've been told that the rest of the data in this share is
9659             # unusable, so don't add it to the servermap.
9660hunk ./src/allmydata/mutable/servermap.py 863
9661         self._servermap.add_new_share(peerid, shnum, verinfo, timestamp)
9662         # and the versionmap
9663         self.versionmap.add(verinfo, (shnum, peerid, timestamp))
9664+
9665+        # It's our job to set the protocol version of our parent
9666+        # filenode if it isn't already set.
9667+        if not self._node.get_version():
9668+            # The first byte of the prefix is the version.
9669+            v = struct.unpack(">B", prefix[:1])[0]
9670+            self.log("got version %d" % v)
9671+            self._node.set_version(v)
9672+
9673         return verinfo
9674 
9675hunk ./src/allmydata/mutable/servermap.py 874
9676-    def _deserialize_pubkey(self, pubkey_s):
9677-        verifier = rsa.create_verifying_key_from_string(pubkey_s)
9678-        return verifier
9679 
9680hunk ./src/allmydata/mutable/servermap.py 875
9681-    def _try_to_extract_privkey(self, data, peerid, shnum, lp):
9682-        try:
9683-            r = unpack_share(data)
9684-        except NeedMoreDataError, e:
9685-            # this share won't help us. oh well.
9686-            offset = e.encprivkey_offset
9687-            length = e.encprivkey_length
9688-            self.log("shnum %d on peerid %s: share was too short (%dB) "
9689-                     "to get the encprivkey; [%d:%d] ought to hold it" %
9690-                     (shnum, idlib.shortnodeid_b2a(peerid), len(data),
9691-                      offset, offset+length),
9692-                     parent=lp)
9693-            # NOTE: if uncoordinated writes are taking place, someone might
9694-            # change the share (and most probably move the encprivkey) before
9695-            # we get a chance to do one of these reads and fetch it. This
9696-            # will cause us to see a NotEnoughSharesError(unable to fetch
9697-            # privkey) instead of an UncoordinatedWriteError . This is a
9698-            # nuisance, but it will go away when we move to DSA-based mutable
9699-            # files (since the privkey will be small enough to fit in the
9700-            # write cap).
9701+    def _got_update_results_one_share(self, results, share):
9702+        """
9703+        I record the update results for a single share in the servermap.
9704+        """
9705+        assert len(results) == 4
9706+        verinfo, blockhashes, start, end = results
9707+        (seqnum,
9708+         root_hash,
9709+         saltish,
9710+         segsize,
9711+         datalen,
9712+         k,
9713+         n,
9714+         prefix,
9715+         offsets) = verinfo
9716+        offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
9717 
9718hunk ./src/allmydata/mutable/servermap.py 892
9719-            return
9720+        # XXX: This tuple conversion should be done for us by the method
9721+        # that produced verinfo, so presumably you can go in there and fix it.
9722+        verinfo = (seqnum,
9723+                   root_hash,
9724+                   saltish,
9725+                   segsize,
9726+                   datalen,
9727+                   k,
9728+                   n,
9729+                   prefix,
9730+                   offsets_tuple)
9731 
9732hunk ./src/allmydata/mutable/servermap.py 904
9733-        (seqnum, root_hash, IV, k, N, segsize, datalen,
9734-         pubkey, signature, share_hash_chain, block_hash_tree,
9735-         share_data, enc_privkey) = r
9736+        update_data = (blockhashes, start, end)
9737+        self._servermap.set_update_data_for_share_and_verinfo(share,
9738+                                                              verinfo,
9739+                                                              update_data)
9740 
9741hunk ./src/allmydata/mutable/servermap.py 909
9742-        return self._try_to_validate_privkey(enc_privkey, peerid, shnum, lp)
9743+
9744+    def _deserialize_pubkey(self, pubkey_s):
9745+        verifier = rsa.create_verifying_key_from_string(pubkey_s)
9746+        return verifier
9747 
9748hunk ./src/allmydata/mutable/servermap.py 914
9749-    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
9750 
9751hunk ./src/allmydata/mutable/servermap.py 915
9752+    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
9753+        """
9754+        Given an encrypted private key from a remote server, I derive its
9755+        writekey and validate that against the writekey stored in my node.
9756+        If it matches, I set the privkey and encprivkey properties of the node.
9757+        """
9758         alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
9759         alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
9760         if alleged_writekey != self._node.get_writekey():
9761hunk ./src/allmydata/mutable/servermap.py 993
9762         self._queries_completed += 1
9763         self._last_failure = f
9764 
9765-    def _got_privkey_results(self, datavs, peerid, shnum, started, lp):
9766-        now = time.time()
9767-        elapsed = now - started
9768-        self._status.add_per_server_time(peerid, "privkey", started, elapsed)
9769-        self._queries_outstanding.discard(peerid)
9770-        if not self._need_privkey:
9771-            return
9772-        if shnum not in datavs:
9773-            self.log("privkey wasn't there when we asked it",
9774-                     level=log.WEIRD, umid="VA9uDQ")
9775-            return
9776-        datav = datavs[shnum]
9777-        enc_privkey = datav[0]
9778-        self._try_to_validate_privkey(enc_privkey, peerid, shnum, lp)
9779 
9780     def _privkey_query_failed(self, f, peerid, shnum, lp):
9781         self._queries_outstanding.discard(peerid)
9782hunk ./src/allmydata/mutable/servermap.py 1007
9783         self._servermap.problems.append(f)
9784         self._last_failure = f
9785 
9786+
9787     def _check_for_done(self, res):
9788         # exit paths:
9789         #  return self._send_more_queries(outstanding) : send some more queries
9790hunk ./src/allmydata/mutable/servermap.py 1013
9791         #  return self._done() : all done
9792         #  return : keep waiting, no new queries
9793-
9794         lp = self.log(format=("_check_for_done, mode is '%(mode)s', "
9795                               "%(outstanding)d queries outstanding, "
9796                               "%(extra)d extra peers available, "
9797hunk ./src/allmydata/mutable/servermap.py 1204
9798 
9799     def _done(self):
9800         if not self._running:
9801+            self.log("not running; we're already done")
9802             return
9803         self._running = False
9804         now = time.time()
9805hunk ./src/allmydata/mutable/servermap.py 1219
9806         self._servermap.last_update_time = self._started
9807         # the servermap will not be touched after this
9808         self.log("servermap: %s" % self._servermap.summarize_versions())
9809+
9810         eventually(self._done_deferred.callback, self._servermap)
9811 
9812     def _fatal_error(self, f):
9813}
9814[tests:
9815Kevan Carstensen <kevan@isnotajoke.com>**20100819003531
9816 Ignore-this: 314e8bbcce532ea4d5d2cecc9f31cca0
9817 
9818     - A lot of existing tests relied on aspects of the mutable file
9819       implementation that were changed. This patch updates those tests
9820       to work with the changes.
9821     - This patch also adds tests for new features.
9822] {
9823hunk ./src/allmydata/test/common.py 11
9824 from foolscap.api import flushEventualQueue, fireEventually
9825 from allmydata import uri, dirnode, client
9826 from allmydata.introducer.server import IntroducerNode
9827-from allmydata.interfaces import IMutableFileNode, IImmutableFileNode, \
9828-     FileTooLargeError, NotEnoughSharesError, ICheckable
9829+from allmydata.interfaces import IMutableFileNode, IImmutableFileNode,\
9830+                                 NotEnoughSharesError, ICheckable, \
9831+                                 IMutableUploadable, SDMF_VERSION, \
9832+                                 MDMF_VERSION
9833 from allmydata.check_results import CheckResults, CheckAndRepairResults, \
9834      DeepCheckResults, DeepCheckAndRepairResults
9835 from allmydata.mutable.common import CorruptShareError
9836hunk ./src/allmydata/test/common.py 19
9837 from allmydata.mutable.layout import unpack_header
9838+from allmydata.mutable.publish import MutableData
9839 from allmydata.storage.server import storage_index_to_dir
9840 from allmydata.storage.mutable import MutableShareFile
9841 from allmydata.util import hashutil, log, fileutil, pollmixin
9842hunk ./src/allmydata/test/common.py 153
9843         consumer.write(data[start:end])
9844         return consumer
9845 
9846+
9847+    def get_best_readable_version(self):
9848+        return defer.succeed(self)
9849+
9850+
9851+    download_best_version = download_to_data
9852+
9853+
9854+    def download_to_data(self):
9855+        return download_to_data(self)
9856+
9857+
9858+    def get_size_of_best_version(self):
9859+        return defer.succeed(self.get_size())
9860+
9861+
9862 def make_chk_file_cap(size):
9863     return uri.CHKFileURI(key=os.urandom(16),
9864                           uri_extension_hash=os.urandom(32),
9865hunk ./src/allmydata/test/common.py 193
9866     MUTABLE_SIZELIMIT = 10000
9867     all_contents = {}
9868     bad_shares = {}
9869+    file_types = {} # storage index => MDMF_VERSION or SDMF_VERSION
9870 
9871     def __init__(self, storage_broker, secret_holder,
9872                  default_encoding_parameters, history):
9873hunk ./src/allmydata/test/common.py 200
9874         self.init_from_cap(make_mutable_file_cap())
9875     def create(self, contents, key_generator=None, keysize=None):
9876         initial_contents = self._get_initial_contents(contents)
9877-        if len(initial_contents) > self.MUTABLE_SIZELIMIT:
9878-            raise FileTooLargeError("SDMF is limited to one segment, and "
9879-                                    "%d > %d" % (len(initial_contents),
9880-                                                 self.MUTABLE_SIZELIMIT))
9881-        self.all_contents[self.storage_index] = initial_contents
9882+        data = initial_contents.read(initial_contents.get_size())
9883+        data = "".join(data)
9884+        self.all_contents[self.storage_index] = data
9885         return defer.succeed(self)
9886     def _get_initial_contents(self, contents):
9887hunk ./src/allmydata/test/common.py 205
9888-        if isinstance(contents, str):
9889-            return contents
9890         if contents is None:
9891hunk ./src/allmydata/test/common.py 206
9892-            return ""
9893+            return MutableData("")
9894+
9895+        if IMutableUploadable.providedBy(contents):
9896+            return contents
9897+
9898         assert callable(contents), "%s should be callable, not %s" % \
9899                (contents, type(contents))
9900         return contents(self)
9901hunk ./src/allmydata/test/common.py 258
9902     def get_storage_index(self):
9903         return self.storage_index
9904 
9905+    def get_servermap(self, mode):
9906+        return defer.succeed(None)
9907+
9908+    def set_version(self, version):
9909+        assert version in (SDMF_VERSION, MDMF_VERSION)
9910+        self.file_types[self.storage_index] = version
9911+
9912+    def get_version(self):
9913+        assert self.storage_index in self.file_types
9914+        return self.file_types[self.storage_index]
9915+
9916     def check(self, monitor, verify=False, add_lease=False):
9917         r = CheckResults(self.my_uri, self.storage_index)
9918         is_bad = self.bad_shares.get(self.storage_index, None)
9919hunk ./src/allmydata/test/common.py 327
9920         return d
9921 
9922     def download_best_version(self):
9923+        return defer.succeed(self._download_best_version())
9924+
9925+
9926+    def _download_best_version(self, ignored=None):
9927         if isinstance(self.my_uri, uri.LiteralFileURI):
9928hunk ./src/allmydata/test/common.py 332
9929-            return defer.succeed(self.my_uri.data)
9930+            return self.my_uri.data
9931         if self.storage_index not in self.all_contents:
9932hunk ./src/allmydata/test/common.py 334
9933-            return defer.fail(NotEnoughSharesError(None, 0, 3))
9934-        return defer.succeed(self.all_contents[self.storage_index])
9935+            raise NotEnoughSharesError(None, 0, 3)
9936+        return self.all_contents[self.storage_index]
9937+
9938 
9939     def overwrite(self, new_contents):
9940hunk ./src/allmydata/test/common.py 339
9941-        if len(new_contents) > self.MUTABLE_SIZELIMIT:
9942-            raise FileTooLargeError("SDMF is limited to one segment, and "
9943-                                    "%d > %d" % (len(new_contents),
9944-                                                 self.MUTABLE_SIZELIMIT))
9945         assert not self.is_readonly()
9946hunk ./src/allmydata/test/common.py 340
9947-        self.all_contents[self.storage_index] = new_contents
9948+        new_data = new_contents.read(new_contents.get_size())
9949+        new_data = "".join(new_data)
9950+        self.all_contents[self.storage_index] = new_data
9951         return defer.succeed(None)
9952     def modify(self, modifier):
9953         # this does not implement FileTooLargeError, but the real one does
9954hunk ./src/allmydata/test/common.py 350
9955     def _modify(self, modifier):
9956         assert not self.is_readonly()
9957         old_contents = self.all_contents[self.storage_index]
9958-        self.all_contents[self.storage_index] = modifier(old_contents, None, True)
9959+        new_data = modifier(old_contents, None, True)
9960+        self.all_contents[self.storage_index] = new_data
9961         return None
9962 
9963hunk ./src/allmydata/test/common.py 354
9964+    # As actually implemented, MutableFileNode and MutableFileVersion
9965+    # are distinct. However, nothing in the webapi (yet) uses that
9966+    # distinction -- it just uses the unified download interface
9967+    # provided by get_best_readable_version and read. When we start
9968+    # doing cooler things like LDMF, we will want to revise this code to
9969+    # be less simplistic.
9970+    def get_best_readable_version(self):
9971+        return defer.succeed(self)
9972+
9973+
9974+    def get_best_mutable_version(self):
9975+        return defer.succeed(self)
9976+
9977+    # Ditto for this, which is an implementation of IWritable.
9978+    # XXX: Declare that IWritable is implemented.
9979+    def update(self, data, offset):
9980+        assert not self.is_readonly()
9981+        def modifier(old, servermap, first_time):
9982+            new = old[:offset] + "".join(data.read(data.get_size()))
9983+            new += old[len(new):]
9984+            return new
9985+        return self.modify(modifier)
9986+
9987+
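
The update() stub above captures the splice semantics of a partial-file write:
the uploadable's bytes replace the region starting at offset, and anything
beyond the written region is preserved. A tiny worked example of that modifier
logic on plain strings (no node involved):

    old = "abcdefgh"
    data, offset = "XY", 2
    new = old[:offset] + data       # "abXY"
    new += old[len(new):]           # keep the untouched tail: "efgh"
    assert new == "abXYefgh"
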
9988+    def read(self, consumer, offset=0, size=None):
9989+        data = self._download_best_version()
9990+        if size:
9991+            data = data[offset:offset+size]
9992+        consumer.write(data)
9993+        return defer.succeed(consumer)
9994+
9995+
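
The fake node above mirrors the unified download interface that the webapi
relies on: callers ask a node for its best readable version and then read a
byte range from it into an IConsumer-style object. A rough usage sketch, with
a hypothetical in-memory consumer that is not part of this patch:

    class ListConsumer:
        # Minimal stand-in for a consumer: it just collects written bytes.
        def __init__(self):
            self.chunks = []
        def write(self, data):
            self.chunks.append(data)

    def read_some(node, offset=1, size=5):
        consumer = ListConsumer()
        d = node.get_best_readable_version()
        # read() writes the requested range into the consumer and fires
        # with the consumer once it is done.
        d.addCallback(lambda version: version.read(consumer, offset, size))
        d.addCallback(lambda ign: "".join(consumer.chunks))
        return d
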
9996 def make_mutable_file_cap():
9997     return uri.WriteableSSKFileURI(writekey=os.urandom(16),
9998                                    fingerprint=os.urandom(32))
9999hunk ./src/allmydata/test/test_checker.py 11
10000 from allmydata.test.no_network import GridTestMixin
10001 from allmydata.immutable.upload import Data
10002 from allmydata.test.common_web import WebRenderingMixin
10003+from allmydata.mutable.publish import MutableData
10004 
10005 class FakeClient:
10006     def get_storage_broker(self):
10007hunk ./src/allmydata/test/test_checker.py 291
10008         def _stash_immutable(ur):
10009             self.imm = c0.create_node_from_uri(ur.uri)
10010         d.addCallback(_stash_immutable)
10011-        d.addCallback(lambda ign: c0.create_mutable_file("contents"))
10012+        d.addCallback(lambda ign:
10013+            c0.create_mutable_file(MutableData("contents")))
10014         def _stash_mutable(node):
10015             self.mut = node
10016         d.addCallback(_stash_mutable)
10017hunk ./src/allmydata/test/test_cli.py 13
10018 from allmydata.util import fileutil, hashutil, base32
10019 from allmydata import uri
10020 from allmydata.immutable import upload
10021+from allmydata.mutable.publish import MutableData
10022 from allmydata.dirnode import normalize
10023 
10024 # Test that the scripts can be imported.
10025hunk ./src/allmydata/test/test_cli.py 662
10026 
10027         d = self.do_cli("create-alias", etudes_arg)
10028         def _check_create_unicode((rc, out, err)):
10029-            self.failUnlessReallyEqual(rc, 0)
10030+            #self.failUnlessReallyEqual(rc, 0)
10031             self.failUnlessReallyEqual(err, "")
10032             self.failUnlessIn("Alias %s created" % quote_output(u"\u00E9tudes"), out)
10033 
10034hunk ./src/allmydata/test/test_cli.py 967
10035         d.addCallback(lambda (rc,out,err): self.failUnlessReallyEqual(out, DATA2))
10036         return d
10037 
10038+    def test_mutable_type(self):
10039+        self.basedir = "cli/Put/mutable_type"
10040+        self.set_up_grid()
10041+        data = "data" * 100000
10042+        fn1 = os.path.join(self.basedir, "data")
10043+        fileutil.write(fn1, data)
10044+        d = self.do_cli("create-alias", "tahoe")
10045+        d.addCallback(lambda ignored:
10046+            self.do_cli("put", "--mutable", "--mutable-type=mdmf",
10047+                        fn1, "tahoe:uploaded.txt"))
10048+        d.addCallback(lambda ignored:
10049+            self.do_cli("ls", "--json", "tahoe:uploaded.txt"))
10050+        d.addCallback(lambda (rc, json, err): self.failUnlessIn("mdmf", json))
10051+        d.addCallback(lambda ignored:
10052+            self.do_cli("put", "--mutable", "--mutable-type=sdmf",
10053+                        fn1, "tahoe:uploaded2.txt"))
10054+        d.addCallback(lambda ignored:
10055+            self.do_cli("ls", "--json", "tahoe:uploaded2.txt"))
10056+        d.addCallback(lambda (rc, json, err):
10057+            self.failUnlessIn("sdmf", json))
10058+        return d
10059+
10060+    def test_mutable_type_unlinked(self):
10061+        self.basedir = "cli/Put/mutable_type_unlinked"
10062+        self.set_up_grid()
10063+        data = "data" * 100000
10064+        fn1 = os.path.join(self.basedir, "data")
10065+        fileutil.write(fn1, data)
10066+        d = self.do_cli("put", "--mutable", "--mutable-type=mdmf", fn1)
10067+        d.addCallback(lambda (rc, cap, err):
10068+            self.do_cli("ls", "--json", cap))
10069+        d.addCallback(lambda (rc, json, err): self.failUnlessIn("mdmf", json))
10070+        d.addCallback(lambda ignored:
10071+            self.do_cli("put", "--mutable", "--mutable-type=sdmf", fn1))
10072+        d.addCallback(lambda (rc, cap, err):
10073+            self.do_cli("ls", "--json", cap))
10074+        d.addCallback(lambda (rc, json, err):
10075+            self.failUnlessIn("sdmf", json))
10076+        return d
10077+
10078+    def test_mutable_type_invalid_format(self):
10079+        self.basedir = "cli/Put/mutable_type_invalid_format"
10080+        self.set_up_grid()
10081+        data = "data" * 100000
10082+        fn1 = os.path.join(self.basedir, "data")
10083+        fileutil.write(fn1, data)
10084+        d = self.do_cli("put", "--mutable", "--mutable-type=ldmf", fn1)
10085+        def _check_failure((rc, out, err)):
10086+            self.failIfEqual(rc, 0)
10087+            self.failUnlessIn("invalid", err)
10088+        d.addCallback(_check_failure)
10089+        return d
10090+
10091     def test_put_with_nonexistent_alias(self):
10092         # when invoked with an alias that doesn't exist, 'tahoe put'
10093         # should output a useful error message, not a stack trace
10094hunk ./src/allmydata/test/test_cli.py 2136
10095         self.set_up_grid()
10096         c0 = self.g.clients[0]
10097         DATA = "data" * 100
10098-        d = c0.create_mutable_file(DATA)
10099+        DATA_uploadable = MutableData(DATA)
10100+        d = c0.create_mutable_file(DATA_uploadable)
10101         def _stash_uri(n):
10102             self.uri = n.get_uri()
10103         d.addCallback(_stash_uri)
10104hunk ./src/allmydata/test/test_cli.py 2238
10105                                            upload.Data("literal",
10106                                                         convergence="")))
10107         d.addCallback(_stash_uri, "small")
10108-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"1"))
10109+        d.addCallback(lambda ign:
10110+            c0.create_mutable_file(MutableData(DATA+"1")))
10111         d.addCallback(lambda fn: self.rootnode.set_node(u"mutable", fn))
10112         d.addCallback(_stash_uri, "mutable")
10113 
10114hunk ./src/allmydata/test/test_cli.py 2257
10115         # root/small
10116         # root/mutable
10117 
10118+        # We haven't broken anything yet, so this should all be healthy.
10119         d.addCallback(lambda ign: self.do_cli("deep-check", "--verbose",
10120                                               self.rooturi))
10121         def _check2((rc, out, err)):
10122hunk ./src/allmydata/test/test_cli.py 2272
10123                             in lines, out)
10124         d.addCallback(_check2)
10125 
10126+        # Similarly, all of these results should be as we expect them to
10127+        # be for a healthy file layout.
10128         d.addCallback(lambda ign: self.do_cli("stats", self.rooturi))
10129         def _check_stats((rc, out, err)):
10130             self.failUnlessReallyEqual(err, "")
10131hunk ./src/allmydata/test/test_cli.py 2289
10132             self.failUnlessIn(" 317-1000 : 1    (1000 B, 1000 B)", lines)
10133         d.addCallback(_check_stats)
10134 
10135+        # Now we break things.
10136         def _clobber_shares(ignored):
10137             shares = self.find_uri_shares(self.uris[u"g\u00F6\u00F6d"])
10138             self.failUnlessReallyEqual(len(shares), 10)
10139hunk ./src/allmydata/test/test_cli.py 2314
10140 
10141         d.addCallback(lambda ign:
10142                       self.do_cli("deep-check", "--verbose", self.rooturi))
10143+        # This should reveal the missing share, but not the corrupt
10144+        # share, since we didn't tell the deep check operation to also
10145+        # verify.
10146         def _check3((rc, out, err)):
10147             self.failUnlessReallyEqual(err, "")
10148             self.failUnlessReallyEqual(rc, 0)
10149hunk ./src/allmydata/test/test_cli.py 2365
10150                                   "--verbose", "--verify", "--repair",
10151                                   self.rooturi))
10152         def _check6((rc, out, err)):
10153+            # We've just repaired the directory. There is no reason for
10154+            # that repair to be unsuccessful.
10155             self.failUnlessReallyEqual(err, "")
10156             self.failUnlessReallyEqual(rc, 0)
10157             lines = out.splitlines()
10158hunk ./src/allmydata/test/test_deepcheck.py 9
10159 from twisted.internet import threads # CLI tests use deferToThread
10160 from allmydata.immutable import upload
10161 from allmydata.mutable.common import UnrecoverableFileError
10162+from allmydata.mutable.publish import MutableData
10163 from allmydata.util import idlib
10164 from allmydata.util import base32
10165 from allmydata.scripts import runner
10166hunk ./src/allmydata/test/test_deepcheck.py 38
10167         self.basedir = "deepcheck/MutableChecker/good"
10168         self.set_up_grid()
10169         CONTENTS = "a little bit of data"
10170-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10171+        CONTENTS_uploadable = MutableData(CONTENTS)
10172+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10173         def _created(node):
10174             self.node = node
10175             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10176hunk ./src/allmydata/test/test_deepcheck.py 61
10177         self.basedir = "deepcheck/MutableChecker/corrupt"
10178         self.set_up_grid()
10179         CONTENTS = "a little bit of data"
10180-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10181+        CONTENTS_uploadable = MutableData(CONTENTS)
10182+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10183         def _stash_and_corrupt(node):
10184             self.node = node
10185             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10186hunk ./src/allmydata/test/test_deepcheck.py 99
10187         self.basedir = "deepcheck/MutableChecker/delete_share"
10188         self.set_up_grid()
10189         CONTENTS = "a little bit of data"
10190-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10191+        CONTENTS_uploadable = MutableData(CONTENTS)
10192+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10193         def _stash_and_delete(node):
10194             self.node = node
10195             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10196hunk ./src/allmydata/test/test_deepcheck.py 223
10197             self.root = n
10198             self.root_uri = n.get_uri()
10199         d.addCallback(_created_root)
10200-        d.addCallback(lambda ign: c0.create_mutable_file("mutable file contents"))
10201+        d.addCallback(lambda ign:
10202+            c0.create_mutable_file(MutableData("mutable file contents")))
10203         d.addCallback(lambda n: self.root.set_node(u"mutable", n))
10204         def _created_mutable(n):
10205             self.mutable = n
10206hunk ./src/allmydata/test/test_deepcheck.py 965
10207     def create_mangled(self, ignored, name):
10208         nodetype, mangletype = name.split("-", 1)
10209         if nodetype == "mutable":
10210-            d = self.g.clients[0].create_mutable_file("mutable file contents")
10211+            mutable_uploadable = MutableData("mutable file contents")
10212+            d = self.g.clients[0].create_mutable_file(mutable_uploadable)
10213             d.addCallback(lambda n: self.root.set_node(unicode(name), n))
10214         elif nodetype == "large":
10215             large = upload.Data("Lots of data\n" * 1000 + name + "\n", None)
10216hunk ./src/allmydata/test/test_dirnode.py 1304
10217     implements(IMutableFileNode)
10218     counter = 0
10219     def __init__(self, initial_contents=""):
10220-        self.data = self._get_initial_contents(initial_contents)
10221+        data = self._get_initial_contents(initial_contents)
10222+        self.data = data.read(data.get_size())
10223+        self.data = "".join(self.data)
10224+
10225         counter = FakeMutableFile.counter
10226         FakeMutableFile.counter += 1
10227         writekey = hashutil.ssk_writekey_hash(str(counter))
10228hunk ./src/allmydata/test/test_dirnode.py 1354
10229         pass
10230 
10231     def modify(self, modifier):
10232-        self.data = modifier(self.data, None, True)
10233+        data = modifier(self.data, None, True)
10234+        self.data = data
10235         return defer.succeed(None)
10236 
10237 class FakeNodeMaker(NodeMaker):
10238hunk ./src/allmydata/test/test_dirnode.py 1359
10239-    def create_mutable_file(self, contents="", keysize=None):
10240+    def create_mutable_file(self, contents="", keysize=None, version=None):
10241         return defer.succeed(FakeMutableFile(contents))
10242 
10243 class FakeClient2(Client):
10244hunk ./src/allmydata/test/test_filenode.py 98
10245         def _check_segment(res):
10246             self.failUnlessEqual(res, DATA[1:1+5])
10247         d.addCallback(_check_segment)
10248+        d.addCallback(lambda ignored: fn1.get_best_readable_version())
10249+        d.addCallback(lambda fn2: self.failUnlessEqual(fn1, fn2))
10250+        d.addCallback(lambda ignored:
10251+            fn1.get_size_of_best_version())
10252+        d.addCallback(lambda size:
10253+            self.failUnlessEqual(size, len(DATA)))
10254+        d.addCallback(lambda ignored:
10255+            fn1.download_to_data())
10256+        d.addCallback(lambda data:
10257+            self.failUnlessEqual(data, DATA))
10258+        d.addCallback(lambda ignored:
10259+            fn1.download_best_version())
10260+        d.addCallback(lambda data:
10261+            self.failUnlessEqual(data, DATA))
10262 
10263         return d
10264 
10265hunk ./src/allmydata/test/test_hung_server.py 10
10266 from allmydata.util.consumer import download_to_data
10267 from allmydata.immutable import upload
10268 from allmydata.mutable.common import UnrecoverableFileError
10269+from allmydata.mutable.publish import MutableData
10270 from allmydata.storage.common import storage_index_to_dir
10271 from allmydata.test.no_network import GridTestMixin
10272 from allmydata.test.common import ShouldFailMixin
10273hunk ./src/allmydata/test/test_hung_server.py 110
10274         self.servers = self.servers[5:] + self.servers[:5]
10275 
10276         if mutable:
10277-            d = nm.create_mutable_file(mutable_plaintext)
10278+            uploadable = MutableData(mutable_plaintext)
10279+            d = nm.create_mutable_file(uploadable)
10280             def _uploaded_mutable(node):
10281                 self.uri = node.get_uri()
10282                 self.shares = self.find_uri_shares(self.uri)
10283hunk ./src/allmydata/test/test_immutable.py 263
10284         d.addCallback(_after_attempt)
10285         return d
10286 
10287+    def test_download_to_data(self):
10288+        d = self.n.download_to_data()
10289+        d.addCallback(lambda data:
10290+            self.failUnlessEqual(data, common.TEST_DATA))
10291+        return d
10292 
10293hunk ./src/allmydata/test/test_immutable.py 269
10294+
10295+    def test_download_best_version(self):
10296+        d = self.n.download_best_version()
10297+        d.addCallback(lambda data:
10298+            self.failUnlessEqual(data, common.TEST_DATA))
10299+        return d
10300+
10301+
10302+    def test_get_best_readable_version(self):
10303+        d = self.n.get_best_readable_version()
10304+        d.addCallback(lambda n2:
10305+            self.failUnlessEqual(n2, self.n))
10306+        return d
10307+
10308+    def test_get_size_of_best_version(self):
10309+        d = self.n.get_size_of_best_version()
10310+        d.addCallback(lambda size:
10311+            self.failUnlessEqual(size, len(common.TEST_DATA)))
10312+        return d
10313+
10314+
10315 # XXX extend these tests to show bad behavior of various kinds from servers:
10316 # raising exception from each remove_foo() method, for example
10317 
10318hunk ./src/allmydata/test/test_mutable.py 2
10319 
10320-import struct
10321+import os
10322 from cStringIO import StringIO
10323 from twisted.trial import unittest
10324 from twisted.internet import defer, reactor
10325hunk ./src/allmydata/test/test_mutable.py 8
10326 from allmydata import uri, client
10327 from allmydata.nodemaker import NodeMaker
10328-from allmydata.util import base32
10329+from allmydata.util import base32, consumer
10330 from allmydata.util.hashutil import tagged_hash, ssk_writekey_hash, \
10331      ssk_pubkey_fingerprint_hash
10332hunk ./src/allmydata/test/test_mutable.py 11
10333+from allmydata.util.deferredutil import gatherResults
10334 from allmydata.interfaces import IRepairResults, ICheckAndRepairResults, \
10335hunk ./src/allmydata/test/test_mutable.py 13
10336-     NotEnoughSharesError
10337+     NotEnoughSharesError, SDMF_VERSION, MDMF_VERSION
10338 from allmydata.monitor import Monitor
10339 from allmydata.test.common import ShouldFailMixin
10340 from allmydata.test.no_network import GridTestMixin
10341hunk ./src/allmydata/test/test_mutable.py 27
10342      NeedMoreDataError, UnrecoverableFileError, UncoordinatedWriteError, \
10343      NotEnoughServersError, CorruptShareError
10344 from allmydata.mutable.retrieve import Retrieve
10345-from allmydata.mutable.publish import Publish
10346+from allmydata.mutable.publish import Publish, MutableFileHandle, \
10347+                                      MutableData, \
10348+                                      DEFAULT_MAX_SEGMENT_SIZE
10349 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
10350hunk ./src/allmydata/test/test_mutable.py 31
10351-from allmydata.mutable.layout import unpack_header, unpack_share
10352+from allmydata.mutable.layout import unpack_header, MDMFSlotReadProxy
10353 from allmydata.mutable.repairer import MustForceRepairError
10354 
10355 import allmydata.test.common_util as testutil
10356hunk ./src/allmydata/test/test_mutable.py 100
10357         self.storage = storage
10358         self.queries = 0
10359     def callRemote(self, methname, *args, **kwargs):
10360+        self.queries += 1
10361         def _call():
10362             meth = getattr(self, methname)
10363             return meth(*args, **kwargs)
10364hunk ./src/allmydata/test/test_mutable.py 107
10365         d = fireEventually()
10366         d.addCallback(lambda res: _call())
10367         return d
10368+
10369     def callRemoteOnly(self, methname, *args, **kwargs):
10370hunk ./src/allmydata/test/test_mutable.py 109
10371+        self.queries += 1
10372         d = self.callRemote(methname, *args, **kwargs)
10373         d.addBoth(lambda ignore: None)
10374         pass
10375hunk ./src/allmydata/test/test_mutable.py 157
10376             chr(ord(original[byte_offset]) ^ 0x01) +
10377             original[byte_offset+1:])
10378 
10379+def add_two(original, byte_offset):
10380+    # It isn't enough to simply flip the bit for the version number,
10381+    # because 1 is a valid version number. So we add two instead.
10382+    return (original[:byte_offset] +
10383+            chr(ord(original[byte_offset]) ^ 0x02) +
10384+            original[byte_offset+1:])
10385+
10386 def corrupt(res, s, offset, shnums_to_corrupt=None, offset_offset=0):
10387     # if shnums_to_corrupt is None, corrupt all shares. Otherwise it is a
10388     # list of shnums to corrupt.
10389hunk ./src/allmydata/test/test_mutable.py 167
10390+    ds = []
10391     for peerid in s._peers:
10392         shares = s._peers[peerid]
10393         for shnum in shares:
10394hunk ./src/allmydata/test/test_mutable.py 175
10395                 and shnum not in shnums_to_corrupt):
10396                 continue
10397             data = shares[shnum]
10398-            (version,
10399-             seqnum,
10400-             root_hash,
10401-             IV,
10402-             k, N, segsize, datalen,
10403-             o) = unpack_header(data)
10404-            if isinstance(offset, tuple):
10405-                offset1, offset2 = offset
10406-            else:
10407-                offset1 = offset
10408-                offset2 = 0
10409-            if offset1 == "pubkey":
10410-                real_offset = 107
10411-            elif offset1 in o:
10412-                real_offset = o[offset1]
10413-            else:
10414-                real_offset = offset1
10415-            real_offset = int(real_offset) + offset2 + offset_offset
10416-            assert isinstance(real_offset, int), offset
10417-            shares[shnum] = flip_bit(data, real_offset)
10418-    return res
10419+            # We're feeding the reader all of the share data, so it
10420+            # won't need the rref or the storage index that we didn't
10421+            # provide. We use the reader because it works for both MDMF
10422+            # and SDMF shares.
10423+            reader = MDMFSlotReadProxy(None, None, shnum, data)
10424+            # We need to get the offsets for the next part.
10425+            d = reader.get_verinfo()
10426+            def _do_corruption(verinfo, data, shnum):
10427+                (seqnum,
10428+                 root_hash,
10429+                 IV,
10430+                 segsize,
10431+                 datalen,
10432+                 k, n, prefix, o) = verinfo
10433+                if isinstance(offset, tuple):
10434+                    offset1, offset2 = offset
10435+                else:
10436+                    offset1 = offset
10437+                    offset2 = 0
10438+                if offset1 == "pubkey" and IV:
10439+                    real_offset = 107
10440+                elif offset1 == "share_data" and not IV:
10441+                    real_offset = 107
10442+                elif offset1 in o:
10443+                    real_offset = o[offset1]
10444+                else:
10445+                    real_offset = offset1
10446+                real_offset = int(real_offset) + offset2 + offset_offset
10447+                assert isinstance(real_offset, int), offset
10448+                if offset1 == 0: # verbyte
10449+                    f = add_two
10450+                else:
10451+                    f = flip_bit
10452+                shares[shnum] = f(data, real_offset)
10453+            d.addCallback(_do_corruption, data, shnum)
10454+            ds.append(d)
10455+    dl = defer.DeferredList(ds)
10456+    dl.addCallback(lambda ignored: res)
10457+    return dl
10458 
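
The rewritten corrupt() helper above leans on MDMFSlotReadProxy to parse a raw
share without contacting a server, which is what lets the same corruption code
handle both SDMF and MDMF shares. A minimal sketch of that parsing step, using
the same constructor arguments as the test (rref and storage index can be None
when the full share data is handed in up front):

    from allmydata.mutable.layout import MDMFSlotReadProxy

    def get_share_offsets(shnum, share_data):
        # The proxy understands both share layouts; get_verinfo() fires
        # with (seqnum, root_hash, IV, segsize, datalen, k, n, prefix,
        # offsets), where offsets maps field names to byte positions.
        reader = MDMFSlotReadProxy(None, None, shnum, share_data)
        d = reader.get_verinfo()
        d.addCallback(lambda verinfo: verinfo[-1])
        return d
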
10459 def make_storagebroker(s=None, num_peers=10):
10460     if not s:
10461hunk ./src/allmydata/test/test_mutable.py 256
10462             self.failUnlessEqual(len(shnums), 1)
10463         d.addCallback(_created)
10464         return d
10465+    test_create.timeout = 15
10466+
10467+
10468+    def test_create_mdmf(self):
10469+        d = self.nodemaker.create_mutable_file(version=MDMF_VERSION)
10470+        def _created(n):
10471+            self.failUnless(isinstance(n, MutableFileNode))
10472+            self.failUnlessEqual(n.get_storage_index(), n._storage_index)
10473+            sb = self.nodemaker.storage_broker
10474+            peer0 = sorted(sb.get_all_serverids())[0]
10475+            shnums = self._storage._peers[peer0].keys()
10476+            self.failUnlessEqual(len(shnums), 1)
10477+        d.addCallback(_created)
10478+        return d
10479+
10480 
10481     def test_serialize(self):
10482         n = MutableFileNode(None, None, {"k": 3, "n": 10}, None)
10483hunk ./src/allmydata/test/test_mutable.py 301
10484             d.addCallback(lambda smap: smap.dump(StringIO()))
10485             d.addCallback(lambda sio:
10486                           self.failUnless("3-of-10" in sio.getvalue()))
10487-            d.addCallback(lambda res: n.overwrite("contents 1"))
10488+            d.addCallback(lambda res: n.overwrite(MutableData("contents 1")))
10489             d.addCallback(lambda res: self.failUnlessIdentical(res, None))
10490             d.addCallback(lambda res: n.download_best_version())
10491             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10492hunk ./src/allmydata/test/test_mutable.py 308
10493             d.addCallback(lambda res: n.get_size_of_best_version())
10494             d.addCallback(lambda size:
10495                           self.failUnlessEqual(size, len("contents 1")))
10496-            d.addCallback(lambda res: n.overwrite("contents 2"))
10497+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
10498             d.addCallback(lambda res: n.download_best_version())
10499             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10500             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
10501hunk ./src/allmydata/test/test_mutable.py 312
10502-            d.addCallback(lambda smap: n.upload("contents 3", smap))
10503+            d.addCallback(lambda smap: n.upload(MutableData("contents 3"), smap))
10504             d.addCallback(lambda res: n.download_best_version())
10505             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 3"))
10506             d.addCallback(lambda res: n.get_servermap(MODE_ANYTHING))
10507hunk ./src/allmydata/test/test_mutable.py 324
10508             # mapupdate-to-retrieve data caching (i.e. make the shares larger
10509             # than the default readsize, which is 2000 bytes). A 15kB file
10510             # will have 5kB shares.
10511-            d.addCallback(lambda res: n.overwrite("large size file" * 1000))
10512+            d.addCallback(lambda res: n.overwrite(MutableData("large size file" * 1000)))
10513             d.addCallback(lambda res: n.download_best_version())
10514             d.addCallback(lambda res:
10515                           self.failUnlessEqual(res, "large size file" * 1000))
10516hunk ./src/allmydata/test/test_mutable.py 332
10517         d.addCallback(_created)
10518         return d
10519 
10520+
10521+    def test_upload_and_download_mdmf(self):
10522+        d = self.nodemaker.create_mutable_file(version=MDMF_VERSION)
10523+        def _created(n):
10524+            d = defer.succeed(None)
10525+            d.addCallback(lambda ignored:
10526+                n.get_servermap(MODE_READ))
10527+            def _then(servermap):
10528+                dumped = servermap.dump(StringIO())
10529+                self.failUnlessIn("3-of-10", dumped.getvalue())
10530+            d.addCallback(_then)
10531+            # Now overwrite the contents with some new contents. We want
10532+            # to make them big enough to force the file to be uploaded
10533+            # in more than one segment.
10534+            big_contents = "contents1" * 100000 # about 900 KiB
10535+            big_contents_uploadable = MutableData(big_contents)
10536+            d.addCallback(lambda ignored:
10537+                n.overwrite(big_contents_uploadable))
10538+            d.addCallback(lambda ignored:
10539+                n.download_best_version())
10540+            d.addCallback(lambda data:
10541+                self.failUnlessEqual(data, big_contents))
10542+            # Overwrite the contents again with some new contents. As
10543+            # before, they need to be big enough to force multiple
10544+            # segments, so that we make the downloader deal with
10545+            # multiple segments.
10546+            bigger_contents = "contents2" * 1000000 # about 9MiB
10547+            bigger_contents_uploadable = MutableData(bigger_contents)
10548+            d.addCallback(lambda ignored:
10549+                n.overwrite(bigger_contents_uploadable))
10550+            d.addCallback(lambda ignored:
10551+                n.download_best_version())
10552+            d.addCallback(lambda data:
10553+                self.failUnlessEqual(data, bigger_contents))
10554+            return d
10555+        d.addCallback(_created)
10556+        return d
10557+
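
Both overwrites in the test above are sized so that the file cannot fit in a
single segment. A back-of-the-envelope check, assuming the default mutable
segment size is 128 KiB (the DEFAULT_MAX_SEGMENT_SIZE constant imported by
these tests; the exact value is an assumption here):

    SEGSIZE = 128 * 1024   # assumed value of DEFAULT_MAX_SEGMENT_SIZE

    def expected_segments(data_length, segsize=SEGSIZE):
        # Ceiling division: anything above one segment exercises the
        # multi-segment upload and download paths.
        return (data_length + segsize - 1) // segsize

    assert expected_segments(len("contents1" * 100000)) == 7    # ~900 KiB
    assert expected_segments(len("contents2" * 1000000)) == 69  # ~9 MB
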
10558+
10559+    def test_mdmf_write_count(self):
10560+        # Publishing an MDMF file should only cause one write for each
10561+        # share that is to be published. Otherwise, we introduce
10562+        # undesirable semantics that are a regression from SDMF
10563+        upload = MutableData("MDMF" * 100000) # about 400 KiB
10564+        d = self.nodemaker.create_mutable_file(upload,
10565+                                               version=MDMF_VERSION)
10566+        def _check_server_write_counts(ignored):
10567+            sb = self.nodemaker.storage_broker
10568+            peers = sb.test_servers.values()
10569+            for peer in peers:
10570+                self.failUnlessEqual(peer.queries, 1)
10571+        d.addCallback(_check_server_write_counts)
10572+        return d
10573+
10574+
10575     def test_create_with_initial_contents(self):
10576hunk ./src/allmydata/test/test_mutable.py 388
10577-        d = self.nodemaker.create_mutable_file("contents 1")
10578+        upload1 = MutableData("contents 1")
10579+        d = self.nodemaker.create_mutable_file(upload1)
10580         def _created(n):
10581             d = n.download_best_version()
10582             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10583hunk ./src/allmydata/test/test_mutable.py 393
10584-            d.addCallback(lambda res: n.overwrite("contents 2"))
10585+            upload2 = MutableData("contents 2")
10586+            d.addCallback(lambda res: n.overwrite(upload2))
10587             d.addCallback(lambda res: n.download_best_version())
10588             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10589             return d
10590hunk ./src/allmydata/test/test_mutable.py 400
10591         d.addCallback(_created)
10592         return d
10593+    test_create_with_initial_contents.timeout = 15
10594+
10595+
10596+    def test_create_mdmf_with_initial_contents(self):
10597+        initial_contents = "foobarbaz" * 131072 # about 1.1 MiB
10598+        initial_contents_uploadable = MutableData(initial_contents)
10599+        d = self.nodemaker.create_mutable_file(initial_contents_uploadable,
10600+                                               version=MDMF_VERSION)
10601+        def _created(n):
10602+            d = n.download_best_version()
10603+            d.addCallback(lambda data:
10604+                self.failUnlessEqual(data, initial_contents))
10605+            uploadable2 = MutableData(initial_contents + "foobarbaz")
10606+            d.addCallback(lambda ignored:
10607+                n.overwrite(uploadable2))
10608+            d.addCallback(lambda ignored:
10609+                n.download_best_version())
10610+            d.addCallback(lambda data:
10611+                self.failUnlessEqual(data, initial_contents +
10612+                                           "foobarbaz"))
10613+            return d
10614+        d.addCallback(_created)
10615+        return d
10616+    test_create_mdmf_with_initial_contents.timeout = 20
10617+
10618 
10619     def test_response_cache_memory_leak(self):
10620         d = self.nodemaker.create_mutable_file("contents")
10621hunk ./src/allmydata/test/test_mutable.py 451
10622             key = n.get_writekey()
10623             self.failUnless(isinstance(key, str), key)
10624             self.failUnlessEqual(len(key), 16) # AES key size
10625-            return data
10626+            return MutableData(data)
10627         d = self.nodemaker.create_mutable_file(_make_contents)
10628         def _created(n):
10629             return n.download_best_version()
10630hunk ./src/allmydata/test/test_mutable.py 459
10631         d.addCallback(lambda data2: self.failUnlessEqual(data2, data))
10632         return d
10633 
10634+
10635+    def test_create_mdmf_with_initial_contents_function(self):
10636+        data = "initial contents" * 100000
10637+        def _make_contents(n):
10638+            self.failUnless(isinstance(n, MutableFileNode))
10639+            key = n.get_writekey()
10640+            self.failUnless(isinstance(key, str), key)
10641+            self.failUnlessEqual(len(key), 16)
10642+            return MutableData(data)
10643+        d = self.nodemaker.create_mutable_file(_make_contents,
10644+                                               version=MDMF_VERSION)
10645+        d.addCallback(lambda n:
10646+            n.download_best_version())
10647+        d.addCallback(lambda data2:
10648+            self.failUnlessEqual(data2, data))
10649+        return d
10650+
10651+
10652     def test_create_with_too_large_contents(self):
10653         BIG = "a" * (self.OLD_MAX_SEGMENT_SIZE + 1)
10654hunk ./src/allmydata/test/test_mutable.py 479
10655-        d = self.nodemaker.create_mutable_file(BIG)
10656+        BIG_uploadable = MutableData(BIG)
10657+        d = self.nodemaker.create_mutable_file(BIG_uploadable)
10658         def _created(n):
10659hunk ./src/allmydata/test/test_mutable.py 482
10660-            d = n.overwrite(BIG)
10661+            other_BIG_uploadable = MutableData(BIG)
10662+            d = n.overwrite(other_BIG_uploadable)
10663             return d
10664         d.addCallback(_created)
10665         return d
10666hunk ./src/allmydata/test/test_mutable.py 497
10667 
10668     def test_modify(self):
10669         def _modifier(old_contents, servermap, first_time):
10670-            return old_contents + "line2"
10671+            new_contents = old_contents + "line2"
10672+            return new_contents
10673         def _non_modifier(old_contents, servermap, first_time):
10674             return old_contents
10675         def _none_modifier(old_contents, servermap, first_time):
10676hunk ./src/allmydata/test/test_mutable.py 506
10677         def _error_modifier(old_contents, servermap, first_time):
10678             raise ValueError("oops")
10679         def _toobig_modifier(old_contents, servermap, first_time):
10680-            return "b" * (self.OLD_MAX_SEGMENT_SIZE+1)
10681+            new_content = "b" * (self.OLD_MAX_SEGMENT_SIZE + 1)
10682+            return new_content
10683         calls = []
10684         def _ucw_error_modifier(old_contents, servermap, first_time):
10685             # simulate an UncoordinatedWriteError once
10686hunk ./src/allmydata/test/test_mutable.py 514
10687             calls.append(1)
10688             if len(calls) <= 1:
10689                 raise UncoordinatedWriteError("simulated")
10690-            return old_contents + "line3"
10691+            new_contents = old_contents + "line3"
10692+            return new_contents
10693         def _ucw_error_non_modifier(old_contents, servermap, first_time):
10694             # simulate an UncoordinatedWriteError once, and don't actually
10695             # modify the contents on subsequent invocations
10696hunk ./src/allmydata/test/test_mutable.py 524
10697                 raise UncoordinatedWriteError("simulated")
10698             return old_contents
10699 
10700-        d = self.nodemaker.create_mutable_file("line1")
10701+        initial_contents = "line1"
10702+        d = self.nodemaker.create_mutable_file(MutableData(initial_contents))
10703         def _created(n):
10704             d = n.modify(_modifier)
10705             d.addCallback(lambda res: n.download_best_version())
10706hunk ./src/allmydata/test/test_mutable.py 582
10707             return d
10708         d.addCallback(_created)
10709         return d
10710+    test_modify.timeout = 15
10711+
10712 
10713     def test_modify_backoffer(self):
10714         def _modifier(old_contents, servermap, first_time):
10715hunk ./src/allmydata/test/test_mutable.py 609
10716         giveuper._delay = 0.1
10717         giveuper.factor = 1
10718 
10719-        d = self.nodemaker.create_mutable_file("line1")
10720+        d = self.nodemaker.create_mutable_file(MutableData("line1"))
10721         def _created(n):
10722             d = n.modify(_modifier)
10723             d.addCallback(lambda res: n.download_best_version())
10724hunk ./src/allmydata/test/test_mutable.py 659
10725             d.addCallback(lambda smap: smap.dump(StringIO()))
10726             d.addCallback(lambda sio:
10727                           self.failUnless("3-of-10" in sio.getvalue()))
10728-            d.addCallback(lambda res: n.overwrite("contents 1"))
10729+            d.addCallback(lambda res: n.overwrite(MutableData("contents 1")))
10730             d.addCallback(lambda res: self.failUnlessIdentical(res, None))
10731             d.addCallback(lambda res: n.download_best_version())
10732             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10733hunk ./src/allmydata/test/test_mutable.py 663
10734-            d.addCallback(lambda res: n.overwrite("contents 2"))
10735+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
10736             d.addCallback(lambda res: n.download_best_version())
10737             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10738             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
10739hunk ./src/allmydata/test/test_mutable.py 667
10740-            d.addCallback(lambda smap: n.upload("contents 3", smap))
10741+            d.addCallback(lambda smap: n.upload(MutableData("contents 3"), smap))
10742             d.addCallback(lambda res: n.download_best_version())
10743             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 3"))
10744             d.addCallback(lambda res: n.get_servermap(MODE_ANYTHING))
10745hunk ./src/allmydata/test/test_mutable.py 680
10746         return d
10747 
10748 
10749-class MakeShares(unittest.TestCase):
10750-    def test_encrypt(self):
10751-        nm = make_nodemaker()
10752-        CONTENTS = "some initial contents"
10753-        d = nm.create_mutable_file(CONTENTS)
10754-        def _created(fn):
10755-            p = Publish(fn, nm.storage_broker, None)
10756-            p.salt = "SALT" * 4
10757-            p.readkey = "\x00" * 16
10758-            p.newdata = CONTENTS
10759-            p.required_shares = 3
10760-            p.total_shares = 10
10761-            p.setup_encoding_parameters()
10762-            return p._encrypt_and_encode()
10763+    def test_size_after_servermap_update(self):
10764+        # a mutable file node should have something to say about how big
10765+        # it is after a servermap update is performed, since this tells
10766+        # us how large the best version of that mutable file is.
10767+        d = self.nodemaker.create_mutable_file()
10768+        def _created(n):
10769+            self.n = n
10770+            return n.get_servermap(MODE_READ)
10771+        d.addCallback(_created)
10772+        d.addCallback(lambda ignored:
10773+            self.failUnlessEqual(self.n.get_size(), 0))
10774+        d.addCallback(lambda ignored:
10775+            self.n.overwrite(MutableData("foobarbaz")))
10776+        d.addCallback(lambda ignored:
10777+            self.failUnlessEqual(self.n.get_size(), 9))
10778+        d.addCallback(lambda ignored:
10779+            self.nodemaker.create_mutable_file(MutableData("foobarbaz")))
10780+        d.addCallback(_created)
10781+        d.addCallback(lambda ignored:
10782+            self.failUnlessEqual(self.n.get_size(), 9))
10783+        return d
10784+
10785+
10786+class PublishMixin:
10787+    def publish_one(self):
10788+        # publish a file and create shares, which can then be manipulated
10789+        # later.
10790+        self.CONTENTS = "New contents go here" * 1000
10791+        self.uploadable = MutableData(self.CONTENTS)
10792+        self._storage = FakeStorage()
10793+        self._nodemaker = make_nodemaker(self._storage)
10794+        self._storage_broker = self._nodemaker.storage_broker
10795+        d = self._nodemaker.create_mutable_file(self.uploadable)
10796+        def _created(node):
10797+            self._fn = node
10798+            self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10799         d.addCallback(_created)
10800hunk ./src/allmydata/test/test_mutable.py 717
10801-        def _done(shares_and_shareids):
10802-            (shares, share_ids) = shares_and_shareids
10803-            self.failUnlessEqual(len(shares), 10)
10804-            for sh in shares:
10805-                self.failUnless(isinstance(sh, str))
10806-                self.failUnlessEqual(len(sh), 7)
10807-            self.failUnlessEqual(len(share_ids), 10)
10808-        d.addCallback(_done)
10809         return d
10810 
10811hunk ./src/allmydata/test/test_mutable.py 719
10812-    def test_generate(self):
10813-        nm = make_nodemaker()
10814-        CONTENTS = "some initial contents"
10815-        d = nm.create_mutable_file(CONTENTS)
10816-        def _created(fn):
10817-            self._fn = fn
10818-            p = Publish(fn, nm.storage_broker, None)
10819-            self._p = p
10820-            p.newdata = CONTENTS
10821-            p.required_shares = 3
10822-            p.total_shares = 10
10823-            p.setup_encoding_parameters()
10824-            p._new_seqnum = 3
10825-            p.salt = "SALT" * 4
10826-            # make some fake shares
10827-            shares_and_ids = ( ["%07d" % i for i in range(10)], range(10) )
10828-            p._privkey = fn.get_privkey()
10829-            p._encprivkey = fn.get_encprivkey()
10830-            p._pubkey = fn.get_pubkey()
10831-            return p._generate_shares(shares_and_ids)
10832+    def publish_mdmf(self):
10833+        # like publish_one, except that the result is guaranteed to be
10834+        # an MDMF file.
10835+        # self.CONTENTS should have more than one segment.
10836+        self.CONTENTS = "This is an MDMF file" * 100000
10837+        self.uploadable = MutableData(self.CONTENTS)
10838+        self._storage = FakeStorage()
10839+        self._nodemaker = make_nodemaker(self._storage)
10840+        self._storage_broker = self._nodemaker.storage_broker
10841+        d = self._nodemaker.create_mutable_file(self.uploadable, version=MDMF_VERSION)
10842+        def _created(node):
10843+            self._fn = node
10844+            self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10845         d.addCallback(_created)
10846hunk ./src/allmydata/test/test_mutable.py 733
10847-        def _generated(res):
10848-            p = self._p
10849-            final_shares = p.shares
10850-            root_hash = p.root_hash
10851-            self.failUnlessEqual(len(root_hash), 32)
10852-            self.failUnless(isinstance(final_shares, dict))
10853-            self.failUnlessEqual(len(final_shares), 10)
10854-            self.failUnlessEqual(sorted(final_shares.keys()), range(10))
10855-            for i,sh in final_shares.items():
10856-                self.failUnless(isinstance(sh, str))
10857-                # feed the share through the unpacker as a sanity-check
10858-                pieces = unpack_share(sh)
10859-                (u_seqnum, u_root_hash, IV, k, N, segsize, datalen,
10860-                 pubkey, signature, share_hash_chain, block_hash_tree,
10861-                 share_data, enc_privkey) = pieces
10862-                self.failUnlessEqual(u_seqnum, 3)
10863-                self.failUnlessEqual(u_root_hash, root_hash)
10864-                self.failUnlessEqual(k, 3)
10865-                self.failUnlessEqual(N, 10)
10866-                self.failUnlessEqual(segsize, 21)
10867-                self.failUnlessEqual(datalen, len(CONTENTS))
10868-                self.failUnlessEqual(pubkey, p._pubkey.serialize())
10869-                sig_material = struct.pack(">BQ32s16s BBQQ",
10870-                                           0, p._new_seqnum, root_hash, IV,
10871-                                           k, N, segsize, datalen)
10872-                self.failUnless(p._pubkey.verify(sig_material, signature))
10873-                #self.failUnlessEqual(signature, p._privkey.sign(sig_material))
10874-                self.failUnless(isinstance(share_hash_chain, dict))
10875-                self.failUnlessEqual(len(share_hash_chain), 4) # ln2(10)++
10876-                for shnum,share_hash in share_hash_chain.items():
10877-                    self.failUnless(isinstance(shnum, int))
10878-                    self.failUnless(isinstance(share_hash, str))
10879-                    self.failUnlessEqual(len(share_hash), 32)
10880-                self.failUnless(isinstance(block_hash_tree, list))
10881-                self.failUnlessEqual(len(block_hash_tree), 1) # very small tree
10882-                self.failUnlessEqual(IV, "SALT"*4)
10883-                self.failUnlessEqual(len(share_data), len("%07d" % 1))
10884-                self.failUnlessEqual(enc_privkey, self._fn.get_encprivkey())
10885-        d.addCallback(_generated)
10886         return d
10887 
10888hunk ./src/allmydata/test/test_mutable.py 735
10889-    # TODO: when we publish to 20 peers, we should get one share per peer on 10
10890-    # when we publish to 3 peers, we should get either 3 or 4 shares per peer
10891-    # when we publish to zero peers, we should get a NotEnoughSharesError
10892 
10893hunk ./src/allmydata/test/test_mutable.py 736
10894-class PublishMixin:
10895-    def publish_one(self):
10896-        # publish a file and create shares, which can then be manipulated
10897-        # later.
10898-        self.CONTENTS = "New contents go here" * 1000
10899+    def publish_sdmf(self):
10900+        # like publish_one, except that the result is guaranteed to be
10901+        # an SDMF file
10902+        self.CONTENTS = "This is an SDMF file" * 1000
10903+        self.uploadable = MutableData(self.CONTENTS)
10904         self._storage = FakeStorage()
10905         self._nodemaker = make_nodemaker(self._storage)
10906         self._storage_broker = self._nodemaker.storage_broker
10907hunk ./src/allmydata/test/test_mutable.py 744
10908-        d = self._nodemaker.create_mutable_file(self.CONTENTS)
10909+        d = self._nodemaker.create_mutable_file(self.uploadable, version=SDMF_VERSION)
10910         def _created(node):
10911             self._fn = node
10912             self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10913hunk ./src/allmydata/test/test_mutable.py 751
10914         d.addCallback(_created)
10915         return d
10916 
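For reference, the two helpers above differ only in the contents they generate and in the version= argument they pass to the nodemaker. A minimal sketch of that pattern, assuming the same MutableData, MDMF_VERSION and SDMF_VERSION names used by the tests above (the create_test_file wrapper is illustrative, not part of the patch):

    # illustrative helper, not part of the patch: pick the mutable-file
    # format at creation time via the version= argument
    def create_test_file(nodemaker, contents, mdmf=False):
        uploadable = MutableData(contents)
        version = MDMF_VERSION if mdmf else SDMF_VERSION
        return nodemaker.create_mutable_file(uploadable, version=version)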
10917-    def publish_multiple(self):
10918+
10919+    def publish_multiple(self, version=0):
10920         self.CONTENTS = ["Contents 0",
10921                          "Contents 1",
10922                          "Contents 2",
10923hunk ./src/allmydata/test/test_mutable.py 758
10924                          "Contents 3a",
10925                          "Contents 3b"]
10926+        self.uploadables = [MutableData(d) for d in self.CONTENTS]
10927         self._copied_shares = {}
10928         self._storage = FakeStorage()
10929         self._nodemaker = make_nodemaker(self._storage)
10930hunk ./src/allmydata/test/test_mutable.py 762
10931-        d = self._nodemaker.create_mutable_file(self.CONTENTS[0]) # seqnum=1
10932+        d = self._nodemaker.create_mutable_file(self.uploadables[0], version=version) # seqnum=1
10933         def _created(node):
10934             self._fn = node
10935             # now create multiple versions of the same file, and accumulate
10936hunk ./src/allmydata/test/test_mutable.py 769
10937             # their shares, so we can mix and match them later.
10938             d = defer.succeed(None)
10939             d.addCallback(self._copy_shares, 0)
10940-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[1])) #s2
10941+            d.addCallback(lambda res: node.overwrite(self.uploadables[1])) #s2
10942             d.addCallback(self._copy_shares, 1)
10943hunk ./src/allmydata/test/test_mutable.py 771
10944-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[2])) #s3
10945+            d.addCallback(lambda res: node.overwrite(self.uploadables[2])) #s3
10946             d.addCallback(self._copy_shares, 2)
10947hunk ./src/allmydata/test/test_mutable.py 773
10948-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[3])) #s4a
10949+            d.addCallback(lambda res: node.overwrite(self.uploadables[3])) #s4a
10950             d.addCallback(self._copy_shares, 3)
10951             # now we replace all the shares with version s3, and upload a new
10952             # version to get s4b.
10953hunk ./src/allmydata/test/test_mutable.py 779
10954             rollback = dict([(i,2) for i in range(10)])
10955             d.addCallback(lambda res: self._set_versions(rollback))
10956-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[4])) #s4b
10957+            d.addCallback(lambda res: node.overwrite(self.uploadables[4])) #s4b
10958             d.addCallback(self._copy_shares, 4)
10959             # we leave the storage in state 4
10960             return d
10961hunk ./src/allmydata/test/test_mutable.py 786
10962         d.addCallback(_created)
10963         return d
10964 
10965+
10966     def _copy_shares(self, ignored, index):
10967         shares = self._storage._peers
10968         # we need a deep copy
10969hunk ./src/allmydata/test/test_mutable.py 810
10970                     shares[peerid][shnum] = oldshares[index][peerid][shnum]
10971 
10972 
10973+
10974+
10975 class Servermap(unittest.TestCase, PublishMixin):
10976     def setUp(self):
10977         return self.publish_one()
10978hunk ./src/allmydata/test/test_mutable.py 816
10979 
10980-    def make_servermap(self, mode=MODE_CHECK, fn=None, sb=None):
10981+    def make_servermap(self, mode=MODE_CHECK, fn=None, sb=None,
10982+                       update_range=None):
10983         if fn is None:
10984             fn = self._fn
10985         if sb is None:
10986hunk ./src/allmydata/test/test_mutable.py 823
10987             sb = self._storage_broker
10988         smu = ServermapUpdater(fn, sb, Monitor(),
10989-                               ServerMap(), mode)
10990+                               ServerMap(), mode, update_range=update_range)
10991         d = smu.update()
10992         return d
10993 
10994hunk ./src/allmydata/test/test_mutable.py 889
10995         # create a new file, which is large enough to knock the privkey out
10996         # of the early part of the file
10997         LARGE = "These are Larger contents" * 200 # about 5KB
10998-        d.addCallback(lambda res: self._nodemaker.create_mutable_file(LARGE))
10999+        LARGE_uploadable = MutableData(LARGE)
11000+        d.addCallback(lambda res: self._nodemaker.create_mutable_file(LARGE_uploadable))
11001         def _created(large_fn):
11002             large_fn2 = self._nodemaker.create_from_cap(large_fn.get_uri())
11003             return self.make_servermap(MODE_WRITE, large_fn2)
11004hunk ./src/allmydata/test/test_mutable.py 898
11005         d.addCallback(lambda sm: self.failUnlessOneRecoverable(sm, 10))
11006         return d
11007 
11008+
11009     def test_mark_bad(self):
11010         d = defer.succeed(None)
11011         ms = self.make_servermap
11012hunk ./src/allmydata/test/test_mutable.py 944
11013         self._storage._peers = {} # delete all shares
11014         ms = self.make_servermap
11015         d = defer.succeed(None)
11016-
11017+#
11018         d.addCallback(lambda res: ms(mode=MODE_CHECK))
11019         d.addCallback(lambda sm: self.failUnlessNoneRecoverable(sm))
11020 
11021hunk ./src/allmydata/test/test_mutable.py 996
11022         return d
11023 
11024 
11025+    def test_servermapupdater_finds_mdmf_files(self):
11026+        # setUp already published an MDMF file for us. We just need to
11027+        # make sure that when we run the ServermapUpdater, the file is
11028+        # reported to have one recoverable version.
11029+        d = defer.succeed(None)
11030+        d.addCallback(lambda ignored:
11031+            self.publish_mdmf())
11032+        d.addCallback(lambda ignored:
11033+            self.make_servermap(mode=MODE_CHECK))
11034+        # Calling make_servermap also updates the servermap in the mode
11035+        # that we specify, so we just need to see what it says.
11036+        def _check_servermap(sm):
11037+            self.failUnlessEqual(len(sm.recoverable_versions()), 1)
11038+        d.addCallback(_check_servermap)
11039+        return d
11040+
11041+
11042+    def test_fetch_update(self):
11043+        d = defer.succeed(None)
11044+        d.addCallback(lambda ignored:
11045+            self.publish_mdmf())
11046+        d.addCallback(lambda ignored:
11047+            self.make_servermap(mode=MODE_WRITE, update_range=(1, 2)))
11048+        def _check_servermap(sm):
11049+            # 10 shares
11050+            self.failUnlessEqual(len(sm.update_data), 10)
11051+            # one version
11052+            for data in sm.update_data.itervalues():
11053+                self.failUnlessEqual(len(data), 1)
11054+        d.addCallback(_check_servermap)
11055+        return d
11056+
11057+
11058+    def test_servermapupdater_finds_sdmf_files(self):
11059+        d = defer.succeed(None)
11060+        d.addCallback(lambda ignored:
11061+            self.publish_sdmf())
11062+        d.addCallback(lambda ignored:
11063+            self.make_servermap(mode=MODE_CHECK))
11064+        d.addCallback(lambda servermap:
11065+            self.failUnlessEqual(len(servermap.recoverable_versions()), 1))
11066+        return d
11067+
11068 
11069 class Roundtrip(unittest.TestCase, testutil.ShouldFailMixin, PublishMixin):
11070     def setUp(self):
11071hunk ./src/allmydata/test/test_mutable.py 1079
11072         if version is None:
11073             version = servermap.best_recoverable_version()
11074         r = Retrieve(self._fn, servermap, version)
11075-        return r.download()
11076+        c = consumer.MemoryConsumer()
11077+        d = r.download(consumer=c)
11078+        d.addCallback(lambda mc: "".join(mc.chunks))
11079+        return d
11080+
11081 
11082     def test_basic(self):
11083         d = self.make_servermap()
11084hunk ./src/allmydata/test/test_mutable.py 1160
11085         return d
11086     test_no_servers_download.timeout = 15
11087 
11088+
11089     def _test_corrupt_all(self, offset, substring,
11090hunk ./src/allmydata/test/test_mutable.py 1162
11091-                          should_succeed=False, corrupt_early=True,
11092-                          failure_checker=None):
11093+                          should_succeed=False,
11094+                          corrupt_early=True,
11095+                          failure_checker=None,
11096+                          fetch_privkey=False):
11097         d = defer.succeed(None)
11098         if corrupt_early:
11099             d.addCallback(corrupt, self._storage, offset)
11100hunk ./src/allmydata/test/test_mutable.py 1182
11101                     self.failUnlessIn(substring, "".join(allproblems))
11102                 return servermap
11103             if should_succeed:
11104-                d1 = self._fn.download_version(servermap, ver)
11105+                d1 = self._fn.download_version(servermap, ver,
11106+                                               fetch_privkey)
11107                 d1.addCallback(lambda new_contents:
11108                                self.failUnlessEqual(new_contents, self.CONTENTS))
11109             else:
11110hunk ./src/allmydata/test/test_mutable.py 1190
11111                 d1 = self.shouldFail(NotEnoughSharesError,
11112                                      "_corrupt_all(offset=%s)" % (offset,),
11113                                      substring,
11114-                                     self._fn.download_version, servermap, ver)
11115+                                     self._fn.download_version, servermap,
11116+                                                                ver,
11117+                                                                fetch_privkey)
11118             if failure_checker:
11119                 d1.addCallback(failure_checker)
11120             d1.addCallback(lambda res: servermap)
11121hunk ./src/allmydata/test/test_mutable.py 1201
11122         return d
11123 
11124     def test_corrupt_all_verbyte(self):
11125-        # when the version byte is not 0, we hit an UnknownVersionError error
11126-        # in unpack_share().
11127+        # when the version byte is not 0 or 1, we hit an UnknownVersionError
11128+        # error in unpack_share().
11129         d = self._test_corrupt_all(0, "UnknownVersionError")
11130         def _check_servermap(servermap):
11131             # and the dump should mention the problems
11132hunk ./src/allmydata/test/test_mutable.py 1208
11133             s = StringIO()
11134             dump = servermap.dump(s).getvalue()
11135-            self.failUnless("10 PROBLEMS" in dump, dump)
11136+            self.failUnless("30 PROBLEMS" in dump, dump)
11137         d.addCallback(_check_servermap)
11138         return d
11139 
11140hunk ./src/allmydata/test/test_mutable.py 1278
11141         return self._test_corrupt_all("enc_privkey", None, should_succeed=True)
11142 
11143 
11144+    def test_corrupt_all_encprivkey_late(self):
11145+        # this should work for the same reason as above, but we corrupt
11146+        # after the servermap update to exercise the error handling
11147+        # code.
11148+        # We need to remove the privkey from the node, or the retrieve
11149+        # process won't know to update it.
11150+        self._fn._privkey = None
11151+        return self._test_corrupt_all("enc_privkey",
11152+                                      None, # this shouldn't fail
11153+                                      should_succeed=True,
11154+                                      corrupt_early=False,
11155+                                      fetch_privkey=True)
11156+
11157+
11158     def test_corrupt_all_seqnum_late(self):
11159         # corrupting the seqnum between mapupdate and retrieve should result
11160         # in NotEnoughSharesError, since each share will look invalid
11161hunk ./src/allmydata/test/test_mutable.py 1298
11162         def _check(res):
11163             f = res[0]
11164             self.failUnless(f.check(NotEnoughSharesError))
11165-            self.failUnless("someone wrote to the data since we read the servermap" in str(f))
11166+            self.failUnless("uncoordinated write" in str(f))
11167         return self._test_corrupt_all(1, "ran out of peers",
11168                                       corrupt_early=False,
11169                                       failure_checker=_check)
11170hunk ./src/allmydata/test/test_mutable.py 1342
11171                             in str(servermap.problems[0]))
11172             ver = servermap.best_recoverable_version()
11173             r = Retrieve(self._fn, servermap, ver)
11174-            return r.download()
11175+            c = consumer.MemoryConsumer()
11176+            return r.download(c)
11177         d.addCallback(_do_retrieve)
11178hunk ./src/allmydata/test/test_mutable.py 1345
11179+        d.addCallback(lambda mc: "".join(mc.chunks))
11180         d.addCallback(lambda new_contents:
11181                       self.failUnlessEqual(new_contents, self.CONTENTS))
11182         return d
11183hunk ./src/allmydata/test/test_mutable.py 1350
11184 
11185-    def test_corrupt_some(self):
11186-        # corrupt the data of first five shares (so the servermap thinks
11187-        # they're good but retrieve marks them as bad), so that the
11188-        # MODE_READ set of 6 will be insufficient, forcing node.download to
11189-        # retry with more servers.
11190-        corrupt(None, self._storage, "share_data", range(5))
11191-        d = self.make_servermap()
11192+
11193+    def _test_corrupt_some(self, offset, mdmf=False):
11194+        if mdmf:
11195+            d = self.publish_mdmf()
11196+        else:
11197+            d = defer.succeed(None)
11198+        d.addCallback(lambda ignored:
11199+            corrupt(None, self._storage, offset, range(5)))
11200+        d.addCallback(lambda ignored:
11201+            self.make_servermap())
11202         def _do_retrieve(servermap):
11203             ver = servermap.best_recoverable_version()
11204             self.failUnless(ver)
11205hunk ./src/allmydata/test/test_mutable.py 1366
11206             return self._fn.download_best_version()
11207         d.addCallback(_do_retrieve)
11208         d.addCallback(lambda new_contents:
11209-                      self.failUnlessEqual(new_contents, self.CONTENTS))
11210+            self.failUnlessEqual(new_contents, self.CONTENTS))
11211         return d
11212 
11213hunk ./src/allmydata/test/test_mutable.py 1369
11214+
11215+    def test_corrupt_some(self):
11216+        # corrupt the data of first five shares (so the servermap thinks
11217+        # they're good but retrieve marks them as bad), so that the
11218+        # MODE_READ set of 6 will be insufficient, forcing node.download to
11219+        # retry with more servers.
11220+        return self._test_corrupt_some("share_data")
11221+
11222+
11223     def test_download_fails(self):
11224hunk ./src/allmydata/test/test_mutable.py 1379
11225-        corrupt(None, self._storage, "signature")
11226-        d = self.shouldFail(UnrecoverableFileError, "test_download_anyway",
11227+        d = corrupt(None, self._storage, "signature")
11228+        d.addCallback(lambda ignored:
11229+            self.shouldFail(UnrecoverableFileError, "test_download_anyway",
11230                             "no recoverable versions",
11231hunk ./src/allmydata/test/test_mutable.py 1383
11232-                            self._fn.download_best_version)
11233+                            self._fn.download_best_version))
11234         return d
11235 
11236 
11237hunk ./src/allmydata/test/test_mutable.py 1387
11238+
11239+    def test_corrupt_mdmf_block_hash_tree(self):
11240+        d = self.publish_mdmf()
11241+        d.addCallback(lambda ignored:
11242+            self._test_corrupt_all(("block_hash_tree", 12 * 32),
11243+                                   "block hash tree failure",
11244+                                   corrupt_early=True,
11245+                                   should_succeed=False))
11246+        return d
11247+
11248+
11249+    def test_corrupt_mdmf_block_hash_tree_late(self):
11250+        d = self.publish_mdmf()
11251+        d.addCallback(lambda ignored:
11252+            self._test_corrupt_all(("block_hash_tree", 12 * 32),
11253+                                   "block hash tree failure",
11254+                                   corrupt_early=False,
11255+                                   should_succeed=False))
11256+        return d
11257+
11258+
11259+    def test_corrupt_mdmf_share_data(self):
11260+        d = self.publish_mdmf()
11261+        d.addCallback(lambda ignored:
11262+            # TODO: Find out what the block size is and corrupt a
11263+            # specific block, rather than just guessing.
11264+            self._test_corrupt_all(("share_data", 12 * 40),
11265+                                    "block hash tree failure",
11266+                                    corrupt_early=True,
11267+                                    should_succeed=False))
11268+        return d
11269+
11270+
11271+    def test_corrupt_some_mdmf(self):
11272+        return self._test_corrupt_some(("share_data", 12 * 40),
11273+                                       mdmf=True)
11274+
11275+
11276 class CheckerMixin:
11277     def check_good(self, r, where):
11278         self.failUnless(r.is_healthy(), where)
11279hunk ./src/allmydata/test/test_mutable.py 1455
11280         d.addCallback(self.check_good, "test_check_good")
11281         return d
11282 
11283+    def test_check_mdmf_good(self):
11284+        d = self.publish_mdmf()
11285+        d.addCallback(lambda ignored:
11286+            self._fn.check(Monitor()))
11287+        d.addCallback(self.check_good, "test_check_mdmf_good")
11288+        return d
11289+
11290     def test_check_no_shares(self):
11291         for shares in self._storage._peers.values():
11292             shares.clear()
11293hunk ./src/allmydata/test/test_mutable.py 1469
11294         d.addCallback(self.check_bad, "test_check_no_shares")
11295         return d
11296 
11297+    def test_check_mdmf_no_shares(self):
11298+        d = self.publish_mdmf()
11299+        def _then(ignored):
11300+            for shares in self._storage._peers.values():
11301+                shares.clear()
11302+        d.addCallback(_then)
11303+        d.addCallback(lambda ignored:
11304+            self._fn.check(Monitor()))
11305+        d.addCallback(self.check_bad, "test_check_mdmf_no_shares")
11306+        return d
11307+
11308     def test_check_not_enough_shares(self):
11309         for shares in self._storage._peers.values():
11310             for shnum in shares.keys():
11311hunk ./src/allmydata/test/test_mutable.py 1489
11312         d.addCallback(self.check_bad, "test_check_not_enough_shares")
11313         return d
11314 
11315+    def test_check_mdmf_not_enough_shares(self):
11316+        d = self.publish_mdmf()
11317+        def _then(ignored):
11318+            for shares in self._storage._peers.values():
11319+                for shnum in shares.keys():
11320+                    if shnum > 0:
11321+                        del shares[shnum]
11322+        d.addCallback(_then)
11323+        d.addCallback(lambda ignored:
11324+            self._fn.check(Monitor()))
11325+        d.addCallback(self.check_bad, "test_check_mdmf_not_enough_shares")
11326+        return d
11327+
11328+
11329     def test_check_all_bad_sig(self):
11330hunk ./src/allmydata/test/test_mutable.py 1504
11331-        corrupt(None, self._storage, 1) # bad sig
11332-        d = self._fn.check(Monitor())
11333+        d = corrupt(None, self._storage, 1) # bad sig
11334+        d.addCallback(lambda ignored:
11335+            self._fn.check(Monitor()))
11336         d.addCallback(self.check_bad, "test_check_all_bad_sig")
11337         return d
11338 
11339hunk ./src/allmydata/test/test_mutable.py 1510
11340+    def test_check_mdmf_all_bad_sig(self):
11341+        d = self.publish_mdmf()
11342+        d.addCallback(lambda ignored:
11343+            corrupt(None, self._storage, 1))
11344+        d.addCallback(lambda ignored:
11345+            self._fn.check(Monitor()))
11346+        d.addCallback(self.check_bad, "test_check_mdmf_all_bad_sig")
11347+        return d
11348+
11349     def test_check_all_bad_blocks(self):
11350hunk ./src/allmydata/test/test_mutable.py 1520
11351-        corrupt(None, self._storage, "share_data", [9]) # bad blocks
11352+        d = corrupt(None, self._storage, "share_data", [9]) # bad blocks
11353         # the Checker won't notice this.. it doesn't look at actual data
11354hunk ./src/allmydata/test/test_mutable.py 1522
11355-        d = self._fn.check(Monitor())
11356+        d.addCallback(lambda ignored:
11357+            self._fn.check(Monitor()))
11358         d.addCallback(self.check_good, "test_check_all_bad_blocks")
11359         return d
11360 
11361hunk ./src/allmydata/test/test_mutable.py 1527
11362+
11363+    def test_check_mdmf_all_bad_blocks(self):
11364+        d = self.publish_mdmf()
11365+        d.addCallback(lambda ignored:
11366+            corrupt(None, self._storage, "share_data"))
11367+        d.addCallback(lambda ignored:
11368+            self._fn.check(Monitor()))
11369+        d.addCallback(self.check_good, "test_check_mdmf_all_bad_blocks")
11370+        return d
11371+
11372     def test_verify_good(self):
11373         d = self._fn.check(Monitor(), verify=True)
11374         d.addCallback(self.check_good, "test_verify_good")
11375hunk ./src/allmydata/test/test_mutable.py 1541
11376         return d
11377+    test_verify_good.timeout = 15
11378 
11379     def test_verify_all_bad_sig(self):
11380hunk ./src/allmydata/test/test_mutable.py 1544
11381-        corrupt(None, self._storage, 1) # bad sig
11382-        d = self._fn.check(Monitor(), verify=True)
11383+        d = corrupt(None, self._storage, 1) # bad sig
11384+        d.addCallback(lambda ignored:
11385+            self._fn.check(Monitor(), verify=True))
11386         d.addCallback(self.check_bad, "test_verify_all_bad_sig")
11387         return d
11388 
11389hunk ./src/allmydata/test/test_mutable.py 1551
11390     def test_verify_one_bad_sig(self):
11391-        corrupt(None, self._storage, 1, [9]) # bad sig
11392-        d = self._fn.check(Monitor(), verify=True)
11393+        d = corrupt(None, self._storage, 1, [9]) # bad sig
11394+        d.addCallback(lambda ignored:
11395+            self._fn.check(Monitor(), verify=True))
11396         d.addCallback(self.check_bad, "test_verify_one_bad_sig")
11397         return d
11398 
11399hunk ./src/allmydata/test/test_mutable.py 1558
11400     def test_verify_one_bad_block(self):
11401-        corrupt(None, self._storage, "share_data", [9]) # bad blocks
11402+        d = corrupt(None, self._storage, "share_data", [9]) # bad blocks
11403         # the Verifier *will* notice this, since it examines every byte
11404hunk ./src/allmydata/test/test_mutable.py 1560
11405-        d = self._fn.check(Monitor(), verify=True)
11406+        d.addCallback(lambda ignored:
11407+            self._fn.check(Monitor(), verify=True))
11408         d.addCallback(self.check_bad, "test_verify_one_bad_block")
11409         d.addCallback(self.check_expected_failure,
11410                       CorruptShareError, "block hash tree failure",
11411hunk ./src/allmydata/test/test_mutable.py 1569
11412         return d
11413 
11414     def test_verify_one_bad_sharehash(self):
11415-        corrupt(None, self._storage, "share_hash_chain", [9], 5)
11416-        d = self._fn.check(Monitor(), verify=True)
11417+        d = corrupt(None, self._storage, "share_hash_chain", [9], 5)
11418+        d.addCallback(lambda ignored:
11419+            self._fn.check(Monitor(), verify=True))
11420         d.addCallback(self.check_bad, "test_verify_one_bad_sharehash")
11421         d.addCallback(self.check_expected_failure,
11422                       CorruptShareError, "corrupt hashes",
11423hunk ./src/allmydata/test/test_mutable.py 1579
11424         return d
11425 
11426     def test_verify_one_bad_encprivkey(self):
11427-        corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11428-        d = self._fn.check(Monitor(), verify=True)
11429+        d = corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11430+        d.addCallback(lambda ignored:
11431+            self._fn.check(Monitor(), verify=True))
11432         d.addCallback(self.check_bad, "test_verify_one_bad_encprivkey")
11433         d.addCallback(self.check_expected_failure,
11434                       CorruptShareError, "invalid privkey",
11435hunk ./src/allmydata/test/test_mutable.py 1589
11436         return d
11437 
11438     def test_verify_one_bad_encprivkey_uncheckable(self):
11439-        corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11440+        d = corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11441         readonly_fn = self._fn.get_readonly()
11442         # a read-only node has no way to validate the privkey
11443hunk ./src/allmydata/test/test_mutable.py 1592
11444-        d = readonly_fn.check(Monitor(), verify=True)
11445+        d.addCallback(lambda ignored:
11446+            readonly_fn.check(Monitor(), verify=True))
11447         d.addCallback(self.check_good,
11448                       "test_verify_one_bad_encprivkey_uncheckable")
11449         return d
11450hunk ./src/allmydata/test/test_mutable.py 1598
11451 
11452+
11453+    def test_verify_mdmf_good(self):
11454+        d = self.publish_mdmf()
11455+        d.addCallback(lambda ignored:
11456+            self._fn.check(Monitor(), verify=True))
11457+        d.addCallback(self.check_good, "test_verify_mdmf_good")
11458+        return d
11459+
11460+
11461+    def test_verify_mdmf_one_bad_block(self):
11462+        d = self.publish_mdmf()
11463+        d.addCallback(lambda ignored:
11464+            corrupt(None, self._storage, "share_data", [1]))
11465+        d.addCallback(lambda ignored:
11466+            self._fn.check(Monitor(), verify=True))
11467+        # We should find one bad block here
11468+        d.addCallback(self.check_bad, "test_verify_mdmf_one_bad_block")
11469+        d.addCallback(self.check_expected_failure,
11470+                      CorruptShareError, "block hash tree failure",
11471+                      "test_verify_mdmf_one_bad_block")
11472+        return d
11473+
11474+
11475+    def test_verify_mdmf_bad_encprivkey(self):
11476+        d = self.publish_mdmf()
11477+        d.addCallback(lambda ignored:
11478+            corrupt(None, self._storage, "enc_privkey", [1]))
11479+        d.addCallback(lambda ignored:
11480+            self._fn.check(Monitor(), verify=True))
11481+        d.addCallback(self.check_bad, "test_verify_mdmf_bad_encprivkey")
11482+        d.addCallback(self.check_expected_failure,
11483+                      CorruptShareError, "privkey",
11484+                      "test_verify_mdmf_bad_encprivkey")
11485+        return d
11486+
11487+
11488+    def test_verify_mdmf_bad_sig(self):
11489+        d = self.publish_mdmf()
11490+        d.addCallback(lambda ignored:
11491+            corrupt(None, self._storage, 1, [1]))
11492+        d.addCallback(lambda ignored:
11493+            self._fn.check(Monitor(), verify=True))
11494+        d.addCallback(self.check_bad, "test_verify_mdmf_bad_sig")
11495+        return d
11496+
11497+
11498+    def test_verify_mdmf_bad_encprivkey_uncheckable(self):
11499+        d = self.publish_mdmf()
11500+        d.addCallback(lambda ignored:
11501+            corrupt(None, self._storage, "enc_privkey", [1]))
11502+        d.addCallback(lambda ignored:
11503+            self._fn.get_readonly())
11504+        d.addCallback(lambda fn:
11505+            fn.check(Monitor(), verify=True))
11506+        d.addCallback(self.check_good,
11507+                      "test_verify_mdmf_bad_encprivkey_uncheckable")
11508+        return d
11509+
11510+
11511 class Repair(unittest.TestCase, PublishMixin, ShouldFailMixin):
11512 
11513     def get_shares(self, s):
11514hunk ./src/allmydata/test/test_mutable.py 1722
11515         current_shares = self.old_shares[-1]
11516         self.failUnlessEqual(old_shares, current_shares)
11517 
11518+
11519     def test_unrepairable_0shares(self):
11520         d = self.publish_one()
11521         def _delete_all_shares(ign):
11522hunk ./src/allmydata/test/test_mutable.py 1737
11523         d.addCallback(_check)
11524         return d
11525 
11526+    def test_mdmf_unrepairable_0shares(self):
11527+        d = self.publish_mdmf()
11528+        def _delete_all_shares(ign):
11529+            shares = self._storage._peers
11530+            for peerid in shares:
11531+                shares[peerid] = {}
11532+        d.addCallback(_delete_all_shares)
11533+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11534+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11535+        d.addCallback(lambda crr: self.failIf(crr.get_successful()))
11536+        return d
11537+
11538+
11539     def test_unrepairable_1share(self):
11540         d = self.publish_one()
11541         def _delete_all_shares(ign):
11542hunk ./src/allmydata/test/test_mutable.py 1766
11543         d.addCallback(_check)
11544         return d
11545 
11546+    def test_mdmf_unrepairable_1share(self):
11547+        d = self.publish_mdmf()
11548+        def _delete_all_shares(ign):
11549+            shares = self._storage._peers
11550+            for peerid in shares:
11551+                for shnum in list(shares[peerid]):
11552+                    if shnum > 0:
11553+                        del shares[peerid][shnum]
11554+        d.addCallback(_delete_all_shares)
11555+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11556+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11557+        def _check(crr):
11558+            self.failUnlessEqual(crr.get_successful(), False)
11559+        d.addCallback(_check)
11560+        return d
11561+
11562+    def test_repairable_5shares(self):
11563+        d = self.publish_one()
11564+        def _delete_some_shares(ign):
11565+            shares = self._storage._peers
11566+            for peerid in shares:
11567+                for shnum in list(shares[peerid]):
11568+                    if shnum > 4:
11569+                        del shares[peerid][shnum]
11570+        d.addCallback(_delete_some_shares)
11571+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11572+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11573+        def _check(crr):
11574+            self.failUnlessEqual(crr.get_successful(), True)
11575+        d.addCallback(_check)
11576+        return d
11577+
11578+    def test_mdmf_repairable_5shares(self):
11579+        d = self.publish_mdmf()
11580+        def _delete_some_shares(ign):
11581+            shares = self._storage._peers
11582+            for peerid in shares:
11583+                for shnum in list(shares[peerid]):
11584+                    if shnum > 5:
11585+                        del shares[peerid][shnum]
11586+        d.addCallback(_delete_some_shares)
11587+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11588+        def _check(cr):
11589+            self.failIf(cr.is_healthy())
11590+            self.failUnless(cr.is_recoverable())
11591+            return cr
11592+        d.addCallback(_check)
11593+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11594+        def _check1(crr):
11595+            self.failUnlessEqual(crr.get_successful(), True)
11596+        d.addCallback(_check1)
11597+        return d
11598+
11599+
11600     def test_merge(self):
11601         self.old_shares = []
11602         d = self.publish_multiple()
11603hunk ./src/allmydata/test/test_mutable.py 1934
11604 class MultipleEncodings(unittest.TestCase):
11605     def setUp(self):
11606         self.CONTENTS = "New contents go here"
11607+        self.uploadable = MutableData(self.CONTENTS)
11608         self._storage = FakeStorage()
11609         self._nodemaker = make_nodemaker(self._storage, num_peers=20)
11610         self._storage_broker = self._nodemaker.storage_broker
11611hunk ./src/allmydata/test/test_mutable.py 1938
11612-        d = self._nodemaker.create_mutable_file(self.CONTENTS)
11613+        d = self._nodemaker.create_mutable_file(self.uploadable)
11614         def _created(node):
11615             self._fn = node
11616         d.addCallback(_created)
11617hunk ./src/allmydata/test/test_mutable.py 1944
11618         return d
11619 
11620-    def _encode(self, k, n, data):
11621+    def _encode(self, k, n, data, version=SDMF_VERSION):
11622         # encode 'data' into a peerid->shares dict.
11623 
11624         fn = self._fn
11625hunk ./src/allmydata/test/test_mutable.py 1960
11626         # and set the encoding parameters to something completely different
11627         fn2._required_shares = k
11628         fn2._total_shares = n
11629+        # Normally a servermap update would occur before a publish and
11630+        # would set the version; since none happens here, set it ourselves.
11631+        fn2.set_version(version)
11632 
11633         s = self._storage
11634         s._peers = {} # clear existing storage
11635hunk ./src/allmydata/test/test_mutable.py 1967
11636         p2 = Publish(fn2, self._storage_broker, None)
11637-        d = p2.publish(data)
11638+        uploadable = MutableData(data)
11639+        d = p2.publish(uploadable)
11640         def _published(res):
11641             shares = s._peers
11642             s._peers = {}
11643hunk ./src/allmydata/test/test_mutable.py 2235
11644         self.basedir = "mutable/Problems/test_publish_surprise"
11645         self.set_up_grid()
11646         nm = self.g.clients[0].nodemaker
11647-        d = nm.create_mutable_file("contents 1")
11648+        d = nm.create_mutable_file(MutableData("contents 1"))
11649         def _created(n):
11650             d = defer.succeed(None)
11651             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
11652hunk ./src/allmydata/test/test_mutable.py 2245
11653             d.addCallback(_got_smap1)
11654             # then modify the file, leaving the old map untouched
11655             d.addCallback(lambda res: log.msg("starting winning write"))
11656-            d.addCallback(lambda res: n.overwrite("contents 2"))
11657+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11658             # now attempt to modify the file with the old servermap. This
11659             # will look just like an uncoordinated write, in which every
11660             # single share got updated between our mapupdate and our publish
11661hunk ./src/allmydata/test/test_mutable.py 2254
11662                           self.shouldFail(UncoordinatedWriteError,
11663                                           "test_publish_surprise", None,
11664                                           n.upload,
11665-                                          "contents 2a", self.old_map))
11666+                                          MutableData("contents 2a"), self.old_map))
11667             return d
11668         d.addCallback(_created)
11669         return d
11670hunk ./src/allmydata/test/test_mutable.py 2263
11671         self.basedir = "mutable/Problems/test_retrieve_surprise"
11672         self.set_up_grid()
11673         nm = self.g.clients[0].nodemaker
11674-        d = nm.create_mutable_file("contents 1")
11675+        d = nm.create_mutable_file(MutableData("contents 1"))
11676         def _created(n):
11677             d = defer.succeed(None)
11678             d.addCallback(lambda res: n.get_servermap(MODE_READ))
11679hunk ./src/allmydata/test/test_mutable.py 2273
11680             d.addCallback(_got_smap1)
11681             # then modify the file, leaving the old map untouched
11682             d.addCallback(lambda res: log.msg("starting winning write"))
11683-            d.addCallback(lambda res: n.overwrite("contents 2"))
11684+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11685             # now attempt to retrieve the old version with the old servermap.
11686             # This will look like someone has changed the file since we
11687             # updated the servermap.
11688hunk ./src/allmydata/test/test_mutable.py 2282
11689             d.addCallback(lambda res:
11690                           self.shouldFail(NotEnoughSharesError,
11691                                           "test_retrieve_surprise",
11692-                                          "ran out of peers: have 0 shares (k=3)",
11693+                                          "ran out of peers: have 0 of 1",
11694                                           n.download_version,
11695                                           self.old_map,
11696                                           self.old_map.best_recoverable_version(),
11697hunk ./src/allmydata/test/test_mutable.py 2291
11698         d.addCallback(_created)
11699         return d
11700 
11701+
11702     def test_unexpected_shares(self):
11703         # upload the file, take a servermap, shut down one of the servers,
11704         # upload it again (causing shares to appear on a new server), then
11705hunk ./src/allmydata/test/test_mutable.py 2301
11706         self.basedir = "mutable/Problems/test_unexpected_shares"
11707         self.set_up_grid()
11708         nm = self.g.clients[0].nodemaker
11709-        d = nm.create_mutable_file("contents 1")
11710+        d = nm.create_mutable_file(MutableData("contents 1"))
11711         def _created(n):
11712             d = defer.succeed(None)
11713             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
11714hunk ./src/allmydata/test/test_mutable.py 2313
11715                 self.g.remove_server(peer0)
11716                 # then modify the file, leaving the old map untouched
11717                 log.msg("starting winning write")
11718-                return n.overwrite("contents 2")
11719+                return n.overwrite(MutableData("contents 2"))
11720             d.addCallback(_got_smap1)
11721             # now attempt to modify the file with the old servermap. This
11722             # will look just like an uncoordinated write, in which every
11723hunk ./src/allmydata/test/test_mutable.py 2323
11724                           self.shouldFail(UncoordinatedWriteError,
11725                                           "test_surprise", None,
11726                                           n.upload,
11727-                                          "contents 2a", self.old_map))
11728+                                          MutableData("contents 2a"), self.old_map))
11729             return d
11730         d.addCallback(_created)
11731         return d
11732hunk ./src/allmydata/test/test_mutable.py 2327
11733+    test_unexpected_shares.timeout = 15
11734 
11735     def test_bad_server(self):
11736         # Break one server, then create the file: the initial publish should
11737hunk ./src/allmydata/test/test_mutable.py 2361
11738         d.addCallback(_break_peer0)
11739         # now "create" the file, using the pre-established key, and let the
11740         # initial publish finally happen
11741-        d.addCallback(lambda res: nm.create_mutable_file("contents 1"))
11742+        d.addCallback(lambda res: nm.create_mutable_file(MutableData("contents 1")))
11743         # that ought to work
11744         def _got_node(n):
11745             d = n.download_best_version()
11746hunk ./src/allmydata/test/test_mutable.py 2370
11747             def _break_peer1(res):
11748                 self.g.break_server(self.server1.get_serverid())
11749             d.addCallback(_break_peer1)
11750-            d.addCallback(lambda res: n.overwrite("contents 2"))
11751+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11752             # that ought to work too
11753             d.addCallback(lambda res: n.download_best_version())
11754             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
11755hunk ./src/allmydata/test/test_mutable.py 2402
11756         peerids = [s.get_serverid() for s in sb.get_connected_servers()]
11757         self.g.break_server(peerids[0])
11758 
11759-        d = nm.create_mutable_file("contents 1")
11760+        d = nm.create_mutable_file(MutableData("contents 1"))
11761         def _created(n):
11762             d = n.download_best_version()
11763             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
11764hunk ./src/allmydata/test/test_mutable.py 2410
11765             def _break_second_server(res):
11766                 self.g.break_server(peerids[1])
11767             d.addCallback(_break_second_server)
11768-            d.addCallback(lambda res: n.overwrite("contents 2"))
11769+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11770             # that ought to work too
11771             d.addCallback(lambda res: n.download_best_version())
11772             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
11773hunk ./src/allmydata/test/test_mutable.py 2429
11774         d = self.shouldFail(NotEnoughServersError,
11775                             "test_publish_all_servers_bad",
11776                             "Ran out of non-bad servers",
11777-                            nm.create_mutable_file, "contents")
11778+                            nm.create_mutable_file, MutableData("contents"))
11779         return d
11780 
11781     def test_publish_no_servers(self):
11782hunk ./src/allmydata/test/test_mutable.py 2441
11783         d = self.shouldFail(NotEnoughServersError,
11784                             "test_publish_no_servers",
11785                             "Ran out of non-bad servers",
11786-                            nm.create_mutable_file, "contents")
11787+                            nm.create_mutable_file, MutableData("contents"))
11788         return d
11789     test_publish_no_servers.timeout = 30
11790 
11791hunk ./src/allmydata/test/test_mutable.py 2459
11792         # we need some contents that are large enough to push the privkey out
11793         # of the early part of the file
11794         LARGE = "These are Larger contents" * 2000 # about 50KB
11795-        d = nm.create_mutable_file(LARGE)
11796+        LARGE_uploadable = MutableData(LARGE)
11797+        d = nm.create_mutable_file(LARGE_uploadable)
11798         def _created(n):
11799             self.uri = n.get_uri()
11800             self.n2 = nm.create_from_cap(self.uri)
11801hunk ./src/allmydata/test/test_mutable.py 2495
11802         self.basedir = "mutable/Problems/test_privkey_query_missing"
11803         self.set_up_grid(num_servers=20)
11804         nm = self.g.clients[0].nodemaker
11805-        LARGE = "These are Larger contents" * 2000 # about 50KB
11806+        LARGE = "These are Larger contents" * 2000 # about 50KiB
11807+        LARGE_uploadable = MutableData(LARGE)
11808         nm._node_cache = DevNullDictionary() # disable the nodecache
11809 
11810hunk ./src/allmydata/test/test_mutable.py 2499
11811-        d = nm.create_mutable_file(LARGE)
11812+        d = nm.create_mutable_file(LARGE_uploadable)
11813         def _created(n):
11814             self.uri = n.get_uri()
11815             self.n2 = nm.create_from_cap(self.uri)
11816hunk ./src/allmydata/test/test_mutable.py 2509
11817         d.addCallback(_created)
11818         d.addCallback(lambda res: self.n2.get_servermap(MODE_WRITE))
11819         return d
11820+
11821+
11822+    def test_block_and_hash_query_error(self):
11823+        # This tests for what happens when a query to a remote server
11824+        # fails in either the hash validation step or the block getting
11825+        # step (because of batching, this is the same actual query).
11826+        # We need the storage server to keep answering up until the point
11827+        # that its prefix is validated, then suddenly die. This
11828+        # exercises some exception handling code in Retrieve.
11829+        self.basedir = "mutable/Problems/test_block_and_hash_query_error"
11830+        self.set_up_grid(num_servers=20)
11831+        nm = self.g.clients[0].nodemaker
11832+        CONTENTS = "contents" * 2000
11833+        CONTENTS_uploadable = MutableData(CONTENTS)
11834+        d = nm.create_mutable_file(CONTENTS_uploadable)
11835+        def _created(node):
11836+            self._node = node
11837+        d.addCallback(_created)
11838+        d.addCallback(lambda ignored:
11839+            self._node.get_servermap(MODE_READ))
11840+        def _then(servermap):
11841+            # we have our servermap. Now we set up the servers like the
11842+            # tests above -- the first one that gets a read call should
11843+            # start throwing errors, but only after returning its prefix
11844+            # for validation. Since we'll download without fetching the
11845+            # private key, the next query to the remote server will be
11846+            # for either a block and salt or for hashes, either of which
11847+            # will exercise the error handling code.
11848+            killer = FirstServerGetsKilled()
11849+            for (serverid, ss) in nm.storage_broker.get_all_servers():
11850+                ss.post_call_notifier = killer.notify
11851+            ver = servermap.best_recoverable_version()
11852+            assert ver
11853+            return self._node.download_version(servermap, ver)
11854+        d.addCallback(_then)
11855+        d.addCallback(lambda data:
11856+            self.failUnlessEqual(data, CONTENTS))
11857+        return d
11858+
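FirstServerGetsKilled is defined elsewhere in this patch; the mechanism the comment above describes is a post-call hook on the fake storage servers that breaks the first server only after its first (valid) answer has been returned. A hypothetical minimal version of such a hook, for illustration only (the notify signature, the wrapper argument, and the broken attribute are assumptions, not the patch's actual helper):

    # illustrative only: break the first server to answer, but only after
    # its first (valid) response has already been handed back to the caller
    class KillFirstServerAfterReply:
        def __init__(self):
            self.killed = False
        def notify(self, result, wrapper, methname):
            if not self.killed:
                wrapper.broken = True   # assumed flag on the fake server wrapper
                self.killed = True
            return result               # pass the original answer through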
11859+
11860+class FileHandle(unittest.TestCase):
11861+    def setUp(self):
11862+        self.test_data = "Test Data" * 50000
11863+        self.sio = StringIO(self.test_data)
11864+        self.uploadable = MutableFileHandle(self.sio)
11865+
11866+
11867+    def test_filehandle_read(self):
11868+        self.basedir = "mutable/FileHandle/test_filehandle_read"
11869+        chunk_size = 10
11870+        for i in xrange(0, len(self.test_data), chunk_size):
11871+            data = self.uploadable.read(chunk_size)
11872+            data = "".join(data)
11873+            start = i
11874+            end = i + chunk_size
11875+            self.failUnlessEqual(data, self.test_data[start:end])
11876+
11877+
11878+    def test_filehandle_get_size(self):
11879+        self.basedir = "mutable/FileHandle/test_filehandle_get_size"
11880+        actual_size = len(self.test_data)
11881+        size = self.uploadable.get_size()
11882+        self.failUnlessEqual(size, actual_size)
11883+
11884+
11885+    def test_filehandle_get_size_out_of_order(self):
11886+        # We should be able to call get_size whenever we want without
11887+        # disturbing the location of the seek pointer.
11888+        chunk_size = 100
11889+        data = self.uploadable.read(chunk_size)
11890+        self.failUnlessEqual("".join(data), self.test_data[:chunk_size])
11891+
11892+        # Now get the size.
11893+        size = self.uploadable.get_size()
11894+        self.failUnlessEqual(size, len(self.test_data))
11895+
11896+        # Now get more data. We should be right where we left off.
11897+        more_data = self.uploadable.read(chunk_size)
11898+        start = chunk_size
11899+        end = chunk_size * 2
11900+        self.failUnlessEqual("".join(more_data), self.test_data[start:end])
11901+
11902+
11903+    def test_filehandle_file(self):
11904+        # Make sure that the MutableFileHandle works on a file as well
11905+        # as a StringIO object, since in some cases it will be asked to
11906+        # deal with files.
11907+        self.basedir = self.mktemp()
11908+        # mktemp() only returns a path; it does not create the directory.
11909+        os.mkdir(self.basedir)
11910+        f_path = os.path.join(self.basedir, "test_file")
11911+        f = open(f_path, "w")
11912+        f.write(self.test_data)
11913+        f.close()
11914+        f = open(f_path, "r")
11915+
11916+        uploadable = MutableFileHandle(f)
11917+
11918+        data = uploadable.read(len(self.test_data))
11919+        self.failUnlessEqual("".join(data), self.test_data)
11920+        size = uploadable.get_size()
11921+        self.failUnlessEqual(size, len(self.test_data))
11922+
11923+
11924+    def test_close(self):
11925+        # Make sure that the MutableFileHandle closes its handle when
11926+        # told to do so.
11927+        self.uploadable.close()
11928+        self.failUnless(self.sio.closed)
11929+
11930+
11931+class DataHandle(unittest.TestCase):
11932+    def setUp(self):
11933+        self.test_data = "Test Data" * 50000
11934+        self.uploadable = MutableData(self.test_data)
11935+
11936+
11937+    def test_datahandle_read(self):
11938+        chunk_size = 10
11939+        for i in xrange(0, len(self.test_data), chunk_size):
11940+            data = self.uploadable.read(chunk_size)
11941+            data = "".join(data)
11942+            start = i
11943+            end = i + chunk_size
11944+            self.failUnlessEqual(data, self.test_data[start:end])
11945+
11946+
11947+    def test_datahandle_get_size(self):
11948+        actual_size = len(self.test_data)
11949+        size = self.uploadable.get_size()
11950+        self.failUnlessEqual(size, actual_size)
11951+
11952+
11953+    def test_datahandle_get_size_out_of_order(self):
11954+        # We should be able to call get_size whenever we want without
11955+        # disturbing the location of the seek pointer.
11956+        chunk_size = 100
11957+        data = self.uploadable.read(chunk_size)
11958+        self.failUnlessEqual("".join(data), self.test_data[:chunk_size])
11959+
11960+        # Now get the size.
11961+        size = self.uploadable.get_size()
11962+        self.failUnlessEqual(size, len(self.test_data))
11963+
11964+        # Now get more data. We should be right where we left off.
11965+        more_data = self.uploadable.read(chunk_size)
11966+        start = chunk_size
11967+        end = chunk_size * 2
11968+        self.failUnlessEqual("".join(more_data), self.test_data[start:end])
11969+
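Taken together, the FileHandle and DataHandle tests pin down the uploadable interface that MutableFileHandle and MutableData share: read(n) returns a list of byte chunks (callers join them), get_size() must not disturb the read position, and close() closes any underlying handle. A sketch of a consumer of that interface, under those assumptions (read_all is illustrative, not part of the patch):

    # illustrative only: drain an uploadable the way the tests above expect
    def read_all(uploadable, chunk_size=1000):
        remaining = uploadable.get_size()   # safe to call at any time
        pieces = []
        while remaining > 0:
            data = "".join(uploadable.read(min(chunk_size, remaining)))
            if not data:
                break
            pieces.append(data)
            remaining -= len(data)
        return "".join(pieces)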
11970+
11971+class Version(GridTestMixin, unittest.TestCase, testutil.ShouldFailMixin, \
11972+              PublishMixin):
11973+    def setUp(self):
11974+        GridTestMixin.setUp(self)
11975+        self.basedir = self.mktemp()
11976+        self.set_up_grid()
11977+        self.c = self.g.clients[0]
11978+        self.nm = self.c.nodemaker
11979+        self.data = "test data" * 100000 # about 900 KiB; MDMF
11980+        self.small_data = "test data" * 10 # about 90 B; SDMF
11981+        return self.do_upload()
11982+
11983+
11984+    def do_upload(self):
11985+        d1 = self.nm.create_mutable_file(MutableData(self.data),
11986+                                         version=MDMF_VERSION)
11987+        d2 = self.nm.create_mutable_file(MutableData(self.small_data))
11988+        dl = gatherResults([d1, d2])
11989+        def _then((n1, n2)):
11990+            assert isinstance(n1, MutableFileNode)
11991+            assert isinstance(n2, MutableFileNode)
11992+
11993+            self.mdmf_node = n1
11994+            self.sdmf_node = n2
11995+        dl.addCallback(_then)
11996+        return dl
11997+
11998+
11999+    def test_get_readonly_mutable_version(self):
12000+        # Attempting to get a mutable version of a mutable file from a
12001+        # filenode initialized with a readcap should return a readonly
12002+        # version of that same node.
12003+        ro = self.mdmf_node.get_readonly()
12004+        d = ro.get_best_mutable_version()
12005+        d.addCallback(lambda version:
12006+            self.failUnless(version.is_readonly()))
12007+        d.addCallback(lambda ignored:
12008+            self.sdmf_node.get_readonly())
12009+        d.addCallback(lambda version:
12010+            self.failUnless(version.is_readonly()))
12011+        return d
12012+
12013+
12014+    def test_get_sequence_number(self):
12015+        d = self.mdmf_node.get_best_readable_version()
12016+        d.addCallback(lambda bv:
12017+            self.failUnlessEqual(bv.get_sequence_number(), 1))
12018+        d.addCallback(lambda ignored:
12019+            self.sdmf_node.get_best_readable_version())
12020+        d.addCallback(lambda bv:
12021+            self.failUnlessEqual(bv.get_sequence_number(), 1))
12022+        # Now update. The sequence number in both cases should then be
12023+        # 2.
12024+        def _do_update(ignored):
12025+            new_data = MutableData("foo bar baz" * 100000)
12026+            new_small_data = MutableData("foo bar baz" * 10)
12027+            d1 = self.mdmf_node.overwrite(new_data)
12028+            d2 = self.sdmf_node.overwrite(new_small_data)
12029+            dl = gatherResults([d1, d2])
12030+            return dl
12031+        d.addCallback(_do_update)
12032+        d.addCallback(lambda ignored:
12033+            self.mdmf_node.get_best_readable_version())
12034+        d.addCallback(lambda bv:
12035+            self.failUnlessEqual(bv.get_sequence_number(), 2))
12036+        d.addCallback(lambda ignored:
12037+            self.sdmf_node.get_best_readable_version())
12038+        d.addCallback(lambda bv:
12039+            self.failUnlessEqual(bv.get_sequence_number(), 2))
12040+        return d
12041+
12042+
12043+    def test_get_writekey(self):
12044+        d = self.mdmf_node.get_best_mutable_version()
12045+        d.addCallback(lambda bv:
12046+            self.failUnlessEqual(bv.get_writekey(),
12047+                                 self.mdmf_node.get_writekey()))
12048+        d.addCallback(lambda ignored:
12049+            self.sdmf_node.get_best_mutable_version())
12050+        d.addCallback(lambda bv:
12051+            self.failUnlessEqual(bv.get_writekey(),
12052+                                 self.sdmf_node.get_writekey()))
12053+        return d
12054+
12055+
12056+    def test_get_storage_index(self):
12057+        d = self.mdmf_node.get_best_mutable_version()
12058+        d.addCallback(lambda bv:
12059+            self.failUnlessEqual(bv.get_storage_index(),
12060+                                 self.mdmf_node.get_storage_index()))
12061+        d.addCallback(lambda ignored:
12062+            self.sdmf_node.get_best_mutable_version())
12063+        d.addCallback(lambda bv:
12064+            self.failUnlessEqual(bv.get_storage_index(),
12065+                                 self.sdmf_node.get_storage_index()))
12066+        return d
12067+
12068+
12069+    def test_get_readonly_version(self):
12070+        d = self.mdmf_node.get_best_readable_version()
12071+        d.addCallback(lambda bv:
12072+            self.failUnless(bv.is_readonly()))
12073+        d.addCallback(lambda ignored:
12074+            self.sdmf_node.get_best_readable_version())
12075+        d.addCallback(lambda bv:
12076+            self.failUnless(bv.is_readonly()))
12077+        return d
12078+
12079+
12080+    def test_get_mutable_version(self):
12081+        d = self.mdmf_node.get_best_mutable_version()
12082+        d.addCallback(lambda bv:
12083+            self.failIf(bv.is_readonly()))
12084+        d.addCallback(lambda ignored:
12085+            self.sdmf_node.get_best_mutable_version())
12086+        d.addCallback(lambda bv:
12087+            self.failIf(bv.is_readonly()))
12088+        return d
12089+
12090+
12091+    def test_toplevel_overwrite(self):
12092+        new_data = MutableData("foo bar baz" * 100000)
12093+        new_small_data = MutableData("foo bar baz" * 10)
12094+        d = self.mdmf_node.overwrite(new_data)
12095+        d.addCallback(lambda ignored:
12096+            self.mdmf_node.download_best_version())
12097+        d.addCallback(lambda data:
12098+            self.failUnlessEqual(data, "foo bar baz" * 100000))
12099+        d.addCallback(lambda ignored:
12100+            self.sdmf_node.overwrite(new_small_data))
12101+        d.addCallback(lambda ignored:
12102+            self.sdmf_node.download_best_version())
12103+        d.addCallback(lambda data:
12104+            self.failUnlessEqual(data, "foo bar baz" * 10))
12105+        return d
12106+
12107+
12108+    def test_toplevel_modify(self):
12109+        def modifier(old_contents, servermap, first_time):
12110+            return old_contents + "modified"
12111+        d = self.mdmf_node.modify(modifier)
12112+        d.addCallback(lambda ignored:
12113+            self.mdmf_node.download_best_version())
12114+        d.addCallback(lambda data:
12115+            self.failUnlessIn("modified", data))
12116+        d.addCallback(lambda ignored:
12117+            self.sdmf_node.modify(modifier))
12118+        d.addCallback(lambda ignored:
12119+            self.sdmf_node.download_best_version())
12120+        d.addCallback(lambda data:
12121+            self.failUnlessIn("modified", data))
12122+        return d
12123+
12124+
12125+    def test_version_modify(self):
12126+        # TODO: When we can publish multiple versions, alter this test
12127+        # to modify a version other than the best usable version, then
12128+        # check that the modified version becomes the best recoverable one.
12129+        def modifier(old_contents, servermap, first_time):
12130+            return old_contents + "modified"
12131+        d = self.mdmf_node.modify(modifier)
12132+        d.addCallback(lambda ignored:
12133+            self.mdmf_node.download_best_version())
12134+        d.addCallback(lambda data:
12135+            self.failUnlessIn("modified", data))
12136+        d.addCallback(lambda ignored:
12137+            self.sdmf_node.modify(modifier))
12138+        d.addCallback(lambda ignored:
12139+            self.sdmf_node.download_best_version())
12140+        d.addCallback(lambda data:
12141+            self.failUnlessIn("modified", data))
12142+        return d
12143+
12144+
12145+    def test_download_version(self):
12146+        d = self.publish_multiple()
12147+        # We want to have two recoverable versions on the grid.
12148+        d.addCallback(lambda res:
12149+                      self._set_versions({0:0,2:0,4:0,6:0,8:0,
12150+                                          1:1,3:1,5:1,7:1,9:1}))
12151+        # Now try to download each version. We should get the plaintext
12152+        # associated with that version.
12153+        d.addCallback(lambda ignored:
12154+            self._fn.get_servermap(mode=MODE_READ))
12155+        def _got_servermap(smap):
12156+            versions = smap.recoverable_versions()
12157+            assert len(versions) == 2
12158+
12159+            self.servermap = smap
12160+            self.version1, self.version2 = versions
12161+            assert self.version1 != self.version2
12162+
12163+            self.version1_seqnum = self.version1[0]
12164+            self.version2_seqnum = self.version2[0]
12165+            self.version1_index = self.version1_seqnum - 1
12166+            self.version2_index = self.version2_seqnum - 1
12167+
12168+        d.addCallback(_got_servermap)
12169+        d.addCallback(lambda ignored:
12170+            self._fn.download_version(self.servermap, self.version1))
12171+        d.addCallback(lambda results:
12172+            self.failUnlessEqual(self.CONTENTS[self.version1_index],
12173+                                 results))
12174+        d.addCallback(lambda ignored:
12175+            self._fn.download_version(self.servermap, self.version2))
12176+        d.addCallback(lambda results:
12177+            self.failUnlessEqual(self.CONTENTS[self.version2_index],
12178+                                 results))
12179+        return d
12180+
12181+
12182+    def test_download_nonexistent_version(self):
12183+        d = self.mdmf_node.get_servermap(mode=MODE_WRITE)
12184+        def _set_servermap(servermap):
12185+            self.servermap = servermap
12186+        d.addCallback(_set_servermap)
12187+        d.addCallback(lambda ignored:
12188+           self.shouldFail(UnrecoverableFileError, "nonexistent version",
12189+                           None,
12190+                           self.mdmf_node.download_version, self.servermap,
12191+                           "not a version"))
12192+        return d
12193+
12194+
12195+    def test_partial_read(self):
12196+        # Read the file a little at a time (10000 bytes per read) and
12197+        # check that the reassembled result matches the original data.
12198+        d = self.mdmf_node.get_best_readable_version()
12199+        def _read_data(version):
12200+            c = consumer.MemoryConsumer()
12201+            d2 = defer.succeed(None)
12202+            for i in xrange(0, len(self.data), 10000):
12203+                d2.addCallback(lambda ignored, i=i: version.read(c, i, 10000))
12204+            d2.addCallback(lambda ignored:
12205+                self.failUnlessEqual(self.data, "".join(c.chunks)))
12206+            return d2
12207+        d.addCallback(_read_data)
12208+        return d
12209+
12210+
12211+    def test_read(self):
12212+        d = self.mdmf_node.get_best_readable_version()
12213+        def _read_data(version):
12214+            c = consumer.MemoryConsumer()
12215+            d2 = defer.succeed(None)
12216+            d2.addCallback(lambda ignored: version.read(c))
12217+            d2.addCallback(lambda ignored:
12218+                self.failUnlessEqual("".join(c.chunks), self.data))
12219+            return d2
12220+        d.addCallback(_read_data)
12221+        return d
12222+
12223+
12224+    def test_download_best_version(self):
12225+        d = self.mdmf_node.download_best_version()
12226+        d.addCallback(lambda data:
12227+            self.failUnlessEqual(data, self.data))
12228+        d.addCallback(lambda ignored:
12229+            self.sdmf_node.download_best_version())
12230+        d.addCallback(lambda data:
12231+            self.failUnlessEqual(data, self.small_data))
12232+        return d
12233+
12234+
12235+class Update(GridTestMixin, unittest.TestCase, testutil.ShouldFailMixin):
12236+    def setUp(self):
12237+        GridTestMixin.setUp(self)
12238+        self.basedir = self.mktemp()
12239+        self.set_up_grid()
12240+        self.c = self.g.clients[0]
12241+        self.nm = self.c.nodemaker
12242+        self.data = "test data" * 100000 # about 900 KiB; MDMF
12243+        self.small_data = "test data" * 10 # about 90 B; SDMF
12244+        return self.do_upload()
12245+
12246+
12247+    def do_upload(self):
12248+        d1 = self.nm.create_mutable_file(MutableData(self.data),
12249+                                         version=MDMF_VERSION)
12250+        d2 = self.nm.create_mutable_file(MutableData(self.small_data))
12251+        dl = gatherResults([d1, d2])
12252+        def _then((n1, n2)):
12253+            assert isinstance(n1, MutableFileNode)
12254+            assert isinstance(n2, MutableFileNode)
12255+
12256+            self.mdmf_node = n1
12257+            self.sdmf_node = n2
12258+        dl.addCallback(_then)
12259+        return dl
12260+
12261+
12262+    def test_append(self):
12263+        # We should be able to append data to the end of a mutable
12264+        # file and get back what we expect.
12265+        new_data = self.data + "appended"
12266+        d = self.mdmf_node.get_best_mutable_version()
12267+        d.addCallback(lambda mv:
12268+            mv.update(MutableData("appended"), len(self.data)))
12269+        d.addCallback(lambda ignored:
12270+            self.mdmf_node.download_best_version())
12271+        d.addCallback(lambda results:
12272+            self.failUnlessEqual(results, new_data))
12273+        return d
12274+    test_append.timeout = 15
12275+
12276+
12277+    def test_replace(self):
12278+        # We should be able to replace data in the middle of a mutable
12279+        # file and get what we expect back.
12280+        new_data = self.data[:100]
12281+        new_data += "appended"
12282+        new_data += self.data[108:]
12283+        d = self.mdmf_node.get_best_mutable_version()
12284+        d.addCallback(lambda mv:
12285+            mv.update(MutableData("appended"), 100))
12286+        d.addCallback(lambda ignored:
12287+            self.mdmf_node.download_best_version())
12288+        d.addCallback(lambda results:
12289+            self.failUnlessEqual(results, new_data))
12290+        return d
12291+
12292+
12293+    def test_replace_and_extend(self):
12294+        # We should be able to replace data in the middle of a mutable
12295+        # file, extend that file, and get back what we expect.
12296+        new_data = self.data[:100]
12297+        new_data += "modified " * 100000
12298+        d = self.mdmf_node.get_best_mutable_version()
12299+        d.addCallback(lambda mv:
12300+            mv.update(MutableData("modified " * 100000), 100))
12301+        d.addCallback(lambda ignored:
12302+            self.mdmf_node.download_best_version())
12303+        d.addCallback(lambda results:
12304+            self.failUnlessEqual(results, new_data))
12305+        return d
12306+
12307+
12308+    def test_append_power_of_two(self):
12309+        # If we attempt to extend a mutable file so that its segment
12310+        # count crosses a power-of-two boundary, the update operation
12311+        # should know how to reencode the file.
12312+
12313+        # Note that the data populating self.mdmf_node is about 900 KiB
12314+        # long -- that is 7 segments at the default segment size. So we
12315+        # need to add 2 segments' worth of data to push the segment count
12316+        # over a power-of-two boundary (from 7 to 9 segments, crossing 8).
12317+        segment = "a" * DEFAULT_MAX_SEGMENT_SIZE
12318+        new_data = self.data + (segment * 2)
12319+        d = self.mdmf_node.get_best_mutable_version()
12320+        d.addCallback(lambda mv:
12321+            mv.update(MutableData(segment * 2), len(self.data)))
12322+        d.addCallback(lambda ignored:
12323+            self.mdmf_node.download_best_version())
12324+        d.addCallback(lambda results:
12325+            self.failUnlessEqual(results, new_data))
12326+        return d
12327+    test_append_power_of_two.timeout = 15
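# A minimal sketch of the segment arithmetic behind test_append_power_of_two,
# assuming the default maximum segment size is 128 KiB (the value behind the
# DEFAULT_MAX_SEGMENT_SIZE constant used by these tests):
import math
assumed_max_segment_size = 128 * 1024           # 131072 bytes (assumption)
data_len = len("test data" * 100000)            # 900000 bytes, as in Update.setUp
segs_before = int(math.ceil(data_len / float(assumed_max_segment_size)))
# 900000 / 131072 rounds up to 7 segments.
segs_after = int(math.ceil((data_len + 2 * assumed_max_segment_size)
                           / float(assumed_max_segment_size)))
# Adding two full segments gives 9 segments, which crosses the 8-segment
# (power-of-two) boundary that the test wants the update code to handle.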
12328+
12329+
12330+    def test_update_sdmf(self):
12331+        # Running update on a single-segment file should still work.
12332+        new_data = self.small_data + "appended"
12333+        d = self.sdmf_node.get_best_mutable_version()
12334+        d.addCallback(lambda mv:
12335+            mv.update(MutableData("appended"), len(self.small_data)))
12336+        d.addCallback(lambda ignored:
12337+            self.sdmf_node.download_best_version())
12338+        d.addCallback(lambda results:
12339+            self.failUnlessEqual(results, new_data))
12340+        return d
12341+
12342+    def test_replace_in_last_segment(self):
12343+        # The wrapper should know how to handle the tail segment
12344+        # appropriately.
12345+        replace_offset = len(self.data) - 100
12346+        new_data = self.data[:replace_offset] + "replaced"
12347+        rest_offset = replace_offset + len("replaced")
12348+        new_data += self.data[rest_offset:]
12349+        d = self.mdmf_node.get_best_mutable_version()
12350+        d.addCallback(lambda mv:
12351+            mv.update(MutableData("replaced"), replace_offset))
12352+        d.addCallback(lambda ignored:
12353+            self.mdmf_node.download_best_version())
12354+        d.addCallback(lambda results:
12355+            self.failUnlessEqual(results, new_data))
12356+        return d
12357+
12358+
12359+    def test_multiple_segment_replace(self):
12360+        replace_offset = 2 * DEFAULT_MAX_SEGMENT_SIZE
12361+        new_data = self.data[:replace_offset]
12362+        new_segment = "a" * DEFAULT_MAX_SEGMENT_SIZE
12363+        new_data += 2 * new_segment
12364+        new_data += "replaced"
12365+        rest_offset = len(new_data)
12366+        new_data += self.data[rest_offset:]
12367+        d = self.mdmf_node.get_best_mutable_version()
12368+        d.addCallback(lambda mv:
12369+            mv.update(MutableData((2 * new_segment) + "replaced"),
12370+                      replace_offset))
12371+        d.addCallback(lambda ignored:
12372+            self.mdmf_node.download_best_version())
12373+        d.addCallback(lambda results:
12374+            self.failUnlessEqual(results, new_data))
12375+        return d
12376hunk ./src/allmydata/test/test_sftp.py 32
12377 
12378 from allmydata.util.consumer import download_to_data
12379 from allmydata.immutable import upload
12380+from allmydata.mutable import publish
12381 from allmydata.test.no_network import GridTestMixin
12382 from allmydata.test.common import ShouldFailMixin
12383 from allmydata.test.common_util import ReallyEqualMixin
12384hunk ./src/allmydata/test/test_sftp.py 84
12385         return d
12386 
12387     def _set_up_tree(self):
12388-        d = self.client.create_mutable_file("mutable file contents")
12389+        u = publish.MutableData("mutable file contents")
12390+        d = self.client.create_mutable_file(u)
12391         d.addCallback(lambda node: self.root.set_node(u"mutable", node))
12392         def _created_mutable(n):
12393             self.mutable = n
12394hunk ./src/allmydata/test/test_sftp.py 1334
12395         d.addCallback(lambda ign: self.failUnlessEqual(sftpd.all_heisenfiles, {}))
12396         d.addCallback(lambda ign: self.failUnlessEqual(self.handler._heisenfiles, {}))
12397         return d
12398+    test_makeDirectory.timeout = 15
12399 
12400     def test_execCommand_and_openShell(self):
12401         class FakeProtocol:
12402hunk ./src/allmydata/test/test_storage.py 27
12403                                      LayoutInvalid, MDMFSIGNABLEHEADER, \
12404                                      SIGNED_PREFIX, MDMFHEADER, \
12405                                      MDMFOFFSETS, SDMFSlotWriteProxy
12406-from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \
12407-                                 SDMF_VERSION
12408+from allmydata.interfaces import BadWriteEnablerError
12409 from allmydata.test.common import LoggingServiceParent, ShouldFailMixin
12410 from allmydata.test.common_web import WebRenderingMixin
12411 from allmydata.web.storage import StorageStatus, remove_prefix
12412hunk ./src/allmydata/test/test_system.py 26
12413 from allmydata.monitor import Monitor
12414 from allmydata.mutable.common import NotWriteableError
12415 from allmydata.mutable import layout as mutable_layout
12416+from allmydata.mutable.publish import MutableData
12417 from foolscap.api import DeadReferenceError
12418 from twisted.python.failure import Failure
12419 from twisted.web.client import getPage
12420hunk ./src/allmydata/test/test_system.py 467
12421     def test_mutable(self):
12422         self.basedir = "system/SystemTest/test_mutable"
12423         DATA = "initial contents go here."  # 25 bytes % 3 != 0
12424+        DATA_uploadable = MutableData(DATA)
12425         NEWDATA = "new contents yay"
12426hunk ./src/allmydata/test/test_system.py 469
12427+        NEWDATA_uploadable = MutableData(NEWDATA)
12428         NEWERDATA = "this is getting old"
12429hunk ./src/allmydata/test/test_system.py 471
12430+        NEWERDATA_uploadable = MutableData(NEWERDATA)
12431 
12432         d = self.set_up_nodes(use_key_generator=True)
12433 
12434hunk ./src/allmydata/test/test_system.py 478
12435         def _create_mutable(res):
12436             c = self.clients[0]
12437             log.msg("starting create_mutable_file")
12438-            d1 = c.create_mutable_file(DATA)
12439+            d1 = c.create_mutable_file(DATA_uploadable)
12440             def _done(res):
12441                 log.msg("DONE: %s" % (res,))
12442                 self._mutable_node_1 = res
12443hunk ./src/allmydata/test/test_system.py 565
12444             self.failUnlessEqual(res, DATA)
12445             # replace the data
12446             log.msg("starting replace1")
12447-            d1 = newnode.overwrite(NEWDATA)
12448+            d1 = newnode.overwrite(NEWDATA_uploadable)
12449             d1.addCallback(lambda res: newnode.download_best_version())
12450             return d1
12451         d.addCallback(_check_download_3)
12452hunk ./src/allmydata/test/test_system.py 579
12453             newnode2 = self.clients[3].create_node_from_uri(uri)
12454             self._newnode3 = self.clients[3].create_node_from_uri(uri)
12455             log.msg("starting replace2")
12456-            d1 = newnode1.overwrite(NEWERDATA)
12457+            d1 = newnode1.overwrite(NEWERDATA_uploadable)
12458             d1.addCallback(lambda res: newnode2.download_best_version())
12459             return d1
12460         d.addCallback(_check_download_4)
12461hunk ./src/allmydata/test/test_system.py 649
12462         def _check_empty_file(res):
12463             # make sure we can create empty files, this usually screws up the
12464             # segsize math
12465-            d1 = self.clients[2].create_mutable_file("")
12466+            d1 = self.clients[2].create_mutable_file(MutableData(""))
12467             d1.addCallback(lambda newnode: newnode.download_best_version())
12468             d1.addCallback(lambda res: self.failUnlessEqual("", res))
12469             return d1
12470hunk ./src/allmydata/test/test_system.py 680
12471                                  self.key_generator_svc.key_generator.pool_size + size_delta)
12472 
12473         d.addCallback(check_kg_poolsize, 0)
12474-        d.addCallback(lambda junk: self.clients[3].create_mutable_file('hello, world'))
12475+        d.addCallback(lambda junk:
12476+            self.clients[3].create_mutable_file(MutableData('hello, world')))
12477         d.addCallback(check_kg_poolsize, -1)
12478         d.addCallback(lambda junk: self.clients[3].create_dirnode())
12479         d.addCallback(check_kg_poolsize, -2)
12480hunk ./src/allmydata/test/test_web.py 28
12481 from allmydata.util.encodingutil import to_str
12482 from allmydata.test.common import FakeCHKFileNode, FakeMutableFileNode, \
12483      create_chk_filenode, WebErrorMixin, ShouldFailMixin, make_mutable_file_uri
12484-from allmydata.interfaces import IMutableFileNode
12485+from allmydata.interfaces import IMutableFileNode, SDMF_VERSION, MDMF_VERSION
12486 from allmydata.mutable import servermap, publish, retrieve
12487 import allmydata.test.common_util as testutil
12488 from allmydata.test.no_network import GridTestMixin
12489hunk ./src/allmydata/test/test_web.py 57
12490         return FakeCHKFileNode(cap)
12491     def _create_mutable(self, cap):
12492         return FakeMutableFileNode(None, None, None, None).init_from_cap(cap)
12493-    def create_mutable_file(self, contents="", keysize=None):
12494+    def create_mutable_file(self, contents="", keysize=None,
12495+                            version=SDMF_VERSION):
12496         n = FakeMutableFileNode(None, None, None, None)
12497hunk ./src/allmydata/test/test_web.py 60
12498+        n.set_version(version)
12499         return n.create(contents)
12500 
12501 class FakeUploader(service.Service):
12502hunk ./src/allmydata/test/test_web.py 157
12503         self.nodemaker = FakeNodeMaker(None, self._secret_holder, None,
12504                                        self.uploader, None,
12505                                        None, None)
12506+        self.mutable_file_default = SDMF_VERSION
12507 
12508     def startService(self):
12509         return service.MultiService.startService(self)
12510hunk ./src/allmydata/test/test_web.py 762
12511                              self.PUT, base + "/@@name=/blah.txt", "")
12512         return d
12513 
12514+
12515     def test_GET_DIRURL_named_bad(self):
12516         base = "/file/%s" % urllib.quote(self._foo_uri)
12517         d = self.shouldFail2(error.Error, "test_PUT_DIRURL_named_bad",
12518hunk ./src/allmydata/test/test_web.py 878
12519                                                       self.NEWFILE_CONTENTS))
12520         return d
12521 
12522+    def test_PUT_NEWFILEURL_unlinked_mdmf(self):
12523+        # this should get us a few segments of an MDMF mutable file,
12524+        # whose format we can then verify via its JSON representation.
12525+        contents = self.NEWFILE_CONTENTS * 300000
12526+        d = self.PUT("/uri?mutable=true&mutable-type=mdmf",
12527+                     contents)
12528+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12529+        d.addCallback(lambda json: self.failUnlessIn("mdmf", json))
12530+        return d
12531+
12532+    def test_PUT_NEWFILEURL_unlinked_sdmf(self):
12533+        contents = self.NEWFILE_CONTENTS * 300000
12534+        d = self.PUT("/uri?mutable=true&mutable-type=sdmf",
12535+                     contents)
12536+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12537+        d.addCallback(lambda json: self.failUnlessIn("sdmf", json))
12538+        return d
12539+
12540     def test_PUT_NEWFILEURL_range_bad(self):
12541         headers = {"content-range": "bytes 1-10/%d" % len(self.NEWFILE_CONTENTS)}
12542         target = self.public_url + "/foo/new.txt"
12543hunk ./src/allmydata/test/test_web.py 928
12544         return d
12545 
12546     def test_PUT_NEWFILEURL_mutable_toobig(self):
12547-        d = self.shouldFail2(error.Error, "test_PUT_NEWFILEURL_mutable_toobig",
12548-                             "413 Request Entity Too Large",
12549-                             "SDMF is limited to one segment, and 10001 > 10000",
12550-                             self.PUT,
12551-                             self.public_url + "/foo/new.txt?mutable=true",
12552-                             "b" * (self.s.MUTABLE_SIZELIMIT+1))
12553+        # Large mutable files are allowed now that the SDMF size limit
12554+        # is gone, so this upload should succeed.
12555+        d = self.PUT(self.public_url + "/foo/new.txt?mutable=true",
12556+                     "b" * (self.s.MUTABLE_SIZELIMIT + 1))
12557         return d
12558 
12559     def test_PUT_NEWFILEURL_replace(self):
12560hunk ./src/allmydata/test/test_web.py 1026
12561         d.addCallback(_check1)
12562         return d
12563 
12564+    def test_GET_FILEURL_json_mutable_type(self):
12565+        # The JSON should include mutable-type, which says whether the
12566+        # file is SDMF or MDMF
12567+        d = self.PUT("/uri?mutable=true&mutable-type=mdmf",
12568+                     self.NEWFILE_CONTENTS * 300000)
12569+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12570+        def _got_json(json, version):
12571+            data = simplejson.loads(json)
12572+            assert "filenode" == data[0]
12573+            data = data[1]
12574+            assert isinstance(data, dict)
12575+
12576+            self.failUnlessIn("mutable-type", data)
12577+            self.failUnlessEqual(data['mutable-type'], version)
12578+
12579+        d.addCallback(_got_json, "mdmf")
12580+        # Now make an SDMF file and check that it is reported correctly.
12581+        d.addCallback(lambda ignored:
12582+            self.PUT("/uri?mutable=true&mutable-type=sdmf",
12583+                      self.NEWFILE_CONTENTS * 300000))
12584+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12585+        d.addCallback(_got_json, "sdmf")
12586+        return d
12587+
12588     def test_GET_FILEURL_json_missing(self):
12589         d = self.GET(self.public_url + "/foo/missing?json")
12590         d.addBoth(self.should404, "test_GET_FILEURL_json_missing")
12591hunk ./src/allmydata/test/test_web.py 1088
12592         d.addBoth(self.should404, "test_GET_FILEURL_uri_missing")
12593         return d
12594 
12595-    def test_GET_DIRECTORY_html_banner(self):
12596+    def test_GET_DIRECTORY_html(self):
12597         d = self.GET(self.public_url + "/foo", followRedirect=True)
12598         def _check(res):
12599             self.failUnlessIn('<div class="toolbar-item"><a href="../../..">Return to Welcome page</a></div>',res)
12600hunk ./src/allmydata/test/test_web.py 1092
12601+            self.failUnlessIn("mutable-type-mdmf", res)
12602+            self.failUnlessIn("mutable-type-sdmf", res)
12603         d.addCallback(_check)
12604         return d
12605 
12606hunk ./src/allmydata/test/test_web.py 1097
12607+    def test_GET_root_html(self):
12608+        # make sure that we have the option to upload an unlinked
12609+        # mutable file in SDMF and MDMF formats.
12610+        d = self.GET("/")
12611+        def _got_html(html):
12612+            # These are radio buttons that allow the user to toggle
12613+            # whether a particular mutable file is MDMF or SDMF.
12614+            self.failUnlessIn("mutable-type-mdmf", html)
12615+            self.failUnlessIn("mutable-type-sdmf", html)
12616+        d.addCallback(_got_html)
12617+        return d
12618+
12619+    def test_mutable_type_defaults(self):
12620+        # The checked="checked" attribute of the inputs corresponding to
12621+        # the mutable-type parameter should change as expected with the
12622+        # value configured in tahoe.cfg.
12623+        #
12624+        # By default, the value configured with the client is
12625+        # SDMF_VERSION, so that should be checked.
12626+        assert self.s.mutable_file_default == SDMF_VERSION
12627+
12628+        d = self.GET("/")
12629+        def _got_html(html, value):
12630+            i = 'input checked="checked" type="radio" id="mutable-type-%s"'
12631+            self.failUnlessIn(i % value, html)
12632+        d.addCallback(_got_html, "sdmf")
12633+        d.addCallback(lambda ignored:
12634+            self.GET(self.public_url + "/foo", followRedirect=True))
12635+        d.addCallback(_got_html, "sdmf")
12636+        # Now switch the configuration value to MDMF. The MDMF radio
12637+        # buttons should now be checked on these pages.
12638+        def _swap_values(ignored):
12639+            self.s.mutable_file_default = MDMF_VERSION
12640+        d.addCallback(_swap_values)
12641+        d.addCallback(lambda ignored: self.GET("/"))
12642+        d.addCallback(_got_html, "mdmf")
12643+        d.addCallback(lambda ignored:
12644+            self.GET(self.public_url + "/foo", followRedirect=True))
12645+        d.addCallback(_got_html, "mdmf")
12646+        return d
12647+
12648     def test_GET_DIRURL(self):
12649         # the addSlash means we get a redirect here
12650         # from /uri/$URI/foo/ , we need ../../../ to get back to the root
12651hunk ./src/allmydata/test/test_web.py 1227
12652         d.addCallback(self.failUnlessIsFooJSON)
12653         return d
12654 
12655+    def test_GET_DIRURL_json_mutable_type(self):
12656+        d = self.PUT(self.public_url + \
12657+                     "/foo/sdmf.txt?mutable=true&mutable-type=sdmf",
12658+                     self.NEWFILE_CONTENTS * 300000)
12659+        d.addCallback(lambda ignored:
12660+            self.PUT(self.public_url + \
12661+                     "/foo/mdmf.txt?mutable=true&mutable-type=mdmf",
12662+                     self.NEWFILE_CONTENTS * 300000))
12663+        # Now we have an MDMF and SDMF file in the directory. If we GET
12664+        # its JSON, we should see their encodings.
12665+        d.addCallback(lambda ignored:
12666+            self.GET(self.public_url + "/foo?t=json"))
12667+        def _got_json(json):
12668+            data = simplejson.loads(json)
12669+            assert data[0] == "dirnode"
12670+
12671+            data = data[1]
12672+            kids = data['children']
12673+
12674+            mdmf_data = kids['mdmf.txt'][1]
12675+            self.failUnlessIn("mutable-type", mdmf_data)
12676+            self.failUnlessEqual(mdmf_data['mutable-type'], "mdmf")
12677+
12678+            sdmf_data = kids['sdmf.txt'][1]
12679+            self.failUnlessIn("mutable-type", sdmf_data)
12680+            self.failUnlessEqual(sdmf_data['mutable-type'], "sdmf")
12681+        d.addCallback(_got_json)
12682+        return d
12683+
12684 
12685     def test_POST_DIRURL_manifest_no_ophandle(self):
12686         d = self.shouldFail2(error.Error,
12687hunk ./src/allmydata/test/test_web.py 1810
12688         return d
12689 
12690     def test_POST_upload_no_link_mutable_toobig(self):
12691-        d = self.shouldFail2(error.Error,
12692-                             "test_POST_upload_no_link_mutable_toobig",
12693-                             "413 Request Entity Too Large",
12694-                             "SDMF is limited to one segment, and 10001 > 10000",
12695-                             self.POST,
12696-                             "/uri", t="upload", mutable="true",
12697-                             file=("new.txt",
12698-                                   "b" * (self.s.MUTABLE_SIZELIMIT+1)) )
12699+        # The SDMF size limit is no longer in place, so we should be
12700+        # able to upload mutable files that are as large as we want them
12701+        # to be.
12702+        d = self.POST("/uri", t="upload", mutable="true",
12703+                      file=("new.txt", "b" * (self.s.MUTABLE_SIZELIMIT + 1)))
12704         return d
12705 
12706hunk ./src/allmydata/test/test_web.py 1817
12707+
12708+    def test_POST_upload_mutable_type_unlinked(self):
12709+        d = self.POST("/uri?t=upload&mutable=true&mutable-type=sdmf",
12710+                      file=("sdmf.txt", self.NEWFILE_CONTENTS * 300000))
12711+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12712+        def _got_json(json, version):
12713+            data = simplejson.loads(json)
12714+            data = data[1]
12715+
12716+            self.failUnlessIn("mutable-type", data)
12717+            self.failUnlessEqual(data['mutable-type'], version)
12718+        d.addCallback(_got_json, "sdmf")
12719+        d.addCallback(lambda ignored:
12720+            self.POST("/uri?t=upload&mutable=true&mutable-type=mdmf",
12721+                      file=('mdmf.txt', self.NEWFILE_CONTENTS * 300000)))
12722+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12723+        d.addCallback(_got_json, "mdmf")
12724+        return d
12725+
12726+    def test_POST_upload_mutable_type(self):
12727+        d = self.POST(self.public_url + \
12728+                      "/foo?t=upload&mutable=true&mutable-type=sdmf",
12729+                      file=("sdmf.txt", self.NEWFILE_CONTENTS * 300000))
12730+        fn = self._foo_node
12731+        def _got_cap(filecap, filename):
12732+            filenameu = unicode(filename)
12733+            self.failUnlessURIMatchesRWChild(filecap, fn, filenameu)
12734+            return self.GET(self.public_url + "/foo/%s?t=json" % filename)
12735+        d.addCallback(_got_cap, "sdmf.txt")
12736+        def _got_json(json, version):
12737+            data = simplejson.loads(json)
12738+            data = data[1]
12739+
12740+            self.failUnlessIn("mutable-type", data)
12741+            self.failUnlessEqual(data['mutable-type'], version)
12742+        d.addCallback(_got_json, "sdmf")
12743+        d.addCallback(lambda ignored:
12744+            self.POST(self.public_url + \
12745+                      "/foo?t=upload&mutable=true&mutable-type=mdmf",
12746+                      file=("mdmf.txt", self.NEWFILE_CONTENTS * 300000)))
12747+        d.addCallback(_got_cap, "mdmf.txt")
12748+        d.addCallback(_got_json, "mdmf")
12749+        return d
12750+
12751     def test_POST_upload_mutable(self):
12752         # this creates a mutable file
12753         d = self.POST(self.public_url + "/foo", t="upload", mutable="true",
12754hunk ./src/allmydata/test/test_web.py 1985
12755             self.failUnlessReallyEqual(headers["content-type"], ["text/plain"])
12756         d.addCallback(_got_headers)
12757 
12758-        # make sure that size errors are displayed correctly for overwrite
12759-        d.addCallback(lambda res:
12760-                      self.shouldFail2(error.Error,
12761-                                       "test_POST_upload_mutable-toobig",
12762-                                       "413 Request Entity Too Large",
12763-                                       "SDMF is limited to one segment, and 10001 > 10000",
12764-                                       self.POST,
12765-                                       self.public_url + "/foo", t="upload",
12766-                                       mutable="true",
12767-                                       file=("new.txt",
12768-                                             "b" * (self.s.MUTABLE_SIZELIMIT+1)),
12769-                                       ))
12770-
12771+        # make sure that outdated size limits aren't enforced anymore.
12772+        d.addCallback(lambda ignored:
12773+            self.POST(self.public_url + "/foo", t="upload",
12774+                      mutable="true",
12775+                      file=("new.txt",
12776+                            "b" * (self.s.MUTABLE_SIZELIMIT+1))))
12777         d.addErrback(self.dump_error)
12778         return d
12779 
12780hunk ./src/allmydata/test/test_web.py 1995
12781     def test_POST_upload_mutable_toobig(self):
12782-        d = self.shouldFail2(error.Error,
12783-                             "test_POST_upload_mutable_toobig",
12784-                             "413 Request Entity Too Large",
12785-                             "SDMF is limited to one segment, and 10001 > 10000",
12786-                             self.POST,
12787-                             self.public_url + "/foo",
12788-                             t="upload", mutable="true",
12789-                             file=("new.txt",
12790-                                   "b" * (self.s.MUTABLE_SIZELIMIT+1)) )
12791+        # SDMF had a size limit that was removed a while ago. MDMF has
12792+        # never had a size limit. Test to make sure that we do not
12793+        # encounter errors when trying to upload large mutable files,
12794+        # since there should no longer be any code enforcing a size
12795+        # limit on mutable files.
12796+        d = self.POST(self.public_url + "/foo",
12797+                      t="upload", mutable="true",
12798+                      file=("new.txt", "b" * (self.s.MUTABLE_SIZELIMIT + 1)))
12799         return d
12800 
12801     def dump_error(self, f):
12802hunk ./src/allmydata/test/test_web.py 3005
12803                                                       contents))
12804         return d
12805 
12806+    def test_PUT_NEWFILEURL_mdmf(self):
12807+        new_contents = self.NEWFILE_CONTENTS * 300000
12808+        d = self.PUT(self.public_url + \
12809+                     "/foo/mdmf.txt?mutable=true&mutable-type=mdmf",
12810+                     new_contents)
12811+        d.addCallback(lambda ignored:
12812+            self.GET(self.public_url + "/foo/mdmf.txt?t=json"))
12813+        def _got_json(json):
12814+            data = simplejson.loads(json)
12815+            data = data[1]
12816+            self.failUnlessIn("mutable-type", data)
12817+            self.failUnlessEqual(data['mutable-type'], "mdmf")
12818+        d.addCallback(_got_json)
12819+        return d
12820+
12821+    def test_PUT_NEWFILEURL_sdmf(self):
12822+        new_contents = self.NEWFILE_CONTENTS * 300000
12823+        d = self.PUT(self.public_url + \
12824+                     "/foo/sdmf.txt?mutable=true&mutable-type=sdmf",
12825+                     new_contents)
12826+        d.addCallback(lambda ignored:
12827+            self.GET(self.public_url + "/foo/sdmf.txt?t=json"))
12828+        def _got_json(json):
12829+            data = simplejson.loads(json)
12830+            data = data[1]
12831+            self.failUnlessIn("mutable-type", data)
12832+            self.failUnlessEqual(data['mutable-type'], "sdmf")
12833+        d.addCallback(_got_json)
12834+        return d
12835+
12836     def test_PUT_NEWFILEURL_uri_replace(self):
12837         contents, n, new_uri = self.makefile(8)
12838         d = self.PUT(self.public_url + "/foo/bar.txt?t=uri", new_uri)
12839hunk ./src/allmydata/test/test_web.py 3156
12840         d.addCallback(_done)
12841         return d
12842 
12843+
12844+    def test_PUT_update_at_offset(self):
12845+        file_contents = "test file" * 100000 # about 900 KiB
12846+        d = self.PUT("/uri?mutable=true", file_contents)
12847+        def _then(filecap):
12848+            self.filecap = filecap
12849+            new_data = file_contents[:100]
12850+            new = "replaced and so on"
12851+            new_data += new
12852+            new_data += file_contents[len(new_data):]
12853+            assert len(new_data) == len(file_contents)
12854+            self.new_data = new_data
12855+        d.addCallback(_then)
12856+        d.addCallback(lambda ignored:
12857+            self.PUT("/uri/%s?replace=True&offset=100" % self.filecap,
12858+                     "replaced and so on"))
12859+        def _get_data(filecap):
12860+            n = self.s.create_node_from_uri(filecap)
12861+            return n.download_best_version()
12862+        d.addCallback(_get_data)
12863+        d.addCallback(lambda results:
12864+            self.failUnlessEqual(results, self.new_data))
12865+        # Now try appending things to the file
12866+        d.addCallback(lambda ignored:
12867+            self.PUT("/uri/%s?offset=%d" % (self.filecap, len(self.new_data)),
12868+                     "puppies" * 100))
12869+        d.addCallback(_get_data)
12870+        d.addCallback(lambda results:
12871+            self.failUnlessEqual(results, self.new_data + ("puppies" * 100)))
12872+        return d
12873+
12874+
12875+    def test_PUT_update_at_offset_immutable(self):
12876+        file_contents = "Test file" * 100000
12877+        d = self.PUT("/uri", file_contents)
12878+        def _then(filecap):
12879+            self.filecap = filecap
12880+        d.addCallback(_then)
12881+        d.addCallback(lambda ignored:
12882+            self.shouldHTTPError("test immutable update",
12883+                                 400, "Bad Request",
12884+                                 "immutable",
12885+                                 self.PUT,
12886+                                 "/uri/%s?offset=50" % self.filecap,
12887+                                 "foo"))
12888+        return d
12889+
12890+
12891     def test_bad_method(self):
12892         url = self.webish_url + self.public_url + "/foo/bar.txt"
12893         d = self.shouldHTTPError("test_bad_method",
12894hunk ./src/allmydata/test/test_web.py 3473
12895         def _stash_mutable_uri(n, which):
12896             self.uris[which] = n.get_uri()
12897             assert isinstance(self.uris[which], str)
12898-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"3"))
12899+        d.addCallback(lambda ign:
12900+            c0.create_mutable_file(publish.MutableData(DATA+"3")))
12901         d.addCallback(_stash_mutable_uri, "corrupt")
12902         d.addCallback(lambda ign:
12903                       c0.upload(upload.Data("literal", convergence="")))
12904hunk ./src/allmydata/test/test_web.py 3620
12905         def _stash_mutable_uri(n, which):
12906             self.uris[which] = n.get_uri()
12907             assert isinstance(self.uris[which], str)
12908-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"3"))
12909+        d.addCallback(lambda ign:
12910+            c0.create_mutable_file(publish.MutableData(DATA+"3")))
12911         d.addCallback(_stash_mutable_uri, "corrupt")
12912 
12913         def _compute_fileurls(ignored):
12914hunk ./src/allmydata/test/test_web.py 4283
12915         def _stash_mutable_uri(n, which):
12916             self.uris[which] = n.get_uri()
12917             assert isinstance(self.uris[which], str)
12918-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"2"))
12919+        d.addCallback(lambda ign:
12920+            c0.create_mutable_file(publish.MutableData(DATA+"2")))
12921         d.addCallback(_stash_mutable_uri, "mutable")
12922 
12923         def _compute_fileurls(ignored):
12924hunk ./src/allmydata/test/test_web.py 4383
12925                                                         convergence="")))
12926         d.addCallback(_stash_uri, "small")
12927 
12928-        d.addCallback(lambda ign: c0.create_mutable_file("mutable"))
12929+        d.addCallback(lambda ign:
12930+            c0.create_mutable_file(publish.MutableData("mutable")))
12931         d.addCallback(lambda fn: self.rootnode.set_node(u"mutable", fn))
12932         d.addCallback(_stash_uri, "mutable")
12933 
12934}
12935[resolve conflicts between 393-MDMF patches and trunk as of 1.8.2
12936"Brian Warner <warner@lothar.com>"**20110220230201
12937 Ignore-this: 9bbf5d26c994e8069202331dcb4cdd95
12938] {
12939merger 0.0 (
12940merger 0.0 (
12941merger 0.0 (
12942replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
12943merger 0.0 (
12944hunk ./docs/configuration.rst 384
12945-shares.needed = (int, optional) aka "k", default 3
12946-shares.total = (int, optional) aka "N", N >= k, default 10
12947-shares.happy = (int, optional) 1 <= happy <= N, default 7
12948-
12949- These three values set the default encoding parameters. Each time a new file
12950- is uploaded, erasure-coding is used to break the ciphertext into separate
12951- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
12952- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
12953- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
12954- Setting k to 1 is equivalent to simple replication (uploading N copies of
12955- the file).
12956-
12957- These values control the tradeoff between storage overhead, performance, and
12958- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
12959- backend storage space (the actual value will be a bit more, because of other
12960- forms of overhead). Up to N-k shares can be lost before the file becomes
12961- unrecoverable, so assuming there are at least N servers, up to N-k servers
12962- can be offline without losing the file. So large N/k ratios are more
12963- reliable, and small N/k ratios use less disk space. Clearly, k must never be
12964- smaller than N.
12965-
12966- Large values of N will slow down upload operations slightly, since more
12967- servers must be involved, and will slightly increase storage overhead due to
12968- the hash trees that are created. Large values of k will cause downloads to
12969- be marginally slower, because more servers must be involved. N cannot be
12970- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
12971- uses.
12972-
12973- shares.happy allows you control over the distribution of your immutable file.
12974- For a successful upload, shares are guaranteed to be initially placed on
12975- at least 'shares.happy' distinct servers, the correct functioning of any
12976- k of which is sufficient to guarantee the availability of the uploaded file.
12977- This value should not be larger than the number of servers on your grid.
12978-
12979- A value of shares.happy <= k is allowed, but does not provide any redundancy
12980- if some servers fail or lose shares.
12981-
12982- (Mutable files use a different share placement algorithm that does not
12983-  consider this parameter.)
12984-
12985-
12986-== Storage Server Configuration ==
12987-
12988-[storage]
12989-enabled = (boolean, optional)
12990-
12991- If this is True, the node will run a storage server, offering space to other
12992- clients. If it is False, the node will not run a storage server, meaning
12993- that no shares will be stored on this node. Use False this for clients who
12994- do not wish to provide storage service. The default value is True.
12995-
12996-readonly = (boolean, optional)
12997-
12998- If True, the node will run a storage server but will not accept any shares,
12999- making it effectively read-only. Use this for storage servers which are
13000- being decommissioned: the storage/ directory could be mounted read-only,
13001- while shares are moved to other servers. Note that this currently only
13002- affects immutable shares. Mutable shares (used for directories) will be
13003- written and modified anyway. See ticket #390 for the current status of this
13004- bug. The default value is False.
13005-
13006-reserved_space = (str, optional)
13007-
13008- If provided, this value defines how much disk space is reserved: the storage
13009- server will not accept any share which causes the amount of free disk space
13010- to drop below this value. (The free space is measured by a call to statvfs(2)
13011- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13012- user account under which the storage server runs.)
13013-
13014- This string contains a number, with an optional case-insensitive scale
13015- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13016- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13017- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13018-
13019-expire.enabled =
13020-expire.mode =
13021-expire.override_lease_duration =
13022-expire.cutoff_date =
13023-expire.immutable =
13024-expire.mutable =
13025-
13026- These settings control garbage-collection, in which the server will delete
13027- shares that no longer have an up-to-date lease on them. Please see the
13028- neighboring "garbage-collection.txt" document for full details.
13029-
13030-
13031-== Running A Helper ==
13032+Running A Helper
13033+================
13034hunk ./docs/configuration.rst 424
13035+mutable.format = sdmf or mdmf
13036+
13037+ This value tells Tahoe-LAFS what the default mutable file format should
13038+ be. If mutable.format=sdmf, then newly created mutable files will be in
13039+ the old SDMF format. This is desirable for clients that operate on
13040+ grids where some peers run older versions of Tahoe-LAFS, as these older
13041+ versions cannot read the new MDMF mutable file format. If
13042+ mutable.format = mdmf, then newly created mutable files will use the
13043+ new MDMF format, which supports efficient in-place modification and
13044+ streaming downloads. You can override this value using a special
13045+ mutable-type parameter in the webapi. If you do not specify a value
13046+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
13047+
13048+ Note that this parameter only applies to mutable files. Mutable
13049+ directories, which are stored as mutable files, are not controlled by
13050+ this parameter and will always use SDMF. We may revisit this decision
13051+ in future versions of Tahoe-LAFS.
13052)
13053)
13054hunk ./docs/configuration.rst 324
13055+Frontend Configuration
13056+======================
13057+
13058+The Tahoe client process can run a variety of frontend file-access protocols.
13059+You will use these to create and retrieve files from the virtual filesystem.
13060+Configuration details for each are documented in the following
13061+protocol-specific guides:
13062+
13063+HTTP
13064+
13065+    Tahoe runs a webserver by default on port 3456. This interface provides a
13066+    human-oriented "WUI", with pages to create, modify, and browse
13067+    directories and files, as well as a number of pages to check on the
13068+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
13069+    with a REST-ful HTTP interface that can be used by other programs
13070+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
13071+    details, and the ``web.port`` and ``web.static`` config variables above.
13072+    The `<frontends/download-status.rst>`_ document also describes a few WUI
13073+    status pages.
13074+
13075+CLI
13076+
13077+    The main "bin/tahoe" executable includes subcommands for manipulating the
13078+    filesystem, uploading/downloading files, and creating/running Tahoe
13079+    nodes. See `<frontends/CLI.rst>`_ for details.
13080+
13081+FTP, SFTP
13082+
13083+    Tahoe can also run both FTP and SFTP servers, and map a username/password
13084+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
13085+    for instructions on configuring these services, and the ``[ftpd]`` and
13086+    ``[sftpd]`` sections of ``tahoe.cfg``.
13087+
13088)
13089hunk ./docs/configuration.rst 324
13090+``mutable.format = sdmf or mdmf``
13091+
13092+    This value tells Tahoe what the default mutable file format should
13093+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
13094+    in the old SDMF format. This is desirable for clients that operate on
13095+    grids where some peers run older versions of Tahoe, as these older
13096+    versions cannot read the new MDMF mutable file format. If
13097+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
13098+    the new MDMF format, which supports efficient in-place modification and
13099+    streaming downloads. You can override this value using a special
13100+    mutable-type parameter in the webapi. If you do not specify a value here,
13101+    Tahoe will use SDMF for all newly-created mutable files.
13102+
13103+    Note that this parameter only applies to mutable files. Mutable
13104+    directories, which are stored as mutable files, are not controlled by
13105+    this parameter and will always use SDMF. We may revisit this decision
13106+    in future versions of Tahoe-LAFS.
13107+
13108)
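A minimal ``tahoe.cfg`` sketch of the option described above; this assumes
(as for the other client defaults discussed in this file) that the setting
lives in the ``[client]`` section::

    [client]
    # Newly created mutable files will use the MDMF format; use "sdmf" (or
    # omit the line) to keep the older single-segment format.
    mutable.format = mdmf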
13109merger 0.0 (
13110merger 0.0 (
13111hunk ./docs/configuration.rst 324
13112+``mutable.format = sdmf or mdmf``
13113+
13114+    This value tells Tahoe what the default mutable file format should
13115+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
13116+    in the old SDMF format. This is desirable for clients that operate on
13117+    grids where some peers run older versions of Tahoe, as these older
13118+    versions cannot read the new MDMF mutable file format. If
13119+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
13120+    the new MDMF format, which supports efficient in-place modification and
13121+    streaming downloads. You can override this value using a special
13122+    mutable-type parameter in the webapi. If you do not specify a value here,
13123+    Tahoe will use SDMF for all newly-created mutable files.
13124+
13125+    Note that this parameter only applies to mutable files. Mutable
13126+    directories, which are stored as mutable files, are not controlled by
13127+    this parameter and will always use SDMF. We may revisit this decision
13128+    in future versions of Tahoe-LAFS.
13129+
13130merger 0.0 (
13131merger 0.0 (
13132replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
13133merger 0.0 (
13134hunk ./docs/configuration.rst 384
13135-shares.needed = (int, optional) aka "k", default 3
13136-shares.total = (int, optional) aka "N", N >= k, default 10
13137-shares.happy = (int, optional) 1 <= happy <= N, default 7
13138-
13139- These three values set the default encoding parameters. Each time a new file
13140- is uploaded, erasure-coding is used to break the ciphertext into separate
13141- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
13142- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
13143- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
13144- Setting k to 1 is equivalent to simple replication (uploading N copies of
13145- the file).
13146-
13147- These values control the tradeoff between storage overhead, performance, and
13148- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
13149- backend storage space (the actual value will be a bit more, because of other
13150- forms of overhead). Up to N-k shares can be lost before the file becomes
13151- unrecoverable, so assuming there are at least N servers, up to N-k servers
13152- can be offline without losing the file. So large N/k ratios are more
13153- reliable, and small N/k ratios use less disk space. Clearly, k must never be
13154- smaller than N.
13155-
13156- Large values of N will slow down upload operations slightly, since more
13157- servers must be involved, and will slightly increase storage overhead due to
13158- the hash trees that are created. Large values of k will cause downloads to
13159- be marginally slower, because more servers must be involved. N cannot be
13160- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
13161- uses.
13162-
13163- shares.happy allows you control over the distribution of your immutable file.
13164- For a successful upload, shares are guaranteed to be initially placed on
13165- at least 'shares.happy' distinct servers, the correct functioning of any
13166- k of which is sufficient to guarantee the availability of the uploaded file.
13167- This value should not be larger than the number of servers on your grid.
13168-
13169- A value of shares.happy <= k is allowed, but does not provide any redundancy
13170- if some servers fail or lose shares.
13171-
13172- (Mutable files use a different share placement algorithm that does not
13173-  consider this parameter.)
13174-
13175-
13176-== Storage Server Configuration ==
13177-
13178-[storage]
13179-enabled = (boolean, optional)
13180-
13181- If this is True, the node will run a storage server, offering space to other
13182- clients. If it is False, the node will not run a storage server, meaning
13183- that no shares will be stored on this node. Use False this for clients who
13184- do not wish to provide storage service. The default value is True.
13185-
13186-readonly = (boolean, optional)
13187-
13188- If True, the node will run a storage server but will not accept any shares,
13189- making it effectively read-only. Use this for storage servers which are
13190- being decommissioned: the storage/ directory could be mounted read-only,
13191- while shares are moved to other servers. Note that this currently only
13192- affects immutable shares. Mutable shares (used for directories) will be
13193- written and modified anyway. See ticket #390 for the current status of this
13194- bug. The default value is False.
13195-
13196-reserved_space = (str, optional)
13197-
13198- If provided, this value defines how much disk space is reserved: the storage
13199- server will not accept any share which causes the amount of free disk space
13200- to drop below this value. (The free space is measured by a call to statvfs(2)
13201- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13202- user account under which the storage server runs.)
13203-
13204- This string contains a number, with an optional case-insensitive scale
13205- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13206- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13207- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13208-
13209-expire.enabled =
13210-expire.mode =
13211-expire.override_lease_duration =
13212-expire.cutoff_date =
13213-expire.immutable =
13214-expire.mutable =
13215-
13216- These settings control garbage-collection, in which the server will delete
13217- shares that no longer have an up-to-date lease on them. Please see the
13218- neighboring "garbage-collection.txt" document for full details.
13219-
13220-
13221-== Running A Helper ==
13222+Running A Helper
13223+================
13224hunk ./docs/configuration.rst 424
13225+mutable.format = sdmf or mdmf
13226+
13227+ This value tells Tahoe-LAFS what the default mutable file format should
13228+ be. If mutable.format=sdmf, then newly created mutable files will be in
13229+ the old SDMF format. This is desirable for clients that operate on
13230+ grids where some peers run older versions of Tahoe-LAFS, as these older
13231+ versions cannot read the new MDMF mutable file format. If
13232+ mutable.format = mdmf, then newly created mutable files will use the
13233+ new MDMF format, which supports efficient in-place modification and
13234+ streaming downloads. You can override this value using a special
13235+ mutable-type parameter in the webapi. If you do not specify a value
13236+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
13237+
13238+ Note that this parameter only applies to mutable files. Mutable
13239+ directories, which are stored as mutable files, are not controlled by
13240+ this parameter and will always use SDMF. We may revisit this decision
13241+ in future versions of Tahoe-LAFS.
13242)
13243)
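For reference, a minimal tahoe.cfg sketch of the option documented in the hunk above. The placement in the [client] section is an assumption (the hunk does not show the surrounding section header); the key and values themselves are taken from the hunk:

    [client]
    # Create new mutable files in the MDMF format instead of the SDMF
    # default. Older Tahoe-LAFS peers cannot read MDMF, so leave this
    # unset (or set to sdmf) on mixed-version grids.
    mutable.format = mdmf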
13244hunk ./docs/configuration.rst 324
13245+Frontend Configuration
13246+======================
13247+
13248+The Tahoe client process can run a variety of frontend file-access protocols.
13249+You will use these to create and retrieve files from the virtual filesystem.
13250+Configuration details for each are documented in the following
13251+protocol-specific guides:
13252+
13253+HTTP
13254+
13255+    Tahoe runs a webserver by default on port 3456. This interface provides a
13256+    human-oriented "WUI", with pages to create, modify, and browse
13257+    directories and files, as well as a number of pages to check on the
13258+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
13259+    with a REST-ful HTTP interface that can be used by other programs
13260+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
13261+    details, and the ``web.port`` and ``web.static`` config variables above.
13262+    The `<frontends/download-status.rst>`_ document also describes a few WUI
13263+    status pages.
13264+
13265+CLI
13266+
13267+    The main "bin/tahoe" executable includes subcommands for manipulating the
13268+    filesystem, uploading/downloading files, and creating/running Tahoe
13269+    nodes. See `<frontends/CLI.rst>`_ for details.
13270+
13271+FTP, SFTP
13272+
13273+    Tahoe can also run both FTP and SFTP servers, and map a username/password
13274+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
13275+    for instructions on configuring these services, and the ``[ftpd]`` and
13276+    ``[sftpd]`` sections of ``tahoe.cfg``.
13277+
13278)
13279)
13280hunk ./docs/configuration.rst 402
13281-shares.needed = (int, optional) aka "k", default 3
13282-shares.total = (int, optional) aka "N", N >= k, default 10
13283-shares.happy = (int, optional) 1 <= happy <= N, default 7
13284-
13285- These three values set the default encoding parameters. Each time a new file
13286- is uploaded, erasure-coding is used to break the ciphertext into separate
13287- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
13288- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
13289- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
13290- Setting k to 1 is equivalent to simple replication (uploading N copies of
13291- the file).
13292-
13293- These values control the tradeoff between storage overhead, performance, and
13294- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
13295- backend storage space (the actual value will be a bit more, because of other
13296- forms of overhead). Up to N-k shares can be lost before the file becomes
13297- unrecoverable, so assuming there are at least N servers, up to N-k servers
13298- can be offline without losing the file. So large N/k ratios are more
13299- reliable, and small N/k ratios use less disk space. Clearly, k must never be
13300- smaller than N.
13301-
13302- Large values of N will slow down upload operations slightly, since more
13303- servers must be involved, and will slightly increase storage overhead due to
13304- the hash trees that are created. Large values of k will cause downloads to
13305- be marginally slower, because more servers must be involved. N cannot be
13306- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
13307- uses.
13308-
13309- shares.happy allows you control over the distribution of your immutable file.
13310- For a successful upload, shares are guaranteed to be initially placed on
13311- at least 'shares.happy' distinct servers, the correct functioning of any
13312- k of which is sufficient to guarantee the availability of the uploaded file.
13313- This value should not be larger than the number of servers on your grid.
13314-
13315- A value of shares.happy <= k is allowed, but does not provide any redundancy
13316- if some servers fail or lose shares.
13317-
13318- (Mutable files use a different share placement algorithm that does not
13319-  consider this parameter.)
13320-
13321-
13322-== Storage Server Configuration ==
13323-
13324-[storage]
13325-enabled = (boolean, optional)
13326-
13327- If this is True, the node will run a storage server, offering space to other
13328- clients. If it is False, the node will not run a storage server, meaning
13329- that no shares will be stored on this node. Use False this for clients who
13330- do not wish to provide storage service. The default value is True.
13331-
13332-readonly = (boolean, optional)
13333-
13334- If True, the node will run a storage server but will not accept any shares,
13335- making it effectively read-only. Use this for storage servers which are
13336- being decommissioned: the storage/ directory could be mounted read-only,
13337- while shares are moved to other servers. Note that this currently only
13338- affects immutable shares. Mutable shares (used for directories) will be
13339- written and modified anyway. See ticket #390 for the current status of this
13340- bug. The default value is False.
13341-
13342-reserved_space = (str, optional)
13343-
13344- If provided, this value defines how much disk space is reserved: the storage
13345- server will not accept any share which causes the amount of free disk space
13346- to drop below this value. (The free space is measured by a call to statvfs(2)
13347- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13348- user account under which the storage server runs.)
13349-
13350- This string contains a number, with an optional case-insensitive scale
13351- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13352- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13353- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13354-
13355-expire.enabled =
13356-expire.mode =
13357-expire.override_lease_duration =
13358-expire.cutoff_date =
13359-expire.immutable =
13360-expire.mutable =
13361-
13362- These settings control garbage-collection, in which the server will delete
13363- shares that no longer have an up-to-date lease on them. Please see the
13364- neighboring "garbage-collection.txt" document for full details.
13365-
13366-
13367-== Running A Helper ==
13368+Running A Helper
13369+================
13370)
13371merger 0.0 (
13372merger 0.0 (
13373hunk ./docs/configuration.rst 402
13374-shares.needed = (int, optional) aka "k", default 3
13375-shares.total = (int, optional) aka "N", N >= k, default 10
13376-shares.happy = (int, optional) 1 <= happy <= N, default 7
13377-
13378- These three values set the default encoding parameters. Each time a new file
13379- is uploaded, erasure-coding is used to break the ciphertext into separate
13380- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
13381- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
13382- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
13383- Setting k to 1 is equivalent to simple replication (uploading N copies of
13384- the file).
13385-
13386- These values control the tradeoff between storage overhead, performance, and
13387- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
13388- backend storage space (the actual value will be a bit more, because of other
13389- forms of overhead). Up to N-k shares can be lost before the file becomes
13390- unrecoverable, so assuming there are at least N servers, up to N-k servers
13391- can be offline without losing the file. So large N/k ratios are more
13392- reliable, and small N/k ratios use less disk space. Clearly, k must never be
13393- smaller than N.
13394-
13395- Large values of N will slow down upload operations slightly, since more
13396- servers must be involved, and will slightly increase storage overhead due to
13397- the hash trees that are created. Large values of k will cause downloads to
13398- be marginally slower, because more servers must be involved. N cannot be
13399- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
13400- uses.
13401-
13402- shares.happy allows you control over the distribution of your immutable file.
13403- For a successful upload, shares are guaranteed to be initially placed on
13404- at least 'shares.happy' distinct servers, the correct functioning of any
13405- k of which is sufficient to guarantee the availability of the uploaded file.
13406- This value should not be larger than the number of servers on your grid.
13407-
13408- A value of shares.happy <= k is allowed, but does not provide any redundancy
13409- if some servers fail or lose shares.
13410-
13411- (Mutable files use a different share placement algorithm that does not
13412-  consider this parameter.)
13413-
13414-
13415-== Storage Server Configuration ==
13416-
13417-[storage]
13418-enabled = (boolean, optional)
13419-
13420- If this is True, the node will run a storage server, offering space to other
13421- clients. If it is False, the node will not run a storage server, meaning
13422- that no shares will be stored on this node. Use False this for clients who
13423- do not wish to provide storage service. The default value is True.
13424-
13425-readonly = (boolean, optional)
13426-
13427- If True, the node will run a storage server but will not accept any shares,
13428- making it effectively read-only. Use this for storage servers which are
13429- being decommissioned: the storage/ directory could be mounted read-only,
13430- while shares are moved to other servers. Note that this currently only
13431- affects immutable shares. Mutable shares (used for directories) will be
13432- written and modified anyway. See ticket #390 for the current status of this
13433- bug. The default value is False.
13434-
13435-reserved_space = (str, optional)
13436-
13437- If provided, this value defines how much disk space is reserved: the storage
13438- server will not accept any share which causes the amount of free disk space
13439- to drop below this value. (The free space is measured by a call to statvfs(2)
13440- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13441- user account under which the storage server runs.)
13442-
13443- This string contains a number, with an optional case-insensitive scale
13444- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13445- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13446- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13447-
13448-expire.enabled =
13449-expire.mode =
13450-expire.override_lease_duration =
13451-expire.cutoff_date =
13452-expire.immutable =
13453-expire.mutable =
13454-
13455- These settings control garbage-collection, in which the server will delete
13456- shares that no longer have an up-to-date lease on them. Please see the
13457- neighboring "garbage-collection.txt" document for full details.
13458-
13459-
13460-== Running A Helper ==
13461+Running A Helper
13462+================
13463merger 0.0 (
13464hunk ./docs/configuration.rst 324
13465+``mutable.format = sdmf or mdmf``
13466+
13467+    This value tells Tahoe what the default mutable file format should
13468+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
13469+    in the old SDMF format. This is desirable for clients that operate on
13470+    grids where some peers run older versions of Tahoe, as these older
13471+    versions cannot read the new MDMF mutable file format. If
13472+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
13473+    the new MDMF format, which supports efficient in-place modification and
13474+    streaming downloads. You can override this value using a special
13475+    mutable-type parameter in the webapi. If you do not specify a value here,
13476+    Tahoe will use SDMF for all newly-created mutable files.
13477+
13478+    Note that this parameter only applies to mutable files. Mutable
13479+    directories, which are stored as mutable files, are not controlled by
13480+    this parameter and will always use SDMF. We may revisit this decision
13481+    in future versions of Tahoe-LAFS.
13482+
13483merger 0.0 (
13484merger 0.0 (
13485replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
13486merger 0.0 (
13487hunk ./docs/configuration.rst 384
13488-shares.needed = (int, optional) aka "k", default 3
13489-shares.total = (int, optional) aka "N", N >= k, default 10
13490-shares.happy = (int, optional) 1 <= happy <= N, default 7
13491-
13492- These three values set the default encoding parameters. Each time a new file
13493- is uploaded, erasure-coding is used to break the ciphertext into separate
13494- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
13495- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
13496- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
13497- Setting k to 1 is equivalent to simple replication (uploading N copies of
13498- the file).
13499-
13500- These values control the tradeoff between storage overhead, performance, and
13501- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
13502- backend storage space (the actual value will be a bit more, because of other
13503- forms of overhead). Up to N-k shares can be lost before the file becomes
13504- unrecoverable, so assuming there are at least N servers, up to N-k servers
13505- can be offline without losing the file. So large N/k ratios are more
13506- reliable, and small N/k ratios use less disk space. Clearly, k must never be
13507- smaller than N.
13508-
13509- Large values of N will slow down upload operations slightly, since more
13510- servers must be involved, and will slightly increase storage overhead due to
13511- the hash trees that are created. Large values of k will cause downloads to
13512- be marginally slower, because more servers must be involved. N cannot be
13513- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
13514- uses.
13515-
13516- shares.happy allows you control over the distribution of your immutable file.
13517- For a successful upload, shares are guaranteed to be initially placed on
13518- at least 'shares.happy' distinct servers, the correct functioning of any
13519- k of which is sufficient to guarantee the availability of the uploaded file.
13520- This value should not be larger than the number of servers on your grid.
13521-
13522- A value of shares.happy <= k is allowed, but does not provide any redundancy
13523- if some servers fail or lose shares.
13524-
13525- (Mutable files use a different share placement algorithm that does not
13526-  consider this parameter.)
13527-
13528-
13529-== Storage Server Configuration ==
13530-
13531-[storage]
13532-enabled = (boolean, optional)
13533-
13534- If this is True, the node will run a storage server, offering space to other
13535- clients. If it is False, the node will not run a storage server, meaning
13536- that no shares will be stored on this node. Use False this for clients who
13537- do not wish to provide storage service. The default value is True.
13538-
13539-readonly = (boolean, optional)
13540-
13541- If True, the node will run a storage server but will not accept any shares,
13542- making it effectively read-only. Use this for storage servers which are
13543- being decommissioned: the storage/ directory could be mounted read-only,
13544- while shares are moved to other servers. Note that this currently only
13545- affects immutable shares. Mutable shares (used for directories) will be
13546- written and modified anyway. See ticket #390 for the current status of this
13547- bug. The default value is False.
13548-
13549-reserved_space = (str, optional)
13550-
13551- If provided, this value defines how much disk space is reserved: the storage
13552- server will not accept any share which causes the amount of free disk space
13553- to drop below this value. (The free space is measured by a call to statvfs(2)
13554- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13555- user account under which the storage server runs.)
13556-
13557- This string contains a number, with an optional case-insensitive scale
13558- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13559- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13560- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13561-
13562-expire.enabled =
13563-expire.mode =
13564-expire.override_lease_duration =
13565-expire.cutoff_date =
13566-expire.immutable =
13567-expire.mutable =
13568-
13569- These settings control garbage-collection, in which the server will delete
13570- shares that no longer have an up-to-date lease on them. Please see the
13571- neighboring "garbage-collection.txt" document for full details.
13572-
13573-
13574-== Running A Helper ==
13575+Running A Helper
13576+================
13577hunk ./docs/configuration.rst 424
13578+mutable.format = sdmf or mdmf
13579+
13580+ This value tells Tahoe-LAFS what the default mutable file format should
13581+ be. If mutable.format=sdmf, then newly created mutable files will be in
13582+ the old SDMF format. This is desirable for clients that operate on
13583+ grids where some peers run older versions of Tahoe-LAFS, as these older
13584+ versions cannot read the new MDMF mutable file format. If
13585+ mutable.format = mdmf, then newly created mutable files will use the
13586+ new MDMF format, which supports efficient in-place modification and
13587+ streaming downloads. You can overwrite this value using a special
13588+ mutable-type parameter in the webapi. If you do not specify a value
13589+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
13590+
13591+ Note that this parameter only applies to mutable files. Mutable
13592+ directories, which are stored as mutable files, are not controlled by
13593+ this parameter and will always use SDMF. We may revisit this decision
13594+ in future versions of Tahoe-LAFS.
13595)
13596)
13597hunk ./docs/configuration.rst 324
13598+Frontend Configuration
13599+======================
13600+
13601+The Tahoe client process can run a variety of frontend file-access protocols.
13602+You will use these to create and retrieve files from the virtual filesystem.
13603+Configuration details for each are documented in the following
13604+protocol-specific guides:
13605+
13606+HTTP
13607+
13608+    Tahoe runs a webserver by default on port 3456. This interface provides a
13609+    human-oriented "WUI", with pages to create, modify, and browse
13610+    directories and files, as well as a number of pages to check on the
13611+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
13612+    with a REST-ful HTTP interface that can be used by other programs
13613+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
13614+    details, and the ``web.port`` and ``web.static`` config variables above.
13615+    The `<frontends/download-status.rst>`_ document also describes a few WUI
13616+    status pages.
13617+
13618+CLI
13619+
13620+    The main "bin/tahoe" executable includes subcommands for manipulating the
13621+    filesystem, uploading/downloading files, and creating/running Tahoe
13622+    nodes. See `<frontends/CLI.rst>`_ for details.
13623+
13624+FTP, SFTP
13625+
13626+    Tahoe can also run both FTP and SFTP servers, and map a username/password
13627+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
13628+    for instructions on configuring these services, and the ``[ftpd]`` and
13629+    ``[sftpd]`` sections of ``tahoe.cfg``.
13630+
13631)
13632)
13633)
13634replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
13635)
13636hunk ./src/allmydata/mutable/retrieve.py 7
13637 from zope.interface import implements
13638 from twisted.internet import defer
13639 from twisted.python import failure
13640-from foolscap.api import DeadReferenceError, eventually, fireEventually
13641-from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError
13642-from allmydata.util import hashutil, idlib, log
13643+from twisted.internet.interfaces import IPushProducer, IConsumer
13644+from foolscap.api import eventually, fireEventually
13645+from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError, \
13646+                                 MDMF_VERSION, SDMF_VERSION
13647+from allmydata.util import hashutil, log, mathutil
13648+from allmydata.util.dictutil import DictOfSets
13649 from allmydata import hashtree, codec
13650 from allmydata.storage.server import si_b2a
13651 from pycryptopp.cipher.aes import AES
13652hunk ./src/allmydata/mutable/retrieve.py 239
13653             # KiB, so we ask for that much.
13654             # TODO: Change the cache methods to allow us to fetch all of the
13655             # data that they have, then change this method to do that.
13656-            any_cache, timestamp = self._node._read_from_cache(self.verinfo,
13657-                                                               shnum,
13658-                                                               0,
13659-                                                               1000)
13660+            any_cache = self._node._read_from_cache(self.verinfo, shnum,
13661+                                                    0, 1000)
13662             ss = self.servermap.connections[peerid]
13663             reader = MDMFSlotReadProxy(ss,
13664                                        self._storage_index,
13665hunk ./src/allmydata/mutable/retrieve.py 373
13666                  (k, n, self._num_segments, self._segment_size,
13667                   self._tail_segment_size))
13668 
13669-        # ask the cache first
13670-        got_from_cache = False
13671-        datavs = []
13672-        for (offset, length) in readv:
13673-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
13674-                                                            offset, length)
13675-            if data is not None:
13676-                datavs.append(data)
13677-        if len(datavs) == len(readv):
13678-            self.log("got data from cache")
13679-            got_from_cache = True
13680-            d = fireEventually({shnum: datavs})
13681-            # datavs is a dict mapping shnum to a pair of strings
13682+        for i in xrange(self._total_shares):
13683+            # So we don't have to do this later.
13684+            self._block_hash_trees[i] = hashtree.IncompleteHashTree(self._num_segments)
13685+
13686+        # Our last task is to tell the downloader where to start and
13687+        # where to stop. We use three parameters for that:
13688+        #   - self._start_segment: the segment that we need to start
13689+        #     downloading from.
13690+        #   - self._current_segment: the next segment that we need to
13691+        #     download.
13692+        #   - self._last_segment: The last segment that we were asked to
13693+        #     download.
13694+        #
13695+        #  We say that the download is complete when
13696+        #  self._current_segment > self._last_segment. We use
13697+        #  self._start_segment and self._last_segment to know when to
13698+        #  strip things off of segments, and how much to strip.
13699+        if self._offset:
13700+            self.log("got offset: %d" % self._offset)
13701+            # our start segment is the first segment containing the
13702+            # offset we were given.
13703+            start = mathutil.div_ceil(self._offset,
13704+                                      self._segment_size)
13705+            # this gets us the first segment after self._offset. Then
13706+            # our start segment is the one before it.
13707+            start -= 1
13708+
13709+            assert start < self._num_segments
13710+            self._start_segment = start
13711+            self.log("got start segment: %d" % self._start_segment)
13712         else:
13713             self._start_segment = 0
13714 
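As a worked illustration of the segment arithmetic these comments describe (a sketch only; the hunk above derives the index with mathutil.div_ceil and then decrements, and the 128 KiB segment size is the value used by the tests later in this patch):

    # Sketch: which segment contains a given byte offset?
    SEGMENT_SIZE = 128 * 1024

    def segment_containing(offset):
        # Floor division gives the zero-based index of the segment that
        # holds the byte at 'offset'.
        return offset // SEGMENT_SIZE

    assert segment_containing(131071) == 0  # last byte of segment 0
    assert segment_containing(131073) == 1  # 128*1024+1, the fencepost
                                            # case from test_replace_segstart1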
13715hunk ./src/allmydata/mutable/servermap.py 7
13716 from itertools import count
13717 from twisted.internet import defer
13718 from twisted.python import failure
13719-from foolscap.api import DeadReferenceError, RemoteException, eventually
13720-from allmydata.util import base32, hashutil, idlib, log
13721+from foolscap.api import DeadReferenceError, RemoteException, eventually, \
13722+                         fireEventually
13723+from allmydata.util import base32, hashutil, idlib, log, deferredutil
13724+from allmydata.util.dictutil import DictOfSets
13725 from allmydata.storage.server import si_b2a
13726 from allmydata.interfaces import IServermapUpdaterStatus
13727 from pycryptopp.publickey import rsa
13728hunk ./src/allmydata/mutable/servermap.py 16
13729 
13730 from allmydata.mutable.common import MODE_CHECK, MODE_ANYTHING, MODE_WRITE, MODE_READ, \
13731-     DictOfSets, CorruptShareError, NeedMoreDataError
13732-from allmydata.mutable.layout import unpack_prefix_and_signature, unpack_header, unpack_share, \
13733-     SIGNED_PREFIX_LENGTH
13734+     CorruptShareError
13735+from allmydata.mutable.layout import SIGNED_PREFIX_LENGTH, MDMFSlotReadProxy
13736 
13737 class UpdateStatus:
13738     implements(IServermapUpdaterStatus)
13739hunk ./src/allmydata/mutable/servermap.py 391
13740         #  * if we need the encrypted private key, we want [-1216ish:]
13741         #   * but we can't read from negative offsets
13742         #   * the offset table tells us the 'ish', also the positive offset
13743-        # A future version of the SMDF slot format should consider using
13744-        # fixed-size slots so we can retrieve less data. For now, we'll just
13745-        # read 2000 bytes, which also happens to read enough actual data to
13746-        # pre-fetch a 9-entry dirnode.
13747+        # MDMF:
13748+        #  * Checkstring? [0:72]
13749+        #  * If we want to validate the checkstring, then [0:72], [143:?] --
13750+        #    the offset table will tell us for sure.
13751+        #  * If we need the verification key, we have to consult the offset
13752+        #    table as well.
13753+        # At this point, we don't know which we are. Our filenode can
13754+        # tell us, but it might be lying -- in some cases, we're
13755+        # responsible for telling it which kind of file it is.
13756         self._read_size = 4000
13757         if mode == MODE_CHECK:
13758             # we use unpack_prefix_and_signature, so we need 1k
13759hunk ./src/allmydata/mutable/servermap.py 633
13760         updated.
13761         """
13762         if verinfo:
13763-            self._node._add_to_cache(verinfo, shnum, 0, data, now)
13764+            self._node._add_to_cache(verinfo, shnum, 0, data)
13765 
13766 
13767     def _got_results(self, datavs, peerid, readsize, stuff, started):
13768hunk ./src/allmydata/mutable/servermap.py 664
13769 
13770         for shnum,datav in datavs.items():
13771             data = datav[0]
13772-            try:
13773-                verinfo = self._got_results_one_share(shnum, data, peerid, lp)
13774-                last_verinfo = verinfo
13775-                last_shnum = shnum
13776-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
13777-            except CorruptShareError, e:
13778-                # log it and give the other shares a chance to be processed
13779-                f = failure.Failure()
13780-                self.log(format="bad share: %(f_value)s", f_value=str(f.value),
13781-                         failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
13782-                self.notify_server_corruption(peerid, shnum, str(e))
13783-                self._bad_peers.add(peerid)
13784-                self._last_failure = f
13785-                checkstring = data[:SIGNED_PREFIX_LENGTH]
13786-                self._servermap.mark_bad_share(peerid, shnum, checkstring)
13787-                self._servermap.problems.append(f)
13788-                pass
13789+            reader = MDMFSlotReadProxy(ss,
13790+                                       storage_index,
13791+                                       shnum,
13792+                                       data)
13793+            self._readers.setdefault(peerid, dict())[shnum] = reader
13794+            # our goal, with each response, is to validate the version
13795+            # information and share data as best we can at this point --
13796+            # we do this by validating the signature. To do this, we
13797+            # need to do the following:
13798+            #   - If we don't already have the public key, fetch the
13799+            #     public key. We use this to validate the signature.
13800+            if not self._node.get_pubkey():
13801+                # fetch and set the public key.
13802+                d = reader.get_verification_key(queue=True)
13803+                d.addCallback(lambda results, shnum=shnum, peerid=peerid:
13804+                    self._try_to_set_pubkey(results, peerid, shnum, lp))
13805+                # XXX: Make self._pubkey_query_failed?
13806+                d.addErrback(lambda error, shnum=shnum, peerid=peerid:
13807+                    self._got_corrupt_share(error, shnum, peerid, data, lp))
13808+            else:
13809+                # we already have the public key.
13810+                d = defer.succeed(None)
13811 
13812             # Neither of these two branches return anything of
13813             # consequence, so the first entry in our deferredlist will
13814hunk ./src/allmydata/test/test_storage.py 1
13815-import time, os.path, platform, stat, re, simplejson, struct
13816+import time, os.path, platform, stat, re, simplejson, struct, shutil
13817 
13818hunk ./src/allmydata/test/test_storage.py 3
13819-import time, os.path, stat, re, simplejson, struct
13820+import mock
13821 
13822 from twisted.trial import unittest
13823 
13824}
13825[mutable/filenode.py: fix create_mutable_file('string')
13826"Brian Warner <warner@lothar.com>"**20110221014659
13827 Ignore-this: dc6bdad761089f0199681eeb784f1001
13828] hunk ./src/allmydata/mutable/filenode.py 137
13829         if contents is None:
13830             return MutableData("")
13831 
13832+        if isinstance(contents, str):
13833+            return MutableData(contents)
13834+
13835         if IMutableUploadable.providedBy(contents):
13836             return contents
13837 
13838[resolve more conflicts with current trunk
13839"Brian Warner <warner@lothar.com>"**20110221055600
13840 Ignore-this: 77ad038a478dbf5d9b34f7a68159a3e0
13841] hunk ./src/allmydata/mutable/servermap.py 461
13842         self._queries_completed = 0
13843 
13844         sb = self._storage_broker
13845-        full_peerlist = sb.get_servers_for_index(self._storage_index)
13846+        # All of the peers, permuted by the storage index, as usual.
13847+        full_peerlist = [(s.get_serverid(), s.get_rref())
13848+                         for s in sb.get_servers_for_psi(self._storage_index)]
13849         self.full_peerlist = full_peerlist # for use later, immutable
13850         self.extra_peers = full_peerlist[:] # peers are removed as we use them
13851         self._good_peers = set() # peers who had some shares
13852[update MDMF code with StorageFarmBroker changes
13853"Brian Warner <warner@lothar.com>"**20110221061004
13854 Ignore-this: a693b201d31125b391cebe0412ddd027
13855] {
13856hunk ./src/allmydata/mutable/publish.py 203
13857         self._encprivkey = self._node.get_encprivkey()
13858 
13859         sb = self._storage_broker
13860-        full_peerlist = sb.get_servers_for_index(self._storage_index)
13861+        full_peerlist = [(s.get_serverid(), s.get_rref())
13862+                         for s in sb.get_servers_for_psi(self._storage_index)]
13863         self.full_peerlist = full_peerlist # for use later, immutable
13864         self.bad_peers = set() # peerids who have errbacked/refused requests
13865 
13866hunk ./src/allmydata/test/test_mutable.py 2538
13867             # for either a block and salt or for hashes, either of which
13868             # will exercise the error handling code.
13869             killer = FirstServerGetsKilled()
13870-            for (serverid, ss) in nm.storage_broker.get_all_servers():
13871-                ss.post_call_notifier = killer.notify
13872+            for s in nm.storage_broker.get_connected_servers():
13873+                s.get_rref().post_call_notifier = killer.notify
13874             ver = servermap.best_recoverable_version()
13875             assert ver
13876             return self._node.download_version(servermap, ver)
13877}
13878[mutable/filenode: Clean up servermap handling in MutableFileVersion
13879Kevan Carstensen <kevan@isnotajoke.com>**20110226010433
13880 Ignore-this: 2257c9f65502098789f5ea355b94f130
13881 
13882 We want to update the servermap before attempting to modify a file,
13883 which we now do. This introduced code duplication, which was addressed
13884 by refactoring the servermap update into its own method, and then
13885 eliminating duplicate servermap updates throughout the
13886 MutableFileVersion.
13887] {
13888hunk ./src/allmydata/mutable/filenode.py 19
13889 from allmydata.mutable.publish import Publish, MutableData,\
13890                                       DEFAULT_MAX_SEGMENT_SIZE, \
13891                                       TransformingUploadable
13892-from allmydata.mutable.common import MODE_READ, MODE_WRITE, UnrecoverableFileError, \
13893+from allmydata.mutable.common import MODE_READ, MODE_WRITE, MODE_CHECK, UnrecoverableFileError, \
13894      ResponseCache, UncoordinatedWriteError
13895 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
13896 from allmydata.mutable.retrieve import Retrieve
13897hunk ./src/allmydata/mutable/filenode.py 807
13898         a little bit.
13899         """
13900         log.msg("doing modify")
13901-        d = self._modify_once(modifier, first_time)
13902+        if first_time:
13903+            d = self._update_servermap()
13904+        else:
13905+            # We ran into trouble; do MODE_CHECK so we're a little more
13906+            # careful on subsequent tries.
13907+            d = self._update_servermap(mode=MODE_CHECK)
13908+
13909+        d.addCallback(lambda ignored:
13910+            self._modify_once(modifier, first_time))
13911         def _retry(f):
13912             f.trap(UncoordinatedWriteError)
13913hunk ./src/allmydata/mutable/filenode.py 818
13914+            # Uh oh, it broke. We're allowed to trust the servermap for our
13915+            # first try, but after that we need to update it. It's
13916+            # possible that we've failed due to a race with another
13917+            # uploader, and if the race is to converge correctly, we
13918+            # need to know about that upload.
13919             d2 = defer.maybeDeferred(backoffer, self, f)
13920             d2.addCallback(lambda ignored:
13921                            self._modify_and_retry(modifier,
13922hunk ./src/allmydata/mutable/filenode.py 837
13923         I attempt to apply a modifier to the contents of the mutable
13924         file.
13925         """
13926-        # XXX: This is wrong -- we could get more servers if we updated
13927-        # in MODE_ANYTHING and possibly MODE_CHECK. Probably we want to
13928-        # assert that the last update wasn't MODE_READ
13929-        assert self._servermap.last_update_mode == MODE_WRITE
13930+        assert self._servermap.last_update_mode != MODE_READ
13931 
13932         # download_to_data is serialized, so we have to call this to
13933         # avoid deadlock.
13934hunk ./src/allmydata/mutable/filenode.py 1076
13935 
13936         # Now ask for the servermap to be updated in MODE_WRITE with
13937         # this update range.
13938-        u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13939-                             self._servermap,
13940-                             mode=MODE_WRITE,
13941-                             update_range=(start_segment, end_segment))
13942-        return u.update()
13943+        return self._update_servermap(update_range=(start_segment,
13944+                                                    end_segment))
13945 
13946 
13947     def _decode_and_decrypt_segments(self, ignored, data, offset):
13948hunk ./src/allmydata/mutable/filenode.py 1135
13949                                    segments_and_bht[1])
13950         p = Publish(self._node, self._storage_broker, self._servermap)
13951         return p.update(u, offset, segments_and_bht[2], self._version)
13952+
13953+
13954+    def _update_servermap(self, mode=MODE_WRITE, update_range=None):
13955+        """
13956+        I update the servermap. I return a Deferred that fires when the
13957+        servermap update is done.
13958+        """
13959+        if update_range:
13960+            u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13961+                                 self._servermap,
13962+                                 mode=mode,
13963+                                 update_range=update_range)
13964+        else:
13965+            u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13966+                                 self._servermap,
13967+                                 mode=mode)
13968+        return u.update()
13969}
13970[web: Use the string "replace" to trigger whole-file replacement when processing an offset parameter.
13971Kevan Carstensen <kevan@isnotajoke.com>**20110227231643
13972 Ignore-this: 5bbf0b90d68efe20d4c531bb98a8321a
13973] {
13974hunk ./docs/frontends/webapi.rst 360
13975  To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
13976 
13977  In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
13978- writeable mutable file, that file's contents will be overwritten in-place. If
13979- it is a read-cap for a mutable file, an error will occur. If it is an
13980- immutable file, the old file will be discarded, and a new one will be put in
13981- its place. If the target file is a writable mutable file, you may also
13982- specify an "offset" parameter -- a byte offset that determines where in
13983- the mutable file the data from the HTTP request body is placed. This
13984- operation is relatively efficient for MDMF mutable files, and is
13985- relatively inefficient (but still supported) for SDMF mutable files.
13986+ writeable mutable file, that file's contents will be overwritten
13987+ in-place. If it is a read-cap for a mutable file, an error will occur.
13988+ If it is an immutable file, the old file will be discarded, and a new
13989+ one will be put in its place. If the target file is a writable mutable
13990+ file, you may also specify an "offset" parameter -- a byte offset that
13991+ determines where in the mutable file the data from the HTTP request
13992+ body is placed. This operation is relatively efficient for MDMF mutable
13993+ files, and is relatively inefficient (but still supported) for SDMF
13994+ mutable files. If no offset parameter is specified, then the entire
13995+ file is replaced with the data from the HTTP request body. For an
13996+ immutable file, the "offset" parameter is not valid.
13997 
13998  When creating a new file, if "mutable=true" is in the query arguments, the
13999  operation will create a mutable file instead of an immutable one.
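To make the offset behaviour described in the hunk above concrete, a hypothetical request (cap and byte values invented for illustration) that overwrites four bytes of an existing MDMF mutable file starting at byte 131072, leaving the rest of the file intact, would look like:

    PUT /uri/$FILECAP?offset=131072
    (request body: the four replacement bytes)

Omitting the offset parameter replaces the whole file, and a negative offset is rejected with 400 Bad Request, as exercised by test_PUT_update_at_invalid_offset in the next hunk.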
14000hunk ./src/allmydata/test/test_web.py 3187
14001             self.failUnlessEqual(results, self.new_data + ("puppies" * 100)))
14002         return d
14003 
14004+    def test_PUT_update_at_invalid_offset(self):
14005+        file_contents = "test file" * 100000 # about 900 KiB
14006+        d = self.PUT("/uri?mutable=true", file_contents)
14007+        def _then(filecap):
14008+            self.filecap = filecap
14009+        d.addCallback(_then)
14010+        # Negative offsets should cause an error.
14011+        d.addCallback(lambda ignored:
14012+            self.shouldHTTPError("test mutable invalid offset negative",
14013+                                 400, "Bad Request",
14014+                                 "Invalid offset",
14015+                                 self.PUT,
14016+                                 "/uri/%s?offset=-1" % self.filecap,
14017+                                 "foo"))
14018+        return d
14019 
14020     def test_PUT_update_at_offset_immutable(self):
14021         file_contents = "Test file" * 100000
14022hunk ./src/allmydata/web/common.py 55
14023     # message? Since this call is going to be used by programmers and
14024     # their tools rather than users (through the wui), it is not
14025     # inconsistent to return that, I guess.
14026-    offset = int(offset)
14027-    return offset
14028+    return int(offset)
14029 
14030 
14031 def get_root(ctx_or_req):
14032hunk ./src/allmydata/web/filenode.py 219
14033         req = IRequest(ctx)
14034         t = get_arg(req, "t", "").strip()
14035         replace = parse_replace_arg(get_arg(req, "replace", "true"))
14036-        offset = parse_offset_arg(get_arg(req, "offset", -1))
14037+        offset = parse_offset_arg(get_arg(req, "offset", False))
14038 
14039         if not t:
14040hunk ./src/allmydata/web/filenode.py 222
14041-            if self.node.is_mutable() and offset >= 0:
14042-                return self.update_my_contents(req, offset)
14043-
14044-            elif self.node.is_mutable():
14045-                return self.replace_my_contents(req)
14046             if not replace:
14047                 # this is the early trap: if someone else modifies the
14048                 # directory while we're uploading, the add_file(overwrite=)
14049hunk ./src/allmydata/web/filenode.py 227
14050                 # call in replace_me_with_a_child will do the late trap.
14051                 raise ExistingChildError()
14052-            if offset >= 0:
14053-                raise WebError("PUT to a file: append operation invoked "
14054-                               "on an immutable cap")
14055 
14056hunk ./src/allmydata/web/filenode.py 228
14057+            if self.node.is_mutable():
14058+                if offset == False:
14059+                    return self.replace_my_contents(req)
14060+
14061+                if offset >= 0:
14062+                    return self.update_my_contents(req, offset)
14063+
14064+                raise WebError("PUT to a mutable file: Invalid offset")
14065+
14066+            else:
14067+                if offset != False:
14068+                    raise WebError("PUT to a file: append operation invoked "
14069+                                   "on an immutable cap")
14070+
14071+                assert self.parentnode and self.name
14072+                return self.replace_me_with_a_child(req, self.client, replace)
14073 
14074hunk ./src/allmydata/web/filenode.py 245
14075-            assert self.parentnode and self.name
14076-            return self.replace_me_with_a_child(req, self.client, replace)
14077         if t == "uri":
14078             if not replace:
14079                 raise ExistingChildError()
14080}
14081[docs/configuration.rst: fix more conflicts between #393 and trunk
14082Kevan Carstensen <kevan@isnotajoke.com>**20110228003426
14083 Ignore-this: 7917effdeecab00d634a06f1df8fe2cf
14084] {
14085replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
14086hunk ./docs/configuration.rst 324
14087     (Mutable files use a different share placement algorithm that does not
14088     currently consider this parameter.)
14089 
14090+``mutable.format = sdmf or mdmf``
14091+
14092+    This value tells Tahoe-LAFS what the default mutable file format should
14093+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
14094+    in the old SDMF format. This is desirable for clients that operate on
14095+    grids where some peers run older versions of Tahoe-LAFS, as these older
14096+    versions cannot read the new MDMF mutable file format. If
14097+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
14098+    the new MDMF format, which supports efficient in-place modification and
14099+    streaming downloads. You can override this value using a special
14100+    mutable-type parameter in the webapi. If you do not specify a value here,
14101+    Tahoe-LAFS will use SDMF for all newly-created mutable files.
14102+
14103+    Note that this parameter only applies to mutable files. Mutable
14104+    directories, which are stored as mutable files, are not controlled by
14105+    this parameter and will always use SDMF. We may revisit this decision
14106+    in future versions of Tahoe-LAFS.
14107+
14108+
14109+Frontend Configuration
14110+======================
14111+
14112+The Tahoe client process can run a variety of frontend file-access protocols.
14113+You will use these to create and retrieve files from the virtual filesystem.
14114+Configuration details for each are documented in the following
14115+protocol-specific guides:
14116+
14117+HTTP
14118+
14119+    Tahoe runs a webserver by default on port 3456. This interface provides a
14120+    human-oriented "WUI", with pages to create, modify, and browse
14121+    directories and files, as well as a number of pages to check on the
14122+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
14123+    with a REST-ful HTTP interface that can be used by other programs
14124+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
14125+    details, and the ``web.port`` and ``web.static`` config variables above.
14126+    The `<frontends/download-status.rst>`_ document also describes a few WUI
14127+    status pages.
14128+
14129+CLI
14130+
14131+    The main "bin/tahoe" executable includes subcommands for manipulating the
14132+    filesystem, uploading/downloading files, and creating/running Tahoe
14133+    nodes. See `<frontends/CLI.rst>`_ for details.
14134+
14135+FTP, SFTP
14136+
14137+    Tahoe can also run both FTP and SFTP servers, and map a username/password
14138+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
14139+    for instructions on configuring these services, and the ``[ftpd]`` and
14140+    ``[sftpd]`` sections of ``tahoe.cfg``.
14141+
14142 
14143 Storage Server Configuration
14144 ============================
14145hunk ./docs/configuration.rst 436
14146     `<garbage-collection.rst>`_ for full details.
14147 
14148 
14149-shares.needed = (int, optional) aka "k", default 3
14150-shares.total = (int, optional) aka "N", N >= k, default 10
14151-shares.happy = (int, optional) 1 <= happy <= N, default 7
14152-
14153- These three values set the default encoding parameters. Each time a new file
14154- is uploaded, erasure-coding is used to break the ciphertext into separate
14155- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
14156- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
14157- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
14158- Setting k to 1 is equivalent to simple replication (uploading N copies of
14159- the file).
14160-
14161- These values control the tradeoff between storage overhead, performance, and
14162- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
14163- backend storage space (the actual value will be a bit more, because of other
14164- forms of overhead). Up to N-k shares can be lost before the file becomes
14165- unrecoverable, so assuming there are at least N servers, up to N-k servers
14166- can be offline without losing the file. So large N/k ratios are more
14167- reliable, and small N/k ratios use less disk space. Clearly, k must never be
14168- smaller than N.
14169-
14170- Large values of N will slow down upload operations slightly, since more
14171- servers must be involved, and will slightly increase storage overhead due to
14172- the hash trees that are created. Large values of k will cause downloads to
14173- be marginally slower, because more servers must be involved. N cannot be
14174- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe-LAFS
14175- uses.
14176-
14177- shares.happy allows you control over the distribution of your immutable file.
14178- For a successful upload, shares are guaranteed to be initially placed on
14179- at least 'shares.happy' distinct servers, the correct functioning of any
14180- k of which is sufficient to guarantee the availability of the uploaded file.
14181- This value should not be larger than the number of servers on your grid.
14182-
14183- A value of shares.happy <= k is allowed, but does not provide any redundancy
14184- if some servers fail or lose shares.
14185-
14186- (Mutable files use a different share placement algorithm that does not
14187-  consider this parameter.)
14188-
14189-
14190-== Storage Server Configuration ==
14191-
14192-[storage]
14193-enabled = (boolean, optional)
14194-
14195- If this is True, the node will run a storage server, offering space to other
14196- clients. If it is False, the node will not run a storage server, meaning
14197- that no shares will be stored on this node. Use False this for clients who
14198- do not wish to provide storage service. The default value is True.
14199-
14200-readonly = (boolean, optional)
14201-
14202- If True, the node will run a storage server but will not accept any shares,
14203- making it effectively read-only. Use this for storage servers which are
14204- being decommissioned: the storage/ directory could be mounted read-only,
14205- while shares are moved to other servers. Note that this currently only
14206- affects immutable shares. Mutable shares (used for directories) will be
14207- written and modified anyway. See ticket #390 for the current status of this
14208- bug. The default value is False.
14209-
14210-reserved_space = (str, optional)
14211-
14212- If provided, this value defines how much disk space is reserved: the storage
14213- server will not accept any share which causes the amount of free disk space
14214- to drop below this value. (The free space is measured by a call to statvfs(2)
14215- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
14216- user account under which the storage server runs.)
14217-
14218- This string contains a number, with an optional case-insensitive scale
14219- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
14220- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
14221- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
14222-
14223-expire.enabled =
14224-expire.mode =
14225-expire.override_lease_duration =
14226-expire.cutoff_date =
14227-expire.immutable =
14228-expire.mutable =
14229-
14230- These settings control garbage-collection, in which the server will delete
14231- shares that no longer have an up-to-date lease on them. Please see the
14232- neighboring "garbage-collection.txt" document for full details.
14233-
14234-
14235-== Running A Helper ==
14236+Running A Helper
14237+================
14238 
14239 A "helper" is a regular client node that also offers the "upload helper"
14240 service.
14241}
14242[mutable/layout: remove references to the salt hash tree.
14243Kevan Carstensen <kevan@isnotajoke.com>**20110228010637
14244 Ignore-this: b3b2963ba4d0b42c78b6bba219d4deb5
14245] {
14246hunk ./src/allmydata/mutable/layout.py 577
14247     # 99          8           The offset of the EOF
14248     #
14249     # followed by salts and share data, the encrypted private key, the
14250-    # block hash tree, the salt hash tree, the share hash chain, a
14251-    # signature over the first eight fields, and a verification key.
14252+    # block hash tree, the share hash chain, a signature over the first
14253+    # eight fields, and a verification key.
14254     #
14255     # The checkstring is the first three fields -- the version number,
14256     # sequence number, root hash and root salt hash. This is consistent
14257hunk ./src/allmydata/mutable/layout.py 628
14258     #      calculate the offset for the share hash chain, and fill that
14259     #      into the offsets table.
14260     #
14261-    #   4: At the same time, we're in a position to upload the salt hash
14262-    #      tree. This is a Merkle tree over all of the salts. We use a
14263-    #      Merkle tree so that we can validate each block,salt pair as
14264-    #      we download them later. We do this using
14265-    #
14266-    #        put_salthashes(salt_hash_tree)
14267-    #
14268-    #      When you do this, I automatically put the root of the tree
14269-    #      (the hash at index 0 of the list) in its appropriate slot in
14270-    #      the signed prefix of the share.
14271-    #
14272-    #   5: We're now in a position to upload the share hash chain for
14273+    #   4: We're now in a position to upload the share hash chain for
14274     #      a share. Do that with something like:
14275     #     
14276     #        put_sharehashes(share_hash_chain)
14277hunk ./src/allmydata/mutable/layout.py 639
14278     #      The root of this tree will be put explicitly in the next
14279     #      step.
14280     #
14281-    #      TODO: Why? Why not just include it in the tree here?
14282-    #
14283-    #   6: Before putting the signature, we must first put the
14284+    #   5: Before putting the signature, we must first put the
14285     #      root_hash. Do this with:
14286     #
14287     #        put_root_hash(root_hash).
14288hunk ./src/allmydata/mutable/layout.py 872
14289             raise LayoutInvalid("I was given the wrong size block to write")
14290 
14291         # We want to write at len(MDMFHEADER) + segnum * block_size.
14292-
14293         offset = MDMFHEADERSIZE + (self._actual_block_size * segnum)
14294         data = salt + data
14295 
14296hunk ./src/allmydata/mutable/layout.py 889
14297         # tree is written, since that could cause the private key to run
14298         # into the block hash tree. Before it writes the block hash
14299         # tree, the block hash tree writing method writes the offset of
14300-        # the salt hash tree. So that's a good indicator of whether or
14301+        # the share hash chain. So that's a good indicator of whether or
14302         # not the block hash tree has been written.
14303         if "share_hash_chain" in self._offsets:
14304             raise LayoutInvalid("You must write this before the block hash tree")
14305hunk ./src/allmydata/mutable/layout.py 907
14306         The encrypted private key must be queued before the block hash
14307         tree, since we need to know how large it is to know where the
14308         block hash tree should go. The block hash tree must be put
14309-        before the salt hash tree, since its size determines the
14310+        before the share hash chain, since its size determines the
14311         offset of the share hash chain.
14312         """
14313         assert self._offsets
14314hunk ./src/allmydata/mutable/layout.py 932
14315         I queue a write vector to put the share hash chain in my
14316         argument onto the remote server.
14317 
14318-        The salt hash tree must be queued before the share hash chain,
14319-        since we need to know where the salt hash tree ends before we
14320+        The block hash tree must be queued before the share hash chain,
14321+        since we need to know where the block hash tree ends before we
14322         can know where the share hash chain starts. The share hash chain
14323         must be put before the signature, since the length of the packed
14324         share hash chain determines the offset of the signature. Also,
14325hunk ./src/allmydata/mutable/layout.py 937
14326-        semantically, you must know what the root of the salt hash tree
14327+        semantically, you must know what the root of the block hash tree
14328         is before you can generate a valid signature.
14329         """
14330         assert isinstance(sharehashes, dict)
14331hunk ./src/allmydata/mutable/layout.py 942
14332         if "share_hash_chain" not in self._offsets:
14333-            raise LayoutInvalid("You need to put the salt hash tree before "
14334+            raise LayoutInvalid("You need to put the block hash tree before "
14335                                 "you can put the share hash chain")
14336         # The signature comes after the share hash chain. If the
14337         # signature has already been written, we must not write another
14338}
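The renumbered comments above describe a strict ordering for the writer's put_* calls. As a reading aid, here is a minimal sketch of that ordering; apart from put_sharehashes and put_root_hash, which appear verbatim in the comments, the method and argument names are assumptions about the proxy's API rather than quotations from the patch:

    # Sketch of the write ordering described in the layout.py comments
    # above (names other than put_sharehashes/put_root_hash are assumed).
    def write_share(writer, blocks_and_salts, encprivkey,
                    block_hash_tree, share_hash_chain, root_hash, signature):
        # 1: upload each block together with its salt
        for segnum, (block, salt) in enumerate(blocks_and_salts):
            writer.put_block(block, segnum, salt)
        # 2: the encrypted private key; its size fixes where the block
        #    hash tree can start
        writer.put_encprivkey(encprivkey)
        # 3: the block hash tree; writing it also records the offset of
        #    the share hash chain
        writer.put_blockhashes(block_hash_tree)
        # 4: the share hash chain (its root is written separately in step 5)
        writer.put_sharehashes(share_hash_chain)
        # 5: the root hash, which must be in place before the signature
        writer.put_root_hash(root_hash)
        writer.put_signature(signature)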
14339[test_mutable.py: add test to exercise fencepost bug
14340warner@lothar.com**20110228021056
14341 Ignore-this: d2f9cf237ce6db42fb250c8ad71a4fc3
14342] {
14343hunk ./src/allmydata/test/test_mutable.py 2
14344 
14345-import os
14346+import os, re
14347 from cStringIO import StringIO
14348 from twisted.trial import unittest
14349 from twisted.internet import defer, reactor
14350hunk ./src/allmydata/test/test_mutable.py 2931
14351         self.set_up_grid()
14352         self.c = self.g.clients[0]
14353         self.nm = self.c.nodemaker
14354-        self.data = "test data" * 100000 # about 900 KiB; MDMF
14355+        self.data = "testdata " * 100000 # about 900 KiB; MDMF
14356         self.small_data = "test data" * 10 # about 90 B; SDMF
14357         return self.do_upload()
14358 
14359hunk ./src/allmydata/test/test_mutable.py 2981
14360             self.failUnlessEqual(results, new_data))
14361         return d
14362 
14363+    def test_replace_segstart1(self):
14364+        offset = 128*1024+1
14365+        new_data = "NNNN"
14366+        expected = self.data[:offset]+new_data+self.data[offset+4:]
14367+        d = self.mdmf_node.get_best_mutable_version()
14368+        d.addCallback(lambda mv:
14369+            mv.update(MutableData(new_data), offset))
14370+        d.addCallback(lambda ignored:
14371+            self.mdmf_node.download_best_version())
14372+        def _check(results):
14373+            if results != expected:
14374+                print
14375+                print "got: %s ... %s" % (results[:20], results[-20:])
14376+                print "exp: %s ... %s" % (expected[:20], expected[-20:])
14377+                self.fail("results != expected")
14378+        d.addCallback(_check)
14379+        return d
14380+
14381+    def _check_differences(self, got, expected):
14382+        # displaying arbitrary file corruption is tricky for a
14383+        # 1MB file of repeating data,, so look for likely places
14384+        # with problems and display them separately
14385+        gotmods = [mo.span() for mo in re.finditer('([A-Z]+)', got)]
14386+        expmods = [mo.span() for mo in re.finditer('([A-Z]+)', expected)]
14387+        gotspans = ["%d:%d=%s" % (start,end,got[start:end])
14388+                    for (start,end) in gotmods]
14389+        expspans = ["%d:%d=%s" % (start,end,expected[start:end])
14390+                    for (start,end) in expmods]
14391+        #print "expecting: %s" % expspans
14392+
14393+        SEGSIZE = 128*1024
14394+        if got != expected:
14395+            print "differences:"
14396+            for segnum in range(len(expected)//SEGSIZE):
14397+                start = segnum * SEGSIZE
14398+                end = (segnum+1) * SEGSIZE
14399+                got_ends = "%s .. %s" % (got[start:start+20], got[end-20:end])
14400+                exp_ends = "%s .. %s" % (expected[start:start+20], expected[end-20:end])
14401+                if got_ends != exp_ends:
14402+                    print "expected[%d]: %s" % (start, exp_ends)
14403+                    print "got     [%d]: %s" % (start, got_ends)
14404+            if expspans != gotspans:
14405+                print "expected: %s" % expspans
14406+                print "got     : %s" % gotspans
14407+            open("EXPECTED","wb").write(expected)
14408+            open("GOT","wb").write(got)
14409+            print "wrote data to EXPECTED and GOT"
14410+            self.fail("didn't get expected data")
14411+
14412+
14413+    def test_replace_locations(self):
14414+        # exercise fencepost conditions
14415+        expected = self.data
14416+        SEGSIZE = 128*1024
14417+        suspects = range(SEGSIZE-3, SEGSIZE+1)+range(2*SEGSIZE-3, 2*SEGSIZE+1)
14418+        letters = iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
14419+        d = defer.succeed(None)
14420+        for offset in suspects:
14421+            new_data = letters.next()*2 # "AA", then "BB", etc
14422+            expected = expected[:offset]+new_data+expected[offset+2:]
14423+            d.addCallback(lambda ign:
14424+                          self.mdmf_node.get_best_mutable_version())
14425+            def _modify(mv, offset=offset, new_data=new_data):
14426+                # close over 'offset','new_data'
14427+                md = MutableData(new_data)
14428+                return mv.update(md, offset)
14429+            d.addCallback(_modify)
14430+            d.addCallback(lambda ignored:
14431+                          self.mdmf_node.download_best_version())
14432+            d.addCallback(self._check_differences, expected)
14433+        return d
14434+
14435 
14436     def test_replace_and_extend(self):
14437         # We should be able to replace data in the middle of a mutable
14438}
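The suspects list in test_replace_locations above is compact; expanded into the 0-based byte offsets it actually produces, it covers exactly the boundary cases the next patch fixes:

    SEGSIZE = 128*1024
    suspects = range(SEGSIZE-3, SEGSIZE+1) + range(2*SEGSIZE-3, 2*SEGSIZE+1)
    # == [131069, 131070, 131071, 131072, 262141, 262142, 262143, 262144]
    # i.e. the last three offsets inside segments 0 and 1, plus the offsets
    # that land exactly on the first and second segment boundaries.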
14439[mutable/publish: account for offsets on segment boundaries.
14440Kevan Carstensen <kevan@isnotajoke.com>**20110228083327
14441 Ignore-this: c8758a0580fcc15a22c2f8582d758a6b
14442] {
14443hunk ./src/allmydata/mutable/filenode.py 17
14444 from pycryptopp.cipher.aes import AES
14445 
14446 from allmydata.mutable.publish import Publish, MutableData,\
14447-                                      DEFAULT_MAX_SEGMENT_SIZE, \
14448                                       TransformingUploadable
14449 from allmydata.mutable.common import MODE_READ, MODE_WRITE, MODE_CHECK, UnrecoverableFileError, \
14450      ResponseCache, UncoordinatedWriteError
14451hunk ./src/allmydata/mutable/filenode.py 1058
14452         # appending data to the file.
14453         assert offset <= self.get_size()
14454 
14455+        segsize = self._version[3]
14456         # We'll need the segment that the data starts in, regardless of
14457         # what we'll do later.
14458hunk ./src/allmydata/mutable/filenode.py 1061
14459-        start_segment = mathutil.div_ceil(offset, DEFAULT_MAX_SEGMENT_SIZE)
14460+        start_segment = mathutil.div_ceil(offset, segsize)
14461         start_segment -= 1
14462 
14463         # We only need the end segment if the data we append does not go
14464hunk ./src/allmydata/mutable/filenode.py 1069
14465         end_segment = start_segment
14466         if offset + data.get_size() < self.get_size():
14467             end_data = offset + data.get_size()
14468-            end_segment = mathutil.div_ceil(end_data, DEFAULT_MAX_SEGMENT_SIZE)
14469+            end_segment = mathutil.div_ceil(end_data, segsize)
14470             end_segment -= 1
14471         self._start_segment = start_segment
14472         self._end_segment = end_segment
14473hunk ./src/allmydata/mutable/publish.py 551
14474                                                   segment_size)
14475             self.starting_segment = mathutil.div_ceil(offset,
14476                                                       segment_size)
14477-            self.starting_segment -= 1
14478+            if offset % segment_size != 0:
14479+                self.starting_segment -= 1
14480             if offset == 0:
14481                 self.starting_segment = 0
14482 
14483}
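The one-line change to publish.py above is the actual fencepost fix. A self-contained sketch of the old and new starting-segment computations (div_ceil is reimplemented locally here; the real code uses allmydata.util.mathutil.div_ceil) shows what went wrong for offsets that land exactly on a segment boundary:

    SEGSIZE = 128*1024

    def div_ceil(n, d):
        # local stand-in for allmydata.util.mathutil.div_ceil
        return (n + d - 1) // d

    def old_starting_segment(offset):
        if offset == 0:
            return 0
        return div_ceil(offset, SEGSIZE) - 1      # unconditional decrement

    def new_starting_segment(offset):
        if offset == 0:
            return 0
        seg = div_ceil(offset, SEGSIZE)
        if offset % SEGSIZE != 0:                 # only decrement mid-segment offsets
            seg -= 1
        return seg

    # An update starting exactly on a segment boundary lives in segment
    # offset // SEGSIZE:
    #   old_starting_segment(SEGSIZE)   == 0   # off by one: should be segment 1
    #   new_starting_segment(SEGSIZE)   == 1   # correct
    #   old_starting_segment(SEGSIZE-1) == 0   # mid-segment offsets were already right
    #   new_starting_segment(SEGSIZE-1) == 0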
14484[tahoe-put: raise UsageError when given a nonsensical mutable type, move option validation code to the option parser.
14485Kevan Carstensen <kevan@isnotajoke.com>**20110301030807
14486 Ignore-this: 2dc19d8bd741842eff458ca553d0bf2a
14487] {
14488hunk ./src/allmydata/scripts/cli.py 179
14489         if self.from_file == u"-":
14490             self.from_file = None
14491 
14492+        if self['mutable-type'] and self['mutable-type'] not in ("sdmf", "mdmf"):
14493+            raise usage.UsageError("%s is an invalid format" % self['mutable-type'])
14494+
14495+
14496     def getSynopsis(self):
14497         return "Usage:  %s put LOCAL_FILE REMOTE_FILE" % (os.path.basename(sys.argv[0]),)
14498 
14499hunk ./src/allmydata/scripts/tahoe_put.py 33
14500     stdout = options.stdout
14501     stderr = options.stderr
14502 
14503-    if mutable_type and mutable_type not in ('sdmf', 'mdmf'):
14504-        # Don't try to pass unsupported types to the webapi
14505-        print >>stderr, "error: %s is an invalid format" % mutable_type
14506-        return 1
14507-
14508     if nodeurl[-1] != "/":
14509         nodeurl += "/"
14510     if to_file:
14511hunk ./src/allmydata/test/test_cli.py 1008
14512         return d
14513 
14514     def test_mutable_type_invalid_format(self):
14515-        self.basedir = "cli/Put/mutable_type_invalid_format"
14516-        self.set_up_grid()
14517-        data = "data" * 100000
14518-        fn1 = os.path.join(self.basedir, "data")
14519-        fileutil.write(fn1, data)
14520-        d = self.do_cli("put", "--mutable", "--mutable-type=ldmf", fn1)
14521-        def _check_failure((rc, out, err)):
14522-            self.failIfEqual(rc, 0)
14523-            self.failUnlessIn("invalid", err)
14524-        d.addCallback(_check_failure)
14525-        return d
14526+        o = cli.PutOptions()
14527+        self.failUnlessRaises(usage.UsageError,
14528+                              o.parseOptions,
14529+                              ["--mutable", "--mutable-type=ldmf"])
14530 
14531     def test_put_with_nonexistent_alias(self):
14532         # when invoked with an alias that doesn't exist, 'tahoe put'
14533}
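Because the check now lives in PutOptions, a bad --mutable-type is rejected during option parsing, before any request reaches the node. A rough interactive sketch, mirroring the updated test (exact error formatting aside):

    from twisted.python import usage
    from allmydata.scripts import cli

    o = cli.PutOptions()
    try:
        o.parseOptions(["--mutable", "--mutable-type=ldmf"])
    except usage.UsageError, e:
        print e     # "ldmf is an invalid format"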
14534
14535Context:
14536
14537[docs/configuration.rst: add a "Frontend Configuration" section
14538Brian Warner <warner@lothar.com>**20110222014323
14539 Ignore-this: 657018aa501fe4f0efef9851628444ca
14540 
14541 this points to docs/frontends/*.rst, which were previously underlinked
14542] 
14543[web/filenode.py: avoid calling req.finish() on closed HTTP connections. Closes #1366
14544"Brian Warner <warner@lothar.com>"**20110221061544
14545 Ignore-this: 799d4de19933f2309b3c0c19a63bb888
14546] 
14547[Add unit tests for cross_check_pkg_resources_versus_import, and a regression test for ref #1355. This requires a little refactoring to make it testable.
14548david-sarah@jacaranda.org**20110221015817
14549 Ignore-this: 51d181698f8c20d3aca58b057e9c475a
14550] 
14551[allmydata/__init__.py: .name was used in place of the correct .__name__ when printing an exception. Also, robustify string formatting by using %r instead of %s in some places. fixes #1355.
14552david-sarah@jacaranda.org**20110221020125
14553 Ignore-this: b0744ed58f161bf188e037bad077fc48
14554] 
14555[Refactor StorageFarmBroker handling of servers
14556Brian Warner <warner@lothar.com>**20110221015804
14557 Ignore-this: 842144ed92f5717699b8f580eab32a51
14558 
14559 Pass around IServer instance instead of (peerid, rref) tuple. Replace
14560 "descriptor" with "server". Other replacements:
14561 
14562  get_all_servers -> get_connected_servers/get_known_servers
14563  get_servers_for_index -> get_servers_for_psi (now returns IServers)
14564 
14565 This change still needs to be pushed further down: lots of code is now
14566 getting the IServer and then distributing (peerid, rref) internally.
14567 Instead, it ought to distribute the IServer internally and delay
14568 extracting a serverid or rref until the last moment.
14569 
14570 no_network.py was updated to retain parallelism.
14571] 
14572[TAG allmydata-tahoe-1.8.2
14573warner@lothar.com**20110131020101] 
14574Patch bundle hash:
145751bdecf8eaa7e7ce10d7cfd2a11841f1872af8cf5