Ticket #393: 393status37.dpatch

File 393status37.dpatch, 569.7 KB (added by kevan at 2011-02-26T07:21:35Z)
Mon Aug  9 16:32:44 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * interfaces.py: Add #993 interfaces

Mon Aug  9 16:35:35 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * frontends/sftpd.py: Modify the sftp frontend to work with the MDMF changes

Mon Aug  9 17:06:19 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * immutable/filenode.py: Make the immutable file node implement the same interfaces as the mutable one

Mon Aug  9 17:06:33 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * immutable/literal.py: implement the same interfaces as other filenodes

Fri Aug 13 16:49:57 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * scripts: tell 'tahoe put' about MDMF
Sat Aug 14 01:10:12 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * web: Alter the webapi to get along with and take advantage of the MDMF changes

  The main benefit that the webapi gets from MDMF, at least initially, is
  the ability to do a streaming download of an MDMF mutable file. It also
  exposes a way (through the PUT verb) to append to or otherwise modify
  (in-place) an MDMF mutable file.
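
  As a rough illustration of the append path described above (a sketch only;
  it assumes a gateway listening on 127.0.0.1:3456 and a placeholder
  writecap, but the query parameter is the one this patch adds):

    import httplib, urllib

    def append_via_webapi(writecap, data, offset):
        # PUT /uri/<cap>?offset=N writes data in place starting at offset;
        # to append, pass the file's current size as the offset.
        conn = httplib.HTTPConnection("127.0.0.1", 3456)
        path = "/uri/%s?offset=%d" % (urllib.quote(writecap), offset)
        conn.request("PUT", path, data)
        return conn.getresponse()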

Sat Aug 14 15:57:11 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * client.py: learn how to create different kinds of mutable files
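
  A minimal sketch of the new keyword argument (the version constants come
  from allmydata.interfaces; `client` is assumed to be a running Client
  instance):

    from allmydata.interfaces import MDMF_VERSION

    # Returns a Deferred that fires with the new mutable filenode.
    d = client.create_mutable_file("initial contents", version=MDMF_VERSION)
    d.addCallback(lambda node: node.get_uri())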

Wed Aug 18 17:32:16 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * mutable/checker.py and mutable/repair.py: Modify checker and repairer to work with MDMF

  The checker and repairer required minimal changes to work with the MDMF
  modifications made elsewhere. The checker duplicated a lot of the code
  that was already in the downloader, so I modified the downloader
  slightly to expose this functionality to the checker and removed the
  duplicated code. The repairer only required a minor change to deal with
  data representation.

Wed Aug 18 17:32:31 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * mutable/filenode.py: add versions and partial-file updates to the mutable file node

  One of the goals of MDMF as a GSoC project is to lay the groundwork for
  LDMF, a format that will allow Tahoe-LAFS to deal with and encourage
  multiple versions of a single cap on the grid. In line with this, there
  is now a distinction between an overriding mutable file (which can be
  thought of as corresponding to the cap/unique identifier for that
  mutable file) and versions of the mutable file (which we can download,
  update, and so on). All download, upload, and modification operations
  end up happening on a particular version of a mutable file, but there
  are shortcut methods on the object representing the overriding mutable
  file that perform these operations on the best version of the mutable
  file (which is what code should be doing until we have LDMF and better
  support for other paradigms).

  Another goal of MDMF was to take advantage of segmentation to give
  callers more efficient partial-file updates or appends. This patch
  implements methods that do that, too.

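  A sketch of the version-oriented API this entry describes (method names
  from the patch; `node` is assumed to be an IMutableFileNode and
  `uploadable` an IMutableUploadable):

    def append(node, uploadable):
        # Fetch the best recoverable version, then write at its current
        # end -- update() appends when the offset equals the file size.
        d = node.get_best_mutable_version()
        d.addCallback(lambda version:
            version.update(uploadable, version.get_size()))
        return d
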
Wed Aug 18 17:33:42 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * mutable/publish.py: Modify the publish process to support MDMF

  The inner workings of the publishing process needed to be reworked to a
  large extent to cope with segmented mutable files, and to cope with
  partial-file updates of mutable files. This patch does that. It also
  introduces wrappers for uploadable data, allowing the use of
  filehandle-like objects as data sources, in addition to strings. This
  reduces memory usage when dealing with large files through the webapi,
  and clarifies the update code there.

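  The filehandle wrapper mentioned above can be used roughly like this (a
  sketch; MutableFileHandle is the wrapper this patch introduces, and
  `node` is assumed to be a writeable mutable filenode):

    from allmydata.mutable.publish import MutableFileHandle

    # Wrap an open filehandle so the publisher can read it in segments
    # instead of holding the whole file in memory as one string.
    f = open("big-file.bin", "rb")
    d = node.overwrite(MutableFileHandle(f))
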
Wed Aug 18 17:35:09 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * nodemaker.py: Make nodemaker expose a way to create MDMF files

Sat Aug 14 15:56:44 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * docs: update docs to mention MDMF

Wed Aug 18 17:33:04 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * mutable/layout.py and interfaces.py: add MDMF writer and reader

  The MDMF writer is responsible for keeping state as plaintext is
  gradually processed into share data by the upload process. When the
  upload finishes, it will write all of its share data to a remote server,
  reporting its status back to the publisher.

  The MDMF reader is responsible for abstracting an MDMF file as it sits
  on the grid from the downloader; specifically, by receiving and
  responding to requests for arbitrary data within the MDMF file.

  The interfaces.py file has also been modified to contain an interface
  for the writer.

Wed Aug 18 17:34:09 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * mutable/retrieve.py: Modify the retrieval process to support MDMF

  The logic behind a mutable file download had to be adapted to work with
  segmented mutable files; this patch performs those adaptations. It also
  exposes some decoding and decrypting functionality to make partial-file
  updates a little easier, and supports efficient random-access downloads
  of parts of an MDMF file.

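  For example, random-access reads go through IReadable.read(); a small
  download-to-memory helper might look like this (a sketch built on the
  MemoryConsumer from allmydata.util.consumer, assuming it exposes its
  collected chunks as a `chunks` attribute):

    from allmydata.util.consumer import MemoryConsumer

    def read_range(readable, offset, size):
        # readable: an IReadable (e.g. a mutable file version). Returns a
        # Deferred that fires with `size` bytes starting at `offset`.
        d = readable.read(MemoryConsumer(), offset, size)
        d.addCallback(lambda mc: "".join(mc.chunks))
        return d
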
Wed Aug 18 17:34:39 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * mutable/servermap.py: Alter the servermap updater to work with MDMF files

  These modifications mainly serve to have the servermap updater use the
  unified MDMF + SDMF read interface whenever possible -- this reduces
  the complexity of the code, making it easier to read and maintain. To
  do this, I needed to modify the process of updating the servermap a
  little bit.

  To support partial-file updates, I also modified the servermap updater
  to fetch the block hash trees and certain segments of files while it
  performed a servermap update (this can be done without adding any new
  roundtrips because of the batch-read functionality that the read proxy
  provides).

Wed Aug 18 17:35:31 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
  * tests:

      - A lot of existing tests relied on aspects of the mutable file
        implementation that were changed. This patch updates those tests
        to work with the changes.
      - This patch also adds tests for new features.

Sun Feb 20 15:02:01 PST 2011  "Brian Warner <warner@lothar.com>"
  * resolve conflicts between 393-MDMF patches and trunk as of 1.8.2

Sun Feb 20 17:46:59 PST 2011  "Brian Warner <warner@lothar.com>"
  * mutable/filenode.py: fix create_mutable_file('string')

Sun Feb 20 21:56:00 PST 2011  "Brian Warner <warner@lothar.com>"
  * resolve more conflicts with current trunk

Sun Feb 20 22:10:04 PST 2011  "Brian Warner <warner@lothar.com>"
  * update MDMF code with StorageFarmBroker changes

Tue Feb 22 16:00:44 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
  * web: Use the string "replace" to trigger whole-file replacement when processing an offset parameter.

Fri Feb 25 17:04:33 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
  * mutable/filenode: Clean up servermap handling in MutableFileVersion

  We want to update the servermap before attempting to modify a file,
  which we now do. This introduced code duplication, which was addressed
  by refactoring the servermap update into its own method, and then
  eliminating duplicate servermap updates throughout the
  MutableFileVersion.

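  The modify() flow that this cleanup touches looks roughly like the
  following (argument order per the implementation; treat the exact
  signature as an assumption):

    def add_line(old_contents, servermap, first_time):
        # Must be idempotent: modify() may invoke it more than once.
        if old_contents.endswith("one more line\n"):
            return None        # change already present; nothing to do
        return old_contents + "one more line\n"

    d = node.modify(add_line)
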
New patches:

[interfaces.py: Add #993 interfaces
Kevan Carstensen <kevan@isnotajoke.com>**20100809233244
 Ignore-this: b58621ac5cc86f1b4b4149f9e6c6a1ce
] {
hunk ./src/allmydata/interfaces.py 499
 class MustNotBeUnknownRWError(CapConstraintError):
     """Cannot add an unknown child cap specified in a rw_uri field."""
 
+
+class IReadable(Interface):
+    """I represent a readable object -- either an immutable file, or a
+    specific version of a mutable file.
+    """
+
+    def is_readonly():
+        """Return True if this reference provides read-only access to the
+        given file or directory (i.e. if you cannot modify it), or False if
+        it provides read-write access. Note that even if this reference is
+        read-only, someone else may hold a read-write reference to it.
+
+        For an IReadable returned by get_best_readable_version(), this will
+        always return True, but for instances of subinterfaces such as
+        IMutableFileVersion, it may return False."""
+
+    def is_mutable():
+        """Return True if this file or directory is mutable (by *somebody*,
+        not necessarily you), False if it is immutable. Note that a file
+        might be mutable overall, but your reference to it might be
+        read-only. On the other hand, all references to an immutable file
+        will be read-only; there are no read-write references to an immutable
+        file."""
+
+    def get_storage_index():
+        """Return the storage index of the file."""
+
+    def get_size():
+        """Return the length (in bytes) of this readable object."""
+
+    def download_to_data():
+        """Download all of the file contents. I return a Deferred that fires
+        with the contents as a byte string."""
+
+    def read(consumer, offset=0, size=None):
+        """Download a portion (possibly all) of the file's contents, making
+        them available to the given IConsumer. Return a Deferred that fires
+        (with the consumer) when the consumer is unregistered (either because
+        the last byte has been given to it, or because the consumer threw an
+        exception during write(), possibly because it no longer wants to
+        receive data). The portion downloaded will start at 'offset' and
+        contain 'size' bytes (or the remainder of the file if size==None).
+
+        The consumer will be used in non-streaming mode: an IPullProducer
+        will be attached to it.
+
+        The consumer will not receive data right away: several network trips
+        must occur first. The order of events will be::
+
+         consumer.registerProducer(p, streaming)
+          (if streaming == False)::
+           consumer does p.resumeProducing()
+            consumer.write(data)
+           consumer does p.resumeProducing()
+            consumer.write(data).. (repeat until all data is written)
+         consumer.unregisterProducer()
+         deferred.callback(consumer)
+
+        If a download error occurs, or an exception is raised by
+        consumer.registerProducer() or consumer.write(), I will call
+        consumer.unregisterProducer() and then deliver the exception via
+        deferred.errback(). To cancel the download, the consumer should call
+        p.stopProducing(), which will result in an exception being delivered
+        via deferred.errback().
+
+        See src/allmydata/util/consumer.py for an example of a simple
+        download-to-memory consumer.
+        """
+
+
+class IWritable(Interface):
+    """
+    I define methods that callers can use to update SDMF and MDMF
+    mutable files on a Tahoe-LAFS grid.
+    """
+    # XXX: For the moment, we have only this. It is possible that we
+    #      want to move overwrite() and modify() in here too.
+    def update(data, offset):
+        """
+        I write the data from my data argument to the MDMF file,
+        starting at offset. I continue writing data until my data
+        argument is exhausted, appending data to the file as necessary.
+        """
+        # assert IMutableUploadable.providedBy(data)
+        # to append data: offset=node.get_size_of_best_version()
+        # do we want to support compacting MDMF?
+        # for an MDMF file, this can be done with O(data.get_size())
+        # memory. For an SDMF file, any modification takes
+        # O(node.get_size_of_best_version()).
+
+
+class IMutableFileVersion(IReadable):
+    """I provide access to a particular version of a mutable file. The
+    access is read/write if I was obtained from a filenode derived from
+    a write cap, or read-only if the filenode was derived from a read cap.
+    """
+
+    def get_sequence_number():
+        """Return the sequence number of this version."""
+
+    def get_servermap():
+        """Return the IMutableFileServerMap instance that was used to create
+        this object.
+        """
+
+    def get_writekey():
+        """Return this filenode's writekey, or None if the node does not have
+        write-capability. This may be used to assist with data structures
+        that need to make certain data available only to writers, such as the
+        read-write child caps in dirnodes. The recommended process is to have
+        reader-visible data be submitted to the filenode in the clear (where
+        it will be encrypted by the filenode using the readkey), but encrypt
+        writer-visible data using this writekey.
+        """
+
+    # TODO: Can this be overwrite instead of replace?
+    def replace(new_contents):
+        """Replace the contents of the mutable file, provided that no other
+        node has published (or is attempting to publish, concurrently) a
+        newer version of the file than this one.
+
+        I will avoid modifying any share that is different than the version
+        given by get_sequence_number(). However, if another node is writing
+        to the file at the same time as me, I may manage to update some shares
+        while they update others. If I see any evidence of this, I will signal
+        UncoordinatedWriteError, and the file will be left in an inconsistent
+        state (possibly the version you provided, possibly the old version,
+        possibly somebody else's version, and possibly a mix of shares from
+        all of these).
+
+        The recommended response to UncoordinatedWriteError is to either
+        return it to the caller (since they failed to coordinate their
+        writes), or to attempt some sort of recovery. It may be sufficient to
+        wait a random interval (with exponential backoff) and repeat your
+        operation. If I do not signal UncoordinatedWriteError, then I was
+        able to write the new version without incident.
+
+        I return a Deferred that fires (with a PublishStatus object) when the
+        update has completed.
+        """
+
+    def modify(modifier_cb):
+        """Modify the contents of the file, by downloading this version,
+        applying the modifier function (or bound method), then uploading
+        the new version. This will succeed as long as no other node
+        publishes a version between the download and the upload.
+        I return a Deferred that fires (with a PublishStatus object) when
+        the update is complete.
+
+        The modifier callable will be given three arguments: a string (with
+        the old contents), a 'first_time' boolean, and a servermap. As with
+        download_to_data(), the old contents will be from this version,
+        but the modifier can use the servermap to make other decisions
+        (such as refusing to apply the delta if there are multiple parallel
+        versions, or if there is evidence of a newer unrecoverable version).
+        'first_time' will be True the first time the modifier is called,
+        and False on any subsequent calls.
+
+        The callable should return a string with the new contents. The
+        callable must be prepared to be called multiple times, and must
+        examine the input string to see if the change that it wants to make
+        is already present in the old version. If it does not need to make
+        any changes, it can either return None, or return its input string.
+
+        If the modifier raises an exception, it will be returned in the
+        errback.
+        """
+
+
 # The hierarchy looks like this:
 #  IFilesystemNode
 #   IFileNode
hunk ./src/allmydata/interfaces.py 758
     def raise_error():
         """Raise any error associated with this node."""
 
+    # XXX: These may not be appropriate outside the context of an IReadable.
     def get_size():
         """Return the length (in bytes) of the data this node represents. For
         directory nodes, I return the size of the backing store. I return
hunk ./src/allmydata/interfaces.py 775
 class IFileNode(IFilesystemNode):
     """I am a node which represents a file: a sequence of bytes. I am not a
     container, like IDirectoryNode."""
+    def get_best_readable_version():
+        """Return a Deferred that fires with an IReadable for the 'best'
+        available version of the file. The IReadable provides only read
+        access, even if this filenode was derived from a write cap.
 
hunk ./src/allmydata/interfaces.py 780
-class IImmutableFileNode(IFileNode):
-    def read(consumer, offset=0, size=None):
-        """Download a portion (possibly all) of the file's contents, making
-        them available to the given IConsumer. Return a Deferred that fires
-        (with the consumer) when the consumer is unregistered (either because
-        the last byte has been given to it, or because the consumer threw an
-        exception during write(), possibly because it no longer wants to
-        receive data). The portion downloaded will start at 'offset' and
-        contain 'size' bytes (or the remainder of the file if size==None).
-
-        The consumer will be used in non-streaming mode: an IPullProducer
-        will be attached to it.
+        For an immutable file, there is only one version. For a mutable
+        file, the 'best' version is the recoverable version with the
+        highest sequence number. If no uncoordinated writes have occurred,
+        and if enough shares are available, then this will be the most
+        recent version that has been uploaded. If no version is recoverable,
+        the Deferred will errback with an UnrecoverableFileError.
+        """
 
hunk ./src/allmydata/interfaces.py 788
-        The consumer will not receive data right away: several network trips
-        must occur first. The order of events will be::
+    def download_best_version():
+        """Download the contents of the version that would be returned
+        by get_best_readable_version(). This is equivalent to calling
+        download_to_data() on the IReadable given by that method.
 
hunk ./src/allmydata/interfaces.py 793
-         consumer.registerProducer(p, streaming)
-          (if streaming == False)::
-           consumer does p.resumeProducing()
-            consumer.write(data)
-           consumer does p.resumeProducing()
-            consumer.write(data).. (repeat until all data is written)
-         consumer.unregisterProducer()
-         deferred.callback(consumer)
+        I return a Deferred that fires with a byte string when the file
+        has been fully downloaded. To support streaming download, use
+        the 'read' method of IReadable. If no version is recoverable,
+        the Deferred will errback with an UnrecoverableFileError.
+        """
 
hunk ./src/allmydata/interfaces.py 799
-        If a download error occurs, or an exception is raised by
-        consumer.registerProducer() or consumer.write(), I will call
-        consumer.unregisterProducer() and then deliver the exception via
-        deferred.errback(). To cancel the download, the consumer should call
-        p.stopProducing(), which will result in an exception being delivered
-        via deferred.errback().
+    def get_size_of_best_version():
+        """Find the size of the version that would be returned by
+        get_best_readable_version().
 
hunk ./src/allmydata/interfaces.py 803
-        See src/allmydata/util/consumer.py for an example of a simple
-        download-to-memory consumer.
+        I return a Deferred that fires with an integer. If no version
+        is recoverable, the Deferred will errback with an
+        UnrecoverableFileError.
         """
 
hunk ./src/allmydata/interfaces.py 808
+
+class IImmutableFileNode(IFileNode, IReadable):
+    """I am a node representing an immutable file. Immutable files have
+    only one version"""
+
+
 class IMutableFileNode(IFileNode):
     """I provide access to a 'mutable file', which retains its identity
     regardless of what contents are put in it.
hunk ./src/allmydata/interfaces.py 873
     only be retrieved and updated all-at-once, as a single big string. Future
     versions of our mutable files will remove this restriction.
     """
-
-    def download_best_version():
-        """Download the 'best' available version of the file, meaning one of
-        the recoverable versions with the highest sequence number. If no
+    def get_best_mutable_version():
+        """Return a Deferred that fires with an IMutableFileVersion for
+        the 'best' available version of the file. The best version is
+        the recoverable version with the highest sequence number. If no
        uncoordinated writes have occurred, and if enough shares are
hunk ./src/allmydata/interfaces.py 878
-        available, then this will be the most recent version that has been
-        uploaded.
+        available, then this will be the most recent version that has
+        been uploaded.
 
hunk ./src/allmydata/interfaces.py 881
-        I update an internal servermap with MODE_READ, determine which
-        version of the file is indicated by
-        servermap.best_recoverable_version(), and return a Deferred that
-        fires with its contents. If no version is recoverable, the Deferred
-        will errback with UnrecoverableFileError.
-        """
-
-    def get_size_of_best_version():
-        """Find the size of the version that would be downloaded with
-        download_best_version(), without actually downloading the whole file.
-
-        I return a Deferred that fires with an integer.
+        If no version is recoverable, the Deferred will errback with an
+        UnrecoverableFileError.
         """
 
     def overwrite(new_contents):
hunk ./src/allmydata/interfaces.py 921
         errback.
         """
 
-
     def get_servermap(mode):
         """Return a Deferred that fires with an IMutableFileServerMap
         instance, updated using the given mode.
hunk ./src/allmydata/interfaces.py 974
         writer-visible data using this writekey.
         """
 
+    def set_version(version):
+        """Tahoe-LAFS supports SDMF and MDMF mutable files. By default,
+        we upload in SDMF for reasons of compatibility. If you want to
+        change this, set_version will let you do that.
+
+        To say that this file should be uploaded in SDMF, pass in a 0. To
+        say that the file should be uploaded as MDMF, pass in a 1.
+        """
+
+    def get_version():
+        """Returns the mutable file protocol version."""
+
 class NotEnoughSharesError(Exception):
     """Download was unable to get enough shares"""
 
hunk ./src/allmydata/interfaces.py 1822
         """The upload is finished, and whatever filehandle was in use may be
         closed."""
 
+
+class IMutableUploadable(Interface):
+    """
+    I represent content that is due to be uploaded to a mutable filecap.
+    """
+    # This is somewhat simpler than the IUploadable interface above
+    # because mutable files do not need to be concerned with possibly
+    # generating a CHK, nor with per-file keys. It is a subset of the
+    # methods in IUploadable, though, so we could just as well implement
+    # the mutable uploadables as IUploadables that don't happen to use
+    # those methods (with the understanding that the unused methods will
+    # never be called on such objects)
+    def get_size():
+        """
+        Returns a Deferred that fires with the size of the content held
+        by the uploadable.
+        """
+
+    def read(length):
+        """
+        Returns a list of strings which, when concatenated, are the next
+        length bytes of the file, or fewer if there are fewer bytes
+        between the current location and the end of the file.
+        """
+
+    def close():
+        """
+        The process that used the Uploadable is finished using it, so
+        the uploadable may be closed.
+        """
+
class IUploadResults(Interface):
    """I am returned by upload() methods. I contain a number of public
    attributes which can be read to determine the results of the upload. Some
}
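
To make the IMutableUploadable contract above concrete, a minimal in-memory
implementation might look like this (an illustrative sketch, not part of the
patch; it follows the docstrings just quoted):

    from twisted.internet import defer

    class StringUploadable:
        """Minimal in-memory IMutableUploadable."""
        def __init__(self, data):
            self._data = data
            self._pos = 0

        def get_size(self):
            # The interface promises a Deferred firing with the total size.
            return defer.succeed(len(self._data))

        def read(self, length):
            # Return a list of strings totalling at most `length` bytes.
            chunk = self._data[self._pos:self._pos + length]
            self._pos += length
            return [chunk]

        def close(self):
            pass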
[frontends/sftpd.py: Modify the sftp frontend to work with the MDMF changes
Kevan Carstensen <kevan@isnotajoke.com>**20100809233535
 Ignore-this: 2d25e2cfcd0d7bbcbba660c7e1da12f
] {
hunk ./src/allmydata/frontends/sftpd.py 33
 from allmydata.interfaces import IFileNode, IDirectoryNode, ExistingChildError, \
      NoSuchChildError, ChildOfWrongTypeError
 from allmydata.mutable.common import NotWriteableError
+from allmydata.mutable.publish import MutableFileHandle
 from allmydata.immutable.upload import FileHandle
 from allmydata.dirnode import update_metadata
 from allmydata.util.fileutil import EncryptedTemporaryFile
hunk ./src/allmydata/frontends/sftpd.py 667
         else:
             assert IFileNode.providedBy(filenode), filenode
 
-            if filenode.is_mutable():
-                self.async.addCallback(lambda ign: filenode.download_best_version())
-                def _downloaded(data):
-                    self.consumer = OverwriteableFileConsumer(len(data), tempfile_maker)
-                    self.consumer.write(data)
-                    self.consumer.finish()
-                    return None
-                self.async.addCallback(_downloaded)
-            else:
-                download_size = filenode.get_size()
-                assert download_size is not None, "download_size is None"
+            self.async.addCallback(lambda ignored: filenode.get_best_readable_version())
+
+            def _read(version):
+                if noisy: self.log("_read", level=NOISY)
+                download_size = version.get_size()
+                assert download_size is not None
+
                 self.consumer = OverwriteableFileConsumer(download_size, tempfile_maker)
hunk ./src/allmydata/frontends/sftpd.py 675
-                def _read(ign):
-                    if noisy: self.log("_read immutable", level=NOISY)
-                    filenode.read(self.consumer, 0, None)
-                self.async.addCallback(_read)
+
+                version.read(self.consumer, 0, None)
+            self.async.addCallback(_read)
 
         eventually(self.async.callback, None)
 
hunk ./src/allmydata/frontends/sftpd.py 821
                     assert parent and childname, (parent, childname, self.metadata)
                     d2.addCallback(lambda ign: parent.set_metadata_for(childname, self.metadata))
 
-                d2.addCallback(lambda ign: self.consumer.get_current_size())
-                d2.addCallback(lambda size: self.consumer.read(0, size))
-                d2.addCallback(lambda new_contents: self.filenode.overwrite(new_contents))
+                d2.addCallback(lambda ign: self.filenode.overwrite(MutableFileHandle(self.consumer.get_file())))
             else:
                 def _add_file(ign):
                     self.log("_add_file childname=%r" % (childname,), level=OPERATIONAL)
}
[immutable/filenode.py: Make the immutable file node implement the same interfaces as the mutable one
Kevan Carstensen <kevan@isnotajoke.com>**20100810000619
 Ignore-this: 93e536c0f8efb705310f13ff64621527
] {
hunk ./src/allmydata/immutable/filenode.py 8
 now = time.time
 from zope.interface import implements, Interface
 from twisted.internet import defer
-from twisted.internet.interfaces import IConsumer
 
hunk ./src/allmydata/immutable/filenode.py 9
-from allmydata.interfaces import IImmutableFileNode, IUploadResults
 from allmydata import uri
hunk ./src/allmydata/immutable/filenode.py 10
+from twisted.internet.interfaces import IConsumer
+from twisted.protocols import basic
+from foolscap.api import eventually
+from allmydata.interfaces import IImmutableFileNode, ICheckable, \
+     IDownloadTarget, IUploadResults
+from allmydata.util import dictutil, log, base32, consumer
+from allmydata.immutable.checker import Checker
 from allmydata.check_results import CheckResults, CheckAndRepairResults
 from allmydata.util.dictutil import DictOfSets
 from pycryptopp.cipher.aes import AES
hunk ./src/allmydata/immutable/filenode.py 296
         return self._cnode.check_and_repair(monitor, verify, add_lease)
     def check(self, monitor, verify=False, add_lease=False):
         return self._cnode.check(monitor, verify, add_lease)
+
+    def get_best_readable_version(self):
+        """
+        Return an IReadable of the best version of this file. Since
+        immutable files can have only one version, we just return the
+        current filenode.
+        """
+        return defer.succeed(self)
+
+
+    def download_best_version(self):
+        """
+        Download the best version of this file, returning its contents
+        as a bytestring. Since there is only one version of an immutable
+        file, we download and return the contents of this file.
+        """
+        d = consumer.download_to_data(self)
+        return d
+
+    # for an immutable file, download_to_data (specified in IReadable)
+    # is the same as download_best_version (specified in IFileNode). For
+    # mutable files, the difference is more meaningful, since they can
+    # have multiple versions.
+    download_to_data = download_best_version
+
+
+    # get_size() (IReadable), get_current_size() (IFilesystemNode), and
+    # get_size_of_best_version(IFileNode) are all the same for immutable
+    # files.
+    get_size_of_best_version = get_current_size
}
[immutable/literal.py: implement the same interfaces as other filenodes
Kevan Carstensen <kevan@isnotajoke.com>**20100810000633
 Ignore-this: b50dd5df2d34ecd6477b8499a27aef13
] hunk ./src/allmydata/immutable/literal.py 106
         d.addCallback(lambda lastSent: consumer)
         return d
 
+    # IReadable, IFileNode, IFilesystemNode
+    def get_best_readable_version(self):
+        return defer.succeed(self)
+
+
+    def download_best_version(self):
+        return defer.succeed(self.u.data)
+
+
+    download_to_data = download_best_version
+    get_size_of_best_version = get_current_size
+
658Kevan Carstensen <kevan@isnotajoke.com>**20100813234957
659 Ignore-this: c106b3384fc676bd3c0fb466d2a52b1b
660] {
661hunk ./src/allmydata/scripts/cli.py 160
662     optFlags = [
663         ("mutable", "m", "Create a mutable file instead of an immutable one."),
664         ]
665+    optParameters = [
666+        ("mutable-type", None, False, "Create a mutable file in the given format. Valid formats are 'sdmf' for SDMF and 'mdmf' for MDMF"),
667+        ]
668 
669     def parseArgs(self, arg1=None, arg2=None):
670         # see Examples below
671hunk ./src/allmydata/scripts/tahoe_put.py 21
672     from_file = options.from_file
673     to_file = options.to_file
674     mutable = options['mutable']
675+    mutable_type = False
676+
677+    if mutable:
678+        mutable_type = options['mutable-type']
679     if options['quiet']:
680         verbosity = 0
681     else:
682hunk ./src/allmydata/scripts/tahoe_put.py 33
683     stdout = options.stdout
684     stderr = options.stderr
685 
686+    if mutable_type and mutable_type not in ('sdmf', 'mdmf'):
687+        # Don't try to pass unsupported types to the webapi
688+        print >>stderr, "error: %s is an invalid format" % mutable_type
689+        return 1
690+
691     if nodeurl[-1] != "/":
692         nodeurl += "/"
693     if to_file:
694hunk ./src/allmydata/scripts/tahoe_put.py 76
695         url = nodeurl + "uri"
696     if mutable:
697         url += "?mutable=true"
698+    if mutable_type:
699+        assert mutable
700+        url += "&mutable-type=%s" % mutable_type
701+
702     if from_file:
703         infileobj = open(os.path.expanduser(from_file), "rb")
704     else:
705}
[web: Alter the webapi to get along with and take advantage of the MDMF changes
Kevan Carstensen <kevan@isnotajoke.com>**20100814081012
 Ignore-this: 96c2ed4e4a9f450fb84db5d711d10bd6
 
 The main benefit that the webapi gets from MDMF, at least initially, is
 the ability to do a streaming download of an MDMF mutable file. It also
 exposes a way (through the PUT verb) to append to or otherwise modify
 (in-place) an MDMF mutable file.
] {
hunk ./src/allmydata/web/common.py 12
 from allmydata.interfaces import ExistingChildError, NoSuchChildError, \
      FileTooLargeError, NotEnoughSharesError, NoSharesError, \
      EmptyPathnameComponentError, MustBeDeepImmutableError, \
-     MustBeReadonlyError, MustNotBeUnknownRWError
+     MustBeReadonlyError, MustNotBeUnknownRWError, SDMF_VERSION, MDMF_VERSION
 from allmydata.mutable.common import UnrecoverableFileError
 from allmydata.util import abbreviate
 from allmydata.util.encodingutil import to_str, quote_output
hunk ./src/allmydata/web/common.py 35
     else:
         return boolean_of_arg(replace)
 
+
+def parse_mutable_type_arg(arg):
+    if not arg:
+        return None # interpreted by the caller as "let the nodemaker decide"
+
+    arg = arg.lower()
+    assert arg in ("mdmf", "sdmf")
+
+    if arg == "mdmf":
+        return MDMF_VERSION
+
+    return SDMF_VERSION
+
+
+def parse_offset_arg(offset):
+    # XXX: This will raise a ValueError when invoked on something that
+    # is not an integer. Is that okay? Or do we want a better error
+    # message? Since this call is going to be used by programmers and
+    # their tools rather than users (through the wui), it is not
+    # inconsistent to return that, I guess.
+    offset = int(offset)
+    return offset
+
+
 def get_root(ctx_or_req):
     req = IRequest(ctx_or_req)
     # the addSlash=True gives us one extra (empty) segment
hunk ./src/allmydata/web/directory.py 19
 from allmydata.uri import from_string_dirnode
 from allmydata.interfaces import IDirectoryNode, IFileNode, IFilesystemNode, \
      IImmutableFileNode, IMutableFileNode, ExistingChildError, \
-     NoSuchChildError, EmptyPathnameComponentError
+     NoSuchChildError, EmptyPathnameComponentError, SDMF_VERSION, MDMF_VERSION
 from allmydata.monitor import Monitor, OperationCancelledError
 from allmydata import dirnode
 from allmydata.web.common import text_plain, WebError, \
hunk ./src/allmydata/web/directory.py 153
         if not t:
             # render the directory as HTML, using the docFactory and Nevow's
             # whole templating thing.
-            return DirectoryAsHTML(self.node)
+            return DirectoryAsHTML(self.node,
+                                   self.client.mutable_file_default)
 
         if t == "json":
             return DirectoryJSONMetadata(ctx, self.node)
hunk ./src/allmydata/web/directory.py 556
     docFactory = getxmlfile("directory.xhtml")
     addSlash = True
 
-    def __init__(self, node):
+    def __init__(self, node, default_mutable_format):
         rend.Page.__init__(self)
         self.node = node
 
hunk ./src/allmydata/web/directory.py 560
+        assert default_mutable_format in (MDMF_VERSION, SDMF_VERSION)
+        self.default_mutable_format = default_mutable_format
+
     def beforeRender(self, ctx):
         # attempt to get the dirnode's children, stashing them (or the
         # failure that results) for later use
hunk ./src/allmydata/web/directory.py 780
             ]]
         forms.append(T.div(class_="freeform-form")[mkdir])
 
+        # Build input elements for mutable file type. We do this outside
+        # of the list so we can check the appropriate format, based on
+        # the default configured in the client (which reflects the
+        # default configured in tahoe.cfg)
+        if self.default_mutable_format == MDMF_VERSION:
+            mdmf_input = T.input(type='radio', name='mutable-type',
+                                 id='mutable-type-mdmf', value='mdmf',
+                                 checked='checked')
+        else:
+            mdmf_input = T.input(type='radio', name='mutable-type',
+                                 id='mutable-type-mdmf', value='mdmf')
+
+        if self.default_mutable_format == SDMF_VERSION:
+            sdmf_input = T.input(type='radio', name='mutable-type',
+                                 id='mutable-type-sdmf', value='sdmf',
+                                 checked="checked")
+        else:
+            sdmf_input = T.input(type='radio', name='mutable-type',
+                                 id='mutable-type-sdmf', value='sdmf')
+
         upload = T.form(action=".", method="post",
                         enctype="multipart/form-data")[
             T.fieldset[
hunk ./src/allmydata/web/directory.py 812
             T.input(type="submit", value="Upload"),
             " Mutable?:",
             T.input(type="checkbox", name="mutable"),
+            sdmf_input, T.label(for_="mutable-type-sdmf")["SDMF"],
+            mdmf_input,
+            T.label(for_="mutable-type-mdmf")["MDMF (experimental)"],
             ]]
         forms.append(T.div(class_="freeform-form")[upload])
 
hunk ./src/allmydata/web/directory.py 850
                 kiddata = ("filenode", {'size': childnode.get_size(),
                                         'mutable': childnode.is_mutable(),
                                         })
+                if childnode.is_mutable() and \
+                    childnode.get_version() is not None:
+                    mutable_type = childnode.get_version()
+                    assert mutable_type in (SDMF_VERSION, MDMF_VERSION)
+
+                    if mutable_type == MDMF_VERSION:
+                        mutable_type = "mdmf"
+                    else:
+                        mutable_type = "sdmf"
+                    kiddata[1]['mutable-type'] = mutable_type
+
             elif IDirectoryNode.providedBy(childnode):
                 kiddata = ("dirnode", {'mutable': childnode.is_mutable()})
             else:
hunk ./src/allmydata/web/filenode.py 9
 from nevow import url, rend
 from nevow.inevow import IRequest
 
-from allmydata.interfaces import ExistingChildError
+from allmydata.interfaces import ExistingChildError, SDMF_VERSION, MDMF_VERSION
 from allmydata.monitor import Monitor
 from allmydata.immutable.upload import FileHandle
hunk ./src/allmydata/web/filenode.py 12
+from allmydata.mutable.publish import MutableFileHandle
+from allmydata.mutable.common import MODE_READ
 from allmydata.util import log, base32
 
 from allmydata.web.common import text_plain, WebError, RenderMixin, \
hunk ./src/allmydata/web/filenode.py 18
      boolean_of_arg, get_arg, should_create_intermediate_directories, \
-     MyExceptionHandler, parse_replace_arg
+     MyExceptionHandler, parse_replace_arg, parse_offset_arg, \
+     parse_mutable_type_arg
 from allmydata.web.check_results import CheckResults, \
      CheckAndRepairResults, LiteralCheckResults
 from allmydata.web.info import MoreInfo
hunk ./src/allmydata/web/filenode.py 29
         # a new file is being uploaded in our place.
         mutable = boolean_of_arg(get_arg(req, "mutable", "false"))
         if mutable:
-            req.content.seek(0)
-            data = req.content.read()
-            d = client.create_mutable_file(data)
+            mutable_type = parse_mutable_type_arg(get_arg(req,
+                                                          "mutable-type",
+                                                          None))
+            data = MutableFileHandle(req.content)
+            d = client.create_mutable_file(data, version=mutable_type)
             def _uploaded(newnode):
                 d2 = self.parentnode.set_node(self.name, newnode,
                                               overwrite=replace)
hunk ./src/allmydata/web/filenode.py 66
         d.addCallback(lambda res: childnode.get_uri())
         return d
 
-    def _read_data_from_formpost(self, req):
-        # SDMF: files are small, and we can only upload data, so we read
-        # the whole file into memory before uploading.
-        contents = req.fields["file"]
-        contents.file.seek(0)
-        data = contents.file.read()
-        return data
 
     def replace_me_with_a_formpost(self, req, client, replace):
         # create a new file, maybe mutable, maybe immutable
hunk ./src/allmydata/web/filenode.py 71
         mutable = boolean_of_arg(get_arg(req, "mutable", "false"))
 
+        # create an immutable file
+        contents = req.fields["file"]
         if mutable:
hunk ./src/allmydata/web/filenode.py 74
-            data = self._read_data_from_formpost(req)
-            d = client.create_mutable_file(data)
+            mutable_type = parse_mutable_type_arg(get_arg(req, "mutable-type",
+                                                          None))
+            uploadable = MutableFileHandle(contents.file)
+            d = client.create_mutable_file(uploadable, version=mutable_type)
             def _uploaded(newnode):
                 d2 = self.parentnode.set_node(self.name, newnode,
                                               overwrite=replace)
hunk ./src/allmydata/web/filenode.py 85
                 return d2
             d.addCallback(_uploaded)
             return d
-        # create an immutable file
-        contents = req.fields["file"]
+
         uploadable = FileHandle(contents.file, convergence=client.convergence)
         d = self.parentnode.add_file(self.name, uploadable, overwrite=replace)
         d.addCallback(lambda newnode: newnode.get_uri())
hunk ./src/allmydata/web/filenode.py 91
         return d
 
+
 class PlaceHolderNodeHandler(RenderMixin, rend.Page, ReplaceMeMixin):
     def __init__(self, client, parentnode, name):
         rend.Page.__init__(self)
hunk ./src/allmydata/web/filenode.py 174
             # properly. So we assume that at least the browser will agree
             # with itself, and echo back the same bytes that we were given.
             filename = get_arg(req, "filename", self.name) or "unknown"
-            if self.node.is_mutable():
-                # some day: d = self.node.get_best_version()
-                d = makeMutableDownloadable(self.node)
-            else:
-                d = defer.succeed(self.node)
+            d = self.node.get_best_readable_version()
             d.addCallback(lambda dn: FileDownloader(dn, filename))
             return d
         if t == "json":
hunk ./src/allmydata/web/filenode.py 178
-            if self.parentnode and self.name:
-                d = self.parentnode.get_metadata_for(self.name)
+            # We do this to make sure that fields like size and
+            # mutable-type (which depend on the file on the grid and not
+            # just on the cap) are filled in. The latter gets used in
+            # tests, in particular.
+            #
+            # TODO: Make it so that the servermap knows how to update in
+            # a mode specifically designed to fill in these fields, and
+            # then update it in that mode.
+            if self.node.is_mutable():
+                d = self.node.get_servermap(MODE_READ)
             else:
                 d = defer.succeed(None)
hunk ./src/allmydata/web/filenode.py 190
+            if self.parentnode and self.name:
+                d.addCallback(lambda ignored:
+                    self.parentnode.get_metadata_for(self.name))
+            else:
+                d.addCallback(lambda ignored: None)
             d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
             return d
         if t == "info":
hunk ./src/allmydata/web/filenode.py 211
         if t:
             raise WebError("GET file: bad t=%s" % t)
         filename = get_arg(req, "filename", self.name) or "unknown"
-        if self.node.is_mutable():
-            # some day: d = self.node.get_best_version()
-            d = makeMutableDownloadable(self.node)
-        else:
-            d = defer.succeed(self.node)
+        d = self.node.get_best_readable_version()
         d.addCallback(lambda dn: FileDownloader(dn, filename))
         return d
 
hunk ./src/allmydata/web/filenode.py 219
         req = IRequest(ctx)
         t = get_arg(req, "t", "").strip()
         replace = parse_replace_arg(get_arg(req, "replace", "true"))
+        offset = parse_offset_arg(get_arg(req, "offset", -1))
 
         if not t:
hunk ./src/allmydata/web/filenode.py 222
-            if self.node.is_mutable():
+            if self.node.is_mutable() and offset >= 0:
+                return self.update_my_contents(req, offset)
+
+            elif self.node.is_mutable():
                 return self.replace_my_contents(req)
             if not replace:
                 # this is the early trap: if someone else modifies the
hunk ./src/allmydata/web/filenode.py 232
                 # directory while we're uploading, the add_file(overwrite=)
                 # call in replace_me_with_a_child will do the late trap.
                 raise ExistingChildError()
+            if offset >= 0:
+                raise WebError("PUT to a file: append operation invoked "
+                               "on an immutable cap")
+
+
             assert self.parentnode and self.name
             return self.replace_me_with_a_child(req, self.client, replace)
         if t == "uri":
hunk ./src/allmydata/web/filenode.py 299
 
     def replace_my_contents(self, req):
         req.content.seek(0)
-        new_contents = req.content.read()
+        new_contents = MutableFileHandle(req.content)
         d = self.node.overwrite(new_contents)
         d.addCallback(lambda res: self.node.get_uri())
         return d
hunk ./src/allmydata/web/filenode.py 304
 
+
+    def update_my_contents(self, req, offset):
+        req.content.seek(0)
+        added_contents = MutableFileHandle(req.content)
+
+        d = self.node.get_best_mutable_version()
+        d.addCallback(lambda mv:
+            mv.update(added_contents, offset))
+        d.addCallback(lambda ignored:
+            self.node.get_uri())
+        return d
+
+
     def replace_my_contents_with_a_formpost(self, req):
         # we have a mutable file. Get the data from the formpost, and replace
         # the mutable file's contents with it.
hunk ./src/allmydata/web/filenode.py 320
-        new_contents = self._read_data_from_formpost(req)
+        new_contents = req.fields['file']
+        new_contents = MutableFileHandle(new_contents.file)
+
         d = self.node.overwrite(new_contents)
         d.addCallback(lambda res: self.node.get_uri())
         return d
hunk ./src/allmydata/web/filenode.py 327
 
-class MutableDownloadable:
-    #implements(IDownloadable)
-    def __init__(self, size, node):
-        self.size = size
-        self.node = node
-    def get_size(self):
-        return self.size
-    def is_mutable(self):
-        return True
-    def read(self, consumer, offset=0, size=None):
-        d = self.node.download_best_version()
-        d.addCallback(self._got_data, consumer, offset, size)
-        return d
-    def _got_data(self, contents, consumer, offset, size):
-        start = offset
-        if size is not None:
-            end = offset+size
-        else:
-            end = self.size
-        # SDMF: we can write the whole file in one big chunk
-        consumer.write(contents[start:end])
-        return consumer
-
-def makeMutableDownloadable(n):
-    d = defer.maybeDeferred(n.get_size_of_best_version)
-    d.addCallback(MutableDownloadable, n)
-    return d
 
 class FileDownloader(rend.Page):
     # since we override the rendering process (to let the tahoe Downloader
hunk ./src/allmydata/web/filenode.py 509
     data[1]['mutable'] = filenode.is_mutable()
     if edge_metadata is not None:
         data[1]['metadata'] = edge_metadata
+
+    if filenode.is_mutable() and filenode.get_version() is not None:
+        mutable_type = filenode.get_version()
+        assert mutable_type in (MDMF_VERSION, SDMF_VERSION)
+        if mutable_type == MDMF_VERSION:
+            mutable_type = "mdmf"
+        else:
+            mutable_type = "sdmf"
+        data[1]['mutable-type'] = mutable_type
+
     return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx)
 
 def FileURI(ctx, filenode):
hunk ./src/allmydata/web/root.py 15
 from allmydata import get_package_versions_string
 from allmydata import provisioning
 from allmydata.util import idlib, log
-from allmydata.interfaces import IFileNode
+from allmydata.interfaces import IFileNode, MDMF_VERSION, SDMF_VERSION
 from allmydata.web import filenode, directory, unlinked, status, operations
 from allmydata.web import reliability, storage
 from allmydata.web.common import abbreviate_size, getxmlfile, WebError, \
hunk ./src/allmydata/web/root.py 19
-     get_arg, RenderMixin, boolean_of_arg
+     get_arg, RenderMixin, boolean_of_arg, parse_mutable_type_arg
 
 
 class URIHandler(RenderMixin, rend.Page):
hunk ./src/allmydata/web/root.py 50
         if t == "":
             mutable = boolean_of_arg(get_arg(req, "mutable", "false").strip())
             if mutable:
-                return unlinked.PUTUnlinkedSSK(req, self.client)
+                version = parse_mutable_type_arg(get_arg(req, "mutable-type",
+                                                 None))
+                return unlinked.PUTUnlinkedSSK(req, self.client, version)
             else:
                 return unlinked.PUTUnlinkedCHK(req, self.client)
         if t == "mkdir":
hunk ./src/allmydata/web/root.py 70
         if t in ("", "upload"):
             mutable = bool(get_arg(req, "mutable", "").strip())
             if mutable:
-                return unlinked.POSTUnlinkedSSK(req, self.client)
+                version = parse_mutable_type_arg(get_arg(req, "mutable-type",
+                                                         None))
+                return unlinked.POSTUnlinkedSSK(req, self.client, version)
             else:
                 return unlinked.POSTUnlinkedCHK(req, self.client)
         if t == "mkdir":
hunk ./src/allmydata/web/root.py 324
 
     def render_upload_form(self, ctx, data):
         # this is a form where users can upload unlinked files
+        #
+        # for mutable files, users can choose the format by selecting
+        # MDMF or SDMF from a radio button. They can also configure a
+        # default format in tahoe.cfg, which they rightly expect us to
+        # obey. we convey to them that we are obeying their choice by
+        # ensuring that the one that they've chosen is selected in the
+        # interface.
+        if self.client.mutable_file_default == MDMF_VERSION:
+            mdmf_input = T.input(type='radio', name='mutable-type',
+                                 value='mdmf', id='mutable-type-mdmf',
+                                 checked='checked')
+        else:
+            mdmf_input = T.input(type='radio', name='mutable-type',
+                                 value='mdmf', id='mutable-type-mdmf')
+
+        if self.client.mutable_file_default == SDMF_VERSION:
+            sdmf_input = T.input(type='radio', name='mutable-type',
+                                 value='sdmf', id='mutable-type-sdmf',
+                                 checked='checked')
+        else:
+            sdmf_input = T.input(type='radio', name='mutable-type',
+                                 value='sdmf', id='mutable-type-sdmf')
+
+
         form = T.form(action="uri", method="post",
                       enctype="multipart/form-data")[
             T.fieldset[
hunk ./src/allmydata/web/root.py 356
                   T.input(type="file", name="file", class_="freeform-input-file")],
             T.input(type="hidden", name="t", value="upload"),
             T.div[T.input(type="checkbox", name="mutable"), T.label(for_="mutable")["Create mutable file"],
+                  sdmf_input, T.label(for_="mutable-type-sdmf")["SDMF"],
+                  mdmf_input,
+                  T.label(for_='mutable-type-mdmf')['MDMF (experimental)'],
                   " ", T.input(type="submit", value="Upload!")],
             ]]
         return T.div[form]
1169hunk ./src/allmydata/web/unlinked.py 7
1170 from twisted.internet import defer
1171 from nevow import rend, url, tags as T
1172 from allmydata.immutable.upload import FileHandle
1173+from allmydata.mutable.publish import MutableFileHandle
1174 from allmydata.web.common import getxmlfile, get_arg, boolean_of_arg, \
1175      convert_children_json, WebError
1176 from allmydata.web import status
1177hunk ./src/allmydata/web/unlinked.py 20
1178     # that fires with the URI of the new file
1179     return d
1180 
1181-def PUTUnlinkedSSK(req, client):
1182+def PUTUnlinkedSSK(req, client, version):
1183     # SDMF: files are small, and we can only upload data
1184     req.content.seek(0)
1185hunk ./src/allmydata/web/unlinked.py 23
1186-    data = req.content.read()
1187-    d = client.create_mutable_file(data)
1188+    data = MutableFileHandle(req.content)
1189+    d = client.create_mutable_file(data, version=version)
1190     d.addCallback(lambda n: n.get_uri())
1191     return d
1192 
1193hunk ./src/allmydata/web/unlinked.py 83
1194                       ["/uri/" + res.uri])
1195         return d
1196 
1197-def POSTUnlinkedSSK(req, client):
1198+def POSTUnlinkedSSK(req, client, version):
1199     # "POST /uri", to create an unlinked file.
1200     # SDMF: files are small, and we can only upload data
1201hunk ./src/allmydata/web/unlinked.py 86
1202-    contents = req.fields["file"]
1203-    contents.file.seek(0)
1204-    data = contents.file.read()
1205-    d = client.create_mutable_file(data)
1206+    contents = req.fields["file"].file
1207+    data = MutableFileHandle(contents)
1208+    d = client.create_mutable_file(data, version=version)
1209     d.addCallback(lambda n: n.get_uri())
1210     return d
1211 
1212}
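
Editorial note (not part of the patch bundle): with the unlinked.py changes
above, the format of an unlinked mutable file can be chosen at creation
time. A minimal Python 2 sketch, assuming a gateway at the default
http://127.0.0.1:3456 and the mutable-type query argument that this patch
series wires into the PUT /uri handler:

    import urllib2

    url = "http://127.0.0.1:3456/uri?mutable=true&mutable-type=mdmf"
    req = urllib2.Request(url, data="initial contents")
    req.get_method = lambda: "PUT"     # urllib2 would otherwise POST when data is given
    cap = urllib2.urlopen(req).read()  # the response body is the new file's cap
    print cap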
1213[client.py: learn how to create different kinds of mutable files
1214Kevan Carstensen <kevan@isnotajoke.com>**20100814225711
1215 Ignore-this: 61ff665bc050cba5f58bf2ed779d692b
1216] {
1217hunk ./src/allmydata/client.py 25
1218 from allmydata.util.time_format import parse_duration, parse_date
1219 from allmydata.stats import StatsProvider
1220 from allmydata.history import History
1221-from allmydata.interfaces import IStatsProducer, RIStubClient
1222+from allmydata.interfaces import IStatsProducer, RIStubClient, \
1223+                                 SDMF_VERSION, MDMF_VERSION
1224 from allmydata.nodemaker import NodeMaker
1225 
1226 
1227hunk ./src/allmydata/client.py 357
1228                                    self.terminator,
1229                                    self.get_encoding_parameters(),
1230                                    self._key_generator)
1231+        default = self.get_config("client", "mutable.format", default="sdmf")
1232+        if default == "mdmf":
1233+            self.mutable_file_default = MDMF_VERSION
1234+        else:
1235+            self.mutable_file_default = SDMF_VERSION
1236 
1237     def get_history(self):
1238         return self.history
1239hunk ./src/allmydata/client.py 500
1240     def create_immutable_dirnode(self, children, convergence=None):
1241         return self.nodemaker.create_immutable_directory(children, convergence)
1242 
1243-    def create_mutable_file(self, contents=None, keysize=None):
1244-        return self.nodemaker.create_mutable_file(contents, keysize)
1245+    def create_mutable_file(self, contents=None, keysize=None, version=None):
1246+        if not version:
1247+            version = self.mutable_file_default
1248+        return self.nodemaker.create_mutable_file(contents, keysize,
1249+                                                  version=version)
1250 
1251     def upload(self, uploadable):
1252         uploader = self.getServiceNamed("uploader")
1253}
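
Editorial note (not part of the patch bundle): the get_config call above
reads the new knob from the [client] section of tahoe.cfg; any value other
than "mdmf" falls back to SDMF. A node that should create MDMF mutable
files by default would carry, for example:

    [client]
    mutable.format = mdmf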
1254[mutable/checker.py and mutable/repair.py: Modify checker and repairer to work with MDMF
1255Kevan Carstensen <kevan@isnotajoke.com>**20100819003216
1256 Ignore-this: d3bd3260742be8964877f0a53543b01b
1257 
1258 The checker and repairer required minimal changes to work with the MDMF
1259 modifications made elsewhere. The checker duplicated a lot of the code
1260 that was already in the downloader, so I modified the downloader
1261 slightly to expose this functionality to the checker and removed the
1262 duplicated code. The repairer only required a minor change to deal with
1263 data representation.
1264] {
1265hunk ./src/allmydata/mutable/checker.py 2
1266 
1267-from twisted.internet import defer
1268-from twisted.python import failure
1269-from allmydata import hashtree
1270 from allmydata.uri import from_string
1271hunk ./src/allmydata/mutable/checker.py 3
1272-from allmydata.util import hashutil, base32, idlib, log
1273+from allmydata.util import base32, idlib, log
1274 from allmydata.check_results import CheckAndRepairResults, CheckResults
1275 
1276 from allmydata.mutable.common import MODE_CHECK, CorruptShareError
1277hunk ./src/allmydata/mutable/checker.py 8
1278 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
1279-from allmydata.mutable.layout import unpack_share, SIGNED_PREFIX_LENGTH
1280+from allmydata.mutable.retrieve import Retrieve # for verifying
1281 
1282 class MutableChecker:
1283 
1284hunk ./src/allmydata/mutable/checker.py 25
1285 
1286     def check(self, verify=False, add_lease=False):
1287         servermap = ServerMap()
1288+        # Updating the servermap in MODE_CHECK will stand a good chance
1289+        # of finding all of the shares, and getting a good idea of
1290+        # recoverability, etc, without verifying.
1291         u = ServermapUpdater(self._node, self._storage_broker, self._monitor,
1292                              servermap, MODE_CHECK, add_lease=add_lease)
1293         if self._history:
1294hunk ./src/allmydata/mutable/checker.py 51
1295         if num_recoverable:
1296             self.best_version = servermap.best_recoverable_version()
1297 
1298+        # The file is unhealthy and needs to be repaired if:
1299+        # - There are unrecoverable versions.
1300         if servermap.unrecoverable_versions():
1301             self.need_repair = True
1302hunk ./src/allmydata/mutable/checker.py 55
1303+        # - There isn't a recoverable version.
1304         if num_recoverable != 1:
1305             self.need_repair = True
1306hunk ./src/allmydata/mutable/checker.py 58
1307+        # - The best recoverable version is missing some shares.
1308         if self.best_version:
1309             available_shares = servermap.shares_available()
1310             (num_distinct_shares, k, N) = available_shares[self.best_version]
1311hunk ./src/allmydata/mutable/checker.py 69
1312 
1313     def _verify_all_shares(self, servermap):
1314         # read every byte of each share
1315+        #
1316+        # This logic is going to be very nearly the same as the
1317+        # downloader. I bet we could pass the downloader a flag that
1318+        # makes it do this, and piggyback onto that instead of
1319+        # duplicating a bunch of code.
1320+        #
1321+        # Like:
1322+        #  r = Retrieve(blah, blah, blah, verify=True)
1323+        #  d = r.download()
1324+        #  (wait, wait, wait, d.callback)
1325+        # 
1326+        #  Then, when it has finished, we can check the servermap (which
1327+        #  we provided to Retrieve) to figure out which shares are bad,
1328+        #  since the Retrieve process will have updated the servermap as
1329+        #  it went along.
1330+        #
1331+        #  By passing the verify=True flag to the constructor, we are
1332+        #  telling the downloader a few things.
1333+        #
1334+        #  1. It needs to download all N shares, not just K shares.
1335+        #  2. It doesn't need to decrypt or decode the shares, only
1336+        #     verify them.
1337         if not self.best_version:
1338             return
1339hunk ./src/allmydata/mutable/checker.py 93
1340-        versionmap = servermap.make_versionmap()
1341-        shares = versionmap[self.best_version]
1342-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
1343-         offsets_tuple) = self.best_version
1344-        offsets = dict(offsets_tuple)
1345-        readv = [ (0, offsets["EOF"]) ]
1346-        dl = []
1347-        for (shnum, peerid, timestamp) in shares:
1348-            ss = servermap.connections[peerid]
1349-            d = self._do_read(ss, peerid, self._storage_index, [shnum], readv)
1350-            d.addCallback(self._got_answer, peerid, servermap)
1351-            dl.append(d)
1352-        return defer.DeferredList(dl, fireOnOneErrback=True, consumeErrors=True)
1353 
1354hunk ./src/allmydata/mutable/checker.py 94
1355-    def _do_read(self, ss, peerid, storage_index, shnums, readv):
1356-        # isolate the callRemote to a separate method, so tests can subclass
1357-        # Publish and override it
1358-        d = ss.callRemote("slot_readv", storage_index, shnums, readv)
1359+        r = Retrieve(self._node, servermap, self.best_version, verify=True)
1360+        d = r.download()
1361+        d.addCallback(self._process_bad_shares)
1362         return d
1363 
1364hunk ./src/allmydata/mutable/checker.py 99
1365-    def _got_answer(self, datavs, peerid, servermap):
1366-        for shnum,datav in datavs.items():
1367-            data = datav[0]
1368-            try:
1369-                self._got_results_one_share(shnum, peerid, data)
1370-            except CorruptShareError:
1371-                f = failure.Failure()
1372-                self.need_repair = True
1373-                self.bad_shares.append( (peerid, shnum, f) )
1374-                prefix = data[:SIGNED_PREFIX_LENGTH]
1375-                servermap.mark_bad_share(peerid, shnum, prefix)
1376-                ss = servermap.connections[peerid]
1377-                self.notify_server_corruption(ss, shnum, str(f.value))
1378-
1379-    def check_prefix(self, peerid, shnum, data):
1380-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
1381-         offsets_tuple) = self.best_version
1382-        got_prefix = data[:SIGNED_PREFIX_LENGTH]
1383-        if got_prefix != prefix:
1384-            raise CorruptShareError(peerid, shnum,
1385-                                    "prefix mismatch: share changed while we were reading it")
1386-
1387-    def _got_results_one_share(self, shnum, peerid, data):
1388-        self.check_prefix(peerid, shnum, data)
1389-
1390-        # the [seqnum:signature] pieces are validated by _compare_prefix,
1391-        # which checks their signature against the pubkey known to be
1392-        # associated with this file.
1393 
1394hunk ./src/allmydata/mutable/checker.py 100
1395-        (seqnum, root_hash, IV, k, N, segsize, datalen, pubkey, signature,
1396-         share_hash_chain, block_hash_tree, share_data,
1397-         enc_privkey) = unpack_share(data)
1398-
1399-        # validate [share_hash_chain,block_hash_tree,share_data]
1400-
1401-        leaves = [hashutil.block_hash(share_data)]
1402-        t = hashtree.HashTree(leaves)
1403-        if list(t) != block_hash_tree:
1404-            raise CorruptShareError(peerid, shnum, "block hash tree failure")
1405-        share_hash_leaf = t[0]
1406-        t2 = hashtree.IncompleteHashTree(N)
1407-        # root_hash was checked by the signature
1408-        t2.set_hashes({0: root_hash})
1409-        try:
1410-            t2.set_hashes(hashes=share_hash_chain,
1411-                          leaves={shnum: share_hash_leaf})
1412-        except (hashtree.BadHashError, hashtree.NotEnoughHashesError,
1413-                IndexError), e:
1414-            msg = "corrupt hashes: %s" % (e,)
1415-            raise CorruptShareError(peerid, shnum, msg)
1416-
1417-        # validate enc_privkey: only possible if we have a write-cap
1418-        if not self._node.is_readonly():
1419-            alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
1420-            alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
1421-            if alleged_writekey != self._node.get_writekey():
1422-                raise CorruptShareError(peerid, shnum, "invalid privkey")
1423+    def _process_bad_shares(self, bad_shares):
1424+        if bad_shares:
1425+            self.need_repair = True
1426+        self.bad_shares = bad_shares
1427 
1428hunk ./src/allmydata/mutable/checker.py 105
1429-    def notify_server_corruption(self, ss, shnum, reason):
1430-        ss.callRemoteOnly("advise_corrupt_share",
1431-                          "mutable", self._storage_index, shnum, reason)
1432 
1433     def _count_shares(self, smap, version):
1434         available_shares = smap.shares_available()
1435hunk ./src/allmydata/mutable/repairer.py 5
1436 from zope.interface import implements
1437 from twisted.internet import defer
1438 from allmydata.interfaces import IRepairResults, ICheckResults
1439+from allmydata.mutable.publish import MutableData
1440 
1441 class RepairResults:
1442     implements(IRepairResults)
1443hunk ./src/allmydata/mutable/repairer.py 108
1444             raise RepairRequiresWritecapError("Sorry, repair currently requires a writecap, to set the write-enabler properly.")
1445 
1446         d = self.node.download_version(smap, best_version, fetch_privkey=True)
1447+        d.addCallback(lambda data:
1448+            MutableData(data))
1449         d.addCallback(self.node.upload, smap)
1450         d.addCallback(self.get_results, smap)
1451         return d
1452}
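
Editorial note (not part of the patch bundle): the repairer hunk above is
the "data representation" change the description mentions: download_version
fires with a byte string, while Publish now expects an IMutableUploadable,
so the string is wrapped in MutableData. A sketch of the two uploadable
wrappers this series introduces, assuming the names exported by
mutable/publish.py in these patches:

    from StringIO import StringIO
    from allmydata.mutable.publish import MutableData, MutableFileHandle

    small = MutableData("short contents")             # wraps an in-memory string
    big = MutableFileHandle(StringIO("x" * 1000000))  # wraps any seekable filehandle
    assert small.get_size() == len("short contents")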
1453[mutable/filenode.py: add versions and partial-file updates to the mutable file node
1454Kevan Carstensen <kevan@isnotajoke.com>**20100819003231
1455 Ignore-this: b7b5434201fdb9b48f902d7ab25ef45c
1456 
1457 One of the goals of MDMF as a GSoC project is to lay the groundwork for
1458 LDMF, a format that will allow Tahoe-LAFS to deal with and encourage
1459 multiple versions of a single cap on the grid. In line with this, there
1460 is now a distinction between an overriding mutable file (which can be
1461 thought to correspond to the cap/unique identifier for that mutable
1462 file) and versions of the mutable file (which we can download, update,
1463 and so on). All download, upload, and modification operations end up
1464 happening on a particular version of a mutable file, but there are
1465 shortcut methods on the object representing the overriding mutable file
1466 that perform these operations on the best version of the mutable file
1467 (which is what code should be doing until we have LDMF and better
1468 support for other paradigms).
1469 
1470 Another goal of MDMF was to take advantage of segmentation to give
1471 callers more efficient partial file updates or appends. This patch
1472 implements methods that do that, too.
1473 
1474] {
1475hunk ./src/allmydata/mutable/filenode.py 7
1476 from zope.interface import implements
1477 from twisted.internet import defer, reactor
1478 from foolscap.api import eventually
1479-from allmydata.interfaces import IMutableFileNode, \
1480-     ICheckable, ICheckResults, NotEnoughSharesError
1481-from allmydata.util import hashutil, log
1482+from allmydata.interfaces import IMutableFileNode, ICheckable, ICheckResults, \
1483+     NotEnoughSharesError, MDMF_VERSION, SDMF_VERSION, IMutableUploadable, \
1484+     IMutableFileVersion, IWritable
1485+from allmydata.util import hashutil, log, consumer, deferredutil, mathutil
1486 from allmydata.util.assertutil import precondition
1487 from allmydata.uri import WriteableSSKFileURI, ReadonlySSKFileURI
1488 from allmydata.monitor import Monitor
1489hunk ./src/allmydata/mutable/filenode.py 16
1490 from pycryptopp.cipher.aes import AES
1491 
1492-from allmydata.mutable.publish import Publish
1493+from allmydata.mutable.publish import Publish, MutableData,\
1494+                                      DEFAULT_MAX_SEGMENT_SIZE, \
1495+                                      TransformingUploadable
1496 from allmydata.mutable.common import MODE_READ, MODE_WRITE, UnrecoverableFileError, \
1497      ResponseCache, UncoordinatedWriteError
1498 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
1499hunk ./src/allmydata/mutable/filenode.py 70
1500         self._sharemap = {} # known shares, shnum-to-[nodeids]
1501         self._cache = ResponseCache()
1502         self._most_recent_size = None
1503+        # filled in after __init__ if we're being created for the first time;
1504+        # filled in by the servermap updater before publishing, otherwise.
1505+        # set to None by default in case neither of those things happens,
1506+        # or in case the servermap can't find any shares to tell us what
1507+        # to publish as.
1508+        # TODO: Set this back to None, and find out why the tests fail
1509+        #       with it set to None.
1510+        self._protocol_version = None
1511 
1512         # all users of this MutableFileNode go through the serializer. This
1513         # takes advantage of the fact that Deferreds discard the callbacks
1514hunk ./src/allmydata/mutable/filenode.py 134
1515         return self._upload(initial_contents, None)
1516 
1517     def _get_initial_contents(self, contents):
1518-        if isinstance(contents, str):
1519-            return contents
1520         if contents is None:
1521hunk ./src/allmydata/mutable/filenode.py 135
1522-            return ""
1523+            return MutableData("")
1524+
1525+        if IMutableUploadable.providedBy(contents):
1526+            return contents
1527+
1528         assert callable(contents), "%s should be callable, not %s" % \
1529                (contents, type(contents))
1530         return contents(self)
1531hunk ./src/allmydata/mutable/filenode.py 209
1532 
1533     def get_size(self):
1534         return self._most_recent_size
1535+
1536     def get_current_size(self):
1537         d = self.get_size_of_best_version()
1538         d.addCallback(self._stash_size)
1539hunk ./src/allmydata/mutable/filenode.py 214
1540         return d
1541+
1542     def _stash_size(self, size):
1543         self._most_recent_size = size
1544         return size
1545hunk ./src/allmydata/mutable/filenode.py 273
1546             return cmp(self.__class__, them.__class__)
1547         return cmp(self._uri, them._uri)
1548 
1549-    def _do_serialized(self, cb, *args, **kwargs):
1550-        # note: to avoid deadlock, this callable is *not* allowed to invoke
1551-        # other serialized methods within this (or any other)
1552-        # MutableFileNode. The callable should be a bound method of this same
1553-        # MFN instance.
1554-        d = defer.Deferred()
1555-        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
1556-        # we need to put off d.callback until this Deferred is finished being
1557-        # processed. Otherwise the caller's subsequent activities (like,
1558-        # doing other things with this node) can cause reentrancy problems in
1559-        # the Deferred code itself
1560-        self._serializer.addBoth(lambda res: eventually(d.callback, res))
1561-        # add a log.err just in case something really weird happens, because
1562-        # self._serializer stays around forever, therefore we won't see the
1563-        # usual Unhandled Error in Deferred that would give us a hint.
1564-        self._serializer.addErrback(log.err)
1565-        return d
1566 
1567     #################################
1568     # ICheckable
1569hunk ./src/allmydata/mutable/filenode.py 298
1570 
1571 
1572     #################################
1573-    # IMutableFileNode
1574+    # IFileNode
1575+
1576+    def get_best_readable_version(self):
1577+        """
1578+        I return a Deferred that fires with a MutableFileVersion
1579+        representing the best readable version of the file that I
1580+        represent
1581+        """
1582+        return self.get_readable_version()
1583+
1584+
1585+    def get_readable_version(self, servermap=None, version=None):
1586+        """
1587+        I return a Deferred that fires with a MutableFileVersion for my
1588+        version argument, if there is a recoverable file of that version
1589+        on the grid. If there is no recoverable version, I errback with an
1590+        UnrecoverableFileError.
1591+
1592+        If a servermap is provided, I look in there for the requested
1593+        version. If no servermap is provided, I create and update a new
1594+        one.
1595+
1596+        If no version is provided, then I return a MutableFileVersion
1597+        representing the best recoverable version of the file.
1598+        """
1599+        d = self._get_version_from_servermap(MODE_READ, servermap, version)
1600+        def _build_version((servermap, their_version)):
1601+            assert their_version in servermap.recoverable_versions()
1602+            assert their_version in servermap.make_versionmap()
1603+
1604+            mfv = MutableFileVersion(self,
1605+                                     servermap,
1606+                                     their_version,
1607+                                     self._storage_index,
1608+                                     self._storage_broker,
1609+                                     self._readkey,
1610+                                     history=self._history)
1611+            assert mfv.is_readonly()
1612+            # our caller can use this to download the contents of the
1613+            # mutable file.
1614+            return mfv
1615+        return d.addCallback(_build_version)
1616+
1617+
1618+    def _get_version_from_servermap(self,
1619+                                    mode,
1620+                                    servermap=None,
1621+                                    version=None):
1622+        """
1623+        I return a Deferred that fires with (servermap, version).
1624+
1625+        This function performs validation and a servermap update. If it
1626+        returns (servermap, version), the caller can assume that:
1627+            - servermap was last updated in mode.
1628+            - version is recoverable, and corresponds to the servermap.
1629+
1630+        If version and servermap are provided to me, I will validate
1631+        that version exists in the servermap, and that the servermap was
1632+        updated correctly.
1633+
1634+        If version is not provided, but servermap is, I will validate
1635+        the servermap and return the best recoverable version that I can
1636+        find in the servermap.
1637+
1638+        If the version is provided but the servermap isn't, I will
1639+        obtain a servermap that has been updated in the correct mode and
1640+        validate that version is found and recoverable.
1641+
1642+        If neither servermap nor version are provided, I will obtain a
1643+        servermap updated in the correct mode, and return the best
1644+        recoverable version that I can find in there.
1645+        """
1646+        # XXX: wording ^^^^
1647+        if servermap and servermap.last_update_mode == mode:
1648+            d = defer.succeed(servermap)
1649+        else:
1650+            d = self._get_servermap(mode)
1651+
1652+        def _get_version(servermap, v):
1653+            if v and v not in servermap.recoverable_versions():
1654+                v = None
1655+            elif not v:
1656+                v = servermap.best_recoverable_version()
1657+            if not v:
1658+                raise UnrecoverableFileError("no recoverable versions")
1659+
1660+            return (servermap, v)
1661+        return d.addCallback(_get_version, version)
1662+
1663 
1664     def download_best_version(self):
1665hunk ./src/allmydata/mutable/filenode.py 389
1666+        """
1667+        I return a Deferred that fires with the contents of the best
1668+        version of this mutable file.
1669+        """
1670         return self._do_serialized(self._download_best_version)
1671hunk ./src/allmydata/mutable/filenode.py 394
1672+
1673+
1674     def _download_best_version(self):
1675hunk ./src/allmydata/mutable/filenode.py 397
1676-        servermap = ServerMap()
1677-        d = self._try_once_to_download_best_version(servermap, MODE_READ)
1678-        def _maybe_retry(f):
1679-            f.trap(NotEnoughSharesError)
1680-            # the download is worth retrying once. Make sure to use the
1681-            # old servermap, since it is what remembers the bad shares,
1682-            # but use MODE_WRITE to make it look for even more shares.
1683-            # TODO: consider allowing this to retry multiple times.. this
1684-            # approach will let us tolerate about 8 bad shares, I think.
1685-            return self._try_once_to_download_best_version(servermap,
1686-                                                           MODE_WRITE)
1687+        """
1688+        I am the serialized sibling of download_best_version.
1689+        """
1690+        d = self.get_best_readable_version()
1691+        d.addCallback(self._record_size)
1692+        d.addCallback(lambda version: version.download_to_data())
1693+
1694+        # It is possible that the download will fail because there
1695+        # aren't enough shares to be had. If so, we will try again after
1696+        # updating the servermap in MODE_WRITE, which may find more
1697+        # shares than updating in MODE_READ, as we just did. We can do
1698+        # this by getting the best mutable version and downloading from
1699+        # that -- the best mutable version will be a MutableFileVersion
1700+        # with a servermap that was last updated in MODE_WRITE, as we
1701+        # want. If this fails, then we give up.
1702+        def _maybe_retry(failure):
1703+            failure.trap(NotEnoughSharesError)
1704+
1705+            d = self.get_best_mutable_version()
1706+            d.addCallback(self._record_size)
1707+            d.addCallback(lambda version: version.download_to_data())
1708+            return d
1709+
1710         d.addErrback(_maybe_retry)
1711         return d
1712hunk ./src/allmydata/mutable/filenode.py 422
1713-    def _try_once_to_download_best_version(self, servermap, mode):
1714-        d = self._update_servermap(servermap, mode)
1715-        d.addCallback(self._once_updated_download_best_version, servermap)
1716-        return d
1717-    def _once_updated_download_best_version(self, ignored, servermap):
1718-        goal = servermap.best_recoverable_version()
1719-        if not goal:
1720-            raise UnrecoverableFileError("no recoverable versions")
1721-        return self._try_once_to_download_version(servermap, goal)
1722+
1723+
1724+    def _record_size(self, mfv):
1725+        """
1726+        I record the size of a mutable file version.
1727+        """
1728+        self._most_recent_size = mfv.get_size()
1729+        return mfv
1730+
1731 
1732     def get_size_of_best_version(self):
1733hunk ./src/allmydata/mutable/filenode.py 433
1734-        d = self.get_servermap(MODE_READ)
1735-        def _got_servermap(smap):
1736-            ver = smap.best_recoverable_version()
1737-            if not ver:
1738-                raise UnrecoverableFileError("no recoverable version")
1739-            return smap.size_of_version(ver)
1740-        d.addCallback(_got_servermap)
1741-        return d
1742+        """
1743+        I return the size of the best version of this mutable file.
1744 
1745hunk ./src/allmydata/mutable/filenode.py 436
1746+        This is equivalent to calling get_size() on the result of
1747+        get_best_readable_version().
1748+        """
1749+        d = self.get_best_readable_version()
1750+        return d.addCallback(lambda mfv: mfv.get_size())
1751+
1752+
1753+    #################################
1754+    # IMutableFileNode
1755+
1756+    def get_best_mutable_version(self, servermap=None):
1757+        """
1758+        I return a Deferred that fires with a MutableFileVersion
1759+        representing the best readable version of the file that I
1760+        represent. I am like get_best_readable_version, except that I
1761+        will try to make a writable version if I can.
1762+        """
1763+        return self.get_mutable_version(servermap=servermap)
1764+
1765+
1766+    def get_mutable_version(self, servermap=None, version=None):
1767+        """
1768+        I return a version of this mutable file. I return a Deferred
1769+        that fires with a MutableFileVersion.
1770+
1771+        If version is provided, the Deferred will fire with a
1772+        MutableFileVersion initialized with that version. Otherwise, it
1773+        will fire with the best version that I can recover.
1774+
1775+        If servermap is provided, I will use that to find versions
1776+        instead of performing my own servermap update.
1777+        """
1778+        if self.is_readonly():
1779+            return self.get_readable_version(servermap=servermap,
1780+                                             version=version)
1781+
1782+        # get_mutable_version => write intent, so we require that the
1783+        # servermap is updated in MODE_WRITE
1784+        d = self._get_version_from_servermap(MODE_WRITE, servermap, version)
1785+        def _build_version((servermap, smap_version)):
1786+            # these should have been set by the servermap update.
1787+            assert self._secret_holder
1788+            assert self._writekey
1789+
1790+            mfv = MutableFileVersion(self,
1791+                                     servermap,
1792+                                     smap_version,
1793+                                     self._storage_index,
1794+                                     self._storage_broker,
1795+                                     self._readkey,
1796+                                     self._writekey,
1797+                                     self._secret_holder,
1798+                                     history=self._history)
1799+            assert not mfv.is_readonly()
1800+            return mfv
1801+
1802+        return d.addCallback(_build_version)
1803+
1804+
1805+    # XXX: I'm uncomfortable with the difference between upload and
1806+    #      overwrite, which, FWICT, is basically that you don't have to
1807+    #      do a servermap update before you overwrite. We split them up
1808+    #      that way anyway, so I guess there's no real difficulty in
1809+    #      offering both ways to callers, but it also makes the
1810+    #      public-facing API cluttery, and makes it hard to discern the
1811+    #      right way of doing things.
1812+
1813+    # In general, we leave it to callers to ensure that they aren't
1814+    # going to cause UncoordinatedWriteErrors when working with
1815+    # MutableFileVersions. We know that the next three operations
1816+    # (upload, overwrite, and modify) will all operate on the same
1817+    # version, so we say that only one of them can be going on at once,
1818+    # and serialize them to ensure that this actually happens, since in
1819+    # this situation we are the caller, and it is our job to do so.
1820     def overwrite(self, new_contents):
1821hunk ./src/allmydata/mutable/filenode.py 511
1822+        """
1823+        I overwrite the contents of the best recoverable version of this
1824+        mutable file with new_contents. This is equivalent to calling
1825+        overwrite on the result of get_best_mutable_version with
1826+        new_contents as an argument. I return a Deferred that eventually
1827+        fires with the results of my replacement process.
1828+        """
1829         return self._do_serialized(self._overwrite, new_contents)
1830hunk ./src/allmydata/mutable/filenode.py 519
1831+
1832+
1833     def _overwrite(self, new_contents):
1834hunk ./src/allmydata/mutable/filenode.py 522
1835+        """
1836+        I am the serialized sibling of overwrite.
1837+        """
1838+        d = self.get_best_mutable_version()
1839+        d.addCallback(lambda mfv: mfv.overwrite(new_contents))
1840+        d.addCallback(self._did_upload, new_contents.get_size())
1841+        return d
1842+
1843+
1844+
1845+    def upload(self, new_contents, servermap):
1846+        """
1847+        I overwrite the contents of the best recoverable version of this
1848+        mutable file with new_contents, using servermap instead of
1849+        creating/updating our own servermap. I return a Deferred that
1850+        fires with the results of my upload.
1851+        """
1852+        return self._do_serialized(self._upload, new_contents, servermap)
1853+
1854+
1855+    def modify(self, modifier, backoffer=None):
1856+        """
1857+        I modify the contents of the best recoverable version of this
1858+        mutable file with the modifier. This is equivalent to calling
1859+        modify on the result of get_best_mutable_version. I return a
1860+        Deferred that eventually fires with an UploadResults instance
1861+        describing this process.
1862+        """
1863+        return self._do_serialized(self._modify, modifier, backoffer)
1864+
1865+
1866+    def _modify(self, modifier, backoffer):
1867+        """
1868+        I am the serialized sibling of modify.
1869+        """
1870+        d = self.get_best_mutable_version()
1871+        d.addCallback(lambda mfv: mfv.modify(modifier, backoffer))
1872+        return d
1873+
1874+
1875+    def download_version(self, servermap, version, fetch_privkey=False):
1876+        """
1877+        Download the specified version of this mutable file. I return a
1878+        Deferred that fires with the contents of the specified version
1879+        as a bytestring, or errbacks if the file is not recoverable.
1880+        """
1881+        d = self.get_readable_version(servermap, version)
1882+        return d.addCallback(lambda mfv: mfv.download_to_data(fetch_privkey))
1883+
1884+
1885+    def get_servermap(self, mode):
1886+        """
1887+        I return a servermap that has been updated in mode.
1888+
1889+        mode should be one of MODE_READ, MODE_WRITE, MODE_CHECK or
1890+        MODE_ANYTHING. See servermap.py for more on what these mean.
1891+        """
1892+        return self._do_serialized(self._get_servermap, mode)
1893+
1894+
1895+    def _get_servermap(self, mode):
1896+        """
1897+        I am a serialized twin to get_servermap.
1898+        """
1899         servermap = ServerMap()
1900hunk ./src/allmydata/mutable/filenode.py 587
1901-        d = self._update_servermap(servermap, mode=MODE_WRITE)
1902-        d.addCallback(lambda ignored: self._upload(new_contents, servermap))
1903+        d = self._update_servermap(servermap, mode)
1904+        # The servermap will tell us about the most recent size of the
1905+        # file, so we may as well set that so that callers might get
1906+        # more data about us.
1907+        if not self._most_recent_size:
1908+            d.addCallback(self._get_size_from_servermap)
1909+        return d
1910+
1911+
1912+    def _get_size_from_servermap(self, servermap):
1913+        """
1914+        I extract the size of the best version of this file and record
1915+        it in self._most_recent_size. I return the servermap that I was
1916+        given.
1917+        """
1918+        if servermap.recoverable_versions():
1919+            v = servermap.best_recoverable_version()
1920+            size = v[4] # verinfo[4] == size
1921+            self._most_recent_size = size
1922+        return servermap
1923+
1924+
1925+    def _update_servermap(self, servermap, mode):
1926+        u = ServermapUpdater(self, self._storage_broker, Monitor(), servermap,
1927+                             mode)
1928+        if self._history:
1929+            self._history.notify_mapupdate(u.get_status())
1930+        return u.update()
1931+
1932+
1933+    def set_version(self, version):
1934+        # I can be set in two ways:
1935+        #  1. When the node is created.
1936+        #  2. (for an existing share) when the Servermap is updated
1937+        #     before I am read.
1938+        assert version in (MDMF_VERSION, SDMF_VERSION)
1939+        self._protocol_version = version
1940+
1941+
1942+    def get_version(self):
1943+        return self._protocol_version
1944+
1945+
1946+    def _do_serialized(self, cb, *args, **kwargs):
1947+        # note: to avoid deadlock, this callable is *not* allowed to invoke
1948+        # other serialized methods within this (or any other)
1949+        # MutableFileNode. The callable should be a bound method of this same
1950+        # MFN instance.
1951+        d = defer.Deferred()
1952+        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
1953+        # we need to put off d.callback until this Deferred is finished being
1954+        # processed. Otherwise the caller's subsequent activities (like,
1955+        # doing other things with this node) can cause reentrancy problems in
1956+        # the Deferred code itself
1957+        self._serializer.addBoth(lambda res: eventually(d.callback, res))
1958+        # add a log.err just in case something really weird happens, because
1959+        # self._serializer stays around forever, therefore we won't see the
1960+        # usual Unhandled Error in Deferred that would give us a hint.
1961+        self._serializer.addErrback(log.err)
1962         return d
1963 
1964 
1965hunk ./src/allmydata/mutable/filenode.py 649
1966+    def _upload(self, new_contents, servermap):
1967+        """
1968+        A MutableFileNode still has to have some way of getting
1969+        published initially, which is what I am here for. After that,
1970+        all publishing, updating, modifying and so on happens through
1971+        MutableFileVersions.
1972+        """
1973+        assert self._pubkey, "update_servermap must be called before publish"
1974+
1975+        p = Publish(self, self._storage_broker, servermap)
1976+        if self._history:
1977+            self._history.notify_publish(p.get_status(),
1978+                                         new_contents.get_size())
1979+        d = p.publish(new_contents)
1980+        d.addCallback(self._did_upload, new_contents.get_size())
1981+        return d
1982+
1983+
1984+    def _did_upload(self, res, size):
1985+        self._most_recent_size = size
1986+        return res
1987+
1988+
1989+class MutableFileVersion:
1990+    """
1991+    I represent a specific version (most likely the best version) of a
1992+    mutable file.
1993+
1994+    Since I implement IReadable, instances which hold a
1995+    reference to an instance of me are guaranteed the ability (absent
1996+    connection difficulties or unrecoverable versions) to read the file
1997+    that I represent. Depending on whether I was initialized with a
1998+    write capability or not, I may also provide callers the ability to
1999+    overwrite or modify the contents of the mutable file that I
2000+    reference.
2001+    """
2002+    implements(IMutableFileVersion, IWritable)
2003+
2004+    def __init__(self,
2005+                 node,
2006+                 servermap,
2007+                 version,
2008+                 storage_index,
2009+                 storage_broker,
2010+                 readcap,
2011+                 writekey=None,
2012+                 write_secrets=None,
2013+                 history=None):
2014+
2015+        self._node = node
2016+        self._servermap = servermap
2017+        self._version = version
2018+        self._storage_index = storage_index
2019+        self._write_secrets = write_secrets
2020+        self._history = history
2021+        self._storage_broker = storage_broker
2022+
2023+        #assert isinstance(readcap, IURI)
2024+        self._readcap = readcap
2025+
2026+        self._writekey = writekey
2027+        self._serializer = defer.succeed(None)
2028+
2029+
2030+    def get_sequence_number(self):
2031+        """
2032+        Get the sequence number of the mutable version that I represent.
2033+        """
2034+        return self._version[0] # verinfo[0] == the sequence number
2035+
2036+
2037+    # TODO: Terminology?
2038+    def get_writekey(self):
2039+        """
2040+        I return a writekey or None if I don't have a writekey.
2041+        """
2042+        return self._writekey
2043+
2044+
2045+    def overwrite(self, new_contents):
2046+        """
2047+        I overwrite the contents of this mutable file version with the
2048+        data in new_contents.
2049+        """
2050+        assert not self.is_readonly()
2051+
2052+        return self._do_serialized(self._overwrite, new_contents)
2053+
2054+
2055+    def _overwrite(self, new_contents):
2056+        assert IMutableUploadable.providedBy(new_contents)
2057+        assert self._servermap.last_update_mode == MODE_WRITE
2058+
2059+        return self._upload(new_contents)
2060+
2061+
2062     def modify(self, modifier, backoffer=None):
2063         """I use a modifier callback to apply a change to the mutable file.
2064         I implement the following pseudocode::
2065hunk ./src/allmydata/mutable/filenode.py 785
2066         backoffer should not invoke any methods on this MutableFileNode
2067         instance, and it needs to be highly conscious of deadlock issues.
2068         """
2069+        assert not self.is_readonly()
2070+
2071         return self._do_serialized(self._modify, modifier, backoffer)
2072hunk ./src/allmydata/mutable/filenode.py 788
2073+
2074+
2075     def _modify(self, modifier, backoffer):
2076hunk ./src/allmydata/mutable/filenode.py 791
2077-        servermap = ServerMap()
2078         if backoffer is None:
2079             backoffer = BackoffAgent().delay
2080hunk ./src/allmydata/mutable/filenode.py 793
2081-        return self._modify_and_retry(servermap, modifier, backoffer, True)
2082-    def _modify_and_retry(self, servermap, modifier, backoffer, first_time):
2083-        d = self._modify_once(servermap, modifier, first_time)
2084+        return self._modify_and_retry(modifier, backoffer, True)
2085+
2086+
2087+    def _modify_and_retry(self, modifier, backoffer, first_time):
2088+        """
2089+        I try to apply modifier to the contents of this version of the
2090+        mutable file. If I succeed, I return an UploadResults instance
2091+        describing my success. If I fail, I try again after waiting for
2092+        a little bit.
2093+        """
2094+        log.msg("doing modify")
2095+        d = self._modify_once(modifier, first_time)
2096         def _retry(f):
2097             f.trap(UncoordinatedWriteError)
2098             d2 = defer.maybeDeferred(backoffer, self, f)
2099hunk ./src/allmydata/mutable/filenode.py 809
2100             d2.addCallback(lambda ignored:
2101-                           self._modify_and_retry(servermap, modifier,
2102+                           self._modify_and_retry(modifier,
2103                                                   backoffer, False))
2104             return d2
2105         d.addErrback(_retry)
2106hunk ./src/allmydata/mutable/filenode.py 814
2107         return d
2108-    def _modify_once(self, servermap, modifier, first_time):
2109-        d = self._update_servermap(servermap, MODE_WRITE)
2110-        d.addCallback(self._once_updated_download_best_version, servermap)
2111+
2112+
2113+    def _modify_once(self, modifier, first_time):
2114+        """
2115+        I attempt to apply a modifier to the contents of the mutable
2116+        file.
2117+        """
2118+        # XXX: This is wrong -- we could get more servers if we updated
2119+        # in MODE_ANYTHING and possibly MODE_CHECK. Probably we want to
2120+        # assert that the last update wasn't MODE_READ
2121+        assert self._servermap.last_update_mode == MODE_WRITE
2122+
2123+        # download_to_data is serialized, so we have to call this to
2124+        # avoid deadlock.
2125+        d = self._try_to_download_data()
2126         def _apply(old_contents):
2127hunk ./src/allmydata/mutable/filenode.py 830
2128-            new_contents = modifier(old_contents, servermap, first_time)
2129+            new_contents = modifier(old_contents, self._servermap, first_time)
2130+            precondition((isinstance(new_contents, str) or
2131+                          new_contents is None),
2132+                         "Modifier function must return a string "
2133+                         "or None")
2134+
2135             if new_contents is None or new_contents == old_contents:
2136hunk ./src/allmydata/mutable/filenode.py 837
2137+                log.msg("no changes")
2138                 # no changes need to be made
2139                 if first_time:
2140                     return
2141hunk ./src/allmydata/mutable/filenode.py 845
2142                 # recovery when it observes UCWE, we need to do a second
2143                 # publish. See #551 for details. We'll basically loop until
2144                 # we managed an uncontested publish.
2145-                new_contents = old_contents
2146-            precondition(isinstance(new_contents, str),
2147-                         "Modifier function must return a string or None")
2148-            return self._upload(new_contents, servermap)
2149+                old_uploadable = MutableData(old_contents)
2150+                new_contents = old_uploadable
2151+            else:
2152+                new_contents = MutableData(new_contents)
2153+
2154+            return self._upload(new_contents)
2155         d.addCallback(_apply)
2156         return d
2157 
2158hunk ./src/allmydata/mutable/filenode.py 854
2159-    def get_servermap(self, mode):
2160-        return self._do_serialized(self._get_servermap, mode)
2161-    def _get_servermap(self, mode):
2162-        servermap = ServerMap()
2163-        return self._update_servermap(servermap, mode)
2164-    def _update_servermap(self, servermap, mode):
2165-        u = ServermapUpdater(self, self._storage_broker, Monitor(), servermap,
2166-                             mode)
2167-        if self._history:
2168-            self._history.notify_mapupdate(u.get_status())
2169-        return u.update()
2170 
2171hunk ./src/allmydata/mutable/filenode.py 855
2172-    def download_version(self, servermap, version, fetch_privkey=False):
2173-        return self._do_serialized(self._try_once_to_download_version,
2174-                                   servermap, version, fetch_privkey)
2175-    def _try_once_to_download_version(self, servermap, version,
2176-                                      fetch_privkey=False):
2177-        r = Retrieve(self, servermap, version, fetch_privkey)
2178+    def is_readonly(self):
2179+        """
2180+        I return True if this MutableFileVersion provides no write
2181+        access to the file that it encapsulates, and False if it
2182+        provides the ability to modify the file.
2183+        """
2184+        return self._writekey is None
2185+
2186+
2187+    def is_mutable(self):
2188+        """
2189+        I return True, since mutable files are always mutable by
2190+        somebody.
2191+        """
2192+        return True
2193+
2194+
2195+    def get_storage_index(self):
2196+        """
2197+        I return the storage index of the reference that I encapsulate.
2198+        """
2199+        return self._storage_index
2200+
2201+
2202+    def get_size(self):
2203+        """
2204+        I return the length, in bytes, of this readable object.
2205+        """
2206+        return self._servermap.size_of_version(self._version)
2207+
2208+
2209+    def download_to_data(self, fetch_privkey=False):
2210+        """
2211+        I return a Deferred that fires with the contents of this
2212+        readable object as a byte string.
2213+
2214+        """
2215+        c = consumer.MemoryConsumer()
2216+        d = self.read(c, fetch_privkey=fetch_privkey)
2217+        d.addCallback(lambda mc: "".join(mc.chunks))
2218+        return d
2219+
2220+
2221+    def _try_to_download_data(self):
2222+        """
2223+        I am an unserialized cousin of download_to_data; I am called
2224+        from the children of modify() to download the data associated
2225+        with this mutable version.
2226+        """
2227+        c = consumer.MemoryConsumer()
2228+        # modify will almost certainly write, so we need the privkey.
2229+        d = self._read(c, fetch_privkey=True)
2230+        d.addCallback(lambda mc: "".join(mc.chunks))
2231+        return d
2232+
2233+
2234+    def read(self, consumer, offset=0, size=None, fetch_privkey=False):
2235+        """
2236+        I read a portion (possibly all) of the mutable file that I
2237+        reference into consumer.
2238+        """
2239+        return self._do_serialized(self._read, consumer, offset, size,
2240+                                   fetch_privkey)
2241+
2242+
2243+    def _read(self, consumer, offset=0, size=None, fetch_privkey=False):
2244+        """
2245+        I am the serialized companion of read.
2246+        """
2247+        r = Retrieve(self._node, self._servermap, self._version, fetch_privkey)
2248         if self._history:
2249             self._history.notify_retrieve(r.get_status())
2250hunk ./src/allmydata/mutable/filenode.py 927
2251-        d = r.download()
2252-        d.addCallback(self._downloaded_version)
2253+        d = r.download(consumer, offset, size)
2254         return d
2255hunk ./src/allmydata/mutable/filenode.py 929
2256-    def _downloaded_version(self, data):
2257-        self._most_recent_size = len(data)
2258-        return data
2259 
2260hunk ./src/allmydata/mutable/filenode.py 930
2261-    def upload(self, new_contents, servermap):
2262-        return self._do_serialized(self._upload, new_contents, servermap)
2263-    def _upload(self, new_contents, servermap):
2264-        assert self._pubkey, "update_servermap must be called before publish"
2265-        p = Publish(self, self._storage_broker, servermap)
2266+
2267+    def _do_serialized(self, cb, *args, **kwargs):
2268+        # note: to avoid deadlock, this callable is *not* allowed to invoke
2269+        # other serialized methods within this (or any other)
2270+        # MutableFileNode. The callable should be a bound method of this same
2271+        # MFN instance.
2272+        d = defer.Deferred()
2273+        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
2274+        # we need to put off d.callback until this Deferred is finished being
2275+        # processed. Otherwise the caller's subsequent activities (like,
2276+        # doing other things with this node) can cause reentrancy problems in
2277+        # the Deferred code itself
2278+        self._serializer.addBoth(lambda res: eventually(d.callback, res))
2279+        # add a log.err just in case something really weird happens, because
2280+        # self._serializer stays around forever, therefore we won't see the
2281+        # usual Unhandled Error in Deferred that would give us a hint.
2282+        self._serializer.addErrback(log.err)
2283+        return d
2284+
2285+
2286+    def _upload(self, new_contents):
2287+        #assert self._pubkey, "update_servermap must be called before publish"
2288+        p = Publish(self._node, self._storage_broker, self._servermap)
2289         if self._history:
2290hunk ./src/allmydata/mutable/filenode.py 954
2291-            self._history.notify_publish(p.get_status(), len(new_contents))
2292+            self._history.notify_publish(p.get_status(),
2293+                                         new_contents.get_size())
2294         d = p.publish(new_contents)
2295hunk ./src/allmydata/mutable/filenode.py 957
2296-        d.addCallback(self._did_upload, len(new_contents))
2297+        d.addCallback(self._did_upload, new_contents.get_size())
2298         return d
2299hunk ./src/allmydata/mutable/filenode.py 959
2300+
2301+
2302     def _did_upload(self, res, size):
2303         self._most_recent_size = size
2304         return res
2305hunk ./src/allmydata/mutable/filenode.py 964
2306+
2307+    def update(self, data, offset):
2308+        """
2309+        Do an update of this mutable file version by inserting data at
2310+        offset within the file. If offset is the EOF, this is an append
2311+        operation. I return a Deferred that fires with the results of
2312+        the update operation when it has completed.
2313+
2314+        In cases where update does not append any data, or where it does
2315+        not append so many blocks that the block count crosses a
2316+        power-of-two boundary, this operation will use roughly
2317+        O(data.get_size()) memory/bandwidth/CPU to perform the update.
2318+        Otherwise, it must download, re-encode, and upload the entire
2319+        file again, which will use O(filesize) resources.
2320+        """
2321+        return self._do_serialized(self._update, data, offset)
2322+
2323+
2324+    def _update(self, data, offset):
2325+        """
2326+        I update the mutable file version represented by this particular
2327+        IMutableVersion by inserting the data in data at the offset
2328+        offset. I return a Deferred that fires when this has been
2329+        completed.
2330+        """
2331+        # We have two cases here:
2332+        # 1. The new data will add few enough segments so that it does
2333+        #    not cross into the next power-of-two boundary.
2334+        # 2. It doesn't.
2335+        #
2336+        # In the former case, we can modify the file in place. In the
2337+        # latter case, we need to re-encode the file.
2338+        new_size = data.get_size() + offset
2339+        old_size = self.get_size()
2340+        segment_size = self._version[3]
2341+        num_old_segments = mathutil.div_ceil(old_size,
2342+                                             segment_size)
2343+        num_new_segments = mathutil.div_ceil(new_size,
2344+                                             segment_size)
2345+        log.msg("got %d old segments, %d new segments" % \
2346+                        (num_old_segments, num_new_segments))
2347+
2348+        # We also do a whole file re-encode if the file is an SDMF file.
2349+        if self._version[2]: # version[2] == SDMF salt, which MDMF lacks
2350+            log.msg("doing re-encode instead of in-place update")
2351+            return self._do_modify_update(data, offset)
2352+
2353+        log.msg("updating in place")
2354+        d = self._do_update_update(data, offset)
2355+        d.addCallback(self._decode_and_decrypt_segments, data, offset)
2356+        d.addCallback(self._build_uploadable_and_finish, data, offset)
2357+        return d
2358+
2359+
2360+    def _do_modify_update(self, data, offset):
2361+        """
2362+        I perform a file update by modifying the contents of the file
2363+        after downloading it, then reuploading it. I am less efficient
2364+        than _do_update_update, but am necessary for certain updates.
2365+        """
2366+        def m(old, servermap, first_time):
2367+            start = offset
2368+            rest = offset + data.get_size()
2369+            new = old[:start]
2370+            new += "".join(data.read(data.get_size()))
2371+            new += old[rest:]
2372+            return new
2373+        return self._modify(m, None)
2374+
2375+
2376+    def _do_update_update(self, data, offset):
2377+        """
2378+        I start the Servermap update that gets us the data we need to
2379+        continue the update process. I return a Deferred that fires when
2380+        the servermap update is done.
2381+        """
2382+        assert IMutableUploadable.providedBy(data)
2383+        assert self.is_mutable()
2384+        # offset == self.get_size() is valid and means that we are
2385+        # appending data to the file.
2386+        assert offset <= self.get_size()
2387+
2388+        # We'll need the segment that the data starts in, regardless of
2389+        # what we'll do later.
2390+        start_segment = mathutil.div_ceil(offset, DEFAULT_MAX_SEGMENT_SIZE)
2391+        start_segment -= 1
2392+
2393+        # We only need the end segment if the data we append does not go
2394+        # beyond the current end-of-file.
2395+        end_segment = start_segment
2396+        if offset + data.get_size() < self.get_size():
2397+            end_data = offset + data.get_size()
2398+            end_segment = mathutil.div_ceil(end_data, DEFAULT_MAX_SEGMENT_SIZE)
2399+            end_segment -= 1
2400+        self._start_segment = start_segment
2401+        self._end_segment = end_segment
2402+
2403+        # Now ask for the servermap to be updated in MODE_WRITE with
2404+        # this update range.
2405+        u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
2406+                             self._servermap,
2407+                             mode=MODE_WRITE,
2408+                             update_range=(start_segment, end_segment))
2409+        return u.update()
2410+
2411+
2412+    def _decode_and_decrypt_segments(self, ignored, data, offset):
2413+        """
2414+        After the servermap update, I take the encrypted and encoded
2415+        data that the servermap fetched while doing its update and
2416+        transform it into decoded-and-decrypted plaintext that can be
2417+        used by the new uploadable. I return a Deferred that fires with
2418+        the segments.
2419+        """
2420+        r = Retrieve(self._node, self._servermap, self._version)
2421+        # decode: takes in our blocks and salts from the servermap,
2422+        # returns a Deferred that fires with the corresponding plaintext
2423+        # segments. Does not download -- simply takes advantage of
2424+        # existing infrastructure within the Retrieve class to avoid
2425+        # duplicating code.
2426+        sm = self._servermap
2427+        # XXX: If the methods in the servermap don't work as
2428+        # abstractions, you should rewrite them instead of going around
2429+        # them.
2430+        update_data = sm.update_data
2431+        start_segments = {} # shnum -> start segment
2432+        end_segments = {} # shnum -> end segment
2433+        blockhashes = {} # shnum -> blockhash tree
2434+        for (shnum, data) in update_data.iteritems():
2435+            data = [d[1] for d in data if d[0] == self._version]
2436+
2437+            # Every data entry in our list should now be for share shnum of
2438+            # a particular version of the mutable file, so all of the
2439+            # entries should be identical.
2440+            datum = data[0]
2441+            assert filter(lambda x: x != datum, data) == []
2442+
2443+            blockhashes[shnum] = datum[0]
2444+            start_segments[shnum] = datum[1]
2445+            end_segments[shnum] = datum[2]
2446+
2447+        d1 = r.decode(start_segments, self._start_segment)
2448+        d2 = r.decode(end_segments, self._end_segment)
2449+        d3 = defer.succeed(blockhashes)
2450+        return deferredutil.gatherResults([d1, d2, d3])
2451+
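For orientation, the shape that _decode_and_decrypt_segments assumes for sm.update_data; the names and values here are illustrative placeholders inferred from the code, not taken from the patch:

    verinfo = ("seqnum", "roothash", "...")  # placeholder version id
    update_data = {
        # shnum -> [(verinfo, (blockhash_tree, start_seg, end_seg)), ...]
        0: [(verinfo, ("bht0", "start0", "end0"))],
        1: [(verinfo, ("bht1", "start1", "end1"))],
    }
    datum = [d[1] for d in update_data[0] if d[0] == verinfo][0]
    assert datum == ("bht0", "start0", "end0")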
2452+
2453+    def _build_uploadable_and_finish(self, segments_and_bht, data, offset):
2454+        """
2455+        After the process has the plaintext segments, I build the
2456+        TransformingUploadable that the publisher will eventually
2457+        re-upload to the grid. I then invoke the publisher with that
2458+        uploadable, and return a Deferred that fires when the publish
2459+        operation has completed successfully.
2460+        """
2461+        u = TransformingUploadable(data, offset,
2462+                                   self._version[3],
2463+                                   segments_and_bht[0],
2464+                                   segments_and_bht[1])
2465+        p = Publish(self._node, self._storage_broker, self._servermap)
2466+        return p.update(u, offset, segments_and_bht[2], self._version)
2467}
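Taken together, the three helpers above suggest that the version object's update pipeline chains them with Deferreds along these lines (a sketch of the flow, not the patch's own driver code):

    d = self._do_update_update(data, offset)
    d.addCallback(self._decode_and_decrypt_segments, data, offset)
    d.addCallback(self._build_uploadable_and_finish, data, offset)
    return d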
2468[mutable/publish.py: Modify the publish process to support MDMF
2469Kevan Carstensen <kevan@isnotajoke.com>**20100819003342
2470 Ignore-this: 2bb379974927e2e20cff75bae8302d1d
2471 
2472 The inner workings of the publishing process needed to be reworked to a
2473 large extent to cope with segmented mutable files, and to cope with
2474 partial-file updates of mutable files. This patch does that. It also
2475 introduces wrappers for uploadable data, allowing the use of
2476 filehandle-like objects as data sources, in addition to strings. This
2477 reduces memory inefficiency when dealing with large files through the
2478 webapi, and clarifies update code there.
2479] {
2480hunk ./src/allmydata/mutable/publish.py 3
2481 
2482 
2483-import os, struct, time
2484+import os, time
2485+from StringIO import StringIO
2486 from itertools import count
2487 from zope.interface import implements
2488 from twisted.internet import defer
2489hunk ./src/allmydata/mutable/publish.py 9
2490 from twisted.python import failure
2491-from allmydata.interfaces import IPublishStatus
2492+from allmydata.interfaces import IPublishStatus, SDMF_VERSION, MDMF_VERSION, \
2493+                                 IMutableUploadable
2494 from allmydata.util import base32, hashutil, mathutil, idlib, log
2495 from allmydata.util.dictutil import DictOfSets
2496 from allmydata import hashtree, codec
2497hunk ./src/allmydata/mutable/publish.py 21
2498 from allmydata.mutable.common import MODE_WRITE, MODE_CHECK, \
2499      UncoordinatedWriteError, NotEnoughServersError
2500 from allmydata.mutable.servermap import ServerMap
2501-from allmydata.mutable.layout import pack_prefix, pack_share, unpack_header, pack_checkstring, \
2502-     unpack_checkstring, SIGNED_PREFIX
2503+from allmydata.mutable.layout import unpack_checkstring, MDMFSlotWriteProxy, \
2504+                                     SDMFSlotWriteProxy
2505+
2506+KiB = 1024
2507+DEFAULT_MAX_SEGMENT_SIZE = 128 * KiB
2508+PUSHING_BLOCKS_STATE = 0
2509+PUSHING_EVERYTHING_ELSE_STATE = 1
2510+DONE_STATE = 2
2511 
2512 class PublishStatus:
2513     implements(IPublishStatus)
2514hunk ./src/allmydata/mutable/publish.py 118
2515         self._status.set_helper(False)
2516         self._status.set_progress(0.0)
2517         self._status.set_active(True)
2518+        self._version = self._node.get_version()
2519+        assert self._version in (SDMF_VERSION, MDMF_VERSION)
2520+
2521 
2522     def get_status(self):
2523         return self._status
2524hunk ./src/allmydata/mutable/publish.py 132
2525             kwargs["facility"] = "tahoe.mutable.publish"
2526         return log.msg(*args, **kwargs)
2527 
2528+
2529+    def update(self, data, offset, blockhashes, version):
2530+        """
2531+        I replace the contents of this file with the contents of data,
2532+        starting at offset. I return a Deferred that fires with None
2533+        when the replacement has been completed, or with an error if
2534+        something went wrong during the process.
2535+
2536+        Note that this process will not upload new shares. If the file
2537+        being updated is in need of repair, callers will have to repair
2538+        it on their own.
2539+        """
2540+        # How this works:
2541+        # 1: Make peer assignments. We'll assign each share that we know
2542+        # about on the grid to the peer that currently holds that
2543+        # share, and will not place any new shares.
2544+        # 2: Setup encoding parameters. Most of these will stay the same
2545+        # -- datalength will change, as will some of the offsets.
2546+        # 3. Upload the new segments.
2547+        # 4. Be done.
2548+        assert IMutableUploadable.providedBy(data)
2549+
2550+        self.data = data
2551+
2552+        # XXX: Use the MutableFileVersion instead.
2553+        self.datalength = self._node.get_size()
2554+        if data.get_size() > self.datalength:
2555+            self.datalength = data.get_size()
2556+
2557+        self.log("starting update")
2558+        self.log("adding new data of length %d at offset %d" % \
2559+                    (data.get_size(), offset))
2560+        self.log("new data length is %d" % self.datalength)
2561+        self._status.set_size(self.datalength)
2562+        self._status.set_status("Started")
2563+        self._started = time.time()
2564+
2565+        self.done_deferred = defer.Deferred()
2566+
2567+        self._writekey = self._node.get_writekey()
2568+        assert self._writekey, "need write capability to publish"
2569+
2570+        # first, which servers will we publish to? We require that the
2571+        # servermap was updated in MODE_WRITE, so we can depend upon the
2572+        # peerlist computed by that process instead of computing our own.
2573+        assert self._servermap
2574+        assert self._servermap.last_update_mode in (MODE_WRITE, MODE_CHECK)
2575+        # we will push a version that is one larger than anything present
2576+        # in the grid, according to the servermap.
2577+        self._new_seqnum = self._servermap.highest_seqnum() + 1
2578+        self._status.set_servermap(self._servermap)
2579+
2580+        self.log(format="new seqnum will be %(seqnum)d",
2581+                 seqnum=self._new_seqnum, level=log.NOISY)
2582+
2583+        # We're updating an existing file, so all of the following
2584+        # should be available.
2585+        self.readkey = self._node.get_readkey()
2586+        self.required_shares = self._node.get_required_shares()
2587+        assert self.required_shares is not None
2588+        self.total_shares = self._node.get_total_shares()
2589+        assert self.total_shares is not None
2590+        self._status.set_encoding(self.required_shares, self.total_shares)
2591+
2592+        self._pubkey = self._node.get_pubkey()
2593+        assert self._pubkey
2594+        self._privkey = self._node.get_privkey()
2595+        assert self._privkey
2596+        self._encprivkey = self._node.get_encprivkey()
2597+
2598+        sb = self._storage_broker
2599+        full_peerlist = sb.get_servers_for_index(self._storage_index)
2600+        self.full_peerlist = full_peerlist # for use later, immutable
2601+        self.bad_peers = set() # peerids who have errbacked/refused requests
2602+
2603+        # This will set self.segment_size, self.num_segments, and
2604+        # self.fec. TODO: Does it know how to do the offset? Probably
2605+        # not. So do that part next.
2606+        self.setup_encoding_parameters(offset=offset)
2607+
2608+        # if we experience any surprises (writes which were rejected because
2609+        # our test vector did not match, or shares which we didn't expect to
2610+        # see), we set this flag and report an UncoordinatedWriteError at the
2611+        # end of the publish process.
2612+        self.surprised = False
2613+
2614+        # we keep track of three tables. The first is our goal: which share
2615+        # we want to see on which servers. This is initially populated by the
2616+        # existing servermap.
2617+        self.goal = set() # pairs of (peerid, shnum) tuples
2618+
2619+        # the second table is our list of outstanding queries: those which
2620+        # are in flight and may or may not be delivered, accepted, or
2621+        # acknowledged. Items are added to this table when the request is
2622+        # sent, and removed when the response returns (or errbacks).
2623+        self.outstanding = set() # (peerid, shnum) tuples
2624+
2625+        # the third is a table of successes: shares which have actually been
2626+        # placed. These are populated when responses come back with success.
2627+        # When self.placed == self.goal, we're done.
2628+        self.placed = set() # (peerid, shnum) tuples
2629+
2630+        # we also keep a mapping from peerid to RemoteReference. Each time we
2631+        # pull a connection out of the full peerlist, we add it to this for
2632+        # use later.
2633+        self.connections = {}
2634+
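The three tables described above obey a simple completion invariant; a sketch, with publish_is_done as a hypothetical helper:

    # goal        : every (peerid, shnum) pair we want on the grid
    # outstanding : queries in flight, fate unknown
    # placed      : writes confirmed by the storage servers
    def publish_is_done(goal, outstanding, placed):
        return placed == goal and not outstanding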
2635+        self.bad_share_checkstrings = {}
2636+
2637+        # This is set at the last step of the publishing process.
2638+        self.versioninfo = ""
2639+
2640+        # we use the servermap to populate the initial goal: this way we will
2641+        # try to update each existing share in place. Since we're
2642+        # updating, we ignore damaged and missing shares -- callers must
2643+        # run a repair to recreate these.
2644+        for (peerid, shnum) in self._servermap.servermap:
2645+            self.goal.add( (peerid, shnum) )
2646+            self.connections[peerid] = self._servermap.connections[peerid]
2647+        self.writers = {}
2648+
2649+        # In-place updates are MDMF-only, so force the MDMF writer.
2650+        self._version = MDMF_VERSION
2651+        writer_class = MDMFSlotWriteProxy
2652+
2653+        # For each (peerid, shnum) in self.goal, we make a
2654+        # write proxy for that peer. We'll use this to write
2655+        # shares to the peer.
2656+        for key in self.goal:
2657+            peerid, shnum = key
2658+            write_enabler = self._node.get_write_enabler(peerid)
2659+            renew_secret = self._node.get_renewal_secret(peerid)
2660+            cancel_secret = self._node.get_cancel_secret(peerid)
2661+            secrets = (write_enabler, renew_secret, cancel_secret)
2662+
2663+            self.writers[shnum] =  writer_class(shnum,
2664+                                                self.connections[peerid],
2665+                                                self._storage_index,
2666+                                                secrets,
2667+                                                self._new_seqnum,
2668+                                                self.required_shares,
2669+                                                self.total_shares,
2670+                                                self.segment_size,
2671+                                                self.datalength)
2672+            self.writers[shnum].peerid = peerid
2673+            assert (peerid, shnum) in self._servermap.servermap
2674+            old_versionid, old_timestamp = self._servermap.servermap[key]
2675+            (old_seqnum, old_root_hash, old_salt, old_segsize,
2676+             old_datalength, old_k, old_N, old_prefix,
2677+             old_offsets_tuple) = old_versionid
2678+            self.writers[shnum].set_checkstring(old_seqnum,
2679+                                                old_root_hash,
2680+                                                old_salt)
2681+
2682+        # Our remote shares will not have a complete checkstring until
2683+        # after we are done writing share data and have started to write
2684+        # blocks. In the meantime, we need to know what to look for when
2685+        # writing, so that we can detect UncoordinatedWriteErrors.
2686+        self._checkstring = self.writers.values()[0].get_checkstring()
2687+
2688+        # Now, we start pushing shares.
2689+        self._status.timings["setup"] = time.time() - self._started
2690+        # First, we encrypt, encode, and publish the segments that
2691+        # carry the new share data.
2692+
2693+        # Our update process fetched these for us. We need to update
2694+        # them in place as publishing happens.
2695+        self.blockhashes = {} # shnum -> [blockhashes]
2696+        for (i, bht) in blockhashes.iteritems():
2697+            # We need to extract the leaves from our old hash tree.
2698+            old_segcount = mathutil.div_ceil(version[4],
2699+                                             version[3])
2700+            h = hashtree.IncompleteHashTree(old_segcount)
2701+            bht = dict(enumerate(bht))
2702+            h.set_hashes(bht)
2703+            leaves = h[h.get_leaf_index(0):]
2704+            for j in xrange(self.num_segments - len(leaves)):
2705+                leaves.append(None)
2706+
2707+            assert len(leaves) >= self.num_segments
2708+            self.blockhashes[i] = leaves
2709+            # This list will now be the leaves that were set during the
2710+            # initial upload + enough empty hashes to make it a
2711+            # power-of-two. If we exceed a power of two boundary, we
2712+            # should be encoding the file over again, and should not be
2713+            # here. So, we have
2714+            #assert len(self.blockhashes[i]) == \
2715+            #    hashtree.roundup_pow2(self.num_segments), \
2716+            #        len(self.blockhashes[i])
2717+            # XXX: Except this doesn't work. Figure out why.
2718+
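A standalone sketch of the leaf extraction above, assuming the flat-list tree layout used by allmydata.hashtree (leaves occupy the tail of the array, starting at index roundup_pow2(num_leaves) - 1):

    def roundup_pow2(n):
        p = 1
        while p < n:
            p *= 2
        return p

    def extract_and_pad_leaves(flat_hashes, old_segcount, new_segcount):
        first_leaf = roundup_pow2(old_segcount) - 1
        leaves = flat_hashes[first_leaf:]
        # Pad with placeholders for the segments the update will push.
        leaves.extend([None] * (new_segcount - len(leaves)))
        return leaves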
2719+        # These are filled in later, after we've modified the block hash
2720+        # tree suitably.
2721+        self.sharehash_leaves = None # eventually [sharehashes]
2722+        self.sharehashes = {} # shnum -> [sharehash leaves necessary to
2723+                              # validate the share]
2724+
2725+        self.log("Starting push")
2726+
2727+        self._state = PUSHING_BLOCKS_STATE
2728+        self._push()
2729+
2730+        return self.done_deferred
2731+
2732+
2733     def publish(self, newdata):
2734         """Publish the filenode's current contents.  Returns a Deferred that
2735         fires (with None) when the publish has done as much work as it's ever
2736hunk ./src/allmydata/mutable/publish.py 344
2737         simultaneous write.
2738         """
2739 
2740-        # 1: generate shares (SDMF: files are small, so we can do it in RAM)
2741-        # 2: perform peer selection, get candidate servers
2742-        #  2a: send queries to n+epsilon servers, to determine current shares
2743-        #  2b: based upon responses, create target map
2744-        # 3: send slot_testv_and_readv_and_writev messages
2745-        # 4: as responses return, update share-dispatch table
2746-        # 4a: may need to run recovery algorithm
2747-        # 5: when enough responses are back, we're done
2748+        # 0. Setup encoding parameters, encoder, and other such things.
2749+        # 1. Encrypt, encode, and publish segments.
2750+        assert IMutableUploadable.providedBy(newdata)
2751 
2752hunk ./src/allmydata/mutable/publish.py 348
2753-        self.log("starting publish, datalen is %s" % len(newdata))
2754-        self._status.set_size(len(newdata))
2755+        self.data = newdata
2756+        self.datalength = newdata.get_size()
2757+        #if self.datalength >= DEFAULT_MAX_SEGMENT_SIZE:
2758+        #    self._version = MDMF_VERSION
2759+        #else:
2760+        #    self._version = SDMF_VERSION
2761+
2762+        self.log("starting publish, datalen is %s" % self.datalength)
2763+        self._status.set_size(self.datalength)
2764         self._status.set_status("Started")
2765         self._started = time.time()
2766 
2767hunk ./src/allmydata/mutable/publish.py 405
2768         self.full_peerlist = full_peerlist # for use later, immutable
2769         self.bad_peers = set() # peerids who have errbacked/refused requests
2770 
2771-        self.newdata = newdata
2772-        self.salt = os.urandom(16)
2773-
2774+        # This will set self.segment_size, self.num_segments, and
2775+        # self.fec.
2776         self.setup_encoding_parameters()
2777 
2778         # if we experience any surprises (writes which were rejected because
2779hunk ./src/allmydata/mutable/publish.py 415
2780         # end of the publish process.
2781         self.surprised = False
2782 
2783-        # as a failsafe, refuse to iterate through self.loop more than a
2784-        # thousand times.
2785-        self.looplimit = 1000
2786-
2787         # we keep track of three tables. The first is our goal: which share
2788         # we want to see on which servers. This is initially populated by the
2789         # existing servermap.
2790hunk ./src/allmydata/mutable/publish.py 438
2791 
2792         self.bad_share_checkstrings = {}
2793 
2794+        # This is set at the last step of the publishing process.
2795+        self.versioninfo = ""
2796+
2797         # we use the servermap to populate the initial goal: this way we will
2798         # try to update each existing share in place.
2799         for (peerid, shnum) in self._servermap.servermap:
2800hunk ./src/allmydata/mutable/publish.py 454
2801             self.bad_share_checkstrings[key] = old_checkstring
2802             self.connections[peerid] = self._servermap.connections[peerid]
2803 
2804-        # create the shares. We'll discard these as they are delivered. SDMF:
2805-        # we're allowed to hold everything in memory.
2806+        # TODO: Make this part do peer selection.
2807+        self.update_goal()
2808+        self.writers = {}
2809+        if self._version == MDMF_VERSION:
2810+            writer_class = MDMFSlotWriteProxy
2811+        else:
2812+            writer_class = SDMFSlotWriteProxy
2813 
2814hunk ./src/allmydata/mutable/publish.py 462
2815+        # For each (peerid, shnum) in self.goal, we make a
2816+        # write proxy for that peer. We'll use this to write
2817+        # shares to the peer.
2818+        for key in self.goal:
2819+            peerid, shnum = key
2820+            write_enabler = self._node.get_write_enabler(peerid)
2821+            renew_secret = self._node.get_renewal_secret(peerid)
2822+            cancel_secret = self._node.get_cancel_secret(peerid)
2823+            secrets = (write_enabler, renew_secret, cancel_secret)
2824+
2825+            self.writers[shnum] =  writer_class(shnum,
2826+                                                self.connections[peerid],
2827+                                                self._storage_index,
2828+                                                secrets,
2829+                                                self._new_seqnum,
2830+                                                self.required_shares,
2831+                                                self.total_shares,
2832+                                                self.segment_size,
2833+                                                self.datalength)
2834+            self.writers[shnum].peerid = peerid
2835+            if (peerid, shnum) in self._servermap.servermap:
2836+                old_versionid, old_timestamp = self._servermap.servermap[key]
2837+                (old_seqnum, old_root_hash, old_salt, old_segsize,
2838+                 old_datalength, old_k, old_N, old_prefix,
2839+                 old_offsets_tuple) = old_versionid
2840+                self.writers[shnum].set_checkstring(old_seqnum,
2841+                                                    old_root_hash,
2842+                                                    old_salt)
2843+            elif (peerid, shnum) in self.bad_share_checkstrings:
2844+                old_checkstring = self.bad_share_checkstrings[(peerid, shnum)]
2845+                self.writers[shnum].set_checkstring(old_checkstring)
2846+
2847+        # Our remote shares will not have a complete checkstring until
2848+        # after we are done writing share data and have started to write
2849+        # blocks. In the meantime, we need to know what to look for when
2850+        # writing, so that we can detect UncoordinatedWriteErrors.
2851+        self._checkstring = self.writers.values()[0].get_checkstring()
2852+
2853+        # Now, we start pushing shares.
2854         self._status.timings["setup"] = time.time() - self._started
2855hunk ./src/allmydata/mutable/publish.py 502
2856-        d = self._encrypt_and_encode()
2857-        d.addCallback(self._generate_shares)
2858-        def _start_pushing(res):
2859-            self._started_pushing = time.time()
2860-            return res
2861-        d.addCallback(_start_pushing)
2862-        d.addCallback(self.loop) # trigger delivery
2863-        d.addErrback(self._fatal_error)
2864+        # First, we encrypt, encode, and publish the segments that
2865+        # carry the share data.
2866+
2867+        # This will eventually hold the block hash chain for each share
2868+        # that we publish. We define it this way so that empty publishes
2869+        # will still have something to write to the remote slot.
2870+        self.blockhashes = dict([(i, []) for i in xrange(self.total_shares)])
2871+        for i in xrange(self.total_shares):
2872+            blocks = self.blockhashes[i]
2873+            for j in xrange(self.num_segments):
2874+                blocks.append(None)
2875+        self.sharehash_leaves = None # eventually [sharehashes]
2876+        self.sharehashes = {} # shnum -> [sharehash leaves necessary to
2877+                              # validate the share]
2878+
2879+        self.log("Starting push")
2880+
2881+        self._state = PUSHING_BLOCKS_STATE
2882+        self._push()
2883 
2884         return self.done_deferred
2885 
2886hunk ./src/allmydata/mutable/publish.py 524
2887-    def setup_encoding_parameters(self):
2888-        segment_size = len(self.newdata)
2889+
2890+    def _update_status(self):
2891+        self._status.set_status("Sending Shares: %d placed out of %d, "
2892+                                "%d messages outstanding" %
2893+                                (len(self.placed),
2894+                                 len(self.goal),
2895+                                 len(self.outstanding)))
2896+        self._status.set_progress(1.0 * len(self.placed) / len(self.goal))
2897+
2898+
2899+    def setup_encoding_parameters(self, offset=0):
2900+        if self._version == MDMF_VERSION:
2901+            segment_size = DEFAULT_MAX_SEGMENT_SIZE # 128 KiB by default
2902+        else:
2903+            segment_size = self.datalength # SDMF is only one segment
2904         # this must be a multiple of self.required_shares
2905         segment_size = mathutil.next_multiple(segment_size,
2906                                               self.required_shares)
2907hunk ./src/allmydata/mutable/publish.py 543
2908         self.segment_size = segment_size
2909+
2910+        # Calculate the starting segment for the upload.
2911         if segment_size:
2912hunk ./src/allmydata/mutable/publish.py 546
2913-            self.num_segments = mathutil.div_ceil(len(self.newdata),
2914+            self.num_segments = mathutil.div_ceil(self.datalength,
2915                                                   segment_size)
2916hunk ./src/allmydata/mutable/publish.py 548
2917+            self.starting_segment = mathutil.div_ceil(offset,
2918+                                                      segment_size)
2919+            self.starting_segment -= 1
2920+            if offset == 0:
2921+                self.starting_segment = 0
2922+
2923         else:
2924             self.num_segments = 0
2925hunk ./src/allmydata/mutable/publish.py 556
2926-        assert self.num_segments in [0, 1,] # SDMF restrictions
2927+            self.starting_segment = 0
2928+
2929+
2930+        self.log("building encoding parameters for file")
2931+        self.log("got segsize %d" % self.segment_size)
2932+        self.log("got %d segments" % self.num_segments)
2933+
2934+        if self._version == SDMF_VERSION:
2935+            assert self.num_segments in (0, 1) # SDMF
2936+        # calculate the tail segment size.
2937+
2938+        if segment_size and self.datalength:
2939+            self.tail_segment_size = self.datalength % segment_size
2940+            self.log("got tail segment size %d" % self.tail_segment_size)
2941+        else:
2942+            self.tail_segment_size = 0
2943+
2944+        if self.tail_segment_size == 0 and segment_size:
2945+            # The tail segment is the same size as the other segments.
2946+            self.tail_segment_size = segment_size
2947+
2948+        # Make FEC encoders
2949+        fec = codec.CRSEncoder()
2950+        fec.set_params(self.segment_size,
2951+                       self.required_shares, self.total_shares)
2952+        self.piece_size = fec.get_block_size()
2953+        self.fec = fec
2954+
2955+        if self.tail_segment_size == self.segment_size:
2956+            self.tail_fec = self.fec
2957+        else:
2958+            tail_fec = codec.CRSEncoder()
2959+            tail_fec.set_params(self.tail_segment_size,
2960+                                self.required_shares,
2961+                                self.total_shares)
2962+            self.tail_fec = tail_fec
2963+
2964+        self._current_segment = self.starting_segment
2965+        self.end_segment = self.num_segments - 1
2966+        # Now figure out where the last segment should be.
2967+        if self.data.get_size() != self.datalength:
2968+            end = self.data.get_size()
2969+            self.end_segment = mathutil.div_ceil(end,
2970+                                                 segment_size)
2971+            self.end_segment -= 1
2972+        self.log("got start segment %d" % self.starting_segment)
2973+        self.log("got end segment %d" % self.end_segment)
2974+
2975+
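The tail-segment rule above reduces to one line; a sketch:

    def tail_segment_size(datalength, segment_size):
        # The tail holds the remainder, unless the data divides evenly,
        # in which case the tail is a full-sized segment.
        return (datalength % segment_size) or segment_size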
2976+    def _push(self, ignored=None):
2977+        """
2978+        I manage state transitions. In particular, I see that we still
2979+        have enough writers left to complete the upload
2980+        successfully.
2981+        """
2982+        # Can we still successfully publish this file?
2983+        # TODO: Keep track of outstanding queries before aborting the
2984+        #       process.
2985+        if len(self.writers) <= self.required_shares or self.surprised:
2986+            return self._failure()
2987+
2988+        # Figure out what we need to do next. Each of these needs to
2989+        # return a deferred so that we don't block execution when this
2990+        # is first called in the upload method.
2991+        if self._state == PUSHING_BLOCKS_STATE:
2992+            return self.push_segment(self._current_segment)
2993+
2994+        elif self._state == PUSHING_EVERYTHING_ELSE_STATE:
2995+            return self.push_everything_else()
2996+
2997+        # If we make it to this point, we were successful in placing the
2998+        # file.
2999+        return self._done(None)
3000+
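A sketch of the state machine that _push walks; this synchronous rendering is hypothetical, since the real code threads each transition through a Deferred:

    state = PUSHING_BLOCKS_STATE
    while True:
        if state == PUSHING_BLOCKS_STATE:
            send_next_segment()              # hypothetical helper
            if no_segments_left():           # hypothetical helper
                state = PUSHING_EVERYTHING_ELSE_STATE
        elif state == PUSHING_EVERYTHING_ELSE_STATE:
            send_hashes_and_signature()      # hypothetical helper
            state = DONE_STATE
        else:
            break                            # DONE_STATE: report success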
3001+
3002+    def push_segment(self, segnum):
3003+        if self.num_segments == 0 and self._version == SDMF_VERSION:
3004+            self._add_dummy_salts()
3005 
3006hunk ./src/allmydata/mutable/publish.py 635
3007-    def _fatal_error(self, f):
3008-        self.log("error during loop", failure=f, level=log.UNUSUAL)
3009-        self._done(f)
3010+        if segnum > self.end_segment:
3011+            # We don't have any more segments to push.
3012+            self._state = PUSHING_EVERYTHING_ELSE_STATE
3013+            return self._push()
3014+
3015+        d = self._encode_segment(segnum)
3016+        d.addCallback(self._push_segment, segnum)
3017+        def _increment_segnum(ign):
3018+            self._current_segment += 1
3019+        # XXX: I don't think we need to do addBoth here -- any errbacks
3020+        # should be handled within push_segment.
3021+        d.addBoth(_increment_segnum)
3022+        d.addBoth(self._turn_barrier)
3023+        d.addBoth(self._push)
3024+
3025+
3026+    def _turn_barrier(self, result):
3027+        """
3028+        I help the publish process avoid the recursion limit issues
3029+        described in #237.
3030+        """
3031+        return fireEventually(result)
3032+
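The barrier matters because each segment's callbacks would otherwise call _push synchronously, recursing one stack frame per segment until Python's recursion limit is hit (#237). fireEventually re-schedules the chain through the reactor instead; a sketch, assuming the usual foolscap import:

    from foolscap.api import fireEventually

    def _turn(result):
        # Returns a Deferred that fires with result on a later reactor
        # turn, unwinding the current call stack first.
        return fireEventually(result)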
3033+
3034+    def _add_dummy_salts(self):
3035+        """
3036+        SDMF files need a salt even if they're empty, or the signature
3037+        won't make sense. This method adds a dummy salt to each of our
3038+        SDMF writers so that they can write the signature later.
3039+        """
3040+        salt = os.urandom(16)
3041+        assert self._version == SDMF_VERSION
3042+
3043+        for writer in self.writers.itervalues():
3044+            writer.put_salt(salt)
3045+
3046+
3047+    def _encode_segment(self, segnum):
3048+        """
3049+        I encrypt and encode the segment segnum.
3050+        """
3051+        started = time.time()
3052+
3053+        if segnum + 1 == self.num_segments:
3054+            segsize = self.tail_segment_size
3055+        else:
3056+            segsize = self.segment_size
3057+
3058+
3059+        self.log("Pushing segment %d of %d" % (segnum + 1, self.num_segments))
3060+        data = self.data.read(segsize)
3061+        # XXX: This is dumb. Why return a list?
3062+        data = "".join(data)
3063+
3064+        assert len(data) == segsize, len(data)
3065+
3066+        salt = os.urandom(16)
3067+
3068+        key = hashutil.ssk_readkey_data_hash(salt, self.readkey)
3069+        self._status.set_status("Encrypting")
3070+        enc = AES(key)
3071+        crypttext = enc.process(data)
3072+        assert len(crypttext) == len(data)
3073+
3074+        now = time.time()
3075+        self._status.timings["encrypt"] = now - started
3076+        started = now
3077+
3078+        # now apply FEC
3079+        if segnum + 1 == self.num_segments:
3080+            fec = self.tail_fec
3081+        else:
3082+            fec = self.fec
3083+
3084+        self._status.set_status("Encoding")
3085+        crypttext_pieces = [None] * self.required_shares
3086+        piece_size = fec.get_block_size()
3087+        for i in range(len(crypttext_pieces)):
3088+            offset = i * piece_size
3089+            piece = crypttext[offset:offset+piece_size]
3090+            piece = piece + "\x00"*(piece_size - len(piece)) # padding
3091+            crypttext_pieces[i] = piece
3092+            assert len(piece) == piece_size
3093+        d = fec.encode(crypttext_pieces)
3094+        def _done_encoding(res):
3095+            elapsed = time.time() - started
3096+            self._status.timings["encode"] = elapsed
3097+            return (res, salt)
3098+        d.addCallback(_done_encoding)
3099+        return d
3100+
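The padding loop above, extracted as a standalone sketch. piece_size mirrors what CRSEncoder.get_block_size() returns, ceil(len / k), and divides evenly here because segment sizes are rounded up to a multiple of required_shares:

    def split_for_fec(crypttext, k):
        piece_size = (len(crypttext) + k - 1) // k
        pieces = []
        for i in range(k):
            piece = crypttext[i * piece_size:(i + 1) * piece_size]
            # Zero-pad the final piece so every FEC input is equal-sized.
            pieces.append(piece + "\x00" * (piece_size - len(piece)))
        return pieces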
3101+
3102+    def _push_segment(self, encoded_and_salt, segnum):
3103+        """
3104+        I push (data, salt) as segment number segnum.
3105+        """
3106+        results, salt = encoded_and_salt
3107+        shares, shareids = results
3108+        self._status.set_status("Pushing segment")
3109+        for i in xrange(len(shares)):
3110+            sharedata = shares[i]
3111+            shareid = shareids[i]
3112+            if self._version == MDMF_VERSION:
3113+                hashed = salt + sharedata
3114+            else:
3115+                hashed = sharedata
3116+            block_hash = hashutil.block_hash(hashed)
3117+            self.blockhashes[shareid][segnum] = block_hash
3118+            # find the writer for this share
3119+            writer = self.writers[shareid]
3120+            writer.put_block(sharedata, segnum, salt)
3121+
3122+
3123+    def push_everything_else(self):
3124+        """
3125+        I put everything else associated with a share.
3126+        """
3127+        self._pack_started = time.time()
3128+        self.push_encprivkey()
3129+        self.push_blockhashes()
3130+        self.push_sharehashes()
3131+        self.push_toplevel_hashes_and_signature()
3132+        d = self.finish_publishing()
3133+        def _change_state(ignored):
3134+            self._state = DONE_STATE
3135+        d.addCallback(_change_state)
3136+        d.addCallback(self._push)
3137+        return d
3138+
3139+
3140+    def push_encprivkey(self):
3141+        encprivkey = self._encprivkey
3142+        self._status.set_status("Pushing encrypted private key")
3143+        for writer in self.writers.itervalues():
3144+            writer.put_encprivkey(encprivkey)
3145+
3146+
3147+    def push_blockhashes(self):
3148+        self.sharehash_leaves = [None] * len(self.blockhashes)
3149+        self._status.set_status("Building and pushing block hash tree")
3150+        for shnum, blockhashes in self.blockhashes.iteritems():
3151+            t = hashtree.HashTree(blockhashes)
3152+            self.blockhashes[shnum] = list(t)
3153+            # set the leaf for future use.
3154+            self.sharehash_leaves[shnum] = t[0]
3155+
3156+            writer = self.writers[shnum]
3157+            writer.put_blockhashes(self.blockhashes[shnum])
3158+
3159+
3160+    def push_sharehashes(self):
3161+        self._status.set_status("Building and pushing share hash chain")
3162+        share_hash_tree = hashtree.HashTree(self.sharehash_leaves)
3163+        for shnum in xrange(len(self.sharehash_leaves)):
3164+            needed_indices = share_hash_tree.needed_hashes(shnum)
3165+            self.sharehashes[shnum] = dict( [ (i, share_hash_tree[i])
3166+                                             for i in needed_indices] )
3167+            writer = self.writers[shnum]
3168+            writer.put_sharehashes(self.sharehashes[shnum])
3169+        self.root_hash = share_hash_tree[0]
3170+
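push_blockhashes and push_sharehashes together build a two-level Merkle structure: each share's block hashes form a per-share tree, and those trees' roots become the leaves of a single share hash tree whose root is what gets signed. A sketch using allmydata.hashtree, assuming blockhashes maps shnum -> [leaf hashes]:

    from allmydata import hashtree

    block_trees = dict((shnum, hashtree.HashTree(leaves))
                       for (shnum, leaves) in blockhashes.iteritems())
    share_leaves = [block_trees[shnum][0] for shnum in sorted(block_trees)]
    share_tree = hashtree.HashTree(share_leaves)
    root_hash = share_tree[0]  # this is what put_root_hash() ships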
3171+
3172+    def push_toplevel_hashes_and_signature(self):
3173+        # We need to do three things here:
3174+        #   - Push the root hash and salt hash
3175+        #   - Get the checkstring of the resulting layout; sign that.
3176+        #   - Push the signature
3177+        self._status.set_status("Pushing root hashes and signature")
3178+        for shnum in xrange(self.total_shares):
3179+            writer = self.writers[shnum]
3180+            writer.put_root_hash(self.root_hash)
3181+        self._update_checkstring()
3182+        self._make_and_place_signature()
3183+
3184+
3185+    def _update_checkstring(self):
3186+        """
3187+        After putting the root hash, MDMF files will have the
3188+        checkstring written to the storage server. This means that we
3189+        can update our copy of the checkstring so we can detect
3190+        uncoordinated writes. SDMF files will have the same checkstring,
3191+        so we need not do anything.
3192+        """
3193+        self._checkstring = self.writers.values()[0].get_checkstring()
3194+
3195+
3196+    def _make_and_place_signature(self):
3197+        """
3198+        I create and place the signature.
3199+        """
3200+        started = time.time()
3201+        self._status.set_status("Signing prefix")
3202+        signable = self.writers[0].get_signable()
3203+        self.signature = self._privkey.sign(signable)
3204+
3205+        for (shnum, writer) in self.writers.iteritems():
3206+            writer.put_signature(self.signature)
3207+        self._status.timings['sign'] = time.time() - started
3208+
3209+
3210+    def finish_publishing(self):
3211+        # We're almost done -- we just need to put the verification key
3212+        # and the offsets
3213+        started = time.time()
3214+        self._status.set_status("Pushing shares")
3215+        self._started_pushing = started
3216+        ds = []
3217+        verification_key = self._pubkey.serialize()
3218+
3219+
3220+        # TODO: Bad, since we remove from this same dict. We need to
3221+        # make a copy, or just use a non-iterated value.
3222+        for (shnum, writer) in self.writers.iteritems():
3223+            writer.put_verification_key(verification_key)
3224+            d = writer.finish_publishing()
3225+            # Add the (peerid, shnum) tuple to our list of outstanding
3226+            # queries. This gets used by _loop if some of our queries
3227+            # fail to place shares.
3228+            self.outstanding.add((writer.peerid, writer.shnum))
3229+            d.addCallback(self._got_write_answer, writer, started)
3230+            d.addErrback(self._connection_problem, writer)
3231+            ds.append(d)
3232+        self._record_verinfo()
3233+        self._status.timings['pack'] = time.time() - started
3234+        return defer.DeferredList(ds)
3235+
3236+
3237+    def _record_verinfo(self):
3238+        self.versioninfo = self.writers.values()[0].get_verinfo()
3239+
3240+
3241+    def _connection_problem(self, f, writer):
3242+        """
3243+        We ran into a connection problem while working with writer, and
3244+        need to deal with that.
3245+        """
3246+        self.log("found problem: %s" % str(f))
3247+        self._last_failure = f
3248+        del(self.writers[writer.shnum])
3249 
3250hunk ./src/allmydata/mutable/publish.py 875
3251-    def _update_status(self):
3252-        self._status.set_status("Sending Shares: %d placed out of %d, "
3253-                                "%d messages outstanding" %
3254-                                (len(self.placed),
3255-                                 len(self.goal),
3256-                                 len(self.outstanding)))
3257-        self._status.set_progress(1.0 * len(self.placed) / len(self.goal))
3258 
3259hunk ./src/allmydata/mutable/publish.py 876
3260-    def loop(self, ignored=None):
3261-        self.log("entering loop", level=log.NOISY)
3262-        if not self._running:
3263-            return
3264-
3265-        self.looplimit -= 1
3266-        if self.looplimit <= 0:
3267-            raise LoopLimitExceededError("loop limit exceeded")
3268-
3269-        if self.surprised:
3270-            # don't send out any new shares, just wait for the outstanding
3271-            # ones to be retired.
3272-            self.log("currently surprised, so don't send any new shares",
3273-                     level=log.NOISY)
3274-        else:
3275-            self.update_goal()
3276-            # how far are we from our goal?
3277-            needed = self.goal - self.placed - self.outstanding
3278-            self._update_status()
3279-
3280-            if needed:
3281-                # we need to send out new shares
3282-                self.log(format="need to send %(needed)d new shares",
3283-                         needed=len(needed), level=log.NOISY)
3284-                self._send_shares(needed)
3285-                return
3286-
3287-        if self.outstanding:
3288-            # queries are still pending, keep waiting
3289-            self.log(format="%(outstanding)d queries still outstanding",
3290-                     outstanding=len(self.outstanding),
3291-                     level=log.NOISY)
3292-            return
3293-
3294-        # no queries outstanding, no placements needed: we're done
3295-        self.log("no queries outstanding, no placements needed: done",
3296-                 level=log.OPERATIONAL)
3297-        now = time.time()
3298-        elapsed = now - self._started_pushing
3299-        self._status.timings["push"] = elapsed
3300-        return self._done(None)
3301-
3302     def log_goal(self, goal, message=""):
3303         logmsg = [message]
3304         for (shnum, peerid) in sorted([(s,p) for (p,s) in goal]):
3305hunk ./src/allmydata/mutable/publish.py 957
3306             self.log_goal(self.goal, "after update: ")
3307 
3308 
3309+    def _got_write_answer(self, answer, writer, started):
3310+            # SDMF writers only pretend to write when callers set their
3311+            # SDMF writers only pretend to write when readers set their
3312+            # blocks, salts, and so on -- they actually just write once,
3313+            # at the end of the upload process. In fake writes, they
3314+            # return defer.succeed(None). If we see that, we shouldn't
3315+            # bother checking it.
3316+            return
3317 
3318hunk ./src/allmydata/mutable/publish.py 966
3319-    def _encrypt_and_encode(self):
3320-        # this returns a Deferred that fires with a list of (sharedata,
3321-        # sharenum) tuples. TODO: cache the ciphertext, only produce the
3322-        # shares that we care about.
3323-        self.log("_encrypt_and_encode")
3324-
3325-        self._status.set_status("Encrypting")
3326-        started = time.time()
3327-
3328-        key = hashutil.ssk_readkey_data_hash(self.salt, self.readkey)
3329-        enc = AES(key)
3330-        crypttext = enc.process(self.newdata)
3331-        assert len(crypttext) == len(self.newdata)
3332+        peerid = writer.peerid
3333+        lp = self.log("_got_write_answer from %s, share %d" %
3334+                      (idlib.shortnodeid_b2a(peerid), writer.shnum))
3335 
3336         now = time.time()
3337hunk ./src/allmydata/mutable/publish.py 971
3338-        self._status.timings["encrypt"] = now - started
3339-        started = now
3340-
3341-        # now apply FEC
3342-
3343-        self._status.set_status("Encoding")
3344-        fec = codec.CRSEncoder()
3345-        fec.set_params(self.segment_size,
3346-                       self.required_shares, self.total_shares)
3347-        piece_size = fec.get_block_size()
3348-        crypttext_pieces = [None] * self.required_shares
3349-        for i in range(len(crypttext_pieces)):
3350-            offset = i * piece_size
3351-            piece = crypttext[offset:offset+piece_size]
3352-            piece = piece + "\x00"*(piece_size - len(piece)) # padding
3353-            crypttext_pieces[i] = piece
3354-            assert len(piece) == piece_size
3355-
3356-        d = fec.encode(crypttext_pieces)
3357-        def _done_encoding(res):
3358-            elapsed = time.time() - started
3359-            self._status.timings["encode"] = elapsed
3360-            return res
3361-        d.addCallback(_done_encoding)
3362-        return d
3363-
3364-    def _generate_shares(self, shares_and_shareids):
3365-        # this sets self.shares and self.root_hash
3366-        self.log("_generate_shares")
3367-        self._status.set_status("Generating Shares")
3368-        started = time.time()
3369-
3370-        # we should know these by now
3371-        privkey = self._privkey
3372-        encprivkey = self._encprivkey
3373-        pubkey = self._pubkey
3374-
3375-        (shares, share_ids) = shares_and_shareids
3376-
3377-        assert len(shares) == len(share_ids)
3378-        assert len(shares) == self.total_shares
3379-        all_shares = {}
3380-        block_hash_trees = {}
3381-        share_hash_leaves = [None] * len(shares)
3382-        for i in range(len(shares)):
3383-            share_data = shares[i]
3384-            shnum = share_ids[i]
3385-            all_shares[shnum] = share_data
3386-
3387-            # build the block hash tree. SDMF has only one leaf.
3388-            leaves = [hashutil.block_hash(share_data)]
3389-            t = hashtree.HashTree(leaves)
3390-            block_hash_trees[shnum] = list(t)
3391-            share_hash_leaves[shnum] = t[0]
3392-        for leaf in share_hash_leaves:
3393-            assert leaf is not None
3394-        share_hash_tree = hashtree.HashTree(share_hash_leaves)
3395-        share_hash_chain = {}
3396-        for shnum in range(self.total_shares):
3397-            needed_hashes = share_hash_tree.needed_hashes(shnum)
3398-            share_hash_chain[shnum] = dict( [ (i, share_hash_tree[i])
3399-                                              for i in needed_hashes ] )
3400-        root_hash = share_hash_tree[0]
3401-        assert len(root_hash) == 32
3402-        self.log("my new root_hash is %s" % base32.b2a(root_hash))
3403-        self._new_version_info = (self._new_seqnum, root_hash, self.salt)
3404-
3405-        prefix = pack_prefix(self._new_seqnum, root_hash, self.salt,
3406-                             self.required_shares, self.total_shares,
3407-                             self.segment_size, len(self.newdata))
3408-
3409-        # now pack the beginning of the share. All shares are the same up
3410-        # to the signature, then they have divergent share hash chains,
3411-        # then completely different block hash trees + salt + share data,
3412-        # then they all share the same encprivkey at the end. The sizes
3413-        # of everything are the same for all shares.
3414-
3415-        sign_started = time.time()
3416-        signature = privkey.sign(prefix)
3417-        self._status.timings["sign"] = time.time() - sign_started
3418-
3419-        verification_key = pubkey.serialize()
3420-
3421-        final_shares = {}
3422-        for shnum in range(self.total_shares):
3423-            final_share = pack_share(prefix,
3424-                                     verification_key,
3425-                                     signature,
3426-                                     share_hash_chain[shnum],
3427-                                     block_hash_trees[shnum],
3428-                                     all_shares[shnum],
3429-                                     encprivkey)
3430-            final_shares[shnum] = final_share
3431-        elapsed = time.time() - started
3432-        self._status.timings["pack"] = elapsed
3433-        self.shares = final_shares
3434-        self.root_hash = root_hash
3435-
3436-        # we also need to build up the version identifier for what we're
3437-        # pushing. Extract the offsets from one of our shares.
3438-        assert final_shares
3439-        offsets = unpack_header(final_shares.values()[0])[-1]
3440-        offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
3441-        verinfo = (self._new_seqnum, root_hash, self.salt,
3442-                   self.segment_size, len(self.newdata),
3443-                   self.required_shares, self.total_shares,
3444-                   prefix, offsets_tuple)
3445-        self.versioninfo = verinfo
3446-
3447-
3448-
3449-    def _send_shares(self, needed):
3450-        self.log("_send_shares")
3451-
3452-        # we're finally ready to send out our shares. If we encounter any
3453-        # surprises here, it's because somebody else is writing at the same
3454-        # time. (Note: in the future, when we remove the _query_peers() step
3455-        # and instead speculate about [or remember] which shares are where,
3456-        # surprises here are *not* indications of UncoordinatedWriteError,
3457-        # and we'll need to respond to them more gracefully.)
3458-
3459-        # needed is a set of (peerid, shnum) tuples. The first thing we do is
3460-        # organize it by peerid.
3461-
3462-        peermap = DictOfSets()
3463-        for (peerid, shnum) in needed:
3464-            peermap.add(peerid, shnum)
3465-
3466-        # the next thing is to build up a bunch of test vectors. The
3467-        # semantics of Publish are that we perform the operation if the world
3468-        # hasn't changed since the ServerMap was constructed (more or less).
3469-        # For every share we're trying to place, we create a test vector that
3470-        # tests to see if the server*share still corresponds to the
3471-        # map.
3472-
3473-        all_tw_vectors = {} # maps peerid to tw_vectors
3474-        sm = self._servermap.servermap
3475-
3476-        for key in needed:
3477-            (peerid, shnum) = key
3478-
3479-            if key in sm:
3480-                # an old version of that share already exists on the
3481-                # server, according to our servermap. We will create a
3482-                # request that attempts to replace it.
3483-                old_versionid, old_timestamp = sm[key]
3484-                (old_seqnum, old_root_hash, old_salt, old_segsize,
3485-                 old_datalength, old_k, old_N, old_prefix,
3486-                 old_offsets_tuple) = old_versionid
3487-                old_checkstring = pack_checkstring(old_seqnum,
3488-                                                   old_root_hash,
3489-                                                   old_salt)
3490-                testv = (0, len(old_checkstring), "eq", old_checkstring)
3491-
3492-            elif key in self.bad_share_checkstrings:
3493-                old_checkstring = self.bad_share_checkstrings[key]
3494-                testv = (0, len(old_checkstring), "eq", old_checkstring)
3495-
3496-            else:
3497-                # add a testv that requires the share not exist
3498-
3499-                # Unfortunately, foolscap-0.2.5 has a bug in the way inbound
3500-                # constraints are handled. If the same object is referenced
3501-                # multiple times inside the arguments, foolscap emits a
3502-                # 'reference' token instead of a distinct copy of the
3503-                # argument. The bug is that these 'reference' tokens are not
3504-                # accepted by the inbound constraint code. To work around
3505-                # this, we need to prevent python from interning the
3506-                # (constant) tuple, by creating a new copy of this vector
3507-                # each time.
3508-
3509-                # This bug is fixed in foolscap-0.2.6, and even though this
3510-                # version of Tahoe requires foolscap-0.3.1 or newer, we are
3511-                # supposed to be able to interoperate with older versions of
3512-                # Tahoe which are allowed to use older versions of foolscap,
3513-                # including foolscap-0.2.5 . In addition, I've seen other
3514-                # foolscap problems triggered by 'reference' tokens (see #541
3515-                # for details). So we must keep this workaround in place.
3516-
3517-                #testv = (0, 1, 'eq', "")
3518-                testv = tuple([0, 1, 'eq', ""])
3519-
3520-            testvs = [testv]
3521-            # the write vector is simply the share
3522-            writev = [(0, self.shares[shnum])]
3523-
3524-            if peerid not in all_tw_vectors:
3525-                all_tw_vectors[peerid] = {}
3526-                # maps shnum to (testvs, writevs, new_length)
3527-            assert shnum not in all_tw_vectors[peerid]
3528-
3529-            all_tw_vectors[peerid][shnum] = (testvs, writev, None)
3530-
3531-        # we read the checkstring back from each share, however we only use
3532-        # it to detect whether there was a new share that we didn't know
3533-        # about. The success or failure of the write will tell us whether
3534-        # there was a collision or not. If there is a collision, the first
3535-        # thing we'll do is update the servermap, which will find out what
3536-        # happened. We could conceivably reduce a roundtrip by using the
3537-        # readv checkstring to populate the servermap, but really we'd have
3538-        # to read enough data to validate the signatures too, so it wouldn't
3539-        # be an overall win.
3540-        read_vector = [(0, struct.calcsize(SIGNED_PREFIX))]
3541-
3542-        # ok, send the messages!
3543-        self.log("sending %d shares" % len(all_tw_vectors), level=log.NOISY)
3544-        started = time.time()
3545-        for (peerid, tw_vectors) in all_tw_vectors.items():
3546-
3547-            write_enabler = self._node.get_write_enabler(peerid)
3548-            renew_secret = self._node.get_renewal_secret(peerid)
3549-            cancel_secret = self._node.get_cancel_secret(peerid)
3550-            secrets = (write_enabler, renew_secret, cancel_secret)
3551-            shnums = tw_vectors.keys()
3552-
3553-            for shnum in shnums:
3554-                self.outstanding.add( (peerid, shnum) )
3555+        elapsed = now - started
3556 
3557hunk ./src/allmydata/mutable/publish.py 973
3558-            d = self._do_testreadwrite(peerid, secrets,
3559-                                       tw_vectors, read_vector)
3560-            d.addCallbacks(self._got_write_answer, self._got_write_error,
3561-                           callbackArgs=(peerid, shnums, started),
3562-                           errbackArgs=(peerid, shnums, started))
3563-            # tolerate immediate errback, like with DeadReferenceError
3564-            d.addBoth(fireEventually)
3565-            d.addCallback(self.loop)
3566-            d.addErrback(self._fatal_error)
3567+        self._status.add_per_server_time(peerid, elapsed)
3568 
3569hunk ./src/allmydata/mutable/publish.py 975
3570-        self._update_status()
3571-        self.log("%d shares sent" % len(all_tw_vectors), level=log.NOISY)
3572+        wrote, read_data = answer
3573 
3574hunk ./src/allmydata/mutable/publish.py 977
3575-    def _do_testreadwrite(self, peerid, secrets,
3576-                          tw_vectors, read_vector):
3577-        storage_index = self._storage_index
3578-        ss = self.connections[peerid]
3579+        surprise_shares = set(read_data.keys()) - set([writer.shnum])
3580 
3581hunk ./src/allmydata/mutable/publish.py 979
3582-        #print "SS[%s] is %s" % (idlib.shortnodeid_b2a(peerid), ss), ss.tracker.interfaceName
3583-        d = ss.callRemote("slot_testv_and_readv_and_writev",
3584-                          storage_index,
3585-                          secrets,
3586-                          tw_vectors,
3587-                          read_vector)
3588-        return d
3589+        # We need to remove from surprise_shares any shares that we are
3590+        # knowingly writing to that peer via our other writers.
3591 
3592hunk ./src/allmydata/mutable/publish.py 982
3593-    def _got_write_answer(self, answer, peerid, shnums, started):
3594-        lp = self.log("_got_write_answer from %s" %
3595-                      idlib.shortnodeid_b2a(peerid))
3596-        for shnum in shnums:
3597-            self.outstanding.discard( (peerid, shnum) )
3598+        # TODO: Precompute this.
3599+        known_shnums = [x.shnum for x in self.writers.values()
3600+                        if x.peerid == peerid]
3601+        surprise_shares -= set(known_shnums)
3602+        self.log("found the following surprise shares: %s" %
3603+                 str(surprise_shares))
3604 
3605hunk ./src/allmydata/mutable/publish.py 989
3606-        now = time.time()
3607-        elapsed = now - started
3608-        self._status.add_per_server_time(peerid, elapsed)
3609-
3610-        wrote, read_data = answer
3611-
3612-        surprise_shares = set(read_data.keys()) - set(shnums)
3613+        # Now surprise_shares contains all of the shares that we did not
3614+        # expect to be there.
3615 
3616         surprised = False
3617         for shnum in surprise_shares:
3618hunk ./src/allmydata/mutable/publish.py 996
3619             # read_data is a dict mapping shnum to checkstring (SIGNED_PREFIX)
3620             checkstring = read_data[shnum][0]
3621-            their_version_info = unpack_checkstring(checkstring)
3622-            if their_version_info == self._new_version_info:
3623+            # What we want to do here is to see if their (seqnum,
3624+            # roothash, salt) is the same as our (seqnum, roothash,
3625+            # salt), or the equivalent for MDMF. The best way to do this
3626+            # is to store a packed representation of our checkstring
3627+            # somewhere, then not bother unpacking the other
3628+            # checkstring.
3629+            if checkstring == self._checkstring:
3630                 # they have the right share, somehow
3631 
3632                 if (peerid,shnum) in self.goal:
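Comparing packed checkstrings works because the packed form is a fixed-layout encoding of (seqnum, root hash, salt): byte equality implies field equality, with no need to unpack the remote value. A sketch of the SDMF-style packing; treat the exact format string as an assumption based on mutable/layout.py:

    import struct

    def pack_checkstring(seqnum, root_hash, salt):
        # ">BQ32s16s": version byte, big-endian seqnum, 32-byte root
        # hash, 16-byte salt/IV.
        return struct.pack(">BQ32s16s", 0, seqnum, root_hash, salt)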
3633hunk ./src/allmydata/mutable/publish.py 1081
3634             self.log("our testv failed, so the write did not happen",
3635                      parent=lp, level=log.WEIRD, umid="8sc26g")
3636             self.surprised = True
3637-            self.bad_peers.add(peerid) # don't ask them again
3638+            self.bad_peers.add(writer) # don't ask them again
3639             # use the checkstring to add information to the log message
3640             for (shnum,readv) in read_data.items():
3641                 checkstring = readv[0]
3642hunk ./src/allmydata/mutable/publish.py 1103
3643                 # if expected_version==None, then we didn't expect to see a
3644                 # share on that peer, and the 'surprise_shares' clause above
3645                 # will have logged it.
3646-            # self.loop() will take care of finding new homes
3647             return
3648 
3649hunk ./src/allmydata/mutable/publish.py 1105
3650-        for shnum in shnums:
3651-            self.placed.add( (peerid, shnum) )
3652-            # and update the servermap
3653-            self._servermap.add_new_share(peerid, shnum,
3654+        # and update the servermap
3655+        # self.versioninfo is set during the last phase of publishing.
3656+        # If we get there, we know that responses correspond to placed
3657+        # shares, and can safely execute these statements.
3658+        if self.versioninfo:
3659+            self.log("wrote successfully: adding new share to servermap")
3660+            self._servermap.add_new_share(peerid, writer.shnum,
3661                                           self.versioninfo, started)
3662hunk ./src/allmydata/mutable/publish.py 1113
3663-
3664-        # self.loop() will take care of checking to see if we're done
3665+            self.placed.add( (peerid, writer.shnum) )
3666+        self._update_status()
3667+        # the next method in the deferred chain will check to see if
3668+        # we're done and successful.
3669         return
3670 
3671hunk ./src/allmydata/mutable/publish.py 1119
3672-    def _got_write_error(self, f, peerid, shnums, started):
3673-        for shnum in shnums:
3674-            self.outstanding.discard( (peerid, shnum) )
3675-        self.bad_peers.add(peerid)
3676-        if self._first_write_error is None:
3677-            self._first_write_error = f
3678-        self.log(format="error while writing shares %(shnums)s to peerid %(peerid)s",
3679-                 shnums=list(shnums), peerid=idlib.shortnodeid_b2a(peerid),
3680-                 failure=f,
3681-                 level=log.UNUSUAL)
3682-        # self.loop() will take care of checking to see if we're done
3683-        return
3684-
3685 
3686     def _done(self, res):
3687         if not self._running:
3688hunk ./src/allmydata/mutable/publish.py 1126
3689         self._running = False
3690         now = time.time()
3691         self._status.timings["total"] = now - self._started
3692+
3693+        elapsed = now - self._started_pushing
3694+        self._status.timings['push'] = elapsed
3695+
3696         self._status.set_active(False)
3697hunk ./src/allmydata/mutable/publish.py 1131
3698-        if isinstance(res, failure.Failure):
3699-            self.log("Publish done, with failure", failure=res,
3700-                     level=log.WEIRD, umid="nRsR9Q")
3701-            self._status.set_status("Failed")
3702-        elif self.surprised:
3703-            self.log("Publish done, UncoordinatedWriteError", level=log.UNUSUAL)
3704-            self._status.set_status("UncoordinatedWriteError")
3705-            # deliver a failure
3706-            res = failure.Failure(UncoordinatedWriteError())
3707-            # TODO: recovery
3708-        else:
3709-            self.log("Publish done, success")
3710-            self._status.set_status("Finished")
3711-            self._status.set_progress(1.0)
3712+        self.log("Publish done, success")
3713+        self._status.set_status("Finished")
3714+        self._status.set_progress(1.0)
3715         eventually(self.done_deferred.callback, res)
3716 
3717hunk ./src/allmydata/mutable/publish.py 1136
3718+    def _failure(self):
3719+
3720+        if not self.surprised:
3721+            # We ran out of servers
3722+            self.log("Publish ran out of good servers, "
3723+                     "last failure was: %s" % str(self._last_failure))
3724+            e = NotEnoughServersError("Ran out of non-bad servers, "
3725+                                      "last failure was %s" %
3726+                                      str(self._last_failure))
3727+        else:
3728+            # We ran into shares that we didn't recognize, which means
3729+            # that we need to return an UncoordinatedWriteError.
3730+            self.log("Publish failed with UncoordinatedWriteError")
3731+            e = UncoordinatedWriteError()
3732+        f = failure.Failure(e)
3733+        eventually(self.done_deferred.callback, f)
3734+
3735+
3736+class MutableFileHandle:
3737+    """
3738+    I am a mutable uploadable built around a filehandle-like object,
3739+    usually either a StringIO instance or a handle to an actual file.
3740+    """
3741+    implements(IMutableUploadable)
3742+
3743+    def __init__(self, filehandle):
3744+        # The filehandle is expected to be a file-like object that
3745+        # has these two methods. We don't care beyond that.
3746+        assert hasattr(filehandle, "read")
3747+        assert hasattr(filehandle, "close")
3748+
3749+        self._filehandle = filehandle
3750+        # We must start reading at the beginning of the file, or we risk
3751+        # encountering errors when the data read does not match the size
3752+        # reported to the uploader.
3753+        self._filehandle.seek(0)
3754+
3755+        # We have not yet read anything, so our position is 0.
3756+        self._marker = 0
3757+
3758+
3759+    def get_size(self):
3760+        """
3761+        I return the amount of data in my filehandle.
3762+        """
3763+        if not hasattr(self, "_size"):
3764+            old_position = self._filehandle.tell()
3765+            # Seek to the end of the file by seeking 0 bytes from the
3766+            # file's end
3767+            self._filehandle.seek(0, 2) # 2 == os.SEEK_END in 2.5+
3768+            self._size = self._filehandle.tell()
3769+            # Restore the previous position, in case this was called
3770+            # after a read.
3771+            self._filehandle.seek(old_position)
3772+            assert self._filehandle.tell() == old_position
3773+
3774+        assert hasattr(self, "_size")
3775+        return self._size
3776+
3777+
3778+    def pos(self):
3779+        """
3780+        I return the position of my read marker -- i.e., how much data I
3781+        have already read and returned to callers.
3782+        """
3783+        return self._marker
3784+
3785+
3786+    def read(self, length):
3787+        """
3788+        I return some data (up to length bytes) from my filehandle.
3789+
3790+        In most cases, I return length bytes, but sometimes I won't --
3791+        for example, if I am asked to read beyond the end of a file, or
3792+        an error occurs.
3793+        """
3794+        results = self._filehandle.read(length)
3795+        self._marker += len(results)
3796+        return [results]
3797+
3798+
3799+    def close(self):
3800+        """
3801+        I close the underlying filehandle. Any further operations on the
3802+        filehandle fail at this point.
3803+        """
3804+        self._filehandle.close()
3805+
3806+
3807+class MutableData(MutableFileHandle):
3808+    """
3809+    I am a mutable uploadable built around a string, which I then cast
3810+    into a StringIO and treat as a filehandle.
3811+    """
3812+
3813+    def __init__(self, s):
3814+        # Take a string and return a file-like uploadable.
3815+        assert isinstance(s, str)
3816+
3817+        MutableFileHandle.__init__(self, StringIO(s))
3818+
3819+
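A minimal usage sketch of these wrappers (the data and read sizes are illustrative):

    from allmydata.mutable.publish import MutableData

    u = MutableData("abcdefghij")        # an IMutableUploadable around a string
    assert u.get_size() == 10            # size is found by seeking to EOF
    assert "".join(u.read(4)) == "abcd"  # read() returns a list of strings
    assert u.pos() == 4                  # the read marker has advanced
    u.close()                            # closes the underlying StringIO
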
3820+class TransformingUploadable:
3821+    """
3822+    I am an IMutableUploadable that wraps another IMutableUploadable,
3823+    and some segments that are already on the grid. When I am called to
3824+    read, I handle merging of boundary segments.
3825+    """
3826+    implements(IMutableUploadable)
3827+
3828+
3829+    def __init__(self, data, offset, segment_size, start, end):
3830+        assert IMutableUploadable.providedBy(data)
3831+
3832+        self._newdata = data
3833+        self._offset = offset
3834+        self._segment_size = segment_size
3835+        self._start = start
3836+        self._end = end
3837+
3838+        self._read_marker = 0
3839+
3840+        self._first_segment_offset = offset % segment_size
3841+
3842+        num = self.log("TransformingUploadable: starting", parent=None)
3843+        self._log_number = num
3844+        self.log("got fso: %d" % self._first_segment_offset)
3845+        self.log("got offset: %d" % self._offset)
3846+
3847+
3848+    def log(self, *args, **kwargs):
3849+        if 'parent' not in kwargs:
3850+            kwargs['parent'] = self._log_number
3851+        if "facility" not in kwargs:
3852+            kwargs["facility"] = "tahoe.mutable.transforminguploadable"
3853+        return log.msg(*args, **kwargs)
3854+
3855+
3856+    def get_size(self):
3857+        return self._offset + self._newdata.get_size()
3858+
3859+
3860+    def read(self, length):
3861+        # We can get data from 3 sources here.
3862+        #   1. The first of the segments provided to us.
3863+        #   2. The data that we're replacing things with.
3864+        #   3. The last of the segments provided to us.
3865+
3866+        # Are we still returning data from the first of the old segments?
3867+        self.log("reading %d bytes" % length)
3868+
3869+        old_start_data = ""
3870+        old_data_length = self._first_segment_offset - self._read_marker
3871+        if old_data_length > 0:
3872+            if old_data_length > length:
3873+                old_data_length = length
3874+            self.log("returning %d bytes of old start data" % old_data_length)
3875+
3876+            old_data_end = old_data_length + self._read_marker
3877+            old_start_data = self._start[self._read_marker:old_data_end]
3878+            length -= old_data_length
3879+        else:
3880+            # otherwise calculations later get screwed up.
3881+            old_data_length = 0
3882+
3883+        # Is there enough new data to satisfy this read? If not, we need
3884+        # to pad the end of the data with data from our last segment.
3885+        old_end_length = length - \
3886+            (self._newdata.get_size() - self._newdata.pos())
3887+        old_end_data = ""
3888+        if old_end_length > 0:
3889+            self.log("reading %d bytes of old end data" % old_end_length)
3890+
3891+            # TODO: We're not explicitly checking for tail segment size
3892+            # here. Is that a problem?
3893+            old_data_offset = (length - old_end_length + \
3894+                               old_data_length) % self._segment_size
3895+            self.log("reading at offset %d" % old_data_offset)
3896+            old_end = old_data_offset + old_end_length
3897+            old_end_data = self._end[old_data_offset:old_end]
3898+            length -= old_end_length
3899+            assert length == self._newdata.get_size() - self._newdata.pos()
3900+
3901+        self.log("reading %d bytes of new data" % length)
3902+        new_data = self._newdata.read(length)
3903+        new_data = "".join(new_data)
3904+
3905+        self._read_marker += len(old_start_data + new_data + old_end_data)
3906+
3907+        return old_start_data + new_data + old_end_data
3908 
3909hunk ./src/allmydata/mutable/publish.py 1327
3910+    def close(self):
3911+        pass
3912}
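As a concrete sketch of how TransformingUploadable merges boundary segments (all values are illustrative): suppose an existing file has 4-byte segments and we replace two bytes at offset 6, so the write begins 2 bytes into segment 1. The publisher hands the wrapper the old segment(s) touched by the write, and reads come back already merged:

    from allmydata.mutable.publish import MutableData, TransformingUploadable

    # "efgh" is the old segment containing the write range; it serves as
    # both the start and end segment because the write fits inside it.
    u = TransformingUploadable(MutableData("XY"), 6, 4, "efgh", "efgh")
    assert u.get_size() == 8      # offset + size of the new data
    assert u.read(4) == "efXY"    # old prefix "ef" merged with new data "XY"
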
3913[nodemaker.py: Make nodemaker expose a way to create MDMF files
3914Kevan Carstensen <kevan@isnotajoke.com>**20100819003509
3915 Ignore-this: a6701746d6b992fc07bc0556a2b4a61d
3916] {
3917hunk ./src/allmydata/nodemaker.py 3
3918 import weakref
3919 from zope.interface import implements
3920-from allmydata.interfaces import INodeMaker
3921+from allmydata.util.assertutil import precondition
3922+from allmydata.interfaces import INodeMaker, SDMF_VERSION
3923 from allmydata.immutable.literal import LiteralFileNode
3924 from allmydata.immutable.filenode import ImmutableFileNode, CiphertextFileNode
3925 from allmydata.immutable.upload import Data
3926hunk ./src/allmydata/nodemaker.py 9
3927 from allmydata.mutable.filenode import MutableFileNode
3928+from allmydata.mutable.publish import MutableData
3929 from allmydata.dirnode import DirectoryNode, pack_children
3930 from allmydata.unknown import UnknownNode
3931 from allmydata import uri
3932hunk ./src/allmydata/nodemaker.py 92
3933             return self._create_dirnode(filenode)
3934         return None
3935 
3936-    def create_mutable_file(self, contents=None, keysize=None):
3937+    def create_mutable_file(self, contents=None, keysize=None,
3938+                            version=SDMF_VERSION):
3939         n = MutableFileNode(self.storage_broker, self.secret_holder,
3940                             self.default_encoding_parameters, self.history)
3941hunk ./src/allmydata/nodemaker.py 96
3942+        n.set_version(version)
3943         d = self.key_generator.generate(keysize)
3944         d.addCallback(n.create_with_keys, contents)
3945         d.addCallback(lambda res: n)
3946hunk ./src/allmydata/nodemaker.py 103
3947         return d
3948 
3949     def create_new_mutable_directory(self, initial_children={}):
3950+        # mutable directories will always be SDMF for now, to help
3951+        # compatibility with older clients.
3952+        version = SDMF_VERSION
3953+        # initial_children must have metadata (i.e. {} instead of None)
3954+        for (name, (node, metadata)) in initial_children.iteritems():
3955+            precondition(isinstance(metadata, dict),
3956+                         "create_new_mutable_directory requires metadata to be a dict, not None", metadata)
3957+            node.raise_error()
3958         d = self.create_mutable_file(lambda n:
3959hunk ./src/allmydata/nodemaker.py 112
3960-                                     pack_children(initial_children, n.get_writekey()))
3961+                                     MutableData(pack_children(initial_children,
3962+                                                    n.get_writekey())),
3963+                                     version=version)
3964         d.addCallback(self._create_dirnode)
3965         return d
3966 
3967}
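A sketch of the resulting nodemaker API (nm stands for an existing NodeMaker instance):

    from allmydata.interfaces import MDMF_VERSION
    from allmydata.mutable.publish import MutableData

    # create an MDMF mutable file instead of the SDMF default
    d = nm.create_mutable_file(MutableData("initial contents"),
                               version=MDMF_VERSION)
    d.addCallback(lambda node: node.get_uri())
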
3968[docs: update docs to mention MDMF
3969Kevan Carstensen <kevan@isnotajoke.com>**20100814225644
3970 Ignore-this: 1c3caa3cd44831007dcfbef297814308
3971] {
3972merger 0.0 (
3973replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
3974merger 0.0 (
3975hunk ./docs/configuration.rst 383
3976-shares.needed = (int, optional) aka "k", default 3
3977-shares.total = (int, optional) aka "N", N >= k, default 10
3978-shares.happy = (int, optional) 1 <= happy <= N, default 7
3979-
3980- These three values set the default encoding parameters. Each time a new file
3981- is uploaded, erasure-coding is used to break the ciphertext into separate
3982- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
3983- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
3984- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
3985- Setting k to 1 is equivalent to simple replication (uploading N copies of
3986- the file).
3987-
3988- These values control the tradeoff between storage overhead, performance, and
3989- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
3990- backend storage space (the actual value will be a bit more, because of other
3991- forms of overhead). Up to N-k shares can be lost before the file becomes
3992- unrecoverable, so assuming there are at least N servers, up to N-k servers
3993- can be offline without losing the file. So large N/k ratios are more
3994- reliable, and small N/k ratios use less disk space. Clearly, k must never be
3995- smaller than N.
3996-
3997- Large values of N will slow down upload operations slightly, since more
3998- servers must be involved, and will slightly increase storage overhead due to
3999- the hash trees that are created. Large values of k will cause downloads to
4000- be marginally slower, because more servers must be involved. N cannot be
4001- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
4002- uses.
4003-
4004- shares.happy allows you control over the distribution of your immutable file.
4005- For a successful upload, shares are guaranteed to be initially placed on
4006- at least 'shares.happy' distinct servers, the correct functioning of any
4007- k of which is sufficient to guarantee the availability of the uploaded file.
4008- This value should not be larger than the number of servers on your grid.
4009-
4010- A value of shares.happy <= k is allowed, but does not provide any redundancy
4011- if some servers fail or lose shares.
4012-
4013- (Mutable files use a different share placement algorithm that does not
4014-  consider this parameter.)
4015-
4016-
4017-== Storage Server Configuration ==
4018-
4019-[storage]
4020-enabled = (boolean, optional)
4021-
4022- If this is True, the node will run a storage server, offering space to other
4023- clients. If it is False, the node will not run a storage server, meaning
4024- that no shares will be stored on this node. Use False this for clients who
4025- do not wish to provide storage service. The default value is True.
4026-
4027-readonly = (boolean, optional)
4028-
4029- If True, the node will run a storage server but will not accept any shares,
4030- making it effectively read-only. Use this for storage servers which are
4031- being decommissioned: the storage/ directory could be mounted read-only,
4032- while shares are moved to other servers. Note that this currently only
4033- affects immutable shares. Mutable shares (used for directories) will be
4034- written and modified anyway. See ticket #390 for the current status of this
4035- bug. The default value is False.
4036-
4037-reserved_space = (str, optional)
4038-
4039- If provided, this value defines how much disk space is reserved: the storage
4040- server will not accept any share which causes the amount of free disk space
4041- to drop below this value. (The free space is measured by a call to statvfs(2)
4042- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
4043- user account under which the storage server runs.)
4044-
4045- This string contains a number, with an optional case-insensitive scale
4046- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
4047- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
4048- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
4049-
4050-expire.enabled =
4051-expire.mode =
4052-expire.override_lease_duration =
4053-expire.cutoff_date =
4054-expire.immutable =
4055-expire.mutable =
4056-
4057- These settings control garbage-collection, in which the server will delete
4058- shares that no longer have an up-to-date lease on them. Please see the
4059- neighboring "garbage-collection.txt" document for full details.
4060-
4061-
4062-== Running A Helper ==
4063+Running A Helper
4064+================
4065hunk ./docs/configuration.rst 423
4066+mutable.format = sdmf or mdmf
4067+
4068+ This value tells Tahoe-LAFS what the default mutable file format should
4069+ be. If mutable.format = sdmf, then newly created mutable files will be in
4070+ the old SDMF format. This is desirable for clients that operate on
4071+ grids where some peers run older versions of Tahoe-LAFS, as these older
4072+ versions cannot read the new MDMF mutable file format. If
4073+ mutable.format = mdmf, then newly created mutable files will use the
4074+ new MDMF format, which supports efficient in-place modification and
4075+ streaming downloads. You can override this value per-file with the
4076+ mutable-type query parameter in the webapi. If you do not specify a value
4077+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
4078+
4079+ Note that this parameter only applies to mutable files. Mutable
4080+ directories, which are stored as mutable files, are not controlled by
4081+ this parameter and will always use SDMF. We may revisit this decision
4082+ in future versions of Tahoe-LAFS.
4083)
4084)
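For example, to make MDMF the default for newly created mutable files, tahoe.cfg would contain (assuming the usual [client] section placement)::

    [client]
    mutable.format = mdmf
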
4085hunk ./docs/frontends/webapi.rst 363
4086  writeable mutable file, that file's contents will be overwritten in-place. If
4087  it is a read-cap for a mutable file, an error will occur. If it is an
4088  immutable file, the old file will be discarded, and a new one will be put in
4089- its place.
4090+ its place. If the target file is a writable mutable file, you may also
4091+ specify an "offset" parameter -- a byte offset that determines where in
4092+ the mutable file the data from the HTTP request body is placed. This
4093+ operation is relatively efficient for MDMF mutable files, and is
4094+ relatively inefficient (but still supported) for SDMF mutable files.
4095 
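 For example, a sketch of an in-place modification of an existing MDMF file
 ($FILECAP stands for a write-cap; the request body holds the replacement
 bytes, which are written starting at byte 1024)::

    PUT /uri/$FILECAP?offset=1024
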
4096  When creating a new file, if "mutable=true" is in the query arguments, the
4097  operation will create a mutable file instead of an immutable one.
4098hunk ./docs/frontends/webapi.rst 388
4099 
4100  If "mutable=true" is in the query arguments, the operation will create a
4101  mutable file, and return its write-cap in the HTTP response. The default is
4102- to create an immutable file, returning the read-cap as a response.
4103+ to create an immutable file, returning the read-cap as a response. If
4104+ you create a mutable file, you can also use the "mutable-type" query
4105+ parameter. If "mutable-type=sdmf", then the mutable file will be created
4106+ in the old SDMF mutable file format. This is desirable for files that
4107+ need to be read by old clients. If "mutable-type=mdmf", then the file
4108+ will be created in the new MDMF mutable file format. MDMF mutable files
4109+ can be downloaded more efficiently, and modified in-place efficiently,
4110+ but are not compatible with older versions of Tahoe-LAFS. If no
4111+ "mutable-type" argument is given, the file is created in whatever
4112+ format was configured in tahoe.cfg.
4113 
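 For example, a sketch of creating a new MDMF mutable file (the request body
 holds the file's initial contents)::

    PUT /uri?mutable=true&mutable-type=mdmf
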
4114 Creating A New Directory
4115 ------------------------
4116hunk ./docs/frontends/webapi.rst 1082
4117  If a "mutable=true" argument is provided, the operation will create a
4118  mutable file, and the response body will contain the write-cap instead of
4119  the upload results page. The default is to create an immutable file,
4120- returning the upload results page as a response.
4121+ returning the upload results page as a response. If you create a
4122+ mutable file, you may choose to specify the format of that mutable file
4123+ with the "mutable-type" parameter. If "mutable-type=mdmf", then the
4124+ file will be created as an MDMF mutable file. If "mutable-type=sdmf",
4125+ then the file will be created as an SDMF mutable file. If no value is
4126+ specified, the file will be created in whatever format is specified in
4127+ tahoe.cfg.
4128 
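 For example, a sketch of a form-based upload that requests the MDMF format
 (assuming mutable-type is also accepted as a query argument here, as it is
 with PUT)::

    POST /uri?t=upload&mutable=true&mutable-type=mdmf
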
4129 
4130 ``POST /uri/$DIRCAP/[SUBDIRS../]?t=upload``
4131}
4132[mutable/layout.py and interfaces.py: add MDMF writer and reader
4133Kevan Carstensen <kevan@isnotajoke.com>**20100819003304
4134 Ignore-this: 44400fec923987b62830da2ed5075fb4
4135 
4136 The MDMF writer is responsible for keeping state as plaintext is
4137 gradually processed into share data by the upload process. When the
4138 upload finishes, it will write all of its share data to a remote server,
4139 reporting its status back to the publisher.
4140 
4141 The MDMF reader is responsible for abstracting an MDMF file as it sits
4142 on the grid from the downloader; specifically, by receiving and
4143 responding to requests for arbitrary data within the MDMF file.
4144 
4145 The interfaces.py file has also been modified to contain an interface
4146 for the writer.
4147] {
4148hunk ./src/allmydata/interfaces.py 7
4149      ChoiceOf, IntegerConstraint, Any, RemoteInterface, Referenceable
4150 
4151 HASH_SIZE=32
4152+SALT_SIZE=16
4153+
4154+SDMF_VERSION=0
4155+MDMF_VERSION=1
4156 
4157 Hash = StringConstraint(maxLength=HASH_SIZE,
4158                         minLength=HASH_SIZE)# binary format 32-byte SHA256 hash
4159hunk ./src/allmydata/interfaces.py 424
4160         """
4161 
4162 
4163+class IMutableSlotWriter(Interface):
4164+    """
4165+    The interface for a writer around a mutable slot on a remote server.
4166+    """
4167+    def set_checkstring(checkstring, *args):
4168+        """
4169+        Set the checkstring that I will pass to the remote server when
4170+        writing.
4171+
4172+            @param checkstring: A packed checkstring to use.
4173+
4174+        Note that implementations can differ in which semantics they
4175+        wish to support for set_checkstring -- they can, for example,
4176+        build the checkstring themselves from its constituents, or
4177+        some other thing.
4178+        """
4179+
4180+    def get_checkstring():
4181+        """
4182+        Get the checkstring that I think currently exists on the remote
4183+        server.
4184+        """
4185+
4186+    def put_block(data, segnum, salt):
4187+        """
4188+        Add a block and salt to the share.
4189+        """
4190+
4191+    def put_encprivkey(encprivkey):
4192+        """
4193+        Add the encrypted private key to the share.
4194+        """
4195+
4196+    def put_blockhashes(blockhashes=list):
4197+        """
4198+        Add the block hash tree to the share.
4199+        """
4200+
4201+    def put_sharehashes(sharehashes=dict):
4202+        """
4203+        Add the share hash chain to the share.
4204+        """
4205+
4206+    def get_signable():
4207+        """
4208+        Return the part of the share that needs to be signed.
4209+        """
4210+
4211+    def put_signature(signature):
4212+        """
4213+        Add the signature to the share.
4214+        """
4215+
4216+    def put_verification_key(verification_key):
4217+        """
4218+        Add the verification key to the share.
4219+        """
4220+
4221+    def finish_publishing():
4222+        """
4223+        Do anything necessary to finish writing the share to a remote
4224+        server. I require that no further publishing needs to take place
4225+        after this method has been called.
4226+        """
4227+
4228+
4229 class IURI(Interface):
4230     def init_from_string(uri):
4231         """Accept a string (as created by my to_string() method) and populate
4232hunk ./src/allmydata/mutable/layout.py 4
4233 
4234 import struct
4235 from allmydata.mutable.common import NeedMoreDataError, UnknownVersionError
4236+from allmydata.interfaces import HASH_SIZE, SALT_SIZE, SDMF_VERSION, \
4237+                                 MDMF_VERSION, IMutableSlotWriter
4238+from allmydata.util import mathutil, observer
4239+from twisted.python import failure
4240+from twisted.internet import defer
4241+from zope.interface import implements
4242+
4243+
4244+# These strings describe the format of the packed structs they help process
4245+# Here's what they mean:
4246+#
4247+#  PREFIX:
4248+#    >: Big-endian byte order; the most significant byte is first (leftmost).
4249+#    B: The version information; an 8 bit version identifier. Stored as
4250+#       an unsigned char. This is currently 0 for SDMF; our modifications
4251+#       will turn it into 1 for MDMF.
4252+#    Q: The sequence number; this is sort of like a revision history for
4253+#       mutable files; they start at 1 and increase as they are changed after
4254+#       being uploaded. Stored as an unsigned long long, which is 8 bytes in
4255+#       length.
4256+#  32s: The root hash of the share hash tree. We use sha-256d, so we use 32
4257+#       characters = 32 bytes to store the value.
4258+#  16s: The salt for the readkey. This is a 16-byte random value, stored as
4259+#       16 characters.
4260+#
4261+#  SIGNED_PREFIX additions, things that are covered by the signature:
4262+#    B: The "k" encoding parameter. We store this as an 8-bit character,
4263+#       which is convenient because our erasure coding scheme cannot
4264+#       encode if you ask for more than 255 pieces.
4265+#    B: The "N" encoding parameter. Stored as an 8-bit character for the
4266+#       same reasons as above.
4267+#    Q: The segment size of the uploaded file. This will essentially be the
4268+#       length of the file in SDMF. An unsigned long long, so we can store
4269+#       files of quite large size.
4270+#    Q: The data length of the uploaded file. Modulo padding, this will be
4271+#       the same as the segment size field. Like the segment size field, it is
4272+#       an unsigned long long and can be quite large.
4273+#
4274+#   HEADER additions:
4275+#     L: The offset of the signature of this. An unsigned long.
4276+#     L: The offset of the share hash chain. An unsigned long.
4277+#     L: The offset of the block hash tree. An unsigned long.
4278+#     L: The offset of the share data. An unsigned long.
4279+#     Q: The offset of the encrypted private key. An unsigned long long, to
4280+#        account for the possibility of a lot of share data.
4281+#     Q: The offset of the EOF. An unsigned long long, to account for the
4282+#        possibility of a lot of share data.
4283+#
4284+#  After all of these, we have the following:
4285+#    - The verification key: Occupies the space between the end of the header
4286+#      and the start of the signature (i.e.: data[HEADER_LENGTH:o['signature']].
4287+#    - The signature, which goes from the signature offset to the share hash
4288+#      chain offset.
4289+#    - The share hash chain, which goes from the share hash chain offset to
4290+#      the block hash tree offset.
4291+#    - The share data, which goes from the share data offset to the encrypted
4292+#      private key offset.
4293+#    - The encrypted private key, which goes from its offset to the end of the file.
4294+#
4295+#  The block hash tree in this encoding covers only one segment, so the offset of
4296+#  the share data will be 32 bytes more than the offset of the block hash tree.
4297+#  Given this, we may need to check to see how many bytes a reasonably sized
4298+#  block hash tree will take up.
4299 
4300 PREFIX = ">BQ32s16s" # each version has a different prefix
4301 SIGNED_PREFIX = ">BQ32s16s BBQQ" # this is covered by the signature
4302hunk ./src/allmydata/mutable/layout.py 73
4303 SIGNED_PREFIX_LENGTH = struct.calcsize(SIGNED_PREFIX)
4304 HEADER = ">BQ32s16s BBQQ LLLLQQ" # includes offsets
4305 HEADER_LENGTH = struct.calcsize(HEADER)
4306+OFFSETS = ">LLLLQQ"
4307+OFFSETS_LENGTH = struct.calcsize(OFFSETS)
4308 
4309hunk ./src/allmydata/mutable/layout.py 76
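Since these format strings drive all of the offset arithmetic below, a quick sanity check of their sizes (struct ignores the whitespace inside format strings):

    import struct

    assert struct.calcsize(">BQ32s16s") == 57               # PREFIX
    assert struct.calcsize(">BQ32s16s BBQQ") == 75          # SIGNED_PREFIX
    assert struct.calcsize(">BQ32s16s BBQQ LLLLQQ") == 107  # HEADER
    assert struct.calcsize(">LLLLQQ") == 32                 # OFFSETS
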
4310+# These are still used for some tests.
4311 def unpack_header(data):
4312     o = {}
4313     (version,
4314hunk ./src/allmydata/mutable/layout.py 92
4315      o['EOF']) = struct.unpack(HEADER, data[:HEADER_LENGTH])
4316     return (version, seqnum, root_hash, IV, k, N, segsize, datalen, o)
4317 
4318-def unpack_prefix_and_signature(data):
4319-    assert len(data) >= HEADER_LENGTH, len(data)
4320-    prefix = data[:SIGNED_PREFIX_LENGTH]
4321-
4322-    (version,
4323-     seqnum,
4324-     root_hash,
4325-     IV,
4326-     k, N, segsize, datalen,
4327-     o) = unpack_header(data)
4328-
4329-    if version != 0:
4330-        raise UnknownVersionError("got mutable share version %d, but I only understand version 0" % version)
4331-
4332-    if len(data) < o['share_hash_chain']:
4333-        raise NeedMoreDataError(o['share_hash_chain'],
4334-                                o['enc_privkey'], o['EOF']-o['enc_privkey'])
4335-
4336-    pubkey_s = data[HEADER_LENGTH:o['signature']]
4337-    signature = data[o['signature']:o['share_hash_chain']]
4338-
4339-    return (seqnum, root_hash, IV, k, N, segsize, datalen,
4340-            pubkey_s, signature, prefix)
4341-
4342 def unpack_share(data):
4343     assert len(data) >= HEADER_LENGTH
4344     o = {}
4345hunk ./src/allmydata/mutable/layout.py 139
4346             pubkey, signature, share_hash_chain, block_hash_tree,
4347             share_data, enc_privkey)
4348 
4349-def unpack_share_data(verinfo, hash_and_data):
4350-    (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, o_t) = verinfo
4351-
4352-    # hash_and_data starts with the share_hash_chain, so figure out what the
4353-    # offsets really are
4354-    o = dict(o_t)
4355-    o_share_hash_chain = 0
4356-    o_block_hash_tree = o['block_hash_tree'] - o['share_hash_chain']
4357-    o_share_data = o['share_data'] - o['share_hash_chain']
4358-    o_enc_privkey = o['enc_privkey'] - o['share_hash_chain']
4359-
4360-    share_hash_chain_s = hash_and_data[o_share_hash_chain:o_block_hash_tree]
4361-    share_hash_format = ">H32s"
4362-    hsize = struct.calcsize(share_hash_format)
4363-    assert len(share_hash_chain_s) % hsize == 0, len(share_hash_chain_s)
4364-    share_hash_chain = []
4365-    for i in range(0, len(share_hash_chain_s), hsize):
4366-        chunk = share_hash_chain_s[i:i+hsize]
4367-        (hid, h) = struct.unpack(share_hash_format, chunk)
4368-        share_hash_chain.append( (hid, h) )
4369-    share_hash_chain = dict(share_hash_chain)
4370-    block_hash_tree_s = hash_and_data[o_block_hash_tree:o_share_data]
4371-    assert len(block_hash_tree_s) % 32 == 0, len(block_hash_tree_s)
4372-    block_hash_tree = []
4373-    for i in range(0, len(block_hash_tree_s), 32):
4374-        block_hash_tree.append(block_hash_tree_s[i:i+32])
4375-
4376-    share_data = hash_and_data[o_share_data:o_enc_privkey]
4377-
4378-    return (share_hash_chain, block_hash_tree, share_data)
4379-
4380-
4381-def pack_checkstring(seqnum, root_hash, IV):
4382-    return struct.pack(PREFIX,
4383-                       0, # version,
4384-                       seqnum,
4385-                       root_hash,
4386-                       IV)
4387-
4388 def unpack_checkstring(checkstring):
4389     cs_len = struct.calcsize(PREFIX)
4390     version, seqnum, root_hash, IV = struct.unpack(PREFIX, checkstring[:cs_len])
4391hunk ./src/allmydata/mutable/layout.py 146
4392         raise UnknownVersionError("got mutable share version %d, but I only understand version 0" % version)
4393     return (seqnum, root_hash, IV)
4394 
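A small sketch of the checkstring round-trip these helpers implement (values are illustrative):

    import struct

    seqnum, root_hash, IV = 3, "r" * 32, "s" * 16
    checkstring = struct.pack(">BQ32s16s", 0, seqnum, root_hash, IV)
    assert unpack_checkstring(checkstring) == (seqnum, root_hash, IV)
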
4395-def pack_prefix(seqnum, root_hash, IV,
4396-                required_shares, total_shares,
4397-                segment_size, data_length):
4398-    prefix = struct.pack(SIGNED_PREFIX,
4399-                         0, # version,
4400-                         seqnum,
4401-                         root_hash,
4402-                         IV,
4403-
4404-                         required_shares,
4405-                         total_shares,
4406-                         segment_size,
4407-                         data_length,
4408-                         )
4409-    return prefix
4410 
4411 def pack_offsets(verification_key_length, signature_length,
4412                  share_hash_chain_length, block_hash_tree_length,
4413hunk ./src/allmydata/mutable/layout.py 192
4414                            encprivkey])
4415     return final_share
4416 
4417+def pack_prefix(seqnum, root_hash, IV,
4418+                required_shares, total_shares,
4419+                segment_size, data_length):
4420+    prefix = struct.pack(SIGNED_PREFIX,
4421+                         0, # version,
4422+                         seqnum,
4423+                         root_hash,
4424+                         IV,
4425+                         required_shares,
4426+                         total_shares,
4427+                         segment_size,
4428+                         data_length,
4429+                         )
4430+    return prefix
4431+
4432+
4433+class SDMFSlotWriteProxy:
4434+    implements(IMutableSlotWriter)
4435+    """
4436+    I represent a remote write slot for an SDMF mutable file. I build a
4437+    share in memory, and then write it in one piece to the remote
4438+    server. This mimics how SDMF shares were built before MDMF (and the
4439+    new MDMF uploader), but provides that functionality in a way that
4440+    allows the MDMF uploader to be built without much special-casing for
4441+    file format, which makes the uploader code more readable.
4442+    """
4443+    def __init__(self,
4444+                 shnum,
4445+                 rref, # a remote reference to a storage server
4446+                 storage_index,
4447+                 secrets, # (write_enabler, renew_secret, cancel_secret)
4448+                 seqnum, # the sequence number of the mutable file
4449+                 required_shares,
4450+                 total_shares,
4451+                 segment_size,
4452+                 data_length): # the length of the original file
4453+        self.shnum = shnum
4454+        self._rref = rref
4455+        self._storage_index = storage_index
4456+        self._secrets = secrets
4457+        self._seqnum = seqnum
4458+        self._required_shares = required_shares
4459+        self._total_shares = total_shares
4460+        self._segment_size = segment_size
4461+        self._data_length = data_length
4462+
4463+        # This is an SDMF file, so it should have only one segment, so,
4464+        # modulo padding of the data length, the segment size and the
4465+        # data length should be the same.
4466+        expected_segment_size = mathutil.next_multiple(data_length,
4467+                                                       self._required_shares)
4468+        assert expected_segment_size == segment_size
4469+
4470+        self._block_size = self._segment_size / self._required_shares
4471+
4472+        # This is meant to mimic how SDMF files were built before MDMF
4473+        # entered the picture: we generate each share in its entirety,
4474+        # then push it off to the storage server in one write. When
4475+        # callers call set_*, they are just populating this dict.
4476+        # finish_publishing will stitch these pieces together into a
4477+        # coherent share, and then write the coherent share to the
4478+        # storage server.
4479+        self._share_pieces = {}
4480+
4481+        # This tells the write logic what checkstring to use when
4482+        # writing remote shares.
4483+        self._testvs = []
4484+
4485+        self._readvs = [(0, struct.calcsize(PREFIX))]
4486+
4487+
4488+    def set_checkstring(self, checkstring_or_seqnum,
4489+                              root_hash=None,
4490+                              salt=None):
4491+        """
4492+        Set the checkstring that I will pass to the remote server when
4493+        writing.
4494+
4495+            @param checkstring_or_seqnum: A packed checkstring to use,
4496+                   or a sequence number if root_hash and salt are given.
4497+
4498+        Note that implementations can differ in which semantics they
4499+        wish to support for set_checkstring -- they can, for example,
4500+        build the checkstring themselves from its constituents, or
4501+        some other thing.
4502+        """
4503+        if root_hash and salt:
4504+            checkstring = struct.pack(PREFIX,
4505+                                      0,
4506+                                      checkstring_or_seqnum,
4507+                                      root_hash,
4508+                                      salt)
4509+        else:
4510+            checkstring = checkstring_or_seqnum
4511+        self._testvs = [(0, len(checkstring), "eq", checkstring)]
4512+
4513+
4514+    def get_checkstring(self):
4515+        """
4516+        Get the checkstring that I think currently exists on the remote
4517+        server.
4518+        """
4519+        if self._testvs:
4520+            return self._testvs[0][3]
4521+        return ""
4522+
4523+
4524+    def put_block(self, data, segnum, salt):
4525+        """
4526+        Add a block and salt to the share.
4527+        """
4528+        # SDMF files have only one segment
4529+        assert segnum == 0
4530+        assert len(data) == self._block_size
4531+        assert len(salt) == SALT_SIZE
4532+
4533+        self._share_pieces['sharedata'] = data
4534+        self._share_pieces['salt'] = salt
4535+
4536+        # TODO: Figure out something intelligent to return.
4537+        return defer.succeed(None)
4538+
4539+
4540+    def put_encprivkey(self, encprivkey):
4541+        """
4542+        Add the encrypted private key to the share.
4543+        """
4544+        self._share_pieces['encprivkey'] = encprivkey
4545+
4546+        return defer.succeed(None)
4547+
4548+
4549+    def put_blockhashes(self, blockhashes):
4550+        """
4551+        Add the block hash tree to the share.
4552+        """
4553+        assert isinstance(blockhashes, list)
4554+        for h in blockhashes:
4555+            assert len(h) == HASH_SIZE
4556+
4557+        # serialize the blockhashes, then set them.
4558+        blockhashes_s = "".join(blockhashes)
4559+        self._share_pieces['block_hash_tree'] = blockhashes_s
4560+
4561+        return defer.succeed(None)
4562+
4563+
4564+    def put_sharehashes(self, sharehashes):
4565+        """
4566+        Add the share hash chain to the share.
4567+        """
4568+        assert isinstance(sharehashes, dict)
4569+        for h in sharehashes.itervalues():
4570+            assert len(h) == HASH_SIZE
4571+
4572+        # serialize the sharehashes, then set them.
4573+        sharehashes_s = "".join([struct.pack(">H32s", i, sharehashes[i])
4574+                                 for i in sorted(sharehashes.keys())])
4575+        self._share_pieces['share_hash_chain'] = sharehashes_s
4576+
4577+        return defer.succeed(None)
4578+
4579+
4580+    def put_root_hash(self, root_hash):
4581+        """
4582+        Add the root hash to the share.
4583+        """
4584+        assert len(root_hash) == HASH_SIZE
4585+
4586+        self._share_pieces['root_hash'] = root_hash
4587+
4588+        return defer.succeed(None)
4589+
4590+
4591+    def put_salt(self, salt):
4592+        """
4593+        Add a salt to an empty SDMF file.
4594+        """
4595+        assert len(salt) == SALT_SIZE
4596+
4597+        self._share_pieces['salt'] = salt
4598+        self._share_pieces['sharedata'] = ""
4599+
4600+
4601+    def get_signable(self):
4602+        """
4603+        Return the part of the share that needs to be signed.
4604+
4605+        SDMF writers need to sign the packed representation of the
4606+        first eight fields of the remote share, that is:
4607+            - version number (0)
4608+            - sequence number
4609+            - root of the share hash tree
4610+            - salt
4611+            - k
4612+            - n
4613+            - segsize
4614+            - datalen
4615+
4616+        This method is responsible for returning that to callers.
4617+        """
4618+        return struct.pack(SIGNED_PREFIX,
4619+                           0,
4620+                           self._seqnum,
4621+                           self._share_pieces['root_hash'],
4622+                           self._share_pieces['salt'],
4623+                           self._required_shares,
4624+                           self._total_shares,
4625+                           self._segment_size,
4626+                           self._data_length)
4627+
4628+
4629+    def put_signature(self, signature):
4630+        """
4631+        Add the signature to the share.
4632+        """
4633+        self._share_pieces['signature'] = signature
4634+
4635+        return defer.succeed(None)
4636+
4637+
4638+    def put_verification_key(self, verification_key):
4639+        """
4640+        Add the verification key to the share.
4641+        """
4642+        self._share_pieces['verification_key'] = verification_key
4643+
4644+        return defer.succeed(None)
4645+
4646+
4647+    def get_verinfo(self):
4648+        """
4649+        I return my verinfo tuple. This is used by the ServermapUpdater
4650+        to keep track of versions of mutable files.
4651+
4652+        The verinfo tuple for SDMF files contains:
4653+            - seqnum
4654+            - root hash
4655+            - the 16-byte IV (the salt)
4656+            - segsize
4657+            - datalen
4658+            - k
4659+            - n
4660+            - prefix (the thing that you sign)
4661+            - a tuple of offsets
4662+
4663+        The verinfo tuple for MDMF files is the same, except that the
4664+        IV slot carries a value derived from the per-segment salts
4665+        instead of a single 16-byte IV; keeping the shapes identical
4666+        simplifies the processing of version information tuples.
4668+        """
4669+        return (self._seqnum,
4670+                self._share_pieces['root_hash'],
4671+                self._share_pieces['salt'],
4672+                self._segment_size,
4673+                self._data_length,
4674+                self._required_shares,
4675+                self._total_shares,
4676+                self.get_signable(),
4677+                self._get_offsets_tuple())
4678+
4679+    def _get_offsets_dict(self):
4680+        post_offset = HEADER_LENGTH
4681+        offsets = {}
4682+
4683+        verification_key_length = len(self._share_pieces['verification_key'])
4684+        o1 = offsets['signature'] = post_offset + verification_key_length
4685+
4686+        signature_length = len(self._share_pieces['signature'])
4687+        o2 = offsets['share_hash_chain'] = o1 + signature_length
4688+
4689+        share_hash_chain_length = len(self._share_pieces['share_hash_chain'])
4690+        o3 = offsets['block_hash_tree'] = o2 + share_hash_chain_length
4691+
4692+        block_hash_tree_length = len(self._share_pieces['block_hash_tree'])
4693+        o4 = offsets['share_data'] = o3 + block_hash_tree_length
4694+
4695+        share_data_length = len(self._share_pieces['sharedata'])
4696+        o5 = offsets['enc_privkey'] = o4 + share_data_length
4697+
4698+        encprivkey_length = len(self._share_pieces['encprivkey'])
4699+        offsets['EOF'] = o5 + encprivkey_length
4700+        return offsets
4701+
4702+
4703+    def _get_offsets_tuple(self):
4704+        offsets = self._get_offsets_dict()
4705+        return tuple([(key, value) for key, value in offsets.items()])
4706+
4707+
4708+    def _pack_offsets(self):
4709+        offsets = self._get_offsets_dict()
4710+        return struct.pack(">LLLLQQ",
4711+                           offsets['signature'],
4712+                           offsets['share_hash_chain'],
4713+                           offsets['block_hash_tree'],
4714+                           offsets['share_data'],
4715+                           offsets['enc_privkey'],
4716+                           offsets['EOF'])
4717+
4718+
4719+    def finish_publishing(self):
4720+        """
4721+        Do anything necessary to finish writing the share to a remote
4722+        server. I require that no further publishing needs to take place
4723+        after this method has been called.
4724+        """
4725+        for k in ["sharedata", "encprivkey", "signature", "verification_key",
4726+                  "share_hash_chain", "block_hash_tree"]:
4727+            assert k in self._share_pieces
4728+        # This is the only method that actually writes something to the
4729+        # remote server.
4730+        # First, we need to pack the share into data that we can write
4731+        # to the remote server in one write.
4732+        offsets = self._pack_offsets()
4733+        prefix = self.get_signable()
4734+        final_share = "".join([prefix,
4735+                               offsets,
4736+                               self._share_pieces['verification_key'],
4737+                               self._share_pieces['signature'],
4738+                               self._share_pieces['share_hash_chain'],
4739+                               self._share_pieces['block_hash_tree'],
4740+                               self._share_pieces['sharedata'],
4741+                               self._share_pieces['encprivkey']])
4742+
4743+        # Our only data vector is going to be writing the final share,
4744+        # in its entirety.
4745+        datavs = [(0, final_share)]
4746+
4747+        if not self._testvs:
4748+            # Our caller has not provided us with another checkstring
4749+            # yet, so we assume that we are writing a new share, and set
4750+            # a test vector that will allow a new share to be written.
4751+            self._testvs = []
4752+            self._testvs.append(tuple([0, 1, "eq", ""]))
4753+
4754+        tw_vectors = {}
4755+        tw_vectors[self.shnum] = (self._testvs, datavs, None)
4756+        return self._rref.callRemote("slot_testv_and_readv_and_writev",
4757+                                     self._storage_index,
4758+                                     self._secrets,
4759+                                     tw_vectors,
4760+                                     # TODO is it useful to read something?
4761+                                     self._readvs)
4762+
4763+
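A condensed sketch of the call sequence SDMFSlotWriteProxy expects (rref, secrets, sign, and the various share pieces stand for values the publisher already has in hand):

    w = SDMFSlotWriteProxy(shnum=0, rref=rref,
                           storage_index=storage_index, secrets=secrets,
                           seqnum=1, required_shares=3, total_shares=10,
                           segment_size=segment_size, data_length=data_length)
    w.set_checkstring(old_checkstring)       # or (seqnum, root_hash, salt)
    w.put_block(share_data, 0, salt)         # SDMF has exactly one segment
    w.put_encprivkey(encprivkey)
    w.put_blockhashes(block_hash_tree)       # a list of 32-byte hashes
    w.put_sharehashes(share_hash_chain)      # a dict of shnum -> 32-byte hash
    w.put_root_hash(root_hash)
    w.put_signature(sign(w.get_signable()))  # sign the packed prefix
    w.put_verification_key(verification_key)
    d = w.finish_publishing()                # the one remote write happens here
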
4764+MDMFHEADER = ">BQ32sBBQQ QQQQQQ"
4765+MDMFHEADERWITHOUTOFFSETS = ">BQ32sBBQQ"
4766+MDMFHEADERSIZE = struct.calcsize(MDMFHEADER)
4767+MDMFHEADERWITHOUTOFFSETSSIZE = struct.calcsize(MDMFHEADERWITHOUTOFFSETS)
4768+MDMFCHECKSTRING = ">BQ32s"
4769+MDMFSIGNABLEHEADER = ">BQ32sBBQQ"
4770+MDMFOFFSETS = ">QQQQQQ"
4771+MDMFOFFSETS_LENGTH = struct.calcsize(MDMFOFFSETS)
4772+
4773+class MDMFSlotWriteProxy:
4774+    implements(IMutableSlotWriter)
4775+
4776+    """
4777+    I represent a remote write slot for an MDMF mutable file.
4778+
4779+    I abstract away from my caller the details of block and salt
4780+    management, and the implementation of the on-disk format for MDMF
4781+    shares.
4782+    """
4783+    # Expected layout, MDMF:
4784+    # offset:     size:       name:
4785+    #-- signed part --
4786+    # 0           1           version number (01)
4787+    # 1           8           sequence number
4788+    # 9           32          share tree root hash
4789+    # 41          1           The "k" encoding parameter
4790+    # 42          1           The "N" encoding parameter
4791+    # 43          8           The segment size of the uploaded file
4792+    # 51          8           The data length of the original plaintext
4793+    #-- end signed part --
4794+    # 59          8           The offset of the encrypted private key
4795+    # 67          8           The offset of the block hash tree
4796+    # 75          8           The offset of the share hash chain
4797+    # 83          8           The offset of the signature
4798+    # 91          8           The offset of the verification key
4799+    # 99          8           The offset of the EOF
4800+    #
4801+    # followed by salts and share data, the encrypted private key, the
4802+    # block hash tree, the salt hash tree, the share hash chain, a
4803+    # signature over the first seven fields, and a verification key.
4804+    #
4805+    # The checkstring is the first three fields -- the version number,
4806+    # sequence number, and root hash. This is consistent
4807+    # in meaning to what we have with SDMF files, except now instead of
4808+    # using the literal salt, we use a value derived from all of the
4809+    # salts -- the share hash root.
4810+    #
4811+    # The salt is stored before the block for each segment. The block
4812+    # hash tree is computed over the combination of block and salt for
4813+    # each segment. In this way, we get integrity checking for both
4814+    # block and salt with the current block hash tree arrangement.
4815+    #
4816+    # The ordering of the offsets is different to reflect the dependencies
4817+    # that we'll run into with an MDMF file. The expected write flow is
4818+    # something like this:
4819+    #
4820+    #   0: Initialize with the sequence number, encoding parameters and
4821+    #      data length. From this, we can deduce the number of segments,
4822+    #      and where they should go. We can also figure out where the
4823+    #      encrypted private key should go, because we can figure out how
4824+    #      big the share data will be.
4825+    #
4826+    #   1: Encrypt, encode, and upload the file in chunks. Do something
4827+    #      like
4828+    #
4829+    #       put_block(data, segnum, salt)
4830+    #
4831+    #      to write a block and a salt to the disk. We can do both of
4832+    #      these operations now because we have enough of the offsets to
4833+    #      know where to put them.
4834+    #
4835+    #   2: Put the encrypted private key. Use:
4836+    #
4837+    #        put_encprivkey(encprivkey)
4838+    #
4839+    #      Now that we know the length of the private key, we can fill
4840+    #      in the offset for the block hash tree.
4841+    #
4842+    #   3: We're now in a position to upload the block hash tree for
4843+    #      a share. Put that using something like:
4844+    #       
4845+    #        put_blockhashes(block_hash_tree)
4846+    #
4847+    #      Note that block_hash_tree is a list of hashes -- we'll take
4848+    #      care of the details of serializing that appropriately. When
4849+    #      we get the block hash tree, we are also in a position to
4850+    #      calculate the offset for the share hash chain, and fill that
4851+    #      into the offsets table.
4852+    #
4853+    #   4: At the same time, we're in a position to upload the salt hash
4854+    #      tree. This is a Merkle tree over all of the salts. We use a
4855+    #      Merkle tree so that we can validate each block,salt pair as
4856+    #      we download them later. We do this using
4857+    #
4858+    #        put_salthashes(salt_hash_tree)
4859+    #
4860+    #      When you do this, I automatically put the root of the tree
4861+    #      (the hash at index 0 of the list) in its appropriate slot in
4862+    #      the signed prefix of the share.
4863+    #
4864+    #   5: We're now in a position to upload the share hash chain for
4865+    #      a share. Do that with something like:
4866+    #     
4867+    #        put_sharehashes(share_hash_chain)
4868+    #
4869+    #      share_hash_chain should be a dictionary mapping shnums to
4870+    #      32-byte hashes -- the wrapper handles serialization.
4871+    #      We'll know where to put the signature at this point, also.
4872+    #      The root of this tree will be put explicitly in the next
4873+    #      step.
4874+    #
4875+    #      TODO: Why? Why not just include it in the tree here?
4876+    #
4877+    #   6: Before putting the signature, we must first put the
4878+    #      root_hash. Do this with:
4879+    #
4880+    #        put_root_hash(root_hash).
4881+    #     
4882+    #      In terms of knowing where to put this value, it was always
4883+    #      possible to place it, but it makes sense semantically to
4884+    #      place it after the share hash tree, so that's why you do it
4885+    #      in this order.
4886+    #
4887+    #   7: With the root hash put, we can now sign the header. Use:
4888+    #
4889+    #        get_signable()
4890+    #
4891+    #      to get the part of the header that you want to sign, and use:
4892+    #       
4893+    #        put_signature(signature)
4894+    #
4895+    #      to write your signature to the remote server.
4896+    #
4897+    #   8: Add the verification key, and finish. Do:
4898+    #
4899+    #        put_verification_key(key)
4900+    #
4901+    #      and
4902+    #
4903+    #        finish_publishing()
4904+    #
4905+    # Checkstring management:
4906+    #
4907+    # To write to a mutable slot, we have to provide test vectors to ensure
4908+    # that we are writing to the same data that we think we are. These
4909+    # vectors allow us to detect uncoordinated writes; that is, writes
4910+    # where both we and some other shareholder are writing to the
4911+    # mutable slot, and to report those back to the parts of the program
4912+    # doing the writing.
4913+    #
4914+    # With SDMF, this was easy -- all of the share data was written in
4915+    # one go, so it was easy to detect uncoordinated writes, and we only
4916+    # had to do it once. With MDMF, not all of the file is written at
4917+    # once.
4918+    #
4919+    # If a share is new, we write out as much of the header as we can
4920+    # before writing out anything else. This gives other writers a
4921+    # canary that they can use to detect uncoordinated writes, and, if
4922+    # they do the same thing, gives us the same canary. We then update
4923+    # the share. We won't be able to write out two fields of the header
4924+    # -- the share tree hash and the salt hash -- until we finish
4925+    # writing out the share. We only require the writer to provide the
4926+    # initial checkstring, and keep track of what it should be after
4927+    # updates ourselves.
4928+    #
4929+    # If we haven't written anything yet, then on the first write (which
4930+    # will probably be a block + salt of a share), we'll also write out
4931+    # the header. On subsequent passes, we'll expect to see the header.
4932+    # This changes in two places:
4933+    #
4934+    #   - When we write out the salt hash
4935+    #   - When we write out the root of the share hash tree
4936+    #
4937+    # since these values will change the header. It is possible that we
4938+    # can just make those be written in one operation to minimize
4939+    # disruption.
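+    #
+    # Concretely (a sketch based on the test-vector format used in
+    # set_checkstring and _write below): the MDMF checkstring packs
+    # (version, seqnum, root_hash) into 41 bytes, and a test vector is
+    # an (offset, length, operator, expected-data) tuple, so a writer
+    # updating an existing share sends something like:
+    #
+    #   checkstring = struct.pack(MDMFCHECKSTRING, 1, seqnum, root_hash)
+    #   testv = (0, len(checkstring), "eq", checkstring)
+    #
+    # while a writer creating a brand-new share asserts emptiness with:
+    #
+    #   testv = (0, 1, "eq", "")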
4940+    def __init__(self,
4941+                 shnum,
4942+                 rref, # a remote reference to a storage server
4943+                 storage_index,
4944+                 secrets, # (write_enabler, renew_secret, cancel_secret)
4945+                 seqnum, # the sequence number of the mutable file
4946+                 required_shares,
4947+                 total_shares,
4948+                 segment_size,
4949+                 data_length): # the length of the original file
4950+        self.shnum = shnum
4951+        self._rref = rref
4952+        self._storage_index = storage_index
4953+        self._seqnum = seqnum
4954+        self._required_shares = required_shares
4955+        assert self.shnum >= 0 and self.shnum < total_shares
4956+        self._total_shares = total_shares
4957+        # We build up the offset table as we write things. It is the
4958+        # last thing we write to the remote server.
4959+        self._offsets = {}
4960+        self._testvs = []
4961+        # This is a list of write vectors that will be sent to our
4962+        # remote server once we are directed to write things there.
4963+        self._writevs = []
4964+        self._secrets = secrets
4965+        # The segment size needs to be a multiple of the k parameter --
4966+        # any padding should have been carried out by the publisher
4967+        # already.
4968+        assert segment_size % required_shares == 0
4969+        self._segment_size = segment_size
4970+        self._data_length = data_length
4971+
4972+        # These are set later -- we define them here so that we can
4973+        # check for their existence easily
4974+
4975+        # This is the root of the share hash tree -- the Merkle tree
4976+        # over the roots of the block hash trees computed for shares in
4977+        # this upload.
4978+        self._root_hash = None
4979+
4980+        # We haven't yet written anything to the remote bucket. By
4981+        # setting this, we tell the _write method as much. The write
4982+        # method will then know that it also needs to add a write vector
4983+        # for the checkstring (or what we have of it) to the first write
4984+        # request. We'll then record that value for future use.  If
4985+        # we're expecting something to be there already, we need to call
4986+        # set_checkstring before we write anything to tell the first
4987+        # write about that.
4988+        self._written = False
4989+
4990+        # When writing data to the storage servers, we get a read vector
4991+        # for free. We'll read the checkstring, which will help us
4992+        # figure out what's gone wrong if a write fails.
4993+        self._readv = [(0, struct.calcsize(MDMFCHECKSTRING))]
4994+
4995+        # We calculate the number of segments because it tells us
4996+        # where the share data (the salt + block pairs) ends, and also
4997+        # because it provides a useful amount of bounds checking.
4998+        self._num_segments = mathutil.div_ceil(self._data_length,
4999+                                               self._segment_size)
5000+        self._block_size = self._segment_size / self._required_shares
5001+        # We also calculate the share size, to help us with block
5002+        # constraints later.
5003+        tail_size = self._data_length % self._segment_size
5004+        if not tail_size:
5005+            self._tail_block_size = self._block_size
5006+        else:
5007+            self._tail_block_size = mathutil.next_multiple(tail_size,
5008+                                                           self._required_shares)
5009+            self._tail_block_size /= self._required_shares
5010+
5011+        # We already know where the sharedata starts: right after the end
5012+        # of the header (which is defined as the signable part + the
5013+        # offsets). We can also calculate where the encrypted private key
5014+        # begins from what we now know.
5015+        self._actual_block_size = self._block_size + SALT_SIZE
5016+        data_size = self._actual_block_size * (self._num_segments - 1)
5017+        data_size += self._tail_block_size
5018+        data_size += SALT_SIZE
5019+        self._offsets['enc_privkey'] = MDMFHEADERSIZE
5020+        self._offsets['enc_privkey'] += data_size
5021+        # We'll wait for the rest. Callers can now call my "put_block" and
5022+        # "set_checkstring" methods.
5023+
5024+
5025+    def set_checkstring(self,
5026+                        seqnum_or_checkstring,
5027+                        root_hash=None,
5028+                        salt=None):
5029+        """
5030+        Set the checkstring for this share.
5031+
5032+        This can be invoked in one of two ways.
5033+
5034+        With one argument, I assume that you are giving me a literal
5035+        checkstring -- e.g., the output of get_checkstring. I will then
5036+        set that checkstring as it is. This form is used by unit tests.
5037+
5038+        With two arguments, I assume that you are giving me a sequence
5039+        number and root hash to make a checkstring from. In that case, I
5040+        will build a checkstring and set it for you. This form is used
5041+        by the publisher.
5042+
5043+        By default, I assume that I am writing new shares to the grid.
5044+        If you don't explicitly set your own checkstring, I will use
5045+        one that requires that the remote share not exist. You will want
5046+        to use this method if you are updating a share in-place;
5047+        otherwise, writes will fail.
5048+        """
5049+        # You're allowed to overwrite checkstrings with this method;
5050+        # I assume that users know what they are doing when they call
5051+        # it.
5052+        if root_hash:
5053+            checkstring = struct.pack(MDMFCHECKSTRING,
5054+                                      1,
5055+                                      seqnum_or_checkstring,
5056+                                      root_hash)
5057+        else:
5058+            checkstring = seqnum_or_checkstring
5059+
5060+        if checkstring == "":
5061+            # We special-case this, since len("") = 0, but we need
5062+            # length of 1 for the case of an empty share to work on the
5063+            # storage server, which is what a checkstring that is the
5064+            # empty string means.
5065+            self._testvs = []
5066+        else:
5067+            self._testvs = []
5068+            self._testvs.append((0, len(checkstring), "eq", checkstring))
5069+
5070+
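+    # Example (a sketch): for a brand-new share at sequence number 3,
+    # the following two calls are equivalent, since the two-argument
+    # form packs (1, seqnum, root_hash) itself:
+    #
+    #   writer.set_checkstring(3, root_hash)
+    #   writer.set_checkstring(struct.pack(MDMFCHECKSTRING, 1, 3, root_hash))
+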
5071+    def __repr__(self):
5072+        return "MDMFSlotWriteProxy for share %d" % self.shnum
5073+
5074+
5075+    def get_checkstring(self):
5076+        """
5077+        I return a representation of what the checkstring for this
5078+        share will look like on the server.
5079+
5080+        I am mostly used for tests.
5081+        """
5082+        if self._root_hash:
5083+            roothash = self._root_hash
5084+        else:
5085+            roothash = "\x00" * 32
5086+        return struct.pack(MDMFCHECKSTRING,
5087+                           1,
5088+                           self._seqnum,
5089+                           roothash)
5090+
5091+
5092+    def put_block(self, data, segnum, salt):
5093+        """
5094+        I queue a write vector for the data, salt, and segment number
5095+        provided to me. I return None, as I do not actually cause
5096+        anything to be written yet.
5097+        """
5098+        if segnum >= self._num_segments:
5099+            raise LayoutInvalid("I won't overwrite the private key")
5100+        if len(salt) != SALT_SIZE:
5101+            raise LayoutInvalid("I was given a salt of size %d, but "
5102+                                "I wanted a salt of size %d" %
+                                (len(salt), SALT_SIZE))
5103+        if segnum + 1 == self._num_segments:
5104+            if len(data) != self._tail_block_size:
5105+                raise LayoutInvalid("I was given the wrong size block to write")
5106+        elif len(data) != self._block_size:
5107+            raise LayoutInvalid("I was given the wrong size block to write")
5108+
5109+        # We want to write at MDMFHEADERSIZE + segnum * (block_size + SALT_SIZE).
5110+
5111+        offset = MDMFHEADERSIZE + (self._actual_block_size * segnum)
5112+        data = salt + data
5113+
5114+        self._writevs.append(tuple([offset, data]))
5115+
5116+
5117+    def put_encprivkey(self, encprivkey):
5118+        """
5119+        I queue a write vector for the encrypted private key provided to
5120+        me.
5121+        """
5122+        assert self._offsets
5123+        assert self._offsets['enc_privkey']
5124+        # You shouldn't re-write the encprivkey after the block hash
5125+        # tree is written, since that could cause the private key to run
5126+        # into the block hash tree. Before it writes the block hash
5127+        # tree, the block hash tree writing method writes the offset of
5128+        # the share hash chain. So that's a good indicator of whether or
5129+        # not the block hash tree has been written.
5130+        if "share_hash_chain" in self._offsets:
5131+            raise LayoutInvalid("You must write this before the block hash tree")
5132+
5133+        self._offsets['block_hash_tree'] = self._offsets['enc_privkey'] + \
5134+            len(encprivkey)
5135+        self._writevs.append(tuple([self._offsets['enc_privkey'], encprivkey]))
5136+
5137+
5138+    def put_blockhashes(self, blockhashes):
5139+        """
5140+        I queue a write vector to put the block hash tree in blockhashes
5141+        onto the remote server.
5142+
5143+        The encrypted private key must be queued before the block hash
5144+        tree, since we need to know how large it is to know where the
5145+        block hash tree should go. The block hash tree must be put
5146+        before the share hash chain, since its size determines the
5147+        offset of the share hash chain.
5148+        """
5149+        assert self._offsets
5150+        assert isinstance(blockhashes, list)
5151+        if "block_hash_tree" not in self._offsets:
5152+            raise LayoutInvalid("You must put the encrypted private key "
5153+                                "before you put the block hash tree")
5154+        # If written, the share hash chain causes the signature offset
5155+        # to be defined.
5156+        if "signature" in self._offsets:
5157+            raise LayoutInvalid("You must put the block hash tree before "
5158+                                "you put the share hash chain")
5159+        blockhashes_s = "".join(blockhashes)
5160+        self._offsets['share_hash_chain'] = self._offsets['block_hash_tree'] + len(blockhashes_s)
5161+
5162+        self._writevs.append(tuple([self._offsets['block_hash_tree'],
5163+                                  blockhashes_s]))
5164+
5165+
5166+    def put_sharehashes(self, sharehashes):
5167+        """
5168+        I queue a write vector to put the share hash chain in my
5169+        argument onto the remote server.
5170+
5171+        The block hash tree must be queued before the share hash chain,
5172+        since we need to know where the block hash tree ends before we
5173+        can know where the share hash chain starts. The share hash chain
5174+        must be put before the signature, since the length of the packed
5175+        share hash chain determines the offset of the signature. Also,
5176+        semantically, you must know what the root of the share hash tree
5177+        is before you can generate a valid signature.
5178+        """
5179+        assert isinstance(sharehashes, dict)
5180+        if "share_hash_chain" not in self._offsets:
5181+            raise LayoutInvalid("You need to put the salt hash tree before "
5182+                                "you can put the share hash chain")
5183+        # The signature comes after the share hash chain. If the
5184+        # signature has already been written, we must not write another
5185+        # share hash chain. The signature writes the verification key
5186+        # offset when it gets sent to the remote server, so we look for
5187+        # that.
5188+        if "verification_key" in self._offsets:
5189+            raise LayoutInvalid("You must write the share hash chain "
5190+                                "before you write the signature")
5191+        sharehashes_s = "".join([struct.pack(">H32s", i, sharehashes[i])
5192+                                  for i in sorted(sharehashes.keys())])
5193+        self._offsets['signature'] = self._offsets['share_hash_chain'] + len(sharehashes_s)
5194+        self._writevs.append(tuple([self._offsets['share_hash_chain'],
5195+                            sharehashes_s]))
5196+
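+    # Sketch of the packed chain format (mirroring the struct.pack call
+    # above): each entry is a 2-byte big-endian share number followed
+    # by a 32-byte hash, so a chain entry {3: h} serializes to
+    # struct.pack(">H32s", 3, h) and costs 34 bytes.
+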
5197+
5198+    def put_root_hash(self, roothash):
5199+        """
5200+        Put the root hash (the root of the share hash tree) in the
5201+        remote slot.
5202+        """
5203+        # It does not make sense to be able to put the root
5204+        # hash without first putting the share hashes, since you need
5205+        # the share hashes to generate the root hash.
5206+        #
5207+        # Signature is defined by the routine that places the share hash
5208+        # chain, so it's a good thing to look for in finding out whether
5209+        # or not the share hash chain exists on the remote server.
5210+        if "signature" not in self._offsets:
5211+            raise LayoutInvalid("You need to put the share hash chain "
5212+                                "before you can put the root share hash")
5213+        if len(roothash) != HASH_SIZE:
5214+            raise LayoutInvalid("hashes and salts must be exactly %d bytes"
5215+                                 % HASH_SIZE)
5216+        self._root_hash = roothash
5217+        # To write this value, we update the checkstring on the remote
5218+        # server, which includes it.
5219+        checkstring = self.get_checkstring()
5220+        self._writevs.append(tuple([0, checkstring]))
5221+        # This write, if successful, changes the checkstring, so we need
5222+        # to update our internal checkstring to be consistent with the
5223+        # one on the server.
5224+
5225+
5226+    def get_signable(self):
5227+        """
5228+        Get the first seven fields of the mutable file header -- the
5229+        parts that are signed.
5230+        """
5231+        if not self._root_hash:
5232+            raise LayoutInvalid("You need to set the root hash "
5233+                                "before getting something to "
5234+                                "sign")
5235+        return struct.pack(MDMFSIGNABLEHEADER,
5236+                           1,
5237+                           self._seqnum,
5238+                           self._root_hash,
5239+                           self._required_shares,
5240+                           self._total_shares,
5241+                           self._segment_size,
5242+                           self._data_length)
5243+
5244+
5245+    def put_signature(self, signature):
5246+        """
5247+        I queue a write vector for the signature of the MDMF share.
5248+
5249+        I require that the root hash and share hash chain have been put
5250+        to the grid before I will write the signature to the grid.
5251+        """
5252+        if "signature" not in self._offsets:
5253+            raise LayoutInvalid("You must put the share hash chain "
5254+        # It does not make sense to put a signature without first
5255+        # putting the root hash and the salt hash (since otherwise
5256+        # the signature would be incomplete), so we don't allow that.
5257+                       "before putting the signature")
5258+        if not self._root_hash:
5259+            raise LayoutInvalid("You must complete the signed prefix "
5260+                                "before computing a signature")
5261+        # If we put the signature after we put the verification key, we
5262+        # could end up running into the verification key, and will
5263+        # probably screw up the offsets as well. So we don't allow that.
5264+        # The method that writes the verification key defines the EOF
5265+        # offset before writing the verification key, so look for that.
5266+        if "EOF" in self._offsets:
5267+            raise LayoutInvalid("You must write the signature before the verification key")
5268+
5269+        self._offsets['verification_key'] = self._offsets['signature'] + len(signature)
5270+        self._writevs.append(tuple([self._offsets['signature'], signature]))
5271+
5272+
5273+    def put_verification_key(self, verification_key):
5274+        """
5275+        I queue a write vector for the verification key.
5276+
5277+        I require that the signature have been written to the storage
5278+        server before I allow the verification key to be written to the
5279+        remote server.
5280+        """
5281+        if "verification_key" not in self._offsets:
5282+            raise LayoutInvalid("You must put the signature before you "
5283+                                "can put the verification key")
5284+        self._offsets['EOF'] = self._offsets['verification_key'] + len(verification_key)
5285+        self._writevs.append(tuple([self._offsets['verification_key'],
5286+                            verification_key]))
5287+
5288+
5289+    def _get_offsets_tuple(self):
5290+        return tuple([(key, value) for key, value in self._offsets.items()])
5291+
5292+
5293+    def get_verinfo(self):
5294+        return (self._seqnum,
5295+                self._root_hash,
5296+                self._required_shares,
5297+                self._total_shares,
5298+                self._segment_size,
5299+                self._data_length,
5300+                self.get_signable(),
5301+                self._get_offsets_tuple())
5302+
5303+
5304+    def finish_publishing(self):
5305+        """
5306+        I add write vectors for the offsets table and the encoding
5307+        parameters, and then cause all of the write vectors that I've
5308+        accumulated so far to be published to the remote server,
+        ending the write process.
5309+        """
5310+        if "EOF" not in self._offsets:
5311+            raise LayoutInvalid("You must put the verification key before "
5312+                                "you can publish the offsets")
5313+        offsets_offset = struct.calcsize(MDMFHEADERWITHOUTOFFSETS)
5314+        offsets = struct.pack(MDMFOFFSETS,
5315+                              self._offsets['enc_privkey'],
5316+                              self._offsets['block_hash_tree'],
5317+                              self._offsets['share_hash_chain'],
5318+                              self._offsets['signature'],
5319+                              self._offsets['verification_key'],
5320+                              self._offsets['EOF'])
5321+        self._writevs.append(tuple([offsets_offset, offsets]))
5322+        encoding_parameters_offset = struct.calcsize(MDMFCHECKSTRING)
5323+        params = struct.pack(">BBQQ",
5324+                             self._required_shares,
5325+                             self._total_shares,
5326+                             self._segment_size,
5327+                             self._data_length)
5328+        self._writevs.append(tuple([encoding_parameters_offset, params]))
5329+        return self._write(self._writevs)
5330+
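+    # For reference, the share layout implied by the offsets queued
+    # above is (a sketch of the write order, not authoritative):
+    #
+    #   header (checkstring + encoding parameters + offsets table)
+    #   per-segment salt + block pairs
+    #   encrypted private key
+    #   block hash tree
+    #   share hash chain
+    #   signature
+    #   verification key
+    #   EOF
+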
5331+
5332+    def _write(self, datavs, on_failure=None, on_success=None):
5333+        """I write the data vectors in datavs to the remote slot."""
5334+        tw_vectors = {}
5335+        if not self._testvs:
5336+            self._testvs = []
5337+            self._testvs.append(tuple([0, 1, "eq", ""]))
5338+        if not self._written:
5339+            # Write a new checkstring to the share when we write it, so
5340+            # that we have something to check later.
5341+            new_checkstring = self.get_checkstring()
5342+            datavs.append((0, new_checkstring))
5343+            def _first_write():
5344+                self._written = True
5345+                self._testvs = [(0, len(new_checkstring), "eq", new_checkstring)]
5346+            on_success = _first_write
5347+        tw_vectors[self.shnum] = (self._testvs, datavs, None)
5348+        d = self._rref.callRemote("slot_testv_and_readv_and_writev",
5349+                                  self._storage_index,
5350+                                  self._secrets,
5351+                                  tw_vectors,
5352+                                  self._readv)
5353+        def _result(results):
5354+            if isinstance(results, failure.Failure) or not results[0]:
5355+                # Do nothing; the write was unsuccessful.
5356+                if on_failure: on_failure()
5357+            else:
5358+                if on_success: on_success()
5359+            return results
5360+        d.addCallback(_result)
5361+        return d
5362+
5363+
5364+class MDMFSlotReadProxy:
5365+    """
5366+    I read from a mutable slot filled with data written in the MDMF data
5367+    format (which is described above).
5368+
5369+    I can be initialized with some amount of data, which I will use (if
5370+    it is valid) to eliminate some of the need to fetch it from servers.
5371+    """
5372+    def __init__(self,
5373+                 rref,
5374+                 storage_index,
5375+                 shnum,
5376+                 data=""):
5377+        # Start the initialization process.
5378+        self._rref = rref
5379+        self._storage_index = storage_index
5380+        self.shnum = shnum
5381+
5382+        # Before doing anything, the reader is probably going to want to
5383+        # verify that the signature is correct. To do that, they'll need
5384+        # the verification key, and the signature. To get those, we'll
5385+        # need the offset table. So fetch the offset table on the
5386+        # assumption that that will be the first thing that a reader is
5387+        # going to do.
5388+
5389+        # The fact that these encoding parameters are None tells us
5390+        # that we haven't yet fetched them from the remote share, so we
5391+        # should. We could just not set them, but the checks will be
5392+        # easier to read if we don't have to use hasattr.
5393+        self._version_number = None
5394+        self._sequence_number = None
5395+        self._root_hash = None
5396+        # Filled in if we're dealing with an SDMF file. Unused
5397+        # otherwise.
5398+        self._salt = None
5399+        self._required_shares = None
5400+        self._total_shares = None
5401+        self._segment_size = None
5402+        self._data_length = None
5403+        self._offsets = None
5404+
5405+        # If the user has chosen to initialize us with some data, we'll
5406+        # try to satisfy subsequent data requests with that data before
5407+        # asking the storage server for it. If a request can't be
+        # satisfied from that data, we'll fall back to the server.
5408+        self._data = data
5409+        # The way callers interact with the cache in the filenode returns
5410+        # None if there isn't any cached data, but the way we index the
5411+        # cached data requires a string, so convert None to "".
5412+        if self._data is None:
5413+            self._data = ""
5414+
5415+        self._queue_observers = observer.ObserverList()
5416+        self._queue_errbacks = observer.ObserverList()
5417+        self._readvs = []
5418+
5419+
5420+    def _maybe_fetch_offsets_and_header(self, force_remote=False):
5421+        """
5422+        I fetch the offset table and the header from the remote slot if
5423+        I don't already have them. If I do have them, I do nothing and
5424+        return a Deferred that has already fired.
5425+        """
5426+        if self._offsets:
5427+            return defer.succeed(None)
5428+        # At this point, we may be either SDMF or MDMF. Fetching 107
5429+        # bytes will be enough to get header and offsets for both SDMF and
5430+        # MDMF, though we'll be left with 4 more bytes than we
5431+        # need if this ends up being MDMF. This is probably less
5432+        # expensive than the cost of a second roundtrip.
5433+        readvs = [(0, 107)]
5434+        d = self._read(readvs, force_remote)
5435+        d.addCallback(self._process_encoding_parameters)
5436+        d.addCallback(self._process_offsets)
5437+        return d
5438+
5439+
5440+    def _process_encoding_parameters(self, encoding_parameters):
5441+        assert self.shnum in encoding_parameters
5442+        encoding_parameters = encoding_parameters[self.shnum][0]
5443+        # The first byte is the version number. It will tell us what
5444+        # to do next.
5445+        (verno,) = struct.unpack(">B", encoding_parameters[:1])
5446+        if verno == MDMF_VERSION:
5447+            read_size = MDMFHEADERWITHOUTOFFSETSSIZE
5448+            (verno,
5449+             seqnum,
5450+             root_hash,
5451+             k,
5452+             n,
5453+             segsize,
5454+             datalen) = struct.unpack(MDMFHEADERWITHOUTOFFSETS,
5455+                                      encoding_parameters[:read_size])
5456+            if segsize == 0 and datalen == 0:
5457+                # Empty file, no segments.
5458+                self._num_segments = 0
5459+            else:
5460+                self._num_segments = mathutil.div_ceil(datalen, segsize)
5461+
5462+        elif verno == SDMF_VERSION:
5463+            read_size = SIGNED_PREFIX_LENGTH
5464+            (verno,
5465+             seqnum,
5466+             root_hash,
5467+             salt,
5468+             k,
5469+             n,
5470+             segsize,
5471+             datalen) = struct.unpack(">BQ32s16s BBQQ",
5472+                                encoding_parameters[:SIGNED_PREFIX_LENGTH])
5473+            self._salt = salt
5474+            if segsize == 0 and datalen == 0:
5475+                # empty file
5476+                self._num_segments = 0
5477+            else:
5478+                # non-empty SDMF files have one segment.
5479+                self._num_segments = 1
5480+        else:
5481+            raise UnknownVersionError("You asked me to read mutable file "
5482+                                      "version %d, but I only understand "
5483+                                      "%d and %d" % (verno, SDMF_VERSION,
5484+                                                     MDMF_VERSION))
5485+
5486+        self._version_number = verno
5487+        self._sequence_number = seqnum
5488+        self._root_hash = root_hash
5489+        self._required_shares = k
5490+        self._total_shares = n
5491+        self._segment_size = segsize
5492+        self._data_length = datalen
5493+
5494+        self._block_size = self._segment_size / self._required_shares
5495+        # We can upload empty files, and need to account for this fact
5496+        # so as to avoid zero-division and zero-modulo errors.
5497+        if datalen > 0:
5498+            tail_size = self._data_length % self._segment_size
5499+        else:
5500+            tail_size = 0
5501+        if not tail_size:
5502+            self._tail_block_size = self._block_size
5503+        else:
5504+            self._tail_block_size = mathutil.next_multiple(tail_size,
5505+                                                    self._required_shares)
5506+            self._tail_block_size /= self._required_shares
5507+
5508+        return encoding_parameters
5509+
5510+
5511+    def _process_offsets(self, offsets):
5512+        if self._version_number == 0:
5513+            read_size = OFFSETS_LENGTH
5514+            read_offset = SIGNED_PREFIX_LENGTH
5515+            end = read_size + read_offset
5516+            (signature,
5517+             share_hash_chain,
5518+             block_hash_tree,
5519+             share_data,
5520+             enc_privkey,
5521+             EOF) = struct.unpack(">LLLLQQ",
5522+                                  offsets[read_offset:end])
5523+            self._offsets = {}
5524+            self._offsets['signature'] = signature
5525+            self._offsets['share_data'] = share_data
5526+            self._offsets['block_hash_tree'] = block_hash_tree
5527+            self._offsets['share_hash_chain'] = share_hash_chain
5528+            self._offsets['enc_privkey'] = enc_privkey
5529+            self._offsets['EOF'] = EOF
5530+
5531+        elif self._version_number == 1:
5532+            read_offset = MDMFHEADERWITHOUTOFFSETSSIZE
5533+            read_length = MDMFOFFSETS_LENGTH
5534+            end = read_offset + read_length
5535+            (encprivkey,
5536+             blockhashes,
5537+             sharehashes,
5538+             signature,
5539+             verification_key,
5540+             eof) = struct.unpack(MDMFOFFSETS,
5541+                                  offsets[read_offset:end])
5542+            self._offsets = {}
5543+            self._offsets['enc_privkey'] = encprivkey
5544+            self._offsets['block_hash_tree'] = blockhashes
5545+            self._offsets['share_hash_chain'] = sharehashes
5546+            self._offsets['signature'] = signature
5547+            self._offsets['verification_key'] = verification_key
5548+            self._offsets['EOF'] = eof
5549+
5550+
5551+    def get_block_and_salt(self, segnum, queue=False):
5552+        """
5553+        I return (block, salt), where block is the block data and
5554+        salt is the salt used to encrypt that segment.
5555+        """
5556+        d = self._maybe_fetch_offsets_and_header()
5557+        def _then(ignored):
5558+            if self._version_number == 1:
5559+                base_share_offset = MDMFHEADERSIZE
5560+            else:
5561+                base_share_offset = self._offsets['share_data']
5562+
5563+            if segnum + 1 > self._num_segments:
5564+                raise LayoutInvalid("Not a valid segment number")
5565+
5566+            if self._version_number == 0:
5567+                share_offset = base_share_offset + self._block_size * segnum
5568+            else:
5569+                share_offset = base_share_offset + (self._block_size + \
5570+                                                    SALT_SIZE) * segnum
5571+            if segnum + 1 == self._num_segments:
5572+                data = self._tail_block_size
5573+            else:
5574+                data = self._block_size
5575+
5576+            if self._version_number == 1:
5577+                data += SALT_SIZE
5578+
5579+            readvs = [(share_offset, data)]
5580+            return readvs
5581+        d.addCallback(_then)
5582+        d.addCallback(lambda readvs:
5583+            self._read(readvs, queue=queue))
5584+        def _process_results(results):
5585+            assert self.shnum in results
5586+            if self._version_number == 0:
5587+                # We only read the share data, but we know the salt from
5588+                # when we fetched the header
5589+                data = results[self.shnum]
5590+                if not data:
5591+                    data = ""
5592+                else:
5593+                    assert len(data) == 1
5594+                    data = data[0]
5595+                salt = self._salt
5596+            else:
5597+                data = results[self.shnum]
5598+                if not data:
5599+                    salt = data = ""
5600+                else:
5601+                    salt_and_data = results[self.shnum][0]
5602+                    salt = salt_and_data[:SALT_SIZE]
5603+                    data = salt_and_data[SALT_SIZE:]
5604+            return data, salt
5605+        d.addCallback(_process_results)
5606+        return d
5607+
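+    # Worked example (a sketch, reusing the write-side parameters:
+    # block_size = 2, 16-byte salts): for an MDMF share, segment 2 is
+    # read from offset MDMFHEADERSIZE + 18 * 2 with length 18, and the
+    # first 16 bytes of the result are the salt.
+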
5608+
5609+    def get_blockhashes(self, needed=None, queue=False, force_remote=False):
5610+        """
5611+        I return the block hash tree
5612+
5613+        I take an optional argument, needed, which is a set of indices
5614+        corresponding to hashes that I should fetch. If this argument is
5615+        missing, I will fetch the entire block hash tree; otherwise, I
5616+        may attempt to fetch fewer hashes, based on what needed says
5617+        that I should do. Note that I may fetch as many hashes as I
5618+        want, so long as the set of hashes that I do fetch is a superset
5619+        of the ones that I am asked for, so callers should be prepared
5620+        to tolerate additional hashes.
5621+        """
5622+        # TODO: Return only the parts of the block hash tree necessary
5623+        # to validate the blocknum provided?
5624+        # This is a good idea, but it is hard to implement correctly. It
5625+        # is bad to fetch any one block hash more than once, so we
5626+        # probably just want to fetch the whole thing at once and then
5627+        # serve it.
5628+        if needed == set([]):
5629+            return defer.succeed([])
5630+        d = self._maybe_fetch_offsets_and_header()
5631+        def _then(ignored):
5632+            blockhashes_offset = self._offsets['block_hash_tree']
5633+            if self._version_number == 1:
5634+                blockhashes_length = self._offsets['share_hash_chain'] - blockhashes_offset
5635+            else:
5636+                blockhashes_length = self._offsets['share_data'] - blockhashes_offset
5637+            readvs = [(blockhashes_offset, blockhashes_length)]
5638+            return readvs
5639+        d.addCallback(_then)
5640+        d.addCallback(lambda readvs:
5641+            self._read(readvs, queue=queue, force_remote=force_remote))
5642+        def _build_block_hash_tree(results):
5643+            assert self.shnum in results
5644+
5645+            rawhashes = results[self.shnum][0]
5646+            results = [rawhashes[i:i+HASH_SIZE]
5647+                       for i in range(0, len(rawhashes), HASH_SIZE)]
5648+            return results
5649+        d.addCallback(_build_block_hash_tree)
5650+        return d
5651+
5652+
5653+    def get_sharehashes(self, needed=None, queue=False, force_remote=False):
5654+        """
5655+        I return the part of the share hash chain that was stored to
5656+        validate this share.
5657+
5658+        I take an optional argument, needed. Needed is a set of indices
5659+        that correspond to the hashes that I should fetch. If needed is
5660+        not present, I will fetch and return the entire share hash
5661+        chain. Otherwise, I may fetch and return any part of the share
5662+        hash chain that is a superset of the part that I am asked to
5663+        fetch. Callers should be prepared to deal with more hashes than
5664+        they've asked for.
5665+        """
5666+        if needed == set([]):
5667+            return defer.succeed([])
5668+        d = self._maybe_fetch_offsets_and_header()
5669+
5670+        def _make_readvs(ignored):
5671+            sharehashes_offset = self._offsets['share_hash_chain']
5672+            if self._version_number == 0:
5673+                sharehashes_length = self._offsets['block_hash_tree'] - sharehashes_offset
5674+            else:
5675+                sharehashes_length = self._offsets['signature'] - sharehashes_offset
5676+            readvs = [(sharehashes_offset, sharehashes_length)]
5677+            return readvs
5678+        d.addCallback(_make_readvs)
5679+        d.addCallback(lambda readvs:
5680+            self._read(readvs, queue=queue, force_remote=force_remote))
5681+        def _build_share_hash_chain(results):
5682+            assert self.shnum in results
5683+
5684+            sharehashes = results[self.shnum][0]
5685+            results = [sharehashes[i:i+(HASH_SIZE + 2)]
5686+                       for i in range(0, len(sharehashes), HASH_SIZE + 2)]
5687+            results = dict([struct.unpack(">H32s", data)
5688+                            for data in results])
5689+            return results
5690+        d.addCallback(_build_share_hash_chain)
5691+        return d
5692+
5693+
5694+    def get_encprivkey(self, queue=False):
5695+        """
5696+        I return the encrypted private key.
5697+        """
5698+        d = self._maybe_fetch_offsets_and_header()
5699+
5700+        def _make_readvs(ignored):
5701+            privkey_offset = self._offsets['enc_privkey']
5702+            if self._version_number == 0:
5703+                privkey_length = self._offsets['EOF'] - privkey_offset
5704+            else:
5705+                privkey_length = self._offsets['block_hash_tree'] - privkey_offset
5706+            readvs = [(privkey_offset, privkey_length)]
5707+            return readvs
5708+        d.addCallback(_make_readvs)
5709+        d.addCallback(lambda readvs:
5710+            self._read(readvs, queue=queue))
5711+        def _process_results(results):
5712+            assert self.shnum in results
5713+            privkey = results[self.shnum][0]
5714+            return privkey
5715+        d.addCallback(_process_results)
5716+        return d
5717+
5718+
5719+    def get_signature(self, queue=False):
5720+        """
5721+        I return the signature of my share.
5722+        """
5723+        d = self._maybe_fetch_offsets_and_header()
5724+
5725+        def _make_readvs(ignored):
5726+            signature_offset = self._offsets['signature']
5727+            if self._version_number == 1:
5728+                signature_length = self._offsets['verification_key'] - signature_offset
5729+            else:
5730+                signature_length = self._offsets['share_hash_chain'] - signature_offset
5731+            readvs = [(signature_offset, signature_length)]
5732+            return readvs
5733+        d.addCallback(_make_readvs)
5734+        d.addCallback(lambda readvs:
5735+            self._read(readvs, queue=queue))
5736+        def _process_results(results):
5737+            assert self.shnum in results
5738+            signature = results[self.shnum][0]
5739+            return signature
5740+        d.addCallback(_process_results)
5741+        return d
5742+
5743+
5744+    def get_verification_key(self, queue=False):
5745+        """
5746+        I return the verification key.
5747+        """
5748+        d = self._maybe_fetch_offsets_and_header()
5749+
5750+        def _make_readvs(ignored):
5751+            if self._version_number == 1:
5752+                vk_offset = self._offsets['verification_key']
5753+                vk_length = self._offsets['EOF'] - vk_offset
5754+            else:
5755+                vk_offset = struct.calcsize(">BQ32s16sBBQQLLLLQQ")
5756+                vk_length = self._offsets['signature'] - vk_offset
5757+            readvs = [(vk_offset, vk_length)]
5758+            return readvs
5759+        d.addCallback(_make_readvs)
5760+        d.addCallback(lambda readvs:
5761+            self._read(readvs, queue=queue))
5762+        def _process_results(results):
5763+            assert self.shnum in results
5764+            verification_key = results[self.shnum][0]
5765+            return verification_key
5766+        d.addCallback(_process_results)
5767+        return d
5768+
5769+
5770+    def get_encoding_parameters(self):
5771+        """
5772+        I return (k, n, segsize, datalen)
5773+        """
5774+        d = self._maybe_fetch_offsets_and_header()
5775+        d.addCallback(lambda ignored:
5776+            (self._required_shares,
5777+             self._total_shares,
5778+             self._segment_size,
5779+             self._data_length))
5780+        return d
5781+
5782+
5783+    def get_seqnum(self):
5784+        """
5785+        I return the sequence number for this share.
5786+        """
5787+        d = self._maybe_fetch_offsets_and_header()
5788+        d.addCallback(lambda ignored:
5789+            self._sequence_number)
5790+        return d
5791+
5792+
5793+    def get_root_hash(self):
5794+        """
5795+        I return the root of the share hash tree
5796+        """
5797+        d = self._maybe_fetch_offsets_and_header()
5798+        d.addCallback(lambda ignored: self._root_hash)
5799+        return d
5800+
5801+
5802+    def get_checkstring(self):
5803+        """
5804+        I return the packed representation of the following:
5805+
5806+            - version number
5807+            - sequence number
5808+            - root hash
5809+            - salt (SDMF only)
5810+
5811+        which my users use as a checkstring to detect other writers.
5812+        """
5813+        d = self._maybe_fetch_offsets_and_header()
5814+        def _build_checkstring(ignored):
5815+            if self._salt:
5816+                checkstring = struct.pack(PREFIX,
5817+                                          self._version_number,
5818+                                          self._sequence_number,
5819+                                          self._root_hash,
5820+                                          self._salt)
5821+            else:
5822+                checkstring = struct.pack(MDMFCHECKSTRING,
5823+                                          self._version_number,
5824+                                          self._sequence_number,
5825+                                          self._root_hash)
5826+
5827+            return checkstring
5828+        d.addCallback(_build_checkstring)
5829+        return d
5830+
5831+
5832+    def get_prefix(self, force_remote):
5833+        d = self._maybe_fetch_offsets_and_header(force_remote)
5834+        d.addCallback(lambda ignored:
5835+            self._build_prefix())
5836+        return d
5837+
5838+
5839+    def _build_prefix(self):
5840+        # The prefix is another name for the part of the remote share
5841+        # that gets signed. It consists of everything up to and
5842+        # including the datalength, packed by struct.
5843+        if self._version_number == SDMF_VERSION:
5844+            return struct.pack(SIGNED_PREFIX,
5845+                           self._version_number,
5846+                           self._sequence_number,
5847+                           self._root_hash,
5848+                           self._salt,
5849+                           self._required_shares,
5850+                           self._total_shares,
5851+                           self._segment_size,
5852+                           self._data_length)
5853+
5854+        else:
5855+            return struct.pack(MDMFSIGNABLEHEADER,
5856+                           self._version_number,
5857+                           self._sequence_number,
5858+                           self._root_hash,
5859+                           self._required_shares,
5860+                           self._total_shares,
5861+                           self._segment_size,
5862+                           self._data_length)
5863+
5864+
5865+    def _get_offsets_tuple(self):
5866+        # The offsets tuple is another component of the version
5867+        # information tuple. It is basically our offsets dictionary,
5868+        # itemized and in a tuple.
5869+        return tuple([(key, value) for key, value in self._offsets.items()])
5870+
5871+
5872+    def get_verinfo(self):
5873+        """
5874+        I return my verinfo tuple. This is used by the ServermapUpdater
5875+        to keep track of versions of mutable files.
5876+
5877+        The verinfo tuple for MDMF files contains:
5878+            - seqnum
5879+            - root hash
5880+            - a blank (nothing)
5881+            - segsize
5882+            - datalen
5883+            - k
5884+            - n
5885+            - prefix (the thing that you sign)
5886+            - a tuple of offsets
5887+
5888+        We include the blank salt slot in MDMF verinfo tuples so that
5889+        they have the same shape as SDMF verinfo tuples.
5890+
5891+        The verinfo tuple for SDMF files is the same, but contains the
5892+        file's 16-byte IV in place of the blank.
5893+        """
5894+        d = self._maybe_fetch_offsets_and_header()
5895+        def _build_verinfo(ignored):
5896+            if self._version_number == SDMF_VERSION:
5897+                salt_to_use = self._salt
5898+            else:
5899+                salt_to_use = None
5900+            return (self._sequence_number,
5901+                    self._root_hash,
5902+                    salt_to_use,
5903+                    self._segment_size,
5904+                    self._data_length,
5905+                    self._required_shares,
5906+                    self._total_shares,
5907+                    self._build_prefix(),
5908+                    self._get_offsets_tuple())
5909+        d.addCallback(_build_verinfo)
5910+        return d
5911+
5912+
5913+    def flush(self):
5914+        """
5915+        I flush my queue of read vectors.
5916+        """
5917+        d = self._read(self._readvs)
5918+        def _then(results):
5919+            self._readvs = []
5920+            if isinstance(results, failure.Failure):
5921+                self._queue_errbacks.notify(results)
5922+            else:
5923+                self._queue_observers.notify(results)
5924+            self._queue_observers = observer.ObserverList()
5925+            self._queue_errbacks = observer.ObserverList()
5926+        d.addBoth(_then)
5927+
5928+
5929+    def _read(self, readvs, force_remote=False, queue=False):
5930+        unsatisfiable = filter(lambda x: x[0] + x[1] > len(self._data), readvs)
5931+        # TODO: It's entirely possible to tweak this so that it just
5932+        # fulfills the requests that it can, and not demand that all
5933+        # requests are satisfiable before running it.
5934+        if not unsatisfiable and not force_remote:
5935+            results = [self._data[offset:offset+length]
5936+                       for (offset, length) in readvs]
5937+            results = {self.shnum: results}
5938+            return defer.succeed(results)
5939+        else:
5940+            if queue:
5941+                start = len(self._readvs)
5942+                self._readvs += readvs
5943+                end = len(self._readvs)
5944+                def _get_results(results, start, end):
5945+                    if self.shnum not in results:
5946+                        return {self.shnum: [""]}
5947+                    return {self.shnum: results[self.shnum][start:end]}
5948+                d = defer.Deferred()
5949+                d.addCallback(_get_results, start, end)
5950+                self._queue_observers.subscribe(d.callback)
5951+                self._queue_errbacks.subscribe(d.errback)
5952+                return d
5953+            return self._rref.callRemote("slot_readv",
5954+                                         self._storage_index,
5955+                                         [self.shnum],
5956+                                         readvs)
5957+
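+    # Usage sketch (hypothetical caller): queued reads don't touch the
+    # wire until flush() runs, so several fetches can share a single
+    # slot_readv round trip:
+    #
+    #   d1 = mr.get_signature(queue=True)
+    #   d2 = mr.get_verification_key(queue=True)
+    #   mr.flush()   # one remote read satisfies both d1 and d2
+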
5958+
5959+    def is_sdmf(self):
5960+        """I tell my caller whether or not my remote file is SDMF or MDMF
5961+        """
5962+        d = self._maybe_fetch_offsets_and_header()
5963+        d.addCallback(lambda ignored:
5964+            self._version_number == 0)
5965+        return d
5966+
5967+
5968+class LayoutInvalid(Exception):
5969+    """
5970+    This isn't a valid MDMF mutable file
5971+    """
5972merger 0.0 (
5973hunk ./src/allmydata/test/test_storage.py 3
5974-from allmydata.util import log
5975-
5976merger 0.0 (
5977hunk ./src/allmydata/test/test_storage.py 3
5978-import time, os.path, stat, re, simplejson, struct
5979+from allmydata.util import log
5980+
5981+import mock
5982hunk ./src/allmydata/test/test_storage.py 3
5983-import time, os.path, stat, re, simplejson, struct
5984+import time, os.path, stat, re, simplejson, struct, shutil
5985)
5986)
5987hunk ./src/allmydata/test/test_storage.py 23
5988 from allmydata.storage.expirer import LeaseCheckingCrawler
5989 from allmydata.immutable.layout import WriteBucketProxy, WriteBucketProxy_v2, \
5990      ReadBucketProxy
5991-from allmydata.interfaces import BadWriteEnablerError
5992-from allmydata.test.common import LoggingServiceParent
5993+from allmydata.mutable.layout import MDMFSlotWriteProxy, MDMFSlotReadProxy, \
5994+                                     LayoutInvalid, MDMFSIGNABLEHEADER, \
5995+                                     SIGNED_PREFIX, MDMFHEADER, \
5996+                                     MDMFOFFSETS, SDMFSlotWriteProxy
5997+from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \
5998+                                 SDMF_VERSION
5999+from allmydata.test.common import LoggingServiceParent, ShouldFailMixin
6000 from allmydata.test.common_web import WebRenderingMixin
6001 from allmydata.web.storage import StorageStatus, remove_prefix
6002 
6003hunk ./src/allmydata/test/test_storage.py 107
6004 
6005 class RemoteBucket:
6006 
6007+    def __init__(self):
6008+        self.read_count = 0
6009+        self.write_count = 0
6010+
6011     def callRemote(self, methname, *args, **kwargs):
6012         def _call():
6013             meth = getattr(self.target, "remote_" + methname)
6014hunk ./src/allmydata/test/test_storage.py 115
6015             return meth(*args, **kwargs)
6016+
6017+        if methname == "slot_readv":
6018+            self.read_count += 1
6019+        if "writev" in methname:
6020+            self.write_count += 1
6021+
6022         return defer.maybeDeferred(_call)
6023 
6024hunk ./src/allmydata/test/test_storage.py 123
6025+
6026 class BucketProxy(unittest.TestCase):
6027     def make_bucket(self, name, size):
6028         basedir = os.path.join("storage", "BucketProxy", name)
6029hunk ./src/allmydata/test/test_storage.py 1306
6030         self.failUnless(os.path.exists(prefixdir), prefixdir)
6031         self.failIf(os.path.exists(bucketdir), bucketdir)
6032 
6033+
6034+class MDMFProxies(unittest.TestCase, ShouldFailMixin):
6035+    def setUp(self):
6036+        self.sparent = LoggingServiceParent()
6037+        self._lease_secret = itertools.count()
6038+        self.ss = self.create("MDMFProxies storage test server")
6039+        self.rref = RemoteBucket()
6040+        self.rref.target = self.ss
6041+        self.secrets = (self.write_enabler("we_secret"),
6042+                        self.renew_secret("renew_secret"),
6043+                        self.cancel_secret("cancel_secret"))
6044+        self.segment = "aaaaaa"
6045+        self.block = "aa"
6046+        self.salt = "a" * 16
6047+        self.block_hash = "a" * 32
6048+        self.block_hash_tree = [self.block_hash for i in xrange(6)]
6049+        self.share_hash = self.block_hash
6050+        self.share_hash_chain = dict([(i, self.share_hash) for i in xrange(6)])
6051+        self.signature = "foobarbaz"
6052+        self.verification_key = "vvvvvv"
6053+        self.encprivkey = "private"
6054+        self.root_hash = self.block_hash
6055+        self.salt_hash = self.root_hash
6056+        self.salt_hash_tree = [self.salt_hash for i in xrange(6)]
6057+        self.block_hash_tree_s = self.serialize_blockhashes(self.block_hash_tree)
6058+        self.share_hash_chain_s = self.serialize_sharehashes(self.share_hash_chain)
6059+        # blockhashes and salt hashes are serialized in the same way,
6060+        # only we lop off the first element and store that in the
6061+        # header.
6062+        self.salt_hash_tree_s = self.serialize_blockhashes(self.salt_hash_tree[1:])
6063+
6064+
6065+    def tearDown(self):
6066+        self.sparent.stopService()
6067+        shutil.rmtree(self.workdir("MDMFProxies storage test server"))
6068+
6069+
6070+    def write_enabler(self, we_tag):
6071+        return hashutil.tagged_hash("we_blah", we_tag)
6072+
6073+
6074+    def renew_secret(self, tag):
6075+        return hashutil.tagged_hash("renew_blah", str(tag))
6076+
6077+
6078+    def cancel_secret(self, tag):
6079+        return hashutil.tagged_hash("cancel_blah", str(tag))
6080+
6081+
6082+    def workdir(self, name):
6083+        basedir = os.path.join("storage", "MutableServer", name)
6084+        return basedir
6085+
6086+
6087+    def create(self, name):
6088+        workdir = self.workdir(name)
6089+        ss = StorageServer(workdir, "\x00" * 20)
6090+        ss.setServiceParent(self.sparent)
6091+        return ss
6092+
6093+
6094+    def build_test_mdmf_share(self, tail_segment=False, empty=False):
6095+        # Start with the checkstring
6096+        data = struct.pack(">BQ32s",
6097+                           1,
6098+                           0,
6099+                           self.root_hash)
6100+        self.checkstring = data
6101+        # Next, the encoding parameters
6102+        if tail_segment:
6103+            data += struct.pack(">BBQQ",
6104+                                3,
6105+                                10,
6106+                                6,
6107+                                33)
6108+        elif empty:
6109+            data += struct.pack(">BBQQ",
6110+                                3,
6111+                                10,
6112+                                0,
6113+                                0)
6114+        else:
6115+            data += struct.pack(">BBQQ",
6116+                                3,
6117+                                10,
6118+                                6,
6119+                                36)
6120+        # Now we'll build the offsets.
6121+        sharedata = ""
6122+        if not tail_segment and not empty:
6123+            for i in xrange(6):
6124+                sharedata += self.salt + self.block
6125+        elif tail_segment:
6126+            for i in xrange(5):
6127+                sharedata += self.salt + self.block
6128+            sharedata += self.salt + "a"
6129+
6130+        # The encrypted private key comes after the shares + salts
6131+        offset_size = struct.calcsize(MDMFOFFSETS)
6132+        encrypted_private_key_offset = len(data) + offset_size + len(sharedata)
6133+        # The blockhashes come after the private key
6134+        blockhashes_offset = encrypted_private_key_offset + len(self.encprivkey)
6135+        # The sharehashes come after the block hashes
6136+        sharehashes_offset = blockhashes_offset + len(self.block_hash_tree_s)
6137+        # The signature comes after the share hash chain
6138+        signature_offset = sharehashes_offset + len(self.share_hash_chain_s)
6139+        # The verification key comes after the signature
6140+        verification_offset = signature_offset + len(self.signature)
6141+        # The EOF comes after the verification key
6142+        eof_offset = verification_offset + len(self.verification_key)
6143+        data += struct.pack(MDMFOFFSETS,
6144+                            encrypted_private_key_offset,
6145+                            blockhashes_offset,
6146+                            sharehashes_offset,
6147+                            signature_offset,
6148+                            verification_offset,
6149+                            eof_offset)
6150+        self.offsets = {}
6151+        self.offsets['enc_privkey'] = encrypted_private_key_offset
6152+        self.offsets['block_hash_tree'] = blockhashes_offset
6153+        self.offsets['share_hash_chain'] = sharehashes_offset
6154+        self.offsets['signature'] = signature_offset
6155+        self.offsets['verification_key'] = verification_offset
6156+        self.offsets['EOF'] = eof_offset
6157+        # Next, we'll add in the salts and share data,
6158+        data += sharedata
6159+        # the private key,
6160+        data += self.encprivkey
6161+        # the block hash tree,
6162+        data += self.block_hash_tree_s
6163+        # the share hash chain,
6164+        data += self.share_hash_chain_s
6165+        # the signature,
6166+        data += self.signature
6167+        # and the verification key
6168+        data += self.verification_key
6169+        return data
6170+
6171+
6172+    def write_test_share_to_server(self,
6173+                                   storage_index,
6174+                                   tail_segment=False,
6175+                                   empty=False):
6176+        """
6177+        I write some data for the read tests to read to self.ss
6178+
6179+        If tail_segment=True, then I will write a share that has a
6180+        smaller tail segment than other segments.
6181+        """
6182+        write = self.ss.remote_slot_testv_and_readv_and_writev
6183+        data = self.build_test_mdmf_share(tail_segment, empty)
6184+        # Finally, we write the whole thing to the storage server in one
6185+        # pass.
6186+        testvs = [(0, 1, "eq", "")]
6187+        tws = {}
6188+        tws[0] = (testvs, [(0, data)], None)
6189+        readv = [(0, 1)]
6190+        results = write(storage_index, self.secrets, tws, readv)
6191+        self.failUnless(results[0])
6192+
6193+
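As a reference for the test-and-set call used here, this is the shape of the arguments that write_test_share_to_server passes to remote_slot_testv_and_readv_and_writev, reconstructed from the test itself (variable names are illustrative):

    sharenum = 0
    testv  = [(0, 1, "eq", "")]   # (offset, length, operator, expected bytes):
                                  # succeeds only if the slot is still empty
    writev = [(0, data)]          # (offset, data) pairs to write on success
    new_length = None             # None means: do not truncate the share
    tw_vectors = {sharenum: (testv, writev, new_length)}
    readv = [(0, 1)]              # reads performed in the same round trip

The server returns a (wrote, read_data) pair: wrote is True only if every test vector matched, and read_data maps share numbers to the requested reads.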
6194+    def build_test_sdmf_share(self, empty=False):
6195+        if empty:
6196+            sharedata = ""
6197+        else:
6198+            sharedata = self.segment * 6
6199+        self.sharedata = sharedata
6200+        blocksize = len(sharedata) / 3
6201+        block = sharedata[:blocksize]
6202+        self.blockdata = block
6203+        prefix = struct.pack(">BQ32s16s BBQQ",
6204+                             0, # version,
6205+                             0,
6206+                             self.root_hash,
6207+                             self.salt,
6208+                             3,
6209+                             10,
6210+                             len(sharedata),
6211+                             len(sharedata),
6212+                            )
6213+        post_offset = struct.calcsize(">BQ32s16sBBQQLLLLQQ")
6214+        signature_offset = post_offset + len(self.verification_key)
6215+        sharehashes_offset = signature_offset + len(self.signature)
6216+        blockhashes_offset = sharehashes_offset + len(self.share_hash_chain_s)
6217+        sharedata_offset = blockhashes_offset + len(self.block_hash_tree_s)
6218+        encprivkey_offset = sharedata_offset + len(block)
6219+        eof_offset = encprivkey_offset + len(self.encprivkey)
6220+        offsets = struct.pack(">LLLLQQ",
6221+                              signature_offset,
6222+                              sharehashes_offset,
6223+                              blockhashes_offset,
6224+                              sharedata_offset,
6225+                              encprivkey_offset,
6226+                              eof_offset)
6227+        final_share = "".join([prefix,
6228+                           offsets,
6229+                           self.verification_key,
6230+                           self.signature,
6231+                           self.share_hash_chain_s,
6232+                           self.block_hash_tree_s,
6233+                           block,
6234+                           self.encprivkey])
6235+        self.offsets = {}
6236+        self.offsets['signature'] = signature_offset
6237+        self.offsets['share_hash_chain'] = sharehashes_offset
6238+        self.offsets['block_hash_tree'] = blockhashes_offset
6239+        self.offsets['share_data'] = sharedata_offset
6240+        self.offsets['enc_privkey'] = encprivkey_offset
6241+        self.offsets['EOF'] = eof_offset
6242+        return final_share
6243+
6244+
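A matching sketch for the SDMF layout built above, assuming the same format strings (">BQ32s16sBBQQ" for the signed prefix, ">LLLLQQ" for the offset table); unpack_sdmf_header is a hypothetical helper:

    import struct

    SDMF_PREFIX  = ">BQ32s16sBBQQ"  # version, seqnum, root hash, salt, k, N, segsize, datalen
    SDMF_OFFSETS = ">LLLLQQ"        # signature, share hashes, block hashes,
                                    # share data, enc privkey, EOF

    def unpack_sdmf_header(share):
        prefix_size = struct.calcsize(SDMF_PREFIX)
        prefix = struct.unpack(SDMF_PREFIX, share[:prefix_size])
        table_size = struct.calcsize(SDMF_OFFSETS)
        offsets = struct.unpack(SDMF_OFFSETS,
                                share[prefix_size:prefix_size + table_size])
        return prefix, offsets

Note that SDMF places the verification key first after the offset table and the share data near the end, roughly the reverse of the MDMF ordering above.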
6245+    def write_sdmf_share_to_server(self,
6246+                                   storage_index,
6247+                                   empty=False):
6248+        # Some tests need SDMF shares to verify that we can still read
6249+        # them. This method writes one that resembles, but is not
6250+        # identical to, a share written by the real SDMF publisher.
6250+        assert self.rref
6251+        write = self.ss.remote_slot_testv_and_readv_and_writev
6252+        share = self.build_test_sdmf_share(empty)
6253+        testvs = [(0, 1, "eq", "")]
6254+        tws = {}
6255+        tws[0] = (testvs, [(0, share)], None)
6256+        readv = []
6257+        results = write(storage_index, self.secrets, tws, readv)
6258+        self.failUnless(results[0])
6259+
6260+
6261+    def test_read(self):
6262+        self.write_test_share_to_server("si1")
6263+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6264+        # Check that every accessor returns what we expect it to.
6265+        d = defer.succeed(None)
6266+        def _check_block_and_salt((block, salt)):
6267+            self.failUnlessEqual(block, self.block)
6268+            self.failUnlessEqual(salt, self.salt)
6269+
6270+        for i in xrange(6):
6271+            d.addCallback(lambda ignored, i=i:
6272+                mr.get_block_and_salt(i))
6273+            d.addCallback(_check_block_and_salt)
6274+
6275+        d.addCallback(lambda ignored:
6276+            mr.get_encprivkey())
6277+        d.addCallback(lambda encprivkey:
6278+            self.failUnlessEqual(self.encprivkey, encprivkey))
6279+
6280+        d.addCallback(lambda ignored:
6281+            mr.get_blockhashes())
6282+        d.addCallback(lambda blockhashes:
6283+            self.failUnlessEqual(self.block_hash_tree, blockhashes))
6284+
6285+        d.addCallback(lambda ignored:
6286+            mr.get_sharehashes())
6287+        d.addCallback(lambda sharehashes:
6288+            self.failUnlessEqual(self.share_hash_chain, sharehashes))
6289+
6290+        d.addCallback(lambda ignored:
6291+            mr.get_signature())
6292+        d.addCallback(lambda signature:
6293+            self.failUnlessEqual(signature, self.signature))
6294+
6295+        d.addCallback(lambda ignored:
6296+            mr.get_verification_key())
6297+        d.addCallback(lambda verification_key:
6298+            self.failUnlessEqual(verification_key, self.verification_key))
6299+
6300+        d.addCallback(lambda ignored:
6301+            mr.get_seqnum())
6302+        d.addCallback(lambda seqnum:
6303+            self.failUnlessEqual(seqnum, 0))
6304+
6305+        d.addCallback(lambda ignored:
6306+            mr.get_root_hash())
6307+        d.addCallback(lambda root_hash:
6308+            self.failUnlessEqual(self.root_hash, root_hash))
6309+
6315+        d.addCallback(lambda ignored:
6316+            mr.get_encoding_parameters())
6317+        def _check_encoding_parameters((k, n, segsize, datalen)):
6318+            self.failUnlessEqual(k, 3)
6319+            self.failUnlessEqual(n, 10)
6320+            self.failUnlessEqual(segsize, 6)
6321+            self.failUnlessEqual(datalen, 36)
6322+        d.addCallback(_check_encoding_parameters)
6323+
6324+        d.addCallback(lambda ignored:
6325+            mr.get_checkstring())
6326+        d.addCallback(lambda checkstring:
6327+            self.failUnlessEqual(checkstring, self.checkstring))
6328+        return d
6329+
6330+
6331+    def test_read_with_different_tail_segment_size(self):
6332+        self.write_test_share_to_server("si1", tail_segment=True)
6333+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6334+        d = mr.get_block_and_salt(5)
6335+        def _check_tail_segment(results):
6336+            block, salt = results
6337+            self.failUnlessEqual(len(block), 1)
6338+            self.failUnlessEqual(block, "a")
6339+        d.addCallback(_check_tail_segment)
6340+        return d
6341+
6342+
6343+    def test_get_block_with_invalid_segnum(self):
6344+        self.write_test_share_to_server("si1")
6345+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6346+        d = defer.succeed(None)
6347+        d.addCallback(lambda ignored:
6348+            self.shouldFail(LayoutInvalid, "test invalid segnum",
6349+                            None,
6350+                            mr.get_block_and_salt, 7))
6351+        return d
6352+
6353+
6354+    def test_get_encoding_parameters_first(self):
6355+        self.write_test_share_to_server("si1")
6356+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6357+        d = mr.get_encoding_parameters()
6358+        def _check_encoding_parameters((k, n, segment_size, datalen)):
6359+            self.failUnlessEqual(k, 3)
6360+            self.failUnlessEqual(n, 10)
6361+            self.failUnlessEqual(segment_size, 6)
6362+            self.failUnlessEqual(datalen, 36)
6363+        d.addCallback(_check_encoding_parameters)
6364+        return d
6365+
6366+
6367+    def test_get_seqnum_first(self):
6368+        self.write_test_share_to_server("si1")
6369+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6370+        d = mr.get_seqnum()
6371+        d.addCallback(lambda seqnum:
6372+            self.failUnlessEqual(seqnum, 0))
6373+        return d
6374+
6375+
6376+    def test_get_root_hash_first(self):
6377+        self.write_test_share_to_server("si1")
6378+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6379+        d = mr.get_root_hash()
6380+        d.addCallback(lambda root_hash:
6381+            self.failUnlessEqual(root_hash, self.root_hash))
6382+        return d
6383+
6384+
6385+    def test_get_checkstring_first(self):
6386+        self.write_test_share_to_server("si1")
6387+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6388+        d = mr.get_checkstring()
6389+        d.addCallback(lambda checkstring:
6390+            self.failUnlessEqual(checkstring, self.checkstring))
6391+        return d
6392+
6393+
6394+    def test_write_read_vectors(self):
6395+        # When writing for us, the storage server will return to us a
6396+        # read vector, along with its result. If a write fails because
6397+        # the test vectors failed, this read vector can help us to
6398+        # diagnose the problem. This test ensures that the read vector
6399+        # is working appropriately.
6400+        mw = self._make_new_mw("si1", 0)
6401+
6402+        for i in xrange(6):
6403+            mw.put_block(self.block, i, self.salt)
6404+        mw.put_encprivkey(self.encprivkey)
6405+        mw.put_blockhashes(self.block_hash_tree)
6406+        mw.put_sharehashes(self.share_hash_chain)
6407+        mw.put_root_hash(self.root_hash)
6408+        mw.put_signature(self.signature)
6409+        mw.put_verification_key(self.verification_key)
6410+        d = mw.finish_publishing()
6411+        def _then(results):
6412+            self.failUnlessEqual(len(results), 2)
6413+            result, readv = results
6414+            self.failUnless(result)
6415+            self.failIf(readv)
6416+            self.old_checkstring = mw.get_checkstring()
6417+            mw.set_checkstring("")
6418+        d.addCallback(_then)
6419+        d.addCallback(lambda ignored:
6420+            mw.finish_publishing())
6421+        def _then_again(results):
6422+            self.failUnlessEqual(len(results), 2)
6423+            result, readvs = results
6424+            self.failIf(result)
6425+            self.failUnlessIn(0, readvs)
6426+            readv = readvs[0][0]
6427+            self.failUnlessEqual(readv, self.old_checkstring)
6428+        d.addCallback(_then_again)
6429+        # The checkstring remains the same for the rest of the process.
6430+        return d
6431+
6432+
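A sketch of how a caller might act on that read vector when a write fails; the function name is illustrative and not part of the proxy's API:

    def handle_publish_results(results):
        wrote, readvs = results
        if not wrote:
            # readvs maps share numbers to the reads requested alongside
            # the test vector; the first read of share 0 covers the
            # checkstring currently on the server.
            server_checkstring = readvs[0][0]
            # A well-behaved writer would rebuild its test vector from
            # this checkstring (or give up) before retrying the write.
            return server_checkstring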
6433+    def test_blockhashes_after_share_hash_chain(self):
6434+        mw = self._make_new_mw("si1", 0)
6435+        d = defer.succeed(None)
6436+        # Put everything up to and including the share hash chain
6437+        for i in xrange(6):
6438+            d.addCallback(lambda ignored, i=i:
6439+                mw.put_block(self.block, i, self.salt))
6440+        d.addCallback(lambda ignored:
6441+            mw.put_encprivkey(self.encprivkey))
6442+        d.addCallback(lambda ignored:
6443+            mw.put_blockhashes(self.block_hash_tree))
6444+        d.addCallback(lambda ignored:
6445+            mw.put_sharehashes(self.share_hash_chain))
6446+
6447+        # Now try to put the block hash tree again.
6448+        d.addCallback(lambda ignored:
6449+            self.shouldFail(LayoutInvalid, "test repeat blockhashes",
6450+                            None,
6451+                            mw.put_blockhashes, self.block_hash_tree))
6452+        return d
6453+
6454+
6455+    def test_encprivkey_after_blockhashes(self):
6456+        mw = self._make_new_mw("si1", 0)
6457+        d = defer.succeed(None)
6458+        # Put everything up to and including the block hash tree
6459+        for i in xrange(6):
6460+            d.addCallback(lambda ignored, i=i:
6461+                mw.put_block(self.block, i, self.salt))
6462+        d.addCallback(lambda ignored:
6463+            mw.put_encprivkey(self.encprivkey))
6464+        d.addCallback(lambda ignored:
6465+            mw.put_blockhashes(self.block_hash_tree))
6466+        d.addCallback(lambda ignored:
6467+            self.shouldFail(LayoutInvalid, "out of order private key",
6468+                            None,
6469+                            mw.put_encprivkey, self.encprivkey))
6470+        return d
6471+
6472+
6473+    def test_share_hash_chain_after_signature(self):
6474+        mw = self._make_new_mw("si1", 0)
6475+        d = defer.succeed(None)
6476+        # Put everything up to and including the signature
6477+        for i in xrange(6):
6478+            d.addCallback(lambda ignored, i=i:
6479+                mw.put_block(self.block, i, self.salt))
6480+        d.addCallback(lambda ignored:
6481+            mw.put_encprivkey(self.encprivkey))
6482+        d.addCallback(lambda ignored:
6483+            mw.put_blockhashes(self.block_hash_tree))
6484+        d.addCallback(lambda ignored:
6485+            mw.put_sharehashes(self.share_hash_chain))
6486+        d.addCallback(lambda ignored:
6487+            mw.put_root_hash(self.root_hash))
6488+        d.addCallback(lambda ignored:
6489+            mw.put_signature(self.signature))
6490+        # Now try to put the share hash chain again. This should fail
6491+        d.addCallback(lambda ignored:
6492+            self.shouldFail(LayoutInvalid, "out of order share hash chain",
6493+                            None,
6494+                            mw.put_sharehashes, self.share_hash_chain))
6495+        return d
6496+
6497+
6498+    def test_signature_after_verification_key(self):
6499+        mw = self._make_new_mw("si1", 0)
6500+        d = defer.succeed(None)
6501+        # Put everything up to and including the verification key.
6502+        for i in xrange(6):
6503+            d.addCallback(lambda ignored, i=i:
6504+                mw.put_block(self.block, i, self.salt))
6505+        d.addCallback(lambda ignored:
6506+            mw.put_encprivkey(self.encprivkey))
6507+        d.addCallback(lambda ignored:
6508+            mw.put_blockhashes(self.block_hash_tree))
6509+        d.addCallback(lambda ignored:
6510+            mw.put_sharehashes(self.share_hash_chain))
6511+        d.addCallback(lambda ignored:
6512+            mw.put_root_hash(self.root_hash))
6513+        d.addCallback(lambda ignored:
6514+            mw.put_signature(self.signature))
6515+        d.addCallback(lambda ignored:
6516+            mw.put_verification_key(self.verification_key))
6517+        # Now try to put the signature again. This should fail
6518+        d.addCallback(lambda ignored:
6519+            self.shouldFail(LayoutInvalid, "signature after verification",
6520+                            None,
6521+                            mw.put_signature, self.signature))
6522+        return d
6523+
6524+
6525+    def test_uncoordinated_write(self):
6526+        # Make two mutable writers, both pointing to the same storage
6527+        # server, both at the same storage index, and try writing to the
6528+        # same share.
6529+        mw1 = self._make_new_mw("si1", 0)
6530+        mw2 = self._make_new_mw("si1", 0)
6531+
6532+        def _check_success(results):
6533+            result, readvs = results
6534+            self.failUnless(result)
6535+
6536+        def _check_failure(results):
6537+            result, readvs = results
6538+            self.failIf(result)
6539+
6540+        def _write_share(mw):
6541+            for i in xrange(6):
6542+                mw.put_block(self.block, i, self.salt)
6543+            mw.put_encprivkey(self.encprivkey)
6544+            mw.put_blockhashes(self.block_hash_tree)
6545+            mw.put_sharehashes(self.share_hash_chain)
6546+            mw.put_root_hash(self.root_hash)
6547+            mw.put_signature(self.signature)
6548+            mw.put_verification_key(self.verification_key)
6549+            return mw.finish_publishing()
6550+        d = _write_share(mw1)
6551+        d.addCallback(_check_success)
6552+        d.addCallback(lambda ignored:
6553+            _write_share(mw2))
6554+        d.addCallback(_check_failure)
6555+        return d
6556+
6557+
6558+    def test_invalid_salt_size(self):
6559+        # Salts need to be 16 bytes in size. Writes that attempt to
6560+        # write more or less than this should be rejected.
6561+        mw = self._make_new_mw("si1", 0)
6562+        invalid_salt = "a" * 17 # 17 bytes
6563+        another_invalid_salt = "b" * 15 # 15 bytes
6564+        d = defer.succeed(None)
6565+        d.addCallback(lambda ignored:
6566+            self.shouldFail(LayoutInvalid, "salt too big",
6567+                            None,
6568+                            mw.put_block, self.block, 0, invalid_salt))
6569+        d.addCallback(lambda ignored:
6570+            self.shouldFail(LayoutInvalid, "salt too small",
6571+                            None,
6572+                            mw.put_block, self.block, 0,
6573+                            another_invalid_salt))
6574+        return d
6575+
6576+
6577+    def test_write_test_vectors(self):
6578+        # If we give the write proxy a bogus test vector at
6579+        # any point during the process, it should fail to write when we
6580+        # tell it to write.
6581+        def _check_failure(results):
6582+            self.failUnlessEqual(len(results), 2)
6583+            res, readv = results
6584+            self.failIf(res)
6585+
6586+        def _check_success(results):
6587+            self.failUnlessEqual(len(results), 2)
6588+            res, readv = results
6589+            self.failUnless(res)
6590+
6591+        mw = self._make_new_mw("si1", 0)
6592+        mw.set_checkstring("this is a lie")
6593+        for i in xrange(6):
6594+            mw.put_block(self.block, i, self.salt)
6595+        mw.put_encprivkey(self.encprivkey)
6596+        mw.put_blockhashes(self.block_hash_tree)
6597+        mw.put_sharehashes(self.share_hash_chain)
6598+        mw.put_root_hash(self.root_hash)
6599+        mw.put_signature(self.signature)
6600+        mw.put_verification_key(self.verification_key)
6601+        d = mw.finish_publishing()
6602+        d.addCallback(_check_failure)
6603+        d.addCallback(lambda ignored:
6604+            mw.set_checkstring(""))
6605+        d.addCallback(lambda ignored:
6606+            mw.finish_publishing())
6607+        d.addCallback(_check_success)
6608+        return d
6609+
6610+
6611+    def serialize_blockhashes(self, blockhashes):
6612+        return "".join(blockhashes)
6613+
6614+
6615+    def serialize_sharehashes(self, sharehashes):
6616+        ret = "".join([struct.pack(">H32s", i, sharehashes[i])
6617+                        for i in sorted(sharehashes.keys())])
6618+        return ret
6619+
6620+
6621+    def test_write(self):
6622+        # This translates to a file with 6 6-byte segments, and with 2-byte
6623+        # blocks.
6624+        mw = self._make_new_mw("si1", 0)
6625+        # Test writing some blocks.
6626+        read = self.ss.remote_slot_readv
6627+        expected_sharedata_offset = struct.calcsize(MDMFHEADER)
6628+        written_block_size = 2 + len(self.salt)
6629+        written_block = self.block + self.salt
6630+        for i in xrange(6):
6631+            mw.put_block(self.block, i, self.salt)
6632+
6633+        mw.put_encprivkey(self.encprivkey)
6634+        mw.put_blockhashes(self.block_hash_tree)
6635+        mw.put_sharehashes(self.share_hash_chain)
6636+        mw.put_root_hash(self.root_hash)
6637+        mw.put_signature(self.signature)
6638+        mw.put_verification_key(self.verification_key)
6639+        d = mw.finish_publishing()
6640+        def _check_publish(results):
6641+            self.failUnlessEqual(len(results), 2)
6642+            result, ign = results
6643+            self.failUnless(result, "publish failed")
6644+            for i in xrange(6):
6645+                self.failUnlessEqual(read("si1", [0], [(expected_sharedata_offset + (i * written_block_size), written_block_size)]),
6646+                                {0: [written_block]})
6647+
6648+            expected_private_key_offset = expected_sharedata_offset + \
6649+                                      len(written_block) * 6
6650+            self.failUnlessEqual(len(self.encprivkey), 7)
6651+            self.failUnlessEqual(read("si1", [0], [(expected_private_key_offset, 7)]),
6652+                                 {0: [self.encprivkey]})
6653+
6654+            expected_block_hash_offset = expected_private_key_offset + len(self.encprivkey)
6655+            self.failUnlessEqual(len(self.block_hash_tree_s), 32 * 6)
6656+            self.failUnlessEqual(read("si1", [0], [(expected_block_hash_offset, 32 * 6)]),
6657+                                 {0: [self.block_hash_tree_s]})
6658+
6659+            expected_share_hash_offset = expected_block_hash_offset + len(self.block_hash_tree_s)
6660+            self.failUnlessEqual(read("si1", [0],[(expected_share_hash_offset, (32 + 2) * 6)]),
6661+                                 {0: [self.share_hash_chain_s]})
6662+
6663+            self.failUnlessEqual(read("si1", [0], [(9, 32)]),
6664+                                 {0: [self.root_hash]})
6665+            expected_signature_offset = expected_share_hash_offset + len(self.share_hash_chain_s)
6666+            self.failUnlessEqual(len(self.signature), 9)
6667+            self.failUnlessEqual(read("si1", [0], [(expected_signature_offset, 9)]),
6668+                                 {0: [self.signature]})
6669+
6670+            expected_verification_key_offset = expected_signature_offset + len(self.signature)
6671+            self.failUnlessEqual(len(self.verification_key), 6)
6672+            self.failUnlessEqual(read("si1", [0], [(expected_verification_key_offset, 6)]),
6673+                                 {0: [self.verification_key]})
6674+
6675+            signable = mw.get_signable()
6676+            verno, seq, roothash, k, n, segsize, datalen = \
6677+                                            struct.unpack(">BQ32sBBQQ",
6678+                                                          signable)
6679+            self.failUnlessEqual(verno, 1)
6680+            self.failUnlessEqual(seq, 0)
6681+            self.failUnlessEqual(roothash, self.root_hash)
6682+            self.failUnlessEqual(k, 3)
6683+            self.failUnlessEqual(n, 10)
6684+            self.failUnlessEqual(segsize, 6)
6685+            self.failUnlessEqual(datalen, 36)
6686+            expected_eof_offset = expected_verification_key_offset + len(self.verification_key)
6687+
6688+            # Check the version number to make sure that it is correct.
6689+            expected_version_number = struct.pack(">B", 1)
6690+            self.failUnlessEqual(read("si1", [0], [(0, 1)]),
6691+                                 {0: [expected_version_number]})
6692+            # Check the sequence number to make sure that it is correct
6693+            expected_sequence_number = struct.pack(">Q", 0)
6694+            self.failUnlessEqual(read("si1", [0], [(1, 8)]),
6695+                                 {0: [expected_sequence_number]})
6696+            # Check that the encoding parameters (k, N, segment size, data
6697+            # length) are what they should be: 3, 10, 6, and 36.
6698+            expected_k = struct.pack(">B", 3)
6699+            self.failUnlessEqual(read("si1", [0], [(41, 1)]),
6700+                                 {0: [expected_k]})
6701+            expected_n = struct.pack(">B", 10)
6702+            self.failUnlessEqual(read("si1", [0], [(42, 1)]),
6703+                                 {0: [expected_n]})
6704+            expected_segment_size = struct.pack(">Q", 6)
6705+            self.failUnlessEqual(read("si1", [0], [(43, 8)]),
6706+                                 {0: [expected_segment_size]})
6707+            expected_data_length = struct.pack(">Q", 36)
6708+            self.failUnlessEqual(read("si1", [0], [(51, 8)]),
6709+                                 {0: [expected_data_length]})
6710+            expected_offset = struct.pack(">Q", expected_private_key_offset)
6711+            self.failUnlessEqual(read("si1", [0], [(59, 8)]),
6712+                                 {0: [expected_offset]})
6713+            expected_offset = struct.pack(">Q", expected_block_hash_offset)
6714+            self.failUnlessEqual(read("si1", [0], [(67, 8)]),
6715+                                 {0: [expected_offset]})
6716+            expected_offset = struct.pack(">Q", expected_share_hash_offset)
6717+            self.failUnlessEqual(read("si1", [0], [(75, 8)]),
6718+                                 {0: [expected_offset]})
6719+            expected_offset = struct.pack(">Q", expected_signature_offset)
6720+            self.failUnlessEqual(read("si1", [0], [(83, 8)]),
6721+                                 {0: [expected_offset]})
6722+            expected_offset = struct.pack(">Q", expected_verification_key_offset)
6723+            self.failUnlessEqual(read("si1", [0], [(91, 8)]),
6724+                                 {0: [expected_offset]})
6725+            expected_offset = struct.pack(">Q", expected_eof_offset)
6726+            self.failUnlessEqual(read("si1", [0], [(99, 8)]),
6727+                                 {0: [expected_offset]})
6728+        d.addCallback(_check_publish)
6729+        return d
6730+
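The absolute offsets asserted in _check_publish (41, 42, 43, 51, and the offset-table slots at 59 through 99) fall straight out of the header field sizes; a quick check, assuming the ">BQ32sBBQQ" fixed header used elsewhere in these tests:

    import struct

    assert struct.calcsize(">BQ32s") == 41      # k at offset 41 (version + seqnum + root hash)
    assert struct.calcsize(">BQ32sB") == 42     # N at 42
    assert struct.calcsize(">BQ32sBB") == 43    # segment size at 43
    assert struct.calcsize(">BQ32sBBQ") == 51   # data length at 51
    assert struct.calcsize(">BQ32sBBQQ") == 59  # offset table starts at 59
    # Each table entry is an 8-byte ">Q", so the six entries land at
    # offsets 59, 67, 75, 83, 91, and 99, as read back above.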
6731+    def _make_new_mw(self, si, share, datalength=36):
6732+        # This is a file of size 36 bytes. Since it has a segment
6733+        # size of 6, it has six 6-byte segments, each of which is
6734+        # split into 2-byte blocks because our FEC k parameter is 3.
6736+        mw = MDMFSlotWriteProxy(share, self.rref, si, self.secrets, 0, 3, 10,
6737+                                6, datalength)
6738+        return mw
6739+
6740+
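The arithmetic behind those numbers, spelled out (Python 2 integer division):

    datalength, segment_size, k = 36, 6, 3
    num_segments = datalength / segment_size   # 6 segments
    block_size = segment_size / k              # 2-byte blocks per share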
6741+    def test_write_rejected_with_too_many_blocks(self):
6742+        mw = self._make_new_mw("si0", 0)
6743+
6744+        # Try writing too many blocks. We should not be able to write
6745+        # more than 6 blocks into each share.
6747+        d = defer.succeed(None)
6748+        for i in xrange(6):
6749+            d.addCallback(lambda ignored, i=i:
6750+                mw.put_block(self.block, i, self.salt))
6751+        d.addCallback(lambda ignored:
6752+            self.shouldFail(LayoutInvalid, "too many blocks",
6753+                            None,
6754+                            mw.put_block, self.block, 7, self.salt))
6755+        return d
6756+
6757+
6758+    def test_write_rejected_with_invalid_salt(self):
6759+        # Try writing an invalid salt. Salts are 16 bytes -- any more or
6760+        # less should cause an error.
6761+        mw = self._make_new_mw("si1", 0)
6762+        bad_salt = "a" * 17 # 17 bytes
6763+        d = defer.succeed(None)
6764+        d.addCallback(lambda ignored:
6765+            self.shouldFail(LayoutInvalid, "test_invalid_salt",
6766+                            None, mw.put_block, self.block, 7, bad_salt))
6767+        return d
6768+
6769+
6770+    def test_write_rejected_with_invalid_root_hash(self):
6771+        # Try writing an invalid root hash. This should be SHA256d, and
6772+        # 32 bytes long as a result.
6773+        mw = self._make_new_mw("si2", 0)
6774+        # 17 bytes != 32 bytes
6775+        invalid_root_hash = "a" * 17
6776+        d = defer.succeed(None)
6777+        # Before this test can work, we need to put some blocks + salts,
6778+        # a block hash tree, and a share hash tree. Otherwise, we'll see
6779+        # failures that match what we are looking for, but are caused by
6780+        # the constraints imposed on operation ordering.
6781+        for i in xrange(6):
6782+            d.addCallback(lambda ignored, i=i:
6783+                mw.put_block(self.block, i, self.salt))
6784+        d.addCallback(lambda ignored:
6785+            mw.put_encprivkey(self.encprivkey))
6786+        d.addCallback(lambda ignored:
6787+            mw.put_blockhashes(self.block_hash_tree))
6788+        d.addCallback(lambda ignored:
6789+            mw.put_sharehashes(self.share_hash_chain))
6790+        d.addCallback(lambda ignored:
6791+            self.shouldFail(LayoutInvalid, "invalid root hash",
6792+                            None, mw.put_root_hash, invalid_root_hash))
6793+        return d
6794+
6795+
6796+    def test_write_rejected_with_invalid_blocksize(self):
6797+        # The blocksize implied by the writer that we get from
6798+        # _make_new_mw is 2 bytes -- any more or less than this
6799+        # should cause a failure, unless it is the tail segment, in
6800+        # which case a smaller block is allowed.
6801+        invalid_block = "a"
6802+        mw = self._make_new_mw("si3", 0, 33) # implies a tail segment with
6803+                                             # one-byte blocks
6804+        # 1 byte != 2 bytes
6805+        d = defer.succeed(None)
6806+        d.addCallback(lambda ignored, invalid_block=invalid_block:
6807+            self.shouldFail(LayoutInvalid, "test blocksize too small",
6808+                            None, mw.put_block, invalid_block, 0,
6809+                            self.salt))
6810+        invalid_block = invalid_block * 3
6811+        # 3 bytes != 2 bytes
6812+        d.addCallback(lambda ignored:
6813+            self.shouldFail(LayoutInvalid, "test blocksize too large",
6814+                            None,
6815+                            mw.put_block, invalid_block, 0, self.salt))
6816+        for i in xrange(5):
6817+            d.addCallback(lambda ignored, i=i:
6818+                mw.put_block(self.block, i, self.salt))
6819+        # Try to put an invalid tail segment
6820+        d.addCallback(lambda ignored:
6821+            self.shouldFail(LayoutInvalid, "test invalid tail segment",
6822+                            None,
6823+                            mw.put_block, self.block, 5, self.salt))
6824+        valid_block = "a"
6825+        d.addCallback(lambda ignored:
6826+            mw.put_block(valid_block, 5, self.salt))
6827+        return d
6828+
6829+
6830+    def test_write_enforces_order_constraints(self):
6831+        # We require that the MDMFSlotWriteProxy be interacted with in a
6832+        # specific way.
6833+        # That way is:
6834+        # 0: __init__
6835+        # 1: write blocks and salts
6836+        # 2: Write the encrypted private key
6837+        # 3: Write the block hashes
6838+        # 4: Write the share hashes
6839+        # 5: Write the root hash
6840+        # 6: Write the signature and verification key
6841+        # 7: Write the file.
6842+        #
6843+        # Some of these can be performed out-of-order, and some can't.
6844+        # The dependencies that I want to test here are:
6845+        #  - Private key before block hashes
6846+        #  - share hashes and block hashes before root hash
6847+        #  - root hash before signature
6848+        #  - signature before verification key
6849+        mw0 = self._make_new_mw("si0", 0)
6850+        # Write some shares
6851+        d = defer.succeed(None)
6852+        for i in xrange(6):
6853+            d.addCallback(lambda ignored, i=i:
6854+                mw0.put_block(self.block, i, self.salt))
6855+        # Try to write the block hashes before writing the encrypted
6856+        # private key
6857+        d.addCallback(lambda ignored:
6858+            self.shouldFail(LayoutInvalid, "block hashes before key",
6859+                            None, mw0.put_blockhashes,
6860+                            self.block_hash_tree))
6861+
6862+        # Write the private key.
6863+        d.addCallback(lambda ignored:
6864+            mw0.put_encprivkey(self.encprivkey))
6865+
6866+
6867+        # Try to write the share hash chain without writing the block
6868+        # hash tree
6869+        d.addCallback(lambda ignored:
6870+            self.shouldFail(LayoutInvalid, "share hash chain before "
6871+                                           "block hash tree",
6872+                            None,
6873+                            mw0.put_sharehashes, self.share_hash_chain))
6874+
6875+        # Try to write the root hash without writing either the
6876+        # block hashes or the share hashes.
6877+        d.addCallback(lambda ignored:
6878+            self.shouldFail(LayoutInvalid, "root hash before share hashes",
6879+                            None,
6880+                            mw0.put_root_hash, self.root_hash))
6881+
6882+        # Now write the block hashes and try again
6883+        d.addCallback(lambda ignored:
6884+            mw0.put_blockhashes(self.block_hash_tree))
6885+
6886+        d.addCallback(lambda ignored:
6887+            self.shouldFail(LayoutInvalid, "root hash before share hashes",
6888+                            None, mw0.put_root_hash, self.root_hash))
6889+
6890+        # We haven't yet put the root hash on the share, so we shouldn't
6891+        # be able to sign it.
6892+        d.addCallback(lambda ignored:
6893+            self.shouldFail(LayoutInvalid, "signature before root hash",
6894+                            None, mw0.put_signature, self.signature))
6895+
6896+        d.addCallback(lambda ignored:
6897+            self.failUnlessRaises(LayoutInvalid, mw0.get_signable))
6898+
6899+        # ...and, since that fails, we also shouldn't be able to put the
6900+        # verification key.
6901+        d.addCallback(lambda ignored:
6902+            self.shouldFail(LayoutInvalid, "key before signature",
6903+                            None, mw0.put_verification_key,
6904+                            self.verification_key))
6905+
6906+        # Now write the share hashes.
6907+        d.addCallback(lambda ignored:
6908+            mw0.put_sharehashes(self.share_hash_chain))
6909+        # We should be able to write the root hash now too
6910+        d.addCallback(lambda ignored:
6911+            mw0.put_root_hash(self.root_hash))
6912+
6913+        # We should still be unable to put the verification key
6914+        d.addCallback(lambda ignored:
6915+            self.shouldFail(LayoutInvalid, "key before signature",
6916+                            None, mw0.put_verification_key,
6917+                            self.verification_key))
6918+
6919+        d.addCallback(lambda ignored:
6920+            mw0.put_signature(self.signature))
6921+
6922+        # We shouldn't be able to write the offsets to the remote server
6923+        # until the offset table is finished; IOW, until we have written
6924+        # the verification key.
6925+        d.addCallback(lambda ignored:
6926+            self.shouldFail(LayoutInvalid, "offsets before verification key",
6927+                            None,
6928+                            mw0.finish_publishing))
6929+
6930+        d.addCallback(lambda ignored:
6931+            mw0.put_verification_key(self.verification_key))
6932+        return d
6933+
6934+
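A minimal sketch of one way such ordering could be enforced; this is illustrative only, and the real MDMFSlotWriteProxy's internal bookkeeping may differ. Blocks and salts come first and may repeat (one put per segment); each later field is once-only and must follow its predecessor:

    class LayoutInvalid(Exception):
        pass

    # The once-only fields, in the order the tests above require.
    ORDER = ["enc_privkey", "block_hash_tree", "share_hash_chain",
             "root_hash", "signature", "verification_key"]

    class OrderedShareWriter:
        def __init__(self):
            self.written = []

        def put(self, field):
            if len(self.written) == len(ORDER):
                raise LayoutInvalid("share is already complete")
            expected = ORDER[len(self.written)]
            if field != expected:
                raise LayoutInvalid("expected %s, got %s" % (expected, field))
            self.written.append(field)

For example, put("block_hash_tree") before put("enc_privkey") raises LayoutInvalid, mirroring the "block hashes before key" failure checked above.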
6935+    def test_end_to_end(self):
6936+        mw = self._make_new_mw("si1", 0)
6937+        # Write a share using the mutable writer, and make sure that the
6938+        # reader knows how to read everything back to us.
6939+        d = defer.succeed(None)
6940+        for i in xrange(6):
6941+            d.addCallback(lambda ignored, i=i:
6942+                mw.put_block(self.block, i, self.salt))
6943+        d.addCallback(lambda ignored:
6944+            mw.put_encprivkey(self.encprivkey))
6945+        d.addCallback(lambda ignored:
6946+            mw.put_blockhashes(self.block_hash_tree))
6947+        d.addCallback(lambda ignored:
6948+            mw.put_sharehashes(self.share_hash_chain))
6949+        d.addCallback(lambda ignored:
6950+            mw.put_root_hash(self.root_hash))
6951+        d.addCallback(lambda ignored:
6952+            mw.put_signature(self.signature))
6953+        d.addCallback(lambda ignored:
6954+            mw.put_verification_key(self.verification_key))
6955+        d.addCallback(lambda ignored:
6956+            mw.finish_publishing())
6957+
6958+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6959+        def _check_block_and_salt((block, salt)):
6960+            self.failUnlessEqual(block, self.block)
6961+            self.failUnlessEqual(salt, self.salt)
6962+
6963+        for i in xrange(6):
6964+            d.addCallback(lambda ignored, i=i:
6965+                mr.get_block_and_salt(i))
6966+            d.addCallback(_check_block_and_salt)
6967+
6968+        d.addCallback(lambda ignored:
6969+            mr.get_encprivkey())
6970+        d.addCallback(lambda encprivkey:
6971+            self.failUnlessEqual(self.encprivkey, encprivkey))
6972+
6973+        d.addCallback(lambda ignored:
6974+            mr.get_blockhashes())
6975+        d.addCallback(lambda blockhashes:
6976+            self.failUnlessEqual(self.block_hash_tree, blockhashes))
6977+
6978+        d.addCallback(lambda ignored:
6979+            mr.get_sharehashes())
6980+        d.addCallback(lambda sharehashes:
6981+            self.failUnlessEqual(self.share_hash_chain, sharehashes))
6982+
6983+        d.addCallback(lambda ignored:
6984+            mr.get_signature())
6985+        d.addCallback(lambda signature:
6986+            self.failUnlessEqual(signature, self.signature))
6987+
6988+        d.addCallback(lambda ignored:
6989+            mr.get_verification_key())
6990+        d.addCallback(lambda verification_key:
6991+            self.failUnlessEqual(verification_key, self.verification_key))
6992+
6993+        d.addCallback(lambda ignored:
6994+            mr.get_seqnum())
6995+        d.addCallback(lambda seqnum:
6996+            self.failUnlessEqual(seqnum, 0))
6997+
6998+        d.addCallback(lambda ignored:
6999+            mr.get_root_hash())
7000+        d.addCallback(lambda root_hash:
7001+            self.failUnlessEqual(self.root_hash, root_hash))
7002+
7003+        d.addCallback(lambda ignored:
7004+            mr.get_encoding_parameters())
7005+        def _check_encoding_parameters((k, n, segsize, datalen)):
7006+            self.failUnlessEqual(k, 3)
7007+            self.failUnlessEqual(n, 10)
7008+            self.failUnlessEqual(segsize, 6)
7009+            self.failUnlessEqual(datalen, 36)
7010+        d.addCallback(_check_encoding_parameters)
7011+
7012+        d.addCallback(lambda ignored:
7013+            mr.get_checkstring())
7014+        d.addCallback(lambda checkstring:
7015+            self.failUnlessEqual(checkstring, mw.get_checkstring()))
7016+        return d
7017+
7018+
7019+    def test_is_sdmf(self):
7020+        # The MDMFSlotReadProxy should also know how to read SDMF files,
7021+        # since it will encounter them on the grid. Callers use the
7022+        # is_sdmf method to test this.
7023+        self.write_sdmf_share_to_server("si1")
7024+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7025+        d = mr.is_sdmf()
7026+        d.addCallback(lambda issdmf:
7027+            self.failUnless(issdmf))
7028+        return d
7029+
7030+
7031+    def test_reads_sdmf(self):
7032+        # The slot read proxy should, naturally, know how to tell us
7033+        # about data in the SDMF format
7034+        self.write_sdmf_share_to_server("si1")
7035+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7036+        d = defer.succeed(None)
7037+        d.addCallback(lambda ignored:
7038+            mr.is_sdmf())
7039+        d.addCallback(lambda issdmf:
7040+            self.failUnless(issdmf))
7041+
7042+        # What do we need to read?
7043+        #  - The sharedata
7044+        #  - The salt
7045+        d.addCallback(lambda ignored:
7046+            mr.get_block_and_salt(0))
7047+        def _check_block_and_salt(results):
7048+            block, salt = results
7049+            # Our original file is 36 bytes long, so each share is 12
7050+            # bytes in size (k = 3). The share data is composed entirely
7051+            # of the letter a. self.block contains two of them, so
7052+            # 6 * self.block is what we are looking for.
7053+            self.failUnlessEqual(block, self.block * 6)
7054+            self.failUnlessEqual(salt, self.salt)
7055+        d.addCallback(_check_block_and_salt)
7056+
7057+        #  - The blockhashes
7058+        d.addCallback(lambda ignored:
7059+            mr.get_blockhashes())
7060+        d.addCallback(lambda blockhashes:
7061+            self.failUnlessEqual(self.block_hash_tree,
7062+                                 blockhashes,
7063+                                 blockhashes))
7064+        #  - The sharehashes
7065+        d.addCallback(lambda ignored:
7066+            mr.get_sharehashes())
7067+        d.addCallback(lambda sharehashes:
7068+            self.failUnlessEqual(self.share_hash_chain,
7069+                                 sharehashes))
7070+        #  - The keys
7071+        d.addCallback(lambda ignored:
7072+            mr.get_encprivkey())
7073+        d.addCallback(lambda encprivkey:
7074+            self.failUnlessEqual(encprivkey, self.encprivkey, encprivkey))
7075+        d.addCallback(lambda ignored:
7076+            mr.get_verification_key())
7077+        d.addCallback(lambda verification_key:
7078+            self.failUnlessEqual(verification_key,
7079+                                 self.verification_key,
7080+                                 verification_key))
7081+        #  - The signature
7082+        d.addCallback(lambda ignored:
7083+            mr.get_signature())
7084+        d.addCallback(lambda signature:
7085+            self.failUnlessEqual(signature, self.signature, signature))
7086+
7087+        #  - The sequence number
7088+        d.addCallback(lambda ignored:
7089+            mr.get_seqnum())
7090+        d.addCallback(lambda seqnum:
7091+            self.failUnlessEqual(seqnum, 0, seqnum))
7092+
7093+        #  - The root hash
7094+        d.addCallback(lambda ignored:
7095+            mr.get_root_hash())
7096+        d.addCallback(lambda root_hash:
7097+            self.failUnlessEqual(root_hash, self.root_hash, root_hash))
7098+        return d
7099+
7100+
7101+    def test_only_reads_one_segment_sdmf(self):
7102+        # SDMF shares have only one segment, so it doesn't make sense to
7103+        # read more segments than that. The reader should know this and
7104+        # complain if we try to do that.
7105+        self.write_sdmf_share_to_server("si1")
7106+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7107+        d = defer.succeed(None)
7108+        d.addCallback(lambda ignored:
7109+            mr.is_sdmf())
7110+        d.addCallback(lambda issdmf:
7111+            self.failUnless(issdmf))
7112+        d.addCallback(lambda ignored:
7113+            self.shouldFail(LayoutInvalid, "test bad segment",
7114+                            None,
7115+                            mr.get_block_and_salt, 1))
7116+        return d
7117+
7118+
7119+    def test_read_with_prefetched_mdmf_data(self):
7120+        # The MDMFSlotReadProxy will prefill certain fields if you pass
7121+        # it data that you have already fetched. This is useful for
7122+        # cases like the Servermap, which prefetches ~2kb of data while
7123+        # finding out which shares are on the remote peer so that it
7124+        # doesn't waste round trips.
7125+        mdmf_data = self.build_test_mdmf_share()
7126+        self.write_test_share_to_server("si1")
7127+        def _make_mr(ignored, length):
7128+            mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:length])
7129+            return mr
7130+
7131+        d = defer.succeed(None)
7132+        # This should be enough to fill in both the encoding parameters
7133+        # and the table of offsets, which will complete the version
7134+        # information tuple.
7135+        d.addCallback(_make_mr, 107)
7136+        d.addCallback(lambda mr:
7137+            mr.get_verinfo())
7138+        def _check_verinfo(verinfo):
7139+            self.failUnless(verinfo)
7140+            self.failUnlessEqual(len(verinfo), 9)
7141+            (seqnum,
7142+             root_hash,
7143+             salt_hash,
7144+             segsize,
7145+             datalen,
7146+             k,
7147+             n,
7148+             prefix,
7149+             offsets) = verinfo
7150+            self.failUnlessEqual(seqnum, 0)
7151+            self.failUnlessEqual(root_hash, self.root_hash)
7152+            self.failUnlessEqual(segsize, 6)
7153+            self.failUnlessEqual(datalen, 36)
7154+            self.failUnlessEqual(k, 3)
7155+            self.failUnlessEqual(n, 10)
7156+            expected_prefix = struct.pack(MDMFSIGNABLEHEADER,
7157+                                          1,
7158+                                          seqnum,
7159+                                          root_hash,
7160+                                          k,
7161+                                          n,
7162+                                          segsize,
7163+                                          datalen)
7164+            self.failUnlessEqual(expected_prefix, prefix)
7165+            self.failUnlessEqual(self.rref.read_count, 0)
7166+        d.addCallback(_check_verinfo)
7167+        # This is not enough data to read a block and its salt, so the
7168+        # wrapper should fetch them from the remote server.
7169+        d.addCallback(_make_mr, 107)
7170+        d.addCallback(lambda mr:
7171+            mr.get_block_and_salt(0))
7172+        def _check_block_and_salt((block, salt)):
7173+            self.failUnlessEqual(block, self.block)
7174+            self.failUnlessEqual(salt, self.salt)
7175+            self.failUnlessEqual(self.rref.read_count, 1)
7175+        d.addCallback(_check_block_and_salt)
7176+        # This should be enough data to read one block.
7177+        d.addCallback(_make_mr, 249)
7178+        d.addCallback(lambda mr:
7179+            mr.get_block_and_salt(0))
7180+        d.addCallback(_check_block_and_salt)
7181+        return d
7182+
7183+
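Where the magic prefetch length 107 comes from, assuming the header formats sketched earlier: the MDMF fixed header plus its offset table is 59 + 48 bytes, and (coincidentally) the SDMF signed prefix plus its offset table is 75 + 32 bytes, so 107 cached bytes cover the verinfo for either format:

    import struct

    assert struct.calcsize(">BQ32sBBQQ") + struct.calcsize(">QQQQQQ") == 107     # MDMF
    assert struct.calcsize(">BQ32s16sBBQQ") + struct.calcsize(">LLLLQQ") == 107  # SDMF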
7184+    def test_read_with_prefetched_sdmf_data(self):
7185+        sdmf_data = self.build_test_sdmf_share()
7186+        self.write_sdmf_share_to_server("si1")
7187+        def _make_mr(ignored, length):
7188+            mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:length])
7189+            return mr
7190+
7191+        d = defer.succeed(None)
7192+        # This should be enough to get us the encoding parameters,
7193+        # offset table, and everything else we need to build a verinfo
7194+        # string.
7195+        d.addCallback(_make_mr, 107)
7196+        d.addCallback(lambda mr:
7197+            mr.get_verinfo())
7198+        def _check_verinfo(verinfo):
7199+            self.failUnless(verinfo)
7200+            self.failUnlessEqual(len(verinfo), 9)
7201+            (seqnum,
7202+             root_hash,
7203+             salt,
7204+             segsize,
7205+             datalen,
7206+             k,
7207+             n,
7208+             prefix,
7209+             offsets) = verinfo
7210+            self.failUnlessEqual(seqnum, 0)
7211+            self.failUnlessEqual(root_hash, self.root_hash)
7212+            self.failUnlessEqual(salt, self.salt)
7213+            self.failUnlessEqual(segsize, 36)
7214+            self.failUnlessEqual(datalen, 36)
7215+            self.failUnlessEqual(k, 3)
7216+            self.failUnlessEqual(n, 10)
7217+            expected_prefix = struct.pack(SIGNED_PREFIX,
7218+                                          0,
7219+                                          seqnum,
7220+                                          root_hash,
7221+                                          salt,
7222+                                          k,
7223+                                          n,
7224+                                          segsize,
7225+                                          datalen)
7226+            self.failUnlessEqual(expected_prefix, prefix)
7227+            self.failUnlessEqual(self.rref.read_count, 0)
7228+        d.addCallback(_check_verinfo)
7229+        # This shouldn't be enough to read any share data.
7230+        d.addCallback(_make_mr, 107)
7231+        d.addCallback(lambda mr:
7232+            mr.get_block_and_salt(0))
7233+        def _check_block_and_salt((block, salt)):
7234+            self.failUnlessEqual(block, self.block * 6)
7235+            self.failUnlessEqual(salt, self.salt)
7236+            # TODO: Fix the read routine so that it reads only the data
7237+            #       that it has cached if it can't read all of it.
7238+            self.failUnlessEqual(self.rref.read_count, 2)
7238+        d.addCallback(_check_block_and_salt)
7239+
7240+        # This should be enough to read share data.
7241+        d.addCallback(_make_mr, self.offsets['share_data'])
7242+        d.addCallback(lambda mr:
7243+            mr.get_block_and_salt(0))
7244+        d.addCallback(_check_block_and_salt)
7245+        return d
7246+
7247+
7248+    def test_read_with_empty_mdmf_file(self):
7249+        # Some tests upload a file with no contents to test things
7250+        # unrelated to the actual handling of the content of the file.
7251+        # The reader should behave intelligently in these cases.
7252+        self.write_test_share_to_server("si1", empty=True)
7253+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7254+        # We should be able to get the encoding parameters, and they
7255+        # should be correct.
7256+        d = defer.succeed(None)
7257+        d.addCallback(lambda ignored:
7258+            mr.get_encoding_parameters())
7259+        def _check_encoding_parameters(params):
7260+            self.failUnlessEqual(len(params), 4)
7261+            k, n, segsize, datalen = params
7262+            self.failUnlessEqual(k, 3)
7263+            self.failUnlessEqual(n, 10)
7264+            self.failUnlessEqual(segsize, 0)
7265+            self.failUnlessEqual(datalen, 0)
7266+        d.addCallback(_check_encoding_parameters)
7267+
7268+        # We should not be able to fetch a block, since there are no
7269+        # blocks to fetch
7270+        d.addCallback(lambda ignored:
7271+            self.shouldFail(LayoutInvalid, "get block on empty file",
7272+                            None,
7273+                            mr.get_block_and_salt, 0))
7274+        return d
7275+
7276+
7277+    def test_read_with_empty_sdmf_file(self):
7278+        self.write_sdmf_share_to_server("si1", empty=True)
7279+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7280+        # We should be able to get the encoding parameters, and they
7281+        # should be correct
7282+        d = defer.succeed(None)
7283+        d.addCallback(lambda ignored:
7284+            mr.get_encoding_parameters())
7285+        def _check_encoding_parameters(params):
7286+            self.failUnlessEqual(len(params), 4)
7287+            k, n, segsize, datalen = params
7288+            self.failUnlessEqual(k, 3)
7289+            self.failUnlessEqual(n, 10)
7290+            self.failUnlessEqual(segsize, 0)
7291+            self.failUnlessEqual(datalen, 0)
7292+        d.addCallback(_check_encoding_parameters)
7293+
7294+        # It does not make sense to get a block in this format, so we
7295+        # should not be able to.
7296+        d.addCallback(lambda ignored:
7297+            self.shouldFail(LayoutInvalid, "get block on an empty file",
7298+                            None,
7299+                            mr.get_block_and_salt, 0))
7300+        return d
7301+
7302+
7303+    def test_verinfo_with_sdmf_file(self):
7304+        self.write_sdmf_share_to_server("si1")
7305+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7306+        # We should be able to get the version information.
7307+        d = defer.succeed(None)
7308+        d.addCallback(lambda ignored:
7309+            mr.get_verinfo())
7310+        def _check_verinfo(verinfo):
7311+            self.failUnless(verinfo)
7312+            self.failUnlessEqual(len(verinfo), 9)
7313+            (seqnum,
7314+             root_hash,
7315+             salt,
7316+             segsize,
7317+             datalen,
7318+             k,
7319+             n,
7320+             prefix,
7321+             offsets) = verinfo
7322+            self.failUnlessEqual(seqnum, 0)
7323+            self.failUnlessEqual(root_hash, self.root_hash)
7324+            self.failUnlessEqual(salt, self.salt)
7325+            self.failUnlessEqual(segsize, 36)
7326+            self.failUnlessEqual(datalen, 36)
7327+            self.failUnlessEqual(k, 3)
7328+            self.failUnlessEqual(n, 10)
7329+            expected_prefix = struct.pack(">BQ32s16s BBQQ",
7330+                                          0,
7331+                                          seqnum,
7332+                                          root_hash,
7333+                                          salt,
7334+                                          k,
7335+                                          n,
7336+                                          segsize,
7337+                                          datalen)
7338+            self.failUnlessEqual(prefix, expected_prefix)
7339+            self.failUnlessEqual(offsets, self.offsets)
7340+        d.addCallback(_check_verinfo)
7341+        return d
7342+
7343+
7344+    def test_verinfo_with_mdmf_file(self):
7345+        self.write_test_share_to_server("si1")
7346+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7347+        d = defer.succeed(None)
7348+        d.addCallback(lambda ignored:
7349+            mr.get_verinfo())
7350+        def _check_verinfo(verinfo):
7351+            self.failUnless(verinfo)
7352+            self.failUnlessEqual(len(verinfo), 9)
7353+            (seqnum,
7354+             root_hash,
7355+             IV,
7356+             segsize,
7357+             datalen,
7358+             k,
7359+             n,
7360+             prefix,
7361+             offsets) = verinfo
7362+            self.failUnlessEqual(seqnum, 0)
7363+            self.failUnlessEqual(root_hash, self.root_hash)
7364+            self.failIf(IV)
7365+            self.failUnlessEqual(segsize, 6)
7366+            self.failUnlessEqual(datalen, 36)
7367+            self.failUnlessEqual(k, 3)
7368+            self.failUnlessEqual(n, 10)
7369+            expected_prefix = struct.pack(">BQ32s BBQQ",
7370+                                          1,
7371+                                          seqnum,
7372+                                          root_hash,
7373+                                          k,
7374+                                          n,
7375+                                          segsize,
7376+                                          datalen)
7377+            self.failUnlessEqual(prefix, expected_prefix)
7378+            self.failUnlessEqual(offsets, self.offsets)
7379+        d.addCallback(_check_verinfo)
7380+        return d
7381+
7382+
7383+    def test_reader_queue(self):
7384+        self.write_test_share_to_server('si1')
7385+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7386+        d1 = mr.get_block_and_salt(0, queue=True)
7387+        d2 = mr.get_blockhashes(queue=True)
7388+        d3 = mr.get_sharehashes(queue=True)
7389+        d4 = mr.get_signature(queue=True)
7390+        d5 = mr.get_verification_key(queue=True)
7391+        dl = defer.DeferredList([d1, d2, d3, d4, d5])
7392+        mr.flush()
7393+        def _print(results):
7394+            self.failUnlessEqual(len(results), 5)
7395+            # We have one read for version information and offsets, and
7396+            # one for everything else.
7397+            self.failUnlessEqual(self.rref.read_count, 2)
7398+            block, salt = results[0][1] # results[0][0] is a boolean that says
7399+                                           # whether or not the operation
7400+                                           # worked.
7401+            self.failUnlessEqual(self.block, block)
7402+            self.failUnlessEqual(self.salt, salt)
7403+
7404+            blockhashes = results[1][1]
7405+            self.failUnlessEqual(self.block_hash_tree, blockhashes)
7406+
7407+            sharehashes = results[2][1]
7408+            self.failUnlessEqual(self.share_hash_chain, sharehashes)
7409+
7410+            signature = results[3][1]
7411+            self.failUnlessEqual(self.signature, signature)
7412+
7413+            verification_key = results[4][1]
7414+            self.failUnlessEqual(self.verification_key, verification_key)
7415+        dl.addCallback(_print)
7416+        return dl
7417+
7418+
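A sketch of the queueing pattern that makes the two-read assertion above possible: queued requests are coalesced into a single batched readv per flush. This is illustrative only, not the proxy's actual implementation:

    class ReadBatcher:
        def __init__(self, read):
            self._read = read      # callable taking a list of (offset, length)
            self._pending = []     # (offset, length, callback) triples

        def queue(self, offset, length, callback):
            self._pending.append((offset, length, callback))

        def flush(self):
            readv = [(o, l) for (o, l, _cb) in self._pending]
            results = self._read(readv)    # one remote round trip
            for (_o, _l, cb), data in zip(self._pending, results):
                cb(data)
            self._pending = []

In the test, the first remote read fetches the version information and offset table, and the flush of the five queued requests accounts for the second.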
7419+    def test_sdmf_writer(self):
7420+        # Go through the motions of writing an SDMF share to the storage
7421+        # server. Then read the storage server to see that the share got
7422+        # written in the way that we think it should have.
7423+
7424+        # We do this first so that the necessary instance variables get
7425+        # set the way we want them for the tests below.
7426+        data = self.build_test_sdmf_share()
7427+        sdmfr = SDMFSlotWriteProxy(0,
7428+                                   self.rref,
7429+                                   "si1",
7430+                                   self.secrets,
7431+                                   0, 3, 10, 36, 36)
7432+        # Put the block and salt.
7433+        sdmfr.put_block(self.blockdata, 0, self.salt)
7434+
7435+        # Put the encprivkey
7436+        sdmfr.put_encprivkey(self.encprivkey)
7437+
7438+        # Put the block and share hash chains
7439+        sdmfr.put_blockhashes(self.block_hash_tree)
7440+        sdmfr.put_sharehashes(self.share_hash_chain)
7441+        sdmfr.put_root_hash(self.root_hash)
7442+
7443+        # Put the signature
7444+        sdmfr.put_signature(self.signature)
7445+
7446+        # Put the verification key
7447+        sdmfr.put_verification_key(self.verification_key)
7448+
7449+        # Now check to make sure that nothing has been written yet.
7450+        self.failUnlessEqual(self.rref.write_count, 0)
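+        # (The put_* calls above only buffer data in the proxy;
+        #  finish_publishing is what actually contacts the server,
+        #  which is why write_count goes from 0 here to 1 below.)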
7451+
7452+        # Now finish publishing
7453+        d = sdmfr.finish_publishing()
7454+        def _then(ignored):
7455+            self.failUnlessEqual(self.rref.write_count, 1)
7456+            read = self.ss.remote_slot_readv
7457+            self.failUnlessEqual(read("si1", [0], [(0, len(data))]),
7458+                                 {0: [data]})
7459+        d.addCallback(_then)
7460+        return d
7461+
7462+
7463+    def test_sdmf_writer_preexisting_share(self):
7464+        data = self.build_test_sdmf_share()
7465+        self.write_sdmf_share_to_server("si1")
7466+
7467+        # Now there is a share on the storage server. To successfully
7468+        # write, we need to set the checkstring correctly. When we
7469+        # don't, no write should occur.
7470+        sdmfw = SDMFSlotWriteProxy(0,
7471+                                   self.rref,
7472+                                   "si1",
7473+                                   self.secrets,
7474+                                   1, 3, 10, 36, 36)
7475+        sdmfw.put_block(self.blockdata, 0, self.salt)
7476+
7477+        # Put the encprivkey
7478+        sdmfw.put_encprivkey(self.encprivkey)
7479+
7480+        # Put the block and share hash chains
7481+        sdmfw.put_blockhashes(self.block_hash_tree)
7482+        sdmfw.put_sharehashes(self.share_hash_chain)
7483+
7484+        # Put the root hash
7485+        sdmfw.put_root_hash(self.root_hash)
7486+
7487+        # Put the signature
7488+        sdmfw.put_signature(self.signature)
7489+
7490+        # Put the verification key
7491+        sdmfw.put_verification_key(self.verification_key)
7492+
7493+        # We shouldn't have a checkstring yet
7494+        self.failUnlessEqual(sdmfw.get_checkstring(), "")
7495+
7496+        d = sdmfw.finish_publishing()
7497+        def _then(results):
7498+            self.failIf(results[0])
7499+            # this is the correct checkstring
7500+            self._expected_checkstring = results[1][0][0]
7501+            return self._expected_checkstring
7502+
7503+        d.addCallback(_then)
7504+        d.addCallback(sdmfw.set_checkstring)
7505+        d.addCallback(lambda ignored:
7506+            sdmfw.get_checkstring())
7507+        d.addCallback(lambda checkstring:
7508+            self.failUnlessEqual(checkstring, self._expected_checkstring))
7509+        d.addCallback(lambda ignored:
7510+            sdmfw.finish_publishing())
7511+        def _then_again(results):
7512+            self.failUnless(results[0])
7513+            read = self.ss.remote_slot_readv
7514+            self.failUnlessEqual(read("si1", [0], [(1, 8)]),
7515+                                 {0: [struct.pack(">Q", 1)]})
7516+            self.failUnlessEqual(read("si1", [0], [(9, len(data) - 9)]),
7517+                                 {0: [data[9:]]})
7518+        d.addCallback(_then_again)
7519+        return d
7520+
7521+
7522 class Stats(unittest.TestCase):
7523 
7524     def setUp(self):
7525}
7526[mutable/retrieve.py: Modify the retrieval process to support MDMF
7527Kevan Carstensen <kevan@isnotajoke.com>**20100819003409
7528 Ignore-this: c03f4e41aaa0366a9bf44847f2caf9db
7529 
7530 The logic behind a mutable file download had to be adapted to work with
7531 segmented mutable files; this patch performs those adaptations. It also
7532 exposes some decoding and decrypting functionality to make partial-file
7533 updates a little easier, and supports efficient random-access downloads
7534 of parts of an MDMF file.
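 
 As a minimal illustration of the random-access path (the Retrieve
 API below is from this patch; the Collector class is a hypothetical
 stand-in for a real IConsumer, and filenode/servermap/verinfo are
 assumed to come from a prior servermap update):
 
     from zope.interface import implements
     from twisted.internet.interfaces import IConsumer
 
     class Collector:
         implements(IConsumer)
         def __init__(self):
             self.chunks = []
         def registerProducer(self, producer, streaming):
             self.producer = producer
         def unregisterProducer(self):
             self.producer = None
         def write(self, data):
             self.chunks.append(data)
 
     c = Collector()
     r = Retrieve(filenode, servermap, verinfo)
     d = r.download(c, offset=10, size=18)  # only bytes 10..27
     d.addCallback(lambda ign: "".join(c.chunks))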
7535] {
7536hunk ./src/allmydata/mutable/retrieve.py 2
7537 
7538-import struct, time
7539+import time
7540 from itertools import count
7541 from zope.interface import implements
7542 from twisted.internet import defer
7543merger 0.0 (
7544hunk ./src/allmydata/mutable/retrieve.py 10
7545+from allmydata.util.dictutil import DictOfSets
7546hunk ./src/allmydata/mutable/retrieve.py 7
7547-from foolscap.api import DeadReferenceError, eventually, fireEventually
7548-from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError
7549-from allmydata.util import hashutil, idlib, log
7550+from twisted.internet.interfaces import IPushProducer, IConsumer
7551+from foolscap.api import eventually, fireEventually
7552+from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError, \
7553+                                 MDMF_VERSION, SDMF_VERSION
7554+from allmydata.util import hashutil, log, mathutil
7555)
7556hunk ./src/allmydata/mutable/retrieve.py 16
7557 from pycryptopp.publickey import rsa
7558 
7559 from allmydata.mutable.common import CorruptShareError, UncoordinatedWriteError
7560-from allmydata.mutable.layout import SIGNED_PREFIX, unpack_share_data
7561+from allmydata.mutable.layout import MDMFSlotReadProxy
7562 
7563 class RetrieveStatus:
7564     implements(IRetrieveStatus)
7565hunk ./src/allmydata/mutable/retrieve.py 83
7566     # times, and each will have a separate response chain. However the
7567     # Retrieve object will remain tied to a specific version of the file, and
7568     # will use a single ServerMap instance.
7569+    implements(IPushProducer)
7570 
7571hunk ./src/allmydata/mutable/retrieve.py 85
7572-    def __init__(self, filenode, servermap, verinfo, fetch_privkey=False):
7573+    def __init__(self, filenode, servermap, verinfo, fetch_privkey=False,
7574+                 verify=False):
7575         self._node = filenode
7576         assert self._node.get_pubkey()
7577         self._storage_index = filenode.get_storage_index()
7578hunk ./src/allmydata/mutable/retrieve.py 104
7579         self.verinfo = verinfo
7580         # during repair, we may be called upon to grab the private key, since
7581         # it wasn't picked up during a verify=False checker run, and we'll
7582-        # need it for repair to generate the a new version.
7583-        self._need_privkey = fetch_privkey
7584-        if self._node.get_privkey():
7585+        # need it for repair to generate a new version.
7586+        self._need_privkey = fetch_privkey or verify
7587+        if self._node.get_privkey() and not verify:
7588             self._need_privkey = False
7589 
7590hunk ./src/allmydata/mutable/retrieve.py 109
7591+        if self._need_privkey:
7592+            # TODO: Evaluate the need for this. We'll use it if we want
7593+            # to limit how many queries are on the wire for the privkey
7594+            # at once.
7595+            self._privkey_query_markers = [] # one Marker for each time we've
7596+                                             # tried to get the privkey.
7597+
7598+        # verify means that we are using the downloader logic to verify all
7599+        # of our shares. This tells the downloader a few things.
7600+        #
7601+        # 1. We need to download all of the shares.
7602+        # 2. We don't need to decode or decrypt the shares, since our
7603+        #    caller doesn't care about the plaintext, only the
7604+        #    information about which shares are or are not valid.
7605+        # 3. When we are validating readers, we need to validate the
7606+        #    signature on the prefix. (Do we? The servermap update
7607+        #    already does this.)
7608+        self._verify = bool(verify)
7611+
7612         self._status = RetrieveStatus()
7613         self._status.set_storage_index(self._storage_index)
7614         self._status.set_helper(False)
7615hunk ./src/allmydata/mutable/retrieve.py 139
7616          offsets_tuple) = self.verinfo
7617         self._status.set_size(datalength)
7618         self._status.set_encoding(k, N)
7619+        self.readers = {}
7620+        self._paused = False
7621+        self._pause_deferred = None
7622+        self._offset = None
7623+        self._read_length = None
7624+        self.log("got seqnum %d" % self.verinfo[0])
7625+
7626 
7627     def get_status(self):
7628         return self._status
7629hunk ./src/allmydata/mutable/retrieve.py 157
7630             kwargs["facility"] = "tahoe.mutable.retrieve"
7631         return log.msg(*args, **kwargs)
7632 
7633-    def download(self):
7634+
7635+    ###################
7636+    # IPushProducer
7637+
7638+    def pauseProducing(self):
7639+        """
7640+        I am called by my download target if we have produced too much
7641+        data for it to handle. I make the downloader stop producing new
7642+        data until my resumeProducing method is called.
7643+        """
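+        # (Flow sketch, descriptive only: while self._paused is set,
+        #  each pending consumer write parks on self._pause_deferred in
+        #  _check_for_paused; resumeProducing fires that deferred and
+        #  the parked writes proceed.)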
7644+        if self._paused:
7645+            return
7646+
7647+        # fired when the download is unpaused.
7648+        self._old_status = self._status.get_status()
7649+        self._status.set_status("Paused")
7650+
7651+        self._pause_deferred = defer.Deferred()
7652+        self._paused = True
7653+
7654+
7655+    def resumeProducing(self):
7656+        """
7657+        I am called by my download target once it is ready to begin
7658+        receiving data again.
7659+        """
7660+        if not self._paused:
7661+            return
7662+
7663+        self._paused = False
7664+        p = self._pause_deferred
7665+        self._pause_deferred = None
7666+        self._status.set_status(self._old_status)
7667+
7668+        eventually(p.callback, None)
7669+
7670+
7671+    def _check_for_paused(self, res):
7672+        """
7673+        I am called just before a write to the consumer. I return a
7674+        Deferred that eventually fires with the data that is to be
7675+        written to the consumer. If the download has not been paused,
7676+        the Deferred fires immediately. Otherwise, the Deferred fires
7677+        when the downloader is unpaused.
7678+        """
7679+        if self._paused:
7680+            d = defer.Deferred()
7681+            self._pause_deferred.addCallback(lambda ignored: d.callback(res))
7682+            return d
7683+        return defer.succeed(res)
7684+
7685+
7686+    def download(self, consumer=None, offset=0, size=None):
7687+        assert IConsumer.providedBy(consumer) or self._verify
7688+
7689+        if consumer:
7690+            self._consumer = consumer
7691+            # we provide IPushProducer, so streaming=True, per
7692+            # IConsumer.
7693+            self._consumer.registerProducer(self, streaming=True)
7694+
7695         self._done_deferred = defer.Deferred()
7696         self._started = time.time()
7697         self._status.set_status("Retrieving Shares")
7698hunk ./src/allmydata/mutable/retrieve.py 222
7699 
7700+        self._offset = offset
7701+        self._read_length = size
7702+
7703         # first, which servers can we use?
7704         versionmap = self.servermap.make_versionmap()
7705         shares = versionmap[self.verinfo]
7706hunk ./src/allmydata/mutable/retrieve.py 232
7707         self.remaining_sharemap = DictOfSets()
7708         for (shnum, peerid, timestamp) in shares:
7709             self.remaining_sharemap.add(shnum, peerid)
7710+            # If the servermap update fetched anything, it fetched at least 1
7711+            # KiB, so we ask for that much.
7712+            # TODO: Change the cache methods to allow us to fetch all of the
7713+            # data that they have, then change this method to do that.
7714+            any_cache, timestamp = self._node._read_from_cache(self.verinfo,
7715+                                                               shnum,
7716+                                                               0,
7717+                                                               1000)
7718+            ss = self.servermap.connections[peerid]
7719+            reader = MDMFSlotReadProxy(ss,
7720+                                       self._storage_index,
7721+                                       shnum,
7722+                                       any_cache)
7723+            reader.peerid = peerid
7724+            self.readers[shnum] = reader
7725+
7726 
7727         self.shares = {} # maps shnum to validated blocks
7728hunk ./src/allmydata/mutable/retrieve.py 250
7729+        self._active_readers = [] # list of active readers for this dl.
7730+        self._validated_readers = set() # set of readers that we have
7731+                                        # validated the prefix of
7732+        self._block_hash_trees = {} # shnum => hashtree
7733 
7734         # how many shares do we need?
7735hunk ./src/allmydata/mutable/retrieve.py 256
7736-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
7737+        (seqnum,
7738+         root_hash,
7739+         IV,
7740+         segsize,
7741+         datalength,
7742+         k,
7743+         N,
7744+         prefix,
7745          offsets_tuple) = self.verinfo
7746hunk ./src/allmydata/mutable/retrieve.py 265
7747-        assert len(self.remaining_sharemap) >= k
7748-        # we start with the lowest shnums we have available, since FEC is
7749-        # faster if we're using "primary shares"
7750-        self.active_shnums = set(sorted(self.remaining_sharemap.keys())[:k])
7751-        for shnum in self.active_shnums:
7752-            # we use an arbitrary peer who has the share. If shares are
7753-            # doubled up (more than one share per peer), we could make this
7754-            # run faster by spreading the load among multiple peers. But the
7755-            # algorithm to do that is more complicated than I want to write
7756-            # right now, and a well-provisioned grid shouldn't have multiple
7757-            # shares per peer.
7758-            peerid = list(self.remaining_sharemap[shnum])[0]
7759-            self.get_data(shnum, peerid)
7760 
7761hunk ./src/allmydata/mutable/retrieve.py 266
7762-        # control flow beyond this point: state machine. Receiving responses
7763-        # from queries is the input. We might send out more queries, or we
7764-        # might produce a result.
7765 
7766hunk ./src/allmydata/mutable/retrieve.py 267
7767+        # We need one share hash tree for the entire file; its leaves
7768+        # are the roots of the block hash trees for the shares that
7769+        # comprise it, and its root is in the verinfo.
7770+        self.share_hash_tree = hashtree.IncompleteHashTree(N)
7771+        self.share_hash_tree.set_hashes({0: root_hash})
7772+
7773+        # This will set up both the segment decoder and the tail segment
7774+        # decoder, as well as a variety of other instance variables that
7775+        # the download process will use.
7776+        self._setup_encoding_parameters()
7777+        assert len(self.remaining_sharemap) >= k
7778+
7779+        self.log("starting download")
7780+        self._paused = False
7781+        self._started_fetching = time.time()
7782+
7783+        self._add_active_peers()
7784+        # The download process beyond this is a state machine.
7785+        # _add_active_peers will select the peers that we want to use
7786+        # for the download, and then attempt to start downloading. After
7787+        # each segment, it will check for doneness, reacting to broken
7788+        # peers and corrupt shares as necessary. If it runs out of good
7789+        # peers before downloading all of the segments, _done_deferred
7790+        # will errback.  Otherwise, it will eventually callback with the
7791+        # contents of the mutable file.
7792         return self._done_deferred
7793 
7794hunk ./src/allmydata/mutable/retrieve.py 294
7795-    def get_data(self, shnum, peerid):
7796-        self.log(format="sending sh#%(shnum)d request to [%(peerid)s]",
7797-                 shnum=shnum,
7798-                 peerid=idlib.shortnodeid_b2a(peerid),
7799-                 level=log.NOISY)
7800-        ss = self.servermap.connections[peerid]
7801-        started = time.time()
7802-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
7803+
7804+    def decode(self, blocks_and_salts, segnum):
7805+        """
7806+        I am a helper method that the mutable file update process uses
7807+        as a shortcut to decode and decrypt the segments that it needs
7808+        to fetch in order to perform a file update. I take in a
7809+        collection of blocks and salts, and pick some of those to make a
7810+        segment with. I return the plaintext associated with that
7811+        segment.
7812+        """
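+        # (Hypothetical use by the partial-update code, assuming it has
+        #  already fetched {shnum: (block, salt)} for segment segnum:
+        #      d = retrieve.decode(blocks_and_salts, segnum)
+        #      d.addCallback(lambda plaintext: ...)
+        #  where retrieve is a Retrieve for the version being updated.)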
7813+        # shnum => block hash tree. Unused, but _setup_encoding_parameters will
7814+        # want to set this.
7815+        # XXX: Make it so that it won't set this if we're just decoding.
7816+        self._block_hash_trees = {}
7817+        self._setup_encoding_parameters()
7818+        # This is the form expected by decode.
7819+        blocks_and_salts = blocks_and_salts.items()
7820+        blocks_and_salts = [(True, [d]) for d in blocks_and_salts]
7821+
7822+        d = self._decode_blocks(blocks_and_salts, segnum)
7823+        d.addCallback(self._decrypt_segment)
7824+        return d
7825+
7826+
7827+    def _setup_encoding_parameters(self):
7828+        """
7829+        I set up the encoding parameters, including k, n, the number
7830+        of segments associated with this file, and the segment decoder.
7831+        """
7832+        (seqnum,
7833+         root_hash,
7834+         IV,
7835+         segsize,
7836+         datalength,
7837+         k,
7838+         n,
7839+         known_prefix,
7840          offsets_tuple) = self.verinfo
7841hunk ./src/allmydata/mutable/retrieve.py 332
7842-        offsets = dict(offsets_tuple)
7843+        self._required_shares = k
7844+        self._total_shares = n
7845+        self._segment_size = segsize
7846+        self._data_length = datalength
7847 
7848hunk ./src/allmydata/mutable/retrieve.py 337
7849-        # we read the checkstring, to make sure that the data we grab is from
7850-        # the right version.
7851-        readv = [ (0, struct.calcsize(SIGNED_PREFIX)) ]
7852+        if not IV:
7853+            self._version = MDMF_VERSION
7854+        else:
7855+            self._version = SDMF_VERSION
7856 
7857hunk ./src/allmydata/mutable/retrieve.py 342
7858-        # We also read the data, and the hashes necessary to validate them
7859-        # (share_hash_chain, block_hash_tree, share_data). We don't read the
7860-        # signature or the pubkey, since that was handled during the
7861-        # servermap phase, and we'll be comparing the share hash chain
7862-        # against the roothash that was validated back then.
7863+        if datalength and segsize:
7864+            self._num_segments = mathutil.div_ceil(datalength, segsize)
7865+            self._tail_data_size = datalength % segsize
7866+        else:
7867+            self._num_segments = 0
7868+            self._tail_data_size = 0
7869 
7870hunk ./src/allmydata/mutable/retrieve.py 349
7871-        readv.append( (offsets['share_hash_chain'],
7872-                       offsets['enc_privkey'] - offsets['share_hash_chain'] ) )
7873+        self._segment_decoder = codec.CRSDecoder()
7874+        self._segment_decoder.set_params(segsize, k, n)
7875 
7876hunk ./src/allmydata/mutable/retrieve.py 352
7877-        # if we need the private key (for repair), we also fetch that
7878-        if self._need_privkey:
7879-            readv.append( (offsets['enc_privkey'],
7880-                           offsets['EOF'] - offsets['enc_privkey']) )
7881+        if not self._tail_data_size:
7882+            self._tail_data_size = segsize
7883+
7884+        self._tail_segment_size = mathutil.next_multiple(self._tail_data_size,
7885+                                                         self._required_shares)
7886+        if self._tail_segment_size == self._segment_size:
7887+            self._tail_decoder = self._segment_decoder
7888+        else:
7889+            self._tail_decoder = codec.CRSDecoder()
7890+            self._tail_decoder.set_params(self._tail_segment_size,
7891+                                          self._required_shares,
7892+                                          self._total_shares)
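+        # (Worked example, using the parameters the tests above use:
+        #  datalength=36, segsize=6 => div_ceil(36, 6) = 6 segments;
+        #  36 % 6 == 0, so the tail holds a full segment and the tail
+        #  decoder is just the segment decoder.)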
7893 
7894hunk ./src/allmydata/mutable/retrieve.py 365
7895-        m = Marker()
7896-        self._outstanding_queries[m] = (peerid, shnum, started)
7897+        self.log("got encoding parameters: "
7898+                 "k: %d "
7899+                 "n: %d "
7900+                 "%d segments of %d bytes each (%d byte tail segment)" % \
7901+                 (k, n, self._num_segments, self._segment_size,
7902+                  self._tail_segment_size))
7903 
7904         # ask the cache first
7905         got_from_cache = False
7906merger 0.0 (
7907hunk ./src/allmydata/mutable/retrieve.py 376
7908-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
7909-                                                            offset, length)
7910+            data = self._node._read_from_cache(self.verinfo, shnum, offset, length)
7911hunk ./src/allmydata/mutable/retrieve.py 372
7912-        # ask the cache first
7913-        got_from_cache = False
7914-        datavs = []
7915-        for (offset, length) in readv:
7916-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
7917-                                                            offset, length)
7918-            if data is not None:
7919-                datavs.append(data)
7920-        if len(datavs) == len(readv):
7921-            self.log("got data from cache")
7922-            got_from_cache = True
7923-            d = fireEventually({shnum: datavs})
7924-            # datavs is a dict mapping shnum to a pair of strings
7925+        for i in xrange(self._total_shares):
7926+            # So we don't have to do this later.
7927+            self._block_hash_trees[i] = hashtree.IncompleteHashTree(self._num_segments)
7928+
7929+        # Our last task is to tell the downloader where to start and
7930+        # where to stop. We use three parameters for that:
7931+        #   - self._start_segment: the segment that we need to start
7932+        #     downloading from.
7933+        #   - self._current_segment: the next segment that we need to
7934+        #     download.
7935+        #   - self._last_segment: The last segment that we were asked to
7936+        #     download.
7937+        #
7938+        #  We say that the download is complete when
7939+        #  self._current_segment > self._last_segment. We use
7940+        #  self._start_segment and self._last_segment to know when to
7941+        #  strip things off of segments, and how much to strip.
7942+        if self._offset:
7943+            self.log("got offset: %d" % self._offset)
7944+            # our start segment is the first segment containing the
7945+            # offset we were given.
7946+            start = self._offset // self._segment_size
7947+            # integer division gives the zero-based index of the
7948+            # segment containing self._offset; div_ceil followed by a
7949+            # subtraction would miscount when the offset falls exactly
7950+            # on a segment boundary.
7951+
7952+            assert start < self._num_segments
7953+            self._start_segment = start
7954+            self.log("got start segment: %d" % self._start_segment)
7955)
7956hunk ./src/allmydata/mutable/retrieve.py 386
7957             d = fireEventually({shnum: datavs})
7958             # datavs is a dict mapping shnum to a pair of strings
7959         else:
7960-            d = self._do_read(ss, peerid, self._storage_index, [shnum], readv)
7961-        self.remaining_sharemap.discard(shnum, peerid)
7962+            self._start_segment = 0
7963 
7964hunk ./src/allmydata/mutable/retrieve.py 388
7965-        d.addCallback(self._got_results, m, peerid, started, got_from_cache)
7966-        d.addErrback(self._query_failed, m, peerid)
7967-        # errors that aren't handled by _query_failed (and errors caused by
7968-        # _query_failed) get logged, but we still want to check for doneness.
7969-        def _oops(f):
7970-            self.log(format="problem in _query_failed for sh#%(shnum)d to %(peerid)s",
7971-                     shnum=shnum,
7972-                     peerid=idlib.shortnodeid_b2a(peerid),
7973-                     failure=f,
7974-                     level=log.WEIRD, umid="W0xnQA")
7975-        d.addErrback(_oops)
7976-        d.addBoth(self._check_for_done)
7977-        # any error during _check_for_done means the download fails. If the
7978-        # download is successful, _check_for_done will fire _done by itself.
7979-        d.addErrback(self._done)
7980-        d.addErrback(log.err)
7981-        return d # purely for testing convenience
7982 
7983hunk ./src/allmydata/mutable/retrieve.py 389
7984-    def _do_read(self, ss, peerid, storage_index, shnums, readv):
7985-        # isolate the callRemote to a separate method, so tests can subclass
7986-        # Publish and override it
7987-        d = ss.callRemote("slot_readv", storage_index, shnums, readv)
7988-        return d
7989+        if self._read_length:
7990+            # our end segment is the last segment containing part of the
7991+            # segment that we were asked to read.
7992+            self.log("got read length %d" % self._read_length)
7993+            end_data = self._offset + self._read_length
7994+            end = mathutil.div_ceil(end_data,
7995+                                    self._segment_size)
7996+            end -= 1
7997+            assert end < self._num_segments
7998+            self._last_segment = end
7999+            self.log("got end segment: %d" % self._last_segment)
8000+        else:
8001+            self._last_segment = self._num_segments - 1
8002 
8003hunk ./src/allmydata/mutable/retrieve.py 403
8004-    def remove_peer(self, peerid):
8005-        for shnum in list(self.remaining_sharemap.keys()):
8006-            self.remaining_sharemap.discard(shnum, peerid)
8007+        self._current_segment = self._start_segment
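+        # (Example, assumed values: offset=10, size=18, segsize=6 =>
+        #  start segment = 1; end_data = 28, so the last segment is
+        #  div_ceil(28, 6) - 1 = 4. _set_segment strips 10 % 6 = 4
+        #  bytes from the front of segment 1 and keeps 4 bytes of
+        #  segment 4, yielding exactly bytes 10..27.)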
8008 
8009hunk ./src/allmydata/mutable/retrieve.py 405
8010-    def _got_results(self, datavs, marker, peerid, started, got_from_cache):
8011-        now = time.time()
8012-        elapsed = now - started
8013-        if not got_from_cache:
8014-            self._status.add_fetch_timing(peerid, elapsed)
8015-        self.log(format="got results (%(shares)d shares) from [%(peerid)s]",
8016-                 shares=len(datavs),
8017-                 peerid=idlib.shortnodeid_b2a(peerid),
8018-                 level=log.NOISY)
8019-        self._outstanding_queries.pop(marker, None)
8020-        if not self._running:
8021-            return
8022+    def _add_active_peers(self):
8023+        """
8024+        I populate self._active_readers with enough active readers to
8025+        retrieve the contents of this mutable file. I am called before
8026+        downloading starts, and (eventually) after each validation
8027+        error, connection error, or other problem in the download.
8028+        """
8029+        # TODO: It would be cool to investigate other heuristics for
8030+        # reader selection. For instance, the cost (in time the user
8031+        # spends waiting for their file) of selecting a really slow peer
8032+        # that happens to have a primary share is probably more than
8033+        # selecting a really fast peer that doesn't have a primary
8034+        # share. Maybe the servermap could be extended to provide this
8035+        # information; it could keep track of latency information while
8036+        # it gathers more important data, and then this routine could
8037+        # use that to select active readers.
8038+        #
8039+        # (these and other questions would be easier to answer with a
8040+        #  robust, configurable tahoe-lafs simulator, which modeled node
8041+        #  failures, differences in node speed, and other characteristics
8042+        #  that we expect storage servers to have.  You could have
8043+        #  presets for really stable grids (like allmydata.com),
8044+        #  friendnets, make it easy to configure your own settings, and
8045+        #  then simulate the effect of big changes on these use cases
8046+        #  instead of just reasoning about what the effect might be. Out
8047+        #  of scope for MDMF, though.)
8048 
8049hunk ./src/allmydata/mutable/retrieve.py 432
8050-        # note that we only ask for a single share per query, so we only
8051-        # expect a single share back. On the other hand, we use the extra
8052-        # shares if we get them.. seems better than an assert().
8053+        # We need at least self._required_shares readers to download a
8054+        # segment.
8055+        if self._verify:
8056+            needed = self._total_shares
8057+        else:
8058+            needed = self._required_shares - len(self._active_readers)
8059+        # XXX: Why don't format= log messages work here?
8060+        self.log("adding %d peers to the active peers list" % needed)
8061 
8062hunk ./src/allmydata/mutable/retrieve.py 441
8063-        for shnum,datav in datavs.items():
8064-            (prefix, hash_and_data) = datav[:2]
8065-            try:
8066-                self._got_results_one_share(shnum, peerid,
8067-                                            prefix, hash_and_data)
8068-            except CorruptShareError, e:
8069-                # log it and give the other shares a chance to be processed
8070-                f = failure.Failure()
8071-                self.log(format="bad share: %(f_value)s",
8072-                         f_value=str(f.value), failure=f,
8073-                         level=log.WEIRD, umid="7fzWZw")
8074-                self.notify_server_corruption(peerid, shnum, str(e))
8075-                self.remove_peer(peerid)
8076-                self.servermap.mark_bad_share(peerid, shnum, prefix)
8077-                self._bad_shares.add( (peerid, shnum) )
8078-                self._status.problems[peerid] = f
8079-                self._last_failure = f
8080-                pass
8081-            if self._need_privkey and len(datav) > 2:
8082-                lp = None
8083-                self._try_to_validate_privkey(datav[2], peerid, shnum, lp)
8084-        # all done!
8085+        # We favor lower numbered shares, since FEC is faster with
8086+        # primary shares than with other shares, and lower-numbered
8087+        # shares are more likely to be primary than higher numbered
8088+        # shares.
8089+        active_shnums = set(self.remaining_sharemap.keys())
8090+        # We shouldn't consider adding shares that we already have; this
8091+        # will cause problems later.
8092+        active_shnums -= set([reader.shnum for reader in self._active_readers])
8093+        active_shnums = sorted(active_shnums)[:needed]
8094+        if len(active_shnums) < needed and not self._verify:
8095+            # We don't have enough readers to retrieve the file; fail.
8096+            return self._failed()
8097 
8098hunk ./src/allmydata/mutable/retrieve.py 454
8099-    def notify_server_corruption(self, peerid, shnum, reason):
8100-        ss = self.servermap.connections[peerid]
8101-        ss.callRemoteOnly("advise_corrupt_share",
8102-                          "mutable", self._storage_index, shnum, reason)
8103+        for shnum in active_shnums:
8104+            self._active_readers.append(self.readers[shnum])
8105+            self.log("added reader for share %d" % shnum)
8106+        assert len(self._active_readers) >= self._required_shares
8107+        # Conceptually, this is part of the _add_active_peers step. It
8108+        # validates the prefixes of newly added readers to make sure
8109+        # that they match what we are expecting for self.verinfo. If
8110+        # validation is successful, _validate_active_prefixes will call
8111+        # _download_current_segment for us. If validation is
8112+        # unsuccessful, then _validate_prefixes will remove the peer and
8113+        # call _add_active_peers again, where we will attempt to rectify
8114+        # the problem by choosing another peer.
8115+        return self._validate_active_prefixes()
8116 
8117hunk ./src/allmydata/mutable/retrieve.py 468
8118-    def _got_results_one_share(self, shnum, peerid,
8119-                               got_prefix, got_hash_and_data):
8120-        self.log("_got_results: got shnum #%d from peerid %s"
8121-                 % (shnum, idlib.shortnodeid_b2a(peerid)))
8122-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8123-         offsets_tuple) = self.verinfo
8124-        assert len(got_prefix) == len(prefix), (len(got_prefix), len(prefix))
8125-        if got_prefix != prefix:
8126-            msg = "someone wrote to the data since we read the servermap: prefix changed"
8127-            raise UncoordinatedWriteError(msg)
8128-        (share_hash_chain, block_hash_tree,
8129-         share_data) = unpack_share_data(self.verinfo, got_hash_and_data)
8130 
8131hunk ./src/allmydata/mutable/retrieve.py 469
8132-        assert isinstance(share_data, str)
8133-        # build the block hash tree. SDMF has only one leaf.
8134-        leaves = [hashutil.block_hash(share_data)]
8135-        t = hashtree.HashTree(leaves)
8136-        if list(t) != block_hash_tree:
8137-            raise CorruptShareError(peerid, shnum, "block hash tree failure")
8138-        share_hash_leaf = t[0]
8139-        t2 = hashtree.IncompleteHashTree(N)
8140-        # root_hash was checked by the signature
8141-        t2.set_hashes({0: root_hash})
8142-        try:
8143-            t2.set_hashes(hashes=share_hash_chain,
8144-                          leaves={shnum: share_hash_leaf})
8145-        except (hashtree.BadHashError, hashtree.NotEnoughHashesError,
8146-                IndexError), e:
8147-            msg = "corrupt hashes: %s" % (e,)
8148-            raise CorruptShareError(peerid, shnum, msg)
8149-        self.log(" data valid! len=%d" % len(share_data))
8150-        # each query comes down to this: placing validated share data into
8151-        # self.shares
8152-        self.shares[shnum] = share_data
8153+    def _validate_active_prefixes(self):
8154+        """
8155+        I check to make sure that the prefixes on the peers that I am
8156+        currently reading from match the prefix that we want to see, as
8157+        said in self.verinfo.
8158 
8159hunk ./src/allmydata/mutable/retrieve.py 475
8160-    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
8161+        If I find that all of the active peers have acceptable prefixes,
8162+        I pass control to _download_current_segment, which will use
8163+        those peers to do cool things. If I find that some of the active
8164+        peers have unacceptable prefixes, I will remove them from active
8165+        peers (and from further consideration) and call
8166+        _add_active_peers to attempt to rectify the situation. I keep
8167+        track of which peers I have already validated so that I don't
8168+        need to do so again.
8169+        """
8170+        assert self._active_readers, "No more active readers"
8171 
8172hunk ./src/allmydata/mutable/retrieve.py 486
8173-        alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
8174-        alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
8175-        if alleged_writekey != self._node.get_writekey():
8176-            self.log("invalid privkey from %s shnum %d" %
8177-                     (idlib.nodeid_b2a(peerid)[:8], shnum),
8178-                     parent=lp, level=log.WEIRD, umid="YIw4tA")
8179-            return
8180+        ds = []
8181+        new_readers = [r for r in self._active_readers
+                       if r not in self._validated_readers]
8182+        self.log('validating %d newly-added active readers' % len(new_readers))
8183 
8184hunk ./src/allmydata/mutable/retrieve.py 490
8185-        # it's good
8186-        self.log("got valid privkey from shnum %d on peerid %s" %
8187-                 (shnum, idlib.shortnodeid_b2a(peerid)),
8188-                 parent=lp)
8189-        privkey = rsa.create_signing_key_from_string(alleged_privkey_s)
8190-        self._node._populate_encprivkey(enc_privkey)
8191-        self._node._populate_privkey(privkey)
8192-        self._need_privkey = False
8193+        for reader in new_readers:
8194+            # We force a remote read here -- otherwise, we are relying
8195+            # on cached data that we already verified as valid, and we
8196+            # won't detect an uncoordinated write that has occurred
8197+            # since the last servermap update.
8198+            d = reader.get_prefix(force_remote=True)
8199+            d.addCallback(self._try_to_validate_prefix, reader)
8200+            ds.append(d)
8201+        dl = defer.DeferredList(ds, consumeErrors=True)
8202+        def _check_results(results):
8203+            # Each result in results will be of the form (success, msg).
8204+            # We don't care about msg, but success will tell us whether
8205+            # or not the checkstring validated. If it didn't, we need to
8206+            # remove the offending (peer,share) from our active readers,
8207+            # and ensure that active readers is again populated.
8208+            bad_readers = []
8209+            for i, result in enumerate(results):
8210+                if not result[0]:
8211+                    reader = new_readers[i]
8212+                    f = result[1]
8213+                    assert isinstance(f, failure.Failure)
8214 
8215hunk ./src/allmydata/mutable/retrieve.py 512
8216-    def _query_failed(self, f, marker, peerid):
8217-        self.log(format="query to [%(peerid)s] failed",
8218-                 peerid=idlib.shortnodeid_b2a(peerid),
8219-                 level=log.NOISY)
8220-        self._status.problems[peerid] = f
8221-        self._outstanding_queries.pop(marker, None)
8222-        if not self._running:
8223-            return
8224-        self._last_failure = f
8225-        self.remove_peer(peerid)
8226-        level = log.WEIRD
8227-        if f.check(DeadReferenceError):
8228-            level = log.UNUSUAL
8229-        self.log(format="error during query: %(f_value)s",
8230-                 f_value=str(f.value), failure=f, level=level, umid="gOJB5g")
8231+                    self.log("The reader %s failed to "
8232+                             "properly validate: %s" % \
8233+                             (reader, str(f.value)))
8234+                    bad_readers.append((reader, f))
8235+                else:
8236+                    reader = new_readers[i]
8237+                    self.log("the reader %s checks out, so we'll use it" % \
8238+                             reader)
8239+                    self._validated_readers.add(reader)
8240+                    # Each time we validate a reader, we check to see if
8241+                    # we need the private key. If we do, we politely ask
8242+                    # for it and then continue computing. If we find
8243+                    # that we haven't gotten it at the end of
8244+                    # segment decoding, then we'll take more drastic
8245+                    # measures.
8246+                    if self._need_privkey and not self._node.is_readonly():
8247+                        d = reader.get_encprivkey()
8248+                        d.addCallback(self._try_to_validate_privkey, reader)
8249+            if bad_readers:
8250+                # We do them all at once, or else we screw up list indexing.
8251+                for (reader, f) in bad_readers:
8252+                    self._mark_bad_share(reader, f)
8253+                if self._verify:
8254+                    if len(self._active_readers) >= self._required_shares:
8255+                        return self._download_current_segment()
8256+                    else:
8257+                        return self._failed()
8258+                else:
8259+                    return self._add_active_peers()
8260+            else:
8261+                return self._download_current_segment()
8262+            # (Both branches above return; _add_active_peers will
8263+            # assert that it has enough active readers once the bad
8264+            # ones have been removed.)
8264+        dl.addCallback(_check_results)
8265+        return dl
8266 
8267hunk ./src/allmydata/mutable/retrieve.py 548
8268-    def _check_for_done(self, res):
8269-        # exit paths:
8270-        #  return : keep waiting, no new queries
8271-        #  return self._send_more_queries(outstanding) : send some more queries
8272-        #  fire self._done(plaintext) : download successful
8273-        #  raise exception : download fails
8274 
8275hunk ./src/allmydata/mutable/retrieve.py 549
8276-        self.log(format="_check_for_done: running=%(running)s, decoding=%(decoding)s",
8277-                 running=self._running, decoding=self._decoding,
8278-                 level=log.NOISY)
8279-        if not self._running:
8280-            return
8281-        if self._decoding:
8282-            return
8283-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8284+    def _try_to_validate_prefix(self, prefix, reader):
8285+        """
8286+        I check that the prefix returned by a candidate server for
8287+        retrieval matches the prefix that the servermap knows about
8288+        (and, hence, the prefix that was validated earlier). If it does,
8289+        I return True, which means that I approve of the use of the
8290+        candidate server for segment retrieval. If it doesn't, I return
8291+        False, which means that another server must be chosen.
8292+        """
8293+        (seqnum,
8294+         root_hash,
8295+         IV,
8296+         segsize,
8297+         datalength,
8298+         k,
8299+         N,
8300+         known_prefix,
8301          offsets_tuple) = self.verinfo
8302hunk ./src/allmydata/mutable/retrieve.py 567
8303+        if known_prefix != prefix:
8304+            self.log("prefix from share %d doesn't match" % reader.shnum)
8305+            raise UncoordinatedWriteError("Mismatched prefix -- this could "
8306+                                          "indicate an uncoordinated write")
8307+        # Otherwise, we're okay -- no issues.
8308 
8309hunk ./src/allmydata/mutable/retrieve.py 573
8310-        if len(self.shares) < k:
8311-            # we don't have enough shares yet
8312-            return self._maybe_send_more_queries(k)
8313-        if self._need_privkey:
8314-            # we got k shares, but none of them had a valid privkey. TODO:
8315-            # look further. Adding code to do this is a bit complicated, and
8316-            # I want to avoid that complication, and this should be pretty
8317-            # rare (k shares with bitflips in the enc_privkey but not in the
8318-            # data blocks). If we actually do get here, the subsequent repair
8319-            # will fail for lack of a privkey.
8320-            self.log("got k shares but still need_privkey, bummer",
8321-                     level=log.WEIRD, umid="MdRHPA")
8322 
8323hunk ./src/allmydata/mutable/retrieve.py 574
8324-        # we have enough to finish. All the shares have had their hashes
8325-        # checked, so if something fails at this point, we don't know how
8326-        # to fix it, so the download will fail.
8327+    def _remove_reader(self, reader):
8328+        """
8329+        At various points, we will wish to remove a peer from
8330+        consideration and/or use. These include, but are not necessarily
8331+        limited to:
8332 
8333hunk ./src/allmydata/mutable/retrieve.py 580
8334-        self._decoding = True # avoid reentrancy
8335-        self._status.set_status("decoding")
8336-        now = time.time()
8337-        elapsed = now - self._started
8338-        self._status.timings["fetch"] = elapsed
8339+            - A connection error.
8340+            - A mismatched prefix (that is, a prefix that does not match
8341+              our conception of the version information string).
8342+            - A failing block hash, salt hash, or share hash, which can
8343+              indicate disk failure/bit flips, or network trouble.
8344 
8345hunk ./src/allmydata/mutable/retrieve.py 586
8346-        d = defer.maybeDeferred(self._decode)
8347-        d.addCallback(self._decrypt, IV, self._node.get_readkey())
8348-        d.addBoth(self._done)
8349-        return d # purely for test convenience
8350+        This method will do that. I will make sure that the
8351+        (peerid, shnum) combination represented by my reader argument is
8352+        not used for anything else during this download. I will not
8353+        advise the reader of any corruption, something that my callers
8354+        may wish to do on their own.
8355+        """
8356+        # TODO: When you're done writing this, see if this is ever
8357+        # actually used for something that _mark_bad_share isn't. I have
8358+        # a feeling that they will be used for very similar things, and
8359+        # that having them both here is just going to be an epic amount
8360+        # of code duplication.
8361+        #
8362+        # (well, okay, not epic, but meaningful)
8363+        self.log("removing reader %s" % reader)
8364+        # Remove the reader from _active_readers
8365+        self._active_readers.remove(reader)
8366+        # TODO: self.readers.remove(reader)?
8367+        for shnum in list(self.remaining_sharemap.keys()):
8368+            self.remaining_sharemap.discard(shnum, reader.peerid)
8369 
8370hunk ./src/allmydata/mutable/retrieve.py 606
8371-    def _maybe_send_more_queries(self, k):
8372-        # we don't have enough shares yet. Should we send out more queries?
8373-        # There are some number of queries outstanding, each for a single
8374-        # share. If we can generate 'needed_shares' additional queries, we do
8375-        # so. If we can't, then we know this file is a goner, and we raise
8376-        # NotEnoughSharesError.
8377-        self.log(format=("_maybe_send_more_queries, have=%(have)d, k=%(k)d, "
8378-                         "outstanding=%(outstanding)d"),
8379-                 have=len(self.shares), k=k,
8380-                 outstanding=len(self._outstanding_queries),
8381-                 level=log.NOISY)
8382 
8383hunk ./src/allmydata/mutable/retrieve.py 607
8384-        remaining_shares = k - len(self.shares)
8385-        needed = remaining_shares - len(self._outstanding_queries)
8386-        if not needed:
8387-            # we have enough queries in flight already
8388+    def _mark_bad_share(self, reader, f):
8389+        """
8390+        I mark the (peerid, shnum) encapsulated by my reader argument as
8391+        a bad share, which means that it will not be used anywhere else.
8392 
8393hunk ./src/allmydata/mutable/retrieve.py 612
8394-            # TODO: but if they've been in flight for a long time, and we
8395-            # have reason to believe that new queries might respond faster
8396-            # (i.e. we've seen other queries come back faster, then consider
8397-            # sending out new queries. This could help with peers which have
8398-            # silently gone away since the servermap was updated, for which
8399-            # we're still waiting for the 15-minute TCP disconnect to happen.
8400-            self.log("enough queries are in flight, no more are needed",
8401-                     level=log.NOISY)
8402-            return
8403+        There are several reasons to want to mark something as a bad
8404+        share. These include:
8405+
8406+            - A connection error to the peer.
8407+            - A mismatched prefix (that is, a prefix that does not match
8408+              our local conception of the version information string).
8409+            - A failing block hash, salt hash, share hash, or other
8410+              integrity check.
8411 
8412hunk ./src/allmydata/mutable/retrieve.py 621
8413-        outstanding_shnums = set([shnum
8414-                                  for (peerid, shnum, started)
8415-                                  in self._outstanding_queries.values()])
8416-        # prefer low-numbered shares, they are more likely to be primary
8417-        available_shnums = sorted(self.remaining_sharemap.keys())
8418-        for shnum in available_shnums:
8419-            if shnum in outstanding_shnums:
8420-                # skip ones that are already in transit
8421-                continue
8422-            if shnum not in self.remaining_sharemap:
8423-                # no servers for that shnum. note that DictOfSets removes
8424-                # empty sets from the dict for us.
8425-                continue
8426-            peerid = list(self.remaining_sharemap[shnum])[0]
8427-            # get_data will remove that peerid from the sharemap, and add the
8428-            # query to self._outstanding_queries
8429-            self._status.set_status("Retrieving More Shares")
8430-            self.get_data(shnum, peerid)
8431-            needed -= 1
8432-            if not needed:
8433+        This method will ensure that readers that we wish to mark bad
8434+        (for these reasons or other reasons) are not used for the rest
8435+        of the download. Additionally, it will attempt to tell the
8436+        remote peer (with no guarantee of success) that its share is
8437+        corrupt.
8438+        """
8439+        self.log("marking share %d on server %s as bad" % \
8440+                 (reader.shnum, reader))
8441+        prefix = self.verinfo[-2]
8442+        self.servermap.mark_bad_share(reader.peerid,
8443+                                      reader.shnum,
8444+                                      prefix)
8445+        self._remove_reader(reader)
8446+        self._bad_shares.add((reader.peerid, reader.shnum, f))
8447+        self._status.problems[reader.peerid] = f
8448+        self._last_failure = f
8449+        self.notify_server_corruption(reader.peerid, reader.shnum,
8450+                                      str(f.value))
8451+
8452+
8453+    def _download_current_segment(self):
8454+        """
8455+        I download, validate, decode, decrypt, and assemble the segment
8456+        that this Retrieve is currently responsible for downloading.
8457+        """
8458+        assert len(self._active_readers) >= self._required_shares
8459+        if self._current_segment <= self._last_segment:
8460+            d = self._process_segment(self._current_segment)
8461+        else:
8462+            d = defer.succeed(None)
8463+        d.addBoth(self._turn_barrier)
8464+        d.addCallback(self._check_for_done)
8465+        return d
8466+
8467+
8468+    def _turn_barrier(self, result):
8469+        """
8470+        I help the download process avoid the recursion limit issues
8471+        discussed in #237.
8472+        """
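+        # (fireEventually, from foolscap, re-delivers the result on a
+        #  later reactor turn, unwinding the Deferred callback stack
+        #  that would otherwise deepen with every segment.)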
8473+        return fireEventually(result)
8474+
8475+
8476+    def _process_segment(self, segnum):
8477+        """
8478+        I download, validate, decode, and decrypt one segment of the
8479+        file that this Retrieve is retrieving. This means coordinating
8480+        the process of getting k blocks of that file, validating them,
8481+        assembling them into one segment with the decoder, and then
8482+        decrypting them.
8483+        """
8484+        self.log("processing segment %d" % segnum)
8485+
8486+        # TODO: The old code uses a marker. Should this code do that
8487+        # too? What did the Marker do?
8488+        assert len(self._active_readers) >= self._required_shares
8489+
8490+        # We need to ask each of our active readers for its block and
8491+        # salt. We will then validate those. If validation is
8492+        # successful, we will assemble the results into plaintext.
8493+        ds = []
8494+        for reader in self._active_readers:
8495+            started = time.time()
8496+            d = reader.get_block_and_salt(segnum, queue=True)
8497+            d2 = self._get_needed_hashes(reader, segnum)
8498+            dl = defer.DeferredList([d, d2], consumeErrors=True)
8499+            dl.addCallback(self._validate_block, segnum, reader, started)
8500+            dl.addErrback(self._validation_or_decoding_failed, [reader])
8501+            ds.append(dl)
8502+            reader.flush()
8503+        dl = defer.DeferredList(ds)
8504+        if self._verify:
8505+            dl.addCallback(lambda ignored: "")
8506+            dl.addCallback(self._set_segment)
8507+        else:
8508+            dl.addCallback(self._maybe_decode_and_decrypt_segment, segnum)
8509+        return dl
8510+
8511+
8512+    def _maybe_decode_and_decrypt_segment(self, blocks_and_salts, segnum):
8513+        """
8514+        I take the results of fetching and validating the blocks from a
8515+        callback chain in another method. If the results are such that
8516+        they tell me that validation and fetching succeeded without
8517+        incident, I will proceed with decoding and decryption.
8518+        Otherwise, I will do nothing.
8519+        """
8520+        self.log("trying to decode and decrypt segment %d" % segnum)
8521+        failures = False
8522+        for block_and_salt in blocks_and_salts:
8523+            if not block_and_salt[0] or block_and_salt[1] == None:
8524+                self.log("some validation operations failed; not proceeding")
8525+                failures = True
8526                 break
8527hunk ./src/allmydata/mutable/retrieve.py 715
8528+        if not failures:
8529+            self.log("everything looks ok, building segment %d" % segnum)
8530+            d = self._decode_blocks(blocks_and_salts, segnum)
8531+            d.addCallback(self._decrypt_segment)
8532+            d.addErrback(self._validation_or_decoding_failed,
8533+                         self._active_readers)
8534+            # check to see whether we've been paused before writing
8535+            # anything.
8536+            d.addCallback(self._check_for_paused)
8537+            d.addCallback(self._set_segment)
8538+            return d
8539+        else:
8540+            return defer.succeed(None)
8541+
8542+
8543+    def _set_segment(self, segment):
8544+        """
8545+        Given a plaintext segment, I register that segment with the
8546+        target that is handling the file download.
8547+        """
8548+        self.log("got plaintext for segment %d" % self._current_segment)
8549+        if self._current_segment == self._start_segment:
8550+            # We're on the first segment. It's possible that we want
8551+            # only some part of the end of this segment, and that we
8552+            # just downloaded the whole thing to get that part. If so,
8553+            # we need to account for that and give the reader just the
8554+            # data that they want.
8555+            n = self._offset % self._segment_size
8556+            self.log("stripping %d bytes off of the first segment" % n)
8557+            self.log("original segment length: %d" % len(segment))
8558+            segment = segment[n:]
8559+            self.log("new segment length: %d" % len(segment))
8560+
8561+        if self._current_segment == self._last_segment and self._read_length is not None:
8562+            # We're on the last segment. It's possible that we only want
8563+            # part of the beginning of this segment, and that we
8564+            # downloaded the whole thing anyway. Make sure to give the
8565+            # caller only the portion of the segment that they want to
8566+            # receive.
8567+            extra = self._read_length
8568+            if self._start_segment != self._last_segment:
8569+                extra -= self._segment_size - \
8570+                            (self._offset % self._segment_size)
8571+            extra %= self._segment_size
+            if extra == 0:
+                # the wanted range fills the whole last segment
+                extra = self._segment_size
8572+            self.log("original segment length: %d" % len(segment))
8573+            segment = segment[:extra]
8574+            self.log("new segment length: %d" % len(segment))
8575+            self.log("only taking %d bytes of the last segment" % extra)
8576+
8577+        if not self._verify:
8578+            self._consumer.write(segment)
8579+        else:
8580+            # we don't care about the plaintext if we are doing a verify.
8581+            segment = None
8582+        self._current_segment += 1
8583 
8584hunk ./src/allmydata/mutable/retrieve.py 771
8585-        # at this point, we have as many outstanding queries as we can. If
8586-        # needed!=0 then we might not have enough to recover the file.
8587-        if needed:
8588-            format = ("ran out of peers: "
8589-                      "have %(have)d shares (k=%(k)d), "
8590-                      "%(outstanding)d queries in flight, "
8591-                      "need %(need)d more, "
8592-                      "found %(bad)d bad shares")
8593-            args = {"have": len(self.shares),
8594-                    "k": k,
8595-                    "outstanding": len(self._outstanding_queries),
8596-                    "need": needed,
8597-                    "bad": len(self._bad_shares),
8598-                    }
8599-            self.log(format=format,
8600-                     level=log.WEIRD, umid="ezTfjw", **args)
8601-            err = NotEnoughSharesError("%s, last failure: %s" %
8602-                                      (format % args, self._last_failure))
8603-            if self._bad_shares:
8604-                self.log("We found some bad shares this pass. You should "
8605-                         "update the servermap and try again to check "
8606-                         "more peers",
8607-                         level=log.WEIRD, umid="EFkOlA")
8608-                err.servermap = self.servermap
8609-            raise err
8610 
8611hunk ./src/allmydata/mutable/retrieve.py 772
8612+    def _validation_or_decoding_failed(self, f, readers):
8613+        """
8614+        I am called when a block or a salt fails to correctly validate, or when
8615+        the decryption or decoding operation fails for some reason.  I react to
8616+        this failure by notifying the remote server of corruption, and then
8617+        removing the remote peer from further activity.
8618+        """
8619+        assert isinstance(readers, list)
8620+        bad_shnums = [reader.shnum for reader in readers]
8621+
8622+        self.log("validation or decoding failed on share(s) %s, peer(s) %s, "
8623+                 "segment %d: %s" % \
8624+                 (bad_shnums, readers, self._current_segment, str(f)))
8625+        for reader in readers:
8626+            self._mark_bad_share(reader, f)
8627         return
8628 
8629hunk ./src/allmydata/mutable/retrieve.py 789
8630-    def _decode(self):
8631-        started = time.time()
8632-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8633-         offsets_tuple) = self.verinfo
8634 
8635hunk ./src/allmydata/mutable/retrieve.py 790
8636-        # shares_dict is a dict mapping shnum to share data, but the codec
8637-        # wants two lists.
8638-        shareids = []; shares = []
8639-        for shareid, share in self.shares.items():
8640+    def _validate_block(self, results, segnum, reader, started):
8641+        """
8642+        I validate a block from one share on a remote server.
8643+        """
8644+        # Grab the part of the block hash tree that is necessary to
8645+        # validate this block, then generate the block hash root.
8646+        self.log("validating share %d for segment %d" % (reader.shnum,
8647+                                                             segnum))
8648+        self._status.add_fetch_timing(reader.peerid, started)
8649+        self._status.set_status("Validating blocks for segment %d" % segnum)
8650+        # Did we fail to fetch either of the things that we were
8651+        # supposed to? Fail if so.
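+        # results is a two-element DeferredList result: results[0] is
+        # (success, (block, salt)), and results[1] is
+        # (success, [(success, blockhashes), (success, sharehashes)])
+        # from _get_needed_hashes.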
8652+        if not results[0][0] or not results[1][0]:
8653+            # handled by the errback handler.
8654+
8655+            # These all get batched into one query, so the resulting
8656+            # failure should be the same for all of them, so we can just
8657+            # use the first one.
8658+            assert isinstance(results[0][1], failure.Failure)
8659+
8660+            f = results[0][1]
8661+            raise CorruptShareError(reader.peerid,
8662+                                    reader.shnum,
8663+                                    "Connection error: %s" % str(f))
8664+
8665+        block_and_salt, block_and_sharehashes = results
8666+        block, salt = block_and_salt[1]
8667+        blockhashes, sharehashes = block_and_sharehashes[1]
8668+
8669+        blockhashes = dict(enumerate(blockhashes[1]))
8670+        self.log("the reader gave me the following blockhashes: %s" % \
8671+                 blockhashes.keys())
8672+        self.log("the reader gave me the following sharehashes: %s" % \
8673+                 sharehashes[1].keys())
8674+        bht = self._block_hash_trees[reader.shnum]
8675+
8676+        if bht.needed_hashes(segnum, include_leaf=True):
8677+            try:
8678+                bht.set_hashes(blockhashes)
8679+            except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8680+                    IndexError), e:
8681+                raise CorruptShareError(reader.peerid,
8682+                                        reader.shnum,
8683+                                        "block hash tree failure: %s" % e)
8684+
8685+        if self._version == MDMF_VERSION:
8686+            blockhash = hashutil.block_hash(salt + block)
8687+        else:
8688+            blockhash = hashutil.block_hash(block)
8689+        # If this works without an error, then validation is
8690+        # successful.
8691+        try:
8692+            bht.set_hashes(leaves={segnum: blockhash})
8693+        except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8694+                IndexError), e:
8695+            raise CorruptShareError(reader.peerid,
8696+                                    reader.shnum,
8697+                                    "block hash tree failure: %s" % e)
8698+
8699+        # Reaching this point means that we know that this segment
8700+        # is correct. Now we need to check to see whether the share
8701+        # hash chain is also correct.
8702+        # SDMF wrote share hash chains that didn't contain the
8703+        # leaves, which would be produced from the block hash tree.
8704+        # So we need to validate the block hash tree first. If
8705+        # successful, then bht[0] will contain the root for the
8706+        # shnum, which will be a leaf in the share hash tree, which
8707+        # will allow us to validate the rest of the tree.
8708+        if self.share_hash_tree.needed_hashes(reader.shnum,
8709+                                              include_leaf=True) or \
8710+                                              self._verify:
8711+            try:
8712+                self.share_hash_tree.set_hashes(hashes=sharehashes[1],
8713+                                            leaves={reader.shnum: bht[0]})
8714+            except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8715+                    IndexError), e:
8716+                raise CorruptShareError(reader.peerid,
8717+                                        reader.shnum,
8718+                                        "corrupt hashes: %s" % e)
8719+
8720+        self.log('share %d is valid for segment %d' % (reader.shnum,
8721+                                                       segnum))
8722+        return {reader.shnum: (block, salt)}
8723+
8724+
8725+    def _get_needed_hashes(self, reader, segnum):
8726+        """
8727+        I fetch the hashes needed to validate segnum from the reader,
8728+        and return a deferred that fires when they have been retrieved.
8729+        """
8730+        bht = self._block_hash_trees[reader.shnum]
8731+        needed = bht.needed_hashes(segnum, include_leaf=True)
8732+        # The root of the block hash tree is also a leaf in the share
8733+        # hash tree. So we don't need to fetch it from the remote
8734+        # server. In the case of files with one segment, this means that
8735+        # we won't fetch any block hash tree from the remote server,
8736+        # since the hash of each share of the file is the entire block
8737+        # hash tree, and is a leaf in the share hash tree. This is fine,
8738+        # since any share corruption will be detected in the share hash
8739+        # tree.
8740+        #needed.discard(0)
8741+        self.log("getting blockhashes for segment %d, share %d: %s" % \
8742+                 (segnum, reader.shnum, str(needed)))
8743+        d1 = reader.get_blockhashes(needed, queue=True, force_remote=True)
8744+        need = self.share_hash_tree.needed_hashes(reader.shnum)
8745+        if need:
8746+            self.log("also need sharehashes for share %d: %s" % (reader.shnum,
8747+                                                                 str(need)))
8748+            d2 = reader.get_sharehashes(need, queue=True, force_remote=True)
8749+        else:
8750+            d2 = defer.succeed({}) # the logic in the next method
8751+                                   # expects a dict
8752+        dl = defer.DeferredList([d1, d2], consumeErrors=True)
8753+        return dl
8754+
8755+
8756+    def _decode_blocks(self, blocks_and_salts, segnum):
8757+        """
8758+        I take a list of k blocks and salts, and decode that into a
8759+        single encrypted segment.
8760+        """
8761+        d = {}
8762+        # We want to merge our dictionaries to the form
8763+        # {shnum: blocks_and_salts}
8764+        #
8765+        # The dictionaries come from validate block that way, so we just
8766+        # The dictionaries come from _validate_block in that form, so we just
8767+        for block_and_salt in blocks_and_salts:
8768+            d.update(block_and_salt[1])
8769+
8770+        # All of these blocks should have the same salt; in SDMF, it is
8771+        # the file-wide IV, while in MDMF it is the per-segment salt. In
8772+        # either case, we just need to get one of them and use it.
8773+        #
8774+        # d.items()[0] is like (shnum, (block, salt))
8775+        # d.items()[0][1] is like (block, salt)
8776+        # d.items()[0][1][1] is the salt.
8777+        salt = d.items()[0][1][1]
8778+        # Next, extract just the blocks from the dict. We'll use the
8779+        # salt in the next step.
8780+        share_and_shareids = [(k, v[0]) for k, v in d.items()]
8781+        d2 = dict(share_and_shareids)
8782+        shareids = []
8783+        shares = []
8784+        for shareid, share in d2.items():
8785             shareids.append(shareid)
8786             shares.append(share)
8787 
8788hunk ./src/allmydata/mutable/retrieve.py 938
8789-        assert len(shareids) >= k, len(shareids)
8790+        self._status.set_status("Decoding")
8791+        started = time.time()
8792+        assert len(shareids) >= self._required_shares, len(shareids)
8793         # zfec really doesn't want extra shares
8794hunk ./src/allmydata/mutable/retrieve.py 942
8795-        shareids = shareids[:k]
8796-        shares = shares[:k]
8797-
8798-        fec = codec.CRSDecoder()
8799-        fec.set_params(segsize, k, N)
8800-
8801-        self.log("params %s, we have %d shares" % ((segsize, k, N), len(shares)))
8802-        self.log("about to decode, shareids=%s" % (shareids,))
8803-        d = defer.maybeDeferred(fec.decode, shares, shareids)
8804-        def _done(buffers):
8805-            self._status.timings["decode"] = time.time() - started
8806-            self.log(" decode done, %d buffers" % len(buffers))
8807+        shareids = shareids[:self._required_shares]
8808+        shares = shares[:self._required_shares]
8809+        self.log("decoding segment %d" % segnum)
8810+        if segnum == self._num_segments - 1:
8811+            d = defer.maybeDeferred(self._tail_decoder.decode, shares, shareids)
8812+        else:
8813+            d = defer.maybeDeferred(self._segment_decoder.decode, shares, shareids)
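+        # the tail segment is usually shorter than the others, so it was
+        # erasure-coded with its own parameters and needs a separate
+        # decoder.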
8814+        def _process(buffers):
8815             segment = "".join(buffers)
8816hunk ./src/allmydata/mutable/retrieve.py 951
8817+            self.log(format="done decoding segment %(segnum)s of %(numsegs)s",
8818+                     segnum=segnum,
8819+                     numsegs=self._num_segments,
8820+                     level=log.NOISY)
8821             self.log(" joined length %d, datalength %d" %
8822hunk ./src/allmydata/mutable/retrieve.py 956
8823-                     (len(segment), datalength))
8824-            segment = segment[:datalength]
8825+                     (len(segment), self._data_length))
8826+            if segnum == self._num_segments - 1:
8827+                size_to_use = self._tail_data_size
8828+            else:
8829+                size_to_use = self._segment_size
8830+            segment = segment[:size_to_use]
8831             self.log(" segment len=%d" % len(segment))
8832hunk ./src/allmydata/mutable/retrieve.py 963
8833-            return segment
8834-        def _err(f):
8835-            self.log(" decode failed: %s" % f)
8836-            return f
8837-        d.addCallback(_done)
8838-        d.addErrback(_err)
8839+            self._status.timings.setdefault("decode", 0)
8840+            self._status.timings['decode'] += time.time() - started
8841+            return segment, salt
8842+        d.addCallback(_process)
8843         return d
8844 
8845hunk ./src/allmydata/mutable/retrieve.py 969
8846-    def _decrypt(self, crypttext, IV, readkey):
8847+
8848+    def _decrypt_segment(self, segment_and_salt):
8849+        """
8850+        I take a single segment and its salt, and decrypt it. I return
8851+        the plaintext of that segment.
8852+        """
8853+        segment, salt = segment_and_salt
8854         self._status.set_status("decrypting")
8855hunk ./src/allmydata/mutable/retrieve.py 977
8856+        self.log("decrypting segment %d" % self._current_segment)
8857         started = time.time()
8858hunk ./src/allmydata/mutable/retrieve.py 979
8859-        key = hashutil.ssk_readkey_data_hash(IV, readkey)
8860+        key = hashutil.ssk_readkey_data_hash(salt, self._node.get_readkey())
8861         decryptor = AES(key)
8862hunk ./src/allmydata/mutable/retrieve.py 981
8863-        plaintext = decryptor.process(crypttext)
8864-        self._status.timings["decrypt"] = time.time() - started
8865+        plaintext = decryptor.process(segment)
8866+        self._status.timings.setdefault("decrypt", 0)
8867+        self._status.timings['decrypt'] += time.time() - started
8868         return plaintext
8869 
8870hunk ./src/allmydata/mutable/retrieve.py 986
8871-    def _done(self, res):
8872-        if not self._running:
8873+
8874+    def notify_server_corruption(self, peerid, shnum, reason):
8875+        ss = self.servermap.connections[peerid]
8876+        ss.callRemoteOnly("advise_corrupt_share",
8877+                          "mutable", self._storage_index, shnum, reason)
8878+
8879+
8880+    def _try_to_validate_privkey(self, enc_privkey, reader):
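+        # we decrypt the claimed signing key and hash it into a
+        # writekey; if that doesn't match the writekey from our cap,
+        # the server handed us a bogus privkey.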
8881+        alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
8882+        alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
8883+        if alleged_writekey != self._node.get_writekey():
8884+            self.log("invalid privkey from %s shnum %d" %
8885+                     (reader, reader.shnum),
8886+                     level=log.WEIRD, umid="YIw4tA")
8887+            if self._verify:
8888+                self.servermap.mark_bad_share(reader.peerid, reader.shnum,
8889+                                              self.verinfo[-2])
8890+                e = CorruptShareError(reader.peerid,
8891+                                      reader.shnum,
8892+                                      "invalid privkey")
8893+                f = failure.Failure(e)
8894+                self._bad_shares.add((reader.peerid, reader.shnum, f))
8895             return
8896hunk ./src/allmydata/mutable/retrieve.py 1009
8897+
8898+        # it's good
8899+        self.log("got valid privkey from shnum %d on reader %s" %
8900+                 (reader.shnum, reader))
8901+        privkey = rsa.create_signing_key_from_string(alleged_privkey_s)
8902+        self._node._populate_encprivkey(enc_privkey)
8903+        self._node._populate_privkey(privkey)
8904+        self._need_privkey = False
8905+
8906+
8907+    def _check_for_done(self, res):
8908+        """
8909+        I check to see if this Retrieve object has successfully finished
8910+        its work.
8911+
8912+        I can exit in the following ways:
8913+            - If there are no more segments to download, then I exit by
8914+              causing self._done_deferred to fire with the plaintext
8915+              content requested by the caller.
8916+            - If there are still segments to be downloaded, and there
8917+              are enough active readers (readers which have not broken
8918+              and have not given us corrupt data) to continue
8919+              downloading, I send control back to
8920+              _download_current_segment.
8921+            - If there are still segments to be downloaded but there are
8922+              not enough active peers to download them, I ask
8923+              _add_active_peers to add more peers. If it is successful,
8924+              it will call _download_current_segment. If there are not
8925+              enough peers to retrieve the file, then that will cause
8926+              _done_deferred to errback.
8927+        """
8928+        self.log("checking for doneness")
8929+        if self._current_segment > self._last_segment:
8930+            # No more segments to download, we're done.
8931+            self.log("got plaintext, done")
8932+            return self._done()
8933+
8934+        if len(self._active_readers) >= self._required_shares:
8935+            # More segments to download, but we have enough good peers
8936+            # in self._active_readers that we can do that without issue,
8937+            # so go nab the next segment.
8938+            self.log("not done yet: on segment %d of %d" % \
8939+                     (self._current_segment + 1, self._num_segments))
8940+            return self._download_current_segment()
8941+
8942+        self.log("not done yet: on segment %d of %d, need to add peers" % \
8943+                 (self._current_segment + 1, self._num_segments))
8944+        return self._add_active_peers()
8945+
8946+
8947+    def _done(self):
8948+        """
8949+        I am called by _check_for_done when the download process has
8950+        finished successfully. After making some useful logging
8951+        statements, I return the decrypted contents to the owner of this
8952+        Retrieve object through self._done_deferred.
8953+        """
8954         self._running = False
8955         self._status.set_active(False)
8956hunk ./src/allmydata/mutable/retrieve.py 1068
8957-        self._status.timings["total"] = time.time() - self._started
8958-        # res is either the new contents, or a Failure
8959-        if isinstance(res, failure.Failure):
8960-            self.log("Retrieve done, with failure", failure=res,
8961-                     level=log.UNUSUAL)
8962-            self._status.set_status("Failed")
8963+        now = time.time()
8964+        self._status.timings['total'] = now - self._started
8965+        self._status.timings['fetch'] = now - self._started_fetching
8966+
8967+        if self._verify:
8968+            ret = list(self._bad_shares)
8969+            self.log("done verifying, found %d bad shares" % len(ret))
8970         else:
8971hunk ./src/allmydata/mutable/retrieve.py 1076
8972-            self.log("Retrieve done, success!")
8973-            self._status.set_status("Finished")
8974-            self._status.set_progress(1.0)
8975-            # remember the encoding parameters, use them again next time
8976-            (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8977-             offsets_tuple) = self.verinfo
8978-            self._node._populate_required_shares(k)
8979-            self._node._populate_total_shares(N)
8980-        eventually(self._done_deferred.callback, res)
8981+            # TODO: upload status here?
8982+            ret = self._consumer
8983+            self._consumer.unregisterProducer()
8984+        eventually(self._done_deferred.callback, ret)
8985+
8986 
8987hunk ./src/allmydata/mutable/retrieve.py 1082
8988+    def _failed(self):
8989+        """
8990+        I am called by _add_active_peers when there are not enough
8991+        active peers left to complete the download. After making some
8992+        useful logging statements, I return an exception to that effect
8993+        to the caller of this Retrieve object through
8994+        self._done_deferred.
8995+        """
8996+        self._running = False
8997+        self._status.set_active(False)
8998+        now = time.time()
8999+        self._status.timings['total'] = now - self._started
9000+        self._status.timings['fetch'] = now - self._started_fetching
9001+
9002+        if self._verify:
9003+            ret = list(self._bad_shares)
9004+        else:
9005+            format = ("ran out of peers: "
9006+                      "have %(have)d of %(total)d segments, "
9007+                      "found %(bad)d bad shares, "
9008+                      "encoding %(k)d-of-%(n)d")
9009+            args = {"have": self._current_segment,
9010+                    "total": self._num_segments,
9012+                    "k": self._required_shares,
9013+                    "n": self._total_shares,
9014+                    "bad": len(self._bad_shares)}
9015+            e = NotEnoughSharesError("%s, last failure: %s" % \
9016+                                     (format % args, str(self._last_failure)))
9017+            f = failure.Failure(e)
9018+            ret = f
9019+        eventually(self._done_deferred.callback, ret)
9020}
9021[mutable/servermap.py: Alter the servermap updater to work with MDMF files
9022Kevan Carstensen <kevan@isnotajoke.com>**20100819003439
9023 Ignore-this: 7e408303194834bd59a2f27efab3bdb
9024 
9025 These modifications mostly serve to have the servermap updater use the
9026 unified MDMF + SDMF read interface whenever possible -- this reduces
9027 the complexity of the code, making it easier to read and maintain. To
9028 do this, I needed to modify the process of updating the servermap a
9029 little bit.
9030 
9031 To support partial-file updates, I also modified the servermap updater
9032 to fetch the block hash trees and certain segments of files while it
9033 performed a servermap update (this can be done without adding any new
9034 roundtrips because of batch-read functionality that the read proxy has).
9035 
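 As an illustrative sketch only (the ServermapUpdater constructor
 arguments and get_update_data_for_share_and_verinfo come from the
 hunks below; the update() call and the surrounding variables are
 assumptions about the existing updater API), a partial-file update
 might consume this extra data like so:
 
     # assumes the usual servermap imports (ServermapUpdater,
     # MODE_WRITE); build a servermap that also fetches block hash
     # trees and the first and last segments of the region we plan
     # to modify
     u = ServermapUpdater(filenode, storage_broker, monitor, servermap,
                          mode=MODE_WRITE,
                          update_range=(start_segment, end_segment))
     d = u.update()
     def _use_update_data(servermap):
         # each recorded datum is a (blockhashes, start_block,
         # end_block) tuple
         blockhashes, start, end = \
             servermap.get_update_data_for_share_and_verinfo(shnum,
                                                             verinfo)
     d.addCallback(_use_update_data)
 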
9036] {
9037hunk ./src/allmydata/mutable/servermap.py 2
9038 
9039-import sys, time
9040+import sys, time, struct
9041 from zope.interface import implements
9042 from itertools import count
9043 from twisted.internet import defer
9044merger 0.0 (
9045hunk ./src/allmydata/mutable/servermap.py 9
9046+from allmydata.util.dictutil import DictOfSets
9047hunk ./src/allmydata/mutable/servermap.py 7
9048-from foolscap.api import DeadReferenceError, RemoteException, eventually
9049-from allmydata.util import base32, hashutil, idlib, log
9050+from foolscap.api import DeadReferenceError, RemoteException, eventually, \
9051+                         fireEventually
9052+from allmydata.util import base32, hashutil, idlib, log, deferredutil
9053)
9054merger 0.0 (
9055hunk ./src/allmydata/mutable/servermap.py 14
9056-     DictOfSets, CorruptShareError, NeedMoreDataError
9057+     CorruptShareError, NeedMoreDataError
9058hunk ./src/allmydata/mutable/servermap.py 14
9059-     DictOfSets, CorruptShareError, NeedMoreDataError
9060-from allmydata.mutable.layout import unpack_prefix_and_signature, unpack_header, unpack_share, \
9061-     SIGNED_PREFIX_LENGTH
9062+     DictOfSets, CorruptShareError
9063+from allmydata.mutable.layout import SIGNED_PREFIX_LENGTH, MDMFSlotReadProxy
9064)
9065hunk ./src/allmydata/mutable/servermap.py 123
9066         self.bad_shares = {} # maps (peerid,shnum) to old checkstring
9067         self.last_update_mode = None
9068         self.last_update_time = 0
9069+        self.update_data = {} # (verinfo,shnum) => data
9070 
9071     def copy(self):
9072         s = ServerMap()
9073hunk ./src/allmydata/mutable/servermap.py 254
9074         """Return a set of versionids, one for each version that is currently
9075         recoverable."""
9076         versionmap = self.make_versionmap()
9077-
9078         recoverable_versions = set()
9079         for (verinfo, shares) in versionmap.items():
9080             (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9081hunk ./src/allmydata/mutable/servermap.py 339
9082         return False
9083 
9084 
9085+    def get_update_data_for_share_and_verinfo(self, shnum, verinfo):
9086+        """
9087+        I return the update data recorded for the given shnum and verinfo.
9088+        """
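+        # self.update_data maps shnum to a list of (verinfo, data)
+        # pairs; we return the datum recorded for the given verinfo.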
9089+        update_data = self.update_data[shnum]
9090+        update_datum = [i[1] for i in update_data if i[0] == verinfo][0]
9091+        return update_datum
9092+
9093+
9094+    def set_update_data_for_share_and_verinfo(self, shnum, verinfo, data):
9095+        """
9096+        I record the given update data for the given shnum and verinfo.
9097+        """
9098+        self.update_data.setdefault(shnum, []).append((verinfo, data))
9099+
9100+
9101 class ServermapUpdater:
9102     def __init__(self, filenode, storage_broker, monitor, servermap,
9103hunk ./src/allmydata/mutable/servermap.py 357
9104-                 mode=MODE_READ, add_lease=False):
9105+                 mode=MODE_READ, add_lease=False, update_range=None):
9106         """I update a servermap, locating a sufficient number of useful
9107         shares and remembering where they are located.
9108 
9109hunk ./src/allmydata/mutable/servermap.py 382
9110         self._servers_responded = set()
9111 
9112         # how much data should we read?
9113+        # SDMF:
9114         #  * if we only need the checkstring, then [0:75]
9115         #  * if we need to validate the checkstring sig, then [543ish:799ish]
9116         #  * if we need the verification key, then [107:436ish]
9117merger 0.0 (
9118hunk ./src/allmydata/mutable/servermap.py 392
9119-        # read 2000 bytes, which also happens to read enough actual data to
9120-        # pre-fetch a 9-entry dirnode.
9121+        # read 4000 bytes, which also happens to read enough actual data to
9122+        # pre-fetch an 18-entry dirnode.
9123hunk ./src/allmydata/mutable/servermap.py 390
9124-        # A future version of the SMDF slot format should consider using
9125-        # fixed-size slots so we can retrieve less data. For now, we'll just
9126-        # read 2000 bytes, which also happens to read enough actual data to
9127-        # pre-fetch a 9-entry dirnode.
9128+        # MDMF:
9129+        #  * Checkstring? [0:72]
9130+        #  * If we want to validate the checkstring, then [0:72], [143:?] --
9131+        #    the offset table will tell us for sure.
9132+        #  * If we need the verification key, we have to consult the offset
9133+        #    table as well.
9134+        # At this point, we don't know which we are. Our filenode can
9135+        # tell us, but it might be lying -- in some cases, we're
9136+        # responsible for telling it which kind of file it is.
9137)
9138hunk ./src/allmydata/mutable/servermap.py 399
9139             # we use unpack_prefix_and_signature, so we need 1k
9140             self._read_size = 1000
9141         self._need_privkey = False
9142+
9143         if mode == MODE_WRITE and not self._node.get_privkey():
9144             self._need_privkey = True
9145         # check+repair: repair requires the privkey, so if we didn't happen
9146hunk ./src/allmydata/mutable/servermap.py 406
9147         # to ask for it during the check, we'll have problems doing the
9148         # publish.
9149 
9150+        self.fetch_update_data = False
9151+        if mode == MODE_WRITE and update_range:
9152+            # We're updating the servermap in preparation for an
9153+            # in-place file update, so we need to fetch some additional
9154+            # data from each share that we find.
9155+            assert len(update_range) == 2
9156+
9157+            self.start_segment = update_range[0]
9158+            self.end_segment = update_range[1]
9159+            self.fetch_update_data = True
9160+
9161         prefix = si_b2a(self._storage_index)[:5]
9162         self._log_number = log.msg(format="SharemapUpdater(%(si)s): starting (%(mode)s)",
9163                                    si=prefix, mode=mode)
9164merger 0.0 (
9165hunk ./src/allmydata/mutable/servermap.py 455
9166-        full_peerlist = sb.get_servers_for_index(self._storage_index)
9167+        full_peerlist = [(s.get_serverid(), s.get_rref())
9168+                         for s in sb.get_servers_for_psi(self._storage_index)]
9169hunk ./src/allmydata/mutable/servermap.py 455
9170+        # All of the peers, permuted by the storage index, as usual.
9171)
9172hunk ./src/allmydata/mutable/servermap.py 461
9173         self._good_peers = set() # peers who had some shares
9174         self._empty_peers = set() # peers who don't have any shares
9175         self._bad_peers = set() # peers to whom our queries failed
9176+        self._readers = {} # peerid -> dict mapping shnum to reader,
9177+                           # filled in after responses come in.
9178 
9179         k = self._node.get_required_shares()
9180hunk ./src/allmydata/mutable/servermap.py 465
9181+        # For what cases can these conditions work?
9182         if k is None:
9183             # make a guess
9184             k = 3
9185hunk ./src/allmydata/mutable/servermap.py 478
9186         self.num_peers_to_query = k + self.EPSILON
9187 
9188         if self.mode == MODE_CHECK:
9189+            # We want to query all of the peers.
9190             initial_peers_to_query = dict(full_peerlist)
9191             must_query = set(initial_peers_to_query.keys())
9192             self.extra_peers = []
9193hunk ./src/allmydata/mutable/servermap.py 486
9194             # we're planning to replace all the shares, so we want a good
9195             # chance of finding them all. We will keep searching until we've
9196             # seen epsilon that don't have a share.
9197+            # We don't query all of the peers because that could take a while.
9198             self.num_peers_to_query = N + self.EPSILON
9199             initial_peers_to_query, must_query = self._build_initial_querylist()
9200             self.required_num_empty_peers = self.EPSILON
9201hunk ./src/allmydata/mutable/servermap.py 496
9202             # might also avoid the round trip required to read the encrypted
9203             # private key.
9204 
9205-        else:
9206+        else: # MODE_READ, MODE_ANYTHING
9207+            # 2k peers is good enough.
9208             initial_peers_to_query, must_query = self._build_initial_querylist()
9209 
9210         # this is a set of peers that we are required to get responses from:
9211hunk ./src/allmydata/mutable/servermap.py 512
9212         # before we can consider ourselves finished, and self.extra_peers
9213         # contains the overflow (peers that we should tap if we don't get
9214         # enough responses)
9215+        # must_query should always be a subset of
9216+        # initial_peers_to_query, which the next assertion checks.
9217+        assert set(must_query).issubset(set(initial_peers_to_query))
9218 
9219         self._send_initial_requests(initial_peers_to_query)
9220         self._status.timings["initial_queries"] = time.time() - self._started
9221hunk ./src/allmydata/mutable/servermap.py 571
9222         # errors that aren't handled by _query_failed (and errors caused by
9223         # _query_failed) get logged, but we still want to check for doneness.
9224         d.addErrback(log.err)
9225-        d.addBoth(self._check_for_done)
9226         d.addErrback(self._fatal_error)
9227hunk ./src/allmydata/mutable/servermap.py 572
9228+        d.addCallback(self._check_for_done)
9229         return d
9230 
9231     def _do_read(self, ss, peerid, storage_index, shnums, readv):
9232hunk ./src/allmydata/mutable/servermap.py 591
9233         d = ss.callRemote("slot_readv", storage_index, shnums, readv)
9234         return d
9235 
9236+
9237+    def _got_corrupt_share(self, e, shnum, peerid, data, lp):
9238+        """
9239+        I am called when a remote server returns a corrupt share in
9240+        response to one of our queries. By corrupt, I mean a share
9241+        without a valid signature. I then record the failure, notify the
9242+        server of the corruption, and record the share as bad.
9243+        """
9244+        f = failure.Failure(e)
9245+        self.log(format="bad share: %(f_value)s", f_value=str(f),
9246+                 failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
9247+        # Notify the server that its share is corrupt.
9248+        self.notify_server_corruption(peerid, shnum, str(e))
9249+        # By flagging this as a bad peer, we won't count any of
9250+        # the other shares on that peer as valid, though if we
9251+        # happen to find a valid version string amongst those
9252+        # shares, we'll keep track of it so that we don't need
9253+        # to validate the signature on those again.
9254+        self._bad_peers.add(peerid)
9255+        self._last_failure = f
9256+        # XXX: Use the reader for this?
9257+        checkstring = data[:SIGNED_PREFIX_LENGTH]
9258+        self._servermap.mark_bad_share(peerid, shnum, checkstring)
9259+        self._servermap.problems.append(f)
9260+
9261+
9262+    def _cache_good_sharedata(self, verinfo, shnum, now, data):
9263+        """
9264+        If one of my queries returns successfully (which means that we
9265+        were able to and successfully did validate the signature), I
9266+        cache the data that we initially fetched from the storage
9267+        server. This will help reduce the number of roundtrips that need
9268+        to occur when the file is downloaded, or when the file is
9269+        updated.
9270+        """
9271+        if verinfo:
9272+            self._node._add_to_cache(verinfo, shnum, 0, data, now)
9273+
9274+
9275     def _got_results(self, datavs, peerid, readsize, stuff, started):
9276         lp = self.log(format="got result from [%(peerid)s], %(numshares)d shares",
9277                       peerid=idlib.shortnodeid_b2a(peerid),
9278hunk ./src/allmydata/mutable/servermap.py 633
9279-                      numshares=len(datavs),
9280-                      level=log.NOISY)
9281+                      numshares=len(datavs))
9282         now = time.time()
9283         elapsed = now - started
9284hunk ./src/allmydata/mutable/servermap.py 636
9285-        self._queries_outstanding.discard(peerid)
9286-        self._servermap.reachable_peers.add(peerid)
9287-        self._must_query.discard(peerid)
9288-        self._queries_completed += 1
9289+        def _done_processing(ignored=None):
9290+            self._queries_outstanding.discard(peerid)
9291+            self._servermap.reachable_peers.add(peerid)
9292+            self._must_query.discard(peerid)
9293+            self._queries_completed += 1
9294         if not self._running:
9295hunk ./src/allmydata/mutable/servermap.py 642
9296-            self.log("but we're not running, so we'll ignore it", parent=lp,
9297-                     level=log.NOISY)
9298+            self.log("but we're not running, so we'll ignore it", parent=lp)
9299+            _done_processing()
9300             self._status.add_per_server_time(peerid, "late", started, elapsed)
9301             return
9302         self._status.add_per_server_time(peerid, "query", started, elapsed)
9303hunk ./src/allmydata/mutable/servermap.py 653
9304         else:
9305             self._empty_peers.add(peerid)
9306 
9307-        last_verinfo = None
9308-        last_shnum = None
9309+        ss, storage_index = stuff
9310+        ds = []
9311+
9312         for shnum,datav in datavs.items():
9313             data = datav[0]
9314             try:
9315merger 0.0 (
9316hunk ./src/allmydata/mutable/servermap.py 662
9317-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
9318+                self._node._add_to_cache(verinfo, shnum, 0, data)
9319hunk ./src/allmydata/mutable/servermap.py 658
9320-            try:
9321-                verinfo = self._got_results_one_share(shnum, data, peerid, lp)
9322-                last_verinfo = verinfo
9323-                last_shnum = shnum
9324-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
9325-            except CorruptShareError, e:
9326-                # log it and give the other shares a chance to be processed
9327-                f = failure.Failure()
9328-                self.log(format="bad share: %(f_value)s", f_value=str(f.value),
9329-                         failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
9330-                self.notify_server_corruption(peerid, shnum, str(e))
9331-                self._bad_peers.add(peerid)
9332-                self._last_failure = f
9333-                checkstring = data[:SIGNED_PREFIX_LENGTH]
9334-                self._servermap.mark_bad_share(peerid, shnum, checkstring)
9335-                self._servermap.problems.append(f)
9336-                pass
9337+            reader = MDMFSlotReadProxy(ss,
9338+                                       storage_index,
9339+                                       shnum,
9340+                                       data)
9341+            self._readers.setdefault(peerid, dict())[shnum] = reader
9342+            # our goal, with each response, is to validate the version
9343+            # information and share data as best we can at this point --
9344+            # we do this by validating the signature. To do this, we
9345+            # need to do the following:
9346+            #   - If we don't already have the public key, fetch the
9347+            #     public key. We use this to validate the signature.
9348+            if not self._node.get_pubkey():
9349+                # fetch and set the public key.
9350+                d = reader.get_verification_key(queue=True)
9351+                d.addCallback(lambda results, shnum=shnum, peerid=peerid:
9352+                    self._try_to_set_pubkey(results, peerid, shnum, lp))
9353+                # XXX: Make self._pubkey_query_failed?
9354+                d.addErrback(lambda error, shnum=shnum, peerid=peerid:
9355+                    self._got_corrupt_share(error, shnum, peerid, data, lp))
9356+            else:
9357+                # we already have the public key.
9358+                d = defer.succeed(None)
9359)
9360hunk ./src/allmydata/mutable/servermap.py 676
9361                 self._servermap.problems.append(f)
9362                 pass
9363 
9364-        self._status.timings["cumulative_verify"] += (time.time() - now)
9365+            # Neither of these two branches returns anything of
9366+            # consequence, so the first entry in our deferredlist will
9367+            # be None.
9368 
9369hunk ./src/allmydata/mutable/servermap.py 680
9370-        if self._need_privkey and last_verinfo:
9371-            # send them a request for the privkey. We send one request per
9372-            # server.
9373-            lp2 = self.log("sending privkey request",
9374-                           parent=lp, level=log.NOISY)
9375-            (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9376-             offsets_tuple) = last_verinfo
9377-            o = dict(offsets_tuple)
9378+            # - Next, we need the version information. We almost
9379+            #   certainly got this by reading the first thousand or so
9380+            #   bytes of the share on the storage server, so we
9381+            #   shouldn't need to fetch anything at this step.
9382+            d2 = reader.get_verinfo()
9383+            d2.addErrback(lambda error, shnum=shnum, peerid=peerid:
9384+                self._got_corrupt_share(error, shnum, peerid, data, lp))
9385+            # - Next, we need the signature. For an SDMF share, it is
9386+            #   likely that we fetched this when doing our initial fetch
9387+            #   to get the version information. In MDMF, this lives at
9388+            #   the end of the share, so unless the file is quite small,
9389+            #   we'll need to do a remote fetch to get it.
9390+            d3 = reader.get_signature(queue=True)
9391+            d3.addErrback(lambda error, shnum=shnum, peerid=peerid:
9392+                self._got_corrupt_share(error, shnum, peerid, data, lp))
9393+            #  Once we have all three of these responses, we can move on
9394+            #  to validating the signature
9395 
9396hunk ./src/allmydata/mutable/servermap.py 698
9397-            self._queries_outstanding.add(peerid)
9398-            readv = [ (o['enc_privkey'], (o['EOF'] - o['enc_privkey'])) ]
9399-            ss = self._servermap.connections[peerid]
9400-            privkey_started = time.time()
9401-            d = self._do_read(ss, peerid, self._storage_index,
9402-                              [last_shnum], readv)
9403-            d.addCallback(self._got_privkey_results, peerid, last_shnum,
9404-                          privkey_started, lp2)
9405-            d.addErrback(self._privkey_query_failed, peerid, last_shnum, lp2)
9406-            d.addErrback(log.err)
9407-            d.addCallback(self._check_for_done)
9408-            d.addErrback(self._fatal_error)
9409+            # Does the node already have a privkey? If not, we'll try to
9410+            # fetch it here.
9411+            if self._need_privkey:
9412+                d4 = reader.get_encprivkey(queue=True)
9413+                d4.addCallback(lambda results, shnum=shnum, peerid=peerid:
9414+                    self._try_to_validate_privkey(results, peerid, shnum, lp))
9415+                d4.addErrback(lambda error, shnum=shnum, peerid=peerid:
9416+                    self._privkey_query_failed(error, peerid, shnum, lp))
9417+            else:
9418+                d4 = defer.succeed(None)
9419+
9420+
9421+            if self.fetch_update_data:
9422+                # fetch the block hash tree and first + last segment, as
9423+                # configured earlier.
9424+                # Then set them in wherever we happen to want to set
9425+                # them.
9426+                # N.B.: use a separate list here; 'ds' accumulates the
9427+                # per-share DeferredLists below, and rebinding it would
9428+                # drop the shares processed so far.
9429+                update_ds = []
9430+                # XXX: We do this above, too. Is there a good way to
9431+                # make the two routines share the value without
9432+                # introducing more roundtrips?
9433+                update_ds.append(reader.get_verinfo())
9434+                update_ds.append(reader.get_blockhashes(queue=True))
9435+                update_ds.append(reader.get_block_and_salt(self.start_segment,
9436+                                                            queue=True))
9437+                update_ds.append(reader.get_block_and_salt(self.end_segment,
9438+                                                            queue=True))
9439+                d5 = deferredutil.gatherResults(update_ds)
9437+                d5.addCallback(self._got_update_results_one_share, shnum)
9438+            else:
9439+                d5 = defer.succeed(None)
9440 
9441hunk ./src/allmydata/mutable/servermap.py 730
9442+            dl = defer.DeferredList([d, d2, d3, d4, d5])
9443+            dl.addBoth(self._turn_barrier)
9444+            reader.flush()
9445+            dl.addCallback(lambda results, shnum=shnum, peerid=peerid:
9446+            dl.addErrback(lambda error, shnum=shnum, peerid=peerid, data=data:
9447+            dl.addErrback(lambda error, shnum=shnum, data=data:
9448+               self._got_corrupt_share(error, shnum, peerid, data, lp))
9449+            dl.addCallback(lambda verinfo, shnum=shnum, peerid=peerid, data=data:
9450+                self._cache_good_sharedata(verinfo, shnum, now, data))
9451+            ds.append(dl)
9452+        # dl is a deferred list that will fire when all of the shares
9453+        # that we found on this peer are done processing. When dl fires,
9454+        # we know that processing is done, so we can decrement the
9455+        # semaphore-like thing that we incremented earlier.
9456+        dl = defer.DeferredList(ds, fireOnOneErrback=True)
9457+        # Are we done? Done means that there are no more queries to
9458+        # send, that there are no outstanding queries, and that we
9459+        # haven't received any queries that are still processing. If we
9460+        # are done, self._check_for_done will cause the done deferred
9461+        # that we returned to our caller to fire, which tells them that
9462+        # they have a complete servermap, and that we won't be touching
9463+        # the servermap anymore.
9464+        dl.addCallback(_done_processing)
9465+        dl.addCallback(self._check_for_done)
9466+        dl.addErrback(self._fatal_error)
9467         # all done!
9468         self.log("_got_results done", parent=lp, level=log.NOISY)
9469hunk ./src/allmydata/mutable/servermap.py 757
9470+        return dl
9471+
9472+
9473+    def _turn_barrier(self, result):
9474+        """
9475+        I help the servermap updater avoid the recursion limit issues
9476+        discussed in #237.
9477+        """
9478+        return fireEventually(result)
9479+
9480+
9481+    def _try_to_set_pubkey(self, pubkey_s, peerid, shnum, lp):
9482+        if self._node.get_pubkey():
9483+            return # don't go through this again if we don't have to
9484+        fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s)
9485+        assert len(fingerprint) == 32
9486+        if fingerprint != self._node.get_fingerprint():
9487+            raise CorruptShareError(peerid, shnum,
9488+                                "pubkey doesn't match fingerprint")
9489+        self._node._populate_pubkey(self._deserialize_pubkey(pubkey_s))
9490+        assert self._node.get_pubkey()
9491+
9492 
9493     def notify_server_corruption(self, peerid, shnum, reason):
9494         ss = self._servermap.connections[peerid]
9495hunk ./src/allmydata/mutable/servermap.py 785
9496         ss.callRemoteOnly("advise_corrupt_share",
9497                           "mutable", self._storage_index, shnum, reason)
9498 
9499-    def _got_results_one_share(self, shnum, data, peerid, lp):
9500+
9501+    def _got_signature_one_share(self, results, shnum, peerid, lp):
9502+        # It is our job to give versioninfo to our caller. We need to
9503+        # raise CorruptShareError if the share is corrupt for any
9504+        # reason, something that our caller will handle.
9505         self.log(format="_got_results: got shnum #%(shnum)d from peerid %(peerid)s",
9506                  shnum=shnum,
9507                  peerid=idlib.shortnodeid_b2a(peerid),
9508hunk ./src/allmydata/mutable/servermap.py 795
9509                  level=log.NOISY,
9510                  parent=lp)
9511+        if not self._running:
9512+            # We can't process the results, since we can't touch the
9513+            # servermap anymore.
9514+            self.log("but we're not running anymore.")
9515+            return None
9516 
9517hunk ./src/allmydata/mutable/servermap.py 801
9518-        # this might raise NeedMoreDataError, if the pubkey and signature
9519-        # live at some weird offset. That shouldn't happen, so I'm going to
9520-        # treat it as a bad share.
9521-        (seqnum, root_hash, IV, k, N, segsize, datalength,
9522-         pubkey_s, signature, prefix) = unpack_prefix_and_signature(data)
9523-
9524-        if not self._node.get_pubkey():
9525-            fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s)
9526-            assert len(fingerprint) == 32
9527-            if fingerprint != self._node.get_fingerprint():
9528-                raise CorruptShareError(peerid, shnum,
9529-                                        "pubkey doesn't match fingerprint")
9530-            self._node._populate_pubkey(self._deserialize_pubkey(pubkey_s))
9531-
9532-        if self._need_privkey:
9533-            self._try_to_extract_privkey(data, peerid, shnum, lp)
9534-
9535-        (ig_version, ig_seqnum, ig_root_hash, ig_IV, ig_k, ig_N,
9536-         ig_segsize, ig_datalen, offsets) = unpack_header(data)
9537+        _, verinfo, signature, __, ___ = results
9538+        (seqnum,
9539+         root_hash,
9540+         saltish,
9541+         segsize,
9542+         datalen,
9543+         k,
9544+         n,
9545+         prefix,
9546+         offsets) = verinfo[1]
9547         offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
9548 
9549hunk ./src/allmydata/mutable/servermap.py 813
9550-        verinfo = (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9551+        # XXX: This should be done for us in the method, so
9552+        # presumably you can go in there and fix it.
9553+        verinfo = (seqnum,
9554+                   root_hash,
9555+                   saltish,
9556+                   segsize,
9557+                   datalen,
9558+                   k,
9559+                   n,
9560+                   prefix,
9561                    offsets_tuple)
9562hunk ./src/allmydata/mutable/servermap.py 824
9563+        # This tuple uniquely identifies a share on the grid; we use it
9564+        # to keep track of the ones that we've already seen.
9565 
9566         if verinfo not in self._valid_versions:
9567hunk ./src/allmydata/mutable/servermap.py 828
9568-            # it's a new pair. Verify the signature.
9569-            valid = self._node.get_pubkey().verify(prefix, signature)
9570+            # This is a new version tuple, and we need to validate it
9571+            # against the public key before keeping track of it.
9572+            assert self._node.get_pubkey()
9573+            valid = self._node.get_pubkey().verify(prefix, signature[1])
9574             if not valid:
9575hunk ./src/allmydata/mutable/servermap.py 833
9576-                raise CorruptShareError(peerid, shnum, "signature is invalid")
9577+                raise CorruptShareError(peerid, shnum,
9578+                                        "signature is invalid")
9579 
9580hunk ./src/allmydata/mutable/servermap.py 836
9581-            # ok, it's a valid verinfo. Add it to the list of validated
9582-            # versions.
9583-            self.log(" found valid version %d-%s from %s-sh%d: %d-%d/%d/%d"
9584-                     % (seqnum, base32.b2a(root_hash)[:4],
9585-                        idlib.shortnodeid_b2a(peerid), shnum,
9586-                        k, N, segsize, datalength),
9587-                     parent=lp)
9588-            self._valid_versions.add(verinfo)
9589-        # We now know that this is a valid candidate verinfo.
9590+        # ok, it's a valid verinfo. Add it to the list of validated
9591+        # versions.
9592+        self.log(" found valid version %d-%s from %s-sh%d: %d-%d/%d/%d"
9593+                 % (seqnum, base32.b2a(root_hash)[:4],
9594+                    idlib.shortnodeid_b2a(peerid), shnum,
9595+                    k, n, segsize, datalen),
9596+                    parent=lp)
9597+        self._valid_versions.add(verinfo)
9598+        # We now know that this is a valid candidate verinfo. Whether or
9599+        # not this instance of it is valid is a matter for the next
9600+        # statement; at this point, we just know that if we see this
9601+        # version info again, that its signature checks out and that
9602+        # we're okay to skip the signature-checking step.
9603 
9604hunk ./src/allmydata/mutable/servermap.py 850
9605+        # (peerid, shnum) are bound in the method invocation.
9606         if (peerid, shnum) in self._servermap.bad_shares:
9607             # we've been told that the rest of the data in this share is
9608             # unusable, so don't add it to the servermap.
9609hunk ./src/allmydata/mutable/servermap.py 863
9610         self._servermap.add_new_share(peerid, shnum, verinfo, timestamp)
9611         # and the versionmap
9612         self.versionmap.add(verinfo, (shnum, peerid, timestamp))
9613+
9614+        # It's our job to set the protocol version of our parent
9615+        # filenode if it isn't already set.
9616+        if not self._node.get_version():
9617+            # The first byte of the prefix is the version.
9618+            v = struct.unpack(">B", prefix[:1])[0]
9619+            self.log("got version %d" % v)
9620+            self._node.set_version(v)
9621+
9622         return verinfo
9623 
9624hunk ./src/allmydata/mutable/servermap.py 874
9625-    def _deserialize_pubkey(self, pubkey_s):
9626-        verifier = rsa.create_verifying_key_from_string(pubkey_s)
9627-        return verifier
9628 
9629hunk ./src/allmydata/mutable/servermap.py 875
9630-    def _try_to_extract_privkey(self, data, peerid, shnum, lp):
9631-        try:
9632-            r = unpack_share(data)
9633-        except NeedMoreDataError, e:
9634-            # this share won't help us. oh well.
9635-            offset = e.encprivkey_offset
9636-            length = e.encprivkey_length
9637-            self.log("shnum %d on peerid %s: share was too short (%dB) "
9638-                     "to get the encprivkey; [%d:%d] ought to hold it" %
9639-                     (shnum, idlib.shortnodeid_b2a(peerid), len(data),
9640-                      offset, offset+length),
9641-                     parent=lp)
9642-            # NOTE: if uncoordinated writes are taking place, someone might
9643-            # change the share (and most probably move the encprivkey) before
9644-            # we get a chance to do one of these reads and fetch it. This
9645-            # will cause us to see a NotEnoughSharesError(unable to fetch
9646-            # privkey) instead of an UncoordinatedWriteError . This is a
9647-            # nuisance, but it will go away when we move to DSA-based mutable
9648-            # files (since the privkey will be small enough to fit in the
9649-            # write cap).
9650+    def _got_update_results_one_share(self, results, share):
9651+        """
9652+        I record the update data contained in results for the given share.
9653+        """
9654+        assert len(results) == 4
9655+        verinfo, blockhashes, start, end = results
9656+        (seqnum,
9657+         root_hash,
9658+         saltish,
9659+         segsize,
9660+         datalen,
9661+         k,
9662+         n,
9663+         prefix,
9664+         offsets) = verinfo
9665+        offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
9666 
9667hunk ./src/allmydata/mutable/servermap.py 892
9668-            return
9669+        # XXX: This should be done for us in the method, so
9670+        # presumably you can go in there and fix it.
9671+        verinfo = (seqnum,
9672+                   root_hash,
9673+                   saltish,
9674+                   segsize,
9675+                   datalen,
9676+                   k,
9677+                   n,
9678+                   prefix,
9679+                   offsets_tuple)
9680 
9681hunk ./src/allmydata/mutable/servermap.py 904
9682-        (seqnum, root_hash, IV, k, N, segsize, datalen,
9683-         pubkey, signature, share_hash_chain, block_hash_tree,
9684-         share_data, enc_privkey) = r
9685+        update_data = (blockhashes, start, end)
9686+        self._servermap.set_update_data_for_share_and_verinfo(share,
9687+                                                              verinfo,
9688+                                                              update_data)
9689 
9690hunk ./src/allmydata/mutable/servermap.py 909
9691-        return self._try_to_validate_privkey(enc_privkey, peerid, shnum, lp)
9692+
9693+    def _deserialize_pubkey(self, pubkey_s):
9694+        verifier = rsa.create_verifying_key_from_string(pubkey_s)
9695+        return verifier
9696 
9697hunk ./src/allmydata/mutable/servermap.py 914
9698-    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
9699 
9700hunk ./src/allmydata/mutable/servermap.py 915
9701+    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
9702+        """
9703+        Given an encrypted privkey from a remote server, I derive its
9704+        writekey and validate it against the writekey stored in my node.
9705+        If the privkey is valid, then I set the
9705+        privkey and encprivkey properties of the node.
9706+        """
9707         alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
9708         alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
9709         if alleged_writekey != self._node.get_writekey():
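The validation reduces to a hash comparison: decrypt the fetched private key,
hash it into a writekey, and compare against the writekey the node already
holds. Restated standalone, using only names visible above (a sketch, not new
functionality):

    from allmydata.util import hashutil

    def privkey_is_valid(node, enc_privkey):
        # derive the writekey from the alleged privkey and compare
        alleged_privkey_s = node._decrypt_privkey(enc_privkey)
        alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
        return alleged_writekey == node.get_writekey()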
9710hunk ./src/allmydata/mutable/servermap.py 993
9711         self._queries_completed += 1
9712         self._last_failure = f
9713 
9714-    def _got_privkey_results(self, datavs, peerid, shnum, started, lp):
9715-        now = time.time()
9716-        elapsed = now - started
9717-        self._status.add_per_server_time(peerid, "privkey", started, elapsed)
9718-        self._queries_outstanding.discard(peerid)
9719-        if not self._need_privkey:
9720-            return
9721-        if shnum not in datavs:
9722-            self.log("privkey wasn't there when we asked it",
9723-                     level=log.WEIRD, umid="VA9uDQ")
9724-            return
9725-        datav = datavs[shnum]
9726-        enc_privkey = datav[0]
9727-        self._try_to_validate_privkey(enc_privkey, peerid, shnum, lp)
9728 
9729     def _privkey_query_failed(self, f, peerid, shnum, lp):
9730         self._queries_outstanding.discard(peerid)
9731hunk ./src/allmydata/mutable/servermap.py 1007
9732         self._servermap.problems.append(f)
9733         self._last_failure = f
9734 
9735+
9736     def _check_for_done(self, res):
9737         # exit paths:
9738         #  return self._send_more_queries(outstanding) : send some more queries
9739hunk ./src/allmydata/mutable/servermap.py 1013
9740         #  return self._done() : all done
9741         #  return : keep waiting, no new queries
9742-
9743         lp = self.log(format=("_check_for_done, mode is '%(mode)s', "
9744                               "%(outstanding)d queries outstanding, "
9745                               "%(extra)d extra peers available, "
9746hunk ./src/allmydata/mutable/servermap.py 1204
9747 
9748     def _done(self):
9749         if not self._running:
9750+            self.log("not running; we're already done")
9751             return
9752         self._running = False
9753         now = time.time()
9754hunk ./src/allmydata/mutable/servermap.py 1219
9755         self._servermap.last_update_time = self._started
9756         # the servermap will not be touched after this
9757         self.log("servermap: %s" % self._servermap.summarize_versions())
9758+
9759         eventually(self._done_deferred.callback, self._servermap)
9760 
9761     def _fatal_error(self, f):
9762}
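In _done() above, the finished servermap is handed to the Deferred with
eventually() rather than by firing it directly, so the caller's callbacks run
in a fresh reactor turn instead of reentrantly inside the updater. The
difference in isolation (a sketch, assuming foolscap's eventual-send):

    from foolscap.api import eventually
    from twisted.internet import defer

    d = defer.Deferred()
    d.addCallback(lambda servermap: servermap)
    # d.callback(smap) would run the callbacks right here, inside _done();
    # eventual-send defers them to a later turn of the event loop:
    eventually(d.callback, "the finished servermap")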
9763[tests:
9764Kevan Carstensen <kevan@isnotajoke.com>**20100819003531
9765 Ignore-this: 314e8bbcce532ea4d5d2cecc9f31cca0
9766 
9767     - A lot of existing tests relied on aspects of the mutable file
9768       implementation that were changed. This patch updates those tests
9769       to work with the changes.
9770     - This patch also adds tests for new features.
9771] {
9772hunk ./src/allmydata/test/common.py 11
9773 from foolscap.api import flushEventualQueue, fireEventually
9774 from allmydata import uri, dirnode, client
9775 from allmydata.introducer.server import IntroducerNode
9776-from allmydata.interfaces import IMutableFileNode, IImmutableFileNode, \
9777-     FileTooLargeError, NotEnoughSharesError, ICheckable
9778+from allmydata.interfaces import IMutableFileNode, IImmutableFileNode, \
9779+                                 NotEnoughSharesError, ICheckable, \
9780+                                 IMutableUploadable, SDMF_VERSION, \
9781+                                 MDMF_VERSION
9782 from allmydata.check_results import CheckResults, CheckAndRepairResults, \
9783      DeepCheckResults, DeepCheckAndRepairResults
9784 from allmydata.mutable.common import CorruptShareError
9785hunk ./src/allmydata/test/common.py 19
9786 from allmydata.mutable.layout import unpack_header
9787+from allmydata.mutable.publish import MutableData
9788 from allmydata.storage.server import storage_index_to_dir
9789 from allmydata.storage.mutable import MutableShareFile
9790 from allmydata.util import hashutil, log, fileutil, pollmixin
9791hunk ./src/allmydata/test/common.py 153
9792         consumer.write(data[start:end])
9793         return consumer
9794 
9795+
9796+    def get_best_readable_version(self):
9797+        return defer.succeed(self)
9798+
9799+
9800+    download_best_version = download_to_data
9801+
9802+
9803+    def download_to_data(self):
9804+        return download_to_data(self)
9805+
9806+
9807+    def get_size_of_best_version(self):
9808+        return defer.succeed(self.get_size())
9809+
9810+
9811 def make_chk_file_cap(size):
9812     return uri.CHKFileURI(key=os.urandom(16),
9813                           uri_extension_hash=os.urandom(32),
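These additions give the fake immutable node the same
get_best_readable_version()/download surface that real nodes now expose, so
callers can treat every filenode uniformly. The calling convention, with stub
classes standing in for real nodes (illustration only):

    from twisted.internet import defer

    class _StubVersion(object):
        # stands in for whatever get_best_readable_version() returns
        def download_to_data(self):
            return defer.succeed("file contents")

    class _StubNode(object):
        def get_best_readable_version(self):
            return defer.succeed(_StubVersion())

    d = _StubNode().get_best_readable_version()
    d.addCallback(lambda version: version.download_to_data())
    d.addCallback(lambda data: data)  # fires with "file contents"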
9814hunk ./src/allmydata/test/common.py 193
9815     MUTABLE_SIZELIMIT = 10000
9816     all_contents = {}
9817     bad_shares = {}
9818+    file_types = {} # storage index => MDMF_VERSION or SDMF_VERSION
9819 
9820     def __init__(self, storage_broker, secret_holder,
9821                  default_encoding_parameters, history):
9822hunk ./src/allmydata/test/common.py 200
9823         self.init_from_cap(make_mutable_file_cap())
9824     def create(self, contents, key_generator=None, keysize=None):
9825         initial_contents = self._get_initial_contents(contents)
9826-        if len(initial_contents) > self.MUTABLE_SIZELIMIT:
9827-            raise FileTooLargeError("SDMF is limited to one segment, and "
9828-                                    "%d > %d" % (len(initial_contents),
9829-                                                 self.MUTABLE_SIZELIMIT))
9830-        self.all_contents[self.storage_index] = initial_contents
9831+        data = initial_contents.read(initial_contents.get_size())
9832+        data = "".join(data)
9833+        self.all_contents[self.storage_index] = data
9834         return defer.succeed(self)
9835     def _get_initial_contents(self, contents):
9836hunk ./src/allmydata/test/common.py 205
9837-        if isinstance(contents, str):
9838-            return contents
9839         if contents is None:
9840hunk ./src/allmydata/test/common.py 206
9841-            return ""
9842+            return MutableData("")
9843+
9844+        if IMutableUploadable.providedBy(contents):
9845+            return contents
9846+
9847         assert callable(contents), "%s should be callable, not %s" % \
9848                (contents, type(contents))
9849         return contents(self)
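With this change, callers can no longer pass bare strings as initial contents;
they wrap them in MutableData, which presents the IMutableUploadable
read()/get_size() interface. A small demonstration of the behavior these tests
rely on:

    from allmydata.mutable.publish import MutableData

    uploadable = MutableData("initial contents")
    assert uploadable.get_size() == len("initial contents")
    # read() returns a list of strings, which callers join back together:
    data = "".join(uploadable.read(uploadable.get_size()))
    assert data == "initial contents"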
9850hunk ./src/allmydata/test/common.py 258
9851     def get_storage_index(self):
9852         return self.storage_index
9853 
9854+    def get_servermap(self, mode):
9855+        return defer.succeed(None)
9856+
9857+    def set_version(self, version):
9858+        assert version in (SDMF_VERSION, MDMF_VERSION)
9859+        self.file_types[self.storage_index] = version
9860+
9861+    def get_version(self):
9862+        assert self.storage_index in self.file_types
9863+        return self.file_types[self.storage_index]
9864+
9865     def check(self, monitor, verify=False, add_lease=False):
9866         r = CheckResults(self.my_uri, self.storage_index)
9867         is_bad = self.bad_shares.get(self.storage_index, None)
9868hunk ./src/allmydata/test/common.py 327
9869         return d
9870 
9871     def download_best_version(self):
9872+        return defer.succeed(self._download_best_version())
9873+
9874+
9875+    def _download_best_version(self, ignored=None):
9876         if isinstance(self.my_uri, uri.LiteralFileURI):
9877hunk ./src/allmydata/test/common.py 332
9878-            return defer.succeed(self.my_uri.data)
9879+            return self.my_uri.data
9880         if self.storage_index not in self.all_contents:
9881hunk ./src/allmydata/test/common.py 334
9882-            return defer.fail(NotEnoughSharesError(None, 0, 3))
9883-        return defer.succeed(self.all_contents[self.storage_index])
9884+            raise NotEnoughSharesError(None, 0, 3)
9885+        return self.all_contents[self.storage_index]
9886+
9887 
9888     def overwrite(self, new_contents):
9889hunk ./src/allmydata/test/common.py 339
9890-        if len(new_contents) > self.MUTABLE_SIZELIMIT:
9891-            raise FileTooLargeError("SDMF is limited to one segment, and "
9892-                                    "%d > %d" % (len(new_contents),
9893-                                                 self.MUTABLE_SIZELIMIT))
9894         assert not self.is_readonly()
9895hunk ./src/allmydata/test/common.py 340
9896-        self.all_contents[self.storage_index] = new_contents
9897+        new_data = new_contents.read(new_contents.get_size())
9898+        new_data = "".join(new_data)
9899+        self.all_contents[self.storage_index] = new_data
9900         return defer.succeed(None)
9901     def modify(self, modifier):
9902         # this does not implement FileTooLargeError, but the real one does
9903hunk ./src/allmydata/test/common.py 350
9904     def _modify(self, modifier):
9905         assert not self.is_readonly()
9906         old_contents = self.all_contents[self.storage_index]
9907-        self.all_contents[self.storage_index] = modifier(old_contents, None, True)
9908+        new_data = modifier(old_contents, None, True)
9909+        self.all_contents[self.storage_index] = new_data
9910         return None
9911 
9912hunk ./src/allmydata/test/common.py 354
9913+    # As actually implemented, MutableFileNode and MutableFileVersion
9914+    # are distinct. However, nothing in the webapi uses that
9915+    # distinction yet -- it just uses the unified download interface
9916+    # provided by get_best_readable_version and read. When we start
9917+    # doing cooler things like LDMF, we will want to revise this code
9918+    # to be less simplistic.
9919+    def get_best_readable_version(self):
9920+        return defer.succeed(self)
9921+
9922+
9923+    def get_best_mutable_version(self):
9924+        return defer.succeed(self)
9925+
9926+    # Ditto for this, which is an implementation of IWritable.
9927+    # XXX: declare that this class implements IWritable.
9928+    def update(self, data, offset):
9929+        assert not self.is_readonly()
9930+        def modifier(old, servermap, first_time):
9931+            new = old[:offset] + "".join(data.read(data.get_size()))
9932+            new += old[len(new):]
9933+            return new
9934+        return self.modify(modifier)
9935+
9936+
9937+    def read(self, consumer, offset=0, size=None):
9938+        data = self._download_best_version()
9939+        end = offset + size if size is not None else len(data)
9940+        data = data[offset:end]
9941+        consumer.write(data)
9942+        return defer.succeed(consumer)
9943+
9944+
9945 def make_mutable_file_cap():
9946     return uri.WriteableSSKFileURI(writekey=os.urandom(16),
9947                                    fingerprint=os.urandom(32))
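The fake update() above defines the overlay semantics the real IWritable
implementation is expected to follow: new data replaces old bytes starting at
offset, and anything beyond the end of the new data is preserved. The same
logic with plain strings (illustration only):

    def overlay(old, new_data, offset):
        new = old[:offset] + new_data
        new += old[len(new):]
        return new

    assert overlay("abcdefgh", "XY", 2) == "abXYefgh"  # replace in place
    assert overlay("abc", "WXYZ", 2) == "abWXYZ"       # grow past the end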
9948hunk ./src/allmydata/test/test_checker.py 11
9949 from allmydata.test.no_network import GridTestMixin
9950 from allmydata.immutable.upload import Data
9951 from allmydata.test.common_web import WebRenderingMixin
9952+from allmydata.mutable.publish import MutableData
9953 
9954 class FakeClient:
9955     def get_storage_broker(self):
9956hunk ./src/allmydata/test/test_checker.py 291
9957         def _stash_immutable(ur):
9958             self.imm = c0.create_node_from_uri(ur.uri)
9959         d.addCallback(_stash_immutable)
9960-        d.addCallback(lambda ign: c0.create_mutable_file("contents"))
9961+        d.addCallback(lambda ign:
9962+            c0.create_mutable_file(MutableData("contents")))
9963         def _stash_mutable(node):
9964             self.mut = node
9965         d.addCallback(_stash_mutable)
9966hunk ./src/allmydata/test/test_cli.py 13
9967 from allmydata.util import fileutil, hashutil, base32
9968 from allmydata import uri
9969 from allmydata.immutable import upload
9970+from allmydata.mutable.publish import MutableData
9971 from allmydata.dirnode import normalize
9972 
9973 # Test that the scripts can be imported.
9974hunk ./src/allmydata/test/test_cli.py 662
9975 
9976         d = self.do_cli("create-alias", etudes_arg)
9977         def _check_create_unicode((rc, out, err)):
9978-            self.failUnlessReallyEqual(rc, 0)
9979+            self.failUnlessReallyEqual(rc, 0)
9980             self.failUnlessReallyEqual(err, "")
9981             self.failUnlessIn("Alias %s created" % quote_output(u"\u00E9tudes"), out)
9982 
9983hunk ./src/allmydata/test/test_cli.py 967
9984         d.addCallback(lambda (rc,out,err): self.failUnlessReallyEqual(out, DATA2))
9985         return d
9986 
9987+    def test_mutable_type(self):
9988+        self.basedir = "cli/Put/mutable_type"
9989+        self.set_up_grid()
9990+        data = "data" * 100000
9991+        fn1 = os.path.join(self.basedir, "data")
9992+        fileutil.write(fn1, data)
9993+        d = self.do_cli("create-alias", "tahoe")
9994+        d.addCallback(lambda ignored:
9995+            self.do_cli("put", "--mutable", "--mutable-type=mdmf",
9996+                        fn1, "tahoe:uploaded.txt"))
9997+        d.addCallback(lambda ignored:
9998+            self.do_cli("ls", "--json", "tahoe:uploaded.txt"))
9999+        d.addCallback(lambda (rc, json, err): self.failUnlessIn("mdmf", json))
10000+        d.addCallback(lambda ignored:
10001+            self.do_cli("put", "--mutable", "--mutable-type=sdmf",
10002+                        fn1, "tahoe:uploaded2.txt"))
10003+        d.addCallback(lambda ignored:
10004+            self.do_cli("ls", "--json", "tahoe:uploaded2.txt"))
10005+        d.addCallback(lambda (rc, json, err):
10006+            self.failUnlessIn("sdmf", json))
10007+        return d
10008+
10009+    def test_mutable_type_unlinked(self):
10010+        self.basedir = "cli/Put/mutable_type_unlinked"
10011+        self.set_up_grid()
10012+        data = "data" * 100000
10013+        fn1 = os.path.join(self.basedir, "data")
10014+        fileutil.write(fn1, data)
10015+        d = self.do_cli("put", "--mutable", "--mutable-type=mdmf", fn1)
10016+        d.addCallback(lambda (rc, cap, err):
10017+            self.do_cli("ls", "--json", cap))
10018+        d.addCallback(lambda (rc, json, err): self.failUnlessIn("mdmf", json))
10019+        d.addCallback(lambda ignored:
10020+            self.do_cli("put", "--mutable", "--mutable-type=sdmf", fn1))
10021+        d.addCallback(lambda (rc, cap, err):
10022+            self.do_cli("ls", "--json", cap))
10023+        d.addCallback(lambda (rc, json, err):
10024+            self.failUnlessIn("sdmf", json))
10025+        return d
10026+
10027+    def test_mutable_type_invalid_format(self):
10028+        self.basedir = "cli/Put/mutable_type_invalid_format"
10029+        self.set_up_grid()
10030+        data = "data" * 100000
10031+        fn1 = os.path.join(self.basedir, "data")
10032+        fileutil.write(fn1, data)
10033+        d = self.do_cli("put", "--mutable", "--mutable-type=ldmf", fn1)
10034+        def _check_failure((rc, out, err)):
10035+            self.failIfEqual(rc, 0)
10036+            self.failUnlessIn("invalid", err)
10037+        d.addCallback(_check_failure)
10038+        return d
10039+
10040     def test_put_with_nonexistent_alias(self):
10041         # when invoked with an alias that doesn't exist, 'tahoe put'
10042         # should output a useful error message, not a stack trace
10043hunk ./src/allmydata/test/test_cli.py 2136
10044         self.set_up_grid()
10045         c0 = self.g.clients[0]
10046         DATA = "data" * 100
10047-        d = c0.create_mutable_file(DATA)
10048+        DATA_uploadable = MutableData(DATA)
10049+        d = c0.create_mutable_file(DATA_uploadable)
10050         def _stash_uri(n):
10051             self.uri = n.get_uri()
10052         d.addCallback(_stash_uri)
10053hunk ./src/allmydata/test/test_cli.py 2238
10054                                            upload.Data("literal",
10055                                                         convergence="")))
10056         d.addCallback(_stash_uri, "small")
10057-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"1"))
10058+        d.addCallback(lambda ign:
10059+            c0.create_mutable_file(MutableData(DATA+"1")))
10060         d.addCallback(lambda fn: self.rootnode.set_node(u"mutable", fn))
10061         d.addCallback(_stash_uri, "mutable")
10062 
10063hunk ./src/allmydata/test/test_cli.py 2257
10064         # root/small
10065         # root/mutable
10066 
10067+        # We haven't broken anything yet, so this should all be healthy.
10068         d.addCallback(lambda ign: self.do_cli("deep-check", "--verbose",
10069                                               self.rooturi))
10070         def _check2((rc, out, err)):
10071hunk ./src/allmydata/test/test_cli.py 2272
10072                             in lines, out)
10073         d.addCallback(_check2)
10074 
10075+        # Similarly, all of these results should be as we expect them to
10076+        # be for a healthy file layout.
10077         d.addCallback(lambda ign: self.do_cli("stats", self.rooturi))
10078         def _check_stats((rc, out, err)):
10079             self.failUnlessReallyEqual(err, "")
10080hunk ./src/allmydata/test/test_cli.py 2289
10081             self.failUnlessIn(" 317-1000 : 1    (1000 B, 1000 B)", lines)
10082         d.addCallback(_check_stats)
10083 
10084+        # Now we break things.
10085         def _clobber_shares(ignored):
10086             shares = self.find_uri_shares(self.uris[u"g\u00F6\u00F6d"])
10087             self.failUnlessReallyEqual(len(shares), 10)
10088hunk ./src/allmydata/test/test_cli.py 2314
10089 
10090         d.addCallback(lambda ign:
10091                       self.do_cli("deep-check", "--verbose", self.rooturi))
10092+        # This should reveal the missing share, but not the corrupt
10093+        # share, since we didn't tell the deep check operation to also
10094+        # verify.
10095         def _check3((rc, out, err)):
10096             self.failUnlessReallyEqual(err, "")
10097             self.failUnlessReallyEqual(rc, 0)
10098hunk ./src/allmydata/test/test_cli.py 2365
10099                                   "--verbose", "--verify", "--repair",
10100                                   self.rooturi))
10101         def _check6((rc, out, err)):
10102+            # We've just repaired the directory, so the repair should
10103+            # have succeeded.
10104             self.failUnlessReallyEqual(err, "")
10105             self.failUnlessReallyEqual(rc, 0)
10106             lines = out.splitlines()
10107hunk ./src/allmydata/test/test_deepcheck.py 9
10108 from twisted.internet import threads # CLI tests use deferToThread
10109 from allmydata.immutable import upload
10110 from allmydata.mutable.common import UnrecoverableFileError
10111+from allmydata.mutable.publish import MutableData
10112 from allmydata.util import idlib
10113 from allmydata.util import base32
10114 from allmydata.scripts import runner
10115hunk ./src/allmydata/test/test_deepcheck.py 38
10116         self.basedir = "deepcheck/MutableChecker/good"
10117         self.set_up_grid()
10118         CONTENTS = "a little bit of data"
10119-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10120+        CONTENTS_uploadable = MutableData(CONTENTS)
10121+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10122         def _created(node):
10123             self.node = node
10124             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10125hunk ./src/allmydata/test/test_deepcheck.py 61
10126         self.basedir = "deepcheck/MutableChecker/corrupt"
10127         self.set_up_grid()
10128         CONTENTS = "a little bit of data"
10129-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10130+        CONTENTS_uploadable = MutableData(CONTENTS)
10131+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10132         def _stash_and_corrupt(node):
10133             self.node = node
10134             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10135hunk ./src/allmydata/test/test_deepcheck.py 99
10136         self.basedir = "deepcheck/MutableChecker/delete_share"
10137         self.set_up_grid()
10138         CONTENTS = "a little bit of data"
10139-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10140+        CONTENTS_uploadable = MutableData(CONTENTS)
10141+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10142         def _stash_and_delete(node):
10143             self.node = node
10144             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10145hunk ./src/allmydata/test/test_deepcheck.py 223
10146             self.root = n
10147             self.root_uri = n.get_uri()
10148         d.addCallback(_created_root)
10149-        d.addCallback(lambda ign: c0.create_mutable_file("mutable file contents"))
10150+        d.addCallback(lambda ign:
10151+            c0.create_mutable_file(MutableData("mutable file contents")))
10152         d.addCallback(lambda n: self.root.set_node(u"mutable", n))
10153         def _created_mutable(n):
10154             self.mutable = n
10155hunk ./src/allmydata/test/test_deepcheck.py 965
10156     def create_mangled(self, ignored, name):
10157         nodetype, mangletype = name.split("-", 1)
10158         if nodetype == "mutable":
10159-            d = self.g.clients[0].create_mutable_file("mutable file contents")
10160+            mutable_uploadable = MutableData("mutable file contents")
10161+            d = self.g.clients[0].create_mutable_file(mutable_uploadable)
10162             d.addCallback(lambda n: self.root.set_node(unicode(name), n))
10163         elif nodetype == "large":
10164             large = upload.Data("Lots of data\n" * 1000 + name + "\n", None)
10165hunk ./src/allmydata/test/test_dirnode.py 1304
10166     implements(IMutableFileNode)
10167     counter = 0
10168     def __init__(self, initial_contents=""):
10169-        self.data = self._get_initial_contents(initial_contents)
10170+        data = self._get_initial_contents(initial_contents)
10171+        self.data = data.read(data.get_size())
10172+        self.data = "".join(self.data)
10173+
10174         counter = FakeMutableFile.counter
10175         FakeMutableFile.counter += 1
10176         writekey = hashutil.ssk_writekey_hash(str(counter))
10177hunk ./src/allmydata/test/test_dirnode.py 1354
10178         pass
10179 
10180     def modify(self, modifier):
10181-        self.data = modifier(self.data, None, True)
10182+        data = modifier(self.data, None, True)
10183+        self.data = data
10184         return defer.succeed(None)
10185 
10186 class FakeNodeMaker(NodeMaker):
10187hunk ./src/allmydata/test/test_dirnode.py 1359
10188-    def create_mutable_file(self, contents="", keysize=None):
10189+    def create_mutable_file(self, contents="", keysize=None, version=None):
10190         return defer.succeed(FakeMutableFile(contents))
10191 
10192 class FakeClient2(Client):
10193hunk ./src/allmydata/test/test_filenode.py 98
10194         def _check_segment(res):
10195             self.failUnlessEqual(res, DATA[1:1+5])
10196         d.addCallback(_check_segment)
10197+        d.addCallback(lambda ignored: fn1.get_best_readable_version())
10198+        d.addCallback(lambda fn2: self.failUnlessEqual(fn1, fn2))
10199+        d.addCallback(lambda ignored:
10200+            fn1.get_size_of_best_version())
10201+        d.addCallback(lambda size:
10202+            self.failUnlessEqual(size, len(DATA)))
10203+        d.addCallback(lambda ignored:
10204+            fn1.download_to_data())
10205+        d.addCallback(lambda data:
10206+            self.failUnlessEqual(data, DATA))
10207+        d.addCallback(lambda ignored:
10208+            fn1.download_best_version())
10209+        d.addCallback(lambda data:
10210+            self.failUnlessEqual(data, DATA))
10211 
10212         return d
10213 
10214hunk ./src/allmydata/test/test_hung_server.py 10
10215 from allmydata.util.consumer import download_to_data
10216 from allmydata.immutable import upload
10217 from allmydata.mutable.common import UnrecoverableFileError
10218+from allmydata.mutable.publish import MutableData
10219 from allmydata.storage.common import storage_index_to_dir
10220 from allmydata.test.no_network import GridTestMixin
10221 from allmydata.test.common import ShouldFailMixin
10222hunk ./src/allmydata/test/test_hung_server.py 110
10223         self.servers = self.servers[5:] + self.servers[:5]
10224 
10225         if mutable:
10226-            d = nm.create_mutable_file(mutable_plaintext)
10227+            uploadable = MutableData(mutable_plaintext)
10228+            d = nm.create_mutable_file(uploadable)
10229             def _uploaded_mutable(node):
10230                 self.uri = node.get_uri()
10231                 self.shares = self.find_uri_shares(self.uri)
10232hunk ./src/allmydata/test/test_immutable.py 263
10233         d.addCallback(_after_attempt)
10234         return d
10235 
10236+    def test_download_to_data(self):
10237+        d = self.n.download_to_data()
10238+        d.addCallback(lambda data:
10239+            self.failUnlessEqual(data, common.TEST_DATA))
10240+        return d
10241 
10242hunk ./src/allmydata/test/test_immutable.py 269
10243+
10244+    def test_download_best_version(self):
10245+        d = self.n.download_best_version()
10246+        d.addCallback(lambda data:
10247+            self.failUnlessEqual(data, common.TEST_DATA))
10248+        return d
10249+
10250+
10251+    def test_get_best_readable_version(self):
10252+        d = self.n.get_best_readable_version()
10253+        d.addCallback(lambda n2:
10254+            self.failUnlessEqual(n2, self.n))
10255+        return d
10256+
10257+    def test_get_size_of_best_version(self):
10258+        d = self.n.get_size_of_best_version()
10259+        d.addCallback(lambda size:
10260+            self.failUnlessEqual(size, len(common.TEST_DATA)))
10261+        return d
10262+
10263+
10264 # XXX extend these tests to show bad behavior of various kinds from servers:
10265 # raising exception from each remove_foo() method, for example
10266 
10267hunk ./src/allmydata/test/test_mutable.py 2
10268 
10269-import struct
10270+import os
10271 from cStringIO import StringIO
10272 from twisted.trial import unittest
10273 from twisted.internet import defer, reactor
10274hunk ./src/allmydata/test/test_mutable.py 8
10275 from allmydata import uri, client
10276 from allmydata.nodemaker import NodeMaker
10277-from allmydata.util import base32
10278+from allmydata.util import base32, consumer
10279 from allmydata.util.hashutil import tagged_hash, ssk_writekey_hash, \
10280      ssk_pubkey_fingerprint_hash
10281hunk ./src/allmydata/test/test_mutable.py 11
10282+from allmydata.util.deferredutil import gatherResults
10283 from allmydata.interfaces import IRepairResults, ICheckAndRepairResults, \
10284hunk ./src/allmydata/test/test_mutable.py 13
10285-     NotEnoughSharesError
10286+     NotEnoughSharesError, SDMF_VERSION, MDMF_VERSION
10287 from allmydata.monitor import Monitor
10288 from allmydata.test.common import ShouldFailMixin
10289 from allmydata.test.no_network import GridTestMixin
10290hunk ./src/allmydata/test/test_mutable.py 27
10291      NeedMoreDataError, UnrecoverableFileError, UncoordinatedWriteError, \
10292      NotEnoughServersError, CorruptShareError
10293 from allmydata.mutable.retrieve import Retrieve
10294-from allmydata.mutable.publish import Publish
10295+from allmydata.mutable.publish import Publish, MutableFileHandle, \
10296+                                      MutableData, \
10297+                                      DEFAULT_MAX_SEGMENT_SIZE
10298 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
10299hunk ./src/allmydata/test/test_mutable.py 31
10300-from allmydata.mutable.layout import unpack_header, unpack_share
10301+from allmydata.mutable.layout import unpack_header, MDMFSlotReadProxy
10302 from allmydata.mutable.repairer import MustForceRepairError
10303 
10304 import allmydata.test.common_util as testutil
10305hunk ./src/allmydata/test/test_mutable.py 100
10306         self.storage = storage
10307         self.queries = 0
10308     def callRemote(self, methname, *args, **kwargs):
10309+        self.queries += 1
10310         def _call():
10311             meth = getattr(self, methname)
10312             return meth(*args, **kwargs)
10313hunk ./src/allmydata/test/test_mutable.py 107
10314         d = fireEventually()
10315         d.addCallback(lambda res: _call())
10316         return d
10317+
10318     def callRemoteOnly(self, methname, *args, **kwargs):
10319hunk ./src/allmydata/test/test_mutable.py 109
10320+        self.queries += 1
10321         d = self.callRemote(methname, *args, **kwargs)
10322         d.addBoth(lambda ignore: None)
10323         pass
10324hunk ./src/allmydata/test/test_mutable.py 157
10325             chr(ord(original[byte_offset]) ^ 0x01) +
10326             original[byte_offset+1:])
10327 
10328+def add_two(original, byte_offset):
10329+    # Flipping the low bit isn't enough to corrupt the version number,
10330+    # because 1 is also valid; XOR with 0x02 adds two to either 0 or 1.
10331+    return (original[:byte_offset] +
10332+            chr(ord(original[byte_offset]) ^ 0x02) +
10333+            original[byte_offset+1:])
10334+
10335 def corrupt(res, s, offset, shnums_to_corrupt=None, offset_offset=0):
10336     # if shnums_to_corrupt is None, corrupt all shares. Otherwise it is a
10337     # list of shnums to corrupt.
10338hunk ./src/allmydata/test/test_mutable.py 167
10339+    ds = []
10340     for peerid in s._peers:
10341         shares = s._peers[peerid]
10342         for shnum in shares:
10343hunk ./src/allmydata/test/test_mutable.py 175
10344                 and shnum not in shnums_to_corrupt):
10345                 continue
10346             data = shares[shnum]
10347-            (version,
10348-             seqnum,
10349-             root_hash,
10350-             IV,
10351-             k, N, segsize, datalen,
10352-             o) = unpack_header(data)
10353-            if isinstance(offset, tuple):
10354-                offset1, offset2 = offset
10355-            else:
10356-                offset1 = offset
10357-                offset2 = 0
10358-            if offset1 == "pubkey":
10359-                real_offset = 107
10360-            elif offset1 in o:
10361-                real_offset = o[offset1]
10362-            else:
10363-                real_offset = offset1
10364-            real_offset = int(real_offset) + offset2 + offset_offset
10365-            assert isinstance(real_offset, int), offset
10366-            shares[shnum] = flip_bit(data, real_offset)
10367-    return res
10368+            # We feed the reader all of the share data up front, so it
10369+            # never needs the rref or the storage index, neither of
10370+            # which we provide. We use the reader because it can parse
10371+            # both MDMF and SDMF shares.
10372+            reader = MDMFSlotReadProxy(None, None, shnum, data)
10373+            # We need to get the offsets for the next part.
10374+            d = reader.get_verinfo()
10375+            def _do_corruption(verinfo, data, shnum):
10376+                (seqnum,
10377+                 root_hash,
10378+                 IV,
10379+                 segsize,
10380+                 datalen,
10381+                 k, n, prefix, o) = verinfo
10382+                if isinstance(offset, tuple):
10383+                    offset1, offset2 = offset
10384+                else:
10385+                    offset1 = offset
10386+                    offset2 = 0
10387+                if offset1 == "pubkey" and IV:
10388+                    real_offset = 107
10389+                elif offset1 == "share_data" and not IV:
10390+                    real_offset = 107
10391+                elif offset1 in o:
10392+                    real_offset = o[offset1]
10393+                else:
10394+                    real_offset = offset1
10395+                real_offset = int(real_offset) + offset2 + offset_offset
10396+                assert isinstance(real_offset, int), offset
10397+                if offset1 == 0: # verbyte
10398+                    f = add_two
10399+                else:
10400+                    f = flip_bit
10401+                shares[shnum] = f(data, real_offset)
10402+            d.addCallback(_do_corruption, data, shnum)
10403+            ds.append(d)
10404+    dl = defer.DeferredList(ds)
10405+    dl.addCallback(lambda ignored: res)
10406+    return dl
10407 
10408 def make_storagebroker(s=None, num_peers=10):
10409     if not s:
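Because MDMFSlotReadProxy.get_verinfo() returns a Deferred, corrupt() can no
longer mutate shares synchronously; it collects one Deferred per share and
fires only when all of them are done. The control-flow pattern in isolation
(a sketch of the same DeferredList idiom used above):

    from twisted.internet import defer

    def corrupt_one(shnum):
        # stands in for the get_verinfo() + flip-a-byte step above
        return defer.succeed(shnum)

    ds = [corrupt_one(shnum) for shnum in (0, 1, 2)]
    dl = defer.DeferredList(ds)
    # fires with [(True, 0), (True, 1), (True, 2)] once every share is done
    dl.addCallback(lambda results: [r for (ok, r) in results if ok])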
10410hunk ./src/allmydata/test/test_mutable.py 256
10411             self.failUnlessEqual(len(shnums), 1)
10412         d.addCallback(_created)
10413         return d
10414+    test_create.timeout = 15
10415+
10416+
10417+    def test_create_mdmf(self):
10418+        d = self.nodemaker.create_mutable_file(version=MDMF_VERSION)
10419+        def _created(n):
10420+            self.failUnless(isinstance(n, MutableFileNode))
10421+            self.failUnlessEqual(n.get_storage_index(), n._storage_index)
10422+            sb = self.nodemaker.storage_broker
10423+            peer0 = sorted(sb.get_all_serverids())[0]
10424+            shnums = self._storage._peers[peer0].keys()
10425+            self.failUnlessEqual(len(shnums), 1)
10426+        d.addCallback(_created)
10427+        return d
10428+
10429 
10430     def test_serialize(self):
10431         n = MutableFileNode(None, None, {"k": 3, "n": 10}, None)
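test_create_mdmf exercises the new version= argument on create_mutable_file.
The shape of the call, using this file's make_nodemaker helper (a sketch;
error handling omitted):

    from allmydata.interfaces import MDMF_VERSION
    from allmydata.mutable.publish import MutableData

    nm = make_nodemaker()  # helper defined earlier in test_mutable.py
    d = nm.create_mutable_file(MutableData("contents"), version=MDMF_VERSION)
    d.addCallback(lambda n: n.download_best_version())
    d.addCallback(lambda data: data)  # fires with "contents"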
10432hunk ./src/allmydata/test/test_mutable.py 301
10433             d.addCallback(lambda smap: smap.dump(StringIO()))
10434             d.addCallback(lambda sio:
10435                           self.failUnless("3-of-10" in sio.getvalue()))
10436-            d.addCallback(lambda res: n.overwrite("contents 1"))
10437+            d.addCallback(lambda res: n.overwrite(MutableData("contents 1")))
10438             d.addCallback(lambda res: self.failUnlessIdentical(res, None))
10439             d.addCallback(lambda res: n.download_best_version())
10440             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10441hunk ./src/allmydata/test/test_mutable.py 308
10442             d.addCallback(lambda res: n.get_size_of_best_version())
10443             d.addCallback(lambda size:
10444                           self.failUnlessEqual(size, len("contents 1")))
10445-            d.addCallback(lambda res: n.overwrite("contents 2"))
10446+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
10447             d.addCallback(lambda res: n.download_best_version())
10448             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10449             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
10450hunk ./src/allmydata/test/test_mutable.py 312
10451-            d.addCallback(lambda smap: n.upload("contents 3", smap))
10452+            d.addCallback(lambda smap: n.upload(MutableData("contents 3"), smap))
10453             d.addCallback(lambda res: n.download_best_version())
10454             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 3"))
10455             d.addCallback(lambda res: n.get_servermap(MODE_ANYTHING))
10456hunk ./src/allmydata/test/test_mutable.py 324
10457             # mapupdate-to-retrieve data caching (i.e. make the shares larger
10458             # than the default readsize, which is 2000 bytes). A 15kB file
10459             # will have 5kB shares.
10460-            d.addCallback(lambda res: n.overwrite("large size file" * 1000))
10461+            d.addCallback(lambda res: n.overwrite(MutableData("large size file" * 1000)))
10462             d.addCallback(lambda res: n.download_best_version())
10463             d.addCallback(lambda res:
10464                           self.failUnlessEqual(res, "large size file" * 1000))
10465hunk ./src/allmydata/test/test_mutable.py 332
10466         d.addCallback(_created)
10467         return d
10468 
10469+
10470+    def test_upload_and_download_mdmf(self):
10471+        d = self.nodemaker.create_mutable_file(version=MDMF_VERSION)
10472+        def _created(n):
10473+            d = defer.succeed(None)
10474+            d.addCallback(lambda ignored:
10475+                n.get_servermap(MODE_READ))
10476+            def _then(servermap):
10477+                dumped = servermap.dump(StringIO())
10478+                self.failUnlessIn("3-of-10", dumped.getvalue())
10479+            d.addCallback(_then)
10480+            # Now overwrite the contents with some new contents. We want
10481+            # to make them big enough to force the file to be uploaded
10482+            # in more than one segment.
10483+            big_contents = "contents1" * 100000 # about 900 KiB
10484+            big_contents_uploadable = MutableData(big_contents)
10485+            d.addCallback(lambda ignored:
10486+                n.overwrite(big_contents_uploadable))
10487+            d.addCallback(lambda ignored:
10488+                n.download_best_version())
10489+            d.addCallback(lambda data:
10490+                self.failUnlessEqual(data, big_contents))
10491+            # Overwrite the contents again with some new contents. As
10492+            # before, they need to be big enough to force multiple
10493+            # segments, so that we make the downloader deal with
10494+            # multiple segments.
10495+            bigger_contents = "contents2" * 1000000 # about 9MiB
10496+            bigger_contents_uploadable = MutableData(bigger_contents)
10497+            d.addCallback(lambda ignored:
10498+                n.overwrite(bigger_contents_uploadable))
10499+            d.addCallback(lambda ignored:
10500+                n.download_best_version())
10501+            d.addCallback(lambda data:
10502+                self.failUnlessEqual(data, bigger_contents))
10503+            return d
10504+        d.addCallback(_created)
10505+        return d
10506+
10507+
10508+    def test_mdmf_write_count(self):
10509+        # Publishing an MDMF file should only cause one write for each
10510+        # share that is to be published. Otherwise, we introduce
10511+        # undesirable semantics that are a regression from SDMF
10512+        upload = MutableData("MDMF" * 100000) # about 400 KiB
10513+        d = self.nodemaker.create_mutable_file(upload,
10514+                                               version=MDMF_VERSION)
10515+        def _check_server_write_counts(ignored):
10516+            sb = self.nodemaker.storage_broker
10517+            peers = sb.test_servers.values()
10518+            for peer in peers:
10519+                self.failUnlessEqual(peer.queries, 1)
10520+        d.addCallback(_check_server_write_counts)
10521+        return d
10522+
10523+
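test_mdmf_write_count pins down the one-write-per-share invariant by counting
queries inside the fake server's callRemote (the counter incremented earlier
in this patch). The counting idea in isolation, as a wrapping proxy (a sketch,
not the tests' actual mechanism):

    class CountingWrapper(object):
        # wraps any server-like object and counts remote calls
        def __init__(self, server):
            self._server = server
            self.queries = 0

        def callRemote(self, methname, *args, **kwargs):
            self.queries += 1
            return self._server.callRemote(methname, *args, **kwargs)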
10524     def test_create_with_initial_contents(self):
10525hunk ./src/allmydata/test/test_mutable.py 388
10526-        d = self.nodemaker.create_mutable_file("contents 1")
10527+        upload1 = MutableData("contents 1")
10528+        d = self.nodemaker.create_mutable_file(upload1)
10529         def _created(n):
10530             d = n.download_best_version()
10531             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10532hunk ./src/allmydata/test/test_mutable.py 393
10533-            d.addCallback(lambda res: n.overwrite("contents 2"))
10534+            upload2 = MutableData("contents 2")
10535+            d.addCallback(lambda res: n.overwrite(upload2))
10536             d.addCallback(lambda res: n.download_best_version())
10537             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10538             return d
10539hunk ./src/allmydata/test/test_mutable.py 400
10540         d.addCallback(_created)
10541         return d
10542+    test_create_with_initial_contents.timeout = 15
10543+
10544+
10545+    def test_create_mdmf_with_initial_contents(self):
10546+        initial_contents = "foobarbaz" * 131072 # about 1.1 MiB
10547+        initial_contents_uploadable = MutableData(initial_contents)
10548+        d = self.nodemaker.create_mutable_file(initial_contents_uploadable,
10549+                                               version=MDMF_VERSION)
10550+        def _created(n):
10551+            d = n.download_best_version()
10552+            d.addCallback(lambda data:
10553+                self.failUnlessEqual(data, initial_contents))
10554+            uploadable2 = MutableData(initial_contents + "foobarbaz")
10555+            d.addCallback(lambda ignored:
10556+                n.overwrite(uploadable2))
10557+            d.addCallback(lambda ignored:
10558+                n.download_best_version())
10559+            d.addCallback(lambda data:
10560+                self.failUnlessEqual(data, initial_contents +
10561+                                           "foobarbaz"))
10562+            return d
10563+        d.addCallback(_created)
10564+        return d
10565+    test_create_mdmf_with_initial_contents.timeout = 20
10566+
10567 
10568     def test_response_cache_memory_leak(self):
10569         d = self.nodemaker.create_mutable_file("contents")
10570hunk ./src/allmydata/test/test_mutable.py 451
10571             key = n.get_writekey()
10572             self.failUnless(isinstance(key, str), key)
10573             self.failUnlessEqual(len(key), 16) # AES key size
10574-            return data
10575+            return MutableData(data)
10576         d = self.nodemaker.create_mutable_file(_make_contents)
10577         def _created(n):
10578             return n.download_best_version()
10579hunk ./src/allmydata/test/test_mutable.py 459
10580         d.addCallback(lambda data2: self.failUnlessEqual(data2, data))
10581         return d
10582 
10583+
10584+    def test_create_mdmf_with_initial_contents_function(self):
10585+        data = "initial contents" * 100000
10586+        def _make_contents(n):
10587+            self.failUnless(isinstance(n, MutableFileNode))
10588+            key = n.get_writekey()
10589+            self.failUnless(isinstance(key, str), key)
10590+            self.failUnlessEqual(len(key), 16)
10591+            return MutableData(data)
10592+        d = self.nodemaker.create_mutable_file(_make_contents,
10593+                                               version=MDMF_VERSION)
10594+        d.addCallback(lambda n:
10595+            n.download_best_version())
10596+        d.addCallback(lambda data2:
10597+            self.failUnlessEqual(data2, data))
10598+        return d
10599+
10600+
10601     def test_create_with_too_large_contents(self):
10602         BIG = "a" * (self.OLD_MAX_SEGMENT_SIZE + 1)
10603hunk ./src/allmydata/test/test_mutable.py 479
10604-        d = self.nodemaker.create_mutable_file(BIG)
10605+        BIG_uploadable = MutableData(BIG)
10606+        d = self.nodemaker.create_mutable_file(BIG_uploadable)
10607         def _created(n):
10608hunk ./src/allmydata/test/test_mutable.py 482
10609-            d = n.overwrite(BIG)
10610+            other_BIG_uploadable = MutableData(BIG)
10611+            d = n.overwrite(other_BIG_uploadable)
10612             return d
10613         d.addCallback(_created)
10614         return d
10615hunk ./src/allmydata/test/test_mutable.py 497
10616 
10617     def test_modify(self):
10618         def _modifier(old_contents, servermap, first_time):
10619-            return old_contents + "line2"
10620+            new_contents = old_contents + "line2"
10621+            return new_contents
10622         def _non_modifier(old_contents, servermap, first_time):
10623             return old_contents
10624         def _none_modifier(old_contents, servermap, first_time):
10625hunk ./src/allmydata/test/test_mutable.py 506
10626         def _error_modifier(old_contents, servermap, first_time):
10627             raise ValueError("oops")
10628         def _toobig_modifier(old_contents, servermap, first_time):
10629-            return "b" * (self.OLD_MAX_SEGMENT_SIZE+1)
10630+            new_content = "b" * (self.OLD_MAX_SEGMENT_SIZE + 1)
10631+            return new_content
10632         calls = []
10633         def _ucw_error_modifier(old_contents, servermap, first_time):
10634             # simulate an UncoordinatedWriteError once
10635hunk ./src/allmydata/test/test_mutable.py 514
10636             calls.append(1)
10637             if len(calls) <= 1:
10638                 raise UncoordinatedWriteError("simulated")
10639-            return old_contents + "line3"
10640+            new_contents = old_contents + "line3"
10641+            return new_contents
10642         def _ucw_error_non_modifier(old_contents, servermap, first_time):
10643             # simulate an UncoordinatedWriteError once, and don't actually
10644             # modify the contents on subsequent invocations
10645hunk ./src/allmydata/test/test_mutable.py 524
10646                 raise UncoordinatedWriteError("simulated")
10647             return old_contents
10648 
10649-        d = self.nodemaker.create_mutable_file("line1")
10650+        initial_contents = "line1"
10651+        d = self.nodemaker.create_mutable_file(MutableData(initial_contents))
10652         def _created(n):
10653             d = n.modify(_modifier)
10654             d.addCallback(lambda res: n.download_best_version())
10655hunk ./src/allmydata/test/test_mutable.py 582
10656             return d
10657         d.addCallback(_created)
10658         return d
10659+    test_modify.timeout = 15
10660+
10661 
10662     def test_modify_backoffer(self):
10663         def _modifier(old_contents, servermap, first_time):
10664hunk ./src/allmydata/test/test_mutable.py 609
10665         giveuper._delay = 0.1
10666         giveuper.factor = 1
10667 
10668-        d = self.nodemaker.create_mutable_file("line1")
10669+        d = self.nodemaker.create_mutable_file(MutableData("line1"))
10670         def _created(n):
10671             d = n.modify(_modifier)
10672             d.addCallback(lambda res: n.download_best_version())
10673hunk ./src/allmydata/test/test_mutable.py 659
10674             d.addCallback(lambda smap: smap.dump(StringIO()))
10675             d.addCallback(lambda sio:
10676                           self.failUnless("3-of-10" in sio.getvalue()))
10677-            d.addCallback(lambda res: n.overwrite("contents 1"))
10678+            d.addCallback(lambda res: n.overwrite(MutableData("contents 1")))
10679             d.addCallback(lambda res: self.failUnlessIdentical(res, None))
10680             d.addCallback(lambda res: n.download_best_version())
10681             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10682hunk ./src/allmydata/test/test_mutable.py 663
10683-            d.addCallback(lambda res: n.overwrite("contents 2"))
10684+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
10685             d.addCallback(lambda res: n.download_best_version())
10686             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10687             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
10688hunk ./src/allmydata/test/test_mutable.py 667
10689-            d.addCallback(lambda smap: n.upload("contents 3", smap))
10690+            d.addCallback(lambda smap: n.upload(MutableData("contents 3"), smap))
10691             d.addCallback(lambda res: n.download_best_version())
10692             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 3"))
10693             d.addCallback(lambda res: n.get_servermap(MODE_ANYTHING))
10694hunk ./src/allmydata/test/test_mutable.py 680
10695         return d
10696 
10697 
10698-class MakeShares(unittest.TestCase):
10699-    def test_encrypt(self):
10700-        nm = make_nodemaker()
10701-        CONTENTS = "some initial contents"
10702-        d = nm.create_mutable_file(CONTENTS)
10703-        def _created(fn):
10704-            p = Publish(fn, nm.storage_broker, None)
10705-            p.salt = "SALT" * 4
10706-            p.readkey = "\x00" * 16
10707-            p.newdata = CONTENTS
10708-            p.required_shares = 3
10709-            p.total_shares = 10
10710-            p.setup_encoding_parameters()
10711-            return p._encrypt_and_encode()
10712+    def test_size_after_servermap_update(self):
10713+        # after a servermap update, a mutable file node should be able
10714+        # to report its size, since the update tells it how large the
10715+        # best version of the file is.
10716+        d = self.nodemaker.create_mutable_file()
10717+        def _created(n):
10718+            self.n = n
10719+            return n.get_servermap(MODE_READ)
10720+        d.addCallback(_created)
10721+        d.addCallback(lambda ignored:
10722+            self.failUnlessEqual(self.n.get_size(), 0))
10723+        d.addCallback(lambda ignored:
10724+            self.n.overwrite(MutableData("foobarbaz")))
10725+        d.addCallback(lambda ignored:
10726+            self.failUnlessEqual(self.n.get_size(), 9))
10727+        d.addCallback(lambda ignored:
10728+            self.nodemaker.create_mutable_file(MutableData("foobarbaz")))
10729+        d.addCallback(_created)
10730+        d.addCallback(lambda ignored:
10731+            self.failUnlessEqual(self.n.get_size(), 9))
10732+        return d
10733+
10734+
10735+class PublishMixin:
10736+    def publish_one(self):
10737+        # publish a file and create shares, which can then be manipulated
10738+        # later.
10739+        self.CONTENTS = "New contents go here" * 1000
10740+        self.uploadable = MutableData(self.CONTENTS)
10741+        self._storage = FakeStorage()
10742+        self._nodemaker = make_nodemaker(self._storage)
10743+        self._storage_broker = self._nodemaker.storage_broker
10744+        d = self._nodemaker.create_mutable_file(self.uploadable)
10745+        def _created(node):
10746+            self._fn = node
10747+            self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10748         d.addCallback(_created)
10749hunk ./src/allmydata/test/test_mutable.py 717
10750-        def _done(shares_and_shareids):
10751-            (shares, share_ids) = shares_and_shareids
10752-            self.failUnlessEqual(len(shares), 10)
10753-            for sh in shares:
10754-                self.failUnless(isinstance(sh, str))
10755-                self.failUnlessEqual(len(sh), 7)
10756-            self.failUnlessEqual(len(share_ids), 10)
10757-        d.addCallback(_done)
10758         return d
10759 
10760hunk ./src/allmydata/test/test_mutable.py 719
10761-    def test_generate(self):
10762-        nm = make_nodemaker()
10763-        CONTENTS = "some initial contents"
10764-        d = nm.create_mutable_file(CONTENTS)
10765-        def _created(fn):
10766-            self._fn = fn
10767-            p = Publish(fn, nm.storage_broker, None)
10768-            self._p = p
10769-            p.newdata = CONTENTS
10770-            p.required_shares = 3
10771-            p.total_shares = 10
10772-            p.setup_encoding_parameters()
10773-            p._new_seqnum = 3
10774-            p.salt = "SALT" * 4
10775-            # make some fake shares
10776-            shares_and_ids = ( ["%07d" % i for i in range(10)], range(10) )
10777-            p._privkey = fn.get_privkey()
10778-            p._encprivkey = fn.get_encprivkey()
10779-            p._pubkey = fn.get_pubkey()
10780-            return p._generate_shares(shares_and_ids)
10781+    def publish_mdmf(self):
10782+        # like publish_one, except that the result is guaranteed to be
10783+        # an MDMF file.
10784+        # self.CONTENTS should have more than one segment.
10785+        self.CONTENTS = "This is an MDMF file" * 100000
10786+        self.uploadable = MutableData(self.CONTENTS)
10787+        self._storage = FakeStorage()
10788+        self._nodemaker = make_nodemaker(self._storage)
10789+        self._storage_broker = self._nodemaker.storage_broker
10790+        d = self._nodemaker.create_mutable_file(self.uploadable, version=MDMF_VERSION)
10791+        def _created(node):
10792+            self._fn = node
10793+            self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10794         d.addCallback(_created)
10795hunk ./src/allmydata/test/test_mutable.py 733
10796-        def _generated(res):
10797-            p = self._p
10798-            final_shares = p.shares
10799-            root_hash = p.root_hash
10800-            self.failUnlessEqual(len(root_hash), 32)
10801-            self.failUnless(isinstance(final_shares, dict))
10802-            self.failUnlessEqual(len(final_shares), 10)
10803-            self.failUnlessEqual(sorted(final_shares.keys()), range(10))
10804-            for i,sh in final_shares.items():
10805-                self.failUnless(isinstance(sh, str))
10806-                # feed the share through the unpacker as a sanity-check
10807-                pieces = unpack_share(sh)
10808-                (u_seqnum, u_root_hash, IV, k, N, segsize, datalen,
10809-                 pubkey, signature, share_hash_chain, block_hash_tree,
10810-                 share_data, enc_privkey) = pieces
10811-                self.failUnlessEqual(u_seqnum, 3)
10812-                self.failUnlessEqual(u_root_hash, root_hash)
10813-                self.failUnlessEqual(k, 3)
10814-                self.failUnlessEqual(N, 10)
10815-                self.failUnlessEqual(segsize, 21)
10816-                self.failUnlessEqual(datalen, len(CONTENTS))
10817-                self.failUnlessEqual(pubkey, p._pubkey.serialize())
10818-                sig_material = struct.pack(">BQ32s16s BBQQ",
10819-                                           0, p._new_seqnum, root_hash, IV,
10820-                                           k, N, segsize, datalen)
10821-                self.failUnless(p._pubkey.verify(sig_material, signature))
10822-                #self.failUnlessEqual(signature, p._privkey.sign(sig_material))
10823-                self.failUnless(isinstance(share_hash_chain, dict))
10824-                self.failUnlessEqual(len(share_hash_chain), 4) # ln2(10)++
10825-                for shnum,share_hash in share_hash_chain.items():
10826-                    self.failUnless(isinstance(shnum, int))
10827-                    self.failUnless(isinstance(share_hash, str))
10828-                    self.failUnlessEqual(len(share_hash), 32)
10829-                self.failUnless(isinstance(block_hash_tree, list))
10830-                self.failUnlessEqual(len(block_hash_tree), 1) # very small tree
10831-                self.failUnlessEqual(IV, "SALT"*4)
10832-                self.failUnlessEqual(len(share_data), len("%07d" % 1))
10833-                self.failUnlessEqual(enc_privkey, self._fn.get_encprivkey())
10834-        d.addCallback(_generated)
10835         return d
10836 
10837hunk ./src/allmydata/test/test_mutable.py 735
10838-    # TODO: when we publish to 20 peers, we should get one share per peer on 10
10839-    # when we publish to 3 peers, we should get either 3 or 4 shares per peer
10840-    # when we publish to zero peers, we should get a NotEnoughSharesError
10841 
10842hunk ./src/allmydata/test/test_mutable.py 736
10843-class PublishMixin:
10844-    def publish_one(self):
10845-        # publish a file and create shares, which can then be manipulated
10846-        # later.
10847-        self.CONTENTS = "New contents go here" * 1000
10848+    def publish_sdmf(self):
10849+        # like publish_one, except that the result is guaranteed to be
10850+        # an SDMF file
10851+        self.CONTENTS = "This is an SDMF file" * 1000
10852+        self.uploadable = MutableData(self.CONTENTS)
10853         self._storage = FakeStorage()
10854         self._nodemaker = make_nodemaker(self._storage)
10855         self._storage_broker = self._nodemaker.storage_broker
10856hunk ./src/allmydata/test/test_mutable.py 744
10857-        d = self._nodemaker.create_mutable_file(self.CONTENTS)
10858+        d = self._nodemaker.create_mutable_file(self.uploadable, version=SDMF_VERSION)
10859         def _created(node):
10860             self._fn = node
10861             self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10862hunk ./src/allmydata/test/test_mutable.py 751
10863         d.addCallback(_created)
10864         return d
10865 
10866-    def publish_multiple(self):
10867+
10868+    def publish_multiple(self, version=0):
10869         self.CONTENTS = ["Contents 0",
10870                          "Contents 1",
10871                          "Contents 2",
10872hunk ./src/allmydata/test/test_mutable.py 758
10873                          "Contents 3a",
10874                          "Contents 3b"]
10875+        self.uploadables = [MutableData(d) for d in self.CONTENTS]
10876         self._copied_shares = {}
10877         self._storage = FakeStorage()
10878         self._nodemaker = make_nodemaker(self._storage)
10879hunk ./src/allmydata/test/test_mutable.py 762
10880-        d = self._nodemaker.create_mutable_file(self.CONTENTS[0]) # seqnum=1
10881+        d = self._nodemaker.create_mutable_file(self.uploadables[0], version=version) # seqnum=1
10882         def _created(node):
10883             self._fn = node
10884             # now create multiple versions of the same file, and accumulate
10885hunk ./src/allmydata/test/test_mutable.py 769
10886             # their shares, so we can mix and match them later.
10887             d = defer.succeed(None)
10888             d.addCallback(self._copy_shares, 0)
10889-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[1])) #s2
10890+            d.addCallback(lambda res: node.overwrite(self.uploadables[1])) #s2
10891             d.addCallback(self._copy_shares, 1)
10892hunk ./src/allmydata/test/test_mutable.py 771
10893-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[2])) #s3
10894+            d.addCallback(lambda res: node.overwrite(self.uploadables[2])) #s3
10895             d.addCallback(self._copy_shares, 2)
10896hunk ./src/allmydata/test/test_mutable.py 773
10897-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[3])) #s4a
10898+            d.addCallback(lambda res: node.overwrite(self.uploadables[3])) #s4a
10899             d.addCallback(self._copy_shares, 3)
10900             # now we replace all the shares with version s3, and upload a new
10901             # version to get s4b.
10902hunk ./src/allmydata/test/test_mutable.py 779
10903             rollback = dict([(i,2) for i in range(10)])
10904             d.addCallback(lambda res: self._set_versions(rollback))
10905-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[4])) #s4b
10906+            d.addCallback(lambda res: node.overwrite(self.uploadables[4])) #s4b
10907             d.addCallback(self._copy_shares, 4)
10908             # we leave the storage in state 4
10909             return d
10910hunk ./src/allmydata/test/test_mutable.py 786
10911         d.addCallback(_created)
10912         return d
10913 
10914+
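publish_multiple leaves five generations of shares in self._copied_shares; _set_versions can then install any mix of them, which is how the rollback above manufactures two competing seqnum-4 versions (s4a and s4b). The snapshots must be deep copies, since FakeStorage hands out nested peerid->shnum dicts. The pattern, in miniature (hypothetical helper):

    def snapshot_shares(peers):
        # peers maps peerid -> {shnum: share_data}; copy the inner
        # dicts too, so later writes cannot mutate the snapshot.
        return dict((peerid, dict(shares))
                    for peerid, shares in peers.items())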
10915     def _copy_shares(self, ignored, index):
10916         shares = self._storage._peers
10917         # we need a deep copy
10918hunk ./src/allmydata/test/test_mutable.py 810
10919                     shares[peerid][shnum] = oldshares[index][peerid][shnum]
10920 
10921 
10922+
10923+
10924 class Servermap(unittest.TestCase, PublishMixin):
10925     def setUp(self):
10926         return self.publish_one()
10927hunk ./src/allmydata/test/test_mutable.py 816
10928 
10929-    def make_servermap(self, mode=MODE_CHECK, fn=None, sb=None):
10930+    def make_servermap(self, mode=MODE_CHECK, fn=None, sb=None,
10931+                       update_range=None):
10932         if fn is None:
10933             fn = self._fn
10934         if sb is None:
10935hunk ./src/allmydata/test/test_mutable.py 823
10936             sb = self._storage_broker
10937         smu = ServermapUpdater(fn, sb, Monitor(),
10938-                               ServerMap(), mode)
10939+                               ServerMap(), mode, update_range=update_range)
10940         d = smu.update()
10941         return d
10942 
10943hunk ./src/allmydata/test/test_mutable.py 889
10944         # create a new file, which is large enough to knock the privkey out
10945         # of the early part of the file
10946         LARGE = "These are Larger contents" * 200 # about 5KB
10947-        d.addCallback(lambda res: self._nodemaker.create_mutable_file(LARGE))
10948+        LARGE_uploadable = MutableData(LARGE)
10949+        d.addCallback(lambda res: self._nodemaker.create_mutable_file(LARGE_uploadable))
10950         def _created(large_fn):
10951             large_fn2 = self._nodemaker.create_from_cap(large_fn.get_uri())
10952             return self.make_servermap(MODE_WRITE, large_fn2)
10953hunk ./src/allmydata/test/test_mutable.py 898
10954         d.addCallback(lambda sm: self.failUnlessOneRecoverable(sm, 10))
10955         return d
10956 
10957+
10958     def test_mark_bad(self):
10959         d = defer.succeed(None)
10960         ms = self.make_servermap
10961hunk ./src/allmydata/test/test_mutable.py 944
10962         self._storage._peers = {} # delete all shares
10963         ms = self.make_servermap
10964         d = defer.succeed(None)
10965-
10966+
10967         d.addCallback(lambda res: ms(mode=MODE_CHECK))
10968         d.addCallback(lambda sm: self.failUnlessNoneRecoverable(sm))
10969 
10970hunk ./src/allmydata/test/test_mutable.py 996
10971         return d
10972 
10973 
10974+    def test_servermapupdater_finds_mdmf_files(self):
10975+        # Publish an MDMF file, then make sure that when we run the
10976+        # ServermapUpdater, the file is reported to have one
10977+        # recoverable version.
10978+        d = defer.succeed(None)
10979+        d.addCallback(lambda ignored:
10980+            self.publish_mdmf())
10981+        d.addCallback(lambda ignored:
10982+            self.make_servermap(mode=MODE_CHECK))
10983+        # make_servermap runs the update in the mode we specify, so we
10984+        # only need to inspect the resulting servermap.
10985+        def _check_servermap(sm):
10986+            self.failUnlessEqual(len(sm.recoverable_versions()), 1)
10987+        d.addCallback(_check_servermap)
10988+        return d
10989+
10990+
10991+    def test_fetch_update(self):
10992+        d = defer.succeed(None)
10993+        d.addCallback(lambda ignored:
10994+            self.publish_mdmf())
10995+        d.addCallback(lambda ignored:
10996+            self.make_servermap(mode=MODE_WRITE, update_range=(1, 2)))
10997+        def _check_servermap(sm):
10998+            # 10 shares
10999+            self.failUnlessEqual(len(sm.update_data), 10)
11000+            # one version
11001+            for data in sm.update_data.itervalues():
11002+                self.failUnlessEqual(len(data), 1)
11003+        d.addCallback(_check_servermap)
11004+        return d
11005+
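test_fetch_update above exercises the new update_range argument: a MODE_WRITE mapupdate restricted to segments 1-2 should leave per-share update data behind in the servermap, one entry per share and one version apiece. A sketch of inspecting that structure, using the attribute names the test relies on:

    def summarize_update_data(servermap):
        # update_data maps share number -> per-version update blobs
        # gathered during the ranged mapupdate.
        for shnum, versions in sorted(servermap.update_data.items()):
            print "share %d: %d version(s)" % (shnum, len(versions))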
11006+
11007+    def test_servermapupdater_finds_sdmf_files(self):
11008+        d = defer.succeed(None)
11009+        d.addCallback(lambda ignored:
11010+            self.publish_sdmf())
11011+        d.addCallback(lambda ignored:
11012+            self.make_servermap(mode=MODE_CHECK))
11013+        d.addCallback(lambda servermap:
11014+            self.failUnlessEqual(len(servermap.recoverable_versions()), 1))
11015+        return d
11016+
11017 
11018 class Roundtrip(unittest.TestCase, testutil.ShouldFailMixin, PublishMixin):
11019     def setUp(self):
11020hunk ./src/allmydata/test/test_mutable.py 1079
11021         if version is None:
11022             version = servermap.best_recoverable_version()
11023         r = Retrieve(self._fn, servermap, version)
11024-        return r.download()
11025+        c = consumer.MemoryConsumer()
11026+        d = r.download(consumer=c)
11027+        d.addCallback(lambda mc: "".join(mc.chunks))
11028+        return d
11029+
11030 
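Retrieve.download no longer returns the plaintext directly; it streams into an IConsumer, so the tests collect output with a MemoryConsumer and join its chunks afterward. An illustrative consumer of the same shape (the tests use the real one, presumably from allmydata.util.consumer):

    class ChunkCollector:
        # Minimal IConsumer-alike: every write() is appended to
        # self.chunks, so the plaintext is "".join(self.chunks) once
        # the download's Deferred fires.
        def __init__(self):
            self.chunks = []
            self.done = False
        def registerProducer(self, producer, streaming):
            self.producer = producer
            if not streaming:
                # pull-mode producers must be driven by the consumer
                while not self.done:
                    producer.resumeProducing()
        def write(self, data):
            self.chunks.append(data)
        def unregisterProducer(self):
            self.done = True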
11031     def test_basic(self):
11032         d = self.make_servermap()
11033hunk ./src/allmydata/test/test_mutable.py 1160
11034         return d
11035     test_no_servers_download.timeout = 15
11036 
11037+
11038     def _test_corrupt_all(self, offset, substring,
11039hunk ./src/allmydata/test/test_mutable.py 1162
11040-                          should_succeed=False, corrupt_early=True,
11041-                          failure_checker=None):
11042+                          should_succeed=False,
11043+                          corrupt_early=True,
11044+                          failure_checker=None,
11045+                          fetch_privkey=False):
11046         d = defer.succeed(None)
11047         if corrupt_early:
11048             d.addCallback(corrupt, self._storage, offset)
11049hunk ./src/allmydata/test/test_mutable.py 1182
11050                     self.failUnlessIn(substring, "".join(allproblems))
11051                 return servermap
11052             if should_succeed:
11053-                d1 = self._fn.download_version(servermap, ver)
11054+                d1 = self._fn.download_version(servermap, ver,
11055+                                               fetch_privkey)
11056                 d1.addCallback(lambda new_contents:
11057                                self.failUnlessEqual(new_contents, self.CONTENTS))
11058             else:
11059hunk ./src/allmydata/test/test_mutable.py 1190
11060                 d1 = self.shouldFail(NotEnoughSharesError,
11061                                      "_corrupt_all(offset=%s)" % (offset,),
11062                                      substring,
11063-                                     self._fn.download_version, servermap, ver)
11064+                                     self._fn.download_version, servermap,
11065+                                                                ver,
11066+                                                                fetch_privkey)
11067             if failure_checker:
11068                 d1.addCallback(failure_checker)
11069             d1.addCallback(lambda res: servermap)
11070hunk ./src/allmydata/test/test_mutable.py 1201
11071         return d
11072 
11073     def test_corrupt_all_verbyte(self):
11074-        # when the version byte is not 0, we hit an UnknownVersionError error
11075-        # in unpack_share().
11076+        # when the version byte is not 0 or 1, we hit an UnknownVersionError
11077+        # error in unpack_share().
11078         d = self._test_corrupt_all(0, "UnknownVersionError")
11079         def _check_servermap(servermap):
11080             # and the dump should mention the problems
11081hunk ./src/allmydata/test/test_mutable.py 1208
11082             s = StringIO()
11083             dump = servermap.dump(s).getvalue()
11084-            self.failUnless("10 PROBLEMS" in dump, dump)
11085+            self.failUnless("30 PROBLEMS" in dump, dump)
11086         d.addCallback(_check_servermap)
11087         return d
11088 
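The verbyte test reflects the widened on-disk rule: the first byte of a share is now 0 for SDMF or 1 for MDMF, and any other value raises UnknownVersionError during unpacking. In miniature (the exception here is a stand-in):

    import struct

    SDMF_VERSION, MDMF_VERSION = 0, 1

    def share_version(share):
        # the version byte is the very first byte of the share
        (verbyte,) = struct.unpack(">B", share[0])
        if verbyte not in (SDMF_VERSION, MDMF_VERSION):
            raise ValueError("unknown mutable share version %d" % verbyte)
        return verbyte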
11089hunk ./src/allmydata/test/test_mutable.py 1278
11090         return self._test_corrupt_all("enc_privkey", None, should_succeed=True)
11091 
11092 
11093+    def test_corrupt_all_encprivkey_late(self):
11094+        # this should work for the same reason as above, but we corrupt
11095+        # after the servermap update to exercise the error handling
11096+        # code.
11097+        # We need to remove the privkey from the node, or the retrieve
11098+        # process won't know that it needs to fetch and update it.
11099+        self._fn._privkey = None
11100+        return self._test_corrupt_all("enc_privkey",
11101+                                      None, # this shouldn't fail
11102+                                      should_succeed=True,
11103+                                      corrupt_early=False,
11104+                                      fetch_privkey=True)
11105+
11106+
11107     def test_corrupt_all_seqnum_late(self):
11108         # corrupting the seqnum between mapupdate and retrieve should result
11109         # in NotEnoughSharesError, since each share will look invalid
11110hunk ./src/allmydata/test/test_mutable.py 1298
11111         def _check(res):
11112             f = res[0]
11113             self.failUnless(f.check(NotEnoughSharesError))
11114-            self.failUnless("someone wrote to the data since we read the servermap" in str(f))
11115+            self.failUnless("uncoordinated write" in str(f))
11116         return self._test_corrupt_all(1, "ran out of peers",
11117                                       corrupt_early=False,
11118                                       failure_checker=_check)
11119hunk ./src/allmydata/test/test_mutable.py 1342
11120                             in str(servermap.problems[0]))
11121             ver = servermap.best_recoverable_version()
11122             r = Retrieve(self._fn, servermap, ver)
11123-            return r.download()
11124+            c = consumer.MemoryConsumer()
11125+            return r.download(c)
11126         d.addCallback(_do_retrieve)
11127hunk ./src/allmydata/test/test_mutable.py 1345
11128+        d.addCallback(lambda mc: "".join(mc.chunks))
11129         d.addCallback(lambda new_contents:
11130                       self.failUnlessEqual(new_contents, self.CONTENTS))
11131         return d
11132hunk ./src/allmydata/test/test_mutable.py 1350
11133 
11134-    def test_corrupt_some(self):
11135-        # corrupt the data of first five shares (so the servermap thinks
11136-        # they're good but retrieve marks them as bad), so that the
11137-        # MODE_READ set of 6 will be insufficient, forcing node.download to
11138-        # retry with more servers.
11139-        corrupt(None, self._storage, "share_data", range(5))
11140-        d = self.make_servermap()
11141+
11142+    def _test_corrupt_some(self, offset, mdmf=False):
11143+        if mdmf:
11144+            d = self.publish_mdmf()
11145+        else:
11146+            d = defer.succeed(None)
11147+        d.addCallback(lambda ignored:
11148+            corrupt(None, self._storage, offset, range(5)))
11149+        d.addCallback(lambda ignored:
11150+            self.make_servermap())
11151         def _do_retrieve(servermap):
11152             ver = servermap.best_recoverable_version()
11153             self.failUnless(ver)
11154hunk ./src/allmydata/test/test_mutable.py 1366
11155             return self._fn.download_best_version()
11156         d.addCallback(_do_retrieve)
11157         d.addCallback(lambda new_contents:
11158-                      self.failUnlessEqual(new_contents, self.CONTENTS))
11159+            self.failUnlessEqual(new_contents, self.CONTENTS))
11160         return d
11161 
11162hunk ./src/allmydata/test/test_mutable.py 1369
11163+
11164+    def test_corrupt_some(self):
11165+        # corrupt the data of first five shares (so the servermap thinks
11166+        # they're good but retrieve marks them as bad), so that the
11167+        # MODE_READ set of 6 will be insufficient, forcing node.download to
11168+        # retry with more servers.
11169+        return self._test_corrupt_some("share_data")
11170+
11171+
11172     def test_download_fails(self):
11173hunk ./src/allmydata/test/test_mutable.py 1379
11174-        corrupt(None, self._storage, "signature")
11175-        d = self.shouldFail(UnrecoverableFileError, "test_download_anyway",
11176+        d = corrupt(None, self._storage, "signature")
11177+        d.addCallback(lambda ignored:
11178+            self.shouldFail(UnrecoverableFileError, "test_download_anyway",
11179                             "no recoverable versions",
11180hunk ./src/allmydata/test/test_mutable.py 1383
11181-                            self._fn.download_best_version)
11182+                            self._fn.download_best_version))
11183         return d
11184 
11185 
11186hunk ./src/allmydata/test/test_mutable.py 1387
11187+
11188+    def test_corrupt_mdmf_block_hash_tree(self):
11189+        d = self.publish_mdmf()
11190+        d.addCallback(lambda ignored:
11191+            self._test_corrupt_all(("block_hash_tree", 12 * 32),
11192+                                   "block hash tree failure",
11193+                                   corrupt_early=True,
11194+                                   should_succeed=False))
11195+        return d
11196+
11197+
11198+    def test_corrupt_mdmf_block_hash_tree_late(self):
11199+        d = self.publish_mdmf()
11200+        d.addCallback(lambda ignored:
11201+            self._test_corrupt_all(("block_hash_tree", 12 * 32),
11202+                                   "block hash tree failure",
11203+                                   corrupt_early=False,
11204+                                   should_succeed=False))
11205+        return d
11206+
11207+
11208+    def test_corrupt_mdmf_share_data(self):
11209+        d = self.publish_mdmf()
11210+        d.addCallback(lambda ignored:
11211+            # TODO: Find out what the block size is and corrupt a
11212+            # specific block, rather than just guessing.
11213+            self._test_corrupt_all(("share_data", 12 * 40),
11214+                                    "block hash tree failure",
11215+                                    corrupt_early=True,
11216+                                    should_succeed=False))
11217+        return d
11218+
11219+
11220+    def test_corrupt_some_mdmf(self):
11221+        return self._test_corrupt_some(("share_data", 12 * 40),
11222+                                       mdmf=True)
11223+
11224+
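Most of the churn in the checker and verifier tests below is one mechanical change: corrupt() now returns a Deferred, so corruption can no longer happen inline before the check. The recurring shape, assuming this module's corrupt() helper and Monitor (a sketch, not new test code):

    def check_after_corruption(case, offset, shnums):
        # corrupt() is now asynchronous; chain the check behind it
        # instead of calling it synchronously.
        d = corrupt(None, case._storage, offset, shnums)
        d.addCallback(lambda ignored: case._fn.check(Monitor()))
        return d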
11225 class CheckerMixin:
11226     def check_good(self, r, where):
11227         self.failUnless(r.is_healthy(), where)
11228hunk ./src/allmydata/test/test_mutable.py 1455
11229         d.addCallback(self.check_good, "test_check_good")
11230         return d
11231 
11232+    def test_check_mdmf_good(self):
11233+        d = self.publish_mdmf()
11234+        d.addCallback(lambda ignored:
11235+            self._fn.check(Monitor()))
11236+        d.addCallback(self.check_good, "test_check_mdmf_good")
11237+        return d
11238+
11239     def test_check_no_shares(self):
11240         for shares in self._storage._peers.values():
11241             shares.clear()
11242hunk ./src/allmydata/test/test_mutable.py 1469
11243         d.addCallback(self.check_bad, "test_check_no_shares")
11244         return d
11245 
11246+    def test_check_mdmf_no_shares(self):
11247+        d = self.publish_mdmf()
11248+        def _then(ignored):
11249+            for share in self._storage._peers.values():
11250+                share.clear()
11251+        d.addCallback(_then)
11252+        d.addCallback(lambda ignored:
11253+            self._fn.check(Monitor()))
11254+        d.addCallback(self.check_bad, "test_check_mdmf_no_shares")
11255+        return d
11256+
11257     def test_check_not_enough_shares(self):
11258         for shares in self._storage._peers.values():
11259             for shnum in shares.keys():
11260hunk ./src/allmydata/test/test_mutable.py 1489
11261         d.addCallback(self.check_bad, "test_check_not_enough_shares")
11262         return d
11263 
11264+    def test_check_mdmf_not_enough_shares(self):
11265+        d = self.publish_mdmf()
11266+        def _then(ignored):
11267+            for shares in self._storage._peers.values():
11268+                for shnum in shares.keys():
11269+                    if shnum > 0:
11270+                        del shares[shnum]
11271+        d.addCallback(_then)
11272+        d.addCallback(lambda ignored:
11273+            self._fn.check(Monitor()))
11274+        d.addCallback(self.check_bad, "test_check_mdmf_not_enough_shares")
11275+        return d
11276+
11277+
11278     def test_check_all_bad_sig(self):
11279hunk ./src/allmydata/test/test_mutable.py 1504
11280-        corrupt(None, self._storage, 1) # bad sig
11281-        d = self._fn.check(Monitor())
11282+        d = corrupt(None, self._storage, 1) # bad sig
11283+        d.addCallback(lambda ignored:
11284+            self._fn.check(Monitor()))
11285         d.addCallback(self.check_bad, "test_check_all_bad_sig")
11286         return d
11287 
11288hunk ./src/allmydata/test/test_mutable.py 1510
11289+    def test_check_mdmf_all_bad_sig(self):
11290+        d = self.publish_mdmf()
11291+        d.addCallback(lambda ignored:
11292+            corrupt(None, self._storage, 1))
11293+        d.addCallback(lambda ignored:
11294+            self._fn.check(Monitor()))
11295+        d.addCallback(self.check_bad, "test_check_mdmf_all_bad_sig")
11296+        return d
11297+
11298     def test_check_all_bad_blocks(self):
11299hunk ./src/allmydata/test/test_mutable.py 1520
11300-        corrupt(None, self._storage, "share_data", [9]) # bad blocks
11301+        d = corrupt(None, self._storage, "share_data", [9]) # bad blocks
11302         # the Checker won't notice this.. it doesn't look at actual data
11303hunk ./src/allmydata/test/test_mutable.py 1522
11304-        d = self._fn.check(Monitor())
11305+        d.addCallback(lambda ignored:
11306+            self._fn.check(Monitor()))
11307         d.addCallback(self.check_good, "test_check_all_bad_blocks")
11308         return d
11309 
11310hunk ./src/allmydata/test/test_mutable.py 1527
11311+
11312+    def test_check_mdmf_all_bad_blocks(self):
11313+        d = self.publish_mdmf()
11314+        d.addCallback(lambda ignored:
11315+            corrupt(None, self._storage, "share_data"))
11316+        d.addCallback(lambda ignored:
11317+            self._fn.check(Monitor()))
11318+        d.addCallback(self.check_good, "test_check_mdmf_all_bad_blocks")
11319+        return d
11320+
11321     def test_verify_good(self):
11322         d = self._fn.check(Monitor(), verify=True)
11323         d.addCallback(self.check_good, "test_verify_good")
11324hunk ./src/allmydata/test/test_mutable.py 1541
11325         return d
11326+    test_verify_good.timeout = 15
11327 
11328     def test_verify_all_bad_sig(self):
11329hunk ./src/allmydata/test/test_mutable.py 1544
11330-        corrupt(None, self._storage, 1) # bad sig
11331-        d = self._fn.check(Monitor(), verify=True)
11332+        d = corrupt(None, self._storage, 1) # bad sig
11333+        d.addCallback(lambda ignored:
11334+            self._fn.check(Monitor(), verify=True))
11335         d.addCallback(self.check_bad, "test_verify_all_bad_sig")
11336         return d
11337 
11338hunk ./src/allmydata/test/test_mutable.py 1551
11339     def test_verify_one_bad_sig(self):
11340-        corrupt(None, self._storage, 1, [9]) # bad sig
11341-        d = self._fn.check(Monitor(), verify=True)
11342+        d = corrupt(None, self._storage, 1, [9]) # bad sig
11343+        d.addCallback(lambda ignored:
11344+            self._fn.check(Monitor(), verify=True))
11345         d.addCallback(self.check_bad, "test_verify_one_bad_sig")
11346         return d
11347 
11348hunk ./src/allmydata/test/test_mutable.py 1558
11349     def test_verify_one_bad_block(self):
11350-        corrupt(None, self._storage, "share_data", [9]) # bad blocks
11351+        d = corrupt(None, self._storage, "share_data", [9]) # bad blocks
11352         # the Verifier *will* notice this, since it examines every byte
11353hunk ./src/allmydata/test/test_mutable.py 1560
11354-        d = self._fn.check(Monitor(), verify=True)
11355+        d.addCallback(lambda ignored:
11356+            self._fn.check(Monitor(), verify=True))
11357         d.addCallback(self.check_bad, "test_verify_one_bad_block")
11358         d.addCallback(self.check_expected_failure,
11359                       CorruptShareError, "block hash tree failure",
11360hunk ./src/allmydata/test/test_mutable.py 1569
11361         return d
11362 
11363     def test_verify_one_bad_sharehash(self):
11364-        corrupt(None, self._storage, "share_hash_chain", [9], 5)
11365-        d = self._fn.check(Monitor(), verify=True)
11366+        d = corrupt(None, self._storage, "share_hash_chain", [9], 5)
11367+        d.addCallback(lambda ignored:
11368+            self._fn.check(Monitor(), verify=True))
11369         d.addCallback(self.check_bad, "test_verify_one_bad_sharehash")
11370         d.addCallback(self.check_expected_failure,
11371                       CorruptShareError, "corrupt hashes",
11372hunk ./src/allmydata/test/test_mutable.py 1579
11373         return d
11374 
11375     def test_verify_one_bad_encprivkey(self):
11376-        corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11377-        d = self._fn.check(Monitor(), verify=True)
11378+        d = corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11379+        d.addCallback(lambda ignored:
11380+            self._fn.check(Monitor(), verify=True))
11381         d.addCallback(self.check_bad, "test_verify_one_bad_encprivkey")
11382         d.addCallback(self.check_expected_failure,
11383                       CorruptShareError, "invalid privkey",
11384hunk ./src/allmydata/test/test_mutable.py 1589
11385         return d
11386 
11387     def test_verify_one_bad_encprivkey_uncheckable(self):
11388-        corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11389+        d = corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11390         readonly_fn = self._fn.get_readonly()
11391         # a read-only node has no way to validate the privkey
11392hunk ./src/allmydata/test/test_mutable.py 1592
11393-        d = readonly_fn.check(Monitor(), verify=True)
11394+        d.addCallback(lambda ignored:
11395+            readonly_fn.check(Monitor(), verify=True))
11396         d.addCallback(self.check_good,
11397                       "test_verify_one_bad_encprivkey_uncheckable")
11398         return d
11399hunk ./src/allmydata/test/test_mutable.py 1598
11400 
11401+
11402+    def test_verify_mdmf_good(self):
11403+        d = self.publish_mdmf()
11404+        d.addCallback(lambda ignored:
11405+            self._fn.check(Monitor(), verify=True))
11406+        d.addCallback(self.check_good, "test_verify_mdmf_good")
11407+        return d
11408+
11409+
11410+    def test_verify_mdmf_one_bad_block(self):
11411+        d = self.publish_mdmf()
11412+        d.addCallback(lambda ignored:
11413+            corrupt(None, self._storage, "share_data", [1]))
11414+        d.addCallback(lambda ignored:
11415+            self._fn.check(Monitor(), verify=True))
11416+        # We should find one bad block here
11417+        d.addCallback(self.check_bad, "test_verify_mdmf_one_bad_block")
11418+        d.addCallback(self.check_expected_failure,
11419+                      CorruptShareError, "block hash tree failure",
11420+                      "test_verify_mdmf_one_bad_block")
11421+        return d
11422+
11423+
11424+    def test_verify_mdmf_bad_encprivkey(self):
11425+        d = self.publish_mdmf()
11426+        d.addCallback(lambda ignored:
11427+            corrupt(None, self._storage, "enc_privkey", [1]))
11428+        d.addCallback(lambda ignored:
11429+            self._fn.check(Monitor(), verify=True))
11430+        d.addCallback(self.check_bad, "test_verify_mdmf_bad_encprivkey")
11431+        d.addCallback(self.check_expected_failure,
11432+                      CorruptShareError, "privkey",
11433+                      "test_verify_mdmf_bad_encprivkey")
11434+        return d
11435+
11436+
11437+    def test_verify_mdmf_bad_sig(self):
11438+        d = self.publish_mdmf()
11439+        d.addCallback(lambda ignored:
11440+            corrupt(None, self._storage, 1, [1]))
11441+        d.addCallback(lambda ignored:
11442+            self._fn.check(Monitor(), verify=True))
11443+        d.addCallback(self.check_bad, "test_verify_mdmf_bad_sig")
11444+        return d
11445+
11446+
11447+    def test_verify_mdmf_bad_encprivkey_uncheckable(self):
11448+        d = self.publish_mdmf()
11449+        d.addCallback(lambda ignored:
11450+            corrupt(None, self._storage, "enc_privkey", [1]))
11451+        d.addCallback(lambda ignored:
11452+            self._fn.get_readonly())
11453+        d.addCallback(lambda fn:
11454+            fn.check(Monitor(), verify=True))
11455+        d.addCallback(self.check_good,
11456+                      "test_verify_mdmf_bad_encprivkey_uncheckable")
11457+        return d
11458+
11459+
11460 class Repair(unittest.TestCase, PublishMixin, ShouldFailMixin):
11461 
11462     def get_shares(self, s):
11463hunk ./src/allmydata/test/test_mutable.py 1722
11464         current_shares = self.old_shares[-1]
11465         self.failUnlessEqual(old_shares, current_shares)
11466 
11467+
11468     def test_unrepairable_0shares(self):
11469         d = self.publish_one()
11470         def _delete_all_shares(ign):
11471hunk ./src/allmydata/test/test_mutable.py 1737
11472         d.addCallback(_check)
11473         return d
11474 
11475+    def test_mdmf_unrepairable_0shares(self):
11476+        d = self.publish_mdmf()
11477+        def _delete_all_shares(ign):
11478+            shares = self._storage._peers
11479+            for peerid in shares:
11480+                shares[peerid] = {}
11481+        d.addCallback(_delete_all_shares)
11482+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11483+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11484+        d.addCallback(lambda crr: self.failIf(crr.get_successful()))
11485+        return d
11486+
11487+
11488     def test_unrepairable_1share(self):
11489         d = self.publish_one()
11490         def _delete_all_shares(ign):
11491hunk ./src/allmydata/test/test_mutable.py 1766
11492         d.addCallback(_check)
11493         return d
11494 
11495+    def test_mdmf_unrepairable_1share(self):
11496+        d = self.publish_mdmf()
11497+        def _delete_all_shares(ign):
11498+            shares = self._storage._peers
11499+            for peerid in shares:
11500+                for shnum in list(shares[peerid]):
11501+                    if shnum > 0:
11502+                        del shares[peerid][shnum]
11503+        d.addCallback(_delete_all_shares)
11504+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11505+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11506+        def _check(crr):
11507+            self.failUnlessEqual(crr.get_successful(), False)
11508+        d.addCallback(_check)
11509+        return d
11510+
11511+    def test_repairable_5shares(self):
11512+        d = self.publish_sdmf()
11513+        def _delete_all_shares(ign):
11514+            shares = self._storage._peers
11515+            for peerid in shares:
11516+                for shnum in list(shares[peerid]):
11517+                    if shnum > 4:
11518+                        del shares[peerid][shnum]
11519+        d.addCallback(_delete_all_shares)
11520+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11521+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11522+        def _check(crr):
11523+            self.failUnlessEqual(crr.get_successful(), True)
11524+        d.addCallback(_check)
11525+        return d
11526+
11527+    def test_mdmf_repairable_5shares(self):
11528+        d = self.publish_mdmf()
11529+        def _delete_some_shares(ign):
11530+            shares = self._storage._peers
11531+            for peerid in shares:
11532+                for shnum in list(shares[peerid]):
11533+                    if shnum > 5:
11534+                        del shares[peerid][shnum]
11535+        d.addCallback(_delete_some_shares)
11536+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11537+        def _check(cr):
11538+            self.failIf(cr.is_healthy())
11539+            self.failUnless(cr.is_recoverable())
11540+            return cr
11541+        d.addCallback(_check)
11542+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11543+        def _check1(crr):
11544+            self.failUnlessEqual(crr.get_successful(), True)
11545+        d.addCallback(_check1)
11546+        return d
11547+
11548+
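These repair tests lean on the default 3-of-10 encoding asserted earlier in the file: deleting every share with shnum above 4 (or 5) still leaves at least k=3 shares, so the file stays recoverable and repair must succeed, while keeping only shnum 0 leaves a single share and repair must fail. In miniature:

    k, N = 3, 10                      # default mutable-file encoding
    for kept in (0, 1, 5, 6, N):
        print "%2d of %d shares kept: recoverable=%s" % (kept, N, kept >= k)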
11549     def test_merge(self):
11550         self.old_shares = []
11551         d = self.publish_multiple()
11552hunk ./src/allmydata/test/test_mutable.py 1934
11553 class MultipleEncodings(unittest.TestCase):
11554     def setUp(self):
11555         self.CONTENTS = "New contents go here"
11556+        self.uploadable = MutableData(self.CONTENTS)
11557         self._storage = FakeStorage()
11558         self._nodemaker = make_nodemaker(self._storage, num_peers=20)
11559         self._storage_broker = self._nodemaker.storage_broker
11560hunk ./src/allmydata/test/test_mutable.py 1938
11561-        d = self._nodemaker.create_mutable_file(self.CONTENTS)
11562+        d = self._nodemaker.create_mutable_file(self.uploadable)
11563         def _created(node):
11564             self._fn = node
11565         d.addCallback(_created)
11566hunk ./src/allmydata/test/test_mutable.py 1944
11567         return d
11568 
11569-    def _encode(self, k, n, data):
11570+    def _encode(self, k, n, data, version=SDMF_VERSION):
11571         # encode 'data' into a peerid->shares dict.
11572 
11573         fn = self._fn
11574hunk ./src/allmydata/test/test_mutable.py 1960
11575         # and set the encoding parameters to something completely different
11576         fn2._required_shares = k
11577         fn2._total_shares = n
11578+        # Normally the servermap update that precedes a publish would set
11579+        # the version; since we skip that step here, we set it ourselves.
11580+        fn2.set_version(version)
11581 
11582         s = self._storage
11583         s._peers = {} # clear existing storage
11584hunk ./src/allmydata/test/test_mutable.py 1967
11585         p2 = Publish(fn2, self._storage_broker, None)
11586-        d = p2.publish(data)
11587+        uploadable = MutableData(data)
11588+        d = p2.publish(uploadable)
11589         def _published(res):
11590             shares = s._peers
11591             s._peers = {}
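Publish normally learns the share format from the servermap update that precedes it; _encode skips that step, so it pins the encoding parameters and the format by hand before publishing. The flow, condensed from the surrounding diff:

    fn2._required_shares = k           # override encoding parameters
    fn2._total_shares = n
    fn2.set_version(version)           # normally set by the mapupdate
    p2 = Publish(fn2, storage_broker, None)
    d = p2.publish(MutableData(data))  # uploadable wrapper, as elsewhere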
11592hunk ./src/allmydata/test/test_mutable.py 2235
11593         self.basedir = "mutable/Problems/test_publish_surprise"
11594         self.set_up_grid()
11595         nm = self.g.clients[0].nodemaker
11596-        d = nm.create_mutable_file("contents 1")
11597+        d = nm.create_mutable_file(MutableData("contents 1"))
11598         def _created(n):
11599             d = defer.succeed(None)
11600             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
11601hunk ./src/allmydata/test/test_mutable.py 2245
11602             d.addCallback(_got_smap1)
11603             # then modify the file, leaving the old map untouched
11604             d.addCallback(lambda res: log.msg("starting winning write"))
11605-            d.addCallback(lambda res: n.overwrite("contents 2"))
11606+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11607             # now attempt to modify the file with the old servermap. This
11608             # will look just like an uncoordinated write, in which every
11609             # single share got updated between our mapupdate and our publish
11610hunk ./src/allmydata/test/test_mutable.py 2254
11611                           self.shouldFail(UncoordinatedWriteError,
11612                                           "test_publish_surprise", None,
11613                                           n.upload,
11614-                                          "contents 2a", self.old_map))
11615+                                          MutableData("contents 2a"), self.old_map))
11616             return d
11617         d.addCallback(_created)
11618         return d
11619hunk ./src/allmydata/test/test_mutable.py 2263
11620         self.basedir = "mutable/Problems/test_retrieve_surprise"
11621         self.set_up_grid()
11622         nm = self.g.clients[0].nodemaker
11623-        d = nm.create_mutable_file("contents 1")
11624+        d = nm.create_mutable_file(MutableData("contents 1"))
11625         def _created(n):
11626             d = defer.succeed(None)
11627             d.addCallback(lambda res: n.get_servermap(MODE_READ))
11628hunk ./src/allmydata/test/test_mutable.py 2273
11629             d.addCallback(_got_smap1)
11630             # then modify the file, leaving the old map untouched
11631             d.addCallback(lambda res: log.msg("starting winning write"))
11632-            d.addCallback(lambda res: n.overwrite("contents 2"))
11633+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11634             # now attempt to retrieve the old version with the old servermap.
11635             # This will look like someone has changed the file since we
11636             # updated the servermap.
11637hunk ./src/allmydata/test/test_mutable.py 2282
11638             d.addCallback(lambda res:
11639                           self.shouldFail(NotEnoughSharesError,
11640                                           "test_retrieve_surprise",
11641-                                          "ran out of peers: have 0 shares (k=3)",
11642+                                          "ran out of peers: have 0 of 1",
11643                                           n.download_version,
11644                                           self.old_map,
11645                                           self.old_map.best_recoverable_version(),
11646hunk ./src/allmydata/test/test_mutable.py 2291
11647         d.addCallback(_created)
11648         return d
11649 
11650+
11651     def test_unexpected_shares(self):
11652         # upload the file, take a servermap, shut down one of the servers,
11653         # upload it again (causing shares to appear on a new server), then
11654hunk ./src/allmydata/test/test_mutable.py 2301
11655         self.basedir = "mutable/Problems/test_unexpected_shares"
11656         self.set_up_grid()
11657         nm = self.g.clients[0].nodemaker
11658-        d = nm.create_mutable_file("contents 1")
11659+        d = nm.create_mutable_file(MutableData("contents 1"))
11660         def _created(n):
11661             d = defer.succeed(None)
11662             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
11663hunk ./src/allmydata/test/test_mutable.py 2313
11664                 self.g.remove_server(peer0)
11665                 # then modify the file, leaving the old map untouched
11666                 log.msg("starting winning write")
11667-                return n.overwrite("contents 2")
11668+                return n.overwrite(MutableData("contents 2"))
11669             d.addCallback(_got_smap1)
11670             # now attempt to modify the file with the old servermap. This
11671             # will look just like an uncoordinated write, in which every
11672hunk ./src/allmydata/test/test_mutable.py 2323
11673                           self.shouldFail(UncoordinatedWriteError,
11674                                           "test_surprise", None,
11675                                           n.upload,
11676-                                          "contents 2a", self.old_map))
11677+                                          MutableData("contents 2a"), self.old_map))
11678             return d
11679         d.addCallback(_created)
11680         return d
11681hunk ./src/allmydata/test/test_mutable.py 2327
11682+    test_unexpected_shares.timeout = 15
11683 
11684     def test_bad_server(self):
11685         # Break one server, then create the file: the initial publish should
11686hunk ./src/allmydata/test/test_mutable.py 2361
11687         d.addCallback(_break_peer0)
11688         # now "create" the file, using the pre-established key, and let the
11689         # initial publish finally happen
11690-        d.addCallback(lambda res: nm.create_mutable_file("contents 1"))
11691+        d.addCallback(lambda res: nm.create_mutable_file(MutableData("contents 1")))
11692         # that ought to work
11693         def _got_node(n):
11694             d = n.download_best_version()
11695hunk ./src/allmydata/test/test_mutable.py 2370
11696             def _break_peer1(res):
11697                 self.g.break_server(self.server1.get_serverid())
11698             d.addCallback(_break_peer1)
11699-            d.addCallback(lambda res: n.overwrite("contents 2"))
11700+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11701             # that ought to work too
11702             d.addCallback(lambda res: n.download_best_version())
11703             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
11704hunk ./src/allmydata/test/test_mutable.py 2402
11705         peerids = [s.get_serverid() for s in sb.get_connected_servers()]
11706         self.g.break_server(peerids[0])
11707 
11708-        d = nm.create_mutable_file("contents 1")
11709+        d = nm.create_mutable_file(MutableData("contents 1"))
11710         def _created(n):
11711             d = n.download_best_version()
11712             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
11713hunk ./src/allmydata/test/test_mutable.py 2410
11714             def _break_second_server(res):
11715                 self.g.break_server(peerids[1])
11716             d.addCallback(_break_second_server)
11717-            d.addCallback(lambda res: n.overwrite("contents 2"))
11718+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11719             # that ought to work too
11720             d.addCallback(lambda res: n.download_best_version())
11721             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
11722hunk ./src/allmydata/test/test_mutable.py 2429
11723         d = self.shouldFail(NotEnoughServersError,
11724                             "test_publish_all_servers_bad",
11725                             "Ran out of non-bad servers",
11726-                            nm.create_mutable_file, "contents")
11727+                            nm.create_mutable_file, MutableData("contents"))
11728         return d
11729 
11730     def test_publish_no_servers(self):
11731hunk ./src/allmydata/test/test_mutable.py 2441
11732         d = self.shouldFail(NotEnoughServersError,
11733                             "test_publish_no_servers",
11734                             "Ran out of non-bad servers",
11735-                            nm.create_mutable_file, "contents")
11736+                            nm.create_mutable_file, MutableData("contents"))
11737         return d
11738     test_publish_no_servers.timeout = 30
11739 
11740hunk ./src/allmydata/test/test_mutable.py 2459
11741         # we need some contents that are large enough to push the privkey out
11742         # of the early part of the file
11743         LARGE = "These are Larger contents" * 2000 # about 50KB
11744-        d = nm.create_mutable_file(LARGE)
11745+        LARGE_uploadable = MutableData(LARGE)
11746+        d = nm.create_mutable_file(LARGE_uploadable)
11747         def _created(n):
11748             self.uri = n.get_uri()
11749             self.n2 = nm.create_from_cap(self.uri)
11750hunk ./src/allmydata/test/test_mutable.py 2495
11751         self.basedir = "mutable/Problems/test_privkey_query_missing"
11752         self.set_up_grid(num_servers=20)
11753         nm = self.g.clients[0].nodemaker
11754-        LARGE = "These are Larger contents" * 2000 # about 50KB
11755+        LARGE = "These are Larger contents" * 2000 # about 50KiB
11756+        LARGE_uploadable = MutableData(LARGE)
11757         nm._node_cache = DevNullDictionary() # disable the nodecache
11758 
11759hunk ./src/allmydata/test/test_mutable.py 2499
11760-        d = nm.create_mutable_file(LARGE)
11761+        d = nm.create_mutable_file(LARGE_uploadable)
11762         def _created(n):
11763             self.uri = n.get_uri()
11764             self.n2 = nm.create_from_cap(self.uri)
11765hunk ./src/allmydata/test/test_mutable.py 2509
11766         d.addCallback(_created)
11767         d.addCallback(lambda res: self.n2.get_servermap(MODE_WRITE))
11768         return d
11769+
11770+
11771+    def test_block_and_hash_query_error(self):
11772+        # This tests for what happens when a query to a remote server
11773+        # fails in either the hash validation step or the block getting
11774+        # step (because of batching, this is the same actual query).
11775+        # We need to have the storage server persist up until the point
11776+        # that its prefix is validated, then suddenly die. This
11777+        # exercises some exception handling code in Retrieve.
11778+        self.basedir = "mutable/Problems/test_block_and_hash_query_error"
11779+        self.set_up_grid(num_servers=20)
11780+        nm = self.g.clients[0].nodemaker
11781+        CONTENTS = "contents" * 2000
11782+        CONTENTS_uploadable = MutableData(CONTENTS)
11783+        d = nm.create_mutable_file(CONTENTS_uploadable)
11784+        def _created(node):
11785+            self._node = node
11786+        d.addCallback(_created)
11787+        d.addCallback(lambda ignored:
11788+            self._node.get_servermap(MODE_READ))
11789+        def _then(servermap):
11790+            # we have our servermap. Now we set up the servers like the
11791+            # tests above -- the first one that gets a read call should
11792+            # start throwing errors, but only after returning its prefix
11793+            # for validation. Since we'll download without fetching the
11794+            # private key, the next query to the remote server will be
11795+            # for either a block and salt or for hashes, either of which
11796+            # will exercise the error handling code.
11797+            killer = FirstServerGetsKilled()
11798+            for (serverid, ss) in nm.storage_broker.get_all_servers():
11799+                ss.post_call_notifier = killer.notify
11800+            ver = servermap.best_recoverable_version()
11801+            assert ver
11802+            return self._node.download_version(servermap, ver)
11803+        d.addCallback(_then)
11804+        d.addCallback(lambda data:
11805+            self.failUnlessEqual(data, CONTENTS))
11806+        return d
11807+
11808+
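test_block_and_hash_query_error relies on FirstServerGetsKilled, defined elsewhere in this file: a post_call_notifier that lets the first remote call complete (so the prefix validates) and then breaks that server. An illustrative shape, assuming the fake storage wrappers expose a broken flag:

    class KillAfterFirstCall:
        # Sketch of the fault injector: the first notified call
        # succeeds, then the server is marked broken so every later
        # query fails inside Retrieve's error-handling path.
        done = False
        def notify(self, retval, wrapper, methname):
            if not self.done:
                wrapper.broken = True
                self.done = True
            return retval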
11809+class FileHandle(unittest.TestCase):
11810+    def setUp(self):
11811+        self.test_data = "Test Data" * 50000
11812+        self.sio = StringIO(self.test_data)
11813+        self.uploadable = MutableFileHandle(self.sio)
11814+
11815+
11816+    def test_filehandle_read(self):
11817+        self.basedir = "mutable/FileHandle/test_filehandle_read"
11818+        chunk_size = 10
11819+        for i in xrange(0, len(self.test_data), chunk_size):
11820+            data = self.uploadable.read(chunk_size)
11821+            data = "".join(data)
11822+            start = i
11823+            end = i + chunk_size
11824+            self.failUnlessEqual(data, self.test_data[start:end])
11825+
11826+
11827+    def test_filehandle_get_size(self):
11828+        self.basedir = "mutable/FileHandle/test_filehandle_get_size"
11829+        actual_size = len(self.test_data)
11830+        size = self.uploadable.get_size()
11831+        self.failUnlessEqual(size, actual_size)
11832+
11833+
11834+    def test_filehandle_get_size_out_of_order(self):
11835+        # We should be able to call get_size whenever we want without
11836+        # disturbing the location of the seek pointer.
11837+        chunk_size = 100
11838+        data = self.uploadable.read(chunk_size)
11839+        self.failUnlessEqual("".join(data), self.test_data[:chunk_size])
11840+
11841+        # Now get the size.
11842+        size = self.uploadable.get_size()
11843+        self.failUnlessEqual(size, len(self.test_data))
11844+
11845+        # Now get more data. We should be right where we left off.
11846+        more_data = self.uploadable.read(chunk_size)
11847+        start = chunk_size
11848+        end = chunk_size * 2
11849+        self.failUnlessEqual("".join(more_data), self.test_data[start:end])
11850+
11851+
11852+    def test_filehandle_file(self):
11853+        # Make sure that the MutableFileHandle works on a file as well
11854+        # as a StringIO object, since in some cases it will be asked to
11855+        # deal with files.
11856+        self.basedir = self.mktemp()
11857+        # mktemp() only returns a path; it does not create the directory.
11858+        os.mkdir(self.basedir)
11859+        f_path = os.path.join(self.basedir, "test_file")
11860+        f = open(f_path, "w")
11861+        f.write(self.test_data)
11862+        f.close()
11863+        f = open(f_path, "r")
11864+
11865+        uploadable = MutableFileHandle(f)
11866+
11867+        data = uploadable.read(len(self.test_data))
11868+        self.failUnlessEqual("".join(data), self.test_data)
11869+        size = uploadable.get_size()
11870+        self.failUnlessEqual(size, len(self.test_data))
11871+
11872+
11873+    def test_close(self):
11874+        # Make sure that the MutableFileHandle closes its handle when
11875+        # told to do so.
11876+        self.uploadable.close()
11877+        self.failUnless(self.sio.closed)
11878+
11879+
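Taken together, the FileHandle tests above and the DataHandle tests below pin down the uploadable contract: read() returns a list of strings that callers join, get_size() must not disturb the seek pointer, and close() closes the underlying handle. An illustrative stand-in (the real classes are MutableFileHandle and MutableData):

    from StringIO import StringIO

    class StringUploadable:
        def __init__(self, data):
            self._sio = StringIO(data)
            self._size = len(data)
        def get_size(self):
            # answered from a cached length: the seek pointer moves
            # only when read() is called
            return self._size
        def read(self, length):
            # a list of strings, which callers "".join() together
            return [self._sio.read(length)]
        def close(self):
            self._sio.close()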
11880+class DataHandle(unittest.TestCase):
11881+    def setUp(self):
11882+        self.test_data = "Test Data" * 50000
11883+        self.uploadable = MutableData(self.test_data)
11884+
11885+
11886+    def test_datahandle_read(self):
11887+        chunk_size = 10
11888+        for i in xrange(0, len(self.test_data), chunk_size):
11889+            data = self.uploadable.read(chunk_size)
11890+            data = "".join(data)
11891+            start = i
11892+            end = i + chunk_size
11893+            self.failUnlessEqual(data, self.test_data[start:end])
11894+
11895+
11896+    def test_datahandle_get_size(self):
11897+        actual_size = len(self.test_data)
11898+        size = self.uploadable.get_size()
11899+        self.failUnlessEqual(size, actual_size)
11900+
11901+
11902+    def test_datahandle_get_size_out_of_order(self):
11903+        # We should be able to call get_size whenever we want without
11904+        # disturbing the location of the seek pointer.
11905+        chunk_size = 100
11906+        data = self.uploadable.read(chunk_size)
11907+        self.failUnlessEqual("".join(data), self.test_data[:chunk_size])
11908+
11909+        # Now get the size.
11910+        size = self.uploadable.get_size()
11911+        self.failUnlessEqual(size, len(self.test_data))
11912+
11913+        # Now get more data. We should be right where we left off.
11914+        more_data = self.uploadable.read(chunk_size)
11915+        start = chunk_size
11916+        end = chunk_size * 2
11917+        self.failUnlessEqual("".join(more_data), self.test_data[start:end])
11918+
11919+
11920+class Version(GridTestMixin, unittest.TestCase, testutil.ShouldFailMixin, \
11921+              PublishMixin):
11922+    def setUp(self):
11923+        GridTestMixin.setUp(self)
11924+        self.basedir = self.mktemp()
11925+        self.set_up_grid()
11926+        self.c = self.g.clients[0]
11927+        self.nm = self.c.nodemaker
11928+        self.data = "test data" * 100000 # about 900 KiB; MDMF
11929+        self.small_data = "test data" * 10 # about 90 B; SDMF
11930+        return self.do_upload()
11931+
11932+
11933+    def do_upload(self):
11934+        d1 = self.nm.create_mutable_file(MutableData(self.data),
11935+                                         version=MDMF_VERSION)
11936+        d2 = self.nm.create_mutable_file(MutableData(self.small_data))
11937+        dl = gatherResults([d1, d2])
11938+        def _then((n1, n2)):
11939+            assert isinstance(n1, MutableFileNode)
11940+            assert isinstance(n2, MutableFileNode)
11941+
11942+            self.mdmf_node = n1
11943+            self.sdmf_node = n2
11944+        dl.addCallback(_then)
11945+        return dl
11946+
11947+
11948+    def test_get_readonly_mutable_version(self):
11949+        # Attempting to get a mutable version of a mutable file from a
11950+        # filenode initialized with a readcap should return a readonly
11951+        # version of that same node.
11952+        ro = self.mdmf_node.get_readonly()
11953+        d = ro.get_best_mutable_version()
11954+        d.addCallback(lambda version:
11955+            self.failUnless(version.is_readonly()))
11956+        d.addCallback(lambda ignored:
11957+            self.sdmf_node.get_readonly())
11958+        d.addCallback(lambda version:
11959+            self.failUnless(version.is_readonly()))
11960+        return d
11961+
11962+
11963+    def test_get_sequence_number(self):
11964+        d = self.mdmf_node.get_best_readable_version()
11965+        d.addCallback(lambda bv:
11966+            self.failUnlessEqual(bv.get_sequence_number(), 1))
11967+        d.addCallback(lambda ignored:
11968+            self.sdmf_node.get_best_readable_version())
11969+        d.addCallback(lambda bv:
11970+            self.failUnlessEqual(bv.get_sequence_number(), 1))
11971+        # Now update. The sequence number in both cases should be 2
11972+        # after the overwrite.
11973+        def _do_update(ignored):
11974+            new_data = MutableData("foo bar baz" * 100000)
11975+            new_small_data = MutableData("foo bar baz" * 10)
11976+            d1 = self.mdmf_node.overwrite(new_data)
11977+            d2 = self.sdmf_node.overwrite(new_small_data)
11978+            dl = gatherResults([d1, d2])
11979+            return dl
11980+        d.addCallback(_do_update)
11981+        d.addCallback(lambda ignored:
11982+            self.mdmf_node.get_best_readable_version())
11983+        d.addCallback(lambda bv:
11984+            self.failUnlessEqual(bv.get_sequence_number(), 2))
11985+        d.addCallback(lambda ignored:
11986+            self.sdmf_node.get_best_readable_version())
11987+        d.addCallback(lambda bv:
11988+            self.failUnlessEqual(bv.get_sequence_number(), 2))
11989+        return d
11990+
11991+
11992+    def test_get_writekey(self):
11993+        d = self.mdmf_node.get_best_mutable_version()
11994+        d.addCallback(lambda bv:
11995+            self.failUnlessEqual(bv.get_writekey(),
11996+                                 self.mdmf_node.get_writekey()))
11997+        d.addCallback(lambda ignored:
11998+            self.sdmf_node.get_best_mutable_version())
11999+        d.addCallback(lambda bv:
12000+            self.failUnlessEqual(bv.get_writekey(),
12001+                                 self.sdmf_node.get_writekey()))
12002+        return d
12003+
12004+
12005+    def test_get_storage_index(self):
12006+        d = self.mdmf_node.get_best_mutable_version()
12007+        d.addCallback(lambda bv:
12008+            self.failUnlessEqual(bv.get_storage_index(),
12009+                                 self.mdmf_node.get_storage_index()))
12010+        d.addCallback(lambda ignored:
12011+            self.sdmf_node.get_best_mutable_version())
12012+        d.addCallback(lambda bv:
12013+            self.failUnlessEqual(bv.get_storage_index(),
12014+                                 self.sdmf_node.get_storage_index()))
12015+        return d
12016+
12017+
12018+    def test_get_readonly_version(self):
12019+        d = self.mdmf_node.get_best_readable_version()
12020+        d.addCallback(lambda bv:
12021+            self.failUnless(bv.is_readonly()))
12022+        d.addCallback(lambda ignored:
12023+            self.sdmf_node.get_best_readable_version())
12024+        d.addCallback(lambda bv:
12025+            self.failUnless(bv.is_readonly()))
12026+        return d
12027+
12028+
12029+    def test_get_mutable_version(self):
12030+        d = self.mdmf_node.get_best_mutable_version()
12031+        d.addCallback(lambda bv:
12032+            self.failIf(bv.is_readonly()))
12033+        d.addCallback(lambda ignored:
12034+            self.sdmf_node.get_best_mutable_version())
12035+        d.addCallback(lambda bv:
12036+            self.failIf(bv.is_readonly()))
12037+        return d
12038+
12039+
12040+    def test_toplevel_overwrite(self):
12041+        new_data = MutableData("foo bar baz" * 100000)
12042+        new_small_data = MutableData("foo bar baz" * 10)
12043+        d = self.mdmf_node.overwrite(new_data)
12044+        d.addCallback(lambda ignored:
12045+            self.mdmf_node.download_best_version())
12046+        d.addCallback(lambda data:
12047+            self.failUnlessEqual(data, "foo bar baz" * 100000))
12048+        d.addCallback(lambda ignored:
12049+            self.sdmf_node.overwrite(new_small_data))
12050+        d.addCallback(lambda ignored:
12051+            self.sdmf_node.download_best_version())
12052+        d.addCallback(lambda data:
12053+            self.failUnlessEqual(data, "foo bar baz" * 10))
12054+        return d
12055+
12056+
12057+    def test_toplevel_modify(self):
12058+        def modifier(old_contents, servermap, first_time):
12059+            return old_contents + "modified"
12060+        d = self.mdmf_node.modify(modifier)
12061+        d.addCallback(lambda ignored:
12062+            self.mdmf_node.download_best_version())
12063+        d.addCallback(lambda data:
12064+            self.failUnlessIn("modified", data))
12065+        d.addCallback(lambda ignored:
12066+            self.sdmf_node.modify(modifier))
12067+        d.addCallback(lambda ignored:
12068+            self.sdmf_node.download_best_version())
12069+        d.addCallback(lambda data:
12070+            self.failUnlessIn("modified", data))
12071+        return d
12072+
12073+
12074+    def test_version_modify(self):
12075+        # TODO: When we can publish multiple versions, alter this test
12076+        # to modify a version other than the best usable version, then
12077+        # test to see that the best recoverable version is that.
12078+        def modifier(old_contents, servermap, first_time):
12079+            return old_contents + "modified"
12080+        d = self.mdmf_node.modify(modifier)
12081+        d.addCallback(lambda ignored:
12082+            self.mdmf_node.download_best_version())
12083+        d.addCallback(lambda data:
12084+            self.failUnlessIn("modified", data))
12085+        d.addCallback(lambda ignored:
12086+            self.sdmf_node.modify(modifier))
12087+        d.addCallback(lambda ignored:
12088+            self.sdmf_node.download_best_version())
12089+        d.addCallback(lambda data:
12090+            self.failUnlessIn("modified", data))
12091+        return d
12092+
12093+
12094+    def test_download_version(self):
12095+        d = self.publish_multiple()
12096+        # We want to have two recoverable versions on the grid.
12097+        d.addCallback(lambda res:
12098+                      self._set_versions({0:0,2:0,4:0,6:0,8:0,
12099+                                          1:1,3:1,5:1,7:1,9:1}))
12100+        # Now try to download each version. We should get the plaintext
12101+        # associated with that version.
12102+        d.addCallback(lambda ignored:
12103+            self._fn.get_servermap(mode=MODE_READ))
12104+        def _got_servermap(smap):
12105+            versions = smap.recoverable_versions()
12106+            assert len(versions) == 2
12107+
12108+            self.servermap = smap
12109+            self.version1, self.version2 = versions
12110+            assert self.version1 != self.version2
12111+
12112+            self.version1_seqnum = self.version1[0]
12113+            self.version2_seqnum = self.version2[0]
12114+            self.version1_index = self.version1_seqnum - 1
12115+            self.version2_index = self.version2_seqnum - 1
12116+
12117+        d.addCallback(_got_servermap)
12118+        d.addCallback(lambda ignored:
12119+            self._fn.download_version(self.servermap, self.version1))
12120+        d.addCallback(lambda results:
12121+            self.failUnlessEqual(self.CONTENTS[self.version1_index],
12122+                                 results))
12123+        d.addCallback(lambda ignored:
12124+            self._fn.download_version(self.servermap, self.version2))
12125+        d.addCallback(lambda results:
12126+            self.failUnlessEqual(self.CONTENTS[self.version2_index],
12127+                                 results))
12128+        return d
12129+
12130+
12131+    def test_download_nonexistent_version(self):
12132+        d = self.mdmf_node.get_servermap(mode=MODE_WRITE)
12133+        def _set_servermap(servermap):
12134+            self.servermap = servermap
12135+        d.addCallback(_set_servermap)
12136+        d.addCallback(lambda ignored:
12137+           self.shouldFail(UnrecoverableFileError, "nonexistent version",
12138+                           None,
12139+                           self.mdmf_node.download_version, self.servermap,
12140+                           "not a version"))
12141+        return d
12142+
12143+
12144+    def test_partial_read(self):
12145+        # Read the file in 10000-byte chunks, and check that the
12146+        # reassembled data matches what we uploaded.
12147+        d = self.mdmf_node.get_best_readable_version()
12148+        def _read_data(version):
12149+            c = consumer.MemoryConsumer()
12150+            d2 = defer.succeed(None)
12151+            for i in xrange(0, len(self.data), 10000):
12152+                d2.addCallback(lambda ignored, i=i: version.read(c, i, 10000))
12153+            d2.addCallback(lambda ignored:
12154+                self.failUnlessEqual(self.data, "".join(c.chunks)))
12155+            return d2
12156+        d.addCallback(_read_data)
12157+        return d
12158+
12159+
12160+    def test_read(self):
12161+        d = self.mdmf_node.get_best_readable_version()
12162+        def _read_data(version):
12163+            c = consumer.MemoryConsumer()
12164+            d2 = defer.succeed(None)
12165+            d2.addCallback(lambda ignored: version.read(c))
12166+            d2.addCallback(lambda ignored:
12167+                self.failUnlessEqual("".join(c.chunks), self.data))
12168+            return d2
12169+        d.addCallback(_read_data)
12170+        return d
12171+
12172+
12173+    def test_download_best_version(self):
12174+        d = self.mdmf_node.download_best_version()
12175+        d.addCallback(lambda data:
12176+            self.failUnlessEqual(data, self.data))
12177+        d.addCallback(lambda ignored:
12178+            self.sdmf_node.download_best_version())
12179+        d.addCallback(lambda data:
12180+            self.failUnlessEqual(data, self.small_data))
12181+        return d
12182+
12183+
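
For orientation, calling code drives the versioned read API exercised by the tests above roughly as follows. This is a minimal sketch, not part of the patch: it assumes `filenode` is a mutable file node obtained from a running client, and it uses the same MemoryConsumer the tests use.

    from allmydata.util import consumer

    def read_whole_file(filenode):
        # Ask for the best readable version, then stream its plaintext
        # into a MemoryConsumer; with MDMF this proceeds segment by
        # segment rather than downloading everything up front.
        d = filenode.get_best_readable_version()
        def _read(version):
            c = consumer.MemoryConsumer()
            d2 = version.read(c)  # no offset/size: read the whole file
            d2.addCallback(lambda ign: "".join(c.chunks))
            return d2
        d.addCallback(_read)
        return d  # fires with the file's contents as a string
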
12184+class Update(GridTestMixin, unittest.TestCase, testutil.ShouldFailMixin):
12185+    def setUp(self):
12186+        GridTestMixin.setUp(self)
12187+        self.basedir = self.mktemp()
12188+        self.set_up_grid()
12189+        self.c = self.g.clients[0]
12190+        self.nm = self.c.nodemaker
12191+        self.data = "test data" * 100000 # about 900 KiB; MDMF
12192+        self.small_data = "test data" * 10 # about 90 B; SDMF
12193+        return self.do_upload()
12194+
12195+
12196+    def do_upload(self):
12197+        d1 = self.nm.create_mutable_file(MutableData(self.data),
12198+                                         version=MDMF_VERSION)
12199+        d2 = self.nm.create_mutable_file(MutableData(self.small_data))
12200+        dl = gatherResults([d1, d2])
12201+        def _then((n1, n2)):
12202+            assert isinstance(n1, MutableFileNode)
12203+            assert isinstance(n2, MutableFileNode)
12204+
12205+            self.mdmf_node = n1
12206+            self.sdmf_node = n2
12207+        dl.addCallback(_then)
12208+        return dl
12209+
12210+
12211+    def test_append(self):
12212+        # We should be able to append data to the end of a mutable
12213+        # file and get what we expect.
12214+        new_data = self.data + "appended"
12215+        d = self.mdmf_node.get_best_mutable_version()
12216+        d.addCallback(lambda mv:
12217+            mv.update(MutableData("appended"), len(self.data)))
12218+        d.addCallback(lambda ignored:
12219+            self.mdmf_node.download_best_version())
12220+        d.addCallback(lambda results:
12221+            self.failUnlessEqual(results, new_data))
12222+        return d
12223+    test_append.timeout = 15
12224+
12225+
12226+    def test_replace(self):
12227+        # We should be able to replace data in the middle of a mutable
12228+        # file and get what we expect back.
12229+        new_data = self.data[:100]
12230+        new_data += "appended"
12231+        new_data += self.data[108:]
12232+        d = self.mdmf_node.get_best_mutable_version()
12233+        d.addCallback(lambda mv:
12234+            mv.update(MutableData("appended"), 100))
12235+        d.addCallback(lambda ignored:
12236+            self.mdmf_node.download_best_version())
12237+        d.addCallback(lambda results:
12238+            self.failUnlessEqual(results, new_data))
12239+        return d
12240+
12241+
12242+    def test_replace_and_extend(self):
12243+        # We should be able to replace data in the middle of a mutable
12244+        # file and extend that mutable file and get what we expect.
12245+        new_data = self.data[:100]
12246+        new_data += "modified " * 100000
12247+        d = self.mdmf_node.get_best_mutable_version()
12248+        d.addCallback(lambda mv:
12249+            mv.update(MutableData("modified " * 100000), 100))
12250+        d.addCallback(lambda ignored:
12251+            self.mdmf_node.download_best_version())
12252+        d.addCallback(lambda results:
12253+            self.failUnlessEqual(results, new_data))
12254+        return d
12255+
12256+
12257+    def test_append_power_of_two(self):
12258+        # If we attempt to extend a mutable file so that its segment
12259+        # count crosses a power-of-two boundary, the update operation
12260+        # should know how to reencode the file.
12261+
12262+        # Note that the data populating self.mdmf_node is about 900 KiB
12263+        # long -- at the default segment size, that's 7 segments. So we
12264+        # need to add 2 segments' worth of data to push the segment count
12265+        # over a power-of-two boundary (from 7 past 8, to 9).
12266+        segment = "a" * DEFAULT_MAX_SEGMENT_SIZE
12267+        new_data = self.data + (segment * 2)
12268+        d = self.mdmf_node.get_best_mutable_version()
12269+        d.addCallback(lambda mv:
12270+            mv.update(MutableData(segment * 2), len(self.data)))
12271+        d.addCallback(lambda ignored:
12272+            self.mdmf_node.download_best_version())
12273+        d.addCallback(lambda results:
12274+            self.failUnlessEqual(results, new_data))
12275+        return d
12276+    test_append_power_of_two.timeout = 15
12277+
12278+
12279+    def test_update_sdmf(self):
12280+        # Running update on a single-segment file should still work.
12281+        new_data = self.small_data + "appended"
12282+        d = self.sdmf_node.get_best_mutable_version()
12283+        d.addCallback(lambda mv:
12284+            mv.update(MutableData("appended"), len(self.small_data)))
12285+        d.addCallback(lambda ignored:
12286+            self.sdmf_node.download_best_version())
12287+        d.addCallback(lambda results:
12288+            self.failUnlessEqual(results, new_data))
12289+        return d
12290+
12291+    def test_replace_in_last_segment(self):
12292+        # The wrapper should know how to handle the tail segment
12293+        # appropriately.
12294+        replace_offset = len(self.data) - 100
12295+        new_data = self.data[:replace_offset] + "replaced"
12296+        rest_offset = replace_offset + len("replaced")
12297+        new_data += self.data[rest_offset:]
12298+        d = self.mdmf_node.get_best_mutable_version()
12299+        d.addCallback(lambda mv:
12300+            mv.update(MutableData("replaced"), replace_offset))
12301+        d.addCallback(lambda ignored:
12302+            self.mdmf_node.download_best_version())
12303+        d.addCallback(lambda results:
12304+            self.failUnlessEqual(results, new_data))
12305+        return d
12306+
12307+
12308+    def test_multiple_segment_replace(self):
12309+        replace_offset = 2 * DEFAULT_MAX_SEGMENT_SIZE
12310+        new_data = self.data[:replace_offset]
12311+        new_segment = "a" * DEFAULT_MAX_SEGMENT_SIZE
12312+        new_data += 2 * new_segment
12313+        new_data += "replaced"
12314+        rest_offset = len(new_data)
12315+        new_data += self.data[rest_offset:]
12316+        d = self.mdmf_node.get_best_mutable_version()
12317+        d.addCallback(lambda mv:
12318+            mv.update(MutableData((2 * new_segment) + "replaced"),
12319+                      replace_offset))
12320+        d.addCallback(lambda ignored:
12321+            self.mdmf_node.download_best_version())
12322+        d.addCallback(lambda results:
12323+            self.failUnlessEqual(results, new_data))
12324+        return d
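
The partial-update API exercised by the Update tests above is used roughly like this. A sketch, not part of the patch; it assumes `filenode` is a writeable MutableFileNode created elsewhere.

    from allmydata.mutable.publish import MutableData

    def splice(filenode, new_bytes, offset):
        # Fetch the best mutable version, then write new_bytes into the
        # file at the given byte offset. Only the affected segments
        # (plus bookkeeping such as block hash trees) need re-uploading.
        d = filenode.get_best_mutable_version()
        d.addCallback(lambda mv: mv.update(MutableData(new_bytes), offset))
        return d

    # Appending is the same call with offset equal to the current file
    # length, which is what test_append does above.
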
12325hunk ./src/allmydata/test/test_sftp.py 32
12326 
12327 from allmydata.util.consumer import download_to_data
12328 from allmydata.immutable import upload
12329+from allmydata.mutable import publish
12330 from allmydata.test.no_network import GridTestMixin
12331 from allmydata.test.common import ShouldFailMixin
12332 from allmydata.test.common_util import ReallyEqualMixin
12333hunk ./src/allmydata/test/test_sftp.py 84
12334         return d
12335 
12336     def _set_up_tree(self):
12337-        d = self.client.create_mutable_file("mutable file contents")
12338+        u = publish.MutableData("mutable file contents")
12339+        d = self.client.create_mutable_file(u)
12340         d.addCallback(lambda node: self.root.set_node(u"mutable", node))
12341         def _created_mutable(n):
12342             self.mutable = n
12343hunk ./src/allmydata/test/test_sftp.py 1334
12344         d.addCallback(lambda ign: self.failUnlessEqual(sftpd.all_heisenfiles, {}))
12345         d.addCallback(lambda ign: self.failUnlessEqual(self.handler._heisenfiles, {}))
12346         return d
12347+    test_makeDirectory.timeout = 15
12348 
12349     def test_execCommand_and_openShell(self):
12350         class FakeProtocol:
12351hunk ./src/allmydata/test/test_storage.py 27
12352                                      LayoutInvalid, MDMFSIGNABLEHEADER, \
12353                                      SIGNED_PREFIX, MDMFHEADER, \
12354                                      MDMFOFFSETS, SDMFSlotWriteProxy
12355-from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \
12356-                                 SDMF_VERSION
12357+from allmydata.interfaces import BadWriteEnablerError
12358 from allmydata.test.common import LoggingServiceParent, ShouldFailMixin
12359 from allmydata.test.common_web import WebRenderingMixin
12360 from allmydata.web.storage import StorageStatus, remove_prefix
12361hunk ./src/allmydata/test/test_system.py 26
12362 from allmydata.monitor import Monitor
12363 from allmydata.mutable.common import NotWriteableError
12364 from allmydata.mutable import layout as mutable_layout
12365+from allmydata.mutable.publish import MutableData
12366 from foolscap.api import DeadReferenceError
12367 from twisted.python.failure import Failure
12368 from twisted.web.client import getPage
12369hunk ./src/allmydata/test/test_system.py 467
12370     def test_mutable(self):
12371         self.basedir = "system/SystemTest/test_mutable"
12372         DATA = "initial contents go here."  # 25 bytes % 3 != 0
12373+        DATA_uploadable = MutableData(DATA)
12374         NEWDATA = "new contents yay"
12375hunk ./src/allmydata/test/test_system.py 469
12376+        NEWDATA_uploadable = MutableData(NEWDATA)
12377         NEWERDATA = "this is getting old"
12378hunk ./src/allmydata/test/test_system.py 471
12379+        NEWERDATA_uploadable = MutableData(NEWERDATA)
12380 
12381         d = self.set_up_nodes(use_key_generator=True)
12382 
12383hunk ./src/allmydata/test/test_system.py 478
12384         def _create_mutable(res):
12385             c = self.clients[0]
12386             log.msg("starting create_mutable_file")
12387-            d1 = c.create_mutable_file(DATA)
12388+            d1 = c.create_mutable_file(DATA_uploadable)
12389             def _done(res):
12390                 log.msg("DONE: %s" % (res,))
12391                 self._mutable_node_1 = res
12392hunk ./src/allmydata/test/test_system.py 565
12393             self.failUnlessEqual(res, DATA)
12394             # replace the data
12395             log.msg("starting replace1")
12396-            d1 = newnode.overwrite(NEWDATA)
12397+            d1 = newnode.overwrite(NEWDATA_uploadable)
12398             d1.addCallback(lambda res: newnode.download_best_version())
12399             return d1
12400         d.addCallback(_check_download_3)
12401hunk ./src/allmydata/test/test_system.py 579
12402             newnode2 = self.clients[3].create_node_from_uri(uri)
12403             self._newnode3 = self.clients[3].create_node_from_uri(uri)
12404             log.msg("starting replace2")
12405-            d1 = newnode1.overwrite(NEWERDATA)
12406+            d1 = newnode1.overwrite(NEWERDATA_uploadable)
12407             d1.addCallback(lambda res: newnode2.download_best_version())
12408             return d1
12409         d.addCallback(_check_download_4)
12410hunk ./src/allmydata/test/test_system.py 649
12411         def _check_empty_file(res):
12412             # make sure we can create empty files, this usually screws up the
12413             # segsize math
12414-            d1 = self.clients[2].create_mutable_file("")
12415+            d1 = self.clients[2].create_mutable_file(MutableData(""))
12416             d1.addCallback(lambda newnode: newnode.download_best_version())
12417             d1.addCallback(lambda res: self.failUnlessEqual("", res))
12418             return d1
12419hunk ./src/allmydata/test/test_system.py 680
12420                                  self.key_generator_svc.key_generator.pool_size + size_delta)
12421 
12422         d.addCallback(check_kg_poolsize, 0)
12423-        d.addCallback(lambda junk: self.clients[3].create_mutable_file('hello, world'))
12424+        d.addCallback(lambda junk:
12425+            self.clients[3].create_mutable_file(MutableData('hello, world')))
12426         d.addCallback(check_kg_poolsize, -1)
12427         d.addCallback(lambda junk: self.clients[3].create_dirnode())
12428         d.addCallback(check_kg_poolsize, -2)
12429hunk ./src/allmydata/test/test_web.py 28
12430 from allmydata.util.encodingutil import to_str
12431 from allmydata.test.common import FakeCHKFileNode, FakeMutableFileNode, \
12432      create_chk_filenode, WebErrorMixin, ShouldFailMixin, make_mutable_file_uri
12433-from allmydata.interfaces import IMutableFileNode
12434+from allmydata.interfaces import IMutableFileNode, SDMF_VERSION, MDMF_VERSION
12435 from allmydata.mutable import servermap, publish, retrieve
12436 import allmydata.test.common_util as testutil
12437 from allmydata.test.no_network import GridTestMixin
12438hunk ./src/allmydata/test/test_web.py 57
12439         return FakeCHKFileNode(cap)
12440     def _create_mutable(self, cap):
12441         return FakeMutableFileNode(None, None, None, None).init_from_cap(cap)
12442-    def create_mutable_file(self, contents="", keysize=None):
12443+    def create_mutable_file(self, contents="", keysize=None,
12444+                            version=SDMF_VERSION):
12445         n = FakeMutableFileNode(None, None, None, None)
12446hunk ./src/allmydata/test/test_web.py 60
12447+        n.set_version(version)
12448         return n.create(contents)
12449 
12450 class FakeUploader(service.Service):
12451hunk ./src/allmydata/test/test_web.py 157
12452         self.nodemaker = FakeNodeMaker(None, self._secret_holder, None,
12453                                        self.uploader, None,
12454                                        None, None)
12455+        self.mutable_file_default = SDMF_VERSION
12456 
12457     def startService(self):
12458         return service.MultiService.startService(self)
12459hunk ./src/allmydata/test/test_web.py 762
12460                              self.PUT, base + "/@@name=/blah.txt", "")
12461         return d
12462 
12463+
12464     def test_GET_DIRURL_named_bad(self):
12465         base = "/file/%s" % urllib.quote(self._foo_uri)
12466         d = self.shouldFail2(error.Error, "test_PUT_DIRURL_named_bad",
12467hunk ./src/allmydata/test/test_web.py 878
12468                                                       self.NEWFILE_CONTENTS))
12469         return d
12470 
12471+    def test_PUT_NEWFILEURL_unlinked_mdmf(self):
12472+        # this should give us a multi-segment MDMF mutable file, whose
12473+        # format we can then check via its JSON representation.
12474+        contents = self.NEWFILE_CONTENTS * 300000
12475+        d = self.PUT("/uri?mutable=true&mutable-type=mdmf",
12476+                     contents)
12477+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12478+        d.addCallback(lambda json: self.failUnlessIn("mdmf", json))
12479+        return d
12480+
12481+    def test_PUT_NEWFILEURL_unlinked_sdmf(self):
12482+        contents = self.NEWFILE_CONTENTS * 300000
12483+        d = self.PUT("/uri?mutable=true&mutable-type=sdmf",
12484+                     contents)
12485+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12486+        d.addCallback(lambda json: self.failUnlessIn("sdmf", json))
12487+        return d
12488+
12489     def test_PUT_NEWFILEURL_range_bad(self):
12490         headers = {"content-range": "bytes 1-10/%d" % len(self.NEWFILE_CONTENTS)}
12491         target = self.public_url + "/foo/new.txt"
12492hunk ./src/allmydata/test/test_web.py 928
12493         return d
12494 
12495     def test_PUT_NEWFILEURL_mutable_toobig(self):
12496-        d = self.shouldFail2(error.Error, "test_PUT_NEWFILEURL_mutable_toobig",
12497-                             "413 Request Entity Too Large",
12498-                             "SDMF is limited to one segment, and 10001 > 10000",
12499-                             self.PUT,
12500-                             self.public_url + "/foo/new.txt?mutable=true",
12501-                             "b" * (self.s.MUTABLE_SIZELIMIT+1))
12502+        # The SDMF size limit no longer applies, so uploading a large
12503+        # mutable file should succeed.
12504+        d = self.PUT(self.public_url + "/foo/new.txt?mutable=true",
12505+                     "b" * (self.s.MUTABLE_SIZELIMIT + 1))
12506         return d
12507 
12508     def test_PUT_NEWFILEURL_replace(self):
12509hunk ./src/allmydata/test/test_web.py 1026
12510         d.addCallback(_check1)
12511         return d
12512 
12513+    def test_GET_FILEURL_json_mutable_type(self):
12514+        # The JSON should include mutable-type, which says whether the
12515+        # file is SDMF or MDMF.
12516+        d = self.PUT("/uri?mutable=true&mutable-type=mdmf",
12517+                     self.NEWFILE_CONTENTS * 300000)
12518+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12519+        def _got_json(json, version):
12520+            data = simplejson.loads(json)
12521+            assert "filenode" == data[0]
12522+            data = data[1]
12523+            assert isinstance(data, dict)
12524+
12525+            self.failUnlessIn("mutable-type", data)
12526+            self.failUnlessEqual(data['mutable-type'], version)
12527+
12528+        d.addCallback(_got_json, "mdmf")
12529+        # Now make an SDMF file and check that it is reported correctly.
12530+        d.addCallback(lambda ignored:
12531+            self.PUT("/uri?mutable=true&mutable-type=sdmf",
12532+                      self.NEWFILE_CONTENTS * 300000))
12533+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12534+        d.addCallback(_got_json, "sdmf")
12535+        return d
12536+
12537     def test_GET_FILEURL_json_missing(self):
12538         d = self.GET(self.public_url + "/foo/missing?json")
12539         d.addBoth(self.should404, "test_GET_FILEURL_json_missing")
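
Outside the test harness, the mutable-type metadata checked above can be read from a live gateway over plain HTTP. A sketch, assuming a gateway on the default webapi port 3456 and a filecap obtained elsewhere; the ["filenode", {...}] JSON shape matches what the tests decode.

    import urllib2, simplejson

    def get_mutable_type(filecap, gateway="http://127.0.0.1:3456"):
        url = "%s/uri/%s?t=json" % (gateway, filecap)
        nodetype, data = simplejson.loads(urllib2.urlopen(url).read())
        assert nodetype == "filenode"
        return data.get("mutable-type")  # "sdmf", "mdmf", or None
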
12540hunk ./src/allmydata/test/test_web.py 1088
12541         d.addBoth(self.should404, "test_GET_FILEURL_uri_missing")
12542         return d
12543 
12544-    def test_GET_DIRECTORY_html_banner(self):
12545+    def test_GET_DIRECTORY_html(self):
12546         d = self.GET(self.public_url + "/foo", followRedirect=True)
12547         def _check(res):
12548             self.failUnlessIn('<div class="toolbar-item"><a href="../../..">Return to Welcome page</a></div>',res)
12549hunk ./src/allmydata/test/test_web.py 1092
12550+            self.failUnlessIn("mutable-type-mdmf", res)
12551+            self.failUnlessIn("mutable-type-sdmf", res)
12552         d.addCallback(_check)
12553         return d
12554 
12555hunk ./src/allmydata/test/test_web.py 1097
12556+    def test_GET_root_html(self):
12557+        # make sure that we have the option to upload an unlinked
12558+        # mutable file in SDMF and MDMF formats.
12559+        d = self.GET("/")
12560+        def _got_html(html):
12561+            # These are radio buttons that allow the user to toggle
12562+            # whether a particular mutable file is MDMF or SDMF.
12563+            self.failUnlessIn("mutable-type-mdmf", html)
12564+            self.failUnlessIn("mutable-type-sdmf", html)
12565+        d.addCallback(_got_html)
12566+        return d
12567+
12568+    def test_mutable_type_defaults(self):
12569+        # The checked="checked" attribute of the inputs corresponding to
12570+        # the mutable-type parameter should change as expected with the
12571+        # value configured in tahoe.cfg.
12572+        #
12573+        # By default, the client's configured value is SDMF_VERSION, so
12574+        # the SDMF button should be the one that is checked.
12575+        assert self.s.mutable_file_default == SDMF_VERSION
12576+
12577+        d = self.GET("/")
12578+        def _got_html(html, value):
12579+            i = 'input checked="checked" type="radio" id="mutable-type-%s"'
12580+            self.failUnlessIn(i % value, html)
12581+        d.addCallback(_got_html, "sdmf")
12582+        d.addCallback(lambda ignored:
12583+            self.GET(self.public_url + "/foo", followRedirect=True))
12584+        d.addCallback(_got_html, "sdmf")
12585+        # Now switch the configuration value to MDMF. The MDMF radio
12586+        # buttons should now be checked on these pages.
12587+        def _swap_values(ignored):
12588+            self.s.mutable_file_default = MDMF_VERSION
12589+        d.addCallback(_swap_values)
12590+        d.addCallback(lambda ignored: self.GET("/"))
12591+        d.addCallback(_got_html, "mdmf")
12592+        d.addCallback(lambda ignored:
12593+            self.GET(self.public_url + "/foo", followRedirect=True))
12594+        d.addCallback(_got_html, "mdmf")
12595+        return d
12596+
12597     def test_GET_DIRURL(self):
12598         # the addSlash means we get a redirect here
12599         # from /uri/$URI/foo/ , we need ../../../ to get back to the root
12600hunk ./src/allmydata/test/test_web.py 1227
12601         d.addCallback(self.failUnlessIsFooJSON)
12602         return d
12603 
12604+    def test_GET_DIRURL_json_mutable_type(self):
12605+        d = self.PUT(self.public_url + \
12606+                     "/foo/sdmf.txt?mutable=true&mutable-type=sdmf",
12607+                     self.NEWFILE_CONTENTS * 300000)
12608+        d.addCallback(lambda ignored:
12609+            self.PUT(self.public_url + \
12610+                     "/foo/mdmf.txt?mutable=true&mutable-type=mdmf",
12611+                     self.NEWFILE_CONTENTS * 300000))
12612+        # Now we have one MDMF and one SDMF file in the directory. If we
12613+        # GET its JSON, we should see the mutable-type of each child.
12614+        d.addCallback(lambda ignored:
12615+            self.GET(self.public_url + "/foo?t=json"))
12616+        def _got_json(json):
12617+            data = simplejson.loads(json)
12618+            assert data[0] == "dirnode"
12619+
12620+            data = data[1]
12621+            kids = data['children']
12622+
12623+            mdmf_data = kids['mdmf.txt'][1]
12624+            self.failUnlessIn("mutable-type", mdmf_data)
12625+            self.failUnlessEqual(mdmf_data['mutable-type'], "mdmf")
12626+
12627+            sdmf_data = kids['sdmf.txt'][1]
12628+            self.failUnlessIn("mutable-type", sdmf_data)
12629+            self.failUnlessEqual(sdmf_data['mutable-type'], "sdmf")
12630+        d.addCallback(_got_json)
12631+        return d
12632+
12633 
12634     def test_POST_DIRURL_manifest_no_ophandle(self):
12635         d = self.shouldFail2(error.Error,
12636hunk ./src/allmydata/test/test_web.py 1810
12637         return d
12638 
12639     def test_POST_upload_no_link_mutable_toobig(self):
12640-        d = self.shouldFail2(error.Error,
12641-                             "test_POST_upload_no_link_mutable_toobig",
12642-                             "413 Request Entity Too Large",
12643-                             "SDMF is limited to one segment, and 10001 > 10000",
12644-                             self.POST,
12645-                             "/uri", t="upload", mutable="true",
12646-                             file=("new.txt",
12647-                                   "b" * (self.s.MUTABLE_SIZELIMIT+1)) )
12648+        # The SDMF size limit is no longer in place, so we should be
12649+        # able to upload mutable files that are as large as we want them
12650+        # to be.
12651+        d = self.POST("/uri", t="upload", mutable="true",
12652+                      file=("new.txt", "b" * (self.s.MUTABLE_SIZELIMIT + 1)))
12653         return d
12654 
12655hunk ./src/allmydata/test/test_web.py 1817
12656+
12657+    def test_POST_upload_mutable_type_unlinked(self):
12658+        d = self.POST("/uri?t=upload&mutable=true&mutable-type=sdmf",
12659+                      file=("sdmf.txt", self.NEWFILE_CONTENTS * 300000))
12660+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12661+        def _got_json(json, version):
12662+            data = simplejson.loads(json)
12663+            data = data[1]
12664+
12665+            self.failUnlessIn("mutable-type", data)
12666+            self.failUnlessEqual(data['mutable-type'], version)
12667+        d.addCallback(_got_json, "sdmf")
12668+        d.addCallback(lambda ignored:
12669+            self.POST("/uri?t=upload&mutable=true&mutable-type=mdmf",
12670+                      file=('mdmf.txt', self.NEWFILE_CONTENTS * 300000)))
12671+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12672+        d.addCallback(_got_json, "mdmf")
12673+        return d
12674+
12675+    def test_POST_upload_mutable_type(self):
12676+        d = self.POST(self.public_url + \
12677+                      "/foo?t=upload&mutable=true&mutable-type=sdmf",
12678+                      file=("sdmf.txt", self.NEWFILE_CONTENTS * 300000))
12679+        fn = self._foo_node
12680+        def _got_cap(filecap, filename):
12681+            filenameu = unicode(filename)
12682+            self.failUnlessURIMatchesRWChild(filecap, fn, filenameu)
12683+            return self.GET(self.public_url + "/foo/%s?t=json" % filename)
12684+        d.addCallback(_got_cap, "sdmf.txt")
12685+        def _got_json(json, version):
12686+            data = simplejson.loads(json)
12687+            data = data[1]
12688+
12689+            self.failUnlessIn("mutable-type", data)
12690+            self.failUnlessEqual(data['mutable-type'], version)
12691+        d.addCallback(_got_json, "sdmf")
12692+        d.addCallback(lambda ignored:
12693+            self.POST(self.public_url + \
12694+                      "/foo?t=upload&mutable=true&mutable-type=mdmf",
12695+                      file=("mdmf.txt", self.NEWFILE_CONTENTS * 300000)))
12696+        d.addCallback(_got_cap, "mdmf.txt")
12697+        d.addCallback(_got_json, "mdmf")
12698+        return d
12699+
12700     def test_POST_upload_mutable(self):
12701         # this creates a mutable file
12702         d = self.POST(self.public_url + "/foo", t="upload", mutable="true",
12703hunk ./src/allmydata/test/test_web.py 1985
12704             self.failUnlessReallyEqual(headers["content-type"], ["text/plain"])
12705         d.addCallback(_got_headers)
12706 
12707-        # make sure that size errors are displayed correctly for overwrite
12708-        d.addCallback(lambda res:
12709-                      self.shouldFail2(error.Error,
12710-                                       "test_POST_upload_mutable-toobig",
12711-                                       "413 Request Entity Too Large",
12712-                                       "SDMF is limited to one segment, and 10001 > 10000",
12713-                                       self.POST,
12714-                                       self.public_url + "/foo", t="upload",
12715-                                       mutable="true",
12716-                                       file=("new.txt",
12717-                                             "b" * (self.s.MUTABLE_SIZELIMIT+1)),
12718-                                       ))
12719-
12720+        # make sure that outdated size limits aren't enforced anymore.
12721+        d.addCallback(lambda ignored:
12722+            self.POST(self.public_url + "/foo", t="upload",
12723+                      mutable="true",
12724+                      file=("new.txt",
12725+                            "b" * (self.s.MUTABLE_SIZELIMIT+1))))
12726         d.addErrback(self.dump_error)
12727         return d
12728 
12729hunk ./src/allmydata/test/test_web.py 1995
12730     def test_POST_upload_mutable_toobig(self):
12731-        d = self.shouldFail2(error.Error,
12732-                             "test_POST_upload_mutable_toobig",
12733-                             "413 Request Entity Too Large",
12734-                             "SDMF is limited to one segment, and 10001 > 10000",
12735-                             self.POST,
12736-                             self.public_url + "/foo",
12737-                             t="upload", mutable="true",
12738-                             file=("new.txt",
12739-                                   "b" * (self.s.MUTABLE_SIZELIMIT+1)) )
12740+        # SDMF had a size limit that was removed a while ago. MDMF has
12741+        # never had a size limit. Test to make sure that we do not
12742+        # encounter errors when trying to upload large mutable files,
12743+        # since nothing in the code should prohibit large mutable files
12744+        # anymore.
12745+        d = self.POST(self.public_url + "/foo",
12746+                      t="upload", mutable="true",
12747+                      file=("new.txt", "b" * (self.s.MUTABLE_SIZELIMIT + 1)))
12748         return d
12749 
12750     def dump_error(self, f):
12751hunk ./src/allmydata/test/test_web.py 3005
12752                                                       contents))
12753         return d
12754 
12755+    def test_PUT_NEWFILEURL_mdmf(self):
12756+        new_contents = self.NEWFILE_CONTENTS * 300000
12757+        d = self.PUT(self.public_url + \
12758+                     "/foo/mdmf.txt?mutable=true&mutable-type=mdmf",
12759+                     new_contents)
12760+        d.addCallback(lambda ignored:
12761+            self.GET(self.public_url + "/foo/mdmf.txt?t=json"))
12762+        def _got_json(json):
12763+            data = simplejson.loads(json)
12764+            data = data[1]
12765+            self.failUnlessIn("mutable-type", data)
12766+            self.failUnlessEqual(data['mutable-type'], "mdmf")
12767+        d.addCallback(_got_json)
12768+        return d
12769+
12770+    def test_PUT_NEWFILEURL_sdmf(self):
12771+        new_contents = self.NEWFILE_CONTENTS * 300000
12772+        d = self.PUT(self.public_url + \
12773+                     "/foo/sdmf.txt?mutable=true&mutable-type=sdmf",
12774+                     new_contents)
12775+        d.addCallback(lambda ignored:
12776+            self.GET(self.public_url + "/foo/sdmf.txt?t=json"))
12777+        def _got_json(json):
12778+            data = simplejson.loads(json)
12779+            data = data[1]
12780+            self.failUnlessIn("mutable-type", data)
12781+            self.failUnlessEqual(data['mutable-type'], "sdmf")
12782+        d.addCallback(_got_json)
12783+        return d
12784+
12785     def test_PUT_NEWFILEURL_uri_replace(self):
12786         contents, n, new_uri = self.makefile(8)
12787         d = self.PUT(self.public_url + "/foo/bar.txt?t=uri", new_uri)
12788hunk ./src/allmydata/test/test_web.py 3156
12789         d.addCallback(_done)
12790         return d
12791 
12792+
12793+    def test_PUT_update_at_offset(self):
12794+        file_contents = "test file" * 100000 # about 900 KiB
12795+        d = self.PUT("/uri?mutable=true", file_contents)
12796+        def _then(filecap):
12797+            self.filecap = filecap
12798+            new_data = file_contents[:100]
12799+            new = "replaced and so on"
12800+            new_data += new
12801+            new_data += file_contents[len(new_data):]
12802+            assert len(new_data) == len(file_contents)
12803+            self.new_data = new_data
12804+        d.addCallback(_then)
12805+        d.addCallback(lambda ignored:
12806+            self.PUT("/uri/%s?replace=True&offset=100" % self.filecap,
12807+                     "replaced and so on"))
12808+        def _get_data(filecap):
12809+            n = self.s.create_node_from_uri(filecap)
12810+            return n.download_best_version()
12811+        d.addCallback(_get_data)
12812+        d.addCallback(lambda results:
12813+            self.failUnlessEqual(results, self.new_data))
12814+        # Now try appending things to the file
12815+        d.addCallback(lambda ignored:
12816+            self.PUT("/uri/%s?offset=%d" % (self.filecap, len(self.new_data)),
12817+                     "puppies" * 100))
12818+        d.addCallback(_get_data)
12819+        d.addCallback(lambda results:
12820+            self.failUnlessEqual(results, self.new_data + ("puppies" * 100)))
12821+        return d
12822+
12823+
12824+    def test_PUT_update_at_offset_immutable(self):
12825+        file_contents = "Test file" * 100000
12826+        d = self.PUT("/uri", file_contents)
12827+        def _then(filecap):
12828+            self.filecap = filecap
12829+        d.addCallback(_then)
12830+        d.addCallback(lambda ignored:
12831+            self.shouldHTTPError("test immutable update",
12832+                                 400, "Bad Request",
12833+                                 "immutable",
12834+                                 self.PUT,
12835+                                 "/uri/%s?offset=50" % self.filecap,
12836+                                 "foo"))
12837+        return d
12838+
12839+
12840     def test_bad_method(self):
12841         url = self.webish_url + self.public_url + "/foo/bar.txt"
12842         d = self.shouldHTTPError("test_bad_method",
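
The offset-PUT behavior tested above can likewise be driven over plain HTTP. A sketch, assuming a gateway on the default port and the cap of an existing mutable file; httplib is used because urllib2 has no convenient PUT, and the assumption that the response body is the filecap follows test_PUT_update_at_offset above.

    import httplib

    def put_at_offset(filecap, offset, data, host="127.0.0.1", port=3456):
        # Write data into the mutable file at the given byte offset; an
        # offset equal to the current length amounts to an append.
        conn = httplib.HTTPConnection(host, port)
        conn.request("PUT", "/uri/%s?offset=%d" % (filecap, offset), data)
        resp = conn.getresponse()
        body = resp.read()
        assert resp.status in (200, 201), (resp.status, body)
        return body  # the filecap
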
12843hunk ./src/allmydata/test/test_web.py 3473
12844         def _stash_mutable_uri(n, which):
12845             self.uris[which] = n.get_uri()
12846             assert isinstance(self.uris[which], str)
12847-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"3"))
12848+        d.addCallback(lambda ign:
12849+            c0.create_mutable_file(publish.MutableData(DATA+"3")))
12850         d.addCallback(_stash_mutable_uri, "corrupt")
12851         d.addCallback(lambda ign:
12852                       c0.upload(upload.Data("literal", convergence="")))
12853hunk ./src/allmydata/test/test_web.py 3620
12854         def _stash_mutable_uri(n, which):
12855             self.uris[which] = n.get_uri()
12856             assert isinstance(self.uris[which], str)
12857-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"3"))
12858+        d.addCallback(lambda ign:
12859+            c0.create_mutable_file(publish.MutableData(DATA+"3")))
12860         d.addCallback(_stash_mutable_uri, "corrupt")
12861 
12862         def _compute_fileurls(ignored):
12863hunk ./src/allmydata/test/test_web.py 4283
12864         def _stash_mutable_uri(n, which):
12865             self.uris[which] = n.get_uri()
12866             assert isinstance(self.uris[which], str)
12867-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"2"))
12868+        d.addCallback(lambda ign:
12869+            c0.create_mutable_file(publish.MutableData(DATA+"2")))
12870         d.addCallback(_stash_mutable_uri, "mutable")
12871 
12872         def _compute_fileurls(ignored):
12873hunk ./src/allmydata/test/test_web.py 4383
12874                                                         convergence="")))
12875         d.addCallback(_stash_uri, "small")
12876 
12877-        d.addCallback(lambda ign: c0.create_mutable_file("mutable"))
12878+        d.addCallback(lambda ign:
12879+            c0.create_mutable_file(publish.MutableData("mutable")))
12880         d.addCallback(lambda fn: self.rootnode.set_node(u"mutable", fn))
12881         d.addCallback(_stash_uri, "mutable")
12882 
12883}
12884[resolve conflicts between 393-MDMF patches and trunk as of 1.8.2
12885"Brian Warner <warner@lothar.com>"**20110220230201
12886 Ignore-this: 9bbf5d26c994e8069202331dcb4cdd95
12887] {
12888hunk ./docs/configuration.rst 323
12889     (Mutable files use a different share placement algorithm that does not
12890     currently consider this parameter.)
12891 
12892+``mutable.format = sdmf or mdmf``
12893+
12894+    This value tells Tahoe what the default mutable file format should
12895+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
12896+    in the old SDMF format. This is desirable for clients that operate on
12897+    grids where some peers run older versions of Tahoe, as these older
12898+    versions cannot read the new MDMF mutable file format. If
12899+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
12900+    the new MDMF format, which supports efficient in-place modification and
12901+    streaming downloads. You can override this value on a per-file basis
12902+    using the mutable-type parameter in the webapi. If you do not specify a
12903+    value here, Tahoe will use SDMF for all newly-created mutable files.
12904+
12905+    Note that this parameter only applies to mutable files. Mutable
12906+    directories, which are stored as mutable files, are not controlled by
12907+    this parameter and will always use SDMF. We may revisit this decision
12908+    in future versions of Tahoe-LAFS.
12909+
12910 
12911 Storage Server Configuration
12912 ============================
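
Concretely, the new option would be set in tahoe.cfg along these lines (a sketch; the [client] section placement is an assumption based on the surrounding client options in this file):

    [client]
    # default format for newly created mutable files: sdmf or mdmf
    mutable.format = mdmf
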
12913hunk ./docs/configuration.rst 401
12914     `<garbage-collection.rst>`_ for full details.
12915 
12916 
12917-shares.needed = (int, optional) aka "k", default 3
12918-shares.total = (int, optional) aka "N", N >= k, default 10
12919-shares.happy = (int, optional) 1 <= happy <= N, default 7
12920-
12921- These three values set the default encoding parameters. Each time a new file
12922- is uploaded, erasure-coding is used to break the ciphertext into separate
12923- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
12924- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
12925- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
12926- Setting k to 1 is equivalent to simple replication (uploading N copies of
12927- the file).
12928-
12929- These values control the tradeoff between storage overhead, performance, and
12930- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
12931- backend storage space (the actual value will be a bit more, because of other
12932- forms of overhead). Up to N-k shares can be lost before the file becomes
12933- unrecoverable, so assuming there are at least N servers, up to N-k servers
12934- can be offline without losing the file. So large N/k ratios are more
12935- reliable, and small N/k ratios use less disk space. Clearly, k must never be
12936- smaller than N.
12937-
12938- Large values of N will slow down upload operations slightly, since more
12939- servers must be involved, and will slightly increase storage overhead due to
12940- the hash trees that are created. Large values of k will cause downloads to
12941- be marginally slower, because more servers must be involved. N cannot be
12942- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
12943- uses.
12944-
12945- shares.happy allows you control over the distribution of your immutable file.
12946- For a successful upload, shares are guaranteed to be initially placed on
12947- at least 'shares.happy' distinct servers, the correct functioning of any
12948- k of which is sufficient to guarantee the availability of the uploaded file.
12949- This value should not be larger than the number of servers on your grid.
12950-
12951- A value of shares.happy <= k is allowed, but does not provide any redundancy
12952- if some servers fail or lose shares.
12953-
12954- (Mutable files use a different share placement algorithm that does not
12955-  consider this parameter.)
12956-
12957-
12958-== Storage Server Configuration ==
12959-
12960-[storage]
12961-enabled = (boolean, optional)
12962-
12963- If this is True, the node will run a storage server, offering space to other
12964- clients. If it is False, the node will not run a storage server, meaning
12965- that no shares will be stored on this node. Use False this for clients who
12966- do not wish to provide storage service. The default value is True.
12967-
12968-readonly = (boolean, optional)
12969-
12970- If True, the node will run a storage server but will not accept any shares,
12971- making it effectively read-only. Use this for storage servers which are
12972- being decommissioned: the storage/ directory could be mounted read-only,
12973- while shares are moved to other servers. Note that this currently only
12974- affects immutable shares. Mutable shares (used for directories) will be
12975- written and modified anyway. See ticket #390 for the current status of this
12976- bug. The default value is False.
12977-
12978-reserved_space = (str, optional)
12979-
12980- If provided, this value defines how much disk space is reserved: the storage
12981- server will not accept any share which causes the amount of free disk space
12982- to drop below this value. (The free space is measured by a call to statvfs(2)
12983- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
12984- user account under which the storage server runs.)
12985-
12986- This string contains a number, with an optional case-insensitive scale
12987- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
12988- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
12989- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
12990-
12991-expire.enabled =
12992-expire.mode =
12993-expire.override_lease_duration =
12994-expire.cutoff_date =
12995-expire.immutable =
12996-expire.mutable =
12997-
12998- These settings control garbage-collection, in which the server will delete
12999- shares that no longer have an up-to-date lease on them. Please see the
13000- neighboring "garbage-collection.txt" document for full details.
13001-
13002-
13003-== Running A Helper ==
13004+Running A Helper
13005+================
13006 
13007 A "helper" is a regular client node that also offers the "upload helper"
13008 service.
13009replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
13010hunk ./src/allmydata/mutable/retrieve.py 7
13011 from zope.interface import implements
13012 from twisted.internet import defer
13013 from twisted.python import failure
13014-from foolscap.api import DeadReferenceError, eventually, fireEventually
13015-from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError
13016-from allmydata.util import hashutil, idlib, log
13017+from twisted.internet.interfaces import IPushProducer, IConsumer
13018+from foolscap.api import eventually, fireEventually
13019+from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError, \
13020+                                 MDMF_VERSION, SDMF_VERSION
13021+from allmydata.util import hashutil, log, mathutil
13022+from allmydata.util.dictutil import DictOfSets
13023 from allmydata import hashtree, codec
13024 from allmydata.storage.server import si_b2a
13025 from pycryptopp.cipher.aes import AES
13026hunk ./src/allmydata/mutable/retrieve.py 239
13027             # KiB, so we ask for that much.
13028             # TODO: Change the cache methods to allow us to fetch all of the
13029             # data that they have, then change this method to do that.
13030-            any_cache, timestamp = self._node._read_from_cache(self.verinfo,
13031-                                                               shnum,
13032-                                                               0,
13033-                                                               1000)
13034+            any_cache = self._node._read_from_cache(self.verinfo, shnum,
13035+                                                    0, 1000)
13036             ss = self.servermap.connections[peerid]
13037             reader = MDMFSlotReadProxy(ss,
13038                                        self._storage_index,
13039hunk ./src/allmydata/mutable/retrieve.py 373
13040                  (k, n, self._num_segments, self._segment_size,
13041                   self._tail_segment_size))
13042 
13043-        # ask the cache first
13044-        got_from_cache = False
13045-        datavs = []
13046-        for (offset, length) in readv:
13047-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
13048-                                                            offset, length)
13049-            if data is not None:
13050-                datavs.append(data)
13051-        if len(datavs) == len(readv):
13052-            self.log("got data from cache")
13053-            got_from_cache = True
13054-            d = fireEventually({shnum: datavs})
13055-            # datavs is a dict mapping shnum to a pair of strings
13056+        for i in xrange(self._total_shares):
13057+            # So we don't have to do this later.
13058+            self._block_hash_trees[i] = hashtree.IncompleteHashTree(self._num_segments)
13059+
13060+        # Our last task is to tell the downloader where to start and
13061+        # where to stop. We use three parameters for that:
13062+        #   - self._start_segment: the segment that we need to start
13063+        #     downloading from.
13064+        #   - self._current_segment: the next segment that we need to
13065+        #     download.
13066+        #   - self._last_segment: The last segment that we were asked to
13067+        #     download.
13068+        #
13069+        #  We say that the download is complete when
13070+        #  self._current_segment > self._last_segment. We use
13071+        #  self._start_segment and self._last_segment to know when to
13072+        #  strip things off of segments, and how much to strip.
13073+        if self._offset:
13074+            self.log("got offset: %d" % self._offset)
13075+            # our start segment is the first segment containing the
13076+            # offset we were given.
13077+            start = mathutil.div_ceil(self._offset,
13078+                                      self._segment_size)
13079+            # this gets us the first segment after self._offset. Then
13080+            # our start segment is the one before it.
13081+            start -= 1
13082+
13083+            assert start < self._num_segments
13084+            self._start_segment = start
13085+            self.log("got start segment: %d" % self._start_segment)
13086         else:
13087             self._start_segment = 0
13088 
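
A worked example of the start-segment arithmetic above, assuming the default 128 KiB (131072-byte) maximum segment size:

    from allmydata.util import mathutil

    segment_size = 131072    # default maximum segment size
    offset = 200000          # byte at which the caller wants to start
    # div_ceil(200000, 131072) == 2: the first segment boundary at or
    # past the offset. The segment before it, number 1, contains byte
    # 200000 (segment 1 covers bytes 131072 through 262143).
    start = mathutil.div_ceil(offset, segment_size) - 1
    assert start == 1
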
13089hunk ./src/allmydata/mutable/servermap.py 7
13090 from itertools import count
13091 from twisted.internet import defer
13092 from twisted.python import failure
13093-from foolscap.api import DeadReferenceError, RemoteException, eventually
13094-from allmydata.util import base32, hashutil, idlib, log
13095+from foolscap.api import DeadReferenceError, RemoteException, eventually, \
13096+                         fireEventually
13097+from allmydata.util import base32, hashutil, idlib, log, deferredutil
13098+from allmydata.util.dictutil import DictOfSets
13099 from allmydata.storage.server import si_b2a
13100 from allmydata.interfaces import IServermapUpdaterStatus
13101 from pycryptopp.publickey import rsa
13102hunk ./src/allmydata/mutable/servermap.py 16
13103 
13104 from allmydata.mutable.common import MODE_CHECK, MODE_ANYTHING, MODE_WRITE, MODE_READ, \
13105-     DictOfSets, CorruptShareError, NeedMoreDataError
13106-from allmydata.mutable.layout import unpack_prefix_and_signature, unpack_header, unpack_share, \
13107-     SIGNED_PREFIX_LENGTH
13108+     CorruptShareError
13109+from allmydata.mutable.layout import SIGNED_PREFIX_LENGTH, MDMFSlotReadProxy
13110 
13111 class UpdateStatus:
13112     implements(IServermapUpdaterStatus)
13113hunk ./src/allmydata/mutable/servermap.py 391
13114         #  * if we need the encrypted private key, we want [-1216ish:]
13115         #   * but we can't read from negative offsets
13116         #   * the offset table tells us the 'ish', also the positive offset
13117-        # A future version of the SMDF slot format should consider using
13118-        # fixed-size slots so we can retrieve less data. For now, we'll just
13119-        # read 2000 bytes, which also happens to read enough actual data to
13120-        # pre-fetch a 9-entry dirnode.
13121+        # MDMF:
13122+        #  * Checkstring? [0:72]
13123+        #  * If we want to validate the checkstring, then [0:72], [143:?] --
13124+        #    the offset table will tell us for sure.
13125+        #  * If we need the verification key, we have to consult the offset
13126+        #    table as well.
13127+        # At this point, we don't know which we are. Our filenode can
13128+        # tell us, but it might be lying -- in some cases, we're
13129+        # responsible for telling it which kind of file it is.
13130         self._read_size = 4000
13131         if mode == MODE_CHECK:
13132             # we use unpack_prefix_and_signature, so we need 1k
13133hunk ./src/allmydata/mutable/servermap.py 633
13134         updated.
13135         """
13136         if verinfo:
13137-            self._node._add_to_cache(verinfo, shnum, 0, data, now)
13138+            self._node._add_to_cache(verinfo, shnum, 0, data)
13139 
13140 
13141     def _got_results(self, datavs, peerid, readsize, stuff, started):
13142hunk ./src/allmydata/mutable/servermap.py 664
13143 
13144         for shnum,datav in datavs.items():
13145             data = datav[0]
13146-            try:
13147-                verinfo = self._got_results_one_share(shnum, data, peerid, lp)
13148-                last_verinfo = verinfo
13149-                last_shnum = shnum
13150-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
13151-            except CorruptShareError, e:
13152-                # log it and give the other shares a chance to be processed
13153-                f = failure.Failure()
13154-                self.log(format="bad share: %(f_value)s", f_value=str(f.value),
13155-                         failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
13156-                self.notify_server_corruption(peerid, shnum, str(e))
13157-                self._bad_peers.add(peerid)
13158-                self._last_failure = f
13159-                checkstring = data[:SIGNED_PREFIX_LENGTH]
13160-                self._servermap.mark_bad_share(peerid, shnum, checkstring)
13161-                self._servermap.problems.append(f)
13162-                pass
13163+            reader = MDMFSlotReadProxy(ss,
13164+                                       storage_index,
13165+                                       shnum,
13166+                                       data)
13167+            self._readers.setdefault(peerid, dict())[shnum] = reader
13168+            # our goal, with each response, is to validate the version
13169+            # information and share data as best we can at this point --
13170+            # we do this by validating the signature. To do this, we
13171+            # need to do the following:
13172+            #   - If we don't already have the public key, fetch the
13173+            #     public key. We use this to validate the signature.
13174+            if not self._node.get_pubkey():
13175+                # fetch and set the public key.
13176+                d = reader.get_verification_key(queue=True)
13177+                d.addCallback(lambda results, shnum=shnum, peerid=peerid:
13178+                    self._try_to_set_pubkey(results, peerid, shnum, lp))
13179+                # XXX: Make self._pubkey_query_failed?
13180+                d.addErrback(lambda error, shnum=shnum, peerid=peerid:
13181+                    self._got_corrupt_share(error, shnum, peerid, data, lp))
13182+            else:
13183+                # we already have the public key.
13184+                d = defer.succeed(None)
13185 
13186             # Neither of these two branches returns anything of
13187             # consequence, so the first entry in our deferredlist will
13188hunk ./src/allmydata/test/test_storage.py 1
13189-import time, os.path, platform, stat, re, simplejson, struct
13190+import time, os.path, platform, stat, re, simplejson, struct, shutil
13191 
13192hunk ./src/allmydata/test/test_storage.py 3
13193-import time, os.path, stat, re, simplejson, struct
13194+import mock
13195 
13196 from twisted.trial import unittest
13197 
13198}
13199[mutable/filenode.py: fix create_mutable_file('string')
13200"Brian Warner <warner@lothar.com>"**20110221014659
13201 Ignore-this: dc6bdad761089f0199681eeb784f1001
13202] hunk ./src/allmydata/mutable/filenode.py 137
13203         if contents is None:
13204             return MutableData("")
13205 
13206+        if isinstance(contents, str):
13207+            return MutableData(contents)
13208+
13209         if IMutableUploadable.providedBy(contents):
13210             return contents
13211 
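
With this fix in place, both call styles behave identically. A sketch, assuming a client node `c`:

    from allmydata.mutable.publish import MutableData

    # the plain string is now wrapped in MutableData internally, so
    # these two calls are equivalent:
    d1 = c.create_mutable_file("initial contents")
    d2 = c.create_mutable_file(MutableData("initial contents"))
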
13212[resolve more conflicts with current trunk
13213"Brian Warner <warner@lothar.com>"**20110221055600
13214 Ignore-this: 77ad038a478dbf5d9b34f7a68159a3e0
13215] hunk ./src/allmydata/mutable/servermap.py 461
13216         self._queries_completed = 0
13217 
13218         sb = self._storage_broker
13219-        full_peerlist = sb.get_servers_for_index(self._storage_index)
13220+        # All of the peers, permuted by the storage index, as usual.
13221+        full_peerlist = [(s.get_serverid(), s.get_rref())
13222+                         for s in sb.get_servers_for_psi(self._storage_index)]
13223         self.full_peerlist = full_peerlist # for use later, immutable
13224         self.extra_peers = full_peerlist[:] # peers are removed as we use them
13225         self._good_peers = set() # peers who had some shares
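
This (peerid, rref) reconstruction is the recurring adaptation between the
refactored IServer API and the tuple shape the mutable-file code still passes
around internally; as a standalone sketch (the wrapper function is
illustrative, the broker and IServer methods are the real names):

    def permuted_peerlist(storage_broker, storage_index):
        # get_servers_for_psi() returns IServer objects permuted by the
        # storage index; unpack each into the (peerid, rref) tuple that
        # servermap.py and publish.py still expect internally.
        return [(s.get_serverid(), s.get_rref())
                for s in storage_broker.get_servers_for_psi(storage_index)]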
13226[update MDMF code with StorageFarmBroker changes
13227"Brian Warner <warner@lothar.com>"**20110221061004
13228 Ignore-this: a693b201d31125b391cebe0412ddd027
13229] {
13230hunk ./src/allmydata/mutable/publish.py 203
13231         self._encprivkey = self._node.get_encprivkey()
13232 
13233         sb = self._storage_broker
13234-        full_peerlist = sb.get_servers_for_index(self._storage_index)
13235+        full_peerlist = [(s.get_serverid(), s.get_rref())
13236+                         for s in sb.get_servers_for_psi(self._storage_index)]
13237         self.full_peerlist = full_peerlist # for use later, immutable
13238         self.bad_peers = set() # peerids who have errbacked/refused requests
13239 
13240hunk ./src/allmydata/test/test_mutable.py 2538
13241             # for either a block and salt or for hashes, either of which
13242             # will exercise the error handling code.
13243             killer = FirstServerGetsKilled()
13244-            for (serverid, ss) in nm.storage_broker.get_all_servers():
13245-                ss.post_call_notifier = killer.notify
13246+            for s in nm.storage_broker.get_connected_servers():
13247+                s.get_rref().post_call_notifier = killer.notify
13248             ver = servermap.best_recoverable_version()
13249             assert ver
13250             return self._node.download_version(servermap, ver)
13251}
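
The test hunk applies the same refactoring from the consumer side: iteration
now yields IServer objects, so the remote reference carrying the
post_call_notifier hook is reached through get_rref(). A sketch with the setup
injected as parameters (the helper function is illustrative; the method and
attribute names are those the test uses):

    def install_post_call_hook(storage_broker, notify):
        # get_connected_servers() yields IServer objects; the wrapped
        # remote reference is what foolscap actually calls through, so
        # that is where the post-call hook must be attached.
        for server in storage_broker.get_connected_servers():
            server.get_rref().post_call_notifier = notify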
13252[web: Use the string "replace" to trigger whole-file replacement when processing an offset parameter.
13253Kevan Carstensen <kevan@isnotajoke.com>**20110223000044
13254 Ignore-this: 7c2d2bb875bffd68090be186ad2c00b2
13255] {
13256hunk ./docs/frontends/webapi.rst 367
13257  specify an "offset" parameter -- a byte offset that determines where in
13258  the mutable file the data from the HTTP request body is placed. This
13259  operation is relatively efficient for MDMF mutable files, and is
13260- relatively inefficient (but still supported) for SDMF mutable files.
13261+ relatively inefficient (but still supported) for SDMF mutable files. If
13262+ no offset parameter is specified, or if the given offset is the string
13263+ "replace", then the entire file is replaced with the data from the HTTP
13264+ request body. For an immutable file, the "offset" parameter is only
13265+ valid if its value is the string "replace".
13266 
13267  When creating a new file, if "mutable=true" is in the query arguments, the
13268  operation will create a mutable file instead of an immutable one.
13269hunk ./src/allmydata/test/test_web.py 3187
13270             self.failUnlessEqual(results, self.new_data + ("puppies" * 100)))
13271         return d
13272 
13273+    def test_PUT_update_at_invalid_offset(self):
13274+        file_contents = "test file" * 100000 # 900,000 bytes, about 880 KiB
13275+        d = self.PUT("/uri?mutable=true", file_contents)
13276+        def _then(filecap):
13277+            # We should see complaints about negative offsets.
13278+            self.filecap = filecap
13279+        d.addCallback(_then)
13280+        d.addCallback(lambda ignored:
13281+            self.shouldHTTPError("test mutable invalid offset negative",
13282+                                 400, "Bad Request",
13283+                                 "Invalid offset",
13284+                                 self.PUT,
13285+                                 "/uri/%s?offset=-1" % self.filecap,
13286+                                 "foo"))
13287+        d.addCallback(lambda ignored:
13288+            self.shouldHTTPError("test mutable invalid offset string",
13289+                                 400, "Bad Request",
13290+                                 "Invalid offset",
13291+                                 self.PUT,
13292+                                 "/uri/%s?offset=foobarbaz" % self.filecap,
13293+                                 "foo"))
13294+        return d
13295 
13296     def test_PUT_update_at_offset_immutable(self):
13297         file_contents = "Test file" * 100000
13298hunk ./src/allmydata/web/common.py 55
13299     # message? Since this call is going to be used by programmers and
13300     # their tools rather than users (through the wui), it is not
13301     # inconsistent to return that, I guess.
13302-    offset = int(offset)
13303     return offset
13304 
13305 
13306hunk ./src/allmydata/web/filenode.py 219
13307         req = IRequest(ctx)
13308         t = get_arg(req, "t", "").strip()
13309         replace = parse_replace_arg(get_arg(req, "replace", "true"))
13310-        offset = parse_offset_arg(get_arg(req, "offset", -1))
13311+        offset = parse_offset_arg(get_arg(req, "offset", "replace"))
13312 
13313         if not t:
13314hunk ./src/allmydata/web/filenode.py 222
13315-            if self.node.is_mutable() and offset >= 0:
13316-                return self.update_my_contents(req, offset)
13317-
13318-            elif self.node.is_mutable():
13319-                return self.replace_my_contents(req)
13320             if not replace:
13321                 # this is the early trap: if someone else modifies the
13322                 # directory while we're uploading, the add_file(overwrite=)
13323hunk ./src/allmydata/web/filenode.py 227
13324                 # call in replace_me_with_a_child will do the late trap.
13325                 raise ExistingChildError()
13326-            if offset >= 0:
13327-                raise WebError("PUT to a file: append operation invoked "
13328-                               "on an immutable cap")
13329 
13330hunk ./src/allmydata/web/filenode.py 228
13331+            if self.node.is_mutable():
13332+                if offset == "replace":
13333+                    return self.replace_my_contents(req)
13334+
13335+                try:
13336+                    offset = int(offset)
13337+
13338+                except ValueError:
13339+                    offset = -1
13340+
13341+                if offset >= 0:
13342+                    return self.update_my_contents(req, offset)
13343+
13344+                raise WebError("PUT to a mutable file: Invalid offset")
13345+
13346+            else:
13347+                if offset != "replace":
13348+                    raise WebError("PUT to a file: append operation invoked "
13349+                                   "on an immutable cap")
13350+
13351+                assert self.parentnode and self.name
13352+                return self.replace_me_with_a_child(req, self.client, replace)
13353 
13354hunk ./src/allmydata/web/filenode.py 251
13355-            assert self.parentnode and self.name
13356-            return self.replace_me_with_a_child(req, self.client, replace)
13357         if t == "uri":
13358             if not replace:
13359                 raise ExistingChildError()
13360}
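
Taken together, the filenode.py hunks above reduce PUT offset handling to a
small decision table. A condensed sketch, assuming stand-ins for the web-layer
pieces (replace_my_contents, update_my_contents, and replace_me_with_a_child
are the real method names on the web resource; the standalone function and
WebError stand-in are illustrative):

    class WebError(Exception):
        """Stand-in for allmydata.web.common.WebError."""

    def dispatch_put(resource, offset):
        # offset arrives as a string; the default is now "replace", not -1
        if resource.node.is_mutable():
            if offset == "replace":
                return resource.replace_my_contents     # whole-file replace
            try:
                offset = int(offset)
            except ValueError:
                offset = -1                             # e.g. "foobarbaz"
            if offset >= 0:
                return resource.update_my_contents      # in-place update
            raise WebError("PUT to a mutable file: Invalid offset")
        if offset != "replace":
            raise WebError("PUT to a file: append operation invoked "
                           "on an immutable cap")
        return resource.replace_me_with_a_child

The new test_PUT_update_at_invalid_offset exercises the two 400 paths above:
a negative offset and a non-numeric one.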
13361[mutable/filenode: Clean up servermap handling in MutableFileVersion
13362Kevan Carstensen <kevan@isnotajoke.com>**20110226010433
13363 Ignore-this: 2257c9f65502098789f5ea355b94f130
13364 
 We want to update the servermap before attempting to modify a file,
 and we now do so. Doing the update in every caller introduced code
 duplication, so the servermap update was refactored into its own
 method and the duplicate updates were eliminated throughout
 MutableFileVersion.
13370] {
13371hunk ./src/allmydata/mutable/filenode.py 19
13372 from allmydata.mutable.publish import Publish, MutableData,\
13373                                       DEFAULT_MAX_SEGMENT_SIZE, \
13374                                       TransformingUploadable
13375-from allmydata.mutable.common import MODE_READ, MODE_WRITE, UnrecoverableFileError, \
13376+from allmydata.mutable.common import MODE_READ, MODE_WRITE, MODE_CHECK, UnrecoverableFileError, \
13377      ResponseCache, UncoordinatedWriteError
13378 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
13379 from allmydata.mutable.retrieve import Retrieve
13380hunk ./src/allmydata/mutable/filenode.py 807
13381         a little bit.
13382         """
13383         log.msg("doing modify")
13384-        d = self._modify_once(modifier, first_time)
13385+        if first_time:
13386+            d = self._update_servermap()
13387+        else:
13388+            # We ran into trouble; do MODE_CHECK so we're a little more
13389+            # careful on subsequent tries.
13390+            d = self._update_servermap(mode=MODE_CHECK)
13391+
13392+        d.addCallback(lambda ignored:
13393+            self._modify_once(modifier, first_time))
13394         def _retry(f):
13395             f.trap(UncoordinatedWriteError)
13396hunk ./src/allmydata/mutable/filenode.py 818
13397+            # Uh oh, it broke. We're allowed to trust the servermap for our
13398+            # first try, but after that we need to update it. It's
13399+            # possible that we've failed due to a race with another
13400+            # uploader, and if the writes are to converge correctly, we
13401+            # need to know about that upload.
13402             d2 = defer.maybeDeferred(backoffer, self, f)
13403             d2.addCallback(lambda ignored:
13404                            self._modify_and_retry(modifier,
13405hunk ./src/allmydata/mutable/filenode.py 837
13406         I attempt to apply a modifier to the contents of the mutable
13407         file.
13408         """
13409-        # XXX: This is wrong -- we could get more servers if we updated
13410-        # in MODE_ANYTHING and possibly MODE_CHECK. Probably we want to
13411-        # assert that the last update wasn't MODE_READ
13412-        assert self._servermap.last_update_mode == MODE_WRITE
13413+        assert self._servermap.last_update_mode != MODE_READ
13414 
13415         # download_to_data is serialized, so we have to call this to
13416         # avoid deadlock.
13417hunk ./src/allmydata/mutable/filenode.py 1076
13418 
13419         # Now ask for the servermap to be updated in MODE_WRITE with
13420         # this update range.
13421-        u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13422-                             self._servermap,
13423-                             mode=MODE_WRITE,
13424-                             update_range=(start_segment, end_segment))
13425-        return u.update()
13426+        return self._update_servermap(update_range=(start_segment,
13427+                                                    end_segment))
13428 
13429 
13430     def _decode_and_decrypt_segments(self, ignored, data, offset):
13431hunk ./src/allmydata/mutable/filenode.py 1135
13432                                    segments_and_bht[1])
13433         p = Publish(self._node, self._storage_broker, self._servermap)
13434         return p.update(u, offset, segments_and_bht[2], self._version)
13435+
13436+
13437+    def _update_servermap(self, mode=MODE_WRITE, update_range=None):
13438+        """
13439+        I update the servermap. I return a Deferred that fires when the
13440+        servermap update is done.
13441+        """
13442+        if update_range:
13443+            u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13444+                                 self._servermap,
13445+                                 mode=mode,
13446+                                 update_range=update_range)
13447+        else:
13448+            u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13449+                                 self._servermap,
13450+                                 mode=mode)
13451+        return u.update()
13452}
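
The patch closes with the new _update_servermap helper. Since its two branches
differ only in whether update_range is passed to ServermapUpdater, an
equivalent, slightly more compact formulation (a sketch assuming the same
imports as mutable/filenode.py, not the patch's literal code) collapses them
with keyword arguments:

    def _update_servermap(self, mode=MODE_WRITE, update_range=None):
        # Build the ServermapUpdater once; only pass update_range
        # through when the caller actually supplied one.
        kwargs = {"mode": mode}
        if update_range:
            kwargs["update_range"] = update_range
        u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
                             self._servermap, **kwargs)
        return u.update()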
13453
13454Context:
13455
13456[web/filenode.py: avoid calling req.finish() on closed HTTP connections. Closes #1366
13457"Brian Warner <warner@lothar.com>"**20110221061544
13458 Ignore-this: 799d4de19933f2309b3c0c19a63bb888
13459] 
13460[Add unit tests for cross_check_pkg_resources_versus_import, and a regression test for ref #1355. This requires a little refactoring to make it testable.
13461david-sarah@jacaranda.org**20110221015817
13462 Ignore-this: 51d181698f8c20d3aca58b057e9c475a
13463] 
13464[allmydata/__init__.py: .name was used in place of the correct .__name__ when printing an exception. Also, robustify string formatting by using %r instead of %s in some places. fixes #1355.
13465david-sarah@jacaranda.org**20110221020125
13466 Ignore-this: b0744ed58f161bf188e037bad077fc48
13467] 
13468[Refactor StorageFarmBroker handling of servers
13469Brian Warner <warner@lothar.com>**20110221015804
13470 Ignore-this: 842144ed92f5717699b8f580eab32a51
13471 
13472 Pass around IServer instance instead of (peerid, rref) tuple. Replace
13473 "descriptor" with "server". Other replacements:
13474 
13475  get_all_servers -> get_connected_servers/get_known_servers
13476  get_servers_for_index -> get_servers_for_psi (now returns IServers)
13477 
13478 This change still needs to be pushed further down: lots of code is now
13479 getting the IServer and then distributing (peerid, rref) internally.
13480 Instead, it ought to distribute the IServer internally and delay
13481 extracting a serverid or rref until the last moment.
13482 
13483 no_network.py was updated to retain parallelism.
13484] 
13485[TAG allmydata-tahoe-1.8.2
13486warner@lothar.com**20110131020101] 
13487Patch bundle hash:
1348861cda43ef05d5eb26f68d7b62c6db93c2659b406