Changeset fadfbca in trunk


Timestamp:
2021-09-28T14:37:56Z
Author:
GitHub <noreply@…>
Branches:
master
Children:
0a072a9, a02d5f4
Parents:
eb5b6c5 (diff), f66f3e6 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
git-author:
Itamar Turner-Trauring <itamar@…> (2021-09-28 14:37:56)
git-committer:
GitHub <noreply@…> (2021-09-28 14:37:56)
Message:

Merge pull request #1130 from tahoe-lafs/3801-no-overlapping-writes-immutable-upload

Disallow conflicting overlapping writes when doing an immutable upload.

Fixes ticket:3801

Files:
2 added
9 edited

Legend:

Unchanged lines are shown as plain context; lines prefixed with - were removed, lines prefixed with + were added, and … marks elided lines.
  • docs/proposed/http-storage-node-protocol.rst

     The server must recognize when all of the data has been received and mark the share as complete
     (which it can do because it was informed of the size when the storage index was initialized).
    -Clients should upload chunks in re-assembly order.
     
     * When a chunk that does not complete the share is successfully uploaded the response is ``OK``.
    -* When the chunk that completes the share is successfully uploaded the response is ``CREATED``.
    -* If the *Content-Range* for a request covers part of the share that has already been uploaded the response is ``CONFLICT``.
       The response body indicates the range of share data that has yet to be uploaded.
       That is::
    …
           ]
         }
    +
    +* When the chunk that completes the share is successfully uploaded the response is ``CREATED``.
    +* If the *Content-Range* for a request covers part of the share that has already been uploaded,
    +  and the data does not match the already written data,
    +  the response is ``CONFLICT``.
    +  At this point the only thing to do is abort the upload and start from scratch (see below).
    +
    +``PUT /v1/immutable/:storage_index/:share_number/abort``
    +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    +
    +This cancels an *in-progress* upload.
    +
    +The response code:
    +
    +* When the upload is still in progress and therefore the abort has succeeded,
    +  the response is ``OK``.
    +  Future uploads can start from scratch with no pre-existing upload state stored on the server.
    +* If the upload has already finished, the response is 405 (Method Not Allowed)
    +  and no change is made.
     
     
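The rules above map onto standard HTTP status codes (``OK`` = 200, ``CREATED`` = 201, ``CONFLICT`` = 409, plus the explicit 405). As a rough illustration only, and not part of this changeset, a client upload loop for the proposed protocol might look like the following sketch; the endpoint paths follow the document above, while the base URL, chunk size, helper name, and use of the requests library are assumptions:

    # Hypothetical client sketch for the proposed immutable-upload protocol.
    # Endpoint paths follow docs/proposed/http-storage-node-protocol.rst;
    # BASE_URL and the chunking scheme are assumptions for illustration.
    import requests

    BASE_URL = "http://127.0.0.1:8080"  # assumed storage server address

    def upload_share(storage_index, share_number, share_data, chunk_size=65536):
        """Upload one immutable share in chunks; abort and give up on a conflict."""
        url = "{}/v1/immutable/{}/{}".format(BASE_URL, storage_index, share_number)
        total = len(share_data)
        for offset in range(0, total, chunk_size):
            chunk = share_data[offset:offset + chunk_size]
            content_range = "bytes {}-{}/{}".format(offset, offset + len(chunk) - 1, total)
            response = requests.put(url, data=chunk, headers={"Content-Range": content_range})
            if response.status_code == 201:   # CREATED: this chunk completed the share
                return True
            if response.status_code == 409:   # CONFLICT: overlapping write, different data
                # Per the document, the only recovery is to abort and start over.
                requests.put(url + "/abort")
                return False
            response.raise_for_status()       # otherwise expect OK and keep uploading
        return False

Note the client never retries a conflicting range in place: the document specifies that once a ``CONFLICT`` is returned, aborting and restarting the upload is the only way forward.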
  • nix/overlays.nix

           # Need a newer version of Twisted, too.
           twisted = python-super.callPackage ./twisted.nix { };
    +
    +      # collections-extended is not part of nixpkgs at this time.
    +      collections-extended = python-super.callPackage ./collections-extended.nix { };
         };
       };
  • nix/tahoe-lafs.nix

         service-identity pyyaml magic-wormhole treq
         eliot autobahn cryptography netifaces setuptools
    -    future pyutil distro configparser
    +    future pyutil distro configparser collections-extended
       ];
     
  • setup.py

         # Backported configparser for Python 2:
         "configparser ; python_version < '3.0'",
    +
    +    # For the RangeMap datastructure.
    +    "collections-extended",
     ]
     
  • src/allmydata/interfaces.py

     LeaseRenewSecret = Hash # used to protect lease renewal requests
     LeaseCancelSecret = Hash # was used to protect lease cancellation requests
    +
    +
    +class DataTooLargeError(Exception):
    +    """The write went past the expected size of the bucket."""
    +
    +
    +class ConflictingWriteError(Exception):
    +    """Two writes happened to the same immutable with different data."""
     
     
  • src/allmydata/storage/common.py

     from allmydata.util import base32
     
    -class DataTooLargeError(Exception):
    -    pass
    +# Backwards compatibility.
    +from allmydata.interfaces import DataTooLargeError  # noqa: F401
    +
     class UnknownMutableContainerVersionError(Exception):
         pass
  • src/allmydata/storage/immutable.py

     import os, stat, struct, time
     
    +from collections_extended import RangeMap
    +
     from foolscap.api import Referenceable
     
     from zope.interface import implementer
    -from allmydata.interfaces import RIBucketWriter, RIBucketReader
    +from allmydata.interfaces import (
    +    RIBucketWriter, RIBucketReader, ConflictingWriteError,
    +    DataTooLargeError,
    +)
     from allmydata.util import base32, fileutil, log
     from allmydata.util.assertutil import precondition
     from allmydata.util.hashutil import timing_safe_compare
     from allmydata.storage.lease import LeaseInfo
    -from allmydata.storage.common import UnknownImmutableContainerVersionError, \
    -     DataTooLargeError
    +from allmydata.storage.common import UnknownImmutableContainerVersionError
     
     # each share file (in storage/shares/$SI/$SHNUM) contains lease information
    …
             # added by simultaneous uploaders
             self._sharefile.add_lease(lease_info)
    +        self._already_written = RangeMap()
     
         def allocated_size(self):
    …
             if self.throw_out_all_data:
                 return
    +
    +        # Make sure we're not conflicting with existing data:
    +        end = offset + len(data)
    +        for (chunk_start, chunk_stop, _) in self._already_written.ranges(offset, end):
    +            chunk_len = chunk_stop - chunk_start
    +            actual_chunk = self._sharefile.read_share_data(chunk_start, chunk_len)
    +            writing_chunk = data[chunk_start - offset:chunk_stop - offset]
    +            if actual_chunk != writing_chunk:
    +                raise ConflictingWriteError(
    +                    "Chunk {}-{} doesn't match already written data.".format(chunk_start, chunk_stop)
    +                )
             self._sharefile.write_share_data(offset, data)
    +
    +        self._already_written.set(True, offset, end)
             self.ss.add_latency("write", time.time() - start)
             self.ss.count("write")
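The heart of the change is visible above: BucketWriter now keeps an _already_written RangeMap and checks every incoming write against the ranges recorded so far before touching the share file. Below is a minimal standalone sketch of that bookkeeping, reusing only the collections-extended calls that appear in the diff (RangeMap(), .ranges(), .set()); the in-memory bytearray standing in for the share file is an assumption for illustration:

    from collections_extended import RangeMap

    storage = bytearray(100)       # stand-in for the on-disk share file
    already_written = RangeMap()   # byte ranges that have been written so far

    def write(offset, data):
        end = offset + len(data)
        # Every previously written range overlapping [offset, end) must hold
        # identical bytes, otherwise the write is a conflict.
        for (chunk_start, chunk_stop, _) in already_written.ranges(offset, end):
            if storage[chunk_start:chunk_stop] != data[chunk_start - offset:chunk_stop - offset]:
                raise ValueError("Chunk {}-{} doesn't match already written data.".format(
                    chunk_start, chunk_stop))
        storage[offset:end] = data
        already_written.set(True, offset, end)

    write(0, b"a" * 10)     # first write: bytes 0-9
    write(5, b"a" * 10)     # overlaps 5-9 with identical bytes: accepted
    try:
        write(8, b"b" * 4)  # overlaps 8-9 with different bytes: rejected
    except ValueError as e:
        print(e)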
  • src/allmydata/test/test_istorageserver.py

     from twisted.internet.defer import inlineCallbacks
     
    -from foolscap.api import Referenceable
    +from foolscap.api import Referenceable, RemoteException
     
     from allmydata.interfaces import IStorageServer
    …
             )
     
    -    @skipIf(True, "https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3801")
    -    def test_overlapping_writes(self):
    -        """
    -        The policy for overlapping writes is TBD:
    -        https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3801
    -        """
    +    @inlineCallbacks
    +    def test_non_matching_overlapping_writes(self):
    +        """
    +        When doing overlapping writes in immutable uploads, non-matching writes
    +        fail.
    +        """
    +        storage_index, renew_secret, cancel_secret = (
    +            new_storage_index(),
    +            new_secret(),
    +            new_secret(),
    +        )
    +        (_, allocated) = yield self.storage_server.allocate_buckets(
    +            storage_index,
    +            renew_secret,
    +            cancel_secret,
    +            sharenums={0},
    +            allocated_size=30,
    +            canary=Referenceable(),
    +        )
    +
    +        yield allocated[0].callRemote("write", 0, b"1" * 25)
    +        # Overlapping write that doesn't match:
    +        with self.assertRaises(RemoteException):
    +            yield allocated[0].callRemote("write", 20, b"2" * 10)
    +
    +    @inlineCallbacks
    +    def test_matching_overlapping_writes(self):
    +        """
    +        When doing overlapping writes in immutable uploads, matching writes
    +        succeed.
    +        """
    +        storage_index, renew_secret, cancel_secret = (
    +            new_storage_index(),
    +            new_secret(),
    +            new_secret(),
    +        )
    +        (_, allocated) = yield self.storage_server.allocate_buckets(
    +            storage_index,
    +            renew_secret,
    +            cancel_secret,
    +            sharenums={0},
    +            allocated_size=25,
    +            canary=Referenceable(),
    +        )
    +
    +        yield allocated[0].callRemote("write", 0, b"1" * 10)
    +        # Overlapping write that matches:
    +        yield allocated[0].callRemote("write", 5, b"1" * 20)
    +        yield allocated[0].callRemote("close")
    +
    +        buckets = yield self.storage_server.get_buckets(storage_index)
    +        self.assertEqual(set(buckets.keys()), {0})
    +
    +        self.assertEqual((yield buckets[0].callRemote("read", 0, 25)), b"1" * 25)
     
         @inlineCallbacks
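Note that this integration-level test asserts foolscap's RemoteException rather than ConflictingWriteError itself: the conflict is raised server-side inside BucketWriter and crosses the foolscap wire as a remote failure, while the direct BucketWriter tests in test_storage.py below assert the concrete exception.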
  • src/allmydata/test/test_storage.py

     from __future__ import unicode_literals
     
    -from future.utils import native_str, PY2, bytes_to_native_str
    +from future.utils import native_str, PY2, bytes_to_native_str, bchr
     if PY2:
         from future.builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, list, object, range, str, max, min  # noqa: F401
    …
     import shutil
     import gc
    +from uuid import uuid4
     
     from twisted.trial import unittest
    …
     from twisted.internet import defer
     from twisted.internet.task import Clock
    +
    +from hypothesis import given, strategies
     
     import itertools
    …
     from allmydata.storage.mutable import MutableShareFile
     from allmydata.storage.immutable import BucketWriter, BucketReader, ShareFile
    -from allmydata.storage.common import DataTooLargeError, storage_index_to_dir, \
    +from allmydata.storage.common import storage_index_to_dir, \
          UnknownMutableContainerVersionError, UnknownImmutableContainerVersionError, \
          si_b2a, si_a2b
    …
                                          VERIFICATION_KEY_SIZE, \
                                          SHARE_HASH_CHAIN_SIZE
    -from allmydata.interfaces import BadWriteEnablerError
    +from allmydata.interfaces import (
    +    BadWriteEnablerError, DataTooLargeError, ConflictingWriteError,
    +)
     from allmydata.test.no_network import NoNetworkServer
     from allmydata.storage_client import (
    …
             self.failUnlessEqual(br.remote_read(25, 25), b"b"*25)
             self.failUnlessEqual(br.remote_read(50, 7), b"c"*7)
    +
    +    def test_write_past_size_errors(self):
    +        """Writing beyond the size of the bucket throws an exception."""
    +        for (i, (offset, length)) in enumerate([(0, 201), (10, 191), (202, 34)]):
    +            incoming, final = self.make_workdir(
    +                "test_write_past_size_errors-{}".format(i)
    +            )
    +            bw = BucketWriter(self, incoming, final, 200, self.make_lease(),
    +                              FakeCanary())
    +            with self.assertRaises(DataTooLargeError):
    +                bw.remote_write(offset, b"a" * length)
    +
    +    @given(
    +        maybe_overlapping_offset=strategies.integers(min_value=0, max_value=98),
    +        maybe_overlapping_length=strategies.integers(min_value=1, max_value=100),
    +    )
    +    def test_overlapping_writes_ok_if_matching(
    +            self, maybe_overlapping_offset, maybe_overlapping_length
    +    ):
    +        """
    +        Writes that overlap with previous writes are OK when the content is the
    +        same.
    +        """
    +        length = 100
    +        expected_data = b"".join(bchr(i) for i in range(100))
    +        incoming, final = self.make_workdir("overlapping_writes_{}".format(uuid4()))
    +        bw = BucketWriter(
    +            self, incoming, final, length, self.make_lease(),
    +            FakeCanary()
    +        )
    +        # Three writes: 10-19, 30-39, 50-59. This allows for a bunch of holes.
    +        bw.remote_write(10, expected_data[10:20])
    +        bw.remote_write(30, expected_data[30:40])
    +        bw.remote_write(50, expected_data[50:60])
    +        # Then, an overlapping write but with matching data:
    +        bw.remote_write(
    +            maybe_overlapping_offset,
    +            expected_data[
    +                maybe_overlapping_offset:maybe_overlapping_offset + maybe_overlapping_length
    +            ]
    +        )
    +        # Now fill in the holes:
    +        bw.remote_write(0, expected_data[0:10])
    +        bw.remote_write(20, expected_data[20:30])
    +        bw.remote_write(40, expected_data[40:50])
    +        bw.remote_write(60, expected_data[60:])
    +        bw.remote_close()
    +
    +        br = BucketReader(self, bw.finalhome)
    +        self.assertEqual(br.remote_read(0, length), expected_data)
    +
    +    @given(
    +        maybe_overlapping_offset=strategies.integers(min_value=0, max_value=98),
    +        maybe_overlapping_length=strategies.integers(min_value=1, max_value=100),
    +    )
    +    def test_overlapping_writes_not_ok_if_different(
    +            self, maybe_overlapping_offset, maybe_overlapping_length
    +    ):
    +        """
    +        Writes that overlap with previous writes fail with an exception if the
    +        contents don't match.
    +        """
    +        length = 100
    +        incoming, final = self.make_workdir("overlapping_writes_{}".format(uuid4()))
    +        bw = BucketWriter(
    +            self, incoming, final, length, self.make_lease(),
    +            FakeCanary()
    +        )
    +        # Three writes: 10-19, 30-39, 50-59. This allows for a bunch of holes.
    +        bw.remote_write(10, b"1" * 10)
    +        bw.remote_write(30, b"1" * 10)
    +        bw.remote_write(50, b"1" * 10)
    +        # Then, write something that might overlap with some of them, but
    +        # conflicts. Then fill in holes left by first three writes. Conflict is
    +        # inevitable.
    +        with self.assertRaises(ConflictingWriteError):
    +            bw.remote_write(
    +                maybe_overlapping_offset,
    +                b'X' * min(maybe_overlapping_length, length - maybe_overlapping_offset),
    +            )
    +            bw.remote_write(0, b"1" * 10)
    +            bw.remote_write(20, b"1" * 10)
    +            bw.remote_write(40, b"1" * 10)
    +            bw.remote_write(60, b"1" * 40)
     
         def test_read_past_end_of_share_data(self):