[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2875: Multiples storage servers can be registered with the same FURL (UncoordinatedWriteError on I2P grid)

Tahoe-LAFS trac at tahoe-lafs.org
Fri Jun 23 18:04:06 UTC 2017


#2875: Multiples storage servers can be registered with the same FURL
(UncoordinatedWriteError on I2P grid)
-------------------------+------------------------------
     Reporter:  nwks     |      Owner:
         Type:  defect   |     Status:  new
     Priority:  normal   |  Milestone:  undecided
    Component:  unknown  |    Version:  1.12.1
   Resolution:           |   Keywords:  i2p introduction
Launchpad Bug:           |
-------------------------+------------------------------
Description changed by nwks:

Old description:

> On the I2P grid, some servers are misconfigured and share the same FURL.
>
> As ''!NativeStorageServer.get_serverid()'' returns key_s, it is possible
> for ''!StorageFarmBroker'' to register these misconfigured servers as
> separate ones. So when publishing, share writers can end up writing to
> the same destination and then find surprise shares, triggering a
> !UncoordinatedWriteError.
>
> In release 1.11, ''_tubid'' was used in ''get_serverid()'' instead of
> ''key_s'', so this issue did not exist. Also it is not yet widespread on
> the I2P grid as most nodes are still running 1.10.
>
> Workaround: check for duplicated FURLs in ''!NativeStorageServer''
> announcement processing, or restore the return value of
> ''!NativeStorageServer.get_serverid()'' to ''_tubid'', instead of
> ''key_s''.
>

> == Original description ==
>
> I've setup a node on the I2P grid which has ~30 storage server connected.
> However I am unable to create aliases or directories with the default
> setting (SDMF) as it always fails with an !UncoordinatedWriteError.
>
> {{{
> > tahoe create-alias newalias
> Error during HTTP request: 500 Internal Server Error
> Traceback (most recent call last):
> Failure: allmydata.mutable.common.UncoordinatedWriteError:
>
>  > tahoe mkdir tahoe:newdir
> Error during HTTP request: 500 Internal Server Error
> Traceback (most recent call last):
> Failure: allmydata.mutable.common.UncoordinatedWriteError:
> }}}
>
> Here is a flog dump when it happens (flogs are also attached):
> {{{
> Remote Versions:
>  Nevow: 0.14.2
>  OpenSSL: 1.0.1t
>  PyYAML: 3.12
>  Twisted: 17.1.0
>  appdirs: 1.4.3
>  asn1crypto: 0.22.0
>  attrs: 17.1.0
>  automat: 0.6.0
>  cffi: 1.10.0
>  characteristic: 14.3.0
>  constantly: 15.1.0
>  cryptography: 1.8.1
>  enum34: 1.1.6
>  foolscap: 0.12.6
>  idna: 2.5
>  incremental: 16.10.1
>  ipaddress: 1.0.18
>  packaging: 16.8
>  platform: Linux-debian_8.8-x86_64-64bit_ELF
>  pyOpenSSL: 17.0.0
>  pyasn1: 0.2.3
>  pyasn1-modules: 0.0.8
>  pycparser: 2.17
>  pycrypto: 2.6.1
>  pycryptopp: 0.7.1.869544967005693312591928092448767568728501330214
>  pyparsing: 2.2.0
>  python: 2.7.9
>  service-identity: 16.0.0
>  setuptools: 35.0.2
>  simplejson: 3.10.0
>  six: 1.10.0
>  tahoe-lafs: 1.12.1
>  twisted: 17.1.0
>  zfec: 1.4.24
>  zope.interface: unknown
> 08:19:43.855 L20 []#172995 Publish(iarly): starting
> 08:19:43.855 L20 []#172996 starting publish, datalen is 0
> 08:19:43.856 L10 []#172997 new seqnum will be 1
> 08:19:43.857 L20 []#172998 building encoding parameters for file
> 08:19:43.857 L20 []#172999 got segsize 0
> 08:19:43.857 L20 []#173000 got 0 segments
> 08:19:43.857 L20 []#173001 got start segment 0
> 08:19:43.857 L20 []#173002 got end segment -1
> 08:19:43.857 L10 []#173003 current goal: before update:
> 08:19:43.857 L10 []#173004 we are planning to push new seqnum=#1
> 08:19:43.857 L10 []#173005 current goal: after update: , sh0 to
> [3cqfcfch], sh1 to [ysse5rzw], sh2 to [vl3r33qr], sh3 to [xazwtxob], sh4
> to [t6b4wzcs], sh5 to [6i2sk6xm], sh6 to [wbaqeiex], sh7 to [qlnepnf6],
> sh8 to [svq7zfma], sh9 to [55khl43t]
> 08:19:43.857 L10 []#173006 we are planning to push new seqnum=#1
> 08:19:43.859 L20 []#173007 Starting push
> 08:19:44.306 L20 []#173008 storage: slot_writev
> iarlyicewprjrxlabf5pvsgaiu
> 08:19:44.320 L20 []#173009 _got_write_answer from 3cqfcfch, share 0
> 08:19:44.320 L20 []#173010 found the following surprise shares: set([])
> 08:19:44.321 L20 []#173011 wrote successfully: adding new share to
> servermap
> 08:19:44.864 L20 []#173012 _got_write_answer from qlnepnf6, share 7
> 08:19:44.864 L20 []#173013 found the following surprise shares: set([])
> 08:19:44.864 L20 []#173014 wrote successfully: adding new share to
> servermap
> 08:19:45.097 L20 []#173015 _got_write_answer from svq7zfma, share 8
> 08:19:45.097 L20 []#173016 found the following surprise shares: set([7])
> 08:19:45.098 L30 []#173017 they had shares [7] that we didn't know about
> 08:19:46.178 L20 []#173018 wrote successfully: adding new share to
> servermap
> 08:19:46.194 L20 []#173019 _got_write_answer from 6i2sk6xm, share 5
> 08:19:46.194 L20 []#173020 found the following surprise shares: set([])
> 08:19:46.194 L20 []#173021 wrote successfully: adding new share to
> servermap
> 08:19:46.208 L20 []#173022 _got_write_answer from 55khl43t, share 9
> 08:19:46.208 L20 []#173023 found the following surprise shares: set([])
> 08:19:46.208 L20 []#173024 wrote successfully: adding new share to
> servermap
> 08:19:46.877 L20 []#173025 _got_write_answer from ysse5rzw, share 1
> 08:19:46.878 L20 []#173026 found the following surprise shares: set([])
> 08:19:46.878 L20 []#173027 wrote successfully: adding new share to
> servermap
> 08:19:47.432 L20 []#173028 _got_write_answer from vl3r33qr, share 2
> 08:19:47.432 L20 []#173029 found the following surprise shares: set([1])
> 08:19:47.432 L30 []#173030 they had shares [1] that we didn't know about
> 08:19:47.433 L20 []#173031 wrote successfully: adding new share to
> servermap
> 08:19:47.641 L20 []#173032 _got_write_answer from wbaqeiex, share 6
> 08:19:47.641 L20 []#173033 found the following surprise shares: set([5])
> 08:19:47.641 L30 []#173034 they had shares [5] that we didn't know about
> 08:19:47.641 L20 []#173035 wrote successfully: adding new share to
> servermap
> 08:19:51.945 L20 []#173036 _got_write_answer from xazwtxob, share 3
> 08:19:51.945 L20 []#173037 found the following surprise shares: set([])
> 08:19:51.945 L20 []#173038 wrote successfully: adding new share to
> servermap
> 08:19:55.978 L20 []#173039 _got_write_answer from t6b4wzcs, share 4
> 08:19:55.978 L20 []#173040 found the following surprise shares: set([3])
> 08:19:55.978 L30 []#173041 they had shares [3] that we didn't know about
> 08:19:57.085 L20 []#173042 wrote successfully: adding new share to
> servermap
> 08:19:57.086 L20 []#173043 Publish failed with UncoordinatedWriteError
> 08:19:57.106 L20 []#173044 web: 127.0.0.1 POST /uri/[CENSORED]..?t=mkdir
> 500 95
> }}}
>
> Creating MDMF files or directories in the WUI, and un-linking throws the
> following !KeyError, but the job actually gets done.
> (I manually created the tahoe: alias that way.)
> {{{
>  > tahoe rm tahoe:testfile
> ERROR: 500 Internal Server Error
> "Traceback (most recent call last):
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/twisted/internet/base.py\", line 878, in runUntilCurrent
>     call.func(*call.args, **call.kw)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/foolscap/eventual.py\", line 26, in _turn
>     cb(*args, **kwargs)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/twisted/internet/defer.py\", line 459, in callback
>     self._startRunCallbacks(result)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/twisted/internet/defer.py\", line 567, in _startRunCallbacks
>     self._runCallbacks()
> --- <exception caught here> ---
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/twisted/internet/defer.py\", line 653, in _runCallbacks
>     current.result = callback(current.result, *args, **kw)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/allmydata/mutable/filenode.py\", line 856, in _retry
>     f.trap(UncoordinatedWriteError)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/twisted/python/failure.py\", line 346, in trap
>     self.raiseException()
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/twisted/internet/defer.py\", line 653, in _runCallbacks
>     current.result = callback(current.result, *args, **kw)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/allmydata/mutable/filenode.py\", line 854, in <lambda>
>     self._modify_once(modifier, first_time))
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/allmydata/mutable/filenode.py\", line 880, in _modify_once
>     d = self._try_to_download_data()
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/allmydata/mutable/filenode.py\", line 958, in
> _try_to_download_data
>     d = self._read(c, fetch_privkey=True)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/allmydata/mutable/filenode.py\", line 980, in _read
>     d = r.download(consumer, offset, size)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/allmydata/mutable/retrieve.py\", line 253, in download
>     self._start_download(consumer, offset, size)
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/allmydata/mutable/retrieve.py\", line 265, in _start_download
>     self._setup_download()
>   File \"/home/debian/.local/lib/python2.7/site-
> packages/allmydata/mutable/retrieve.py\", line 301, in _setup_download
>     shares = versionmap[self.verinfo]
> exceptions.KeyError: (12,
> \"\\x9f\\x02FDS5O\\xed\\x1fEz\\xa2\\x02\\xd1\\x97\\xb9%\\x0b\\xef\\xc2{\\xa6\\xcc\\x0b'&\\xf3]\\x9c\\x8e\\x05\\xa7\",
> None, 131073, 1108, 3, 10,
> \"\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x0c\\x9f\\x02FDS5O\\xed\\x1fEz\\xa2\\x02\\xd1\\x97\\xb9%\\x0b\\xef\\xc2{\\xa6\\xcc\\x0b'&\\xf3]\\x9c\\x8e\\x05\\xa7\\x03\\n\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x04T\",
> (('EOF', 2585), ('verification_key', 1731), ('enc_privkey', 123),
> ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2553),
> ('share_hash_chain', 1339), ('verification_key_end', 2023)))
>
> }}}
>
> I had no issues on the testGrid.

New description:

 On the I2P grid, introducers send announcements with different key_s but
 with the same FURL.

 As ''!NativeStorageServer.get_serverid()'' returns key_s, it is possible
 for ''!StorageFarmBroker'' to register these announcements as separate
 storage servers. So when publishing, share writers can end up writing to
 the same destination and then find surprise shares, triggering a
 !UncoordinatedWriteError.

 In release 1.11, ''_tubid'' was used in ''get_serverid()'' instead of
 ''key_s'', so this issue did not exist. Also it is not yet widespread on
 the I2P grid as most nodes are still running 1.10.

 Workaround: check for duplicated FURLs in ''!NativeStorageServer''
 announcement processing, or restore the return value of
 ''!NativeStorageServer.get_serverid()'' to ''_tubid'', instead of
 ''key_s''.


 == Original description ==

 I've setup a node on the I2P grid which has ~30 storage server connected.
 However I am unable to create aliases or directories with the default
 setting (SDMF) as it always fails with an !UncoordinatedWriteError.

 {{{
 > tahoe create-alias newalias
 Error during HTTP request: 500 Internal Server Error
 Traceback (most recent call last):
 Failure: allmydata.mutable.common.UncoordinatedWriteError:

  > tahoe mkdir tahoe:newdir
 Error during HTTP request: 500 Internal Server Error
 Traceback (most recent call last):
 Failure: allmydata.mutable.common.UncoordinatedWriteError:
 }}}

 Here is a flog dump when it happens (flogs are also attached):
 {{{
 Remote Versions:
  Nevow: 0.14.2
  OpenSSL: 1.0.1t
  PyYAML: 3.12
  Twisted: 17.1.0
  appdirs: 1.4.3
  asn1crypto: 0.22.0
  attrs: 17.1.0
  automat: 0.6.0
  cffi: 1.10.0
  characteristic: 14.3.0
  constantly: 15.1.0
  cryptography: 1.8.1
  enum34: 1.1.6
  foolscap: 0.12.6
  idna: 2.5
  incremental: 16.10.1
  ipaddress: 1.0.18
  packaging: 16.8
  platform: Linux-debian_8.8-x86_64-64bit_ELF
  pyOpenSSL: 17.0.0
  pyasn1: 0.2.3
  pyasn1-modules: 0.0.8
  pycparser: 2.17
  pycrypto: 2.6.1
  pycryptopp: 0.7.1.869544967005693312591928092448767568728501330214
  pyparsing: 2.2.0
  python: 2.7.9
  service-identity: 16.0.0
  setuptools: 35.0.2
  simplejson: 3.10.0
  six: 1.10.0
  tahoe-lafs: 1.12.1
  twisted: 17.1.0
  zfec: 1.4.24
  zope.interface: unknown
 08:19:43.855 L20 []#172995 Publish(iarly): starting
 08:19:43.855 L20 []#172996 starting publish, datalen is 0
 08:19:43.856 L10 []#172997 new seqnum will be 1
 08:19:43.857 L20 []#172998 building encoding parameters for file
 08:19:43.857 L20 []#172999 got segsize 0
 08:19:43.857 L20 []#173000 got 0 segments
 08:19:43.857 L20 []#173001 got start segment 0
 08:19:43.857 L20 []#173002 got end segment -1
 08:19:43.857 L10 []#173003 current goal: before update:
 08:19:43.857 L10 []#173004 we are planning to push new seqnum=#1
 08:19:43.857 L10 []#173005 current goal: after update: , sh0 to
 [3cqfcfch], sh1 to [ysse5rzw], sh2 to [vl3r33qr], sh3 to [xazwtxob], sh4
 to [t6b4wzcs], sh5 to [6i2sk6xm], sh6 to [wbaqeiex], sh7 to [qlnepnf6],
 sh8 to [svq7zfma], sh9 to [55khl43t]
 08:19:43.857 L10 []#173006 we are planning to push new seqnum=#1
 08:19:43.859 L20 []#173007 Starting push
 08:19:44.306 L20 []#173008 storage: slot_writev iarlyicewprjrxlabf5pvsgaiu
 08:19:44.320 L20 []#173009 _got_write_answer from 3cqfcfch, share 0
 08:19:44.320 L20 []#173010 found the following surprise shares: set([])
 08:19:44.321 L20 []#173011 wrote successfully: adding new share to
 servermap
 08:19:44.864 L20 []#173012 _got_write_answer from qlnepnf6, share 7
 08:19:44.864 L20 []#173013 found the following surprise shares: set([])
 08:19:44.864 L20 []#173014 wrote successfully: adding new share to
 servermap
 08:19:45.097 L20 []#173015 _got_write_answer from svq7zfma, share 8
 08:19:45.097 L20 []#173016 found the following surprise shares: set([7])
 08:19:45.098 L30 []#173017 they had shares [7] that we didn't know about
 08:19:46.178 L20 []#173018 wrote successfully: adding new share to
 servermap
 08:19:46.194 L20 []#173019 _got_write_answer from 6i2sk6xm, share 5
 08:19:46.194 L20 []#173020 found the following surprise shares: set([])
 08:19:46.194 L20 []#173021 wrote successfully: adding new share to
 servermap
 08:19:46.208 L20 []#173022 _got_write_answer from 55khl43t, share 9
 08:19:46.208 L20 []#173023 found the following surprise shares: set([])
 08:19:46.208 L20 []#173024 wrote successfully: adding new share to
 servermap
 08:19:46.877 L20 []#173025 _got_write_answer from ysse5rzw, share 1
 08:19:46.878 L20 []#173026 found the following surprise shares: set([])
 08:19:46.878 L20 []#173027 wrote successfully: adding new share to
 servermap
 08:19:47.432 L20 []#173028 _got_write_answer from vl3r33qr, share 2
 08:19:47.432 L20 []#173029 found the following surprise shares: set([1])
 08:19:47.432 L30 []#173030 they had shares [1] that we didn't know about
 08:19:47.433 L20 []#173031 wrote successfully: adding new share to
 servermap
 08:19:47.641 L20 []#173032 _got_write_answer from wbaqeiex, share 6
 08:19:47.641 L20 []#173033 found the following surprise shares: set([5])
 08:19:47.641 L30 []#173034 they had shares [5] that we didn't know about
 08:19:47.641 L20 []#173035 wrote successfully: adding new share to
 servermap
 08:19:51.945 L20 []#173036 _got_write_answer from xazwtxob, share 3
 08:19:51.945 L20 []#173037 found the following surprise shares: set([])
 08:19:51.945 L20 []#173038 wrote successfully: adding new share to
 servermap
 08:19:55.978 L20 []#173039 _got_write_answer from t6b4wzcs, share 4
 08:19:55.978 L20 []#173040 found the following surprise shares: set([3])
 08:19:55.978 L30 []#173041 they had shares [3] that we didn't know about
 08:19:57.085 L20 []#173042 wrote successfully: adding new share to
 servermap
 08:19:57.086 L20 []#173043 Publish failed with UncoordinatedWriteError
 08:19:57.106 L20 []#173044 web: 127.0.0.1 POST /uri/[CENSORED]..?t=mkdir
 500 95
 }}}

 Creating MDMF files or directories in the WUI, and un-linking throws the
 following !KeyError, but the job actually gets done.
 (I manually created the tahoe: alias that way.)
 {{{
  > tahoe rm tahoe:testfile
 ERROR: 500 Internal Server Error
 "Traceback (most recent call last):
   File \"/home/debian/.local/lib/python2.7/site-
 packages/twisted/internet/base.py\", line 878, in runUntilCurrent
     call.func(*call.args, **call.kw)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/foolscap/eventual.py\", line 26, in _turn
     cb(*args, **kwargs)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/twisted/internet/defer.py\", line 459, in callback
     self._startRunCallbacks(result)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/twisted/internet/defer.py\", line 567, in _startRunCallbacks
     self._runCallbacks()
 --- <exception caught here> ---
   File \"/home/debian/.local/lib/python2.7/site-
 packages/twisted/internet/defer.py\", line 653, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/allmydata/mutable/filenode.py\", line 856, in _retry
     f.trap(UncoordinatedWriteError)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/twisted/python/failure.py\", line 346, in trap
     self.raiseException()
   File \"/home/debian/.local/lib/python2.7/site-
 packages/twisted/internet/defer.py\", line 653, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/allmydata/mutable/filenode.py\", line 854, in <lambda>
     self._modify_once(modifier, first_time))
   File \"/home/debian/.local/lib/python2.7/site-
 packages/allmydata/mutable/filenode.py\", line 880, in _modify_once
     d = self._try_to_download_data()
   File \"/home/debian/.local/lib/python2.7/site-
 packages/allmydata/mutable/filenode.py\", line 958, in
 _try_to_download_data
     d = self._read(c, fetch_privkey=True)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/allmydata/mutable/filenode.py\", line 980, in _read
     d = r.download(consumer, offset, size)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/allmydata/mutable/retrieve.py\", line 253, in download
     self._start_download(consumer, offset, size)
   File \"/home/debian/.local/lib/python2.7/site-
 packages/allmydata/mutable/retrieve.py\", line 265, in _start_download
     self._setup_download()
   File \"/home/debian/.local/lib/python2.7/site-
 packages/allmydata/mutable/retrieve.py\", line 301, in _setup_download
     shares = versionmap[self.verinfo]
 exceptions.KeyError: (12,
 \"\\x9f\\x02FDS5O\\xed\\x1fEz\\xa2\\x02\\xd1\\x97\\xb9%\\x0b\\xef\\xc2{\\xa6\\xcc\\x0b'&\\xf3]\\x9c\\x8e\\x05\\xa7\",
 None, 131073, 1108, 3, 10,
 \"\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x0c\\x9f\\x02FDS5O\\xed\\x1fEz\\xa2\\x02\\xd1\\x97\\xb9%\\x0b\\xef\\xc2{\\xa6\\xcc\\x0b'&\\xf3]\\x9c\\x8e\\x05\\xa7\\x03\\n\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x04T\",
 (('EOF', 2585), ('verification_key', 1731), ('enc_privkey', 123),
 ('share_data', 2167), ('signature', 1475), ('block_hash_tree', 2553),
 ('share_hash_chain', 1339), ('verification_key_end', 2023)))

 }}}

 I had no issues on the testGrid.

--

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2875#comment:5>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list