#2204 closed defect (fixed)

cloud branch: allmydata.storage.leasedb.NonExistentShareError

Reported by: CyberAxe Owned by: exarkun
Priority: normal Milestone: undecided
Component: code-storage Version: cloud-branch
Keywords: renew leases leasedb blocks-cloud-merge Cc:
Launchpad Bug:

Description (last modified by exarkun)

processing 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_0.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_1.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_0.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_1.jpg'..
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files'
 creating directory for 'c:\BackMeUp\'
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 123, in run
    put_child(archives_url, now, new_backup_dircap)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child
    raise HTTPError("Error during put_child", resp)
HTTPError: Error during put_child: 500 Internal Server Error
"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback (
failure with no frames): <class \\'foolscap.tokens.RemoteException\\'>: <RemoteException around \\'[CopiedFailure instance: Traceback from r
emote host -- Traceback (most recent call last):\\n  File \"/usr/lib/python2.7/dist-packages/foolscap/eventual.py\", line 26, in _turn\\n
 cb(*args, **kwargs)\\n  File \"/home/customer/LAFS_source/src/allmydata/util/deferredutil.py\", line 55, in _with_log\\n    op(res)\\n  Fil
e \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback\\n    self
._startRunCallbacks(result)\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\
", line 490, in _startRunCallbacks\\n    self._runCallbacks()\\n--- <exception caught here> ---\\n  File \"/usr/local/lib/python2.7/dist-pac
kages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks\\n    current.result = callback(current.re
sult, *args, **kw)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/base.py\", line 170, in _update_lease\\n    account.
add_or_renew_default_lease(self.storage_index, shnum)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 58, in
add_or_renew_default_lease\\n    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)\\n  File \"/home/custom
er/LAFS_source/src/allmydata/storage/account.py\", line 63, in add_or_renew_lease\\n    renewal_time, expiration_time)\\n  File \"/home/cust
omer/LAFS_source/src/allmydata/storage/leasedb.py\", line 267, in add_or_renew_leases\\n    raise NonExistentShareError(si_s, shnum)\\nallmy
data.storage.leasedb.NonExistentShareError: can\\'t find SI=\\'o2ofk2psfahdhyf5uq3nqdmrwe\\' shnum=0 in \`shares\` table\\n]\\'>\\n]', None)

"

Extracting and reformatting the traceback contained in that error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/foolscap/eventual.py", line 26, in _turn
 cb(*args, **kwargs)
  File "/home/customer/LAFS_source/src/allmydata/util/deferredutil.py", line 55, in _with_log
    op(res)
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 382, in callback
    self._startRunCallbacks(result)
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/customer/LAFS_source/src/allmydata/storage/backends/base.py", line 170, in _update_lease
    account.add_or_renew_default_lease(self.storage_index, shnum)
  File "/home/customer/LAFS_source/src/allmydata/storage/account.py", line 58, in add_or_renew_default_lease
    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)
  File "/home/customer/LAFS_source/src/allmydata/storage/account.py", line 63, in add_or_renew_lease
    renewal_time, expiration_time)
  File "/home/customer/LAFS_source/src/allmydata/storage/leasedb.py", line 267, in add_or_renew_leases
    raise NonExistentShareError(si_s, shnum)
allmydata.storage.leasedb.NonExistentShareError: can't find SI='o2ofk2psfahdhyf5uq3nqdmrwe' shnum=0 in `shares` table

Change History (11)

comment:1 Changed at 2014-03-25T02:08:18Z by CyberAxe

I started with a new SQLite.db but get the same result.

skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_1.jpg'..
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files'
 re-using old directory for 'c:\BackMeUp\'
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 124, in run
    put_child(to_url, "Latest", new_backup_dircap)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child
    raise HTTPError("Error during put_child", resp)
HTTPError: Error during put_child: 500 Internal Server Error
"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback (
failure with no frames): <class \\'foolscap.tokens.RemoteException\\'>: <RemoteException around \\'[CopiedFailure instance: Traceback from r
emote host -- Traceback (most recent call last):\\n  File \"/usr/lib/python2.7/dist-packages/foolscap/eventual.py\", line 26, in _turn\\n
 cb(*args, **kwargs)\\n  File \"/home/customer/LAFS_source/src/allmydata/util/deferredutil.py\", line 55, in _with_log\\n    op(res)\\n  Fil
e \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback\\n    self
._startRunCallbacks(result)\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\
", line 490, in _startRunCallbacks\\n    self._runCallbacks()\\n--- <exception caught here> ---\\n  File \"/usr/local/lib/python2.7/dist-pac
kages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks\\n    current.result = callback(current.re
sult, *args, **kw)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/base.py\", line 170, in _update_lease\\n    account.
add_or_renew_default_lease(self.storage_index, shnum)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 58, in
add_or_renew_default_lease\\n    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)\\n  File \"/home/custom
er/LAFS_source/src/allmydata/storage/account.py\", line 63, in add_or_renew_lease\\n    renewal_time, expiration_time)\\n  File \"/home/cust
omer/LAFS_source/src/allmydata/storage/leasedb.py\", line 267, in add_or_renew_leases\\n    raise NonExistentShareError(si_s, shnum)\\nallmy
data.storage.leasedb.NonExistentShareError: can\\'t find SI=\\'acbv6l7p3wqrcaxgcskc5tmxgy\\' shnum=0 in \`shares\` table\\n]\\'>\\n]', None)

"

comment:2 Changed at 2014-04-15T16:06:30Z by zooko

  • Summary changed from in add_or_renew_leases\\n raise NonExistentShareError(si_s, shnum)\\nallmy to data.storage.leasedb.NonExistentShareError

comment:3 Changed at 2014-04-15T16:07:55Z by zooko

  • Keywords leasedb added; NonExistenShareError removed

comment:4 Changed at 2014-04-15T16:22:31Z by daira

  • Component changed from unknown to code-storage
  • Keywords leases added; lease removed
  • Summary changed from data.storage.leasedb.NonExistentShareError to cloud branch: allmydata.storage.leasedb.NonExistentShareError
  • Version changed from 1.10.0 to cloud-branch

comment:5 Changed at 2014-04-15T16:23:47Z by daira

  • Keywords cloud-merge added

comment:6 Changed at 2014-04-15T16:24:55Z by daira

Possible duplicate of #1921.

comment:7 Changed at 2014-05-15T15:59:20Z by daira

  • Keywords blocks-cloud-merge added; cloud-merge removed

comment:8 Changed at 2017-08-08T16:25:41Z by exarkun

  • Description modified (diff)

comment:9 Changed at 2017-08-08T16:26:52Z by exarkun

  • Owner changed from daira to exarkun
  • Status changed from new to assigned

comment:10 Changed at 2017-08-10T19:43:34Z by exarkun

I've encountered this now as well.

The comment about starting with a new SQLite3 lease database is appreciated - that agrees with my situation and supports my hypothesis about what's going wrong here.

My understanding of the shares table of the lease database is that it is populated in one of two ways:

  1. A new share is created.
  2. The accounting crawler finds a share not in the lease database.

The code is all meant to tolerate lose of the lease database, too.

When _locked_testv_and_readv_and_writev is done applying writes, it attempts to add_or_renew_default_lease for the share (via the account). This eventually calls down to add_or_renew_leases on the lease database. add_or_renew_leases requires that the share being leased already exists in the shares table.

So, if the lease database is ever lost, there is a period while the accounting crawler is re-populating the shares table when it is not possible to write to mutable files.

Because the problem depends on both the implementation of mutable storage, the accounting system, the lease database, and on an external event causing the loss of the lease database, it's not clear to me what testing approach will work.

Also, the implementation of _locked_testv_and_readv_and_writev is sufficiently complex that the idea of touching it at all is pretty scary. It seems like a reasonable fix would be to cause it to _always_ account.add_share before _update_lease (rather than the current behavior of only account.add_shareing if shnum not in sharemap).

comment:11 Changed at 2017-08-15T13:04:53Z by exarkun

  • Resolution set to fixed
  • Status changed from assigned to closed
Note: See TracTickets for help on using tickets.