[tahoe-lafs-trac-stream] [Tahoe-LAFS] #4097: 1.19.0 node connection issues.
Tahoe-LAFS
trac at tahoe-lafs.org
Tue Apr 9 17:35:43 UTC 2024
#4097: 1.19.0 node connection issues.
----------------------+---------------------------
 Reporter:  tlhonmey  |          Owner:
     Type:  defect    |         Status:  new
 Priority:  normal    |      Milestone:  undecided
Component:  unknown   |        Version:  n/a
 Keywords:            |  Launchpad Bug:
----------------------+---------------------------
I recently decided to update my grid. It was running a mix of 1.14, 1.15,
and 1.17. I had upgraded one of the nodes to 1.19 and it started
complaining about SSL bad certificate issues when trying to communicate
with other nodes.
After some discussion with meejah on IRC, it seemed like the best way to
deal with the certificate mismatches was to just rebuild the grid, and
then copy in the old storage folder.
After rebuilding the grid, things are... strange.
The introducer node can talk to everyone. That's good.
Node No. 1, which is running on the same machine as the introducer, with a
different port, can talk to everyone as well. That's good.
All the other nodes in the grid can only talk to one or maybe two
different nodes, and that doesn't necessarily include themselves for some
reason.
What's more, the helpful connection error report on the web status page
has been replaced with opaque stack traces -- without even any line breaks
-- like:
{{{
failure: [Failure instance: Traceback: <class 'allmydata.util.deferredutil.MultiFailure'>:
/home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:916:errback
/home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:984:_startRunCallbacks
/home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:1078:_runCallbacks
/home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:1949:_gotResultInlineCallbacks
--- <exception caught here> ---
/home/annie/.local/lib/python3.12/site-packages/twisted/internet/defer.py:1078:_runCallbacks
/home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:809:convertCancelled
/home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:292:_cancelledToTimedOutError
/home/user/.local/lib/python3.12/site-packages/twisted/python/failure.py:481:trap
/home/user/.local/lib/python3.12/site-packages/twisted/python/failure.py:505:raiseException
/home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:1999:_inlineCallbacks
/home/user/.local/lib/python3.12/site-packages/twisted/python/failure.py:519:throwExceptionIntoGenerator
/home/user/.local/lib/python3.12/site-packages/allmydata/storage_client.py:1348:_pick_server_and_get_version
/home/user/.local/lib/python3.12/site-packages/allmydata/storage_client.py:1338:get_istorage_server ]
}}}
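For anyone trying to read a trace like this as it appears on the status page, here is a minimal sketch that pulls the run-together frames back apart. It is not part of Tahoe-LAFS or Twisted; the name `split_frames` is made up for illustration, and it assumes the frames are separated by whitespace and look like `path.py:lineno:function`.

```python
import re

# One "path.py:lineno:function" frame, as seen in the status-page trace.
FRAME_RE = re.compile(r"(\S+\.py):(\d+):(\w+)")


def split_frames(blob):
    """Return (path, line, function) tuples found in a run-together traceback.

    Hypothetical helper for reading the output; not a Tahoe-LAFS API.
    """
    # Undo mail-style wrapping where a path was cut after a hyphen,
    # e.g. "site-" at the end of one line and "packages" on the next.
    joined = re.sub(r"-\s*\n\s*", "-", blob)
    # Collapse all remaining whitespace so frames are space-separated.
    joined = " ".join(joined.split())
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in FRAME_RE.finditer(joined)
    ]
```

Printing one tuple per line (for example `for path, line, func in split_frames(text): print(path, line, func)`) gives something much closer to a readable traceback.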
The stdout of the half-connected nodes contains nothing but messages about
factories being started and stopped, with no real indication of why.
Meejah seemed to think this may have something to do with GBS (Great Black
Swamp, the HTTP-based storage protocol). I'd be happy to do some diagnostics
if there's some way we can coax something useful out of the system.
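One basic check that can be run from each half-connected node is whether the other nodes' ports are reachable at all. The sketch below uses only the Python standard library and is not a Tahoe-LAFS tool; `can_connect` is a made-up name, and the host and port in the usage note are placeholders. It only tests plain TCP reachability, so it says nothing about certificate problems.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper: checks reachability only, not TLS or
    anything specific to the storage protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

For example, `can_connect("node2.example", 12345)` run on each node against every other node's advertised address would show whether the partial connectivity is a network problem or something happening after the connection is made.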
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/4097>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage