[tahoe-lafs-trac-stream] [Tahoe-LAFS] #4097: 1.19.0 node connection issues.

Tahoe-LAFS trac at tahoe-lafs.org
Tue Apr 9 17:35:43 UTC 2024


#4097: 1.19.0 node connection issues.
----------------------+---------------------------
 Reporter:  tlhonmey  |          Owner:
     Type:  defect    |         Status:  new
 Priority:  normal    |      Milestone:  undecided
Component:  unknown   |        Version:  n/a
 Keywords:            |  Launchpad Bug:
----------------------+---------------------------
 I recently decided to update my grid.  It was running a mix of 1.14, 1.15,
 and 1.17.  I had upgraded one of the nodes to 1.19 and it started
 complaining about SSL bad certificate issues when trying to communicate
 with other nodes.

 After some discussion with meejah on IRC, it seemed like the best way to
 deal with the certificate mismatches was to just rebuild the grid, and
 then copy in the old storage folder.

 After rebuilding the grid, things are...  Strange.

 The introducer node can talk to everyone.  That's good.
 Node No. 1, which is running on the same machine as the introducer but on
 a different port, can talk to everyone as well.  That's good.

 All the other nodes in the grid can each talk to only one or maybe two
 nodes, and for some reason that set does not always include the node
 itself.

 What's more, the helpful connection error report on the web status page
 has been replaced with opaque stack traces -- without even any line breaks
 -- like:

 {{{
 failure: [Failure instance: Traceback: <class
 'allmydata.util.deferredutil.MultiFailure'>:
 /home/user/.local/lib/python3.12/site-
 packages/twisted/internet/defer.py:916:errback
 /home/user/.local/lib/python3.12/site-
 packages/twisted/internet/defer.py:984:_startRunCallbacks
 /home/user/.local/lib/python3.12/site-
 packages/twisted/internet/defer.py:1078:_runCallbacks
 /home/user/.local/lib/python3.12/site-
 packages/twisted/internet/defer.py:1949:_gotResultInlineCallbacks ---
 <exception caught here> --- /home/annie/.local/lib/python3.12/site-
 packages/twisted/internet/defer.py:1078:_runCallbacks
 /home/user/.local/lib/python3.12/site-
 packages/twisted/internet/defer.py:809:convertCancelled
 /home/user/.local/lib/python3.12/site-
 packages/twisted/internet/defer.py:292:_cancelledToTimedOutError
 /home/user/.local/lib/python3.12/site-
 packages/twisted/python/failure.py:481:trap
 /home/user/.local/lib/python3.12/site-
 packages/twisted/python/failure.py:505:raiseException
 /home/user/.local/lib/python3.12/site-
 packages/twisted/internet/defer.py:1999:_inlineCallbacks
 /home/user/.local/lib/python3.12/site-
 packages/twisted/python/failure.py:519:throwExceptionIntoGenerator
 /home/user/.local/lib/python3.12/site-
 packages/allmydata/storage_client.py:1348:_pick_server_and_get_version
 /home/user/.local/lib/python3.12/site-
 packages/allmydata/storage_client.py:1338:get_istorage_server ]
 }}}
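
 For anyone trying to read a trace like the one above, here is a minimal
 Python sketch that splits it back into one frame per line.  It is not part
 of Tahoe-LAFS or Twisted; the helper name split_failure_frames is made up
 for illustration, and it assumes every frame has the path:line:function
 shape seen above.

```python
import re

# One frame looks like "/some/path/module.py:916:errback".
FRAME = re.compile(r"(\S+\.py):(\d+):(\w+)")


def split_failure_frames(text):
    """Return one 'path:line:function' string per frame found in text."""
    # Collapse hard line wraps into single spaces.
    flat = " ".join(text.split())
    # Rejoin words that were wrapped at a hyphen, e.g. "site- packages".
    flat = re.sub(r"(?<=\w)-\s+(?=\w)", "-", flat)
    return [f"{path}:{line}:{func}" for path, line, func in FRAME.findall(flat)]


if __name__ == "__main__":
    import sys

    # Usage: paste the trace on stdin, get one frame per line on stdout.
    for frame in split_failure_frames(sys.stdin.read()):
        print(frame)
```

 This only makes the frame list readable.  It drops the "<exception caught
 here>" marker and does not recover the underlying error message, which the
 trace does not contain.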

 The stdout of the half-connected nodes contains nothing but messages about
 factories being started and stopped, with no real indication about why.

 Meejah seemed to think this may have something to do with GBS (Great Black
 Swamp, the HTTP-based storage protocol).  I'd be happy to run some
 diagnostics if there's a way to coax something useful out of the system.

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/4097>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

