[tahoe-lafs-trac-stream] [Tahoe-LAFS] #4186: One server process did not start on testgrid due to PID File collision

Tahoe-LAFS trac at tahoe-lafs.org
Wed Aug 20 14:18:56 UTC 2025


#4186: One server process did not start on testgrid due to PID File collision
---------------------------+---------------------------
 Reporter:  hacklschorsch  |          Owner:
     Type:  defect         |         Status:  new
 Priority:  normal         |      Milestone:  undecided
Component:  unknown        |        Version:  n/a
 Keywords:                 |  Launchpad Bug:
---------------------------+---------------------------
 After a hardware failure and reboot, one of the storage node servers on
 the testgrid did not start because the PID recorded in its PID file was
 already in use.

 This issue seems two-fold:

 1. PID files are still used even though they should not be.  I thought I
 had turned them off by setting `pidfile=` (empty), because systemd does
 not need them, but that does not seem to work.
 2. The PID file mechanism should compare PID *and* start time
 ([https://tahoe-lafs.readthedocs.io/en/latest/running.html#multiple-instances as documented]),
 but it seems the recorded start time does not help to distinguish the
 process currently running under that PID from the node's own earlier
 process.
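
 To make point 2 concrete, here is a minimal sketch of the kind of check I
 would expect, not the actual Tahoe-LAFS code. It assumes a PID file that
 holds `<pid> <start-time>` on one line and a Linux `/proc` filesystem; the
 names `read_start_ticks` and `pidfile_is_live` are made up for
 illustration. A PID file only counts as live if the PID exists and its
 start time equals the recorded one.

```python
from typing import Callable, Optional


def read_start_ticks(pid: int) -> Optional[int]:
    """Start time of `pid` in clock ticks since boot, or None if not running.

    Linux-only: parses field 22 (starttime) of /proc/<pid>/stat.
    """
    try:
        with open(f"/proc/{pid}/stat", "r", errors="replace") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name (field 2) may contain spaces or ')', so split after
    # the last ')'. The remainder starts at field 3, so field 22 is index 19.
    return int(stat.rsplit(")", 1)[1].split()[19])


def pidfile_is_live(
    path: str,
    lookup: Callable[[int], Optional[int]] = read_start_ticks,
) -> bool:
    """True only if the recorded PID exists AND its start time matches.

    Assumed (hypothetical) file format: "<pid> <start-time>".
    """
    try:
        with open(path) as f:
            pid_text, start_text = f.read().split()
        pid, recorded_start = int(pid_text), int(start_text)
    except (FileNotFoundError, ValueError):
        return False  # missing or malformed PID file: treat as stale
    # lookup() returns None for a dead PID, which never equals an int.
    return lookup(pid) == recorded_start
```

 With a check like this, a PID that was recycled by an unrelated process
 should be treated as stale, because the start times differ. Note that
 ticks since boot are not comparable across reboots, so a real
 implementation would probably record an absolute timestamp instead.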

 Snippet:

 {{{
 Aug 20 04:52:48 testgrid tahoe[5284]: ERROR: A process is already running as PID 738
 Aug 20 04:52:48 testgrid tahoe[5284]: 'tahoe run' in '/var/lib/tahoe-lafs/alpha'
 }}}

 -----

 Full systemctl status output:

 {{{
 [root@testgrid:~]# systemctl status tahoe.alpha.service
 × tahoe.alpha.service - Tahoe LAFS node alpha
      Loaded: loaded (/etc/systemd/system/tahoe.alpha.service; enabled; preset: ignored)
      Active: failed (Result: exit-code) since Wed 2025-08-20 04:52:48 UTC; 1min 21s ago
    Duration: 1.533s
  Invocation: debbaab77c594b87983e657284f8d1b1
     Process: 5280 ExecStartPre=/nix/store/sgw152fwaab21wr3fch4ig8cqcz3nw3n-unit-script-tahoe.alpha-pre-start/bin/tahoe.alpha-pre-start>
     Process: 5284 ExecStart=/nix/store/v0a0359ssg6avrf34a06kaz02cmz860p-python3-tahoe-lafs/bin/tahoe run --allow-stdin-close $STATE_DI>
    Main PID: 5284 (code=exited, status=1/FAILURE)
          IP: 0B in, 0B out
          IO: 0B read, 0B written
    Mem peak: 61.6M
         CPU: 1.514s

 Aug 20 04:52:46 testgrid systemd[1]: Starting Tahoe LAFS node alpha...
 Aug 20 04:52:46 testgrid systemd[1]: Started Tahoe LAFS node alpha.
 Aug 20 04:52:48 testgrid tahoe[5284]: ERROR: A process is already running as PID 738
 Aug 20 04:52:48 testgrid tahoe[5284]: 'tahoe run' in '/var/lib/tahoe-lafs/alpha'
 Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Main process exited, code=exited, status=1/FAILURE
 Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Failed with result 'exit-code'.
 Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Consumed 1.514s CPU time, 61.6M memory peak.
 }}}

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/4186>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

