Opened at 2025-08-20T14:18:56Z
Last modified at 2025-08-29T13:29:24Z
#4186 new defect
One server process did not start on testgrid due to PID File collision
| Reported by: | hacklschorsch | Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | undecided |
| Component: | unknown | Version: | n/a |
| Keywords: | | Cc: | |
| Launchpad Bug: | | | |
Description
After a hardware failure and reboot, one of the storage node servers on the testgrid did not start because the PID recorded in its PID file was already in use.
This issue seems two-fold:
- PID files are still used even when they shouldn't be. I thought I had turned them off by setting pidfile= (empty), because systemd does not need them. That does not seem to work.
- The PID file mechanism should compare PID *and* start time (as documented at https://tahoe-lafs.readthedocs.io/en/latest/running.html#multiple-instances), but it seems the recorded start time does not help to distinguish the process currently running under that PID from the node's own process.
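To make the second point concrete, here is a minimal sketch of the check I would expect, not the actual Tahoe-LAFS code. It assumes a hypothetical PID file format of `<pid> <start_time>` and a caller-supplied lookup function that returns the start time of a live process (for example, something built on `psutil.Process(pid).create_time()`). The names `parse_pidfile`, `pidfile_is_stale` and `lookup_start_time` are made up for illustration.

```python
from typing import Callable, Optional, Tuple


def parse_pidfile(text: str) -> Tuple[int, float]:
    """Parse hypothetical PID file content of the form '<pid> <start_time>'."""
    pid_str, start_str = text.split()
    return int(pid_str), float(start_str)


def pidfile_is_stale(
    text: str,
    lookup_start_time: Callable[[int], Optional[float]],
    tolerance: float = 1.0,
) -> bool:
    """Return True if the recorded process is gone or its PID was reused.

    lookup_start_time(pid) is an assumed helper: it returns the start time
    of the live process with that PID, or None if no such process exists.
    """
    pid, recorded_start = parse_pidfile(text)
    actual_start = lookup_start_time(pid)
    if actual_start is None:
        # No process has this PID any more, so the file is left over.
        return True
    # Same PID but a different start time means another process reused it.
    return abs(actual_start - recorded_start) > tolerance
```

With a check like this, a PID file left over from before the reboot should be treated as stale whenever the process now holding PID 738 started at a different time than the one recorded.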
Snippet:
Aug 20 04:52:48 testgrid tahoe[5284]: ERROR: A process is already running as PID 738
Aug 20 04:52:48 testgrid tahoe[5284]: 'tahoe run' in '/var/lib/tahoe-lafs/alpha'
Full systemctl status output:
[root@testgrid:~]# systemctl status tahoe.alpha.service
× tahoe.alpha.service - Tahoe LAFS node alpha
Loaded: loaded (/etc/systemd/system/tahoe.alpha.service; enabled; preset: ignored)
Active: failed (Result: exit-code) since Wed 2025-08-20 04:52:48 UTC; 1min 21s ago
Duration: 1.533s
Invocation: debbaab77c594b87983e657284f8d1b1
Process: 5280 ExecStartPre=/nix/store/sgw152fwaab21wr3fch4ig8cqcz3nw3n-unit-script-tahoe.alpha-pre-start/bin/tahoe.alpha-pre-start>
Process: 5284 ExecStart=/nix/store/v0a0359ssg6avrf34a06kaz02cmz860p-python3-tahoe-lafs/bin/tahoe run --allow-stdin-close $STATE_DI>
Main PID: 5284 (code=exited, status=1/FAILURE)
IP: 0B in, 0B out
IO: 0B read, 0B written
Mem peak: 61.6M
CPU: 1.514s
Aug 20 04:52:46 testgrid systemd[1]: Starting Tahoe LAFS node alpha...
Aug 20 04:52:46 testgrid systemd[1]: Started Tahoe LAFS node alpha.
Aug 20 04:52:48 testgrid tahoe[5284]: ERROR: A process is already running as PID 738
Aug 20 04:52:48 testgrid tahoe[5284]: 'tahoe run' in '/var/lib/tahoe-lafs/alpha'
Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Main process exited, code=exited, status=1/FAILURE
Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Failed with result 'exit-code'.
Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Consumed 1.514s CPU time, 61.6M memory peak.
Change History (2)
comment:1 Changed at 2025-08-20T14:22:17Z by hacklschorsch
comment:2 Changed at 2025-08-29T13:29:24Z by hacklschorsch
This PR might have a fix for this: https://github.com/tahoe-lafs/tahoe-lafs/pull/1412

https://docs.twisted.org/en/stable/core/howto/systemd.html#create-a-systemd-service-file