id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
4186	One server process did not start on testgrid due to PID File collision	hacklschorsch		"After a hardware failure and reboot, one of the storage node servers on the testgrid did not start because the PID recorded in its PID file was already in use by another process.

This issue seems two-fold:

1. PID files are still used even when they shouldn't be. I thought I had turned them off by setting `pidfile=` (empty), because systemd does not need them, but that does not seem to work.
2. The PID file mechanism should compare PID *and* start time ([[https://tahoe-lafs.readthedocs.io/en/latest/running.html#multiple-instances|as documented]]), but the recorded start time does not seem to help distinguish the process now running under that PID from the node's own earlier process.
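
For reference, the check described in point 2 could look roughly like the sketch below. This is not the Tahoe-LAFS implementation: the 'PID START_TIME' file format, the function names, the one-second tolerance and the timestamps are all assumptions made for illustration. The start-time lookup is passed in by the caller so the logic can be exercised without touching real processes.

```python
from typing import Callable, Optional, Tuple


def parse_pidfile(text: str) -> Tuple[int, float]:
    # Hypothetical format: 'PID START_TIME' separated by whitespace.
    pid_text, start_text = text.split()
    return int(pid_text), float(start_text)


def pidfile_is_stale(
    text: str,
    get_start_time: Callable[[int], Optional[float]],
    tolerance: float = 1.0,
) -> bool:
    # get_start_time(pid) is supplied by the caller and returns the start
    # time of the live process with that PID, or None if there is none.
    recorded_pid, recorded_start = parse_pidfile(text)
    live_start = get_start_time(recorded_pid)
    if live_start is None:
        return True  # nothing is running under the recorded PID
    # Same PID but a different start time means the PID was reused.
    return abs(live_start - recorded_start) > tolerance


# Made-up data: PID 738 was recorded before the reboot, and an unrelated
# process received the same PID afterwards.
live_processes = {738: 1755665500.0}
stale = pidfile_is_stale('738 1755000000.0', live_processes.get)
same = pidfile_is_stale('738 1755665500.4', live_processes.get)
gone = pidfile_is_stale('999 1755000000.0', live_processes.get)
print(stale, same, gone)
```

With a check of this shape, a PID file left over from before the reboot would be treated as stale and the node would start. On a real system the lookup might be backed by something like `psutil.Process(pid).create_time()`; that choice is an assumption here, not something taken from this ticket.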

Snippet:

{{{
Aug 20 04:52:48 testgrid tahoe[5284]: ERROR: A process is already running as PID 738
Aug 20 04:52:48 testgrid tahoe[5284]: 'tahoe run' in '/var/lib/tahoe-lafs/alpha'
}}}

-----

Full systemctl status output:

{{{
[root@testgrid:~]# systemctl status tahoe.alpha.service
× tahoe.alpha.service - Tahoe LAFS node alpha
     Loaded: loaded (/etc/systemd/system/tahoe.alpha.service; enabled; preset: ignored)
     Active: failed (Result: exit-code) since Wed 2025-08-20 04:52:48 UTC; 1min 21s ago
   Duration: 1.533s
 Invocation: debbaab77c594b87983e657284f8d1b1
    Process: 5280 ExecStartPre=/nix/store/sgw152fwaab21wr3fch4ig8cqcz3nw3n-unit-script-tahoe.alpha-pre-start/bin/tahoe.alpha-pre-start>
    Process: 5284 ExecStart=/nix/store/v0a0359ssg6avrf34a06kaz02cmz860p-python3-tahoe-lafs/bin/tahoe run --allow-stdin-close $STATE_DI>
   Main PID: 5284 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
         IO: 0B read, 0B written
   Mem peak: 61.6M
        CPU: 1.514s

Aug 20 04:52:46 testgrid systemd[1]: Starting Tahoe LAFS node alpha...
Aug 20 04:52:46 testgrid systemd[1]: Started Tahoe LAFS node alpha.
Aug 20 04:52:48 testgrid tahoe[5284]: ERROR: A process is already running as PID 738
Aug 20 04:52:48 testgrid tahoe[5284]: 'tahoe run' in '/var/lib/tahoe-lafs/alpha'
Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Main process exited, code=exited, status=1/FAILURE
Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Failed with result 'exit-code'.
Aug 20 04:52:48 testgrid systemd[1]: tahoe.alpha.service: Consumed 1.514s CPU time, 61.6M memory peak.
}}}"	defect	new	normal	undecided	unknown	n/a				
