﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
529	Implement Halt and Catch Fire	zandr		"There have been a few cases lately where nodes have been in some impaired state, but still responding (if badly) to network requests. This caused other components of the system to block.

If, under these conditions, the impaired node instead stopped responding to network requests altogether, the rest of the system would just ignore the wounded node and move on.

In particular, the recent webapi3 issue would have been invisible to users if the webapi node had stopped responding to HTTP. The balancer would then have marked it as failed and moved on.

Same with the prodtahoe7 meltdown.
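
To make the idea concrete for discussion, here is a minimal sketch (not a proposal for the actual webapi code) of a server that closes its listening socket when told to halt, so that new connections are refused outright rather than accepted and left hanging. The names HaltingServer, halt_and_catch_fire and is_reachable are hypothetical, the handler is a stand-in for the real frontend, and the sketch deliberately leaves open the hard question of when to call halt_and_catch_fire.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    # Stand-in for the real web frontend: every GET succeeds.
    def do_GET(self):
        body = b'ok\n'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


class HaltingServer:
    '''Serve HTTP until halt_and_catch_fire() is called, then refuse connections.'''

    def __init__(self, host='127.0.0.1', port=0):
        # port=0 asks the OS for any free port; the real one is in self.address.
        self.httpd = ThreadingHTTPServer((host, port), OkHandler)
        self.address = self.httpd.server_address
        self.thread = threading.Thread(target=self.httpd.serve_forever, daemon=True)
        self.thread.start()

    def halt_and_catch_fire(self):
        # Stop the accept loop, then close the listening socket so that new
        # connections are refused instead of being accepted and left to block.
        self.httpd.shutdown()
        self.httpd.server_close()


def is_reachable(address, timeout=1.0):
    # Roughly what a balancer's TCP health probe might do (an assumption,
    # not a description of the balancer actually deployed).
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False
```

A caller that has decided the node is impaired would invoke halt_and_catch_fire(). After that, is_reachable(server.address) returns False, which is the signal a balancer needs in order to mark the node as failed and route around it.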

I acknowledge that deciding when to catch fire is non-trivial, so I'm filing this more to provoke conversation than to request any specific behavior."	defect	new	major	undecided	code-frontend-web	1.2.0		reliability availability anti-censorship error		
