[tahoe-dev] crash-only design (self-citation by zooko)
zooko
zooko at zooko.com
Thu Jan 10 14:50:11 PST 2008
Folks:
Since Rob is making it possible to run Tahoe as a Windows service, we
are faced again with the issue of "crash-only design" versus "clean
shutdown". Brian mentioned today that he still wasn't happy with the
fact that when servers get stopped or restarted they might corrupt
any mutable file shares that they were in the middle of updating at
that moment.
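When a whole mutable file share is rewritten at once, a common crash-safe pattern is to write the new version to a temporary file, fsync it, and then atomically rename it over the old one. A process killed at any moment then leaves either the complete old share or the complete new one, never a torn mix. The sketch below is a general technique, not taken from Tahoe. It assumes POSIX-style rename semantics through `os.replace`, and `write_share_atomically` is a hypothetical helper name. The price is that every update rewrites the entire file, metadata included.

```python
import os
import tempfile


def write_share_atomically(path: str, data: bytes) -> None:
    """Hypothetical helper: replace the file at `path` with `data` so that a
    crash at any point leaves either the old contents or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename stays
    # within one filesystem, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-share-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # Python buffer -> kernel
            os.fsync(f.fileno())   # kernel -> stable storage
        # Readers see the old file or the new one under `path`, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the temp file; the original is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Where supported, fsync the directory so the rename itself is durable.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Calling `write_share_atomically(p, b"v1")` and then `write_share_atomically(p, b"v2")` leaves `p` holding exactly `b"v2"` and no leftover temporary files in the directory.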
I looked at the relevant tickets (#181 and #200), and here I quote
myself at my most eloquent:
"Since SDMFs get overwritten in their entirety each time, why is it
more I/O expense? Oh, I know, because of the metadata such as leases.
..."
"I don't necessarily object to giving the process a SIGTERM warning
before the SIGKILL. I think this change to crash-only was useful
because it prompted us to think through these kinds of questions
about intermediate persistent state, and because it led us to not
waste developer time (and sysadmin time) on "clean shutdown" behavior
that we didn't really need.
If letting the filesystem I/O buffers flush is some clean shutdown
behavior that we *do* really need, I'm okay with that, as long as it
doesn't mislead us into adding other (harder to maintain) behavior
that we don't really need. (Or lead us to forget about the chance of
leaving inconsistent persistent state that we don't know how to deal
with afterward.)
Make sense?
I guess there are two orthogonal reasons why I like crash-only:
1. force us to think about the effect of unclean shutdown
2. don't add maintenance burden for something we can live without
Basically, I regard the behavior of Python, the operating system,
file system, etc., in response to
kill -15 $pid ; sleep 5 ; kill -9 $pid
as being easy for us to maintain. I regard the behavior of
Twisted and any Python code of ours in response to that sequence as
relatively hard to maintain. :-)
So overall I'm +1 on leaving the SIGKILL shutdown as is, in order to
keep reminding us to think about consistency of intermediate
persistent state, but I'm also +0 on adding a SIGTERM in order to let
filesystem updates flush."
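The "kill -15 $pid ; sleep 5 ; kill -9 $pid" sequence, together with the "+0" option of a SIGTERM that only lets pending writes reach the filesystem, can be sketched in Python. This is a minimal illustration under assumptions, not Tahoe's code. `CHILD_SOURCE`, `start_child`, and `stop` are hypothetical names, and a plain script stands in for a Twisted node (Twisted's reactor installs its own SIGTERM handler by default, so a real node would hook in differently). The sketch also shows why the flush matters. Bytes already handed to the kernel with write() survive a SIGKILL of the process, but bytes still sitting in a user-space buffer, such as an unflushed Python file object, do not.

```python
import subprocess
import sys

# Hypothetical server stand-in. It writes an update that is still held in
# Python's user-space buffer, then idles. On SIGTERM it flushes and exits
# immediately, doing no other "clean shutdown" work.
CHILD_SOURCE = r"""
import os, signal, sys, time

out = open(sys.argv[1], "wb")

def on_sigterm(signum, frame):
    out.flush()             # Python buffer -> kernel
    os.fsync(out.fileno())  # kernel -> stable storage
    os._exit(0)             # skip atexit handlers, finally blocks, etc.

signal.signal(signal.SIGTERM, on_sigterm)
out.write(b"pending update")  # small enough to stay in the buffer
print("ready", flush=True)
time.sleep(60)
"""


def start_child(path: str) -> subprocess.Popen:
    """Launch the stand-in server and wait until its handler is installed."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, path],
        stdout=subprocess.PIPE,
    )
    proc.stdout.readline()  # blocks until the child prints "ready"
    return proc


def stop(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Roughly `kill -15 $pid ; sleep 5 ; kill -9 $pid`, but returns early
    if the process exits before the grace period runs out."""
    if proc.poll() is not None:
        return proc.returncode  # already gone
    proc.terminate()  # SIGTERM on POSIX; TerminateProcess on Windows
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()
```

On POSIX, stopping the child with `stop()` leaves `b"pending update"` in its file, while sending SIGKILL straight away leaves the file empty. On Windows, `terminate()` and `kill()` both call TerminateProcess, so the child gets no warning and no chance to flush, which matters for the Windows-service case.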