devchat notes 11-Apr-2017
Brian Warner
warner at lothar.com
Tue Apr 11 18:49:17 UTC 2017
Tahoe-LAFS devchat 11-Apr-2017
Attendees: warner, meejah, exarkun, daira, dawuud
* added meejah, exarkun, dawuud as committers!
* warner is replacing test_web with treq to fix deprecation warnings
* treq test strategies
* build a simplistic implementation of a target object
* interact with the frontend
* look at the target and make sure it's state has changed in the
expected way
* potential cleanup passes
* switch to new-style classes (inherit from object), use
super(MyClass,self).__init__(args) instead of
MyParent.__init__(self,args)
* need to think about test mixin classes and setUp methods,
especially ones that return a Deferred
* maybe make a rule: test-class setUp methods are always
inlineCallbacks and "yield super()...setUp()"
* switch to inlineCallbacks, maybe
* replace ShouldFailMixin with calls to .failureResultOf or
.assertFailure
* move OneShotObserverList into Twisted itself
* provide a way to pass the reactor in, combine with
flushEventualQueue
* allow tests to crank the mock reactor forwards until the results
are ready
* so tests can avoid really yielding to the real reactor
* switch to relative imports
* start on python3 work
* switch to print_function
* be more clear about bytes-vs-unicode
* daira says that in most places where we say "str", we do in fact
mean bytes, we've been pretty good about that
* add b"" or u"" tags
* util/hashutil.py should definitely be b""
* maybe add from __future__ import unicode_literals to other files
* foolscap will be a big blocker
* warner saw intermittent test coverage (flagged by codecov on the last
few PRs that landed)
* will file a ticket with the lines that are sometimes covered
* we should figure out what's going on, add tests to cover them
properly
* new PRs that don't touch code (e.g. docs) should (obviously) never
cause coverage regressions
* it'd be nice if there was a tool to show which test files provided
which coverage
* so we can make sure unit-test files get full coverage on
individual modules
* then integration-like tests are useful too, but we don't depend
upon them for coverage
* "hypothesis" module for adding to test coverage
* meejah working on accounting branch
* start by asyncifying leasedb API
* messy/large
* how to report fatal errors that are async? eg. leasedb-on-S3 fails
* buildbot would tail the logfile until it saw an "I'm happy"
message, print any failures it saw
* he's trying to replace constructors with more async-friendly factory
functions
* warner is anxious about that
* goal is to not reference a LeaseDB object until it knows it can talk
to an (async/cloud) database
* but currently everything in the Node/Client constructor is created
synchronously
* when is the node "ready"?
* what does that even mean?
* not ready: I don't have any recent evidence that this might even
work
* e.g. haven't heard from the introducer at all
* we used to have code that would react to an initial Introducer
connection failure with sys.abort()
* there are some early indicators of problems that are so severe, we
want to inform the "tahoe start" user on stderr and exit
* it's unfortunate to signal these errors in a log file instead of
stderr
* the "tahoe start" user will think that the node is viable, when it
fact it's unusable
* two basic approaches
* approach one:
* DON'T START UNTIL WE PROVE FEASIBILITY OF THE LAST STEP
* replace Node/Client.__init__ with factory construction functions
* change self.create_* to return the thing they create, instead of
stashing it on self.x=
* and change them to accept their dependencies as arguments
* so tub = self.create_tub(config); introducer_client =
self.create_introducer_client(config, tub)
* then self.create_tub() no longer references "self", can be
converted from a method of Node into a standalone function
* then Node.__init__ is replaced with (standalong function)
create_node()
* change create_storage_server() (which does create_leasedb) to return
a Deferred
* which doesn't fire until the early can-I-even-DB checks pass
* rewrite daemonization code to tolerate makeService() returning a
Deferred
* not easy: twistd can't handle this
* may have to do all the daemonization ourselves
* we need the reactor running to do that leaseDB setup work
* not a good idea to start the reactor before daemonization: then
you wind up with two reactors, still sharing FDs
* one incomplete direction to explore:
* startstop_node.start could examine nodetype before touching
StartTahoeNodePlugin
* do some async stuff early, before touching twistd.runApp
* however, the reactor is not running at that point, which we'd
need for leaseDB stuff
* approach two:
* like what Buildbot used to do a long time ago
* "tahoe start" forks
* parent watches the child until it gets past an initial startup phase
* if child signals an early error, parent prints that error to stderr
and exits with rc=1
* if child signals success, parent prints nothing and exits with rc=0
* parent either follows a child logfile, or connects to a socket
* logfile makes it easier to deal with the child exiting suddenly
* socket makes it easier to distinguish between old and new
messages, auto-flushes
* maybe protocol should be: parent truncates logfile, parent
opens+reads, child opens +writes +flushes
* maybe pass argument into child to tell it where the logfile is
* maybe pass an FD into child, rather than a logfile on disk
* if the daemonization code enables that, e.g. O_CLOEXEC
* may require a lot of custom daemonization code too
* twistd daemonization code makes the parent exit, so we lose
control
* parent can't use reactor to follow logfile
* could just do blocking reads, with maybe a SIGALRM for timeout
* how to retain control of the parent while still taking advantage
of twistd's code?
* the appearance of "twist" (in Twisted-16.4) means that "twistd" may
eventually get deprecated, so maybe we should stop relying on the
twistd internals quite so much (note that "twist" is almost entirely
private; we think Twisted regrets exposing so much of twistd, although
it's certainly been handy for us)
* another thought: what if we give up "tahoe start"?
* only provide "tahoe run" (the non-daemonization tool)
* and basically force operators to learn supervisord or systemd or
daemontools
* warner would miss "tahoe start", I think it's super handy
* we could maybe make "tahoe start" spawn a "tahoe run", but I think
that'd have all the same problems as above
* the "user story" for the above is, more-or-less: "the user should see
any important early failure-messages on the console where they do
'tahoe start'"
More information about the tahoe-dev
mailing list