[tahoe-dev] safety of remote exceptions Re: patch review

Brian Warner warner-tahoe at allmydata.com
Wed Jan 7 17:15:04 PST 2009


> What do you think?

I've looked over this more carefully and I now agree with you: being able to
use f.trap() to distinguish between a local exception and something that
happened during the callRemote() is a useful thing, and the ServerFailure
class is a useful way to accomplish this.

Calling it "ServerFailure" is confusing, though, because it's really an
Exception. If it were a subclass of twisted.python.failure.Failure then it'd
be appropriate to call it "ServerFailure". Perhaps we could call it
"RemoteException" instead?

The way "ServerFailure" is implemented, downstream code will wind up with a
twisted Failure instance which contains a ServerFailure exception which
points (through .remote_failure) at a second Failure instance which is most
likely a foolscap CopiedFailure instance (since that's what Foolscap returns
from a failing callRemote). The second Failure might be a real Failure if the
exception was an early DeadReferenceError (i.e. the connection was dead
before we sent any data), an outbound schema Violation, or a late
DeadReferenceError (the connection was lost before we received a response).
If the exception occurred on the far side, they'll send back a CopiedFailure.
The following code demonstrates the likely collection of wrappers.

 from twisted.python.failure import Failure
 from allmydata.util.rrefutil import ServerFailure
 from foolscap.call import CopiedFailure

 d = WrappedRemoteReference(rref).callRemote("always_fail")
 # def remote_always_fail(self): raise ValueError("boom")
 def _done(wrapper_f):
     assert isinstance(wrapper_f, Failure)
     assert wrapper_f.check(ServerFailure)
     server_failure_exception = wrapper_f.value
     real_failure = server_failure_exception.remote_failure
     assert isinstance(real_failure, CopiedFailure)
     assert real_failure.check(ValueError)
     real_exception = real_failure.value
     assert real_exception.args[0] == "boom"
 d.addErrback(_done)

Code with does a str() or a repr() of this generated wrapper Failure will see
the same thing as if they'd str/repr()'ed the real Failure. But if they use
other methods on it, they'll see the wrapper: in particular, f.getTraceback()
will probably show an exception occurring at rrefutil.py line 17, where the
'raise ServerFailure(f)' appears, which would be unhelpful.


The places that trap ServerFailure need to log the actual failure. For
example, immutable.download.BlockDownloader._got_block_error():

    def _got_block_error(self, f):
        failtype = f.trap(ServerFailure,
                          IntegrityCheckReject,
                          layout.LayoutInvalid,
                          layout.ShareVersionIncompatible)
        if f.check(ServerFailure):
            level = log.UNUSUAL
        else:
            level = log.WEIRD
        self.log("failure to get block", level=level, umid="5Z4uHQ")
        if self.results:
            peerid = self.vbucket.bucket.get_peerid()
            self.results.server_problems[peerid] = str(f)
        self.parent.bucket_failed(self.vbucket)

The call to self.log() doesn't include the failure object, without which
we're going to be scratching our heads for a long time when/if something
log.WEIRD actually happens. The first step would be to just add the failure
to the log call:

        self.log("failure to get block", failure=f, level=level, umid="5Z4uHQ")

But since Foolscap thinks it knows how to log a Failure instance, it is going
to use a foolscap FailureSlicer() on the failure= object, which will look at
things like str(f.value), reflect.qual(f.type), and f.getTraceback(). None of
these will be delegated through to the original failure, so the foolscap log
will not record anything useful. To deal with this, you could use code like:

        if f.check(ServerFailure):
            level = log.UNUSUAL
            realfailure = f.value.remote_failure
        else:
            level = log.WEIRD
            realfailure = f
        self.log("failure to get block", failure=realfailure, level=level,
                 umid="5Z4uHQ")

That's getting kind of gross though.


On a related note, the test failure when run against python2.4 involves
foolscap's handling of the log.msg(failure=) argument, as it attempts to use
f.parents (which is supposed to contain a list of all parent classes of the
Exception type). This might be a bug in Foolscap: it looks like foolscap is
assuming that f.parents is a list of strings, but there's a code path in
Failure that might let it be a list of something else. I need to look at it
more closely, though.


cheers,
 -Brian


More information about the tahoe-dev mailing list