[tahoe-dev] safety of remote exceptions Re: patch review

Tue Jan 6 08:20:05 PST 2009

Brian:

Thanks for the patch review!  Very good work!  Here is my response to
the biggest issue that you raised.  I'll respond to the others
separately.

On Jan 2, 2009, at 18:38 PM, Brian Warner wrote:

 > (I'm reviewing patches that landed while I was out on vacation).
...
 > [3305]: http://allmydata.org/trac/tahoe/changeset/3305 "immutable
 > download: don't catch all exceptions, just DeadReferenceError and
 > IntegrityCheckReject"
 >
 > I haven't looked closely enough to be sure, but does this mean that
 >  a single server could cause the whole download to fail, just by
 >  throwing some unexpected exception? That doesn't seem like a good
 >  ability to grant to storage servers.

Good catch, but since [3334], the current source code is:

http://allmydata.org/trac/tahoe/browser/src/allmydata/immutable/ 
download.py?rev=20090105234645-92b7f- 
f1bd3b4f608a0a60707f8d7203ff0c8dfbbaa5e6#L508

That line says "f.trap(ServerFailure, IntegrityCheckReject,
layout.LayoutInvalid, layout.ShareVersionIncompatible)".

Because of [3323], discussed below, servers can't cause exceptions
other than ServerFailures to be raised in the client code.

 > In general, when we're asking multiple servers for data, and any
 > subset could do, I've tried to deal with potential exceptions in
 > three classes:

 >  DeadReferenceError: log it as UNUSUAL, give up on that server.

Agreed. DeadReferenceError is now encapsulated in ServerFailure, so
the trap() line above catches that.

 > Integrity failures: log it as WEIRD, give up on that server (or on
 >  that block but be willing to use others).

Agreed. This is the type IntegrityCheckReject.

 > Other exceptions: log it as WEIRD, give up on that server (or on
 >  that block)

Well, I distinguish between exceptions which originated from
client-side code, such as bugs in the client (I've seen a lot of those
recently in my sandbox), and exceptions that originated on server side
and were propagated to the client and raised by foolscap.  I don't
want servers to be able to cause clients to stop or to enter different
code paths due to uncaught exceptions, but neither do I want
unspecified exceptions originating from client code to be caught and
treated as a download failure.

 > [3323]: http://allmydata.org/trac/tahoe/changeset/3323 "generically
 > wrap any errback from callRemote() in a ServerFailure instance"
...
 > This is probably going to lose the stack frames which are normally
 > stored in a Failure, as well as the exception type (so f.trap or
 > f.check will be unable to examine the original exception type). This
 > will deny the caller the ability to make decisions based upon the
 > remote exception type

No it doesn't -- they can get the original failure instance from the
attribute of the ServerFailure instance.  However, per the discussion
above, I would usually rather *not* distinguish among different remote
exception types -- it is too easy for a server to (accidentally or
maliciously) bypass the client-side validation code which is supposed
to handle the server's response.

Now, I don't think I understood some of what you wrote about this
ServerFailure wrapper losing stack frames.  But before we proceed, do
you understand that my intent is that it should not be *possible* for
the client to do something like the following, in order to handle
IndexErrors originating *either* in the client or in the server:

d = do_something_involving_a_server()
def _errb(f):
     if f.check(IndexError):
         return handle_this()
d.addCallback(_errb)

I want clients to have to do this instead:

d = do_something_involving_a_server()
def _errb(f):
     if f.check(IndexError):
         # Either there was an IndexError in our code that was preparing
         # a message to send to the server or there was an IndexError
         # in our code that was processing the reply from the server.
         return handle_this()
     if f.check(ServerFailure) and f.remote_failure.check(IndexError):
         # There was IndexError in the server code.
         return handle_that()
d.addCallback(_errb)

In the former case, servers can accidentally or malicious trigger the
exception handling code in the client even though that code may have
been intended to handle only exceptions arising from the client code
itself.

What do you think?

Regards,

Zooko