[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2861: SSL handshake failure with 1.12 storage nodes over I2P

Tue Jan 31 19:31:31 UTC 2017

#2861: SSL handshake failure with 1.12 storage nodes over I2P
--------------------------+--------------------
     Reporter:  str4d     |      Owner:
         Type:  defect    |     Status:  new
     Priority:  critical  |  Milestone:  1.13.0
    Component:  unknown   |    Version:  1.12.0
   Resolution:            |   Keywords:  i2p
Launchpad Bug:            |
--------------------------+--------------------

Comment (by warner):

 We did a lot of digging in today's devchat, and learned the following:

 * txi2p's "listening" sockets are actually clients: the
 tahoe/foolscap/txi2p side makes an outbound TCP connection to the local
 I2P daemon (to the "SAM" API port), writes a message that says "hi, I'd
 like to receive connections for i2p address BLAH", then waits for a
 response. Later, when someone connects to the I2P daemon using that
 address, the tahoe node gets a message on the socket that says "connection
 incoming!", then the next byte is data from the remote end
 * Twisted's `startTLS()` method knows whether its connection is a client-
 like or a server-like connection (`transport._tlsClientDefault`), and
 tells TLS to use a !ClientHello or !ServerHello to match
 * normal Foolscap outbound connections use IStreamClientEndpoint, and
 inbound Listeners use IStreamServerEndpoint, so TLS gets the right
 direction. The `txtorcon` onion-service listener uses a server-like
 connection, so that works too.
 * startTLS is probably getting the direction of the connection backwards
 for txi2p's listener, because it's using an outbound connection to the SAM
 endpoint, but then needs to run the *server* side of the TLS handshake
 * you can use `transport.startTLS(ctx, normal=False)` to tell it to flip
 the direction, which would probably help
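
 The direction rule in the bullets above can be modelled in a few lines.
 This is a sketch of the behaviour as we currently understand it, not
 Twisted's code: `effective_tls_role` and its arguments are hypothetical
 names, and `transport_is_client` stands in for the private
 `transport._tlsClientDefault` flag.

```python
def effective_tls_role(transport_is_client, normal=True):
    """Return which side of the TLS handshake startTLS() would run.

    transport_is_client models transport._tlsClientDefault and normal models
    the normal= argument; normal=False flips the transport's own direction.
    """
    is_client = transport_is_client if normal else not transport_is_client
    return "client" if is_client else "server"


# txi2p's listener rides on an outbound (client-like) connection to the SAM
# port, so the default picks the client role, which is wrong for a listener.
for transport_is_client in (True, False):
    for normal in (True, False):
        role = effective_tls_role(transport_is_client, normal)
        print(f"client-like={transport_is_client!s:5} normal={normal!s:5} -> {role}")
```

 Under this model, the txi2p listener case is the client-like row with
 `normal=False`, which yields the server role that Foolscap wants.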

 We don't yet know a good way to tell Foolscap that it needs to pass in
 this argument. Some options:

 * the foolscap connection-handler could emit an additional value (so
 `handler.hint_to_endpoint()` could somehow return `(endpoint,
 tls_is_reversed)`)
 * the handler could be responsible for producing an ITransport that has an
 extra attribute, and foolscap could check this attribute before calling
 startTLS: `normal = getattr(self.transport, "_foolscap_tls_is_normal",
 True)`
 * the handler could produce an ITransport that overrides `startTLS()` to
 upcall with the right `normal` argument
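
 The second and third options might look roughly like the sketch below.
 Everything here is illustrative: `FakeTransport` is a stand-in that only
 records calls, and `start_tls_checking_attribute` and
 `ReversedTLSTransport` are hypothetical names, not existing foolscap or
 txi2p code. A real wrapper would also have to provide whatever interfaces
 the wrapped transport provides.

```python
class FakeTransport:
    """Stand-in for a Twisted transport; only records startTLS() calls."""

    def __init__(self):
        self.calls = []

    def startTLS(self, ctx, normal=True):
        self.calls.append((ctx, normal))


def start_tls_checking_attribute(transport, ctx):
    """Option 2: foolscap consults an attribute set by the handler."""
    normal = getattr(transport, "_foolscap_tls_is_normal", True)
    transport.startTLS(ctx, normal=normal)


class ReversedTLSTransport:
    """Option 3: a wrapper that flips the direction inside startTLS()."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def startTLS(self, ctx, normal=True):
        # Upcall with the opposite direction to the one the caller asked for.
        self._wrapped.startTLS(ctx, normal=not normal)

    def __getattr__(self, name):
        # Delegate everything else (write, loseConnection, ...) unchanged.
        return getattr(self._wrapped, name)


# Option 2: the handler marks the transport, foolscap reads the mark.
marked = FakeTransport()
marked._foolscap_tls_is_normal = False
start_tls_checking_attribute(marked, "ctx")

# Option 3: foolscap calls startTLS() as usual, the wrapper reverses it.
inner = FakeTransport()
ReversedTLSTransport(inner).startTLS("ctx")

print(marked.calls, inner.calls)
```

 In both cases the underlying transport ends up being called with
 `normal=False`; the difference is whether foolscap or the handler carries
 the knowledge that the direction is reversed.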

 One complication is that it isn't always obvious (to e.g. txi2p) that the
 connection it was given is a client-like or server-like transport (or
 whether it's capable of startTLS at all). It's unfortunate that
 `startTLS()` takes `normal=` rather than `isClient=`. Foolscap knows for
 sure whether it wants TLS to be client-like or server-like, but when the
 only knob we have is `normal=`, we must *also* know whether the underlying
 ITransport is client-like or server-like (so we know when to reverse TLS's
 handling). As far as we've been able to tell, the ITransport client-vs-
 server flag is private, even though the `normal=` argument is public.
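
 To make the mismatch concrete, here is a sketch of the `isClient=`-style
 helper we wish we had. `normal_for_role` and `start_tls_as` are
 hypothetical names, `StubTransport` exists only for the demonstration, and
 the helper leans on the private `_tlsClientDefault` attribute, which is
 exactly the dependency described above.

```python
def normal_for_role(want_client, transport_is_client):
    """Translate an isClient-style request into startTLS()'s normal= flag.

    normal=True keeps the transport's own direction, so it is the right
    value exactly when the desired TLS role matches that direction.
    """
    return want_client == transport_is_client


def start_tls_as(transport, ctx, want_client):
    # Assumption: the transport exposes the private _tlsClientDefault flag.
    # Transports that are not plain TCP connections may not have it.
    transport_is_client = transport._tlsClientDefault
    transport.startTLS(ctx, normal=normal_for_role(want_client, transport_is_client))


class StubTransport:
    """Demonstration-only transport that remembers the normal= it was given."""

    def __init__(self, tls_client_default):
        self._tlsClientDefault = tls_client_default
        self.normal_used = None

    def startTLS(self, ctx, normal=True):
        self.normal_used = normal


# A txi2p listener: client-like transport, but we want the TLS server role.
listener = StubTransport(tls_client_default=True)
start_tls_as(listener, "ctx", want_client=False)
print(listener.normal_used)
```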

 Some additional things to check before diving too deep into finding a good
 approach:

 * look at the extra cruft in the "client-to-introducer-1.12.ssldump.txt"
 trace above, immediately following the `Client: Upgrade` message, and
 confirm that it really is a !ClientHello. (it is supposed to be a
 !ServerHello, but if startTLS is confused by txi2p using client-like
 connections, it makes sense that we'd send a !ClientHello here). We don't
 know why ssldump didn't parse it as such (maybe it wasn't expecting a TLS
 packet to appear in the middle of a protocol stream, which would imply
 that ssldump doesn't handle STARTTLS-like protocols very well). Either
 compare these bytes against a normal wireshark trace, or look up the TLS
 docs and manually check the packet format. str4d astutely noticed that the
 cruft bytes include things like "c0 30" and "c0 2c", which were identified
 by ssldump (in the "-noi2p.ssldump.txt" trace) as unrecognized ciphersuite
 values, and that only the !ClientHello contains multiple ciphersuites
 (since the !ServerHello only contains the decision). He also noticed that
 the Foolscap server shouldn't be sending any TLS messages at all until the
 client has sent the !ClientHello, since TLS servers make the decision, so
 they can't send anything without first hearing the client's hello.
 * hack something (probably foolscap/negotiate.py) to set `normal=False`
 and see if that makes the connections work
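
 A manual check of the packet format could be as small as the sketch
 below. It relies only on the TLS record layout: a content-type byte (22
 for handshake), a two-byte version, a two-byte length, and then the
 handshake message type, where 1 is a ClientHello and 2 is a ServerHello.
 `classify_tls_hello` is a hypothetical helper, and the sample bytes are
 made up for illustration, not copied from the traces on this ticket.

```python
HANDSHAKE_RECORD = 0x16
HELLO_TYPES = {0x01: "ClientHello", 0x02: "ServerHello"}


def classify_tls_hello(data):
    """Return 'ClientHello', 'ServerHello', or None for anything else."""
    if len(data) < 6:
        return None  # too short to hold a record header plus message type
    content_type, version_major = data[0], data[1]
    if content_type != HANDSHAKE_RECORD or version_major != 0x03:
        return None  # not a TLS handshake record
    # data[2] is the minor version and data[3:5] the record length;
    # data[5] is the first byte of the handshake message, i.e. its type.
    return HELLO_TYPES.get(data[5])


# Illustrative bytes only: record header, then the start of a handshake body.
sample = bytes.fromhex("16 03 01 00 2f 01 00 00 2b 03 03")
print(classify_tls_hello(sample))
```

 Feeding it the cruft bytes that follow the `Client: Upgrade` message
 should tell us quickly which kind of hello (if any) they begin with.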

 Other possibilities that we came up with:

 * txi2p's Parsley-based parser might be incorrectly matching something in
 the TLS Hello and thinking it's a SAM message
 * txi2p's protocol handoff (where the SAM parser stops, and all further
 bytes are delivered to the wrapped protocol) might be dropping,
 duplicating, delaying, or reordering bytes, causing a TLS message to be
 corrupted or delivered twice. Foolscap has suffered from reentrancy and
 buffering problems in areas like this in the past; it's fertile breeding
 ground for bugs.

 We identified at least two concerns about the way txi2p is working, that
 shouldn't affect correctness but probably affect performance:

 * `txi2p.sam.stream.StreamAcceptReceiver.dataReceived`: any application
 data that is received in the same chunk as the initial peer-destination
 line will be delayed. It gets stashed as `self.initialData` properly, but
 will not be delivered until the next `dataReceived` is called. If the peer
 sends an initial chunk and then waits for a response, the local
 application will never receive that chunk. This is not a problem for
 client-goes-first protocols like HTTP, but would cause a loss of progress
 for server-goes-first protocols like SMTP
 * inbound streams will have application data delivered one byte at a time,
 because the Parsley-based parser used by `StreamAcceptReceiver`
 (`txi2p/grammar.py`) uses an `anything:data` clause to match all bytes
 once the parser has moved into the post-SAM `State_readData` state, and
 that clause probably just matches a single wildcard byte. This is sound,
 but probably bad for performance (especially for foolscap), since a large
 chain of python methods will be executed for every byte of the input. It
 would be fastest if large bytestrings could be transferred in complete
 buffers in a single call. We should do some performance tests on this and
 compare the CPU usage of a tahoe server (during file upload) for a given
 fixed data rate, I2P vs plain TCP. Ideally the txi2p parser would be
 bypassed completely once a connection has been moved to `State_readData`,
 similar to `twisted.protocols.basic.LineReceiver.setRawMode()`, but doing
 that safely requires careful attention to the `.dataReceived()`
 ordering/duplication/reentrancy concerns described above.
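
 A sketch of the handoff behaviour we would like, covering both concerns
 above. This is not txi2p's code: `AcceptHandoff` is a hypothetical class
 with no Twisted or Parsley dependency, and it assumes the peer destination
 arrives as a single newline-terminated line ahead of the application data.

```python
class AcceptHandoff:
    """Read one peer-destination line, then pass bytes through untouched."""

    def __init__(self, deliver):
        self._deliver = deliver      # callable taking one bytes argument
        self._buffer = b""
        self._raw = False
        self.peer_destination = None

    def data_received(self, chunk):
        if self._raw:
            # Post-handoff: whole buffers go straight through, one call each.
            self._deliver(chunk)
            return
        self._buffer += chunk
        line, sep, rest = self._buffer.partition(b"\n")
        if not sep:
            return  # destination line still incomplete, keep buffering
        self.peer_destination = line
        self._buffer = b""
        # Switch modes before delivering, so a reentrant data_received()
        # triggered by the application takes the raw path above.
        self._raw = True
        if rest:
            # Deliver data that shared a chunk with the destination line
            # immediately, rather than waiting for the next chunk.
            self._deliver(rest)


received = []
handoff = AcceptHandoff(received.append)
handoff.data_received(b"peer-dest")
handoff.data_received(b"ination\nhello")
handoff.data_received(b"world")
print(handoff.peer_destination, received)
```

 The mode switch happens before the first delivery, which is the ordering
 detail that matters for the reentrancy concerns mentioned above.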

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2861#comment:9>