Context Navigation

← Previous Ticket
Next Ticket →

Opened at 2016-12-30T16:44:13Z

Last modified at 2021-03-30T18:40:19Z

#2861 new defect

SSL handshake failure with 1.12 storage nodes over I2P

Reported by:	str4d	Owned by:
Priority:	critical	Milestone:	soon
Component:	unknown	Version:	1.12.0
Keywords:	i2p	Cc:
Launchpad Bug:

Description

Steps to recreate:

Create a storage node with --hide-ip --listen=i2p and [connections] tcp = disabled.
Set the introducer to an I2P grid.
Start the storage node.

Expected behaviour:

The storage node connects to all existing storage nodes that are online and reachable.

Actual behaviour:

The storage node connects fine to older storage nodes (running the patched 1.9.2 or 1.10.0 Tahoe-LAFS+I2P builds), but fails to connect to 1.12.0 storage nodes (including itself via loopback). Specifically, the web UI shows that it gets past "connecting" to "negotiating", and then throws:

negotiation failed: [('SSL routines', 'SSL23_READ', 'ssl handshake failure')]

Attachments (5)

introducer-log.txt (15.1 KB) - added by str4d at 2017-01-10T17:56:12Z.: Log of 1.12 introducer receiving connection from 1.12 storage node
storage-log.txt (27.6 KB) - added by str4d at 2017-01-10T18:22:28Z.: Log of 1.12 storage node connecting to 1.12 introducer over I2P
client-to-introducer-1.12.ssldump.txt (6.3 KB) - added by str4d at 2017-01-10T20:05:29Z.: ssldump of communication between Tahoe 1.12 client and 1.12 introducer
client-to-introducer-1.11.ssldump.txt (3.5 KB) - added by str4d at 2017-01-10T20:10:17Z.: ssldump of communication between Tahoe 1.12 client and 1.11 introducer
client-to-introducer-1.12-noi2p.ssldump.txt (3.4 KB) - added by str4d at 2017-01-10T20:26:52Z.: ssldump of communication between Tahoe 1.12 client and 1.12 introducer using localhost TCP instead of I2P

Download all attachments as: .zip

Change History (21)

comment:1 Changed at 2017-01-01T19:08:56Z by warner

Huh, I don't know what that could be. My hunch is that the I2P connection is not really established, and the negotiation messages are getting dropped or corrupted somehow.

Let's try to extract more information from foolscap:

shut everything down
start your Introducer
start your client node
flogtool tail -s out.flog CLIENTNODEDIR/private/logport.furl
start a 1.12.0 storage node

When the storage node's announcement arrives (via the Introducer), the client will attempt to connect, and will record some of the negotiation process into out.flog. We're looking for deviations from the usual negotiation process, maybe something about an expected message not being seen.

If that doesn't yield anything immediately useful, the next step will be to modify the foolscap code on both sides and have them log everything they get over the connection. It'd be interesting to know whether they make it far enough to switch to TLS (and it's really the TLS handshake that's getting broken), or if something goes awry before that point (when they're still speaking HTTP-ish).

For background, Foolscap starts by making a plain TCP connection, then exchanges a very HTTP-like request and response, then both sides are supposed to do .start_tls(). So the connecting host will send GET /id/$tubid HTTP/1.1 and some headers (including Upgrade: TLS/1.0) and a double-newline and then the very next byte will be the TLS negotiation (maybe CLIENTHELLO?). The receiving host will send HTTP/1.1 101 Switching Protocols and some headers and a double-newline and then start on the TLS bytes.

Foolscap is expecting the connection it gets to be 8bit-clean and transparent. Can you think of any reason why the I2P proxy might be interpreting HTTP-like data inside the connection and maybe modifying the data or its behavior in response to what it sees?

comment:2 Changed at 2017-01-10T17:32:50Z by daira

Keywords i2p added

Changed at 2017-01-10T17:56:12Z by str4d

Attachment introducer-log.txt added

Log of 1.12 introducer receiving connection from 1.12 storage node

comment:3 Changed at 2017-01-10T18:15:06Z by str4d

This issue affects a 1.12 storage node connecting to a 1.12 introducer as well. See the attached logs.

Changed at 2017-01-10T18:22:28Z by str4d

Attachment storage-log.txt added

Log of 1.12 storage node connecting to 1.12 introducer over I2P

Changed at 2017-01-10T20:05:29Z by str4d

Attachment client-to-introducer-1.12.ssldump.txt added

ssldump of communication between Tahoe 1.12 client and 1.12 introducer

Changed at 2017-01-10T20:10:17Z by str4d

Attachment client-to-introducer-1.11.ssldump.txt added

ssldump of communication between Tahoe 1.12 client and 1.11 introducer

comment:4 Changed at 2017-01-10T20:18:58Z by str4d

The two new files show the network behaviour for a 1.12 client connecting over I2P to a 1.11 introducer vs a 1.12 introducer. The 1.11 introducer is my custom-patched build, but the patches don't touch negotiation at all, so it is equivalent to stock 1.11 for these purposes.

The only difference between the traces pre-TLS is that the 1.12 introducer sends the client a bunch of extra cruft at the end of its HTTP/1.1 101 Switching Protocols packet. The client doesn't seem to care, though.

The client then sends a ClientHello packet to the introducer, which AFAICT is identical in both cases. However, the 1.11 introducer responds with a ServerHello, while the 1.12 introducer closes the connection.

comment:5 Changed at 2017-01-10T20:21:03Z by str4d

Note that the introducer-to-client cruft visible in the ssldump trace looks exactly the same (at least, the first few bytes match) as that in the additional dataReceived() calls in the earlier storage node logs.

Changed at 2017-01-10T20:26:52Z by str4d

Attachment client-to-introducer-1.12-noi2p.ssldump.txt added

ssldump of communication between Tahoe 1.12 client and 1.12 introducer using localhost TCP instead of I2P

comment:6 Changed at 2017-01-10T20:27:52Z by str4d

Okay, it is definitely related to whatever that cruft is. I manually tweaked the tahoe.cfg files of the 1.12 client and 1.12 introducer to use localhost TCP instead of I2P. The cruft disappeared and the connection started working.

comment:7 Changed at 2017-01-17T16:40:43Z by warner

we're on a tight timeline for the debian freeze.. I think we need to get 1.12.1 released in about 24 hours to get it into Stretch. Any progress on this one?

comment:8 Changed at 2017-01-19T01:09:13Z by warner

Milestone changed from 1.12.1 to 1.13.0

Ticket retargeted after milestone closed

comment:9 Changed at 2017-01-31T19:31:31Z by warner

We did a lot of digging in today's devchat, and learned the following:

txi2p's "listening" sockets are actually clients: the tahoe/foolscap/txi2p side makes an outbound TCP connection to the local I2P daemon (to the "SAM" API port), writes a message that says "hi, I'd like to receive connections for i2p address BLAH", then waits for a response. Later, when someone connects to the I2P daemon using that address, the tahoe node gets a message on the socket that says "connection incoming!", then the next byte is data from the remote end
Twisted's startTLS() method knows whether its connection is a client-like or a server-like connection (transport._tlsClientDefault), and tells TLS to use a ClientHello or ServerHello to match
normal Foolscap outbound connections use IStreamClientEndpoint, and inbound Listeners use IStreamServerEndpoint, so TLS gets the right direction. The txtorcon onion-service listener uses a server-like connection, so that works too.
startTLS is probably getting the direction of the connection backwards for txi2p's listener, because it's using an outbound connection to the SAM endpoint, but then needs to run the *server* side of the TLS handshake
you can use transport.startTLS(ctx, normal=False) to tell it to flip the direction, which would probably help

We don't yet know a good way to tell Foolscap that it needs to pass in this argument. Some options:

the foolscap connection-handler could emit an additional value (so handler.hint_to_endpoint() could somehow return (endpoint, tls_is_reversed)
the handler could be responsible for producing an ITransport that has an extra attribute, and foolscap could check this attribute before calling startTLS: normal = getattr(self.transport, "_foolscap_tls_is_normal", True)
the handler could produce an ITransport that overrides startTLS() to upcall with the right normal argument

One complication is that it isn't always obvious (to e.g. txi2p) that the connection it was given is a client-like or server-like transport (or whether it's capable of startTLS at all). It's unfortunate that startTLS() takes normal= rather than isClient=. Foolscap knows for sure whether it wants TLS to be client-like or server-like, but when the only knob we have is normal=, we must *also* know whether the underlying ITransport is client-like or server-like (so we know when to reverse TLS's handling). As far as we've been able to tell, the ITransport client-vs-server flag is private, even though the normal= argument is public.

Some additional things to check before diving too deep into finding a good approach:

look at the extra cruft in the "client-to-introducer-1.12.ssldump.txt" trace above, immediately following the Client: Upgrade message, and confirm that it really is a ClientHello. (it is supposed to be a ServerHello, but if startTLS is confused by txi2p using client-like connections, it makes sense that we'd send a ClientHello here). We don't know why tlsdump didn't parse it as such (maybe it wasn't expecting a TLS packet to appear in the middle of a protocol stream, which would imply that tlsdump doesn't handle STARTTLS-like protocols very well). Either compare these bytes against a normal wireshark trace, or look up the TLS docs and manually check the packet format. str4d astutely noticed that the cruft bytes include things like "c0 30" and "c0 2c", which were identified by tlsdump (in the "-noi2p.ssldump.txt" trace) as unrecognized ciphersuite values, and that only the ClientHello contains multiple ciphersuites (since the ServerHello only contains the decision). He also noticed that the Foolscap server shouldn't be sending any TLS messages at all until the client has sent the ClientHello, since TLS servers make the decision, so they can't send anything without first hearing the client's hello.
hack something (probably foolscap/negotiate.py) to set normal=False and see if that makes the connections work

Other possibilities that we came up with:

txi2p's "parsely" parser might be incorrectly matching something in the TLS Hello and thinking it's a SAM message
txi2p's protocol handoff (where the SAM parser stops, and all further bytes are delivered to the wrapped protocol) might be dropping, duplicating, delaying, or reordering bytes, causing a TLS message to be corrupted or delivered twice. Foolscap has suffered from reentrancy and buffering problems in areas like this in the past; it's fertile breeding ground for bugs.

We identified at least two concerns about the way txi2p is working, that shouldn't affect correctness but probably affect performance:

txi2p.sam.stream.StreamAcceptReceiver.dataReceived: any application data that is received in the same chunk as the initial peer-destination line will be delayed. It gets stashed as self.initialData properly, but will not be delivered until the next dataReceived is called. If the peer sends an initial chunk and then waits for a response, the local application will never receive that chunk. This is not a problem for client-goes-first protocols like HTTP, but would cause a loss of progress for server-goes-first protocols like SMTP
inbound streams will have application data delivered one byte at a time, because the Parsely -based parser used by StreamAcceptReceiver (txi2p/grammar.py) uses an anything:data clause to match all bytes once the parser has moved into the post-SAM State_readData state, and that clause probably just matches a single wildcard byte. This is sound, but probably bad for performance (especially for foolscap), since a large chain of python methods will be executed for every byte of the input. It would be fastest if large bytestrings could be transferred in complete buffers in a single call. We should do some performance tests on this and compare the CPU usage of a tahoe server (during file upload) for a given fixed data rate, I2P vs plain TCP. Ideally the txi2p parser would be bypassed completely once a connection has been moved to State_readData, similar to twisted.protocols.basic.LineReceiver.setRawMode(), but doing that safely requires careful attention to the .dataReceived() ordering/duplication/reentrancy concerns described above.

comment:10 Changed at 2017-01-31T19:55:59Z by warner

I've confirmed that a quick hack to call startTLS(ctx, normal=False) on an I2P-based Introducer (but not the client) was enough to allow a connection to get through. Not a solution yet, but it suggests we're headed in the right direction.

comment:11 Changed at 2017-01-31T21:32:24Z by warner

Hm, it might be cleanest to get a new API added to Twisted: maybe a side= argument to ITLSTransport.startTLS(). It would override normal=, and would explicitly declare the TLS stack to be "client" or "server", independent of the underlying transport. Foolscap knows exactly which TLS side it's supposed to be, so it could just set it directly.

I'll see if there are any relevant Twisted tickets on this, and if not I'll add one.

comment:12 Changed at 2017-01-31T22:10:07Z by warner

Twisted#3278 appears to be the same issue (two AMP clients trying to use TLS). The patch on that ticket is 9 years old (!), so will undoubtedly need some cleanup, but it's probably a good place to start.

comment:13 Changed at 2017-01-31T23:12:59Z by warner

and I've confirmed that the 16 03 01 00 e0 01 00 00 dc 03 03 22 71 sequence (in the "cruft") is a ClientHello: the 16 means "Handshake", 03 01 is TLS-1.0, 00 e0 is a packet length, 01 is ClientHello, then 00 00 dc` is handshake length (all according to the wikipedia TLS diagram).

comment:14 Changed at 2017-04-03T06:06:54Z by str4d

In order to make storage nodes usable in the short term, I've pushed a workaround for this bug to txi2p, which I will release shortly as 0.3.2. It's not particularly robust, as it assumes that the underlying transport always behaves as a client; I agree with warner that a new API in Twisted would be a much better solution.

comment:15 Changed at 2020-06-30T14:45:13Z by exarkun

Milestone changed from 1.13.0 to 1.15.0

Moving open issues out of closed milestones.

comment:16 Changed at 2021-03-30T18:40:19Z by meejah

Milestone changed from 1.15.0 to soon

Ticket retargeted after milestone closed

Note: See TracTickets for help on using tickets.

Download in other formats: