#521 new defect

disconnect unresponsive servers (using foolscap's disconnectTimeout)

Reported by: warner
Owned by: warner
Priority: major
Milestone: undecided
Component: code-network
Version: 1.2.0
Keywords: availability foolscap anti-censorship
Cc:
Launchpad Bug:

Description

#287 describes an important application-level fix we need to make: when uploading or downloading files, don't depend upon timely responses from our servers. Another aspect of this is to try to identify servers that are stuck (e.g. trapped in an infinite loop, or having memory problems: anything that allows the TCP connections to look alive but prevents responses to the Foolscap messages). Yesterday, prodtahoe7 got into a situation like this, because of disk problems that got so bad we couldn't even log into the host through the local console.

To this end, Foolscap offers two timer options: "keepaliveTimeout" (which defaults to four minutes) and "disconnectTimeout" (which defaults to off). We are considering activating disconnectTimeout as a stopgap until the #287 fix is implemented, to reduce the time during which a stuck server causes clients to hang.

The keepaliveTimeout means that every $TIMEOUT the Tub-to-Tub protocol object (called "banana") will check to see how long it has been since any data was received on that connection. If this "age" is greater than $TIMEOUT, the Tub sends a PING message to the other side. As soon as the remote Banana protocol instance receives the PING, it will send back a PONG message. The idea is that the PONG will reset the last-heard-from-age.

This approach is cheap (the only operation that occurs during dataReceived is to record the current time), but it means that the PING might occur after 4 minutes of silence, or after almost 8 minutes. For example, imagine that the four-minute timer is reset at T=0 and data is received at T=1s. The timer fires at T=4min and sees that age=3m59s, so it resets. The timer fires again at T=8min and only then sends the PING.
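The phase dependence can be checked with a small model. This is a minimal sketch, not Foolscap code: `silence_before_ping` is a hypothetical helper, and it assumes the timer was reset at T=0, fires every `timeout` seconds, and sends a PING only when the age is strictly greater than the timeout.

```python
def silence_before_ping(last_data_at, timeout=240.0):
    """Seconds of silence that elapse before a PING is sent.

    Assumptions (illustrative only): the periodic timer was reset at T=0,
    so it fires at T=timeout, 2*timeout, ...; the last inbound data arrived
    at `last_data_at`, somewhere in [0, timeout).
    """
    fire_time = timeout
    # Each time the timer fires it compares the age against the timeout.
    # If the age is not yet greater than the timeout, it just waits for
    # the next firing.
    while fire_time - last_data_at <= timeout:
        fire_time += timeout
    return fire_time - last_data_at


if __name__ == "__main__":
    for t in (1.0, 120.0, 239.0):
        print(f"data at T={t:5.0f}s -> PING after {silence_before_ping(t):.0f}s of silence")
```

With data at T=1s the PING goes out after 479 seconds of silence (7m59s); with data at T=239s it goes out after 241 seconds, which matches the "4 to almost 8 minutes" range described above.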

keepaliveTimeout is mainly intended to keep entries in NAT tables alive. It also has the effect of giving TCP something to work with: TCP will drop a connection that does not ACK the outbound data within some (rather long) time period; the keepaliveTimeout ensures that each side tries to send at least a few bytes every 8ish minutes, allowing TCP to drop the connection within a few hours of a silent disconnect.

The other Foolscap timeout is "disconnectTimeout". It works the same way as keepaliveTimeout, but when the timer fires and the silence is found to be too long, it drops the connection. This timer is not enabled in Foolscap by default because I wasn't sure what could be a safe+appropriate value to use.

The metric of interest here is the min/max period of unresponsiveness after which the connection will be dropped. This is nontrivial because of the way that these two timers interact (i.e. it depends upon their relative phases). If we set disconnectTimeout to 15 minutes, then after we send a PING, we might wait anywhere from 7 to 38 minutes before disconnecting. The range is DT-2*KT to 2*DT+2*KT, where DT is disconnectTimeout and KT is keepaliveTimeout (four minutes here).

The problem is that the lack of inbound traffic (which would reset the timers and prevent the disconnect) is not a good indicator of a stuck server. Client A might be uploading a large amount of data to Server B. The server sees lots of data arriving, so its keepalive and disconnect timers are happy. If the server doesn't need to respond to the client for anything, it won't be sending any data. Eventually (max=2*keepaliveTimeout) the client will send a PING, but this could get stuck behind the data that's being sent, so the PONG won't be sent until that PING finally makes it across the wire.

For Tahoe, the worst case here is when a client is uploading a file, which involves sending a block of data (128KiB/3, roughly 40KiB) to each of 10 servers at the same time (400KiB in total). If this takes more than 7 minutes to transfer (i.e. the upstream rate is below about 975Bps, or 7.8kbps), then we're in danger of abandoning one or more of the connections. The problem is worse if we're uploading several files at once, or if the user's upstream pipe is being shared with other applications or other computers.
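The rate threshold can be reproduced with a few lines of arithmetic. This is a sketch using the rounded figures from the paragraph above (40KiB per server, 10 servers, a 7-minute window); `min_upstream_rate` is a hypothetical helper name.

```python
KIB = 1024


def min_upstream_rate(block_bytes, servers, window_seconds):
    """Bytes per second needed to push one block to every server in the window."""
    return block_bytes * servers / window_seconds


# Figures from the ticket: ~40KiB per server, 10 servers, 7-minute window.
rate_Bps = min_upstream_rate(40 * KIB, 10, 7 * 60)
rate_kbps = rate_Bps * 8 / 1000  # kilobits per second, decimal kilo

if __name__ == "__main__":
    print(f"{rate_Bps:.0f} Bps, {rate_kbps:.1f} kbps")
```

Any upstream link slower than this cannot finish the 400KiB batch inside the 7-minute minimum window.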

Increasing the proposed disconnectTimeout to 30 minutes results in a 22-68 minute window of silence-before-disconnect.

It may be that the best fix would be to modify Foolscap to use a different timer mechanism: a timer which fires once every keepaliveTimeout/4 would reduce the variability considerably, while not increasing the quiescent CPU usage by more than a factor of four. The range would then be DT-1.25*KT to 1.25*DT+1.25*KT, so KT=4min and DT=15min would give us 10-23.75min, and DT=30min would give us 25-42.5min.
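To compare timer granularities, the two ranges quoted in this ticket can be evaluated side by side. This is a sketch under a stated assumption: the `1 + 1/checks_per_period` factor is an extrapolation that reproduces the ticket's two formulas (factor 2 for one check per period, factor 1.25 for four), not something derived from Foolscap's code. The function names are hypothetical.

```python
def disconnect_window(dt, kt, checks_per_period=1):
    """(min, max) minutes of silence before disconnect.

    Assumed generalization: the phase slack factor is 1 + 1/checks_per_period,
    giving DT-2*KT .. 2*DT+2*KT for the current timer and
    DT-1.25*KT .. 1.25*DT+1.25*KT for a timer firing every KT/4.
    """
    slack = 1 + 1 / checks_per_period
    return (dt - slack * kt, slack * dt + slack * kt)


def window_width(dt, kt, checks_per_period=1):
    """Spread between the earliest and latest possible disconnect."""
    low, high = disconnect_window(dt, kt, checks_per_period)
    return high - low


if __name__ == "__main__":
    for dt in (15, 30):
        for checks in (1, 4):
            low, high = disconnect_window(dt, 4, checks)
            print(f"DT={dt} KT=4 checks/period={checks}: {low}-{high} min")
```

For KT=4 and DT=15 the spread shrinks from 31 minutes (7 to 38) to 13.75 minutes (10 to 23.75), at the cost of four times as many timer firings.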

The real answer, of course, is that connections are nothing more than a convenient fiction, and that we must be prepared to suffer the reality that lies behind that curtain. The timeout tradeoffs in #287 are the real questions to address.

Change History (8)

comment:1 Changed at 2008-09-24T13:26:06Z by zooko

See also #193 and #253.

comment:2 Changed at 2009-02-07T19:49:05Z by zooko

  • Milestone 1.3.0 deleted

This doesn't seem to be necessary for 1.3.0.

comment:3 Changed at 2009-11-22T16:12:41Z by davidsarah

  • Keywords reliability added

Can the server send unsolicited PONGs to a client that is uploading to it?

(I agree that fixing #287 is the real solution.)

comment:4 Changed at 2009-11-24T06:07:42Z by warner

hm, I suppose. I guess that would take the form of a third timer, which keeps track of how long it's been since we last *sent* anything, and sends a PONG (or similar no-op message) when the timer fires. Perhaps give it the same value (and timer) as the first one, so the code that might send a PING will also always send a PONG.

Foolscap#143 has been opened for this one.
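The third-timer idea from this comment might look roughly like the following. This is a minimal sketch, not Foolscap's implementation: the class name, method names, and the injected `send_noop` callback are all hypothetical, and timestamps are passed in explicitly so the logic can be exercised without a reactor.

```python
class SendIdleTimer:
    """Tracks how long it has been since we last *sent* anything.

    Hypothetical sketch: when the periodic timer fires and the outbound
    side has been idle for longer than `timeout`, emit a no-op message
    (e.g. an unsolicited PONG) so the peer's timers see inbound traffic.
    """

    def __init__(self, timeout, send_noop, now=0.0):
        self.timeout = timeout
        self.send_noop = send_noop  # callable that writes the no-op message
        self.last_sent = now

    def data_sent(self, now):
        # Cheap bookkeeping on every write: just record the time.
        self.last_sent = now

    def timer_fired(self, now):
        # Returns True if a no-op was sent on this firing.
        if now - self.last_sent > self.timeout:
            self.send_noop()
            self.last_sent = now
            return True
        return False


if __name__ == "__main__":
    sent = []
    timer = SendIdleTimer(240.0, lambda: sent.append("PONG"))
    timer.data_sent(100.0)
    print(timer.timer_fired(240.0), timer.timer_fired(480.0), sent)
```

Sharing the firing schedule with the keepalive timer, as suggested above, would mean calling `timer_fired` from the same periodic callback that decides whether to send a PING.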

comment:5 Changed at 2009-12-04T04:40:28Z by davidsarah

  • Keywords availability added; reliability removed

comment:6 Changed at 2009-12-12T03:05:34Z by davidsarah

  • Keywords foolscap added

comment:7 Changed at 2010-12-16T00:53:52Z by davidsarah

  • Keywords anti-censorship added

A case possibly related to this was reported by Shu Lin on tahoe-dev.

comment:8 Changed at 2011-08-16T04:33:15Z by davidsarah

  • Milestone set to undecided