[volunteergrid2-l] [ej3fw] connection timeouts

Brian Warner warner at lothar.com
Mon Mar 26 05:40:20 UTC 2012


I did some brief digging this afternoon, uploading small files and
watching with a packet sniffer. I was connected to 15 out of 17 servers
(all but ianchov's [aty4r] and slush-backup's [2na4j]).

One thing I observed was that my connection to stercor's server (ej3fw)
was dropping and reconnecting about once every 16 minutes. I noticed
this by looking at the "Since" column on the welcome page's server list:
it shows a timestamp of a few seconds after node reboot for most
servers, but that one server showed a fairly recent timestamp, changing
every once in a while.

I think I figured it out, and have a workaround (as well as notes on
tools to build to help diagnose the issue more easily). I'll write a
longer letter to the tahoe-dev list with the details, so everyone can
see them. For VG2's purposes, my advice is to do at least one of:

 1: add [node]timeout.keepalive=120 to all clients' tahoe.cfg
 2: have stercor add timeout.keepalive=120 to the server's tahoe.cfg
 3: have stercor reconfigure the NAT/router box to increase the timeout
    for "idle" TCP connections to at least 10 minutes (it currently
    might be set to more like 5 minutes)
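
For options 1 and 2, the tahoe.cfg change would look roughly like this.
This is a sketch: I'm assuming the file already has a [node] section (in
which case only the one line needs to go under it), and that the node is
restarted afterwards so it picks up the new setting. The value is in
seconds.

    [node]
    timeout.keepalive = 120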

I think this might explain a few problems, but I've certainly seen others
that this doesn't account for. I'll keep digging.

cheers,
 -Brian

