[volunteergrid2-l] [ej3fw] connection timeouts
Brian Warner
warner at lothar.com
Mon Mar 26 05:40:20 UTC 2012
I did some brief digging this afternoon, uploading small files and
watching with a packet sniffer. I was connected to 15 out of 17 servers
(all but ianchov's [aty4r] and slush-backup's [2na4j]).
One thing I observed was that my connection to stercor's server (ej3fw)
was dropping and reconnecting about once every 16 minutes. I noticed
this by looking at the "Since" column on the welcome page's server list:
it shows a timestamp of a few seconds after node reboot for most
servers, but that one server showed a fairly recent timestamp, changing
every once in a while.
I think I figured it out, and have a workaround (as well as notes on
tools to build to help diagnose the issue more easily). I'll write a
longer letter to the tahoe-dev list with the details, so everyone can
see them. For VG2's purposes, my advice is to do at least one of:
1: add [node]timeout.keepalive=120 to every client's tahoe.cfg
2: have stercor add timeout.keepalive=120 to the server's tahoe.cfg
3: have stercor reconfigure the NAT/router box to increase the timeout
for "idle" TCP connections to at least 10 minutes (it currently
might be set to more like 5 minutes)
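For options 1 and 2, the change amounts to a "timeout.keepalive = 120" line under the [node] section of tahoe.cfg. Here is a minimal sketch of making that edit with a script, assuming Python's standard configparser can parse your tahoe.cfg. The helper name set_keepalive and the path in the usage note are hypothetical, not part of Tahoe. Note that configparser drops comments when it rewrites a file, so hand-editing is safer if your tahoe.cfg is annotated.

```python
import configparser


def set_keepalive(cfg_path, seconds=120):
    """Set timeout.keepalive in the [node] section of a tahoe.cfg-style file.

    Hypothetical helper for illustration; it rewrites the whole file,
    so any comments in the original are lost.
    """
    # Disable interpolation so '%' characters in existing values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names exactly as written instead of lowercasing them.
    parser.optionxform = str
    parser.read(cfg_path)

    if not parser.has_section("node"):
        parser.add_section("node")
    parser.set("node", "timeout.keepalive", str(seconds))

    with open(cfg_path, "w") as f:
        parser.write(f)
```

Usage would look something like set_keepalive("/path/to/node/tahoe.cfg"), followed by a node restart, since I'd expect the setting to be picked up only at startup.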
I think this might explain a few problems, but I've certainly seen others
that this doesn't account for. I'll keep digging.
cheers,
-Brian