#2364 new defect

Clients in onion grid busy-wait if a storage node is unreachable

Reported by: mhazinsk Owned by:
Priority: major Milestone: undecided
Component: code-network Version: 1.10.0
Keywords: availability reliability anti-censorship tor-protocol anonymity Cc:
Launchpad Bug:

Description (last modified by mhazinsk)

When running "torify tahoe start", the process uses 100% CPU while attempting to connect to a down storage node. Meanwhile, connections to all other storage nodes and the introducer show as intermittent on the grid status page so the grid is unusable.

I'm using the latest trunk version of Tahoe on Arch Linux with Torsocks 2.0.0

Here's the errors that get printed:

[Jan 18 14:42:14] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:42:14-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=awo7) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2015-01-18 14:42:14-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=poce) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2)
	
[Jan 18 14:42:24] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:42:33] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:42:56] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:42:56-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
[Jan 18 14:43:20] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:43:33] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:43:43] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:43:43-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=xivp) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2015-01-18 14:43:43-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2015-01-18 14:43:43-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=poce) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2)
	
[Jan 18 14:43:53] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:44:20] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:44:35] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:44:47] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:44:47-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
[Jan 18 14:44:58] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:45:23] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:45:34] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:45:53] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:46:21] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:46:21-0500 [-] Reconnector._failed (furl=[redacted]): [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.NegotiationError'>: no connection established within client timeout

This problem persists across restarts of tahoe. However, when I remove the tahoe directory, recreate a tahoe directory with "tahoe create-client ~/.tahoe", and restart tahoe, the problem goes away. So far I have not been able to reproduce this on any other machines.

Change History (7)

comment:1 Changed at 2015-01-18T22:26:24Z by daira

  • Component changed from unknown to code-network
  • Keywords tor added

comment:2 Changed at 2015-01-18T22:26:53Z by daira

  • Description modified (diff)

comment:3 Changed at 2015-01-19T01:20:01Z by mhazinsk

  • Description modified (diff)

comment:4 Changed at 2015-01-19T02:23:36Z by daira

mhazinsk: did you retain the original .tahoe directory, and can you diff it against the current one? (If you paste the result, don't include contents of files under private/, just which files are different.)

comment:5 Changed at 2015-01-25T06:43:30Z by mhazinsk

I did, here's the result:

[matt@teselecta .tahoe]$ diff . ../.tahoe-badwolf/
Only in ../.tahoe-badwolf/: announcement-seqnum
Common subdirectories: ./logs and ../.tahoe-badwolf/logs
diff ./my_nodeid ../.tahoe-badwolf/my_nodeid
1c1
< nqgzgxcihanqdsxeusrmlohitkrvamvs
---
> okossyjuxucppahd6pns3bvniubwmip5
diff ./node.pubkey ../.tahoe-badwolf/node.pubkey
1c1
< pub-v0-d27copzopwk4yegc6reegfdjobjgzcn6ylt2bd2xnlddwy3gcejq
---
> pub-v0-lxibmoqhu4ielykc4fid3duj7duvjxylwlafm2wc464ytpjzy57q
Common subdirectories: ./private and ../.tahoe-badwolf/private
diff ./tahoe.cfg ../.tahoe-badwolf/tahoe.cfg
11c11
< nickname = teselecta
---
> nickname = badwolf
Common subdirectories: ./tmp and ../.tahoe-badwolf/tmp
Only in .: twistd.pid

The "~/.tahoe-badwolf" directory was migrated from another machine and caused the problem shown in the description.

comment:6 Changed at 2015-02-06T20:35:43Z by daira

  • Keywords tor-protocol added; tor removed

comment:7 Changed at 2015-12-07T16:33:38Z by leif

  • Keywords anonymity added

I've encountered this problem after upgrading to torsocks 2.x too. The problem did not exist with torsocks 1.x (version 2 is a complete rewrite, btw).

I'm not sure what is causing it, but rather than figuring it out dawuud and I are working on native proxy support in foolscap so that we can stop using torsocks altogether. It is working with dawuud's branches of tahoe and foolscap, but it is not ready to be merged yet. See #517.

Note: See TracTickets for help on using tickets.