[tahoe-dev] [tahoe-lafs] #653: introducer client: connection count is wrong, !VersionedRemoteReference needs EQ

tahoe-lafs trac at allmydata.org
Fri Jul 17 07:11:49 PDT 2009


#653: introducer client: connection count is wrong, !VersionedRemoteReference
needs EQ
--------------------------+-------------------------------------------------
 Reporter:  warner        |           Owner:  warner  
     Type:  defect        |          Status:  assigned
 Priority:  major         |       Milestone:  1.5.0   
Component:  code-network  |         Version:  1.3.0   
 Keywords:                |   Launchpad_bug:          
--------------------------+-------------------------------------------------

Comment(by zooko):

 Oookay, I read the relevant source code, and the miscounting of the number
 of connected storage servers was fixed between [3897] (the version that
 exhibited the problem) and current HEAD.  However, I'm not sure whether
 that also means that whatever caused the failures on TestGrid was fixed.
 Read on.  Here is the code path that produced "Connected to %s of
 %s known storage servers" at version [3897]:

 [source:src/allmydata/web/welcome.xhtml@3897#L55]

 [source:src/allmydata/web/root.py@3897#L240]

 [source:src/allmydata/introducer/client.py@3897#L277]

 Here is how it is produced today:

 [source:src/allmydata/web/welcome.xhtml@3997#L55]

 [source:src/allmydata/web/root.py@3997#L240]

 [source:src/allmydata/storage_client.py@3997#L41]

 The old way could potentially accumulate lots of tuples of {{{(serverid,
 servicename, rref)}}} for the same serverid and servicename, if new
 connections were established to the same serverid while the old connection
 was never reported lost, i.e. {{{notifyOnDisconnect()}}} wasn't getting
 called.  The new way cannot possibly have an inconsistency between the
 number of known storage servers and the number of connected storage
 servers, since both are computed from the same dict -- the known servers
 are all the items of the dict and the connected storage servers are the
 ones that have an rref.
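
 To make the difference concrete, here is a minimal sketch of the two
 bookkeeping styles.  It is a toy model, not the actual Tahoe-LAFS code:
 the class names, method names and the use of plain {{{object()}}}
 instances as stand-ins for rrefs are all assumptions made for
 illustration.  It simulates a second connection to the same server
 arriving without any disconnect callback for the first one.

```python
# Hypothetical model of the two counting styles; not the real Tahoe-LAFS classes.

class OldStyleTracker:
    """Assumed shape of the [3897]-era bookkeeping: one tuple per connection."""

    def __init__(self):
        self.known = set()        # (serverid, servicename) pairs we have heard of
        self.connections = set()  # (serverid, servicename, rref) tuples

    def announce(self, serverid, servicename):
        self.known.add((serverid, servicename))

    def connected(self, serverid, servicename, rref):
        self.connections.add((serverid, servicename, rref))

    def disconnected(self, serverid, servicename, rref):
        # Only runs if the disconnect notification actually fires.
        self.connections.discard((serverid, servicename, rref))

    def counts(self):
        # Connected and known are computed from two independent collections.
        return len(self.connections), len(self.known)


class NewStyleTracker:
    """Assumed shape of the newer bookkeeping: one dict entry per server."""

    def __init__(self):
        self.servers = {}  # serverid -> rref, or None when not connected

    def announce(self, serverid):
        self.servers.setdefault(serverid, None)

    def connected(self, serverid, rref):
        # A new connection replaces the old rref instead of piling up.
        self.servers[serverid] = rref

    def disconnected(self, serverid, rref):
        if self.servers.get(serverid) is rref:
            self.servers[serverid] = None

    def counts(self):
        # Both numbers come from the same dict, so connected <= known always.
        connected = sum(1 for rref in self.servers.values() if rref is not None)
        return connected, len(self.servers)


def simulate_reconnect_without_disconnect():
    """Reconnect to one server while the old connection is never reported lost."""
    old, new = OldStyleTracker(), NewStyleTracker()
    old.announce("server1", "storage")
    new.announce("server1")

    first_rref, second_rref = object(), object()  # stand-ins for remote references
    old.connected("server1", "storage", first_rref)
    new.connected("server1", first_rref)

    # Second connection arrives; disconnected() is never called for the first.
    old.connected("server1", "storage", second_rref)
    new.connected("server1", second_rref)

    return old.counts(), new.counts()


if __name__ == "__main__":
    old_counts, new_counts = simulate_reconnect_without_disconnect()
    print("old style: connected to %s of %s known storage servers" % old_counts)
    print("new style: connected to %s of %s known storage servers" % new_counts)
```

 In this model the old style reports more connected servers than known
 servers as soon as a disconnect notification is missed, while the new
 style cannot, because a reconnect overwrites the single entry for that
 serverid.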

 Brian: what do you think about {{{notifyOnDisconnect()}}} not getting
 called even while new connections to the same foolscap peer are being
 established?  That's the only explanation I can see for the observed
 miscounting on the web gateway that was running allmydata-tahoe [3897] and
 foolscap 0.4.1.

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/653#comment:11>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

