#1988 closed defect (fixed)

don't stop the process if you can't execute "ifconfig" or "route.exe"

Reported by: zooko Owned by: zooko
Priority: normal Milestone: 1.10.1
Component: code-network Version: 1.10.0
Keywords: iputil portability transparency review-needed Cc:
Launchpad Bug:

Description

Currently, the tahoe-lafs process tries to learn its IP address by creating a socket (iputil.get_local_ip_for()), and also by executing either ifconfig (on unix) or route.exe (on windows) (iputil code).

Even if it doesn't learn its IP address, or if it learns and incorrect IP address, it can still work, provided that the other processes that it needs to talk to have announced their IP addresses and it is able to open TCP connections to them.

Now the problem is that if it can't execute ifconfig/route.exe successfully, then it stops the process.

This is the number one problem that prevents people from running Tahoe-LAFS on new platforms (#898, #1536, #1918, #1707). In fact, it may be the only thing that prevents Tahoe-LAFS from being portable to all sorts of unix-likes! It also causes various other problems due to security constraints (#982), and subprocesses being finicky (#854, #1064, #1381, #1935)

What if instead, if the call to the subprocess (ifconfig or route.exe) fails, that it logs this fact and then moves on. This would change the failure mode from the user's perspective from "It stopped at startup" to one of the following:

  1. "I can't connect to anyone."
  2. "I can't connect to some servers/clients but can connect to others."
  3. "I can connect to all servers/clients and everything works."

I don't like, in principle, changing a hard "stop loudly" failure into a soft "misbehave confusingly" failure, but in this case it might be worth it. The thing is that the confusing 3-way failure mode from (partial) inability to connect to other processes already happens due to other reasons. We wouldn't be adding that failure mode, we would be converting certain situations (previously unsupported platform, PATH is set weirdly, security feature prevent access to ifconfig, etc.) from the hard-stop failure mode to the latter failure mode.

This ticket supercedes #854 and #1536. Oh! And I see that #1536 has a patch from mk.fg! Good...

Change History (11)

comment:1 Changed at 2013-05-27T17:27:31Z by zooko

Okay, now I have a question: how often does iputil.get_local_ip_for() fail and the subprocess code succeed? When we first started using this technique it was literally 14 years ago, in 1999, at Mojo Nation, and at the time it seemed like the subprocess technique was necessary to get a high success rate of figuring our your own IP address. I have no idea if that is really true today.

comment:2 Changed at 2013-05-27T17:30:54Z by zooko

Actually let's leave #1536 (log these kinds of errors usefully) separate from this ticket (don't stop the process when you have this kind of error).

comment:3 follow-up: Changed at 2013-06-15T07:36:56Z by utf8notsupported

Why not use ipconfig.exe on windows instead of route.exe? It's a closer w32 analog to ifconfig.

comment:4 in reply to: ↑ 3 ; follow-up: Changed at 2013-06-15T14:53:51Z by zooko

Replying to utf8notsupported:

Why not use ipconfig.exe on windows instead of route.exe? It's a closer w32 analog to ifconfig.

Using route.exe is known to work. Let's not change it unless there's a compelling reason to do.

comment:5 in reply to: ↑ 4 Changed at 2013-06-17T12:30:54Z by daira

Replying to zooko:

Replying to utf8notsupported:

Why not use ipconfig.exe on windows instead of route.exe? It's a closer w32 analog to ifconfig.

Using route.exe is known to work. Let's not change it unless there's a compelling reason to do.

+1

comment:6 Changed at 2013-06-25T18:15:57Z by Daira Hopwood <david-sarah@…>

In a493ee0bb641175ecf918e28fce4d25df15994b6/trunk:

iputil.py: add tests for recent changes. refs #1381, #1988, #982, #1064, #1536, #1935, #898, #1707, #1918

Signed-off-by: Daira Hopwood <david-sarah@…>

comment:7 Changed at 2013-06-27T02:01:24Z by daira

  • Keywords review-needed added
  • Milestone changed from undecided to 1.11.0
  • Owner set to zooko

This should now be fixed on trunk, but the changes haven't been reviewed. I made a couple of mistakes on the way due partly to accidentally committing my branch before it was ready, and partly to platform differences that were revealed by buildbot testing, so I suggest reviewing the overall diff. (Note that this includes the fix to #1381.)

comment:8 follow-up: Changed at 2013-08-07T19:04:41Z by markberger

I'm looking at this patch, and one part is unclear: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/util/iputil.py?rev=b0883807361830c609dff1677c3cb34fd64d3ebb#L194

Why is the same try statement executed 5 times before failure?

comment:9 in reply to: ↑ 8 Changed at 2013-08-18T18:20:44Z by daira

Replying to markberger:

Why is the same try statement executed 5 times before failure?

That's the fix for #1381. subprocess.Popen() or Popen.communicate() can randomly fail with EINTR.

comment:10 Changed at 2013-09-16T23:34:43Z by markberger

  • Resolution set to fixed
  • Status changed from new to closed

I finally got back to reviewing this and it looks good to me. Since the patch has already been committed to trunk, I'm closing this ticket.

comment:11 Changed at 2014-11-27T03:48:03Z by daira

  • Milestone changed from soon to 1.11.0
Note: See TracTickets for help on using tickets.