<div dir="ltr"><div><div>Greg,<br><br></div>I'm sorry it's taken so long to respond.  This has been one of the most in-depth emails that I've read from a subject matter expert.  My list of read RFCs is quite a bit longer at this moment.  I'm sorry that I'm implementing this patch and not somebody else who knows IPv6 more thoroughly from a theoretical point of view, as opposed to the user's point of view that I'm approaching it from.  I'm also a pragmatist in some senses and may be oversimplifying things because I haven't done my homework on how long actual delays are, what different implementations exist and how these two things will impact performance.<br>

<br>I really appreciate somebody who knows more about IPv6 than what Hurricane Electric will certify as a "sage".  A fool could follow a blog for a half a day to get there and who knows how much better I am than that.  I hope that you take my questions as something like a student-teacher questioning.  As I said above, your email has pushed me to learn more about this complicated web of RFCs that make up IPv6, which has been a very fulfilling journey.<br>

<br></div>Here's what I've come up with:<br><div><br><div><div><div class="gmail_extra"><div class="gmail_quote">On Sat, Feb 16, 2013 at 4:19 PM, Greg Troxel <span dir="ltr"><<a href="mailto:gdt@ir.bbn.com" target="_blank">gdt@ir.bbn.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br>

  Would you be able to elaborate about this?  Specifically about my use case<br>

  of two hosts on tunnel brokers, but link-local.  I feel it's important, and<br>

  nobody's going to be typing in the furls manually, so who does it benefit<br>

  to have less capability than more?<br>

<br>

</div>It's not about the benefit of less capability.  It's about avoiding<br>

mess.  If the fe80:: addresses of a host are advertised to an<br>

introducer, then almost all nodes (in general) will not be able to reach<br>

that address.  Even worse, it will time out rather than get a connection<br>

refused.  And it leaks the MAC address, which is itself a security<br>

concern (possible with stateless autoconf, but people who have<br>

configured static addresses to avoid this should not be exposed).<br></blockquote><div><br>Avoiding mess is always good.  What happens currently with <a href="http://169.254.0.0/16" target="_blank">169.254.0.0/16</a> addresses in Tahoe?  What about RFC1918 addresses?  What about <a href="http://127.0.0.0/8" target="_blank">127.0.0.0/8</a>? Are they deprioritized so that connections to them are only tried last?  What is the delay if the connection times out?  Does Tahoe only connect in serial, as opposed to opening x different connections at once and taking the first one that connects?  Does Tahoe use the order of the addresses in the furl?  What's the current algorithm for IPv4 addresses and the justification for it?<br>

<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

In the v6 specifications, the intent of link-local addresses (in<br>

fe80::/10) is that they are only used for things like neighbor discovery<br>

and routing protocols.</blockquote><div> </div><div>This is true, but in [RFC4291](<a href="http://tools.ietf.org/html/rfc4291#page-11" target="_blank">http://tools.ietf.org/html/rfc4291#page-11</a>) Link-Local is for three cases, the two cited plus the case when no routers are present:<br>

<br><pre>   Link-Local addresses are designed to be used for addressing on a

   single link for purposes such as automatic address configuration,

   neighbor discovery, or when no routers are present.

</pre></div><div>It is not a requirement that Link-Local addresses are used, but it is one of the purposes of Link-Local addresses.  My use only falls in the third category, but I think that may be enough.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  When you use a link-local address, the address<br>

itself does not specify to the host stack which interface to use.<br>

That's why you have the %wm0 or whatever showing up in the display<br>

representation, which is based on the ifindex being inserted in one of<br>

the bytes of the address.  To have the address be used by another host,<br>

it has to not have that, and has to insert its own index.  So passing<br>

them around is non-obvious; typically a routing protocol will just<br>

record the other side's address and reuse it - but there it has its<br>

*own* ifindex, specifying the interface the packet arrived on, rather<br>

than the remote side's index.<br></blockquote><div><br>After a lot of research I get what you're saying.  I do filter off the ifindex that BSD includes in the ifconfig -a results so it looked familiar, but I didn't know how necessary it was on Linux.  Picturing a multihomed computer, say WiFi + Ethernet, there is no way to know WHICH fe80:: interface we should use.  Why is it that BSD and Windows don't seem to have a problem?  I ping6 fe80 addresses without an interface identifier on those two platforms and don't have a problem, even with multihomed devices.  Is this a deviation from the standards with Linux or some bizarre speed optimization?  Why can't it ARP all interfaces?<br>
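<div>To make the "which interface?" question concrete, here is a minimal Python sketch (an illustration only, not code from Tahoe or foolscap) of how the scope travels with the address: socket.getaddrinfo returns a 4-element sockaddr for IPv6, and the last element is the scope id.  The addresses are documentation examples, the port is arbitrary, and the numeric scope 1 is an assumption (it is often the loopback interface, but that is not guaranteed):<br></div>
<pre>
```python
import socket

def numeric_sockaddr(host, port, family):
    """Parse a numeric address (no DNS lookup) and return its sockaddr tuple."""
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM,
                               0, socket.AI_NUMERICHOST)
    return infos[0][4]

v4 = numeric_sockaddr("192.0.2.1", 12345, socket.AF_INET)
v6 = numeric_sockaddr("2001:db8::1", 12345, socket.AF_INET6)
ll = numeric_sockaddr("fe80::1%1", 12345, socket.AF_INET6)

print(v4)     # (host, port)
print(v6)     # (host, port, flowinfo, scope_id); scope_id is 0 for a global address
print(ll[3])  # the interface index parsed from the "%1" suffix
```
</pre>
<div>Without that fourth element the stack has nothing to say which link an fe80:: destination lives on, which is presumably why the scope has to be supplied somewhere.<br></div>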

</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

tahoe is designed primarily as a wide-area protocol; in the general case<br>

all nodes are expected to be able to open a TCP connection to the<br>

introducer, and then all clients are expected to open a connection to<br>

all servers.  To do this sensibly, the introducer and each server node<br>

need a globally-routable address.<br></blockquote><div><br>This is the main purpose for my work.  I think that the global nature of IPv6 and Tahoe are a match made in heaven.  It will be a wonderful day when I can have Mobile IPv6 on my laptop and my phone and never have to think about which link to communicate over, multiple DNS entries for a single host, what network it's hooked up to, and which interface is actually connected right now.  At least right now it's light years ahead of NAT and port forwarding. :-)<br>

</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

I should point out that my bias is that running tahoe only locally<br>

doesn't make sense in the first place.  There are other filesystems that<br>

deal with resilience against disk failure and some of them have better<br>

performance.  The real point is to get resilience against hazards that<br>

affect an entire site, so wide-area connectivity is IMHO intrinsically<br>

tied to the main use case.<br></blockquote><div>This exactly.  If somebody thinks this is a replacement for ceph or RAID, they will be sorely disappointed with the speed and how it is a file store and not a file system.<br>

</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div><br>

  Other advantages are that they are not routed, so that they can be more<br>

  "secret" than other addresses.  If you didn't want the world to know that<br>

  you were using Tahoe, preferring more local over more remote addresses<br>

  could be better.<br>

<br>

</div>I think this is the usual security-through-obscurity issue, and I don't<br>

think it makes sense.  Outsiders can no more tell that you are using<br>

tahoe with global addresses on your ethernet than they can tell about<br>

private addresses.<br></blockquote><div><br>True, but this isn't about security through obscurity, it's about security and while we're at it, why not obscurity as well.  There is nothing wrong with obscurity, until people think that it can replace security.  Security and obscurity is even better than just security.  For example I would never send unencrypted information about dissidents in an oppressive regime, but I'd be happy if I didn't have to spend the rest of my life in jail to not reveal the encryption keys.  If I can both protect the information, and conceal that I'm transmitting it, I'm better off.  There aren't causes worth giving one's life for...<br>

</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

I don't really understand your dual tunnel broker case.  With a normal<br>

tunnel broker, you get a routable address for your end of the tunnel<br>

(typically ::2 where the tunnel far end is ::1), and often an entire<br>

/64.  So just using the routable addresses would work fine.  (I have 3<br>

tunnels, one of which feeds a routed network of about 6 subnets carved up<br>

from a /48, used by my group of about 100 at work).<br></blockquote><div><br>It's a pretty far-out case and I don't think I did a good job explaining it.  I would never design a network to work that way and I'm not that sad about punishing people who (ab)use IPv6 that way.  This would also be a case where IPv4 should be used instead of IPv6, but given the way other major IPv6-capable applications work, here's what would happen:<br>

<br></div><div>Let's say that my site gives out real IPv4 addresses liberally, allows 

me to use tunnel brokers, but also forbids me running any DHCPv6 or 

router advertisements.  Because of this I need Hurricane Electric to give me two tunnels for two computers that I control: <a href="http://host1.mason.ch" target="_blank">host1.mason.ch</a> and <a href="http://host2.mason.ch" target="_blank">host2.mason.ch</a>.  Google Chrome on host2 looks up <a href="http://host1.mason.ch" target="_blank">host1.mason.ch</a>.  It gets a few addresses: let's say I have in DNS the IPv6 tunnel address for home 2001:123::1/64 and my work address 2001:1234::1/64, and an IPv4 address of 1.2.3.4.  Google Chrome does connect() to all the different IPv6 addresses first, then the IPv4 address, and then keeps the first connection it gets.  If it were on the same IPv6 network as host1, it would not matter which of those addresses it connected to, but it isn't.  It's on a separate network because I NEED two tunnels.  Chrome (and other applications) are set up to prefer IPv6 connectivity.  In this case, I will end up communicating over two tunnels and thousands of miles to get to the computer 2 meters of Ethernet cable away.  It would be nice if we could prioritize IPv6 addresses on a local computer, but we would need the whole routing table locally.  The way Chrome gets around this is that it just takes the fastest response (which may not be the highest-bandwidth response).  If I had included an fe80:: address in my DNS, Google Chrome would connect to that first because it's the lowest-latency link.  This is one idea behind including all possible addresses of the host in the furl.  If Tahoe will issue connects to each address in the list and choose the first response, or if Tahoe will issue connects to each in order of configuration, there is the possibility of getting the best behavior, and not just the most reliable.<br>
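<div>The "connect to everything and keep the first response" idea can be sketched in a few lines.  This is only an illustration using Python's asyncio, not how foolscap or Chrome actually do it; first_successful and fake_connect are hypothetical names, and the delays are invented so the example runs without a network:<br></div>
<pre>
```python
import asyncio

async def first_successful(addresses, connect):
    """Start connect(addr) for every address at once and return
    (addr, connection) for whichever succeeds first, or None if all fail."""
    async def attempt(addr):
        return addr, await connect(addr)

    tasks = [asyncio.ensure_future(attempt(addr)) for addr in addresses]
    try:
        for next_finished in asyncio.as_completed(tasks):
            try:
                return await next_finished
            except OSError:
                continue  # refused or unreachable: wait for the next attempt
        return None
    finally:
        for task in tasks:
            task.cancel()  # abandon the slower attempts

async def fake_connect(addr):
    """Stand-in for a real TCP connect; the delays are invented for the demo."""
    delay = {"fe80::1%eth0": 0.01, "2001:db8::1": 0.05, "192.0.2.1": 0.03}
    if addr not in delay:
        raise ConnectionRefusedError(addr)
    await asyncio.sleep(delay[addr])
    return "connected to " + addr

winner = asyncio.run(first_successful(
    ["2001:db8::1", "192.0.2.1", "fe80::1%eth0"], fake_connect))
print(winner)
```
</pre>
<div>With these made-up delays the link-local attempt wins the race.  A real version would also need a timeout, since an unreachable fe80:: address tends to hang rather than get refused.<br></div>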

</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">  If you bring up a host, or set of hosts, in an environment without a DHCP<br><div>

  server, and no IPv6 router, and don't run Avahi/Bonjour the only address<br>

  that you'll come up with is the fe80 address.  With them included, your<br>

  tahoe cluster can be brought up and connected to without any configuration,<br>

  without any infrastructure, it would even work with only a crossover cable.<br>

<br>

</div>I really don't follow this.  You're saying that if you bring up 4<br>

computers with no addressing, and then somehow figure out the link-local<br>

address of the introducer, and put it in a furl and config files, and<br>

then run tahoe, that this is somehow better than manually configuring 4<br>

addresses (which then makes all sorts of other things easier)?  This is<br>

IMHO a degenerate case, and it seems odd to want to add complexity to<br>

tahoe to support it.  The furl is already a capability, so<br>

auto-discovery seems inconsistent with tahoe's security goals.<br></blockquote><div> </div><div>Link-local addresses can either be randomly generated, or deterministically generated, or specifically assigned.  If there is no router on your network, and you want to run IPv6, you are only able to use fe80:: addresses.  With all the Ubuntu/Debian boxen that I have and the two OS X boxes that my wife owns, they are always fe80:2::0 & MAC address.  I don't know about internal representations of this in memory, but ifconfig -a is where I see it.  I don't ever need to figure out the Link-Local addresses of my computers because I know their MAC addresses.<br>
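<div>For hosts that derive the address deterministically from the MAC, the mapping is the modified EUI-64 scheme from RFC 4291 Appendix A.  A minimal sketch, assuming a colon-separated 48-bit MAC (mac_to_link_local is a hypothetical helper name and the MAC is made up); hosts using random or manually assigned identifiers will not match it:<br></div>
<pre>
```python
import ipaddress

def mac_to_link_local(mac):
    """Build the fe80:: address a host derives from its MAC (modified EUI-64)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to stretch 48 bits to a 64-bit identifier.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80 followed by 48 zero bits
    return ipaddress.IPv6Address(prefix + interface_id)

print(mac_to_link_local("00:11:22:33:44:55"))
```
</pre>
<div>For this example MAC the result is fe80::211:22ff:fe33:4455.<br></div>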

<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

A further complexity is that when you look at your link-local address on<br>

the introducer, you'll see (assuming a:b:c:d:e:f ethernet address)<br>

something that looks like<br>

  fe80::a:b:c:d:e:f%wm0<br>

But if you grab that out of buffer, it will look something like<br>

<br>

  fe80::2:a:b:c:d:e:f<br>

<br>

assuming 2 is the ifindex of wm0.  I don't remember which byte is used,<br>

but I did figure this out in NetBSD recently.  (If my job weren't<br>

developing new network protocols (some of which do use link-local<br>

addresses), I can't imagine I would have dug into this.)  This is a<br>

local OS decision how to represent the interface.  BSD does this (as<br>

implemented by KAME), and I am 95% sure Linux does essentially the same<br>

thing.  So to make this work, the server has to send the fe80:: address<br>

without the interface ID to an introducer which is on the *same link*,<br>

and the introducer can then send it to clients which are also on the same<br>

link.<br></blockquote><div><br>As I said above, I only see the second address everywhere I've looked.  Is this due to an old KAME version in OS X?  And as I said above I also don't know what buffer it would be grabbed out of.<br>

<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

But, how does the client know which interface to use to contact the<br>

introducer?  So you need the client to take the actual LL address and<br>

then add a e.g. %wm0 scopeid when they configure their client.<br>

<br></blockquote><div>Strangely, on Linux, scopeid is required, on Windows and OS X, it just works.  Ping6 fe80:: addresses to your heart's desire.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

So if you really want to use fe80:: addresses, I would say the following<br>

should let you do everything which will actually work, and avoid<br>

cluttering everyone else with them:<br>

<br>

  Users have to configure each client and server node with a LL address<br>

  for the introducer, and put a %intfN scopeid on it.  Of course the<br>

  introducer has to share a link with each node.<br>

<br>

  Nodes contact the introducer, and send their bare (no scopeid) LL<br>

  address.  The introducer keeps track of the incoming scopeid (it could<br>

  have multiple interfaces), and applies it.  When sending addresses to<br>

  a node, it checks that the scopeid matches the interface over which<br>

  that node, and if so sends the bare LL address, and if not does not<br>

  (because the two nodes are on different interfaces and therefore<br>

  different links and thus will not interoperate).<br>

<br></blockquote><div><br>This does seem like what would be needed.  The other option would be to run a connect() with each interface that's up as a source interface.  A little less scientific, but maybe functional.  Is scopeid somehow global?  How do OS X and Windows get around not using it?<br>
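<div>The bookkeeping in the scheme you describe could look roughly like this.  It is a sketch only: LinkLocalRegistry, split_scope and the interface names are hypothetical and nothing like this exists in Tahoe's introducer today:<br></div>
<pre>
```python
def split_scope(addr):
    """Split 'fe80::1%eth0' into ('fe80::1', 'eth0'); scope is None if absent."""
    bare, _, scope = addr.partition("%")
    return bare, (scope or None)

class LinkLocalRegistry:
    """Hypothetical introducer-side table of link-local announcements."""

    def __init__(self):
        self._by_interface = {}  # introducer's interface -> set of bare addresses

    def announce(self, addr, arrived_on):
        # Drop any scope the sender attached; it is meaningless off that host.
        bare, _ = split_scope(addr)
        self._by_interface.setdefault(arrived_on, set()).add(bare)

    def addresses_for(self, arrived_on):
        # Only hand out addresses learned on the link the requester is on.
        return sorted(self._by_interface.get(arrived_on, ()))

registry = LinkLocalRegistry()
registry.announce("fe80::211:22ff:fe33:4455%wm0", arrived_on="eth0")
registry.announce("fe80::a00:27ff:fe4e:66a1", arrived_on="eth0")
registry.announce("fe80::1", arrived_on="wlan0")
print(registry.addresses_for("eth0"))
print(registry.addresses_for("wlan0"))
```
</pre>
<div>A node that connected over eth0 only ever hears about the two eth0 addresses, and each receiving client still has to attach its own scope before connecting.<br></div>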

 <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

I've been running IPV6 for 15 years or so (I don't really remember when<br>

I brought up the 6bone connection).</blockquote><div><br>Major nerd jealousy right here... hopefully I'm in your position when IPv8 comes out :-)<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  The general plan has been to have<br>

global addresses on systems, and to use them.  Those have varied from<br>

6bone addresses to 6to4 addresses to modern addresses in 2001::.  The<br>

only times I deal with link-local addresses are:<br>

<br>

  looking at ndp entries<br>

<br>

  looking at ripng status<br>

<br>

  in occasional desperate times, using ssh to them to recover things.<br>

  (I would never try to make tahoe work this way; I would fix the<br>

  addresses and then have tahoe work with regular addresses.)<br>

<br>

This is a little bit like putting RFC1918 addresses in the introducer,<br>

where for a typical useful grid most nodes will not be able to connect<br>

to most RFC1918 addresses.  But people do have significant RFC1918<br>

privately routed networks.  They don't have routed link-local, by<br>

definitions.<br></blockquote><div><br>The exact IPv4 analog is the <a href="http://169.254.0.0/16">169.254.0.0/16</a> addresses. <br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

So to summarize, the only time it makes sense to use link-local<br>

addresses are in situations where you will be talking to only hosts that<br>

are on-link, and not talking to anything farther.  tahoe is almost never<br>

that case.  Supporting link-local in tahoe requires a lot of complexity,<br>

clutters the lists of addresses, and results in clients trying to make<br>

connections that cannot succeed.<br>

<br>

So my advice is to at least for now, limit things to global addresses.<br>

It's easy enough to add in LL later, but I think it will be a tremendous<br>

source of complexity.  And it should be optional and off by default,<br>

because it's an irregular thing to do.<br>

<span><font color="#888888"><br>

Greg</font></span></blockquote><div> <br><br></div><div>TL;DR:<br></div></div>I think I will have to go read up on how foolscap does the connections to see if it does them in parallel and how often it is done.  If it is only done once on startup and in parallel, I may leave the addresses in.  It is functional for Windows and OS X.  If it goes through the list of addresses in the furl every time it does any communication with a node, this may be a bug in foolscap.<br>

<br>One of the reasons I'm pushing back on this is that it seems like a kludge to just start filtering out addresses in a big if/then statement.  The link-local block (fe80::/10) does not end on a 4-bit boundary, so it's not even a simple if addr[:4] == comparison.  Architecturally, Tahoe may also not be the level at which this should be filtered out; it could be better placed in foolscap.  The whole point of foolscap is to abstract away networking into a simple RPC statement.  That's one reason why adding so little to Tahoe got some functionality.<br>
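<div>For what it's worth, if the filter does end up being written, a prefix-aware check avoids the string comparison entirely.  A minimal sketch using Python's standard ipaddress module; is_link_local is a hypothetical helper and the addresses are examples:<br></div>
<pre>
```python
import ipaddress

def is_link_local(addr):
    """True for fe80::/10 and 169.254.0.0/16, ignoring any %scope suffix."""
    bare = addr.split("%", 1)[0]
    return ipaddress.ip_address(bare).is_link_local

candidates = ["2001:db8::1", "fe80::1%eth0", "febf::1", "fec0::1", "169.254.1.1"]
routable = [addr for addr in candidates if not is_link_local(addr)]
print(routable)
```
</pre>
<div>Note that febf::1 is still inside fe80::/10, which is exactly the case a 4-character string comparison would miss.<br></div>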

<br></div><div class="gmail_extra">Would foolscap be able to hide this away for all hosts that don't just work with link local addresses?  Is it possible to just have foolscap figure out if Tahoe (or any foolscap service) is on any interface with that Link-Local address?<br>

</div><div class="gmail_extra"><br></div><div class="gmail_extra">I think I now understand why getaddrinfo returns a 4-tuple address for IPv6 when it just spits out a 2-tuple for IPv4.  The documentation says that "Note, however, omission of <em>scopeid</em> can cause problems

in manipulating scoped IPv6 addresses."  Such a small note, such a big issue.<br><br></div><div class="gmail_extra">Once again, thanks for helping think through this.  This is far bigger an issue than I could have imagined.  I feel so small. :-)<br>

<br></div><div class="gmail_extra">-Randall<br></div></div></div></div></div>