[tahoe-dev] Distributed Introduction: A closer look at Comment 11 Algorithm

Tue Apr 6 08:03:41 PDT 2010

On Tue, Apr 6, 2010 at 12:09 AM, M O Faruque Sarker
<writefaruq at gmail.com> wrote:
>
> 1. How do nodes (or peer introducers) come to know about the
> subscribable introducer(s) in the first place?

The same way that nodes come to learn about the SPoF introducer in the
current introduction scheme -- the user manually enters the
introducer's furl into the tahoe.cfg file.
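To make that concrete, here is a minimal sketch of what such an entry looks like and how a script could read it back. It assumes the furl lives under a `[client]` section as `introducer.furl`, which is how I understand the tahoe.cfg layout. The furl value is a placeholder, and `read_introducer_furl` is a hypothetical helper for illustration, not part of Tahoe-LAFS.

```python
import configparser

# Hypothetical tahoe.cfg contents. The furl below is a placeholder,
# not a real introducer address.
EXAMPLE_TAHOE_CFG = """
[client]
introducer.furl = pb://TUBID@HOST:PORT/SWISSNUM
"""


def read_introducer_furl(cfg_text):
    """Return the introducer furl found in the given tahoe.cfg text.

    Illustrative helper only: it assumes the furl is stored under
    [client] as introducer.furl.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("client", "introducer.furl")


furl = read_introducer_furl(EXAMPLE_TAHOE_CFG)
print(furl)
```

Under distributed introduction the user would presumably enter one or more such furls the same way. How multiple entries would be spelled in the file is not settled here.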

> 2. Can the additional communication overhead degrade the overall
> network performance when an additional number of introducer(s) are
> placed in the network?

The communication load on each introduction server is K log N where K
is the number of introduction clients and N is the number of
introduction servers.
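As a rough back-of-the-envelope aid, the K log N estimate can be tabulated for a few grid sizes. This is only a sketch: the base of the logarithm is not specified above, so base 2 is an assumption here, the result is a unitless growth figure rather than a message count, and `per_server_load` is a hypothetical name.

```python
import math


def per_server_load(num_clients, num_servers):
    """Estimate the load on one introduction server as K * log2(N).

    K is the number of introduction clients and N is the number of
    introduction servers. Base 2 is an assumption; a different base
    only changes the result by a constant factor.
    """
    if num_clients < 0 or num_servers < 1:
        raise ValueError("need num_clients >= 0 and num_servers >= 1")
    return num_clients * math.log2(num_servers)


# Tabulate the estimate for a fixed client count and a growing
# number of introduction servers.
for n in (1, 2, 8, 64, 1024):
    print(n, per_server_load(1000, n))
```

Note that the formula gives zero for N = 1, so it should be read as describing how the load grows with N, not as an absolute cost. Real measurements would be needed for that.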

Another way to phrase your question is: how large a grid can this
scheme support before the communication load becomes a significant
cost?

I would guess that there are other parts of the Tahoe-LAFS
architecture that would fail to scale up before this scheme would fail
to scale up. :-)

However, I would encourage you to investigate this question, both
analytically and by experimentation or simulation.

Our practice in the Tahoe-LAFS project often follows this pattern:
first implement something stupid but functional, using test-driven and
documentation-driven development, then implement an improved
replacement for it which offers better behavior.

The advantage of doing it this way is that we have measurements of
correctness and performance which you can apply to the new, fancier,
version to tell whether it is *actually* an improvement over its
stupid predecessor in practical terms. :-)

> But this case is a heterogeneous one. Some nodes are introducers and
> some are clients to them. Some admin configures and maintains them as
> introducers (and hence gets paid :-). If one introducer disappears,
> other introducer(s) are still there to do the job. Perhaps the
> current policy of Tahoe-LAFS may prohibit us from making the role of
> all nodes identical. In that case some admin staff could be jobless :-(.

I think this might be conflating two different issues. Issue 1: is
there a distinction between clients and servers, or do all nodes play
both roles? Issue 2: do servers require manual administration by a
human?

I'm not convinced that these two issues are tied together. I think we
will end up with issue 1 == yes, there is a distinction between client
and server and not all nodes play both roles, and issue 2 == no, the
servers do not require manual administration.

I think the key to determining how much manual administration is
required is POLA -- the Principle of Least Authority. If introduction
servers have the power to deny access (access control ==
authorization) in addition to the power to introduce nodes to one
another (introduction) then they require a lot more manual
administration from humans. If the introducer is centralized (as it is
currently) then it can singlehandedly break availability (as it can
currently) which requires more manual administration from humans to
manage that.

So by decentralizing the availability issue without adding an access
control/authorization issue, we will lessen the requirement for human
administration on servers. That's the goal of #68.

Note that there is a much harder ticket, #295, for making a
distributed access control/authorization mechanism. Currently users
are abusing the SPoF nature of the single introducer to use it for
access control/authorization. They withhold knowledge of the
introducer's furl from people who shouldn't be able to use their
storage servers. Implementing #68 will make this strategy less
effective, increasing the need for someone to implement #295.

Faruq: you should go ahead and submit a Proposal on the GSoC web site
as soon as you can, even if your Proposal is incomplete or uncertain.
We can then edit the proposal to improve it after it is submitted.

Regards,

Zooko

http://tahoe-lafs.org/trac/tahoe-lafs/ticket/68 # implement distributed
introduction, remove Introducer as a single point of failure
http://tahoe-lafs.org/trac/tahoe-lafs/ticket/295 # distributed
authorization of access to nodes