[tahoe-dev] Dedicated LAFS nodes offer

Avi Freedman freedman at freedman.net
Fri Jul 26 16:52:23 UTC 2013


> Avi,
> 
> Congratulations on deploying this new service!

Thanks...

More congrats will be due to the LA team shortly on their upcoming
work.

We just got a rewriting proxy working that lets people access the
web interface on the introducer via a URL derived from the furl.
Hopefully we'll progress to sending coupons out this weekend or
Monday, but we have some docs to finish first.
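For the curious: the furl-to-URL mapping can be as simple as hashing
the furl into a stable slug. This is just a sketch of one plausible
scheme, not our actual code - the function names and the 16-character
slug length are made up:

```python
import hashlib


def furl_to_slug(furl):
    """Derive a stable, unguessable URL slug from a furl.

    The furl itself is a capability-like string, so we hash it
    rather than embed it directly in a public URL.
    """
    digest = hashlib.sha256(furl.encode("utf-8")).hexdigest()
    return digest[:16]


def gateway_url(base, furl):
    """Build the proxied web-interface URL for one customer."""
    return "%s/%s/" % (base.rstrip("/"), furl_to_slug(furl))
```

The proxy would then route requests under each slug to that
customer's node.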
 
> > We wanted to let everyone in the community know that Havenco
> > is launching storage (and VPN services) in beta this month, and the
> > storage service is going to offer LAFS nodes (like rentanode.nl
> > had been doing) as well as S3-compatible buckets.
> >
> 
> Nice!  The S3-compatible service is unrelated to LAFS, correct?

Completely unrelated, other than that we're pooling the 'usage'
across LAFS and buckets to make it easy for people to use either.

<snip>

> Presumably many S3 clients depend on different distributed update
> semantics than are provided by LAFS, though, so for many applications
> this may be an architectural mistake.  Still, writing the adaption
> from S3 client -> proxy -> LAFS webapi seems fairly straightforward.
> The proxy might be configured with a single writecap, and it may store
> the index of buckets to bucket contents as a LAFS directory, or that
> mapping may be represented outside of LAFS proper, especially if some
> kind of distributed update mechanism is in place.

Yep, that seems interesting.  Running the LAFS nodes on top of RAID
could be one way to eliminate resync/rebuild issues with something
like this.
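The path-mapping piece of that adapter really is straightforward.
Here's a sketch of the core translation, assuming the proxy holds a
single root writecap as you describe (names hypothetical; real S3
semantics like multipart upload and conditional PUTs need more than
a path mapping):

```python
import urllib.parse


def lafs_webapi_path(writecap, bucket, key):
    """Map an S3 bucket/key pair to a LAFS webapi path.

    Each bucket becomes a subdirectory under the proxy's root
    writecap, and each object key becomes a child name.  The key
    is percent-encoded so slashes don't create nested LAFS dirs.
    """
    quoted_key = urllib.parse.quote(key, safe="")
    return "/uri/%s/%s/%s" % (writecap, bucket, quoted_key)
```

The proxy would then issue GET/PUT requests against that path on a
local Tahoe gateway.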

> > The architecture we're running for LAFS to do accounting is one
> > that the LA team helped us validate - we're doing one Linux uid
> > per customer and running 10 tahoe procs each in a separate directory
> > across separate machines, and running RAID underneath so that we
> > can ignore the potential performance issues with client-side
> > resync right now.
> 
> Those uids / directories are for LAFS storage nodes, right?  So each
> storage node is in a separate LAFS grid, and each customer uses their
> own grid?  This makes sense for accounting: the provider can examine
> accordingly.

Yes, each uid/dir is for a LAFS node process.

And each node is part of a set of nodes all with the same uid on the
backend servers, which compose a grid privately for each customer.
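To make the layout concrete, here's a sketch of how provisioning one
customer's node directories might be planned - purely illustrative
(the `/srv/lafs` path and round-robin placement are my own
assumptions, not a description of our actual tooling):

```python
def plan_customer_nodes(customer, machines, nodes=10):
    """Spread one customer's node basedirs across backend machines.

    Returns (machine, basedir) pairs; each basedir would hold one
    tahoe process running under the customer's dedicated uid, and
    the set of them forms that customer's private grid.
    """
    plan = []
    for i in range(nodes):
        machine = machines[i % len(machines)]  # round-robin placement
        plan.append((machine, "/srv/lafs/%s/node%02d" % (customer, i)))
    return plan
```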

> This is also how Least Authority's TLoS3 (and soon to be announced
> descendent service) operate.  (Disclaimer: I work for Least Authority,
> although not directly on their products.  I did not vet anything in
> this email with anyone there and the views are my own, and obviously
> biased.)

LA advised us on this setup though we guessed our way to it.  I am
going to do a writeup on it once we've launched, and Zooko suggested
and I agree, it would be good to make a FAQ for LAFS service providers.

> It's a shame though, IMO, that all users cannot share a single grid in
> the service.  This could have two benefits: First, customers can share
> caps with each other; Second, the overhead for the service provider
> may be much lower (for example ~100 storage node hosts for tens of
> thousands of customers).

Yep, at roughly 100 MB of RAM per node process, that does limit the
number of users per set of 2-4 machines to 500 or so (depending on
RAM and actual usage, of course).

Also...  Having everyone's data chunks mixed together on the backend
doesn't necessarily add security, but it can.  If we as a provider
are forced to hand over the data we hold over time, someone could
see, with dedicated nodes, whether you have added content in a given
period, which is some amount of information to be giving up.

We toyed with the idea of automatically shutting down and starting
up the underlying processes, but I was concerned about the latencies
for procs to find the introducer, failures on first transactions,
etc.  At that point it seemed complex enough that we should just do
the Right Thing, as you suggest below, if we get a ton of users
actively using LAFS.  In the meantime, servers and RAM are cheap.

> It seems "the Right Way" to enable this is to implement the accounting
> roadmap merged into the open source trunk.  It is described here for
> all interested:
> 
> https://tahoe-lafs.org/trac/tahoe-lafs/browser/docs/proposed/accounting-overview.txt

<snip>

> If anyone reading this list is looking for what technology to donate
> time or funding to, I propose the accounting featureset for LAFS.
> IMO, this is more fundamental for scaling grid sizes or branching out
> into more use cases than any other proposed feature set.  Of course,
> distributed introduction and UI / usability features are crucial for
> wide spread adoption... but to me it seems accounting is the last
> infrastructural / incentive-engineering hurdle.

And it looks like the share placement is being addressed, which is
also a big deal - that's awesome...

http://markjberger.blogspot.com/2013/06/file-upload-in-tahoe-lafs.html

Then distributed introduction.

On the UI side, we get into a sticky wicket.

We are proxying to give people access to a web interface 'for them'
but of course for any US-based infrastructure we could be compelled
to log what they do with it.  So it's better if they use it on their
end.

Fast, supported, simple-to-use FUSE and synced folders with a shared
cap for all data underneath are probably the simplest UI.  Or running
the web interface locally.

I just don't see enough people (as a % of the masses) getting that
set up while it requires CLI fu.

> > We've got the basics of node and introducer setup going but would
> > love to have some LAFS users do some testing with us next week.
> > I'll make sure that anyone who does it can have 50GB free of
> > private backend for a few years; if you're on this list as an
> > enthusiast, dev, or potential dev we'd just like to support the
> > community so that LAFS can fulfill Zooko+team's vision of
> > empowering user freedom [he puts it much better than I could
> > though].
> 
> Well, I'm really booked these days, but I'd love to help with user
> testing.  How will it be organized?  I recommend sharing a voice
> connection and screen share and watching the users walk through the
> entire process.

The backend is getting fancied up but that should be done by Sunday.
I'll shoot you a code to try it out.  We can do a call, hangout,
skype, whatever.

> Again, as a Least Authority employee this would also be competitive
> research.  ;-)  However, I see a lot of potential for LAFS service
> providers cooperating.  For example, if a single service provider can
> operate a single grid for all customers, then how much more effort
> does it take to have a single grid across providers?

I think we have shared everything we think and plan with the LA team
or at least with Zooko so there shouldn't be any issue there.

And I am hoping that we can offer cross-provider grids sooner 
rather than later in a few different ways.

<snip>

> A service provider FAQ would be awesome.  Also, if people with
> experience operating friend nets put together a FAQ for that usage, it
> would be a great complement.
> 
> BTW, I have a high tolerance for "silly questions" since I ask many
> myself.  ;-)  In other words, don't be shy on the list or on IRC.

Sounds good.

> > If we get some new-user usage of LAFS, we'll also try to
> > summarize the questions we get about LAFS use and user/usability
> > feedback to the list every so often.
> 
> That would be invaluable.  On this list, sometimes I wonder how many
> people get tripped up that we never hear from.  OTOH, when people are
> paying for a service, and the service operator's next meal comes from
> the service, there's much stronger feedback.

Yes.  We wimped out a bit by providing an S3-compatible interface also,
but I thought we had to in order to be able to satisfy the masses.

It'd be better if LAFS grew in features and usability, though.  I
think the religion of the correctness of approaches sometimes
dominates the creativity needed to come up with expedient ways of
doing/using LAFS tech - ways that run parallel to, and are less
secure/private than, 'correct' usage.

> > In our limited testing the first thing that sysadminnish
> > people have asked for is the ability to have something like
> > ssh key passphrases for caps.
> 
> Interesting.  I wonder if something like this could be designed which
> also has the "stateless" property of caps, meaning the string of bytes
> is all that's necessary to access that content (assuming one's on the
> correct grid).
> 
> Maybe a passphrase could deterministically generate the cryptographic
> keys in the cap derivation chain?  This would not work for convergent
> encryption on immutables, but it may be possible for mutables or
> non-convergent immutables.

iiiinteresting....

That is exactly the kind of discussion I'd love to see - how to
implement something that many see as silly, but in as elegant and
property-preserving way as possible.
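For instance, your passphrase idea could look something like this -
a pure sketch of deterministic key derivation, not anything LAFS
actually implements (the function name, salt choice, and iteration
count are all my own assumptions):

```python
import hashlib


def keys_from_passphrase(passphrase, salt, n_keys=2):
    """Deterministically derive key material from a passphrase, as a
    stand-in for randomly generated keys in a cap derivation chain.

    Stateless in the sense discussed above: the passphrase plus a
    public salt (e.g. a grid identifier) regenerates the same keys
    anywhere.  As noted, this can't work for convergent immutables,
    whose keys come from the file content itself.
    """
    material = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                                   salt, 100000, dklen=32 * n_keys)
    # Slice the derived material into independent 32-byte keys.
    return [material[i * 32:(i + 1) * 32] for i in range(n_keys)]
```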

But - we have 5 data points so we'll gather more before determining
that it's something to jump up and down about.

> Regards,
> nejucomo

Thanks for the details!

Also -

Anyone who will be at Defcon can come and ask us for a free LAFS
t-shirt (w LA and Havenco logos on the back).

If there are enough ppl we can do something social && || geeky.

Thanks,

Avi


