#1657 new enhancement

Sneakernet grid scenario — at Version 12

Reported by: amontero Owned by: nobody
Priority: normal Milestone: undecided
Component: code-network Version: 1.8.3
Keywords: sneakernet repair location Cc: alsuren, mmoya@…
Launchpad Bug:

Description (last modified by amontero)

Hi all.

I'm trying to achieve a familynet/sneakernet grid.

As I learn more about Tahoe, I'm still trying to work around some issues in order to reach my goals. I've created this ticket to keep track of those issues and the relevant tickets. Later, as I get advice, comments, or suggestions, I will spawn more detailed tickets as necessary and keep them tracked here as well.

Use case

A family grid for reciprocally storing each member's personal files (mostly photos). I will be the sole admin of the grid because the other members lack the skills to manage it.

As the grid admin, I created a root: dir in which each user's home dir is a subdirectory. The users will store backups of their pictures in their home dirs via "tahoe backup". So keeping the root dir healthy across all members is enough to keep those backups safe.

I set up an introducer on the public Internet and a local storage node at each grid member's location. An important point to note is that most of the time, when users do their backups, their local node will be the only node present on the grid. So I lowered "shares.happy" to 1. The other parameters keep their values from the default 3/7/10 (needed/happy/total). Hence the 'sneakernet' grid name.
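The settings described above could look roughly like the following. This is a minimal sketch, assuming the parameters live in the [client] section of each node's tahoe.cfg under the names shares.needed, shares.happy and shares.total. The TAHOE_CFG excerpt and the read_encoding_params helper are hypothetical and exist only for illustration.

```python
import configparser

# Hypothetical excerpt of a node's tahoe.cfg for this scenario:
# only shares.happy differs from the default 3/7/10.
TAHOE_CFG = """
[client]
shares.needed = 3
shares.happy = 1
shares.total = 10
"""


def read_encoding_params(cfg_text):
    """Return (needed, happy, total) parsed from tahoe.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    client = parser["client"]
    return (
        client.getint("shares.needed"),
        client.getint("shares.happy"),
        client.getint("shares.total"),
    )


if __name__ == "__main__":
    needed, happy, total = read_encoding_params(TAHOE_CFG)
    print(f"needed={needed} happy={happy} total={total}")
```

With shares.happy set to 1, an upload can succeed even when the local node is the only server reachable, which is the situation described above.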

I do the replication work manually when two nodes rendezvous, which is the only time they have a direct (i.e. LAN) connection. For example, my brother keeps his node on an external USB drive and brings it with him to my home. My computer is a desktop, but my dad's is a laptop, and so on.

Members will bring their "home" nodes, carrying their latest backups, to another member's node and run a repair, so that the other member's backups also get replicated in the exchange.
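The repair step at a rendezvous could be scripted along these lines. This is only a sketch: it assumes that "root:" is an alias for the root dir on the node running the command, and that the tahoe CLI is on the PATH. The build_deep_check_command helper is hypothetical; it only assembles the command line and does not run it.

```python
import shlex


def build_deep_check_command(target="root:", verify=True, repair=True):
    """Assemble a 'tahoe deep-check' command line as an argument list.

    'target' is assumed to be an alias (or cap) for the root dir.
    """
    cmd = ["tahoe", "deep-check"]
    if verify:
        cmd.append("--verify")  # download and check every share
    if repair:
        cmd.append("--repair")  # repair any file found unhealthy
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run() once both nodes can see each other.
    print(shlex.join(build_deep_check_command()))
```

Printing the command first makes it easy to confirm the target before starting a long repair over the LAN link.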

Issues

Here are some issues and how I'm addressing them:

  1. Storage use: I don't want any node to store a full set of shares, since that adds no security to the grid and wastes space. I want each member to hold x+1 shares, where x is enough for the file to be readable from that single node. For now I accomplish this by fully repairing the grid on that node in isolation and later pruning the stored shares down to the desired count with a script (it's dirty, but it works). Thinking about it, I've come to the conclusion that a 'hold-no-more-than-Z-shares' kind of setting for storage nodes would help me a lot. Ticket #711 would also be useful. See also #1340 and the comment on #1212.
  2. Repairing: Related to the above, I always have to ensure that no repair ends with all shares on the same node. So before doing a repair between 2 nodes I ensure that each isolated node is 100% repaired (10/10) and all files are healthy. Then I 'prune' the stored shares to 5, and only then can I do a 2-node verify/repair. I know this is very inefficient, so any advice on how to improve it is welcome.
  3. Verification: I would like to add a crontab entry on each node that runs a deep-check with verification on the root verifycap, but currently I can't because of #568. So I'm keeping an eye on that ticket.
  4. Verification: In my usage scenario, a file should count as healthy if it is readable from the local node alone, or the criterion should be configurable in some way. Related issues: #614, #1212.
  5. Verification caps: I also planned to ease the verification/repair process via the WUI by linking the root verifycap into each user's home dir, but the WUI gives me an error when I attempt it. I also plan to use this to establish a "reciprocity list" for each user. That is, if I grow the grid to include outsiders and I don't want them to hold some users' home dirs, a "verifycaps" folder containing the verifycaps of the desired users' home dirs will do. In both cases, members and outsiders, they just have to deep-check-repair their home's verifycaps dir.
  6. Helper: Another idea I've had is a helper node that could "spool" the shares until they have been pushed to at least X different nodes or until a configurable expiration. Since the helper would be accessible by everyone, that would mitigate the isolation effect when doing backups. This could be useful for more use cases, IMO.
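The pruning step from item 1 could be sketched as below. This is not the script mentioned above, only an illustration. It assumes a storage node keeps its shares under NODEDIR/storage/shares/PREFIX/STORAGEINDEX/SHARENUM, and it arbitrarily keeps the lowest-numbered shares. The plan_prune helper is hypothetical and is a dry run: it only lists what would be removed.

```python
import tempfile
from pathlib import Path


def plan_prune(shares_root, keep):
    """List share files that exceed 'keep' shares per storage index.

    Nothing is deleted; the caller decides what to do with the result.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    to_remove = []
    for prefix_dir in sorted(Path(shares_root).iterdir()):
        # Skip stray files and the directory used for in-progress uploads.
        if not prefix_dir.is_dir() or prefix_dir.name == "incoming":
            continue
        for si_dir in sorted(p for p in prefix_dir.iterdir() if p.is_dir()):
            # Share files are assumed to be named by their share number.
            shares = sorted(
                (f for f in si_dir.iterdir()
                 if f.is_file() and f.name.isdigit()),
                key=lambda f: int(f.name),
            )
            to_remove.extend(shares[keep:])
    return to_remove


if __name__ == "__main__":
    # Demo on a throwaway directory tree with ten fake shares.
    with tempfile.TemporaryDirectory() as tmp:
        si_dir = Path(tmp) / "ab" / "abexamplestorageindex"
        si_dir.mkdir(parents=True)
        for shnum in range(10):
            (si_dir / str(shnum)).write_bytes(b"fake share")
        for path in plan_prune(tmp, keep=4):
            print("would remove", path.relative_to(tmp))
```

Keeping the function a dry run leaves the actual deletion as a separate, deliberate step, which seems safer on a node that holds the only copy of some shares.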

I've also read a lot of tickets about rebalancing issues and server distribution, but I doubt they fit my use case. And since I'm not a Python programmer, I think bite-sized, simpler issues will allow me to help test improvements and suggestions and get to a usable state sooner.

I'll keep adding issues as they come up. I know I'm trying to address too many issues in a single ticket, but I'm doing it to keep them organized in one place. I expect to get some starting tips or advice on improving my use case and will gladly open new tickets as needed to get into the details, referencing this ticket. Later, this ticket can serve as base documentation for those trying to achieve the same scenario.

Thanks in advance.

Change History (12)

comment:1 follow-up: Changed at 2012-01-16T06:32:49Z by zooko

Hi, amontero! Welcome. Thanks for the good ticket. Please see also https://tahoe-lafs.org/trac/tahoe-lafs/wiki/ServerSelection if you haven't already, and add a link from there back to this ticket. Thank you!

comment:2 in reply to: ↑ 1 Changed at 2012-01-16T19:35:46Z by amontero

Replying to zooko:

Hi, amontero! Welcome. Thanks for the good ticket. Please see also https://tahoe-lafs.org/trac/tahoe-lafs/wiki/ServerSelection if you haven't already, and add a link from there back to this ticket. Thank you!

Hi Zooko, thanks for the prompt response. I've linked this ticket in ServerSelection and in the UseCases page, too. I will wait some time for input on how I can improve my scenario. If there is not much room for improvement, I will spawn tickets as necessary to document the needed features.

comment:3 follow-up: Changed at 2012-01-16T22:57:40Z by davidsarah

I want each member to hold x+1 shares, where x is enough for the file to be readable from that single node.

How does this improve on replication (i.e. k = 1)? Replication is simpler.

comment:4 Changed at 2012-01-20T20:53:41Z by amontero

  • Description modified (diff)
  • Version changed from 1.9.0 to 1.8.3

comment:5 in reply to: ↑ 3 Changed at 2012-01-20T21:01:55Z by amontero

Replying to davidsarah:

I want each member to hold x+1 shares, where x is enough for the file to be readable from that single node.

How does this improve on replication (i.e. k = 1)? Replication is simpler.

Hi [...]

As the nodes will be isolated most of the time, the default k/happy/N of 3/7/10 doesn't work, so for now I'm using 3/1/10. I'm using 3/7 just because it is the recommended setting in the docs, and I think (maybe I'm wrong) it is good enough for my goals.

Do you mean setting 1/1/10? As I understand it, that would be the worst case for wasted space. I ran the test, just to be sure, and it is a tenfold increase in space requirements when uploading. The grid members have to do their backups in isolation and replicate to other nodes when they rendezvous.

Or do you mean some other values for k/N? You made me think about it, and since my goal seems to be a replication setup (with LAFS privacy), it may work. But I can see no difference apart from the expansion factor in, let's say, 1/1/2. I'm aware that, as the shares would be named either 0 or 1, scripted management would be somewhat simpler, but the annoyances when repairing would remain.

A downside I can think of with fewer shares is losing the ability to hold "one share more than needed" at a reasonable space cost. I would like to keep it, just to be covered against "bit flips", since I'll also be using external USB drives as nodes. However, that is secondary, and I can give up that ability if it makes things easier.

I've set up a testbed with 1/1/2, and as long as issues 1 and 2 above remain, it makes little difference when managing the grid nodes. So I would like to ask you to help me better understand which k/N parameters you mean.

Thanks.

Last edited at 2013-09-04T22:00:44Z by daira

comment:6 Changed at 2012-03-04T20:37:23Z by amontero

  • Description modified (diff)

comment:7 Changed at 2012-03-12T19:24:02Z by davidsarah

  • Keywords sneakernet repair location added

I meant, how is (k, (k+1)*H, (k+1)*Z) for k > 1 and (k+1) shares on each node, better than (1, H, Z)? They have approximately the same reliability, but (1, H, Z) is simpler and has a slightly lower expansion factor. The extra share on each node doesn't make much difference to reliability because bit flips are not much more likely than whole node failures.
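The expansion-factor comparison above can be checked with a little arithmetic. This is a sketch under stated assumptions: the triples are read as (needed, happy, total), Z is taken to be the number of nodes holding shares, and per-share overhead such as hashes is ignored. The function names are hypothetical.

```python
from fractions import Fraction


def expansion_factor(needed, total):
    """Stored bytes per byte of plaintext, ignoring share overhead."""
    return Fraction(total, needed)


def per_node_scheme(k, nodes):
    """Expansion of (k, ..., (k+1)*Z): k+1 shares on each of Z nodes."""
    return expansion_factor(k, (k + 1) * nodes)


def replication_scheme(nodes):
    """Expansion of (1, ..., Z): one full replica on each of Z nodes."""
    return expansion_factor(1, nodes)


if __name__ == "__main__":
    nodes = 4
    for k in (2, 3, 5):
        extra = per_node_scheme(k, nodes) - replication_scheme(nodes)
        print(f"k={k}: per-node scheme stores {float(extra):.2f}x more "
              f"than plain replication across {nodes} nodes")
```

The per-node scheme always costs (1 + 1/k) * Z against Z for replication, so the gap narrows as k grows but never closes. The same function gives the tenfold figure reported earlier for 1/1/10 and roughly 3.33 for the default 3-of-10 encoding.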

comment:8 Changed at 2012-03-29T19:12:37Z by davidsarah

  • Priority changed from major to normal

comment:9 Changed at 2012-08-07T17:50:14Z by alsuren

Sorry for the "me too" bug comment noise, but I can't find a way to subscribe to changes on the ticket.

The low upload bandwidth problem becomes painfully obvious when bootstrapping any cloud-based backup/sharing service (like Google Drive and Dropbox) with more than 10GB of data.

http://www.kickstarter.com/projects/joeyh/git-annex-assistant-like-dropbox-but-with-your-own is an attempt to solve a similar use-case.

comment:10 Changed at 2012-08-07T18:40:42Z by davidsarah

  • Cc alsuren added

comment:11 Changed at 2013-03-03T15:05:21Z by mmoya

  • Cc mmoya@… added

comment:12 Changed at 2013-11-30T20:15:55Z by amontero

  • Description modified (diff)

I've opened #2123, which would help get closer to addressing items 1, 2 and 4.
