Last change on this file was 7b1bfad, checked in by Itamar Turner-Trauring <itamar@…>, at 2021-01-06T18:39:52Z

Rip out FTP.

.. -*- coding: utf-8-with-signature -*-

=======================
The Tahoe Upload Helper
=======================

1. `Overview`_
2. `Setting Up A Helper`_
3. `Using a Helper`_
4. `Other Helper Modes`_

Overview
========

As described in the "Swarming Download, Trickling Upload" section of
:doc:`architecture`, Tahoe uploads require more bandwidth than downloads: you
must push the redundant shares during upload, but you do not need to retrieve
them during download. With the default 3-of-10 encoding parameters, this
means that an upload requires about 3.3 times the traffic of a download of
the same file.

Unfortunately, this "expansion penalty" occurs in the upstream direction,
which is exactly where most consumer DSL lines are slowest. Typical ADSL
lines get 8 times as much download capacity as upload capacity. When the
ADSL upstream penalty is combined with the expansion penalty, the result is
uploads that can take roughly 27 times (3.3 x 8) longer than downloads.

The "Helper" is a service that can mitigate the expansion penalty by
arranging for the client node to send data to a central Helper node instead
of sending it directly to the storage servers. The client sends ciphertext
to the Helper, so the security properties remain the same as with non-Helper
uploads. The Helper is responsible for applying the erasure encoding
algorithm and placing the resulting shares on the storage servers.

Of course, the Helper cannot mitigate the ADSL upstream penalty.
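The two penalties can be worked through numerically. The following is a
minimal sketch, assuming that the client's uplink is the only bottleneck and
that the Helper has ample bandwidth to the storage servers; the function
names are illustrative and are not part of Tahoe.

```python
from fractions import Fraction


def expansion_factor(needed: int, total: int) -> Fraction:
    """Upload traffic relative to file size for k-of-N erasure coding."""
    return Fraction(total, needed)


def upload_slowdown(needed: int, total: int, asymmetry: int,
                    use_helper: bool = False) -> float:
    """How many times longer an upload takes than a download of the same file.

    asymmetry is download capacity divided by upload capacity on the
    client's link. With a helper, the client pushes the ciphertext once
    (1x) and the helper pays the expansion penalty instead.
    """
    expansion = Fraction(1) if use_helper else expansion_factor(needed, total)
    return float(expansion * asymmetry)


direct = upload_slowdown(3, 10, asymmetry=8)
helped = upload_slowdown(3, 10, asymmetry=8, use_helper=True)
print(f"direct upload: {direct:.1f}x slower than a download")
print(f"via helper:    {helped:.1f}x slower than a download")
```

Under these assumptions a direct upload is about 26.7 times slower than a
download, while an upload through a helper is 8 times slower: only the ADSL
asymmetry remains.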

The second benefit of using an upload helper is that clients who lose their
network connections while uploading a file (because of a network flap, or
because they shut down their laptop while an upload was in progress) can
resume their upload rather than needing to start again from scratch. The
helper holds the partially-uploaded ciphertext on disk, and when the client
tries to upload the same file a second time, it discovers that the partial
ciphertext is already present. The client then only needs to upload the
remaining ciphertext. This reduces the "interrupted upload penalty" to a
minimum.

A helper also reduces the number of active connections between the client
and the outside world: most of the client's traffic flows over a single TCP
connection to the helper. This can improve TCP fairness, and should allow
other applications that are sharing the same uplink to compete more evenly
for the limited bandwidth.

Setting Up A Helper
===================

Who should consider running a helper?

* Benevolent entities that wish to provide better upload speed for clients
  that have slow uplinks
* Folks who have machines with upload bandwidth to spare
* Server grid operators who want clients to connect to a small number of
  helpers rather than a large number of storage servers (a "multi-tier"
  architecture)

What sorts of machines are good candidates for running a helper?

* The Helper needs to have good bandwidth to the storage servers. In
  particular, it needs to have at least 3.3x better upload bandwidth than
  the client does, or the client might as well upload directly to the
  storage servers. In a commercial grid, the helper should be in the same
  colo (and preferably in the same rack) as the storage servers.
* The Helper will take on most of the CPU load involved in uploading a file,
  so a dedicated machine will give better results.
* The Helper buffers ciphertext on disk, so the host needs enough free disk
  space to hold the ciphertext of all simultaneous uploads. When an upload
  is interrupted, that space stays in use for a longer period of time.

To turn a Tahoe-LAFS node into a helper (i.e. to run a helper service in
addition to whatever else that node is doing), edit the ``tahoe.cfg`` file in
your node's base directory and set ``enabled = true`` in the section named
``[helper]``.
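For scripted setups, the same change can be made programmatically. This is a
rough sketch that assumes ``tahoe.cfg`` can be parsed by Python's standard
``configparser`` module; the ``enable_helper`` function is hypothetical and
not part of Tahoe. Editing the file by hand is equally valid.

```python
import configparser
import tempfile
from pathlib import Path


def enable_helper(cfg_path: Path) -> None:
    """Set 'enabled = true' in the [helper] section of a tahoe.cfg file."""
    # interpolation=None leaves any '%' characters in existing values alone.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(cfg_path, encoding="utf-8")
    if not cfg.has_section("helper"):
        cfg.add_section("helper")
    cfg.set("helper", "enabled", "true")
    # Caveat: configparser discards comments when it rewrites the file.
    with cfg_path.open("w", encoding="utf-8") as f:
        cfg.write(f)


# Demonstrate on a throwaway file rather than a real node directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_cfg = Path(tmp) / "tahoe.cfg"
    demo_cfg.write_text("[node]\nnickname = demo\n", encoding="utf-8")
    enable_helper(demo_cfg)
    print(demo_cfg.read_text(encoding="utf-8"))
```

Because rewriting the file drops comments, a hand edit may be preferable for
a configuration file that is heavily annotated.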

Then restart the node. This will signal the node to create a Helper service
and listen for incoming requests. Once the node has started, there will be a
file named ``private/helper.furl`` which contains the contact information for
the helper: you will need to give this FURL to any clients that wish to use
your helper.

::

  mail -s "helper furl" friend@example.com < "$BASEDIR/private/helper.furl"

You can tell if your node is running a helper by looking at its web status
page. Assuming that you've set up the 'webport' to use port 3456, point your
browser at ``http://localhost:3456/``. The welcome page will say "Helper: 0
active uploads" or "Not running helper" as appropriate. The
``http://localhost:3456/helper_status`` page will also provide details on
what the helper is currently doing.

The helper will store the ciphertext that it is fetching from clients in
``$BASEDIR/helper/CHK_incoming/``. Once all the ciphertext has been fetched,
it will be moved to ``$BASEDIR/helper/CHK_encoding/`` and erasure-coding will
commence. Once the file is fully encoded and the shares are pushed to the
storage servers, the ciphertext file will be deleted.

If a client disconnects while the ciphertext is being fetched, the partial
ciphertext will remain in ``CHK_incoming/`` until the client reconnects and
finishes sending it. If a client disconnects while the ciphertext is being
encoded, the data will remain in ``CHK_encoding/`` until the client
reconnects and encoding is finished. For long-running and busy helpers, it
may be a good idea to delete files in these directories that have not been
modified for a week or two. Future versions of Tahoe will try to self-manage
these files a bit better.
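The manual cleanup suggested above could be scripted along the following
lines. This is only a sketch: it assumes the ciphertext files sit directly
inside the two directories, and both the ``stale_files`` function and the
two-week default are illustrative choices, not Tahoe behaviour.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def stale_files(helper_dir, max_age_days=14.0):
    """Return files in CHK_incoming/ and CHK_encoding/ not modified recently."""
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    found = []
    for name in ("CHK_incoming", "CHK_encoding"):
        subdir = Path(helper_dir) / name
        if not subdir.is_dir():
            continue
        for path in subdir.iterdir():
            if path.is_file() and path.stat().st_mtime < cutoff:
                found.append(path)
    return sorted(found)


# Demonstration in a temporary directory; the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    incoming = Path(tmp) / "CHK_incoming"
    incoming.mkdir()
    old, fresh = incoming / "old-upload", incoming / "fresh-upload"
    old.write_bytes(b"partial ciphertext")
    fresh.write_bytes(b"partial ciphertext")
    three_weeks_ago = time.time() - 21 * SECONDS_PER_DAY
    os.utime(old, (three_weeks_ago, three_weeks_ago))
    for path in stale_files(tmp):
        # A real cleanup job could call path.unlink() here.
        print("stale:", path.name)
```

Listing the candidates before deleting anything keeps the operator in
control; pointing the function at ``$BASEDIR/helper`` would cover both
directories described above.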

Using a Helper
==============

Who should consider using a Helper?

* clients with limited upstream bandwidth, such as a consumer ADSL line
* clients who believe that the helper will give them faster uploads than
  they could achieve with a direct upload
* clients who experience problems with TCP connection fairness, that is,
  other programs or machines in the same home are getting less than their
  fair share of upload bandwidth. If the connection is being shared fairly,
  then a Tahoe upload that is happening at the same time as a single SFTP
  upload should get half the bandwidth.
* clients who have been given the helper.furl by someone who is running a
  Helper and is willing to let them use it

To take advantage of somebody else's Helper, take the helper FURL that they
give you, and edit your ``tahoe.cfg`` file. Enter the helper's FURL as the
value of the ``helper.furl`` key in the ``[client]`` section of
``tahoe.cfg``, as described in the "Client Configuration" section of
:doc:`configuration`.
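A quick way to confirm that the setting was saved is to parse the file and
look for the key. The sketch below assumes ``tahoe.cfg`` is readable by
Python's ``configparser``; the function name is hypothetical and the FURL
shown is a made-up placeholder, not a working address.

```python
import configparser


def configured_helper_furl(cfg_text: str):
    """Return the helper.furl value from the [client] section, or None."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(cfg_text)
    furl = cfg.get("client", "helper.furl", fallback="").strip()
    return furl or None


# Placeholder configuration; substitute the FURL you were actually given.
example_cfg = """
[client]
helper.furl = pb://EXAMPLE-TUBID@helper.example.com:12345/EXAMPLE-SWISSNUM
"""

print(configured_helper_furl(example_cfg))
```

A result of ``None`` means that no helper is configured, in which case
uploads go directly to the storage servers.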

Then restart the node. This will signal the client to try to connect to the
helper. Subsequent uploads will use the helper rather than using direct
connections to the storage servers.

If the node has been configured to use a helper, that node's HTTP welcome
page (``http://localhost:3456/``) will say "Helper: $HELPERFURL" instead of
"Helper: None". If the helper is actually running and reachable, the bullet
to the left of "Helper" will be green.

The helper is optional. If a helper is connected when an upload begins, the
upload will use the helper. If there is no helper connection present when an
upload begins, that upload will connect directly to the storage servers. The
client will automatically attempt to reconnect to the helper if the
connection is lost, using the same exponential-backoff algorithm as all other
tahoe/foolscap connections.
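For readers unfamiliar with exponential backoff, the idea can be sketched as
follows. The initial delay, growth factor and cap used here are hypothetical
values chosen for illustration; they are not Foolscap's actual parameters,
and real implementations usually add random jitter as well.

```python
import itertools


def backoff_delays(initial=1.0, factor=2.0, max_delay=3600.0):
    """Yield successive wait times (in seconds) between reconnection attempts."""
    delay = initial
    while True:
        yield delay
        # Grow the wait after every failed attempt, up to a fixed ceiling.
        delay = min(delay * factor, max_delay)


# First few waits with the illustrative defaults above.
print(list(itertools.islice(backoff_delays(), 8)))
```

Each failed attempt lengthens the wait before the next one, so an
unreachable helper is retried often at first and progressively less often
afterwards.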

The upload/download status page (``http://localhost:3456/status``) will
report whether each upload is using the helper, in the "Helper?" column.

Other Helper Modes
==================

The Tahoe Helper currently helps with only one kind of operation: uploading
immutable files. There are three other things it might be able to help with
in the future:

* downloading immutable files
* uploading mutable files (such as directories)
* downloading mutable files (such as directories)

Since mutable files are currently limited in size, the ADSL upstream penalty
is not so severe for them. There is no ADSL penalty to downloads, but there
may still be benefit to extending the helper interface to assist with them:
fewer connections to the storage servers, and better TCP fairness.

A future version of the Tahoe helper might provide assistance with these
other modes. If it were to help with all four modes, then the clients would
not need direct connections to the storage servers at all: clients would
connect to helpers, and helpers would connect to servers. For a large grid
with tens of thousands of clients, this might make the grid more scalable.
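The scalability argument comes down to counting connections. The following
back-of-the-envelope sketch assumes that, without helpers, every client
connects to every storage server, and that, with helpers, each client uses
exactly one helper while every helper connects to every server. The server
and helper counts are hypothetical.

```python
def connection_counts(clients: int, servers: int, helpers: int):
    """Return (direct, tiered) totals of long-lived connections in a grid."""
    direct = clients * servers            # every client talks to every server
    tiered = clients + helpers * servers  # clients -> helpers -> servers
    return direct, tiered


direct, tiered = connection_counts(clients=20_000, servers=100, helpers=10)
print(f"direct: {direct:,} connections")
print(f"tiered: {tiered:,} connections")
```

With these illustrative numbers the multi-tier layout needs 21,000
connections instead of 2,000,000.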