.. -*- coding: utf-8-with-signature -*-

=======================
Tahoe-LAFS Architecture
=======================

1. `Overview`_
2. `The Key-Value Store`_
3. `File Encoding`_
4. `Capabilities`_
5. `Server Selection`_
6. `Swarming Download, Trickling Upload`_
7. `The File Store Layer`_
8. `Leases, Refreshing, Garbage Collection`_
9. `File Repairer`_
10. `Security`_
11. `Reliability`_


Overview
========

(See the `docs/specifications directory`_ for more details.)

There are three layers: the key-value store, the file store, and the
application.

The lowest layer is the key-value store. The keys are "capabilities" -- short
ASCII strings -- and the values are sequences of data bytes. This data is
encrypted and distributed across a number of nodes, such that it will survive
the loss of most of the nodes. There are no hard limits on the size of the
values, but there may be performance issues with extremely large values (just
due to the limitation of network bandwidth). In practice, values as small as
a few bytes and as large as tens of gigabytes are in common use.

The middle layer is the decentralized file store: a directed graph in which
the intermediate nodes are directories and the leaf nodes are files. The leaf
nodes contain only the data -- they contain no metadata other than the length
in bytes. The edges leading to leaf nodes have metadata attached to them
about the file they point to. Therefore, the same file may be associated with
different metadata if it is referred to through different edges.

The top layer consists of the applications using the file store.
Allmydata.com used it for a backup service: the application periodically
copied files from the local disk onto the decentralized file store, and later
provided read-only access to those files, allowing users to recover them.
There are several other applications built on top of the Tahoe-LAFS
file store (see the RelatedProjects_ page of the wiki for a list).

.. _docs/specifications directory: https://github.com/tahoe-lafs/tahoe-lafs/tree/master/docs/specifications
.. _RelatedProjects: https://tahoe-lafs.org/trac/tahoe-lafs/wiki/RelatedProjects

The Key-Value Store
===================

The key-value store is implemented by a grid of Tahoe-LAFS storage servers --
user-space processes. Tahoe-LAFS storage clients communicate with the storage
servers over TCP.

There are two supported protocols:

* Foolscap, the only supported protocol in releases before v1.19.
* HTTPS, new in v1.19.

By default HTTPS is enabled. When HTTPS is enabled on the server, the server
transparently listens for both Foolscap and HTTPS on the same port. When it is
disabled, the server only supports Foolscap. Clients can use either; by default
they will use HTTPS when possible, falling back to Foolscap, but when configured
appropriately they will only use Foolscap. At this time the only limitation of
HTTPS is that I2P is not supported, so any usage of I2P uses only Foolscap.

Storage servers hold data in the form of "shares". Shares are encoded pieces
of files. There are a configurable number of shares for each file, 10 by
default. Normally, each share is stored on a separate server, but in some
cases a single server can hold multiple shares of a file.

Nodes learn about each other through an "introducer". Each server connects to
the introducer at startup and announces its presence. Each client connects to
the introducer at startup, and receives a list of all servers from it. Each
client then connects to every server, creating a "bi-clique" topology. In the
current release, nodes behind NAT boxes will connect to all nodes that they
can open connections to, but they cannot open connections to other nodes
behind NAT boxes. Therefore, the more nodes behind NAT boxes, the less the
topology resembles the intended bi-clique topology.

The introducer is a Single Point of Failure ("SPoF"), in that clients who
never connect to the introducer will be unable to connect to any storage
servers, but once a client has been introduced to everybody, it does not need
the introducer again until it is restarted. The danger of a SPoF is further
reduced in two ways. First, the introducer is defined by a hostname and a
private key, which are easy to move to a new host in case the original one
suffers an unrecoverable hardware problem. Second, even if the private key is
lost, clients can be reconfigured to use a new introducer.

For future releases, we have plans to decentralize introduction, allowing any
server to tell a new client about all the others.


File Encoding
=============

When a client stores a file on the grid, it first encrypts the file. It then
breaks the encrypted file into small segments, in order to reduce the memory
footprint, and to decrease the lag between initiating a download and
receiving the first part of the file; for example the lag between hitting
"play" and a movie actually starting.

The client then erasure-codes each segment, producing blocks of which only a
subset are needed to reconstruct the segment (3 out of 10, with the default
settings).

It sends one block from each segment to a given server. The set of blocks on
a given server constitutes a "share". Therefore a subset of the shares (3 out
of 10, by default) are needed to reconstruct the file.

A hash of the encryption key is used to form the "storage index", which is
used for both server selection (described below) and to index shares within
the Storage Servers on the selected nodes.

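The sketch below illustrates this pipeline in Python; it is not the
Tahoe-LAFS implementation. The helper names are hypothetical, the storage
index is shown as a plain truncated hash (the real derivation uses a tagged
hash), and ``erasure_encode`` is only a stand-in for a real k-of-N coder such
as the ``zfec`` library used by Tahoe-LAFS::

    import hashlib
    import os

    def derive_storage_index(encryption_key: bytes) -> bytes:
        # A hash of the encryption key, truncated to a short identifier.
        return hashlib.sha256(b"storage-index:" + encryption_key).digest()[:16]

    def segments(ciphertext: bytes, segment_size: int = 128 * 1024):
        # Break the ciphertext into fixed-size segments (the last may be short).
        for i in range(0, len(ciphertext), segment_size):
            yield ciphertext[i:i + segment_size]

    def erasure_encode(segment: bytes, k: int = 3, n: int = 10):
        # Stand-in only: a real coder returns n blocks, any k of which are
        # enough to rebuild the segment. Here we just slice and pad to show
        # the data layout; the padding blocks carry no real redundancy.
        block_len = -(-len(segment) // k)            # ceiling division
        primary = [segment[i * block_len:(i + 1) * block_len].ljust(block_len, b"\0")
                   for i in range(k)]
        return primary + [b"\0" * block_len] * (n - k)

    encryption_key = os.urandom(16)
    storage_index = derive_storage_index(encryption_key)
    shares = {sharenum: [] for sharenum in range(10)}
    for seg in segments(b"...ciphertext bytes..."):
        for sharenum, block in enumerate(erasure_encode(seg)):
            # The share sent to one server is the list of blocks that carry
            # the same share number, one block per segment.
            shares[sharenum].append(block)
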
The client computes secure hashes of the ciphertext and of the shares. It
uses `Merkle Trees`_ so that it is possible to verify the correctness of a
subset of the data without requiring all of the data. For example, this
allows you to verify the correctness of the first segment of a movie file and
then begin playing the movie file in your movie viewer before the entire
movie file has been downloaded.

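To make that concrete, here is a minimal Merkle tree over segment hashes and
a check of one leaf against the root. It is an illustration of the idea only;
the tag strings and tree layout do not match the Tahoe-LAFS hash structure::

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(leaves):
        # Build the tree bottom-up; duplicate the last node on odd-sized levels.
        level = [h(b"leaf:" + leaf) for leaf in leaves]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])
            level = [h(b"node:" + level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

    segments = [b"segment-0", b"segment-1", b"segment-2", b"segment-3"]
    root = merkle_root(segments)

    # Verifying one downloaded segment needs only the hashes on its path to
    # the root (supplied alongside the share), not the other segments.
    sibling_hashes = [h(b"leaf:" + segments[1]),
                      h(b"node:" + h(b"leaf:" + segments[2])
                                 + h(b"leaf:" + segments[3]))]
    recomputed = h(b"node:" + h(b"node:" + h(b"leaf:" + segments[0])
                                         + sibling_hashes[0])
                             + sibling_hashes[1])
    assert recomputed == root
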
These hashes are stored in a small datastructure named the Capability
Extension Block which is stored on the storage servers alongside each share.

The capability contains the encryption key, the hash of the Capability
Extension Block, and any encoding parameters necessary to perform the
eventual decoding process. For convenience, it also contains the size of the
file being stored.

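Rendered as a string, an immutable read capability therefore carries a small,
fixed set of fields. The sketch below shows the general shape only; the exact
field encodings are defined in the specifications, and ``make_chk_cap`` is a
hypothetical helper, not a Tahoe-LAFS API::

    from base64 import b32encode

    def b32(data: bytes) -> str:
        # Tahoe-LAFS uses an unpadded lowercase base32 alphabet; this is close
        # enough for illustration.
        return b32encode(data).decode("ascii").rstrip("=").lower()

    def make_chk_cap(key: bytes, ueb_hash: bytes, k: int, n: int, size: int) -> str:
        # Encryption key, hash of the Capability Extension Block, encoding
        # parameters, and file size -- everything needed to fetch and decode.
        return f"URI:CHK:{b32(key)}:{b32(ueb_hash)}:{k}:{n}:{size}"

    print(make_chk_cap(b"\x01" * 16, b"\x02" * 32, 3, 10, 1048576))
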
To download, a client turns a capability back into a sequence of bytes: it
obtains the blocks from storage servers, uses erasure-decoding to turn them
into segments of ciphertext, uses the decryption key to convert that into
plaintext, and then emits the plaintext bytes to the output target.

.. _`Merkle Trees`: http://systems.cs.colorado.edu/grunwald/Classes/Fall2003-InformationStorage/Papers/merkle-tree.pdf


Capabilities
============

Capabilities to immutable files represent a specific set of bytes. Think of
it like a hash function: you feed in a bunch of bytes, and you get out a
capability, which is deterministically derived from the input data: changing
even one bit of the input data will result in a completely different
capability.

Read-only capabilities to mutable files represent the ability to get a set of
bytes representing some version of the file, most likely the latest version.
Each read-only capability is unique. In fact, each mutable file has a unique
public/private key pair created when the mutable file is created, and the
read-only capability to that file includes a secure hash of the public key.

Read-write capabilities to mutable files represent the ability to read the
file (just like a read-only capability) and also to write a new version of
the file, overwriting any extant version. Read-write capabilities are unique
-- each one includes the secure hash of the private key associated with that
mutable file.

The capability provides both "location" and "identification": you can use it
to retrieve a set of bytes, and then you can use it to validate ("identify")
that these potential bytes are indeed the ones that you were looking for.

The "key-value store" layer doesn't include human-meaningful names.
Capabilities sit on the "global+secure" edge of `Zooko's Triangle`_. They are
self-authenticating, meaning that nobody can trick you into accepting a file
that doesn't match the capability you used to refer to that file. The
file store layer (described below) adds human-meaningful names atop the
key-value layer.

.. _`Zooko's Triangle`: https://en.wikipedia.org/wiki/Zooko%27s_triangle


Server Selection
================

When a file is uploaded, the encoded shares are sent to some servers. But to
which ones? The "server selection" algorithm is used to make this choice.

The storage index is used to consistently-permute the set of all server nodes
(by sorting them by ``HASH(storage_index+nodeid)``). Each file gets a different
permutation, which (on average) will evenly distribute shares among the grid
and avoid hotspots. Each server has announced its available space when it
connected to the introducer, and we use that available space information to
remove any servers that cannot hold an encoded share for our file. Then we ask
some of the servers thus removed if they are already holding any encoded shares
for our file; we use this information later. (We ask any servers which are in
the first 2*``N`` elements of the permuted list.)

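A sketch of that permutation step, assuming hypothetical ``Server`` records
that carry the ``nodeid`` and the space announced to the introducer::

    import hashlib
    from dataclasses import dataclass

    @dataclass
    class Server:
        nodeid: bytes
        available_space: int

    def permute_servers(storage_index: bytes, servers, share_size: int, n: int = 10):
        # Sort every server by HASH(storage_index + nodeid): each storage index
        # yields a different, but stable, ordering of the same servers.
        permuted = sorted(
            servers,
            key=lambda s: hashlib.sha256(storage_index + s.nodeid).digest())
        # Drop servers that announced too little space to hold one share, but
        # keep the first 2*N entries so we can ask them about existing shares.
        writable = [s for s in permuted if s.available_space >= share_size]
        ask_about_existing = permuted[:2 * n]
        return writable, ask_about_existing
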
We then use the permuted list of servers to ask each server, in turn, if it
will hold a share for us (a share that was not reported as being already
present when we talked to the full servers earlier, and that we have not
already planned to upload to a different server). We plan to send a share to a
server by sending an 'allocate_buckets() query' to the server with the number
of that share. Some will say yes they can hold that share, others (those who
have become full since they announced their available space) will say no; when
a server refuses our request, we take that share to the next server on the
list. In the response to allocate_buckets() the server will also inform us of
any shares of that file that it already has. We keep going until we run out of
shares that need to be stored. At the end of the process, we'll have a table
that maps each share number to a server, and then we can begin the encode and
push phase, using the table to decide where each share should be sent.

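In outline, the negotiation looks like the following greedy loop. This is a
simplified sketch rather than the real uploader; ``allocate_buckets`` here is
an assumed callable that returns two sets: the share numbers the server
accepted, and the share numbers of this file it already holds::

    def place_shares(writable_servers, allocate_buckets, total_shares=10):
        # Build the table mapping each share number to the server that will
        # (or already does) hold it.
        share_to_server = {}
        remaining = set(range(total_shares))
        servers = list(writable_servers)
        while remaining and servers:
            for server in list(servers):
                if not remaining:
                    break
                sharenum = min(remaining)
                accepted, already_have = allocate_buckets(server, {sharenum})
                for num in accepted | already_have:
                    share_to_server[num] = server
                remaining -= accepted | already_have
                if sharenum not in accepted | already_have:
                    # Full or refusing servers are dropped from the permuted list.
                    servers.remove(server)
        return share_to_server
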
Most of the time, this will result in one share per server, which gives us
maximum reliability. If there are fewer writable servers than there are
unstored shares, we'll be forced to loop around, eventually giving multiple
shares to a single server.

If we have to loop through the node list a second time, we accelerate the query
process by asking each node to hold multiple shares on the second pass. In
most cases, this means we'll never send more than two queries to any given
node.

If a server is unreachable, or has an error, or refuses to accept any of our
shares, we remove it from the permuted list, so we won't query it again for
this file. If a server already has shares for the file we're uploading, we add
that information to the share-to-server table. This lets us do less work for
files which have been uploaded once before, while making sure we still wind up
with as many shares as we desire.

Before a file upload is called successful, it has to pass an upload health
check. For immutable files, we check to see that a condition called
'servers-of-happiness' is satisfied. When satisfied, 'servers-of-happiness'
assures us that enough pieces of the file are distributed across enough
servers on the grid to ensure that the availability of the file will not be
affected if a few of those servers later fail. For mutable files and
directories, we check to see that all of the encoded shares generated during
the upload process were successfully placed on the grid. This is a weaker
check than 'servers-of-happiness'; it does not consider any information about
how the encoded shares are placed on the grid, and cannot detect situations in
which all or a majority of the encoded shares generated during the upload
process reside on only one storage server. We hope to extend
'servers-of-happiness' to mutable files in a future release of Tahoe-LAFS. If,
at the end of the upload process, the appropriate upload health check fails,
the upload is considered a failure.

The current defaults use ``k`` = 3, ``servers_of_happiness`` = 7, and ``N`` = 10.
``N`` = 10 means that we'll try to place 10 shares. ``k`` = 3 means that we need
any three shares to recover the file. ``servers_of_happiness`` = 7 means that
we'll consider an immutable file upload to be successful if we can place shares
on enough servers that there are 7 different servers, the correct functioning
of any ``k`` of which guarantees the availability of the immutable file.

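These three parameters are set per client in ``tahoe.cfg``. A sketch of the
relevant section with the default values (see the configuration documentation
for the authoritative option descriptions)::

    [client]
    # k: how many shares are needed to reconstruct a file
    shares.needed = 3
    # servers_of_happiness: how well distributed the shares must be
    shares.happy = 7
    # N: how many shares to create in total
    shares.total = 10
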
``N`` = 10 and ``k`` = 3 means there is a 3.3x expansion factor. On a small grid, you
should set ``N`` about equal to the number of storage servers in your grid; on a
large grid, you might set it to something smaller to avoid the overhead of
contacting every server to place a file. In either case, you should then set ``k``
such that ``N``/``k`` reflects your desired availability goals. The best value for
``servers_of_happiness`` will depend on how you use Tahoe-LAFS. In a friendnet
with a variable number of servers, it might make sense to set it to the smallest
number of servers that you expect to have online and accepting shares at any
given time. In a stable environment without much server churn, it may make
sense to set ``servers_of_happiness`` = ``N``.

When downloading a file, the current version just asks all known servers for
any shares they might have. Once it has received enough responses that it
knows where to find the needed k shares, it downloads at least the first
segment from those servers. This means that it tends to download shares from
the fastest servers. If some servers had more than one share, it will continue
sending "Do You Have Block" requests to other servers, so that it can download
subsequent segments from distinct servers (sorted by their DYHB round-trip
times), if possible.

*future work*

A future release will use the server selection algorithm to reduce the
number of queries that must be sent out.

Other peer-node selection algorithms are possible. One earlier version
(known as "Tahoe 3") used the permutation to place the nodes around a large
ring, distributed the shares evenly around the same ring, then walked
clockwise from 0 with a basket. Each time it encountered a share, it put it
in the basket; each time it encountered a server, it gave that server as many
shares from the basket as it would accept. This reduced the number of queries
(usually to 1) for small grids (where ``N`` is larger than the number of
nodes), but resulted in extremely non-uniform share distribution, which
significantly hurt reliability (sometimes the permutation resulted in most
of the shares being dumped on a single node).

Another algorithm (known as "denver airport" [#naming]_) uses the permuted hash to
decide on an approximate target for each share, then sends lease requests
via Chord routing. The request includes the contact information of the
uploading node, and asks that the node which eventually accepts the lease
should contact the uploader directly. The shares are then transferred over
direct connections rather than through multiple Chord hops. Download uses
the same approach. This allows nodes to avoid maintaining a large number of
long-term connections, at the expense of complexity and latency.

.. [#naming] all of these names are derived from the location where they were
   concocted, in this case in a car ride from Boulder to DEN. To be
   precise, "Tahoe 1" was an unworkable scheme in which everyone who holds
   shares for a given file would form a sort of cabal which kept track of
   all the others, "Tahoe 2" is the first-100-nodes in the permuted hash
   described in this document, and "Tahoe 3" (or perhaps "Potrero hill 1")
   was the abandoned ring-with-many-hands approach.


Swarming Download, Trickling Upload
===================================

Because the shares being downloaded are distributed across a large number of
nodes, the download process will pull from many of them at the same time. The
current encoding parameters require 3 shares to be retrieved for each
segment, which means that up to 3 nodes will be used simultaneously. For
larger networks, 8-of-22 encoding could be used, meaning 8 nodes can be used
simultaneously. This allows the download process to use the sum of the
available nodes' upload bandwidths, resulting in downloads that take full
advantage of the common 8x disparity between download and upload bandwidth on
modern ADSL lines.

On the other hand, uploads are hampered by the need to upload encoded shares
that are larger than the original data (3.3x larger with the current default
encoding parameters), through the slow end of the asymmetric connection. This
means that on a typical 8x ADSL line, uploading a file will take about 26
times longer (the 3.3x expansion multiplied by the 8x bandwidth asymmetry)
than downloading it again later.

Smaller expansion ratios can reduce this upload penalty, at the expense of
reliability (see `Reliability`_, below). By using an "upload helper", this
penalty is eliminated: the client does a 1x upload of encrypted data to the
helper, then the helper performs encoding and pushes the shares to the
storage servers. This is an improvement if the helper has significantly
higher upload bandwidth than the client, so it makes the most sense for a
commercially-run grid for which all of the storage servers are in a colo
facility with high interconnect bandwidth. In this case, the helper is placed
in the same facility, so the helper-to-storage-server bandwidth is huge.

See :doc:`helper` for details about the upload helper.


The File Store Layer
====================

The "file store" layer is responsible for mapping human-meaningful pathnames
(directories and filenames) to pieces of data. The actual bytes inside these
files are referenced by capability, but the file store layer is where the
directory names, file names, and metadata are kept.

The file store layer is a graph of directories. Each directory contains a
table of named children. These children are either other directories or
files. All children are referenced by their capability.

A directory has two forms of capability: read-write caps and read-only caps.
The table of children inside the directory has a read-write and read-only
capability for each child. If you have a read-only capability for a given
directory, you will not be able to access the read-write capability of its
children. This results in "transitively read-only" directory access.

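Conceptually, a directory's child table can be pictured as a mapping from
child name to a pair of capabilities plus the per-edge metadata. This is an
illustration of the model, not the actual dirnode storage format, and the cap
strings are elided placeholders::

    from typing import Optional, TypedDict

    class Child(TypedDict):
        rw_cap: Optional[str]   # hidden from holders of the read-only dircap
        ro_cap: str
        metadata: dict          # per-edge metadata, e.g. timestamps

    # A read-write view of one directory. Someone holding only the read-only
    # cap for this directory never sees the rw_cap column, which is what
    # makes access "transitively read-only".
    children: dict[str, Child] = {
        "notes.txt": {
            "rw_cap": None,             # immutable file: no write cap exists
            "ro_cap": "URI:CHK:...",
            "metadata": {"mtime": 1700000000},
        },
        "projects": {
            "rw_cap": "URI:DIR2:...",
            "ro_cap": "URI:DIR2-RO:...",
            "metadata": {},
        },
    }
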
By having two different capabilities, you can choose which you want to share
with someone else. If you create a new directory and share the read-write
capability for it with a friend, then you will both be able to modify its
contents. If instead you give them the read-only capability, then they will
*not* be able to modify the contents. Any capability that you receive can be
linked in to any directory that you can modify, so very powerful
shared+published directory structures can be built from these components.

This structure enables individual users to have their own personal space, with
links to spaces that are shared with specific other users, and other spaces
that are globally visible.


Leases, Refreshing, Garbage Collection
======================================

When a file or directory in the file store is no longer referenced, the space
that its shares occupied on each storage server can be freed, making room for
other shares. Tahoe-LAFS uses a garbage collection ("GC") mechanism to
implement this space-reclamation process. Each share has one or more
"leases", which are managed by clients who want the file/directory to be
retained. The storage server accepts each share for a pre-defined period of
time, and is allowed to delete the share if all of the leases are cancelled
or allowed to expire.

Garbage collection is not enabled by default: storage servers will not delete
shares without being explicitly configured to do so. When GC is enabled,
clients are responsible for renewing their leases on a periodic basis at
least frequently enough to prevent any of the leases from expiring before the
next renewal pass.

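As a sketch, a storage server that expires leases by age might carry something
like the following in its ``tahoe.cfg``; treat the option names and values as
illustrative and consult :doc:`garbage-collection` for the authoritative
reference::

    [storage]
    # Enable lease expiration (garbage collection) on this server.
    expire.enabled = true
    # Expire leases by age rather than by a fixed cutoff date.
    expire.mode = age
    # Delete shares whose newest lease is older than this, unless renewed.
    expire.override_lease_duration = 33 days
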
See :doc:`garbage-collection` for further information, and for how to
configure garbage collection.

File Repairer
=============

Shares may go away because the storage server hosting them has suffered a
failure: either temporary downtime (affecting availability of the file), or a
permanent data loss (affecting the preservation of the file). Hard drives
crash, power supplies explode, coffee spills, and asteroids strike. The goal
of a robust distributed file store is to survive these setbacks.

To work against this slow, continual loss of shares, a File Checker is used
to periodically count the number of shares still available for any given
file. A more extensive form of checking known as the File Verifier can
download the ciphertext of the target file and perform integrity checks
(using strong hashes) to make sure the data is still intact. When the file is
found to have decayed below some threshold, the File Repairer can be used to
regenerate and re-upload the missing shares. These processes are conceptually
distinct (the repairer is only run if the checker/verifier decides it is
necessary), but in practice they will be closely related, and may run in the
same process.

The repairer process does not get the full capability of the file to be
maintained: it merely gets the "repairer capability" subset, which does not
include the decryption key. The File Verifier uses that data to find out
which nodes ought to hold shares for this file, and to see if those nodes are
still around and willing to provide the data. If the file is not healthy
enough, the File Repairer is invoked to download the ciphertext, regenerate
any missing shares, and upload them to new nodes. The goal of the File
Repairer is to finish up with a full set of ``N`` shares.

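Checking and repair can also be driven by hand through the CLI. The commands
below are indicative invocations (see the ``tahoe check`` and
``tahoe deep-check`` documentation for the exact flags)::

    # Lightweight check: ask servers which shares they are holding.
    tahoe check tahoe:myfile.txt

    # Download and verify the shares, and repair the file if it is unhealthy.
    tahoe check --verify --repair tahoe:myfile.txt

    # Recursively check (and repair) everything reachable from an alias.
    tahoe deep-check --repair --add-lease tahoe:
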
There are a number of engineering issues to be resolved here. The bandwidth,
disk IO, and CPU time consumed by the verification/repair process must be
balanced against the robustness that it provides to the grid. The nodes
involved in repair will have very different access patterns than normal
nodes, such that these processes may need to be run on hosts with more memory
or network connectivity than usual. The frequency of repair will directly
affect the resources consumed. In some cases, verification of multiple files
can be performed at the same time, and repair of files can be delegated off
to other nodes.

*future work*

Currently there are two modes of checking on the health of your file:
"Checker" simply asks storage servers which shares they have and does
nothing to try to verify that they aren't lying. "Verifier" downloads and
cryptographically verifies every bit of every share of the file from every
server, which costs a lot of network and CPU. A future improvement would be
to make a random-sampling verifier which downloads and cryptographically
verifies only a few randomly-chosen blocks from each server. This would
require much less network and CPU while still making it extremely unlikely
that any sort of corruption -- even malicious corruption intended to evade
detection -- would go unnoticed. This would be an instance of a
cryptographic notion called "Proof of Retrievability". Note that to implement
this requires no change to the server or to the cryptographic data structure
-- with the current data structure and the current protocol it is up to the
client which blocks they choose to download, so this would be solely a change
in client behavior.


Security
========

The design goal for this project is that an attacker may be able to deny
service (i.e. prevent you from recovering a file that was uploaded earlier)
but can accomplish none of the following three attacks:

1) violate confidentiality: the attacker gets to view data to which you have
   not granted them access
2) violate integrity: the attacker convinces you that the wrong data is
   actually the data you were intending to retrieve
3) violate unforgeability: the attacker gets to modify a mutable file or
   directory (either the pathnames or the file contents) to which you have
   not given them write permission

Integrity (the promise that the downloaded data will match the uploaded data)
is provided by the hashes embedded in the capability (for immutable files) or
the digital signature (for mutable files). Confidentiality (the promise that
the data is only readable by people with the capability) is provided by the
encryption key embedded in the capability (for both immutable and mutable
files). Data availability (the hope that data which has been uploaded in the
past will be downloadable in the future) is provided by the grid, which
distributes failures in a way that reduces the correlation between individual
node failure and overall file recovery failure, and by the erasure-coding
technique used to generate shares.

Many of these security properties depend upon the usual cryptographic
assumptions: the resistance of AES and RSA to attack, the resistance of
SHA-256 to collision attacks and pre-image attacks, and upon the proximity of
2^-128 and 2^-256 to zero. A break in AES would allow a confidentiality
violation, a collision break in SHA-256 would allow a consistency violation,
and a break in RSA would allow a mutability violation.

There is no attempt made to provide anonymity, either for the origin of a
piece of data or for the identity of the subsequent downloaders. In general,
anyone who already knows the contents of a file will be in a strong position
to determine who else is uploading or downloading it. Also, it is quite easy
for a sufficiently large coalition of nodes to correlate the set of nodes who
are all uploading or downloading the same file, even if the attacker does not
know the contents of the file in question.

Also note that the file size and (when convergence is being used) a keyed
hash of the plaintext are not protected. Many people can determine the size
of the file you are accessing, and if they already know the contents of a
given file, they will be able to determine that you are uploading or
downloading the same one.

The capability-based security model is used throughout this project.
Directory operations are expressed in terms of distinct read- and write-
capabilities. Knowing the read-capability of a file is equivalent to the
ability to read the corresponding data. The capability to validate the
correctness of a file is strictly weaker than the read-capability (possession
of read-capability automatically grants you possession of
validate-capability, but not vice versa). These capabilities may be expressly
delegated (irrevocably) by simply transferring the relevant secrets.

The application layer can provide whatever access model is desired, built on
top of this capability access model.


Reliability
===========

File encoding and peer-node selection parameters can be adjusted to achieve
different goals. Each choice results in a number of properties; there are
many tradeoffs.

First, some terms: the erasure-coding algorithm is described as ``k``-out-of-``N``
(for this release, the default values are ``k`` = 3 and ``N`` = 10). Each grid will
have some number of nodes; this number will rise and fall over time as nodes
join, drop out, come back, and leave forever. Files are of various sizes, some
are popular, others are unpopular. Nodes have various capacities, variable
upload/download bandwidths, and network latency. Most of the mathematical
models that look at node failure assume some average (and independent)
probability 'P' of a given node being available: this can be high (servers
tend to be online and available >90% of the time) or low (laptops tend to be
turned on for an hour then disappear for several days). Files are encoded in
segments of a given maximum size, which affects memory usage.

The ratio of ``N``/``k`` is the "expansion factor". Higher expansion factors
improve reliability very quickly (the binomial distribution curve is very sharp),
but consume much more grid capacity. When P=50%, the absolute value of ``k``
affects the granularity of the binomial curve (1-out-of-2 is much worse than
50-out-of-100), but high values asymptotically approach a constant (i.e.
500-of-1000 is not much better than 50-of-100). When P is high and the
expansion factor is held at a constant, higher values of ``k`` and ``N`` give
much better reliability (for P=99%, 50-out-of-100 is much much better than
5-of-10, roughly 10^50 times better), because there are more shares that can
be lost without losing the file.

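A back-of-the-envelope calculation makes this concrete. Assuming each server
is independently available with probability ``p``, the chance of losing a
file is the chance that fewer than ``k`` of its ``N`` shares remain
reachable; this is a model sketch, not a Tahoe-LAFS API::

    from math import comb

    def p_file_lost(k: int, n: int, p: float) -> float:
        # Probability that fewer than k of the n shares sit on available
        # servers, with each server independently up with probability p.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

    # Same 2x expansion factor, very different reliability at p = 0.99:
    print(p_file_lost(5, 10, 0.99))     # roughly 2e-10
    print(p_file_lost(50, 100, 0.99))   # smaller by dozens of orders of magnitude
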
Likewise, the total number of nodes in the network affects the same
granularity: having only one node means a single point of failure, no matter
how many copies of the file you make. Independent nodes (with uncorrelated
failures) are necessary to hit the mathematical ideals: if you have 100 nodes
but they are all in the same office building, then a single power failure
will take out all of them at once. Pseudospoofing, also called a "Sybil Attack",
is where a single attacker convinces you that they are actually multiple
servers, so that you think you are using a large number of independent nodes,
but in fact you have a single point of failure (where the attacker turns off
all their machines at once). Large grids, with lots of truly independent nodes,
will enable the use of lower expansion factors to achieve the same reliability,
but will increase overhead because each node needs to know something about
every other, and the rate at which nodes come and go will be higher (requiring
network maintenance traffic). Also, the File Repairer work will increase with
larger grids, although then the job can be distributed out to more nodes.

Higher values of ``N`` increase overhead: more shares means more Merkle hashes
that must be included with the data, and more nodes to contact to retrieve
the shares. Smaller segment sizes reduce memory usage (since each segment
must be held in memory while erasure coding runs) and improve "alacrity"
(since downloading can validate a smaller piece of data faster, delivering it
to the target sooner), but also increase overhead (because more blocks means
more Merkle hashes to validate them).

In general, small private grids should work well, but the participants will
have to decide between storage overhead and reliability. Large stable grids
will be able to reduce the expansion factor down to a bare minimum while
still retaining high reliability, but large unstable grids (where nodes are
coming and going very quickly) may require more repair/verification bandwidth
than actual upload/download traffic.
---|