zfec -- efficient, portable erasure coding tool
===============================================

Generate redundant blocks of information such that if some of the blocks are
lost then the original data can be recovered from the remaining blocks. This
package includes command-line tools, a C API, a Python API, and a Haskell
API.


Intro and Licence
-----------------

This package implements an "erasure code", or "forward error correction
code".

You may use this package under the GNU General Public License, version 2 or,
at your option, any later version. Alternatively, you may use this package
under the Transitive Grace Period Public Licence, version 1.0 or, at your
option, any later version. (That is, you may choose whichever of the two
licences you prefer.) See the file COPYING.GPL for the terms of the GNU
General Public License, version 2. See the file COPYING.TGPPL.rst for the
terms of the Transitive Grace Period Public Licence, version 1.0.

The most widely known example of an erasure code is the RAID-5 algorithm,
which allows the stored data to be recovered completely after the loss of
any one hard drive. The algorithm in the zfec package has a similar effect,
but instead of tolerating the loss of only a single element, it can be
parameterized to choose in advance the number of elements whose loss it can
tolerate.

This package is largely based on the old "fec" library by Luigi Rizzo et al.,
which is a mature and optimized implementation of erasure coding. The zfec
package makes several changes from the original "fec" package, including
addition of the Python API, refactoring of the C API to support zero-copy
operation, a few clean-ups and optimizations of the core code itself, and the
addition of a command-line tool named "zfec".

Installation
------------

This package is managed with the "setuptools" package management tool. To
build and install the package directly into your system, just run ``python
./setup.py install``. If you prefer to keep the package limited to a
specific directory so that you can manage it yourself (perhaps by using the
"GNU stow" tool), then give it these arguments: ``python ./setup.py install
--single-version-externally-managed
--record=${specificdirectory}/zfec-install.log
--prefix=${specificdirectory}``

To run the self-tests, execute ``python ./setup.py test`` (or, if you have
Twisted Python installed, run ``trial zfec`` for nicer output and more test
options). This will run the tests of the C API, the Python API, and the
command-line tools.

To run the tests of the Haskell API: ``runhaskell haskell/test/FECTest.hs``

Note that you must install the library before running the Haskell API tests,
because the interpreter cannot process FEC.hs on its own, as it takes a
reference to an FFI function.


Community
---------

The source is currently available via darcs on the web with the command::

  darcs get https://tahoe-lafs.org/source/zfec/trunk

More information on darcs is available at http://darcs.net

Please post about zfec to the Tahoe-LAFS mailing list and contribute patches:

<https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev>


Overview
--------

This package performs two operations, encoding and decoding. Encoding takes
some input data and expands its size by producing extra "check blocks", also
called "secondary blocks". Decoding takes some data -- any combination of
blocks of the original data (called "primary blocks") and "secondary
blocks" -- and produces the original data.

The encoding is parameterized by two integers, k and m. m is the total
number of blocks produced, and k is how many of those blocks are necessary to
reconstruct the original data. m is required to be at least 1 and at most
256, and k is required to be at least 1 and at most m.
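
As a toy illustration of the k-of-m idea, consider k=2, m=3 with a single
XOR parity block as the one secondary block. This is not zfec's actual
algorithm (zfec uses Galois-field arithmetic and supports arbitrary k and m);
it only shows how any k of the m blocks suffice:

```python
# Toy k=2, m=3 erasure code: two primary blocks plus one XOR parity
# block.  Any two of the three blocks recover the original data.
# This is NOT zfec's algorithm -- just an illustration of the idea.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encode(data):
    half = len(data) // 2           # assume even length for simplicity
    p0, p1 = data[:half], data[half:]
    parity = xor_bytes(p0, p1)      # the single secondary block
    return {0: p0, 1: p1, 2: parity}

def toy_decode(blocks):
    """Recover the original data from any 2 of the 3 blocks."""
    if 0 in blocks and 1 in blocks:
        return blocks[0] + blocks[1]
    if 0 in blocks:                 # block 1 lost: rebuild it from parity
        return blocks[0] + xor_bytes(blocks[0], blocks[2])
    # block 0 lost: rebuild it from parity
    return xor_bytes(blocks[1], blocks[2]) + blocks[1]

blocks = toy_encode(b"abcdwxyz")
del blocks[1]                       # lose one block
print(toy_decode(blocks))           # b'abcdwxyz'
```

A single XOR parity block can only tolerate one loss; zfec's code tolerates
the loss of any m-k blocks.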

(Note that when k == m then there is no point in doing erasure coding -- it
degenerates to the equivalent of the Unix "split" utility, which simply
splits the input into successive segments. Similarly, when k == 1 it
degenerates to the equivalent of the Unix "cp" utility -- each block is a
complete copy of the input data.)

Note that each "primary block" is a segment of the original data, so its size
is 1/k'th of the size of the original data, and each "secondary block" is of
the same size, so the total space used by all the blocks is m/k times the
size of the original data (plus some padding to fill out the last primary
block to be the same size as all the others). In addition to the data
contained in the blocks themselves there are also a few pieces of metadata
which are necessary for later reconstruction. Those pieces are: 1. the
value of k, 2. the value of m, 3. the sharenum of each block, 4. the
number of bytes of padding that were used. The "zfec" command-line tool
compresses these pieces of data and prepends them to the beginning of each
share, so each sharefile produced by the "zfec" command-line tool is between
one and four bytes larger than the share data alone.
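
The sizes described above can be computed directly; here is a small sketch
(the function name is illustrative, not part of zfec's API):

```python
import math

def share_sizes(data_len, k, m):
    """Compute block size, padding, and total encoded size for the
    m/k expansion described above."""
    block_size = math.ceil(data_len / k)  # each block is 1/k'th of the data
    padding = block_size * k - data_len   # zero-fill for the last primary block
    total = block_size * m                # m blocks of equal size
    return block_size, padding, total

# 1000 bytes with k=3, m=5: blocks of 334 bytes, 2 bytes of padding,
# and 1670 bytes total, i.e. an m/k = 5/3 expansion.
print(share_sizes(1000, 3, 5))            # (334, 2, 1670)
```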

The decoding step requires as input k of the blocks which were produced by
the encoding step. The decoding step produces as output the data that was
earlier input to the encoding step.


Command-Line Tool
-----------------

The bin/ directory contains two Unix-style command-line tools, "zfec" and
"zunfec". Execute ``zfec --help`` or ``zunfec --help`` for usage
instructions.


Performance
-----------

To run the benchmarks, execute the included bench/bench_zfec.py script with
optional --k= and --m= arguments.

On my Athlon 64 2.4 GHz workstation (running Linux), the "zfec" command-line
tool encoded a 160 MB file with m=100, k=94 (about 6% redundancy) in 3.9
seconds, whereas the "par2" tool encoded the file with about 6% redundancy in
27 seconds. zfec encoded the same file with m=12, k=6 (100% redundancy) in
4.1 seconds, whereas par2 encoded it with about 100% redundancy in 7 minutes
and 56 seconds.

The underlying C library in benchmark mode encoded from a file at about 4.9
million bytes per second and decoded at about 5.8 million bytes per second.

On Peter's fancy Intel Mac laptop (2.16 GHz Core Duo), it encoded from a file
at about 6.2 million bytes per second.

On my even fancier Intel Mac laptop (2.33 GHz Core Duo), it encoded from a
file at about 6.8 million bytes per second.

On my old PowerPC G4 867 MHz Mac laptop, it encoded from a file at about 1.3
million bytes per second.

Here is a paper analyzing the performance of various erasure codes and their
implementations, including zfec:

http://www.usenix.org/events/fast09/tech/full_papers/plank/plank.pdf

Zfec shows good performance on different machines and with different values
of k and m. It also has a nice small memory footprint.


API
---

Each block is associated with a "blocknum". The blocknum of each primary
block is its index (starting from zero), so the 0'th block is the first
primary block, which is the first few bytes of the file, the 1'st block is
the next primary block, which is the next few bytes of the file, and so on.
The last primary block has blocknum k-1. The blocknum of each secondary
block is an arbitrary integer between k and 255 inclusive. (When using the
Python API, if you don't specify which secondary blocks you want when
invoking encode(), then it will by default provide the blocks with ids from
k to m-1 inclusive.)
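
In other words (illustrative helper functions, not part of zfec's API):

```python
def is_primary(blocknum, k):
    # primary blocks are the k segments of the original data, ids 0..k-1
    return 0 <= blocknum < k

def default_secondary_blocknums(k, m):
    # the Python API's default when no blocknums are requested:
    # the secondary blocks, ids k..m-1
    return list(range(k, m))

print([b for b in range(5) if is_primary(b, 3)])   # [0, 1, 2]
print(default_secondary_blocknums(3, 5))           # [3, 4]
```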

- C API

  fec_encode() takes as input an array of k pointers, where each pointer
  points to a memory buffer containing the input data (i.e., the i'th buffer
  contains the i'th primary block). There is also a second parameter which
  is an array of the blocknums of the secondary blocks which are to be
  produced. (Each element in that array is required to be the blocknum of a
  secondary block, i.e. it is required to be >= k and < m.)

  The output from fec_encode() is the requested set of secondary blocks,
  which are written into output buffers provided by the caller.

  Note that this fec_encode() is a "low-level" API in that it requires the
  input data to be provided in a set of memory buffers of exactly the right
  sizes. If you are starting instead with a single buffer containing all of
  the data then please see easyfec.py's "class Encoder" as an example of how
  to split a single large buffer into the appropriate set of input buffers
  for fec_encode(). If you are starting with a file on disk, then please see
  filefec.py's encode_file_stringy_easyfec() for an example of how to read
  the data from a file and pass it to "class Encoder". The Python interface
  provides these higher-level operations, as does the Haskell interface. If
  you implement functions to do these higher-level tasks in other languages,
  please send a patch to tahoe-dev@tahoe-lafs.org so that your API can be
  included in future releases of zfec.

  fec_decode() takes as input an array of k pointers, where each pointer
  points to a buffer containing a block. There is also a separate input
  parameter which is an array of blocknums, indicating the blocknum of each
  of the blocks which is being passed in.

  The output from fec_decode() is the set of primary blocks which were
  missing from the input and had to be reconstructed. These reconstructed
  blocks are written into output buffers provided by the caller.
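
The buffer-splitting step that easyfec.py performs for the low-level API can
be sketched in pure Python (a simplified stand-in for "class Encoder", not
zfec's actual code):

```python
def split_into_primary_blocks(data, k):
    """Pad data to a multiple of k and cut it into k equal segments,
    the shape of input that a low-level encoder like fec_encode()
    expects.  Returns the k blocks and the number of padding bytes."""
    block_size = (len(data) + k - 1) // k      # ceiling division
    padlen = block_size * k - len(data)
    padded = data + b"\x00" * padlen           # zero-fill the last block
    blocks = [padded[i * block_size:(i + 1) * block_size]
              for i in range(k)]
    return blocks, padlen

blocks, padlen = split_into_primary_blocks(b"hello world", 4)
print(blocks, padlen)   # [b'hel', b'lo ', b'wor', b'ld\x00'] 1
```

Note that padlen is exactly the piece of metadata (item 4 in the Overview)
that must be recorded so the padding can be stripped after decoding.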


- Python API

  encode() and decode() take as input a sequence of k buffers, where a
  "sequence" is any object that implements the Python sequence protocol (such
  as a list or tuple) and a "buffer" is any object that implements the Python
  buffer protocol (such as a string or array). The contents that are
  required to be present in these buffers are the same as for the C API.

  encode() also takes a list of desired blocknums. Unlike the C API, the
  Python API accepts blocknums of primary blocks as well as secondary blocks
  in its list of desired blocknums. encode() returns a list of buffer
  objects which contain the blocks requested. For each requested block which
  is a primary block, the resulting list contains a reference to the
  appropriate primary block from the input list. For each requested block
  which is a secondary block, the list contains a newly created string object
  containing that block.

  decode() also takes a list of integers indicating the blocknums of the
  blocks being passed in. decode() returns a list of buffer objects which
  contain all of the primary blocks of the original data (in order). For
  each primary block which was present in the input list, the result list
  simply contains a reference to the object that was passed in the input
  list. For each primary block which was not present in the input, the
  result list contains a newly created string object containing that primary
  block.

  Beware of a "gotcha" that can result from the combination of mutable data
  and the fact that the Python API returns references to inputs when
  possible.

  Returning references to its inputs is efficient since it avoids making an
  unnecessary copy of the data, but if the object which was passed as input
  is mutable and if that object is mutated after the call to zfec returns,
  then the result from zfec -- which is just a reference to that same object
  -- will also be mutated. This subtlety is the price you pay for avoiding
  data copying. If you don't want to have to worry about this then you can
  simply use immutable objects (e.g. Python strings) to hold the data that
  you pass to zfec.
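
This gotcha can be reproduced without zfec at all; any function that returns
references to mutable inputs behaves the same way (the function here is a
hypothetical stand-in, not zfec's API):

```python
def pass_through(blocks):
    # like zfec's encode()/decode(), return references to the input
    # objects rather than copies
    return list(blocks)

block = bytearray(b"AAAA")          # a mutable buffer
result = pass_through([block])

block[0] = ord("X")                 # mutate the input after the call...
print(bytes(result[0]))             # b'XAAA' -- the result changed too!

safe = pass_through([bytes(block)]) # immutable bytes sidestep the issue
block[1] = ord("Y")
print(bytes(safe[0]))               # b'XAAA' -- unaffected
```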

- Haskell API

  The Haskell code is fully Haddocked. To generate the documentation, run
  ``runhaskell Setup.lhs haddock``.


Utilities
---------

The filefec.py module has a utility function for efficiently reading a file
and encoding it piece by piece. This module is used by the "zfec" and
"zunfec" command-line tools from the bin/ directory.


Dependencies
------------

A C compiler is required. To use the Python API or the command-line tools, a
Python interpreter is also required. We have tested it with Python v2.4,
v2.5, v2.6, and v2.7. For the Haskell interface, GHC >= 6.8.1 is required.


Acknowledgements
----------------

Thanks to the author of the original fec lib, Luigi Rizzo, and the folks that
contributed to it: Phil Karn, Robert Morelos-Zaragoza, Hari Thirumoorthy, and
Dan Rubenstein. Thanks to the Mnet hackers who wrote an earlier Python
wrapper, especially Myers Carpenter and Hauke Johannknecht. Thanks to Brian
Warner and Amber O'Whielacronx for help with the API, documentation,
debugging, compression, and unit tests. Thanks to Adam Langley for improving
the C API and contributing the Haskell API. Thanks to the creators of GCC
(starting with Richard M. Stallman) and Valgrind (starting with Julian
Seward) for a pair of excellent tools. Thanks to my coworkers at Allmydata
-- http://allmydata.com -- Fabrice Grinda, Peter Secor, Rob Kinninmont, Brian
Warner, Zandr Milewski, Justin Boreta, and Mark Meras for sponsoring this
work and releasing it under a Free Software licence. Thanks to Jack Lloyd,
Samuel Neves, and David-Sarah Hopwood.


Related Works
-------------

Note: a Unix-style tool like "zfec" does only one thing -- in this case
erasure coding -- and leaves other tasks to other tools. Other Unix-style
tools that go well with zfec include `GNU tar`_ for archiving multiple files
and directories into one file, `lzip`_ for compression, and `GNU Privacy
Guard`_ for encryption or `b2sum`_ for integrity. It is important to do
things in order: first archive, then compress, then either encrypt or
integrity-check, then erasure code. Note that if GNU Privacy Guard is used
for privacy, then it will also ensure integrity, so the use of b2sum is
unnecessary in that case. Note also that you need to do integrity checking
(such as with b2sum) on the blocks that result from the erasure coding in
*addition* to doing it on the file contents! (There are two different subtle
failure modes -- see "more than one file can match an immutable file cap" on
the `Hack Tahoe-LAFS!`_ Hall of Fame.)

The `Tahoe-LAFS`_ project uses zfec as part of a complete distributed
filesystem with integrated encryption, integrity, remote distribution of the
blocks, directory structure, backup of changed files or directories, access
control, immutable files and directories, proof-of-retrievability, and repair
of damaged files and directories.

`fecpp`_ is an alternative to zfec. It implements an algorithm that is
bitwise-compatible with zfec's, and it is BSD-licensed.

.. _GNU tar: http://directory.fsf.org/project/tar/
.. _lzip: http://www.nongnu.org/lzip/lzip.html
.. _GNU Privacy Guard: http://gnupg.org/
.. _b2sum: https://blake2.net/
.. _Tahoe-LAFS: https://tahoe-lafs.org
.. _Hack Tahoe-LAFS!: https://tahoe-lafs.org/hacktahoelafs/
.. _fecpp: http://www.randombit.net/code/fecpp/


Enjoy!

Zooko Wilcox-O'Hearn

2013-05-15

Boulder, Colorado