source: trunk/docs/specifications/file-encoding.rst

Last change on this file was 0bebbe3, checked in by Zooko Wilcox-O'Hearn <zooko@…>, at 2013-11-11T22:04:11Z

fix warnings from rst2html

Apparently the in-line link syntax with "<>" in them causes these warnings. I
don't know why. I changed them all to a slightly more verbose syntax.

Thanks to Mike Kazantzsev's review comment
(, I
moved the links to the end of each section.

  • Property mode set to 100644
File size: 7.8 KB
1.. -*- coding: utf-8-with-signature -*-
4File Encoding
7When the client wishes to upload an immutable file, the first step is to
8decide upon an encryption key. There are two methods: convergent or random.
9The goal of the convergent-key method is to make sure that multiple uploads
10of the same file will result in only one copy on the grid, whereas the
11random-key method does not provide this "convergence" feature.
13The convergent-key method computes the SHA-256d hash of a single-purpose tag,
14the encoding parameters, a "convergence secret", and the contents of the
15file. It uses a portion of the resulting hash as the AES encryption key.
16There are security concerns with using convergence this approach (the
17"partial-information guessing attack", please see ticket #365 for some
18references), so Tahoe uses a separate (randomly-generated) "convergence
19secret" for each node, stored in NODEDIR/private/convergence . The encoding
20parameters (k, N, and the segment size) are included in the hash to make sure
21that two different encodings of the same file will get different keys. This
22method requires an extra IO pass over the file, to compute this key, and
23encryption cannot be started until the pass is complete. This means that the
24convergent-key method will require at least two total passes over the file.
26The random-key method simply chooses a random encryption key. Convergence is
27disabled, however this method does not require a separate IO pass, so upload
28can be done with a single pass. This mode makes it easier to perform
29streaming upload.
31Regardless of which method is used to generate the key, the plaintext file is
32encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is
33then erasure-coded and uploaded to the servers. Two hashes of the ciphertext
34are generated as the encryption proceeds: a flat hash of the whole
35ciphertext, and a Merkle tree. These are used to verify the correctness of
36the erasure decoding step, and can be used by a "verifier" process to make
37sure the file is intact without requiring the decryption key.
39The encryption key is hashed (with SHA-256d and a single-purpose tag) to
40produce the "Storage Index". This Storage Index (or SI) is used to identify
41the shares produced by the method described below. The grid can be thought of
42as a large table that maps Storage Index to a ciphertext. Since the
43ciphertext is stored as erasure-coded shares, it can also be thought of as a
44table that maps SI to shares.
46Anybody who knows a Storage Index can retrieve the associated ciphertext:
47ciphertexts are not secret.
49.. image:: file-encoding1.svg
51The ciphertext file is then broken up into segments. The last segment is
52likely to be shorter than the rest. Each segment is erasure-coded into a
53number of "blocks". This takes place one segment at a time. (In fact,
54encryption and erasure-coding take place at the same time, once per plaintext
55segment). Larger segment sizes result in less overhead overall, but increase
56both the memory footprint and the "alacrity" (the number of bytes we have to
57receive before we can deliver validated plaintext to the user). The current
58default segment size is 128KiB.
60One block from each segment is sent to each shareholder (aka leaseholder,
61aka landlord, aka storage node, aka peer). The "share" held by each remote
62shareholder is nominally just a collection of these blocks. The file will
63be recoverable when a certain number of shares have been retrieved.
65.. image:: file-encoding2.svg
67The blocks are hashed as they are generated and transmitted. These
68block hashes are put into a Merkle hash tree. When the last share has been
69created, the merkle tree is completed and delivered to the peer. Later, when
70we retrieve these blocks, the peer will send many of the merkle hash tree
71nodes ahead of time, so we can validate each block independently.
73The root of this block hash tree is called the "block root hash" and
74used in the next step.
76.. image:: file-encoding3.svg
78There is a higher-level Merkle tree called the "share hash tree". Its leaves
79are the block root hashes from each share. The root of this tree is called
80the "share root hash" and is included in the "URI Extension Block", aka UEB.
81The ciphertext hash and Merkle tree are also put here, along with the
82original file size, and the encoding parameters. The UEB contains all the
83non-secret values that could be put in the URI, but would have made the URI
84too big. So instead, the UEB is stored with the share, and the hash of the
85UEB is put in the URI.
87The URI then contains the secret encryption key and the UEB hash. It also
88contains the basic encoding parameters (k and N) and the file size, to make
89download more efficient (by knowing the number of required shares ahead of
90time, sufficient download queries can be generated in parallel).
92The URI (also known as the immutable-file read-cap, since possessing it
93grants the holder the capability to read the file's plaintext) is then
94represented as a (relatively) short printable string like so::
96 URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000
98.. image:: file-encoding4.svg
100During download, when a peer begins to transmit a share, it first transmits
101all of the parts of the share hash tree that are necessary to validate its
102block root hash. Then it transmits the portions of the block hash tree
103that are necessary to validate the first block. Then it transmits the
104first block. It then continues this loop: transmitting any portions of the
105block hash tree to validate block#N, then sending block#N.
107.. image:: file-encoding5.svg
109So the "share" that is sent to the remote peer actually consists of three
110pieces, sent in a specific order as they become available, and retrieved
111during download in a different order according to when they are needed.
113The first piece is the blocks themselves, one per segment. The last
114block will likely be shorter than the rest, because the last segment is
115probably shorter than the rest. The second piece is the block hash tree,
116consisting of a total of two SHA-1 hashes per block. The third piece is a
117hash chain from the share hash tree, consisting of log2(numshares) hashes.
119During upload, all blocks are sent first, followed by the block hash
120tree, followed by the share hash chain. During download, the share hash chain
121is delivered first, followed by the block root hash. The client then uses
122the hash chain to validate the block root hash. Then the peer delivers
123enough of the block hash tree to validate the first block, followed by
124the first block itself. The block hash chain is used to validate the
125block, then it is passed (along with the first block from several other
126peers) into decoding, to produce the first segment of crypttext, which is
127then decrypted to produce the first segment of plaintext, which is finally
128delivered to the user.
130.. image:: file-encoding6.svg
136All hashes use SHA-256d, as defined in Practical Cryptography (by Ferguson
137and Schneier). All hashes use a single-purpose tag, e.g. the hash that
138converts an encryption key into a storage index is defined as follows::
140 SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key)
142When two separate values need to be combined together in a hash, we wrap each
143in a netstring.
145Using SHA-256d (instead of plain SHA-256) guards against length-extension
146attacks. Using the tag protects our Merkle trees against attacks in which the
147hash of a leaf is confused with a hash of two children (allowing an attacker
148to generate corrupted data that nevertheless appears to be valid), and is
149simply good "cryptograhic hygiene". The `“Chosen Protocol Attack” by Kelsey,
150Schneier, and Wagner`_ is relevant. Putting the tag in a netstring guards
151against attacks that seek to confuse the end of the tag with the beginning of
152the subsequent value.
154.. _“Chosen Protocol Attack” by Kelsey, Schneier, and Wagner:
Note: See TracBrowser for help on using the repository browser.