source: trunk/docs/performance.rst

Last change on this file was 82579ce, checked in by Zooko Wilcox-O'Hearn <zooko@…>, at 2013-11-08T21:08:05Z

magic first line tells emacs to use utf8+bom

Add ".. -*- coding: utf-8-with-signature -*-" to the first line of each .rst
file. This tells emacs to treat the file contents as utf-8, and also to prepend
a so-called utf-8 "bom" marker at the beginning of the file. This patch also
prepends those markers to each of those files.

  • Property mode set to 100644
File size: 7.7 KB
Line 
1.. -*- coding: utf-8-with-signature -*-
2
3============================================
4Performance costs for some common operations
5============================================
6
71.  `Publishing an A-byte immutable file`_
82.  `Publishing an A-byte mutable file`_
93.  `Downloading B bytes of an A-byte immutable file`_
104.  `Downloading B bytes of an A-byte mutable file`_
115.  `Modifying B bytes of an A-byte mutable file`_
126.  `Inserting/Removing B bytes in an A-byte mutable file`_
137.  `Adding an entry to an A-entry directory`_
148.  `Listing an A entry directory`_
159.  `Checking an A-byte file`_
1610. `Verifying an A-byte file (immutable)`_
1711. `Repairing an A-byte file (mutable or immutable)`_
18
19``K`` indicates the number of shares required to reconstruct the file
20(default: 3)
21
22``N`` indicates the total number of shares produced (default: 10)
23
24``S`` indicates the segment size (default: 128 KiB)
25
26``A`` indicates the number of bytes in a file
27
28``B`` indicates the number of bytes of a file that are being read or
29written
30
31``G`` indicates the number of storage servers on your grid
32
33Most of these cost estimates may have a further constant multiplier: when a
34formula says ``N/K*S``, the cost may actually be ``2*N/K*S`` or ``3*N/K*S``.
35Also note that all references to mutable files are for SDMF-formatted files;
36this document has not yet been updated to describe the MDMF format.
37
38Publishing an ``A``-byte immutable file
39=======================================
40
41when the file is already uploaded
42---------------------------------
43
44If the file is already uploaded with the exact same contents, same
45erasure coding parameters (K, N), and same added convergence secret,
46then it reads the whole file from disk one time while hashing it to
47compute the storage index, then contacts about N servers to ask each
48one to store a share. All of the servers reply that they already have
49a copy of that share, and the upload is done.
50
51disk: A
52
53cpu: ~A
54
55network: ~N
56
57memory footprint: S
58
59when the file is not already uploaded
60-------------------------------------
61
62If the file is not already uploaded with the exact same contents, same
63erasure coding parameters (K, N), and same added convergence secret,
64then it reads the whole file from disk one time while hashing it to
65compute the storage index, then contacts about N servers to ask each
66one to store a share. Then it uploads each share to a storage server.
67
68disk: 2*A
69
70cpu: 2*~A
71
72network: N/K*A
73
74memory footprint: N/K*S
75
76Publishing an ``A``-byte mutable file
77=====================================
78
79cpu: ~A + a large constant for RSA keypair generation
80
81network: A
82
83memory footprint: N/K*A
84
85notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that it
86publishes to a grid. This takes up to 1 or 2 seconds on a typical desktop PC.
87
88Part of the process of encrypting, encoding, and uploading a mutable file to a
89Tahoe-LAFS grid requires that the entire file be in memory at once. For larger
90files, this may cause Tahoe-LAFS to have an unacceptably large memory footprint
91(at least when uploading a mutable file).
92
93Downloading ``B`` bytes of an ``A``-byte immutable file
94=======================================================
95
96cpu: ~B
97
98network: B
99
100notes: When Tahoe-LAFS 1.8.0 or later is asked to read an arbitrary
101range of an immutable file, only the S-byte segments that overlap the
102requested range will be downloaded.
103
104(Earlier versions would download from the beginning of the file up
105until the end of the requested range, and then continue to download
106the rest of the file even after the request was satisfied.)
107
108Downloading ``B`` bytes of an ``A``-byte mutable file
109=====================================================
110
111cpu: ~A
112
113network: A
114
115memory footprint: A
116
117notes: As currently implemented, mutable files must be downloaded in
118their entirety before any part of them can be read. We are
119exploring fixes for this; see ticket #393 for more information.
120
121Modifying ``B`` bytes of an ``A``-byte mutable file
122===================================================
123
124cpu: ~A
125
126network: A
127
128memory footprint: N/K*A
129
130notes: If you upload a changed version of a mutable file that you
131earlier put onto your grid with, say, 'tahoe put --mutable',
132Tahoe-LAFS will replace the old file with the new file on the
133grid, rather than attempting to modify only those portions of the
134file that have changed. Modifying a file in this manner is
135essentially uploading the file over again, except that it re-uses
136the existing RSA keypair instead of generating a new one.
137
138Inserting/Removing ``B`` bytes in an ``A``-byte mutable file
139============================================================
140
141cpu: ~A
142
143network: A
144
145memory footprint: N/K*A
146
147notes: Modifying any part of a mutable file in Tahoe-LAFS requires that
148the entire file be downloaded, modified, held in memory while it is
149encrypted and encoded, and then re-uploaded. A future version of the
150mutable file layout ("LDMF") may provide efficient inserts and
151deletes. Note that this sort of modification is mostly used internally
152for directories, and isn't something that the WUI, CLI, or other
153interfaces will do -- instead, they will simply overwrite the file to
154be modified, as described in "Modifying B bytes of an A-byte mutable
155file".
156
157Adding an entry to an ``A``-entry directory
158===========================================
159
160cpu: ~A
161
162network: ~A
163
164memory footprint: N/K*~A
165
166notes: In Tahoe-LAFS, directories are implemented as specialized mutable
167files. So adding an entry to a directory is essentially adding B
168(actually, 300-330) bytes somewhere in an existing mutable file.
169
170Listing an ``A`` entry directory
171================================
172
173cpu: ~A
174
175network: ~A
176
177memory footprint: N/K*~A
178
179notes: Listing a directory requires that the mutable file storing the
180directory be downloaded from the grid. So listing an A entry
181directory requires downloading a (roughly) 330 * A byte mutable
182file, since each directory entry is about 300-330 bytes in size.
183
184Checking an ``A``-byte file
185===========================
186
187cpu: ~G
188
189network: ~G
190
191memory footprint: negligible
192
193notes: To check a file, Tahoe-LAFS queries all the servers that it knows
194about. Note that neither of these values directly depend on the size
195of the file. This is relatively inexpensive, compared to the verify
196and repair operations.
197
198Verifying an A-byte file (immutable)
199====================================
200
201cpu: ~N/K*A
202
203network: N/K*A
204
205memory footprint: N/K*S
206
207notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
208shares that were originally uploaded to the grid and integrity checks
209them. This is (for grids with good redundancy) more expensive than
210downloading an A-byte file, since only a fraction of these shares would
211be necessary to recover the file.
212
213Verifying an A-byte file (mutable)
214==================================
215
216cpu: ~N/K*A
217
218network: N/K*A
219
220memory footprint: N/K*A
221
222notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
223shares that were originally uploaded to the grid and integrity checks
224them. This is (for grids with good redundancy) more expensive than
225downloading an A-byte file, since only a fraction of these shares would
226be necessary to recover the file.
227
228Repairing an ``A``-byte file (mutable or immutable)
229===================================================
230
231cpu: variable, between ~A and ~N/K*A
232
233network: variable; between A and N/K*A
234
235memory footprint (immutable): (1+N/K)*S
236              (SDMF mutable): (1+N/K)*A
237
238notes: To repair a file, Tahoe-LAFS downloads the file, and
239generates/uploads missing shares in the same way as when it initially
240uploads the file.  So, depending on how many shares are missing, this
241can cost as little as a download or as much as a download followed by
242a full upload.
243
244Since SDMF files have only one segment, which must be processed in its
245entirety, repair requires a full-file download followed by a full-file
246upload.
Note: See TracBrowser for help on using the repository browser.