source: trunk/docs/performance.rst

Last change on this file was c49aa44, checked in by Jean-Paul Calderone <exarkun@…>, at 2023-03-22T13:04:15Z

Update the raw number and give a reference for interpretation

.. -*- coding: utf-8-with-signature -*-

============================================
Performance costs for some common operations
============================================

1.  `Publishing an A-byte immutable file`_
2.  `Publishing an A-byte mutable file`_
3.  `Downloading B bytes of an A-byte immutable file`_
4.  `Downloading B bytes of an A-byte mutable file`_
5.  `Modifying B bytes of an A-byte mutable file`_
6.  `Inserting/Removing B bytes in an A-byte mutable file`_
7.  `Adding an entry to an A-entry directory`_
8.  `Listing an A entry directory`_
9.  `Checking an A-byte file`_
10. `Verifying an A-byte file (immutable)`_
11. `Verifying an A-byte file (mutable)`_
12. `Repairing an A-byte file (mutable or immutable)`_

``K`` indicates the number of shares required to reconstruct the file
(default: 3)

``N`` indicates the total number of shares produced (default: 10)

``S`` indicates the segment size (default: 128 KiB)

``A`` indicates the number of bytes in a file

``B`` indicates the number of bytes of a file that are being read or
written

``G`` indicates the number of storage servers on your grid

Most of these cost estimates may have a further constant multiplier: when a
formula says ``N/K*S``, the cost may actually be ``2*N/K*S`` or ``3*N/K*S``.
Also note that all references to mutable files are for SDMF-formatted files;
this document has not yet been updated to describe the MDMF format.

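As a rough illustration of how these symbols combine, the sketch below
evaluates the ``N/K`` expansion factor and the ``N/K*S`` term using the
default parameters listed above. The function names are hypothetical and are
not part of Tahoe-LAFS, and the constant multiplier mentioned above is
ignored.

```python
from fractions import Fraction

# Defaults quoted in this document.
K = 3            # shares required to reconstruct the file
N = 10           # total shares produced
S = 128 * 1024   # segment size in bytes (128 KiB)


def expansion_factor(k=K, n=N):
    """Ratio of encoded data to plaintext data, N/K (illustrative helper)."""
    return Fraction(n, k)


def encoded_segment_bytes(k=K, n=N, s=S):
    """Approximate value of N/K*S in bytes, without any constant multiplier."""
    return float(expansion_factor(k, n) * s)


print(expansion_factor())              # 10/3
print(round(encoded_segment_bytes()))  # 436907
```

With the defaults, every byte of plaintext becomes roughly 3.33 bytes of
share data, which is the factor that appears in most of the formulas below.
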
Publishing an ``A``-byte immutable file
=======================================

when the file is already uploaded
---------------------------------

If the file is already uploaded with the exact same contents, the same
erasure coding parameters (K, N), and the same added convergence secret,
then Tahoe-LAFS reads the whole file from disk once, hashing it to
compute the storage index, and then contacts about N servers to ask each
one to store a share. All of the servers reply that they already have
a copy of that share, and the upload is done.

disk: A

cpu: ~A

network: ~N

memory footprint: S

when the file is not already uploaded
-------------------------------------

If the file is not already uploaded with the exact same contents, the same
erasure coding parameters (K, N), and the same added convergence secret,
then Tahoe-LAFS reads the whole file from disk once, hashing it to
compute the storage index, and then contacts about N servers to ask each
one to store a share. It then uploads each share to a storage server. The
``2*`` factors below reflect this second pass over the file, after the
initial hashing pass.

disk: 2*A

cpu: ~2*A

network: N/K*A

memory footprint: N/K*S

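The two immutable-upload cases can be compared side by side with a small
calculator. This is only a sketch of the formulas above:
``immutable_upload_costs`` and its dictionary keys are hypothetical names,
"cpu" is expressed as bytes processed, and constant multipliers are ignored.

```python
def immutable_upload_costs(a, k=3, n=10, s=128 * 1024, already_uploaded=False):
    """Evaluate the cost formulas above for an A-byte immutable file.

    Returns a dict of rough estimates. For an already-uploaded file the
    network cost is a count of server contacts (~N), not a byte count.
    """
    if already_uploaded:
        return {
            "disk_bytes": a,              # one hashing pass over the file
            "cpu_bytes": a,
            "network_server_contacts": n,
            "memory_bytes": s,
        }
    return {
        "disk_bytes": 2 * a,              # hashing pass plus upload pass
        "cpu_bytes": 2 * a,
        "network_bytes": n * a / k,       # N/K*A
        "memory_bytes": n * s / k,        # N/K*S
    }


# Example: a 3,000,000-byte file with the default K=3, N=10.
print(immutable_upload_costs(3_000_000))
print(immutable_upload_costs(3_000_000, already_uploaded=True))
```

Note that the memory estimate depends on the segment size ``S`` and not on
the file size ``A``, so it stays constant as files grow.
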
Publishing an ``A``-byte mutable file
=====================================

cpu: ~A + a large constant for RSA keypair generation

network: A

memory footprint: N/K*A

notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that it
publishes to a grid. This takes around 100 milliseconds on a relatively
high-end laptop from 2021.

Part of the process of encrypting, encoding, and uploading a mutable file to a
Tahoe-LAFS grid requires that the entire file be in memory at once. For larger
files, this may cause Tahoe-LAFS to have an unacceptably large memory footprint
(at least when uploading a mutable file).

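To make the memory concern concrete, the following sketch contrasts the
``N/K*A`` footprint of a mutable (SDMF) publish with the ``N/K*S`` footprint
of an immutable publish. ``publish_memory_bytes`` is a hypothetical helper
that only evaluates the formulas in this document; it does not measure a
running node.

```python
def publish_memory_bytes(a, mutable, k=3, n=10, s=128 * 1024):
    """Rough memory footprint of publishing an A-byte file.

    Mutable (SDMF) files are held in memory whole, so the estimate scales
    with A; immutable files are processed one S-byte segment at a time.
    """
    unit = a if mutable else s
    return n * unit / k


MIB = 2 ** 20
size = 300 * MIB  # a 300 MiB file

print(publish_memory_bytes(size, mutable=True) / MIB)   # 1000.0 MiB
print(publish_memory_bytes(size, mutable=False) / MIB)  # well under 1 MiB
```
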
Downloading ``B`` bytes of an ``A``-byte immutable file
=======================================================

cpu: ~B

network: B

notes: When Tahoe-LAFS 1.8.0 or later is asked to read an arbitrary
range of an immutable file, only the S-byte segments that overlap the
requested range will be downloaded.

(Earlier versions would download from the beginning of the file up
until the end of the requested range, and then continue to download
the rest of the file even after the request was satisfied.)

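The segment-overlap rule described in the notes can be sketched as plain
index arithmetic. This is an illustration only: ``overlapping_segments`` is
a hypothetical function, it assumes a non-negative offset, and it works on
plaintext offsets rather than on the share data the downloader actually
fetches.

```python
def overlapping_segments(offset, length, file_size, segment_size=128 * 1024):
    """Return the indices of the S-byte segments overlapping a read.

    The read covers bytes [offset, offset + length) of a file_size-byte
    file; reads past the end of the file are clipped.
    """
    if length <= 0 or offset >= file_size:
        return range(0)
    end = min(offset + length, file_size)     # exclusive end of the read
    first = offset // segment_size
    last = (end - 1) // segment_size          # segment holding the last byte
    return range(first, last + 1)


# Reading 100,000 bytes at offset 200,000 of a 1,000,000-byte file
# touches segments 1 and 2 when S is 128 KiB.
segments = overlapping_segments(200_000, 100_000, 1_000_000)
print(list(segments))  # [1, 2]
```

Because whole segments are fetched, a read can cost up to roughly ``2*S``
bytes more than ``B`` when the range starts and ends mid-segment.
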
Downloading ``B`` bytes of an ``A``-byte mutable file
=====================================================

cpu: ~A

network: A

memory footprint: A

notes: As currently implemented, mutable files must be downloaded in
their entirety before any part of them can be read. We are
exploring fixes for this; see ticket #393 for more information.

Modifying ``B`` bytes of an ``A``-byte mutable file
===================================================

cpu: ~A

network: A

memory footprint: N/K*A

notes: If you upload a changed version of a mutable file that you
earlier put onto your grid with, say, ``tahoe put --mutable``,
Tahoe-LAFS will replace the old file with the new file on the
grid, rather than attempting to modify only those portions of the
file that have changed. Modifying a file in this manner is
essentially uploading the file over again, except that it re-uses
the existing RSA keypair instead of generating a new one.

Inserting/Removing ``B`` bytes in an ``A``-byte mutable file
============================================================

cpu: ~A

network: A

memory footprint: N/K*A

notes: Modifying any part of a mutable file in Tahoe-LAFS requires that
the entire file be downloaded, modified, held in memory while it is
encrypted and encoded, and then re-uploaded. A future version of the
mutable file layout ("LDMF") may provide efficient inserts and
deletes. Note that this sort of modification is mostly used internally
for directories, and isn't something that the WUI, CLI, or other
interfaces will do -- instead, they will simply overwrite the file to
be modified, as described in "Modifying B bytes of an A-byte mutable
file".

Adding an entry to an ``A``-entry directory
===========================================

cpu: ~A

network: ~A

memory footprint: N/K*~A

notes: In Tahoe-LAFS, directories are implemented as specialized mutable
files. So adding an entry to a directory is essentially adding B
(actually, 300-330) bytes somewhere in an existing mutable file.

Listing an ``A`` entry directory
================================

cpu: ~A

network: ~A

memory footprint: N/K*~A

notes: Listing a directory requires that the mutable file storing the
directory be downloaded from the grid. So listing an A entry
directory requires downloading a (roughly) 330 * A byte mutable
file, since each directory entry is about 300-330 bytes in size.

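The directory-size estimate in the notes is simple arithmetic, sketched here
for convenience. ``directory_size_range`` is a hypothetical helper that uses
only the 300-330 bytes-per-entry figure quoted above.

```python
# Per-entry size range quoted in the notes above.
ENTRY_BYTES_LOW = 300
ENTRY_BYTES_HIGH = 330


def directory_size_range(entries):
    """Return (low, high) estimates in bytes for an A-entry directory file."""
    return entries * ENTRY_BYTES_LOW, entries * ENTRY_BYTES_HIGH


low, high = directory_size_range(1000)
print(low, high)  # 300000 330000
```

Listing a 1000-entry directory therefore means downloading a mutable file of
roughly 300 to 330 kB in its entirety.
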
Checking an ``A``-byte file
===========================

cpu: ~G

network: ~G

memory footprint: negligible

notes: To check a file, Tahoe-LAFS queries all the servers that it knows
about. Note that neither the cpu cost nor the network cost depends directly
on the size of the file. This is relatively inexpensive compared to the
verify and repair operations.

Verifying an ``A``-byte file (immutable)
========================================

cpu: ~N/K*A

network: N/K*A

memory footprint: N/K*S

notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
shares that were originally uploaded to the grid and integrity checks
them. This is (for grids with good redundancy) more expensive than
downloading an A-byte file, since only a fraction of these shares would
be necessary to recover the file.

Verifying an ``A``-byte file (mutable)
======================================

cpu: ~N/K*A

network: N/K*A

memory footprint: N/K*A

notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
shares that were originally uploaded to the grid and integrity checks
them. This is (for grids with good redundancy) more expensive than
downloading an A-byte file, since only a fraction of these shares would
be necessary to recover the file.

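The extra cost of verification relative to a plain download follows directly
from the formulas: verifying moves ``N/K*A`` bytes, while downloading the
whole file moves about ``A``. The sketch below evaluates that ratio;
``verify_vs_download`` is a hypothetical name and constant multipliers are
ignored.

```python
def verify_vs_download(a, k=3, n=10):
    """Return (verify_bytes, download_bytes, ratio) for an A-byte file."""
    download_bytes = a            # enough share data to recover the file
    verify_bytes = n * a / k      # all N shares: N/K*A
    return verify_bytes, download_bytes, verify_bytes / download_bytes


verify_bytes, download_bytes, ratio = verify_vs_download(3_000_000)
print(verify_bytes, download_bytes, round(ratio, 2))
```

With the default ``K=3`` and ``N=10``, verification transfers a little over
three times as much data as a full download.
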
Repairing an ``A``-byte file (mutable or immutable)
===================================================

cpu: variable, between ~A and ~N/K*A

network: variable, between A and N/K*A

memory footprint (immutable): (1+N/K)*S

memory footprint (SDMF mutable): (1+N/K)*A

notes: To repair a file, Tahoe-LAFS downloads the file, and
generates/uploads missing shares in the same way as when it initially
uploads the file.  So, depending on how many shares are missing, this
can cost as little as a download or as much as a download followed by
a full upload.

Since SDMF files have only one segment, which must be processed in its
entirety, repair requires a full-file download followed by a full-file
upload.