source: trunk/docs/performance.rst @ 3523b50

Last change on this file since 3523b50 was 3523b50, checked in by Zooko O'Whielacronx <zooko@…>, at 2011-01-04T06:54:55Z

docs: update performance.rst to describe the difference between already-uploaded and not-already-uploaded, to parameterize segment size, and to use "~A" to mean "approximately A"

  • Property mode set to 100644
File size: 6.8 KB
Line 
1============================================
2Performance costs for some common operations
3============================================
4
51.  `Publishing an A-byte immutable file`_
62.  `Publishing an A-byte mutable file`_
73.  `Downloading B bytes of an A-byte immutable file`_
84.  `Downloading B bytes of an A-byte mutable file`_
95.  `Modifying B bytes of an A-byte mutable file`_
106.  `Inserting/Removing B bytes in an A-byte mutable file`_
117.  `Adding an entry to an A-entry directory`_
128.  `Listing an A entry directory`_
139.  `Performing a file-check on an A-byte file`_
1410. `Performing a file-verify on an A-byte file`_
1511. `Repairing an A-byte file (mutable or immutable)`_
16
17``K`` indicates the number of shares required to reconstruct the file
18(default: 3)
19
20``N`` indicates the total number of shares produced (default: 10)
21
22``S`` indicates the segment size (default: 128 KiB)
23
24``A`` indicates the number of bytes in a file
25
26``B`` indicates the number of bytes of a file which are being read or
27written
28
29``G`` indicates the number of storage servers on your grid
30
31Publishing an ``A``-byte immutable file
32=======================================
33
34when the file is already uploaded
35---------------------------------
36
37If the file is already uploaded with the exact same contents, same
38erasure coding parameters (K, N), and same added convergence secret,
39then it reads the whole file from disk one time while hashing it to
40compute the storage index, then contacts about N servers to ask each
41one to store a share. All of the servers reply that they already have
42a copy of that share, and the upload is done.
43
44disk: A
45
46cpu: ~A
47
48network: ~N
49
50memory footprint: N/K*S
51
52when the file is not already uploaded
53-------------------------------------
54
55If the file is not already uploaded with the exact same contents, same
56erasure coding parameters (K, N), and same added convergence secret,
57then it reads the whole file from disk one time while hashing it to
58compute the storage index, then contacts about N servers to ask each
59one to store a share. Then it uploads each share to a storage server.
60
61disk: 2*A
62
63cpu: 2*~A
64
65network: ~N + ~A
66
67memory footprint: N/K*S
68
69Publishing an ``A``-byte mutable file
70=====================================
71
72network: A
73
74memory footprint: N/K*A
75
76cpu: ~A + a large constant for RSA keypair generation
77
78notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that it
79publishes to a grid. This takes up to 1 or 2 seconds on a typical desktop PC.
80
81Part of the process of encrypting, encoding, and uploading a mutable file to a
82Tahoe-LAFS grid requires that the entire file be in memory at once. For larger
83files, this may cause Tahoe-LAFS to have an unacceptably large memory footprint
84(at least when uploading a mutable file).
85
86Downloading ``B`` bytes of an ``A``-byte immutable file
87=======================================================
88
89network: B
90
91cpu: ~A
92
93notes: When Tahoe-LAFS 1.8.0 or later is asked to read an arbitrary
94range of an immutable file, only the S-byte segments that overlap the
95requested range will be downloaded.
96
97(Earlier versions would download from the beginning of the file up
98until the end of the requested range, and then continue to download
99the rest of the file even after the request was satisfied.)
100
101Downloading ``B`` bytes of an ``A``-byte mutable file
102=====================================================
103
104network: A
105
106memory footprint: A
107
108notes: As currently implemented, mutable files must be downloaded in
109their entirety before any part of them can be read. We are
110exploring fixes for this; see ticket #393 for more information.
111
112Modifying ``B`` bytes of an ``A``-byte mutable file
113===================================================
114
115network: A
116
117memory footprint: N/K*A
118
119notes: If you upload a changed version of a mutable file that you
120earlier put onto your grid with, say, 'tahoe put --mutable',
121Tahoe-LAFS will replace the old file with the new file on the
122grid, rather than attempting to modify only those portions of the
123file that have changed. Modifying a file in this manner is
124essentially uploading the file over again, except that it re-uses
125the existing RSA keypair instead of generating a new one.
126
127Inserting/Removing ``B`` bytes in an ``A``-byte mutable file
128============================================================
129
130network: A
131
132memory footprint: N/K*A
133
134notes: Modifying any part of a mutable file in Tahoe-LAFS requires that
135the entire file be downloaded, modified, held in memory while it is
136encrypted and encoded, and then re-uploaded. A future version of the
137mutable file layout ("LDMF") may provide efficient inserts and
138deletes. Note that this sort of modification is mostly used internally
139for directories, and isn't something that the WUI, CLI, or other
140interfaces will do -- instead, they will simply overwrite the file to
141be modified, as described in "Modifying B bytes of an A-byte mutable
142file".
143
144Adding an entry to an ``A``-entry directory
145===========================================
146
147network: ~A
148
149memory footprint: N/K*A
150
151notes: In Tahoe-LAFS, directories are implemented as specialized mutable
152files. So adding an entry to a directory is essentially adding B
153(actually, 300-330) bytes somewhere in an existing mutable file.
154
155Listing an ``A`` entry directory
156================================
157
158network: ~A
159
160memory footprint: N/K*A
161
162notes: Listing a directory requires that the mutable file storing the
163directory be downloaded from the grid. So listing an A entry
164directory requires downloading a (roughly) 330 * A byte mutable
165file, since each directory entry is about 300-330 bytes in size.
166
167Performing a file-check on an ``A``-byte file
168=============================================
169
170network: ~G, where G is the number of servers on your grid
171
172memory footprint: negligible
173
174notes: To check a file, Tahoe-LAFS queries all the servers that it knows
175about. Note that neither of these values directly depend on the size
176of the file. This is relatively inexpensive, compared to the verify
177and repair operations.
178
179Performing a file-verify on an ``A``-byte file
180==============================================
181
182network: N/K*A
183
184memory footprint: N/K*S
185
186notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
187shares that were originally uploaded to the grid and integrity
188checks them. This is, for well-behaved grids, likely to be more
189expensive than downloading an A-byte file, since only a fraction
190of these shares are necessary to recover the file.
191
192Repairing an ``A``-byte file (mutable or immutable)
193===================================================
194
195network: variable; up to around ~A
196
197memory footprint: from S to (1+N/K)*S
198
199notes: To repair a file, Tahoe-LAFS downloads the file, and generates/uploads
200missing shares in the same way as when it initially uploads the file.
201So, depending on how many shares are missing, this can be about as
202expensive as initially uploading the file in the first place.
Note: See TracBrowser for help on using the repository browser.