Changes between Version 45 and Version 46 of FAQ


Timestamp:
2011-07-21T20:43:10Z
Author:
zooko
Comment:

many changes, additions, editing

  • FAQ

    v45 v46  
     1
    12'''Q: What is special about Tahoe-LAFS? Why should anyone care about it instead of [http://tahoe-lafs.org/trac/tahoe/wiki/RelatedProjects#OtherProjects other distributed storage systems]?'''
    23
    34A1: Tahoe-LAFS is the first Free !Software/Open Source storage technology to offer ''provider-independent security''.  ''Provider-independent security'' means that the integrity and confidentiality of your files is guaranteed by mathematics computed on the client side, and is independent of the servers, which may be owned and operated by someone else.  To learn more, read [http://tahoe-lafs.org/source/tahoe/trunk/docs/about.html our one-page explanation].
    45
    5 A2: Tahoe-LAFS provides extremely reliable, fault-tolerant storage. Even if you do not need its security properties, you might want to use Tahoe-LAFS as an extremely reliable storage system. (Tahoe-LAFS's security features do an excellent job of staying out of your way when you don't need them.)
     6A2: Tahoe-LAFS provides reliable, fault-tolerant storage. Even if you do not need its security properties, you might want to use Tahoe-LAFS for extremely reliable storage. (Tahoe-LAFS's security features do a good job of staying out of your way when you don't need them.)
    67
    78'''Q: "Erasure-coding"?  What's that?'''
    89
    9 A: You know how with RAID-5 you can lose any one drive and still recover?  And there is also something called RAID-6 where you can lose any two drives and still recover.  Erasure coding is the generalization of this pattern: you get to configure it for how many drives you could lose and still recover.  Tahoe-LAFS is typically configured to upload each file to 10 different drives, where you can lose any 7 of them and still recover the entire file.  This gives radically better reliability than comparable RAID setups, at a cost of only 3.3 times the storage space that a single copy takes. (This technique is also known as "forward error correction" and as an "information dispersal algorithm".)
     10A: You know how with RAID-5 you can lose any one drive and still recover?  And there is also something called RAID-6 where you can lose any two drives and still recover.  Erasure coding is the generalization of this pattern: you get to configure how many drives you could lose and still recover.  You can choose how many drives (actually storage servers) will be used in total, from 1 to 256, and how many storage servers are required to recover all the data, from 1 to however many storage servers there are.  We call the number of total servers {{{N}}} and the number required {{{K}}}, and we write the parameters as "{{{K-of-N}}}".
    1011
    11 '''[=#Q3_disable_encryption Q3:] Is there a way to disable the encryption phase and just use the encoding on the actual content? Won't that save a lot of CPU cycles?'''
     12This uses an amount of space on each server equal to the total size of your data divided by {{{K}}}.
    1213
    13 A: There isn't currently a way to disable or skip the encryption phase, but if you watch the "Recent Uploads and Downloads" page on your local tahoe-lafs gateway, you'll see that the encryption time is orders of magnitude (yes, plural) smaller than the upload time, so there isn't significant performance to be gained by skipping the encryption. We prefer 'secure by default', so without a compelling reason to allow insecure operation, our plan is to leave encryption turned on all the time.
     14Tahoe-LAFS is typically used with {{{3-of-10}}} parameters, so the data is spread over 10 different drives, and you can lose any 7 of them and still recover all of the data.  This gives much better reliability than comparable RAID setups, at a cost of only 3.3 times the storage space that a single copy takes.  It takes about 3.3 times the storage space because each server needs space equal to 1/3 of the size of the data and there are 10 servers.
     15
     16Erasure coding is also known as "forward error correction" and as an "information dispersal algorithm".
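The {{{K-of-N}}} arithmetic above can be sketched in a few lines. This is only an illustration of the formulas in the answer (space per server is the data size divided by {{{K}}}, and there are {{{N}}} servers); the function name {{{erasure_profile}}} is made up for this example and is not part of Tahoe-LAFS.

```python
from fractions import Fraction

def erasure_profile(k, n, data_size):
    """Summarise a K-of-N encoding for data of the given size (any unit)."""
    if not 1 <= k <= n <= 256:
        raise ValueError("need 1 <= K <= N <= 256")
    per_server = Fraction(data_size, k)   # space used on each server
    return {
        "per_server": per_server,
        "total": per_server * n,          # space used across all N servers
        "expansion": Fraction(n, k),      # total space / original size
        "tolerated_losses": n - k,        # servers that can be lost
    }

# Hypothetical example: 300 units of data with the typical 3-of-10 parameters.
profile = erasure_profile(3, 10, 300)
print(float(profile["expansion"]), profile["tolerated_losses"])
```

Exact fractions are used so that the 10/3 expansion factor is not rounded until it is printed.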
     17
     18'''[=#Q3_disable_encryption Q3:] Is there a way to disable the encryption for content which isn't secret? Won't that save a lot of CPU cycles?'''
     19
     20A: There isn't currently a way to disable the encryption, but if you look at the "Recent Uploads and Downloads" page on your local tahoe-lafs gateway, you'll see that the encryption takes a tiny sliver of the total time to upload or download a file, so there isn't significant performance to be gained by skipping the encryption. We prefer 'secure by default', so without a compelling reason to allow insecure operation, our plan is to leave encryption turned on all the time.  Note that because Tahoe-LAFS includes the decryption key in the capability to a file, it is trivial to share or to publish an encrypted file—you just share or publish the capability, and everyone who uses that capability automatically sees the plaintext of the file.
    1421
    1522'''Q: Where should I look for current documentation about the Tahoe-LAFS protocols?'''
     
    1926'''Q: Does Tahoe-LAFS work on embedded devices such as a [http://www.pogoplug.com PogoPlug] or an [http://openwrt.org OpenWRT] router?'''
    2027
    21 A: Yes! François Deppierraz contributes [http://tahoe-lafs.org/buildbot/builders/FranXois%20lenny-armv5tel a buildbot] which shows that Tahoe-LAFS builds and all the unit tests pass on his Intel SS4000-E NAS box running under Debian Squeeze.  Zandr Milewski [http://tahoe-lafs.org/pipermail/tahoe-dev/2009-November/003157.html reported] that it took him only an hour to build, install, and test Tahoe-LAFS on a !PogoPlug.
     28A: Yes. François Deppierraz contributes [http://tahoe-lafs.org/buildbot/builders/FranXois%20lenny-armv5tel a buildbot] which shows that Tahoe-LAFS builds and all the unit tests pass on his Intel SS4000-E NAS box running under Debian Squeeze.  Zandr Milewski [http://tahoe-lafs.org/pipermail/tahoe-dev/2009-November/003157.html reported] that it took him only an hour to build, install, and test Tahoe-LAFS on a !PogoPlug.
    2229
    2330'''Q: Does Tahoe-LAFS work on Windows?'''
    2431
    25 A: Yes.  Follow [http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/quickstart.html the standard quickstart instructions] to get Tahoe-LAFS running on Windows. (There was also an "Allmydata Windows client", but that is not actively maintained at the moment, and relied on some components that are not open-source.)
     32A: Yes.  Follow [http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/quickstart.rst the standard quickstart instructions] to get Tahoe-LAFS running on Windows. (There was also an "Allmydata Windows client", but that is not actively maintained at the moment, and relied on some components that are not open-source.)
    2633
    2734'''Q: Does Tahoe-LAFS work on Mac OS X?'''
    2835
    29 A: Yes.  Follow [http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/quickstart.html the standard quickstart instructions] on Mac OS X and it will result in a working command-line tool on Mac OS X just as it does on other Unixes.
     36A: Yes.  Follow [http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/quickstart.rst the standard quickstart instructions] on Mac OS X and it will result in a working command-line tool on Mac OS X just as it does on other Unixes.
    3037
    3138'''Q: Can there be more than one storage folder on a storage node? So if a storage server contains 3 drives without RAID, can it use all 3 for storage?'''
    3239
    33 A: Not directly. Each storage server has a single "base directory" which we abbreviate as $BASEDIR. The server keeps all of its shares in a subdirectory named $BASEDIR/storage/shares/ . (Note that you can symlink  this to whatever you want: you can run most of the node from one place, and store all the shares somewhere else). Since there's only one such subdirectory, you can only use one filesystem per node.On the other hand, shares are stored in a set of 1024 subdirectories of that one, named $BASEDIR/storage/shares/aa/, $BASEDIR/storage/shares/ab/, etc. If you were to symlink the first third of these to one filesystem, the next third to a second filesystem, etc, (hopefully with a script!), then you'd get about 1/3rd of the shares stored on each disk. The "how much space is available" and space-reservation tools would be confused (including making the 'reserved_space' parameter unusable), but basically everything else should work normally.
     40A: Not directly. Each storage server has a single "base directory" which we term {{{$BASEDIR}}}. The server keeps all of its shares in a subdirectory named {{{$BASEDIR/storage/shares/}}}. (Note that you can symlink this to whatever you want: you can keep the rest of the node's files in one place, and store all the shares somewhere else). Since there's only one such subdirectory, you can only use one filesystem per node. On the other hand, shares are stored in a set of 1024 subdirectories of that one, named {{{$BASEDIR/storage/shares/aa/}}}, {{{$BASEDIR/storage/shares/ab/}}}, etc. If you were to symlink the first third of these to one filesystem, the next third to a second filesystem, etc, (hopefully with a script!), then you'd get about 1/3rd of the shares stored on each disk. The "how much space is available" and space-reservation tools would be confused (including making the {{{reserved_space}}} parameter unusable), but basically everything else should work normally.
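As a rough sketch of the "with a script" idea, the following only computes which prefix directory would be symlinked to which filesystem; it does not touch the disk. It assumes the two-character directory names are drawn from the lowercase base32 alphabet (which gives 32 × 32 = 1024 names, matching the count above), and the mount points are hypothetical.

```python
import itertools

# Assumption: lowercase base32 alphabet, giving 32 * 32 = 1024 prefixes.
BASE32_CHARS = "abcdefghijklmnopqrstuvwxyz234567"

def plan_symlinks(filesystems):
    """Map each two-character prefix directory to one of the mount points."""
    prefixes = ["".join(p) for p in itertools.product(BASE32_CHARS, repeat=2)]
    chunk = -(-len(prefixes) // len(filesystems))  # ceiling division
    return {
        prefix: filesystems[i // chunk]
        for i, prefix in enumerate(prefixes)
    }

# Hypothetical mount points for the three drives.
plan = plan_symlinks(["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"])
print(plan["aa"], plan["77"])
```

A real script would then create each symlink under {{{$BASEDIR/storage/shares/}}} from this plan while the node is stopped.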
    3441
    35 A cleaner solution might be to use LVM instead, which can combine several physical disks (or loop devices consisting of common files) to a single logical volume. This logical volume can then be mounted (not symlinked!) to $BASEDIR/storage. This is also a much more flexible solution; new disks can then be added seamlessly to LVM.
     42A cleaner solution would be to use LVM instead, which can combine several physical disks (or loop devices consisting of common files) to a single logical volume. This logical volume can then be mounted or symlinked to {{{$BASEDIR/storage}}}. This also is a more flexible solution because new disks can then be added seamlessly to the volume with LVM.
    3643
    3744'''Q: Would it make sense to not use any RAID and let Tahoe-LAFS deal with the redundancy?'''
     
    3946A: The Allmydata grid didn't bother with RAID at all: each Tahoe-LAFS storage server node used a single spindle.
    4047
    41 The answer depends on how expensive the different forms of repair would be. Tahoe-LAFS can correctly be thought of as a form of "application-level RAID", with more flexibility than the usual RAID-1/4/5 styles (RAID-1 is equivalent to 1-of-2 encoding, and RAID-5 is like 3-of-4).
     48The optimal layout depends on how expensive the different forms of repair would be. Tahoe-LAFS can correctly be thought of as a form of "application-level RAID", with more flexibility than the usual RAID-1/4/5 styles (RAID-1 is equivalent to {{{1-of-2}}} encoding, and RAID-5 is like {{{3-of-4}}}).
    4249
    43 Using RAID to achieve your redundancy gets you fairly fast repair, because it's all being handled by a controller that sits right on top of
    44 the raw drive. Tahoe-LAFS's repair is a lot slower, because it is driven by a client that's examining one file at a time, and since there are a lot of
    45 network roundtrips for each file. Doing a repair of a 1TB RAID-5 drive can easily be finished in a day. If that 1TB drive is filled with a
    46 million Tahoe-LAFS files, the repair could take a month. On the other hand, many RAID configurations degrade significantly when a drive is lost, and
    47 Tahoe-LAFS's read performance is nearly unaffected. So repair events may be infrequent enough to just let them happen quietly in the background and
    48 not care much about how long they take.
    49 
    50 The optimal choice is a complicated one. Given inputs of:
    51 
    52 * how much data will be stored, how it changes over time (inlet rate,churn)[[BR]]
    53 * expected drive failure rate (both single sector errors and complete fail)[[BR]]
    54 * server/datacenter layout, inter/intra-colo bandwidth, costs[[BR]]
    55 * drive/hardware costs[[BR]]
    56 
    57 it becomes a tradeoff between money (number of Tahoe-LAFS storage nodes, what sort of RAID [if any] you use for them, how many disks that means, how
    58 much those disks cost, how many computers you need to host them, how much bandwidth you spend doing upload/download/repair), bandwidth costs,
    59 read/write performance, and probability of file loss due to failures happening faster than repair.
     50Using RAID for your redundancy gets you fairly fast repair, because it's all being handled by a controller that sits right on top of the raw drive. Tahoe-LAFS's repair is a lot slower, because it is driven by a client that's examining one file at a time, and since there are a lot of network roundtrips for each file. Doing a repair of a 1TB RAID-5 drive can easily be finished in a day. If that 1TB drive is filled with a million Tahoe-LAFS files that are being repaired over a Wide Area Network, the repair could take a month.  On the other hand, many RAID configurations degrade significantly when a drive is lost, and Tahoe-LAFS's read performance is nearly unaffected.  So repair events may be infrequent enough to just let them happen quietly in the background and not care much about how long they take.
    6051
    6152'''Q: Suppose I have a file of 100GB and 2 storage nodes each with 75GB available, will I be able to store the file or does it have to fit
    6253within the realms of a single node?'''
    6354
    64 A: The ability to store the file will depend upon how you set the encoding parameters: you get to choose the tradeoff between expansion (how much
    65 space gets used) and reliability. The default settings are "3-of-10" (very conservative), which means the file is encoded into 10 shares, and
    66 any 3 will be sufficient to reconstruct it. That means each share will be 1/3rd the size of the original file (plus a small overhead, less than
    67 0.5% for large files). For your 100GB file, that means 10 shares, each of which is 33GB in size, which would not fit (it could get two shares
    68 on each server, but it couldn't place all ten, so it would return an error).
     55A: The ability to store the file will depend upon how you set the encoding parameters: you get to choose the tradeoff between expansion (how much space gets used) and reliability. The default settings are {{{3-of-10}}}, which means the file is encoded into 10 shares, and any 3 will be sufficient to reconstruct it. That means each share will be 1/3rd the size of the original file (plus a small overhead, less than 0.5% for large files). For your 100GB file, that means 10 shares, each of which is 33GB in size, which would not fit (it could get two shares on each server, but it couldn't place all ten, so it would return an error).
    6956
    70 But you could set the encoding to 2-of-2, which would give you two 50GB shares, and it would happily put one share on each server. That would
    71 store the file, but it wouldn't give you any redundancy: a failure of either server would prevent you from recovering the file.
     57But you could set the encoding to {{{2-of-2}}}, which would give you two 50GB shares, and it would happily put one share on each server. That would store the file, but it wouldn't give you any redundancy: a failure of either server would prevent you from recovering the file.
    7258
    73 You could also set the encoding to 4-of-6, which would generate six 25GB shares, and put three on each server. This would still be vulnerable to
    74 either server being down (since neither server has enough shares to give you the whole file by itself), but would become tolerant to errors in an
    75 individual share (if only one share file were damaged, there are still five other shares, and we only need four). A lot of disk errors affect
    76 only a single file, so there's some benefit to this even if you're still vulnerable to a full disk/server failure.
     59You could also set the encoding to {{{4-of-6}}}, which would generate six 25GB shares, and put three on each server. This would still be vulnerable to either server being down (since neither server has enough shares to give you the whole file by itself), but would become tolerant to errors in an individual share (if only one share file were damaged, there are still five other shares, and we only need four). A lot of disk errors affect only a single file, so there's some benefit to this even if you're still vulnerable to a full disk/server failure.
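The three cases above can be checked with a small calculation. This sketch ignores the small per-share overhead mentioned in the answer and assumes, as the answer does, that an upload needs room for all {{{N}}} shares; {{{can_place}}} is a name invented for this example.

```python
def can_place(file_size, k, n, free_per_server):
    """Return (share_size, placeable_shares, fits), ignoring share overhead."""
    share_size = file_size / k
    # Whole shares that fit on each server: floor(free / (file_size / k)).
    placeable = sum((free * k) // file_size for free in free_per_server)
    return share_size, placeable, placeable >= n

# 100GB file, two servers with 75GB free each (sizes in GB).
for k, n in [(3, 10), (2, 2), (4, 6)]:
    share_size, placeable, fits = can_place(100, k, n, [75, 75])
    print(f"{k}-of-{n}: {share_size:.1f}GB shares, room for {placeable}, fits={fits}")
```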
    7760
    7861'''Q: Do I need to shut down all clients/servers to add a storage node?'''
    7962
    80 A: No, You can add or remove clients or servers anytime you like. The central "Introducer" is responsible for telling clients and servers
    81 about each other, and it acts as a simple publish-subscribe hub, so everything is very dynamic. Clients re-evaluate the list of available
    82 servers each time they do an upload.
     63A: No. You can add or remove clients or servers anytime you like. The central "Introducer" is responsible for telling clients and servers about each other, and it acts as a simple publish-subscribe hub, so everything is very dynamic. Clients re-evaluate the list of available servers each time they do an upload.
    8364
    84 This is great for long-term servers, but can be a bit surprising in the short-term: if you've just started your client and upload a file before
    85 it has a chance to connect to all of the servers, your file may be stored on a small subset of the servers, with less reliability than you
    86 wanted. We're still working on a good way to prevent this while still retaining the dynamic server discovery properties (probably in the form
    87 of a client-side configuration statement that lists all the servers that you expect to connect to, so it can refuse to do an upload until it's
    88 connected to at least those). A list like that might require a client restart when you wanted to add to this "required" list, but we could
    89 implement such a feature without a restart requirement too.
     65This is great for long-term servers, but can cause a problem right when the node starts up: if you've just started your client and upload a file before it has a chance to connect to all of the servers, your upload may fail due to insufficient servers. Usually you can just try again (your client will usually have finished connecting to all the servers in the time it takes you to see the error message and click retry).
    9066
    9167'''Q: If I had 3 locations each with 5 storage nodes, could I configure the grid to ensure a file is written to each location so that I could handle all
     
    10076than the others.
    10177
    102 For example, if you have 15 servers in three locations A:1/2/3/4/5, B:6/7/8/9/10, C:11/12/13/14/15, and use the default 3-of-10 encoding,
     78For example, if you have 15 servers in three locations A:1/2/3/4/5, B:6/7/8/9/10, C:11/12/13/14/15, and use the default {{{3-of-10}}} encoding,
    10379your worst case is winding up with shares on 1/2/3/4/5/6/7/8/9/10, and not use location C at all. The most *likely* case is that you'll wind up
    10480with 3 or 4 shares in each location, but there's nothing in the system to enforce that: it's just shuffling all the servers into a ring,
    10581starting at 0, and assigning shares to servers around and around the ring until all the shares have a home.
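How the ten shares spread over the three locations can be tallied by brute force. The sketch below assumes one share per server and that every set of 10 servers out of the 15 is equally likely, which is a simplification of the ring shuffle described above; the location names and server numbers are the ones from the example.

```python
from collections import Counter
from itertools import combinations

LOCATIONS = {"A": range(1, 6), "B": range(6, 11), "C": range(11, 16)}
SITE_OF = {server: site for site, servers in LOCATIONS.items() for server in servers}

def distribution_counts():
    """Count how many 10-server subsets produce each per-location share split."""
    counts = Counter()
    for chosen in combinations(sorted(SITE_OF), 10):
        per_site = Counter(SITE_OF[s] for s in chosen)
        # Sort the split so (3, 4, 3) and (4, 3, 3) are both counted as (3, 3, 4).
        shape = tuple(sorted(per_site.get(site, 0) for site in LOCATIONS))
        counts[shape] += 1
    return counts

counts = distribution_counts()
for shape, ways in sorted(counts.items()):
    print(shape, ways)
```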
    10682
    107 There's some math we could do to estimate the probability of things like this, but I'd have to dust off a stats textbook to remember what it is.
    108 (actually, since 15-choose-10 is only 3003).
    109 
    110 Ok, so the possibilities are:
     83The possible distributions of shares into locations (A, B, C) are:
    11184
    11285(3, 3, 4) 1500[[BR]]
     
    12194'''Q: Is it possible to modify a mutable file by "patching" it? Also... if I have a file stored and I want to update a section of the file in the middle, is that possible or would the file need to be downloaded, patched and re-uploaded?'''
    12295
    123 A: Not at present. We've only implemented "Small Distributed Mutable Files" (SDMF) so far, which have the property that the whole file must be
     96A: Not at present. We've implemented only "Small Distributed Mutable Files" (SDMF) so far, which have the property that the whole file must be
    12497downloaded or uploaded at once. We have plans for "medium" MDMF files, which will fix this. MDMF files are broken into segments (default size
    12598is 128KiB), and you only have to replace the segments that are dirtied by the write, so changing a single byte would only require the upload of
    126 N/k*128KiB or about 440KiB for the default 3-of-10 encoding.
     99N/K*128KiB or about 427KiB for the default {{{3-of-10}}} encoding.
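The segment arithmetic can be sketched as follows. This is an estimate based only on the formula above (dirtied segments × segment size × N/K); it ignores hashes, signatures and any other per-share overhead, and {{{upload_cost}}} is a hypothetical helper, not a Tahoe-LAFS function.

```python
SEGMENT_SIZE = 128 * 1024  # default segment size from the text, in bytes

def upload_cost(offset, length, k=3, n=10, segment_size=SEGMENT_SIZE):
    """Estimate bytes uploaded when `length` bytes at `offset` are rewritten."""
    if length <= 0:
        return 0
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    dirtied = last - first + 1          # segments touched by the write
    return dirtied * segment_size * n / k

# Changing one byte dirties one segment; a write straddling a boundary dirties two.
print(upload_cost(0, 1) / 1024, "KiB")
print(upload_cost(SEGMENT_SIZE - 1, 2) / 1024, "KiB")
```

For a single dirtied segment with {{{3-of-10}}} this works out to roughly 427 KiB before overhead.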
    127100
    128 Kevan Carstensen is spending his summer implementing MDMF, thanks to the sponsorship of Google Summer Of Code. Ticket #393 is tracking this work.
     101Kevan Carstensen has implemented MDMF, thanks in part to the sponsorship of Google Summer Of Code. Ticket #393 is tracking this work.
    129102
    130 '''Q: How can tahoe ensures that, every node id is unique ?'''
     103'''Q: How can Tahoe-LAFS ensure that every node ID is unique?'''
    131104
    132 A: The node ID is randomly-generated, so there is no way to guarantee its uniqueness.  However, the ID is long enough that the probability of two randomly-generated IDs colliding is negligible.
     105A: The node ID is the secure hash of the SSL public key certificate of the node.  As long as the node's public key is unique and the secure hash function doesn't allow collisions, the node ID will be unique.
    133106
    134 '''Q: If upload the same file again and again, tahoe will give the same capability string. How is tahoe identifies that the client is same, when i upload files mutiple times, is it based on node id ?'''
     107'''Q: If I upload the same file again and again, Tahoe-LAFS will return the same capability. How does Tahoe-LAFS identify that the client is the same when I upload files multiple times? Is it based on node ID?'''
    135108
    136 A: For immutable files this is true.  The capability string is derived from two pieces of information:  The content of the file and the "convergence secret".  By default, the convergence secret is randomly generated by the node when it first starts up, then stored and re-used after that.  So the same file content uploaded from the same node will always have the same cap string.  Uploading the file from a different node with a different convergence secret would result in a different cap string -- and a second copy of the file's contents stored in the grid, though there's no way to tell that the two stored files are the same, because they're encrypted with different keys.
     109A: For immutable files this is true: the resulting capability will be the same each time you upload the same file contents.  The capability is derived from two pieces of information: the content of the file and the "convergence secret".  By default, the convergence secret is randomly generated by the node when it first starts up, then stored and re-used after that.  So the same file content uploaded from the same node will always have the same cap string.  Uploading the file from a different node with a different convergence secret would result in a different cap string, and in a second copy of the file's contents stored on the grid. If you want files you upload to converge (also known as "deduplicate") with files uploaded by someone else, just make sure you're using the same convergence secret as they are.
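The convergence idea can be illustrated with a toy derivation. This is not the derivation Tahoe-LAFS actually uses; it is a minimal sketch that hashes a secret together with the file contents to show why the same secret and contents always give the same result, while a different secret gives a different one. The function name and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os

def toy_convergent_key(convergence_secret, content):
    """Toy derivation only; NOT Tahoe-LAFS's real key-derivation scheme."""
    h = hashlib.sha256()
    # Length-prefix the secret so secret/content boundaries are unambiguous.
    h.update(len(convergence_secret).to_bytes(8, "big"))
    h.update(convergence_secret)
    h.update(content)
    return h.hexdigest()

secret_a = os.urandom(32)   # stands in for one node's stored secret
secret_b = os.urandom(32)   # stands in for a different node's secret
content = b"same file contents"

print(toy_convergent_key(secret_a, content) == toy_convergent_key(secret_a, content))
print(toy_convergent_key(secret_a, content) == toy_convergent_key(secret_b, content))
```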
    137110
    138 '''Q: When i stop a node and start it again, will the node have the same node id as of previous node start ?'''
     111'''Q: If I move the client node base directory to a different machine and start the client there, will the node have the same node ID as on the previous machine?'''
    139112
    140 A: Yes.  The node ID is stored in the my_nodeid file in your tahoe directory.
    141 
    142 '''Q: If i move the client node base directory to different maching and start the client there again, will the node have the same node id as of previous machine start ?'''
    143 
    144 A: Yes, as long as you move that my_nodeid file.
     113A: Yes, the node ID is stored in the {{{my_nodeid}}} file in your tahoe base directory, and it is derived from the SSL public/private keypair which is stored in the {{{private}}} subdirectory of the tahoe base directory. As long as you move both of those then the node on the new machine will have the same node ID.
    145114
    146115'''Q: Is it possible to run multiple introducers on the same grid?'''
    147116
    148 A: Faruque Sarker has been working on this as a Google Summer of Code project. His changes are due to be integrated in Tahoe-LAFS v1.9.0. For more information please take a look at ticket #68
     117A: Faruque Sarker has been working on this as a Google Summer of Code project. His changes are blocked due to needing more people to test them, review their code, and write more unit tests. For more information please take a look at ticket #68
    149118
    150 '''Q: Will this thing only run when I tell it to?'''
     119'''Q: Will this thing run only when I tell it to? Will it use up a lot of my network bandwidth, CPU, or RAM?'''
    151120
    152 A: Yes. First of all, it doesn't run except when you tell it to—you start it with {{{tahoe start}}} and stop it with {{{tahoe stop}}}. Secondly, the software doesn't act as a server unless you configure it to do so (it isn't like peer-to-peer software which automatically acts as a server as well as a client). Thirdly, the client doesn't do anything except in response to the user starting an upload or a download (it doesn't do anything automatically or in the background).
     121A: Tahoe-LAFS is designed to be unobtrusive. First of all, it doesn't start at all except when you tell it to—you start it with {{{tahoe start}}} and stop it with {{{tahoe stop}}}. Secondly, the software doesn't act as a server unless you configure it to do so—it isn't like peer-to-peer software which automatically acts as a server as well as a client. Thirdly, the client doesn't do anything except in response to the user starting an upload or a download—it doesn't do anything automatically or in the background. Fourthly, with two minor exceptions described below, the server doesn't do anything either, except in response to clients doing uploads or downloads. Finally, even when the server is actively serving clients it isn't too intensive of a process. It uses between 40 and 56 MB of RAM on a 64-bit Linux server. We used to run eight of them on a single-core 2 GHz Opteron and had plenty of CPU to spare, so it isn't too CPU intensive.
     122
     123The two minor exceptions are that the server periodically inspects all of the ciphertext that it is storing on behalf of clients. It is configured to do this "in the background", by doing it only for a second at a time and waiting for a few seconds in between each step. The intent is that this will not noticeably impact other users of the same server. For all the details about when these background processes run and what they do, read the documentation in [http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/storage/crawler.py?annotate=blame&rev=4164 storage/crawler.py] and [http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/storage/expirer.py?annotate=blame&rev=4329 storage/expirer.py].