Version 70 (modified by zooko, at 2012-03-23T13:55:48Z)
Q1: What is special about Tahoe-LAFS? Why should anyone care about it instead of other distributed storage systems?
A1: Tahoe-LAFS is the first Free Software/Open Source storage technology to offer provider-independent security. Provider-independent security means that the integrity and confidentiality of your files are guaranteed by mathematics computed on the client side, and are independent of the servers, which may be owned and operated by someone else. To learn more, read our one-page explanation.
A2: Tahoe-LAFS provides reliable, fault-tolerant storage. Even if you do not need its security properties, you might want to use Tahoe-LAFS for extremely reliable storage. (Tahoe-LAFS's security features do a good job of staying out of your way when you don't need them.)
Q1.5: What's the difference between Tahoe-LAFS and Freenet?
A: Zooko wrote a long post about that to the tahoe-dev mailing list.
Q2: "Erasure-coding"? What's that?
A: You know how with RAID-5 you can lose any one drive and still recover? And there is also something called RAID-6 where you can lose any two drives and still recover. Erasure coding is the generalization of this pattern: you get to configure how many drives you could lose and still recover. You can choose how many drives (actually storage servers) will be used in total, from 1 to 256, and how many storage servers are required to recover all the data, from 1 to however many storage servers there are. We call the number of total servers N and the number required K, and we write the parameters as "K-of-N".
This uses an amount of space on each server equal to the total size of your data divided by K.
The default Tahoe-LAFS parameters are 3-of-10, so the data is spread over 10 different drives, and you can lose any 7 of them and still recover the entire data. This gives much better reliability than comparable RAID setups, at a cost of only about 3.3 times the storage space that a single copy takes. It takes about 3.3 times the storage space because it uses space on each server equal to 1/3 of the size of the data, and there are 10 servers.
Erasure coding is also known as "forward error correction" and as an "information dispersal algorithm".
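The K-of-N arithmetic above can be checked with a few lines of Python. This is only a sketch of the bookkeeping, not of the encoding itself; the function name erasure_stats is made up for this example, and it ignores the small per-share overhead.

```python
from fractions import Fraction

def erasure_stats(k, n, data_size):
    """Return (share_size, total_stored, expansion, tolerable_losses) for K-of-N.

    Hypothetical helper for illustration; it only does the arithmetic
    described in the answer above and ignores per-share overhead.
    """
    if not (1 <= k <= n <= 256):
        raise ValueError("need 1 <= K <= N <= 256")
    share_size = Fraction(data_size, k)  # space used on each server
    total_stored = share_size * n        # space used across all N servers
    expansion = Fraction(n, k)           # total_stored / data_size
    return share_size, total_stored, expansion, n - k

# Default 3-of-10 parameters with 300 units of data.
share, total, expansion, losses = erasure_stats(3, 10, 300)
print(f"share={share} total={total} expansion={float(expansion):.2f} losses={losses}")
```

With the default 3-of-10 parameters, each server holds 1/3 of the data, the grid as a whole holds 10/3 of it, and any 7 servers can be lost.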
Q3: Is there a way to disable the encryption for content which isn't secret? Won't that save a lot of CPU cycles?
A: There isn't currently a way to disable the encryption, but if you look at the "Recent Uploads and Downloads" page on your local tahoe-lafs gateway, you'll see that the encryption takes a tiny sliver of the total time to upload or download a file, so there isn't significant performance to be gained by skipping the encryption. We prefer 'secure by default', so without a compelling reason to allow insecure operation, our plan is to leave encryption turned on all the time. Note that because Tahoe-LAFS includes the decryption key in the capability to a file, it is trivial to share or to publish an encrypted file—you just share or publish the capability, and everyone who uses that capability automatically sees the plaintext of the file.
Q4: Where should I look for current documentation about the Tahoe-LAFS protocols?
A: https://tahoe-lafs.org/source/tahoe/trunk/docs/architecture.rst
Q5: Does Tahoe-LAFS work on embedded devices such as a PogoPlug or an OpenWRT router?
A: Yes. François Deppierraz contributes a buildbot which shows that Tahoe-LAFS builds and all the unit tests pass on his Intel SS4000-E NAS box running under Debian Squeeze. Zandr Milewski reported that it took him only an hour to build, install, and test Tahoe-LAFS on a PogoPlug.
Q6: Does Tahoe-LAFS work on Windows?
A: Yes. Follow the standard quickstart instructions to get Tahoe-LAFS running on Windows. (There was also an "Allmydata Windows client", but that is not actively maintained at the moment, and relied on some components that are not open-source.)
Q7: Does Tahoe-LAFS work on Mac OS X?
A: Yes. Follow the standard quickstart instructions on Mac OS X and it will result in a working command-line tool on Mac OS X just as it does on other Unixes.
Q8: Can there be more than one storage directory on a storage node? So if a storage server contains 3 drives without RAID, can it use all 3 for storage?
A: Not directly. Each storage server has a single "base directory" which we term $BASEDIR. The server keeps all of its shares in a subdirectory named $BASEDIR/storage/shares/. (Note that you can symlink this to whatever you want: you can keep the rest of the node's files in one place, and store all the shares somewhere else). Since there's only one such subdirectory, you can only use one filesystem per node. On the other hand, shares are stored in a set of 1024 subdirectories of that one, named $BASEDIR/storage/shares/aa/, $BASEDIR/storage/shares/ab/, etc. If you were to symlink the first third of these to one filesystem, the next third to a second filesystem, etc, (hopefully with a script!), then you'd get about 1/3rd of the shares stored on each disk. The "how much space is available" and space-reservation tools would be confused (including making the reserved_space parameter unusable), but basically everything else should work normally.
A cleaner solution would be to use LVM instead, which can combine several physical disks (or loop devices consisting of common files) to a single logical volume. This logical volume can then be mounted or symlinked to $BASEDIR/storage. This also is a more flexible solution because new disks can then be added seamlessly to the volume with LVM.
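If you do want to try the symlink approach, here is a rough sketch of the kind of script meant above. It assumes the 1024 prefix directories are named with two characters from the lowercase base32 alphabet (which is consistent with "aa", "ab", and 32*32 = 1024, but check your own shares directory first). The mount points and function names are hypothetical.

```python
import itertools
import os

# Assumption: prefix directories use two characters from the lowercase
# base32 alphabet, giving 32 * 32 = 1024 names ("aa", "ab", ...).
BASE32_CHARS = "abcdefghijklmnopqrstuvwxyz234567"

def plan_prefix_layout(targets):
    """Map each two-character prefix to one target directory, giving each
    target a contiguous, near-equal slice of the prefixes."""
    prefixes = ["".join(p) for p in itertools.product(BASE32_CHARS, repeat=2)]
    plan = {}
    for i, prefix in enumerate(prefixes):
        plan[prefix] = targets[i * len(targets) // len(prefixes)]
    return plan

def apply_layout(shares_dir, plan):
    """Create the symlinks. Intended for an empty shares directory only;
    os.symlink fails if a prefix directory already exists there."""
    for prefix, target in plan.items():
        real_dir = os.path.join(target, prefix)
        os.makedirs(real_dir, exist_ok=True)
        os.symlink(real_dir, os.path.join(shares_dir, prefix))

# Hypothetical mount points for the three drives.
plan = plan_prefix_layout(["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"])
print(plan["aa"], plan["77"])
```

Only the planning step runs here; call apply_layout yourself with $BASEDIR/storage/shares once you are happy with the plan. The caveat about the space-reservation tools being confused still applies.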
Q9: Would it make sense to not use any RAID and let Tahoe-LAFS deal with the redundancy?
A: The Allmydata grid didn't bother with RAID at all: each Tahoe-LAFS storage server node used a single spindle.
The optimal layout depends on how expensive the different forms of repair would be. Tahoe-LAFS can correctly be thought of as a form of "application-level RAID", with more flexibility than the usual RAID-1/4/5 styles (RAID-1 is equivalent to 1-of-2 encoding, and RAID-5 is like 3-of-4).
Using RAID for your redundancy gets you fairly fast repair, because it's all being handled by a controller that sits right on top of the raw drive. Tahoe-LAFS's repair is a lot slower, because it is driven by a client that's examining one file at a time, and because there are a lot of network roundtrips for each file. A repair of a 1TB RAID-5 drive can easily be finished in a day. If that 1TB drive is filled with a million Tahoe-LAFS files that are being repaired over a Wide Area Network, the repair could take a month. On the other hand, many RAID configurations degrade significantly when a drive is lost, whereas Tahoe-LAFS's read performance is nearly unaffected. So repair events may be infrequent enough to just let them happen quietly in the background and not care much about how long they take.
Q10: Suppose I have a file of 100GB and 2 storage nodes each with 75GB available, will I be able to store the file or does it have to fit within the realms of a single node?
A: The ability to store the file will depend upon how you set the encoding parameters: you get to choose the tradeoff between expansion (how much space gets used) and reliability. The default settings are 3-of-10, which means the file is encoded into 10 shares, and any 3 will be sufficient to reconstruct it. That means each share will be 1/3rd the size of the original file (plus a small overhead, less than 0.5% for large files). For your 100GB file, that means 10 shares, each of which is 33GB in size, which would not fit (it could get two shares on each server, but it couldn't place all ten, so it would return an error).
But you could set the encoding to 2-of-2, which would give you two 50GB shares, and it would happily put one share on each server. That would store the file, but it wouldn't give you any redundancy: a failure of either server would prevent you from recovering the file.
You could also set the encoding to 4-of-6, which would generate six 25GB shares, and put three on each server. This would still be vulnerable to either server being down (since neither server has enough shares to give you the whole file by itself), but would become tolerant to errors in an individual share (if only one share file were damaged, there are still five other shares, and we only need four). A lot of disk errors affect only a single file, so there's some benefit to this even if you're still vulnerable to a full disk/server failure.
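The three cases above can be reproduced with a small feasibility check. This is a deliberately simple model and not how Tahoe-LAFS actually selects servers: it assumes shares are spread as evenly as possible and ignores the small per-share overhead. The function name can_store is made up for this example.

```python
def can_store(file_size_gb, k, n, free_gb_per_server):
    """Rough check: spread N shares as evenly as possible over the servers
    and see whether each server has room for its portion.

    Simplified model for illustration; ignores per-share overhead and the
    real server-selection logic.
    """
    share_size = file_size_gb / k
    servers = len(free_gb_per_server)
    # Round-robin: the first (n % servers) servers receive one extra share.
    per_server = [n // servers + (1 if i < n % servers else 0)
                  for i in range(servers)]
    return all(count * share_size <= free
               for count, free in zip(per_server, free_gb_per_server))

for k, n in [(3, 10), (2, 2), (4, 6)]:
    print(f"{k}-of-{n}:", can_store(100, k, n, [75, 75]))
```

For the 100GB file and two 75GB servers, this reports that 3-of-10 does not fit while 2-of-2 and 4-of-6 do, matching the reasoning above.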
Q11: Do I need to shut down all clients/servers to add a storage node?
A: No. You can add or remove clients or servers anytime you like. The central "Introducer" is responsible for telling clients and servers about each other, and it acts as a simple publish-subscribe hub, so everything is very dynamic. Clients re-evaluate the list of available servers each time they do an upload.
This is great for long-term servers, but can cause a problem right when the node starts up. If you've just started your client and upload a file before it has a chance to connect to all of the servers, your upload may fail due to insufficient servers. Usually you can just try again (your client will usually have finished connecting to all the servers in the time it takes you to see the error message and click retry).
Q12: If I had 3 locations each with 5 storage nodes, could I configure the grid to ensure a file is written to each location so that I could handle all servers at a particular location going down?
A: Not directly. We have tickets about that (#467, #302), but the problem is deeper than it looks and we haven't come to a conclusion on how to build it.
The current system will try to distribute the shares as widely as possible, using a different pseudo-random permutation for each file, but it is completely unaware of server properties like "location". If you have more free servers than shares, it will only put one share on any given server, but you might wind up with more shares in one location than the others.
For example, if you have 15 servers in three locations A:1/2/3/4/5, B:6/7/8/9/10, C:11/12/13/14/15, and use the default 3-of-10 encoding, your worst case is winding up with shares on 1/2/3/4/5/6/7/8/9/10, which does not use location C at all. The most *likely* case is that you'll wind up with 3 or 4 shares in each location, but there's nothing in the system to enforce that: it's just shuffling all the servers into a ring, starting at 0, and assigning shares to servers around and around the ring until all the shares have a home.
The possible distributions of shares across the three locations (in any order), with the number of ways each can occur, are:
(3, 3, 4) 1500
(2, 4, 4) 750
(2, 3, 5) 600
(1, 4, 5) 150
(0, 5, 5) 3
sum = 3003
So you've got a 50% chance of the ideal distribution, and a 1/1000 chance of the worst-case distribution.
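The counts above can be reproduced by brute force. The sketch below assumes that every set of 10 servers out of the 15 is equally likely to be chosen, which is what the shuffling description suggests; the function name is made up for this example, and math.comb needs Python 3.8 or later.

```python
from collections import Counter
from itertools import product
from math import comb

def location_distributions(servers_per_location=5, locations=3, shares=10):
    """Count the ways to pick `shares` distinct servers, grouped by how many
    land in each location (order of locations ignored).

    Assumes every set of servers is equally likely.
    """
    counts = Counter()
    for split in product(range(servers_per_location + 1), repeat=locations):
        if sum(split) != shares:
            continue
        ways = 1
        for used in split:
            ways *= comb(servers_per_location, used)
        counts[tuple(sorted(split))] += ways
    return counts

dist = location_distributions()
total = sum(dist.values())
for split, ways in sorted(dist.items(), reverse=True):
    print(split, ways, f"{ways / total:.4f}")
print("sum =", total)
```

The total is the number of ways to choose 10 servers out of 15, and the share of each row gives the probabilities quoted above.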
Q13: Is it possible to modify a mutable file by "patching" it? Also... if I have a file stored and I want to update a section of the file in the middle, is that possible or would the file need to be downloaded, patched and re-uploaded?
A: Not at present. We've implemented only "Small Distributed Mutable Files" (SDMF) so far, which have the property that the whole file must be downloaded or uploaded at once. We have plans for "medium" MDMF files, which will fix this. MDMF files are broken into segments (default size is 128KiB), and you only have to replace the segments that are dirtied by the write, so changing a single byte would only require the upload of N/k*128KiB, or about 427KiB (roughly 437kB), for the default 3-of-10 encoding.
Kevan Carstensen has implemented MDMF, thanks in part to the sponsorship of Google Summer Of Code. Ticket #393 is tracking this work.
Q14: How can Tahoe-LAFS ensure that every node ID is unique?
A: The node ID is the secure hash of the SSL public key certificate of the node. As long as the node's public key is unique and the secure hash function doesn't allow collisions, the node ID will be unique.
Q15: If I upload the same file again and again, Tahoe-LAFS will return the same capability. How does Tahoe-LAFS identify that the client is the same when I upload files multiple times? Is it based on node ID?
A: For immutable files this is true: the resulting capability will be the same each time you upload the same file contents. The capability is derived from two pieces of information: the content of the file and the "convergence secret". By default, the convergence secret is randomly generated by the node when it first starts up, then stored and re-used after that. So the same file content uploaded from the same node will always have the same cap string. Uploading the file from a different node with a different convergence secret would result in a different cap string, and in a second copy of the file's contents stored on the grid. If you want files you upload to converge (also known as "deduplicate") with files uploaded by someone else, just make sure you're using the same convergence secret as they are.
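The idea can be illustrated with a toy example. This is NOT the derivation Tahoe-LAFS uses (the real one involves tagged hashes and the encoding parameters); it only shows why the same content with the same secret converges and why a different secret does not. The function name and the use of SHA-256 here are assumptions made for the illustration.

```python
import hashlib
import os

def toy_convergent_key(convergence_secret: bytes, content: bytes) -> str:
    """Derive a deterministic key from (secret, content).

    Toy illustration only; not Tahoe-LAFS's actual key derivation.
    """
    h = hashlib.sha256()
    # Length-prefix the secret so different (secret, content) pairs
    # cannot produce the same hash input.
    h.update(len(convergence_secret).to_bytes(8, "big"))
    h.update(convergence_secret)
    h.update(content)
    return h.hexdigest()[:32]

my_secret = os.urandom(32)      # generated once, then stored and re-used
other_secret = os.urandom(32)   # a different node's secret
data = b"the same file contents"

print(toy_convergent_key(my_secret, data) == toy_convergent_key(my_secret, data))
print(toy_convergent_key(my_secret, data) == toy_convergent_key(other_secret, data))
```

The first comparison is always true. The second is false unless the two nodes happen to share a secret, which is exactly what you arrange when you want your uploads to converge with someone else's.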
Q16: If I move the client node base directory to different machine and start the client there, will the node have the same node ID as on the previous machine?
A: Yes, the node ID is stored in the my_nodeid file in each node's base directory, and it is derived from the SSL public/private keypair which is stored in private/node.pem relative to the base directory. As long as you move both of those then the node on the new machine will have the same node ID.
If you are moving these files into an existing base directory of a node that has already been run, then you will also need to delete or move aside private/*.furl under that directory, otherwise the node won't start.
Q17: Is it possible to run multiple introducers on the same grid?
A: Faruque Sarker has been working on this as a Google Summer of Code project. His changes are blocked because they need more people to test them, review the code, and write more unit tests. For more information, please take a look at ticket #68.
Q18: Will this thing run only when I tell it to? Will it use up a lot of my network bandwidth, CPU, or RAM?
A: Tahoe-LAFS is designed to be unobtrusive. First of all, it doesn't start at all except when you tell it to—you start it with tahoe start and stop it with tahoe stop. Secondly, the software doesn't act as a server unless you configure it to do so—it isn't like peer-to-peer software which automatically acts as a server as well as a client. Thirdly, the client doesn't do anything except in response to the user starting an upload or a download—it doesn't do anything automatically or in the background (this might change in future, to support background repair for example, but probably only if you explicitly enable it). Fourthly, with two minor exceptions described below, the server doesn't do anything either, except in response to clients doing uploads or downloads. Finally, even when the server is actively serving clients it isn't too intensive of a process. It uses between 40 and 56 MB of RAM on a 64-bit Linux server. We used to run eight of them on a single-core 2 GHz Opteron and had plenty of CPU to spare, so it isn't too CPU intensive.
The two minor exceptions are that the server periodically inspects all of the ciphertext that it is storing on behalf of clients. It is configured to do this "in the background", by doing it only for a second at a time and waiting for a few seconds in between each step. The intent is that this will not noticeably impact other users of the same server. For all the details about when these background processes run and what they do, read the documentation in storage/crawler.py and storage/expirer.py.
Q19: If a storage server dies and new one is installed, will Tahoe-LAFS automatically generate a new share of each file to store on the new one?
A: Not automatically (see also Q18). There is a repair operation, but it starts only when the user triggers it, by clicking on the "repair" button on the web user interface or running the "tahoe check" command. You can, of course, execute the "tahoe check" command from a script. Kevin Reid posted his cron script with which he has configured his node to repair all files every night.
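As a rough idea of what such a script might look like (this is not Kevin Reid's script), here is a small Python wrapper that you could call from cron. It assumes that your version of the tahoe command accepts a --repair option on check and deep-check, and that "tahoe:" is the alias you want to repair; confirm both with "tahoe check --help" and "tahoe list-aliases" before relying on it. The function names are made up for this example.

```python
import subprocess

def repair_command(target="tahoe:", deep=True):
    """Build the argv for a check-and-repair run.

    Assumptions: the --repair option and the deep-check subcommand exist in
    your tahoe version, and `target` is an alias or cap you control.
    """
    subcommand = "deep-check" if deep else "check"
    return ["tahoe", subcommand, "--repair", target]

def run_repair(target="tahoe:"):
    """Run the repair and return the exit status of the tahoe command."""
    return subprocess.run(repair_command(target), check=False).returncode

print(" ".join(repair_command()))
```

Nothing is executed until you call run_repair(); the last line only prints the command that would be run.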
Q20: What about revoking access to a file or directory?
A: Please see these mailing list threads:
- Tahoe Access Control
- question about sharing... (especially this message by Brian Warner)
- revocation of read-access to an immutable file
Q21: How come sometimes my client is connected to my server even though the server is behind NAT?
A: Ideally, all clients would attempt to open connections to all servers, and all servers would attempt to open connections to all clients. Then, if the client is not behind NAT, a connection could be established even if the server is behind NAT. However, this is not currently the case. Currently, all clients attempt to open connections to all servers, but if there is a connection between two Tahoe-LAFS processes (== Tahoe-LAFS nodes), it can be re-used for any client or server in either node. So, when you enable a storage server on the public-facing server, that causes the node behind NAT to initiate a TCP connection to the node on the public-facing server. Once that connection is established, the node there can *use* the server behind NAT. Related issue: comment 7 on ticket #1086.
Q22: What are literal caps?
A: Literal caps (or LIT caps) are simply the base32 encoding of the file data, and are used for very small files. The threshold is 55 bytes (source: immutable/upload.py), which is the break-even point at which the LIT filecap is the same length as a typical CHK filecap. They are sufficient (you don't even need network access to turn the LIT filecap into the data), and necessary (if you don't know the filecap for my data, you can't figure out the data). See this mailing list thread:
Literal caps are supported for immutable files and immutable directories (see the Capabilities wiki page). Whenever the contents of the file or directory are small enough that it would be more efficient to fit the contents into the cap itself than to store the contents remotely and use the cap to fetch it, then it becomes a literal cap.
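Here is a minimal sketch of the idea, showing that the data can be recovered from the cap alone with no network access. The "URI:LIT:" prefix, the lowercase unpadded base32 form, and treating the 55-byte threshold as inclusive are assumptions of this example; check immutable/upload.py and the Capabilities wiki page for the authoritative details.

```python
import base64

LIT_THRESHOLD = 55  # bytes, per the answer above (inclusive here by assumption)
LIT_PREFIX = "URI:LIT:"  # assumed cap prefix for this illustration

def to_literal_cap(data: bytes) -> str:
    """Pack small file contents directly into a cap string."""
    if len(data) > LIT_THRESHOLD:
        raise ValueError("too large for a literal cap")
    encoded = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    return LIT_PREFIX + encoded

def from_literal_cap(cap: str) -> bytes:
    """Recover the file contents from the cap alone."""
    if not cap.startswith(LIT_PREFIX):
        raise ValueError("not a literal cap")
    body = cap[len(LIT_PREFIX):].upper()
    body += "=" * (-len(body) % 8)  # restore the padding stripped above
    return base64.b32decode(body)

cap = to_literal_cap(b"hello")
print(cap)
print(from_literal_cap(cap))
```

Anyone who holds the cap holds the data, and anyone who does not hold the cap has nothing to fetch, which is the "sufficient and necessary" property described above.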
Q23: Can I access files stored in Tahoe-LAFS via FUSE?
A: Yes. Tahoe-LAFS comes with an SFTP server. If you point sshfs at the SFTP server then you have access to Tahoe-LAFS through FUSE. Alternately, pyfilesystem interfaces directly with Tahoe-LAFS through the latter's WAPI and provides both FUSE and Microsoft Windows filesystem access. See #1353 for discussion of possible improvements to FUSE integration. See Zooko's post to freedombox-discuss and Zooko's post to Google+ for Zooko's ramblings about the advisability of using FUSE for distributed filesystems in general and Tahoe-LAFS in particular.
Q24: How should I set up k, h, and N on my small private grid?
A: This answer assumes that you have decided that neither pubgrid nor volgrid2 is right for you for one reason or another (note that neither was full yet as of 6 Feb 2012) and that you want to know which settings to use for k, h, and N. It also assumes that at least some of your nodes are on the same LAN and the others are widely distributed (VPSes across the Internet, "recycled servers" from, for example, atlas networks, etc.), that the total number of nodes is not too big, that all of your nodes are under your control (or are at least VPSes or servers held in your name or in the name of friends you trust to pay on time), and that if some storage nodes go down permanently, you can manually reconfigure the gateways you use and run a repair.
Please note that it is much better to have nodes with roughly the same amount of space. Having a 500GB node and a 5GB node in the same grid is not a good idea; if for some odd reason you must have both, read http://bigpig.org/twiki/bin/view/Main/VolunteerGrid2Philosophies and think again, and if you still think you must, don't count them in S in the calculation below. Let S be the number of nodes. Set k=3, N=S, and h=N-2.
The following links can also be helpful: https://tahoe-lafs.org/pipermail/tahoe-dev/2011-October/006754.html https://tahoe-lafs.org/pipermail/tahoe-dev/2011-October/006757.html
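The rule of thumb above (k=3, N=S, h=N-2) is easy to turn into settings. The sketch below computes the three values for a given node count and prints them in the form of a [client] section for tahoe.cfg, using the shares.needed, shares.happy, and shares.total option names. The function names and the sanity check on small grids are additions for this example, so treat the output as a starting point.

```python
def suggest_encoding(node_count, k=3):
    """Apply the rule of thumb above: k=3, N=S, h=N-2."""
    n = node_count
    h = n - 2
    if n < k or h < 1:
        # Sanity check added for this sketch; the rule assumes a grid
        # with at least a handful of nodes.
        raise ValueError("too few nodes for this rule of thumb")
    return {"shares.needed": k, "shares.happy": h, "shares.total": n}

def render_client_section(params):
    """Format the values as a [client] section for tahoe.cfg."""
    lines = ["[client]"]
    lines += [f"{name} = {value}" for name, value in params.items()]
    return "\n".join(lines)

# Example: a private grid with 8 storage nodes.
print(render_client_section(suggest_encoding(8)))
```

For 8 nodes this gives 3-of-8 with a happiness threshold of 6. The settings belong on the gateways (clients) that do the uploading.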
Q25: Is there a process or command to make shares spread to new storage servers?
A: This is called "rebalancing". It isn't currently implemented, but the repair function can accomplish a similar result sometimes. Repair of immutables will upload shares to servers if necessary to reach "servers-of-happiness", which sometimes has the desired effect of uploading shares to newly added servers. Repair of mutables never uploads new shares. Here are tickets about improving rebalancing behavior: #232, #1657, #699, #661, #543.