Add questions and answers from the mailing list

[Imported from Trac: page FAQ, version 19]
freestorm 2010-05-14 21:09:21 +00:00
parent 6cccef9311
commit 2f74c79bc2

105
FAQ.md

@ -29,6 +29,109 @@ either
or else
2. Follow [the standard quickstart instructions](http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/quickstart.html) to get Tahoe-LAFS running on Windows.
Q: Does Tahoe-LAFS work on Mac OS X?
A: Yes. Follow [the standard quickstart instructions](http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/quickstart.html) on Mac OS X and it will result in a working command-line tool on Mac OS X just as it does on other Unixes. (This includes the Web User Interface, or "WUI". See the instructions for details.) In addition there is code to generate executables and .dmg packages, but this code is not currently working (as of Tahoe-LAFS v1.6.1). See the "mac" targets [in the Makefile]source:Makefile@#L440.
Q: Can there be more than one storage folder on a storage node? So if a storage server contains 3 drives without RAID, can it use all 3 for storage?
A: Not directly. Each storage server has a single "base directory", which we abbreviate as $BASEDIR. The server keeps all of its shares in a subdirectory named $BASEDIR/storage/shares/. (Note that you can symlink this to wherever you want: you can run most of the node from one place and store all the shares somewhere else.) Since there's only one such subdirectory, you can only use one filesystem per node. On the other hand, shares are stored in a set of 1024 subdirectories of that one, named $BASEDIR/storage/shares/aa/, $BASEDIR/storage/shares/ab/, etc. If you were to symlink the first third of these to one filesystem, the next third to a second filesystem, and so on (hopefully with a script!), then you'd get about one third of the shares stored on each disk. The "how much space is available" and space-reservation tools would be confused, but basically everything else should work normally.
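The symlink idea above could be scripted roughly as follows. This is only a sketch: it assumes the 1024 prefix directories are named with two characters from a lowercase base32 alphabet (consistent with "aa", "ab", ... above), and the function names and mount points are hypothetical. It is meant for a stopped node with an empty shares directory; existing share directories would need to be moved by hand first.

```python
import os

# Assumption: prefix directories use two characters from this lowercase
# base32 alphabet, which yields 32 * 32 = 1024 names ("aa", "ab", ...).
BASE32_CHARS = "abcdefghijklmnopqrstuvwxyz234567"


def prefix_names():
    """Return the 1024 two-character prefix names, in alphabet order."""
    return [a + b for a in BASE32_CHARS for b in BASE32_CHARS]


def plan_symlinks(shares_dir, filesystems):
    """Map each prefix path under shares_dir to a target on one filesystem.

    Prefixes are split into contiguous, near-equal blocks, one per filesystem.
    """
    names = prefix_names()
    plan = {}
    for i, name in enumerate(names):
        fs = filesystems[i * len(filesystems) // len(names)]
        plan[os.path.join(shares_dir, name)] = os.path.join(fs, name)
    return plan


def apply_plan(plan):
    """Create target directories and symlinks; paths that already exist are skipped."""
    for link, target in plan.items():
        os.makedirs(target, exist_ok=True)
        if not os.path.lexists(link):
            os.symlink(target, link)
```

Calling `plan_symlinks` first and inspecting the result before `apply_plan` keeps the destructive step separate from the planning step.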
Q: Would it make sense to just use RAID-0 and let Tahoe-LAFS deal with the redundancy?
A: The Allmydata grid didn't bother with RAID at all: each Tahoe storage server node used a single spindle.
The "RAID and/or Tahoe" question depends upon how much you trust RAID vs how much you trust Tahoe, and how expensive the different forms of
repair would be. Tahoe can correctly be thought of as a form of "application-level RAID", with more flexibility than the usual RAID-0/1/4/5
styles (I think RAID-1 mirroring is equivalent to 1-of-2 encoding, and RAID-5 across three drives is like 2-of-3; RAID-0 has no redundancy at all).
Using RAID to achieve your redundancy gets you fairly fast repair, because it's all being handled by a controller that sits right on top of
the raw drive. Tahoe's repair is a lot slower, because it is driven by a client that examines one file at a time, and because each file needs a lot of
network roundtrips. Repairing a 1TB RAID-5 drive can easily be finished in a day. If that 1TB drive is filled with a
million Tahoe files, the repair could take a month. On the other hand, many RAID configurations degrade significantly when a drive is lost, whereas
Tahoe's read performance is nearly unaffected. So repair events may be infrequent enough to just let them happen quietly in the background and
not care much about how long they take.
The optimal choice is a complicated one. Given inputs of:
* how much data will be stored, how it changes over time (inlet rate, churn)<br>
* expected drive failure rate (both single-sector errors and complete failures)<br>
* server/datacenter layout, inter/intra-colo bandwidth, costs<br>
* drive/hardware costs<br>
it becomes a tradeoff between money (number of Tahoe storage nodes, what sort of RAID, if any, you use for them, how many disks that means, how
much those disks cost, how many computers you need to host them, how much bandwidth you spend doing upload/download/repair), bandwidth costs,
read/write performance, and probability of file loss due to failures happening faster than repair.
In addition, Tahoe's current repair code is not particularly clever: it doesn't put the new shares in exactly the right places, so you can
easily get shares doubled up and not distributed as evenly as if you'd done a single upload. This is being tracked in ticket #610.
Q: Suppose I have a 100GB file and 2 storage nodes, each with 75GB available. Will I be able to store the file, or does it have to fit
within a single node?
A: The ability to store the file will depend upon how you set the encoding parameters: you get to choose the tradeoff between expansion (how much
space gets used) and reliability. The default settings are "3-of-10" (very conservative), which means the file is encoded into 10 shares, and
any 3 will be sufficient to reconstruct it. That means each share will be 1/3rd the size of the original file (plus a small overhead, less than
0.5% for large files). For your 100GB file, that means 10 shares, each of which is 33GB in size, which would not fit (it could get two shares
on each server, but it couldn't place all ten, so it would return an error).
But you could set the encoding to 2-of-2, which would give you two 50GB shares, and it would happily put one share on each server. That would
store the file, but it wouldn't give you any redundancy: a failure of either server would prevent you from recovering the file.
You could also set the encoding to 4-of-6, which would generate six 25GB shares, and put three on each server. This would still be vulnerable to
either server being down (since neither server has enough shares to give you the whole file by itself), but would become tolerant to errors in an
individual share (if only one share file were damaged, there are still five other shares, and we only need four). A lot of disk errors affect
only a single file, so there's some benefit to this even if you're still vulnerable to a full disk/server failure.
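The arithmetic in this answer can be checked with a few lines. This is a rough sketch that ignores the small per-share overhead and uses a deliberately simple placement test (count how many whole shares fit across all servers); the function names are hypothetical and this is not how Tahoe-LAFS itself decides placement.

```python
def share_size_gb(file_size_gb, k):
    """Approximate size of one share: the file size divided by k (overhead ignored)."""
    return file_size_gb / k


def shares_that_fit(share_gb, free_gb_per_server):
    """Count how many whole shares fit, summed over all servers."""
    return sum(int(free // share_gb) for free in free_gb_per_server)


def can_store(file_size_gb, k, n, free_gb_per_server):
    """True if there is room somewhere for all n shares."""
    size = share_size_gb(file_size_gb, k)
    return shares_that_fit(size, free_gb_per_server) >= n


# The three encodings discussed above, for a 100GB file and two 75GB servers.
for k, n in [(3, 10), (2, 2), (4, 6)]:
    size = share_size_gb(100, k)
    fits = can_store(100, k, n, [75, 75])
    print(f"{k}-of-{n}: each share is about {size:.1f} GB, fits: {fits}")
```

Note that the 4-of-6 case fills each server exactly in this idealized model, so in practice the share overhead would matter.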
Q: Do I need to shut down all clients/servers to add a storage node?
A: No. You can add or remove clients or servers anytime you like. The central "Introducer" is responsible for telling clients and servers
about each other, and it acts as a simple publish-subscribe hub, so everything is very dynamic. Clients re-evaluate the list of available
servers each time they do an upload.
This is great for long-term servers, but can be a bit surprising in the short-term: if you've just started your client and upload a file before
it has a chance to connect to all of the servers, your file may be stored on a small subset of the servers, with less reliability than you
wanted. We're still working on a good way to prevent this while still retaining the dynamic server discovery properties (probably in the form
of a client-side configuration statement that lists all the servers that you expect to connect to, so it can refuse to do an upload until it's
connected to at least those). A list like that might require a client restart when you wanted to add to this "required" list, but we could
implement such a feature without a restart requirement too.
Q: If I had 3 locations each with 5 storage nodes, could I configure the grid to ensure a file is written to each location so that I could handle all
servers at a particular location going down?
A: Not directly. We have tickets about that (#467, #302), but it's deeper than it looks and we haven't come to a conclusion on how to
build it.
The current system will try to distribute the shares as widely as possible, using a different pseudo-random permutation for each file, but
it is completely unaware of server properties like "location". If you have more free servers than shares, it will only put one share on any
given server, but you might wind up with more shares in one location than the others.
For example, if you have 15 servers in three locations A:1/2/3/4/5, B:6/7/8/9/10, C:11/12/13/14/15, and use the default 3-of-10 encoding,
your worst case is winding up with shares on 1/2/3/4/5/6/7/8/9/10 and not using location C at all. The most *likely* case is that you'll wind up
with 3 or 4 shares in each location, but there's nothing in the system to enforce that: it's just shuffling all the servers into a ring,
starting at 0, and assigning shares to servers around and around the ring until all the shares have a home.
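The ring walk described above can be sketched like this. The per-file permutation here is produced by sorting servers on a SHA-256 hash of a hypothetical file key plus the server id; that hash and key format are assumptions made for illustration, not the permutation Tahoe-LAFS actually uses.

```python
import hashlib


def permuted_ring(server_ids, file_key):
    """Order servers by a hash of (file_key, server_id): a per-file pseudo-random ring."""
    def rank(server_id):
        return hashlib.sha256(f"{file_key}:{server_id}".encode()).digest()
    return sorted(server_ids, key=rank)


def assign_shares(server_ids, file_key, n_shares):
    """Walk the ring from position 0, wrapping around, placing one share per step."""
    ring = permuted_ring(server_ids, file_key)
    placement = {server: [] for server in ring}
    for share_num in range(n_shares):
        placement[ring[share_num % len(ring)]].append(share_num)
    return placement


# 15 servers and 10 shares: ten servers get one share each, five get none.
placement = assign_shares(list(range(1, 16)), "example-file", 10)
print(sorted(s for s, shares in placement.items() if shares))
```

Nothing in this walk looks at which location a server belongs to, which is why the spread across locations is left to chance.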
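There's some math we could do to estimate the probability of things like this, but I'd have to dust off a stats textbook to remember what it is.
(Actually, since 15-choose-10 is only 3003, we can simply enumerate the cases.)
OK, so the possibilities (shares per location, in any order of locations, followed by the number of server choices that produce them) are: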
(3, 3, 4) 1500<br>
(2, 4, 4) 750<br>
(2, 3, 5) 600<br>
(1, 4, 5) 150<br>
(0, 5, 5) 3<br>
sum = 3003<br>
So, if every choice of 10 servers is equally likely, you've got about a 50% chance of the ideal distribution, and about a 1/1000 chance of the worst-case distribution.
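The table can be reproduced by brute force. The sketch below assumes, as the table does, that each set of 10 distinct servers out of 15 is equally likely, and it numbers the servers 1 to 15 with five per location as in the example above.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Servers 1-5 are location 0 (A), 6-10 are location 1 (B), 11-15 are location 2 (C).
LOCATION_OF = {server: (server - 1) // 5 for server in range(1, 16)}


def distribution_counts(n_shares=10):
    """Count, for each sorted shares-per-location pattern, how many server sets produce it."""
    counts = Counter()
    for chosen in combinations(LOCATION_OF, n_shares):
        per_location = Counter(LOCATION_OF[s] for s in chosen)
        pattern = tuple(sorted(per_location.get(loc, 0) for loc in range(3)))
        counts[pattern] += 1
    return counts


counts = distribution_counts()
total = comb(15, 10)
for pattern, count in sorted(counts.items(), reverse=True):
    print(pattern, count, f"{count / total:.4f}")
```

The (3, 3, 4) row comes out at 1500 of 3003, just under one half, and the (0, 5, 5) row at 3 of 3003, which is 1 in 1001.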
Q: Is it possible to modify a mutable file by "patching" it?
So, if I have a file stored and I want to update a section of the file in the middle, is that possible or would the file need to be downloaded,
patched and re-uploaded?
A: Not at present. We've only implemented "Small Distributed Mutable Files" (SDMF) so far, which have the property that the whole file must be
downloaded or uploaded at once. We have plans for "medium" MDMF files, which will fix this. MDMF files are broken into segments (default size
is 128KiB), and you only have to replace the segments that are dirtied by the write, so changing a single byte would only require the upload of
N/k*128KiB, or about 427KiB for the default 3-of-10 encoding.
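As a quick check of the N/k*128KiB figure, here is a small sketch. The function name is hypothetical, and it ignores any per-share overhead, so it gives only the raw product.

```python
def segment_update_upload_kib(k, n, segment_kib=128, dirty_segments=1):
    """Approximate upload needed to rewrite dirty segments: N/k * segment size for each."""
    return dirty_segments * n / k * segment_kib


# Default 3-of-10 encoding, one dirty 128KiB segment.
print(f"{segment_update_upload_kib(3, 10):.1f} KiB")
```

For 3-of-10 this works out to roughly 427KiB per dirtied segment, compared with re-uploading N/k times the whole file under SDMF.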
Kevan Carstensen is spending his summer implementing MDMF, thanks to the sponsorship of Google Summer of Code. Ticket #393 is tracking this work.