[Imported from Trac: page FAQ, version 112]

AzureCerulean 2015-08-24 17:54:15 +00:00
parent 2ff8107653
commit ce0fb2dd67

10
FAQ.md

@ -14,17 +14,17 @@ A: Zooko wrote [//pipermail/tahoe-dev/2011-July/006560.html a long post about th
**<a name="Q2_what_is_erasure_coding">Q2:</a> "Erasure-coding"? What's that?**
-A: You know how with RAID-5 you can lose any one drive and still recover? And there is also something called RAID-6 where you can lose any two drives and still recover. Erasure coding is the generalization of this pattern: you get to configure how many drives you could lose and still recover. You can choose how many drives (actually storage servers) will be used in total, from 1 to 256, and how many storage servers are required to recover all the data, from 1 to however many storage servers there are. We call the number of total servers `N` and the number required `K`, and we write the parameters as "`K-of-N`".
+A: RAID-5 can lose any one drive and still recover, and RAID-6 can lose any two. Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundancy, and stored across a selected set of storage servers. The total number of storage servers used can be chosen from 1 to 256, and the number of storage servers required to recover all the data can be chosen from 1 to the total number of storage servers. We call the total number of storage servers `N` and the number required `K`, and we write the parameters as "`K-of-N`".
-This uses an amount of space on each server equal to the total size of your data divided by `K`.
+This uses an amount of space on each storage server equal to the total size of your data divided by `K`.
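The space accounting described in this answer can be sketched in a few lines of Python, using the FAQ's default `3-of-10` parameters as the example. This is only an illustration of the arithmetic, not Tahoe-LAFS code: the function name `erasure_coding_costs` and its result keys are hypothetical, and the sketch assumes ideal shares, ignoring padding and any per-share metadata.

```python
from fractions import Fraction


def erasure_coding_costs(k: int, n: int, data_size: int) -> dict:
    """Idealized K-of-N space accounting (hypothetical helper, not a Tahoe-LAFS API)."""
    if not (1 <= k <= n <= 256):
        raise ValueError("need 1 <= K <= N <= 256")
    share_size = Fraction(data_size, k)  # space used on each storage server
    return {
        "share_size": share_size,
        "total_stored": share_size * n,  # space used across all N servers
        "expansion": Fraction(n, k),     # total stored relative to one plain copy
        "tolerated_losses": n - k,       # servers that can be lost while K remain
    }


# Default parameters from the FAQ: 3-of-10, applied to 3,000,000 bytes of data.
costs = erasure_coding_costs(3, 10, 3_000_000)
print(costs["share_size"])                 # 1000000
print(costs["tolerated_losses"])           # 7
print(round(float(costs["expansion"]), 1)) # 3.3
```

The expansion factor depends only on `N/K`, so `3-of-10` costs about 3.3 times the size of the data regardless of how large the data is.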
-The default Tahoe-LAFS parameters are `3-of-10`, so the data is spread over 10 different drives, and you can lose any 7 of them and still recover the entire data. This gives much better reliability than comparable RAID setups, at a cost of only 3.3 times the storage space that a single copy takes. It takes about 3.3 times the storage space, because it uses space on each server equal to 1/3 of the size of the data, and there are 10 servers.
+With the default Tahoe-LAFS parameters of `3-of-10`, the data is spread over 10 different disks, and you can lose any 7 of them and still recover all the data. This is more reliable than comparable RAID arrangements, at a cost of only 3.3 times the storage space that a single copy takes. It takes about 3.3 times the storage space because it uses space on each server equal to 1/3 of the size of the data, and there are 10 servers.
-"Forward error correction" is another term for erasure coding.
+"Forward error correction" (FEC) is another term for erasure coding.
Erasure coding should not be confused with "secret sharing", which has the additional security property that fewer than `K` servers cannot recover any information about the data. Tahoe-LAFS' erasure coding does not have this property, and does not need to have it because we rely on secret-key encryption (using a key in the read cap) for confidentiality.
-"Information Dispersal Algorithm" (IDA) can refer either to an erasure code or a secret sharing algorithm depending on context, so we prefer not to use that term.
+"Information Dispersal Algorithm" (IDA) can refer either to an erasure code or to a secret sharing algorithm, depending on context, so we prefer not to use that term.
**<a name="Q3_disable_encryption">Q3:</a> Is there a way to disable the encryption for content which isn't secret? Won't that save a lot of CPU cycles?**