[Imported from Trac: page StorageIndex, version 1]

warner 2007-06-29 19:17:16 +00:00
parent 425a77f207
commit 8da955937e

13
StorageIndex.md Normal file

@ -0,0 +1,13 @@
The term "storage index" is used in tahoe to refer to a value (generally the output of a SHA-256 hash) which points to a given piece of data. This index is used for two purposes: to select the set of peers that will be queried, and to pass to those peers when retrieving the data.
The data in question may be an erasure-coded share, or the index of a directory node, or something else. When used for [CHK files](CHKFile), each file has a separate [StorageIndex](StorageIndex), which is used to get access to a collection of "share buckets". When used for [DirectoryNodes](DirectoryNode), each dirnode has a separate [StorageIndex](StorageIndex), but the read-only and read-write views of a given dirnode point to the same [StorageIndex](StorageIndex).
For distributed data, the [StorageIndex](StorageIndex) is used in the [ConsistentPermutation](ConsistentPermutation) algorithm to prioritize a list of peers. The intent is that the data referenced by the index is most likely to exist on the top-priority peers in this list. The index is then sent to each peer on that list, to ask them if they do indeed have the corresponding data.
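
The prioritization described above can be sketched roughly as follows. This is only an illustration, not the actual [ConsistentPermutation](ConsistentPermutation) code: the function name `permuted_peers`, the peer ids, and the choice to rank peers by SHA-256 of the index concatenated with the peer id are all assumptions made for the example.

```python
import hashlib
from typing import Iterable, List


def permuted_peers(storage_index: bytes, peer_ids: Iterable[bytes]) -> List[bytes]:
    """Order peers by a pseudo-random permutation keyed on the storage index.

    Assumption: each peer is ranked by SHA-256(storage_index + peer_id).
    The real algorithm may combine the two values differently.
    """
    def sort_key(peer_id: bytes) -> bytes:
        return hashlib.sha256(storage_index + peer_id).digest()

    # Every client that knows the same index and the same peer set
    # computes the same ordering, so uploaders and downloaders agree
    # on which peers to ask first.
    return sorted(peer_ids, key=sort_key)


# Hypothetical values, for illustration only.
storage_index = hashlib.sha256(b"example data").digest()
peers = [b"peer-a", b"peer-b", b"peer-c", b"peer-d"]
ordering = permuted_peers(storage_index, peers)
```

A different index yields a different ordering over the same peers, which is what spreads the load across the grid.
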
For centralized data, the [StorageIndex](StorageIndex) is simply sent to the server which hosts that data, where it is generally turned into a string and used to locate a file or directory on disk, which contains the data in question.
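
As a rough sketch of the "turned into a string" step, a server could encode the index in a filesystem-safe alphabet and join it onto a base directory. The helper name `storage_index_to_path`, the `storage` directory, and the use of lowercase unpadded base32 are assumptions for illustration, not a description of the real on-disk layout.

```python
import base64
import os


def storage_index_to_path(base_dir: str, storage_index: bytes) -> str:
    """Map a binary storage index to a path under base_dir.

    Assumption: the index is rendered as lowercase base32 without
    padding, which is safe on case-insensitive filesystems.
    """
    name = base64.b32encode(storage_index).decode("ascii").rstrip("=").lower()
    return os.path.join(base_dir, name)


# Hypothetical 32-byte index (the size of a SHA-256 output).
path = storage_index_to_path("storage", b"\x00" * 32)
```
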
In capability terms, the [StorageIndex](StorageIndex) represents the authority to see the encrypted form of the corresponding data.
In earlier designs, the [VerifierId](VerifierId) was used for this purpose, but we've since realized that this is not always desirable. In particular, it requires that we know the full contents of the file before we can allocate buckets, whereas we might be willing to give up convergence to reduce the memory and storage footprint of a web-based streaming upload. So although earlier releases always set the [StorageIndex](StorageIndex) equal to the [VerifierId](VerifierId), in newer releases it is free to be whatever value we like.
In practice, to reduce the amount of data we need to keep around in the URI, the [StorageIndex](StorageIndex) is derived by hashing some stronger capability. For example, for [CHKFiles](CHKFile), the [StorageIndex](StorageIndex) is the hash of the readkey, so that anyone who knows the decryption key is also able to retrieve the encrypted data that it operates on. However, a verifier (who only knows the [StorageIndex](StorageIndex)) is only able to deal with the encrypted data, not the plaintext. Likewise, for [DirectoryNodes](DirectoryNode), the [StorageIndex](StorageIndex) is derived by hashing the readkey, which is itself derived by hashing the writekey. This establishes a chain of successively-weaker capabilities.
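
The dirnode chain can be sketched as two one-way hashing steps. The tag strings, the helper names, and the use of untruncated SHA-256 output are assumptions chosen for the example; the real derivation may use different tags and key sizes.

```python
import hashlib


def tagged_hash(tag: bytes, value: bytes) -> bytes:
    """Hash value under a purpose-specific tag (hypothetical scheme)."""
    return hashlib.sha256(tag + b":" + value).digest()


def derive_readkey(writekey: bytes) -> bytes:
    # The holder of the writekey can always compute the readkey...
    return tagged_hash(b"readkey", writekey)


def derive_storage_index(readkey: bytes) -> bytes:
    # ...and the holder of the readkey can always compute the index.
    return tagged_hash(b"storage-index", readkey)


# Hypothetical key material, for illustration only.
writekey = b"\x01" * 16
readkey = derive_readkey(writekey)
storage_index = derive_storage_index(readkey)
```

Because each step is a one-way hash, authority only flows downward: a party holding just the storage index cannot recover the readkey, and a party holding just the readkey cannot recover the writekey.
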