tahoe-lafs/trac-2024-07-25

make the storage index be the verifier cap #654

New issue

Open

opened 2009-03-04 23:01:46 +00:00 by zooko · 3 comments

zooko commented

2009-03-04 23:01:46 +00:00

Owner

As I discuss in this mailing list post, we could use the verifycap for the purpose of a storage index.

The big advantage of this to me is reducing the number of concepts by one. This would prevent, for example, misunderstandings such as Shawn Willden's misapprehension about overwriting shares (to which my letter is a response).

Another advantage would be that the storage server (as well as anyone else) could verify that the share is a fitting share for that storage index. This would neatly solve all questions about the correctness of storage indices, such as:

race conditions on upload (such as what if one storage server sees that you are beginning to upload a file with storage index X, and then turns around and starts uploading a different file with storage index X on all the other storage servers before you get to them),
life cycle of the share ; now that we're introducing share death in a future release of tahoe, we have to revisit our earlier assumptions about "once there always there" within a given storage server

This might also allow greater similarity between the immutable and mutable share storage protocols, if both of them used the verify cap as the storage index. The mutable case has much worse issues about security and consistency, of course, and my current assumption is that it, too, could be strengthened and simplified by requiring the storage server to verify the correctness of each share. (Although simpler from some perspectives, this would actually be more complicated for the storage servers because they would have to understand enough about the share layout to verify the correctness. Also it would cost quite a bit of CPU to perform the digital signature checks on the mutable shares.)

I vaguely recall that Brian pointed out some significant added problems or issues with this approach, so hopefully he'll follow up on the list or this ticket and remind me what they were.

As I discuss in [this mailing list post](http://allmydata.org/pipermail/tahoe-dev/2009-March/001388.html), we could use the verifycap for the purpose of a storage index. The big advantage of this to me is *reducing the number of concepts by one*. This would prevent, for example, misunderstandings such as Shawn Willden's misapprehension about overwriting shares (to which my letter is a response). Another advantage would be that the storage server (as well as anyone else) could verify that the share is a fitting share for that storage index. This would neatly solve all questions about the correctness of storage indices, such as: * race conditions on upload (such as what if one storage server sees that you are beginning to upload a file with storage index X, and then turns around and starts uploading a different file with storage index X on all the other storage servers before you get to them), * life cycle of the share ; now that we're introducing share death in a future release of tahoe, we have to revisit our earlier assumptions about "once there always there" within a given storage server This might also allow greater similarity between the immutable and mutable share storage protocols, if both of them used the verify cap as the storage index. The mutable case has much worse issues about security and consistency, of course, and my current assumption is that it, too, could be strengthened and simplified by requiring the storage server to verify the correctness of each share. (Although simpler from some perspectives, this would actually be more complicated for the storage servers because they would have to understand enough about the share layout to verify the correctness. Also it would cost quite a bit of CPU to perform the digital signature checks on the mutable shares.) I vaguely recall that Brian pointed out some significant added problems or issues with this approach, so hopefully he'll follow up on the list or this ticket and remind me what they were.

tahoe-lafs added the

labels 2009-03-04 23:01:46 +00:00

tahoe-lafs added this to the undecided milestone 2009-03-04 23:01:46 +00:00

warner commented

2009-03-09 00:07:41 +00:00

Author

Owner

In general, I like the idea. I'll have to think about it some more, maybe my
notes have some details of the concerns I had.

The general categories of concerns were:

size of the storage index, possibly affecting size of the filecaps, also
affecting the storage-server's overhead/indexing system. SIs are currently
16 bytes / 128 bits (since they're derived from a 128bit AES key).
how many integrity bits you get out of the storage index: is it enough to
replace/equal the UEB?
at what point of the upload/encoding process you get to learn the storage
index
how much work the storage servers are obligated to do for each storage
operation
how much information storage servers have to do local verification of
shares

I'd love a scheme that allowed the servers to validate their own shares but
which didn't obligate them to do so right away.

For our DSA mutable file design, if we had intermediate keys, I think we were
able to make the storage index do exactly what we wanted: every bit of the
storage index can be used to validate the key, so the server can validate the
whole share (and check its signature) all by itself. This enables several
things: write-enabler-less publishing (server accepts the write iff the
signature is good and the seqnum is higher than the old share), local
background Verifier passes (to detect disk errors), buddy-verification
(servers find other servers, check each other's shares). If we can't have
intermediate keys, I think we have a design that will still allow some
portion of the storage index to be used for this purpose, but not the whole
thing.

(I think we could even find a way to switch to pubkey-based-SI for our
existing RSA-based mutable files and still provide backwards compatibility:
basically have the server keep a table which maps from pubkey-hash to SI, and
add an API that looks for shares by pubkey-hash instead of by SI).

For immutable files, I think the prognosis was less cheerful. The storage
index (as it stands today) is used for two purposes: peer selection and share
indexing. The random distribution of the SI (since it is derived by hashing
the writekey) plus the hash-driven permutation of our peer-selection
algorithm gives us load-balancing, or as Zandr likes to put it,
"cryptographically strong load balancing". (note that this is balancing the
inlet rate: the amount of data that is given to each server per unit time..
this may not quite be what you want, since you might want your large servers
to fill at a proportionally-faster rate than your smaller servers). Once you
know which servers to talk to first, the SI is used to reference a specific
share on those servers.

The problem was that any integrity information we get out of an immutable
file won't be known to us until we've finished encoding the file. So we can't
use it for peer-selection, since (to avoid buffering the entire file locally)
we must perform peer-selection before encoding. (we're already considering
switching from CHK to random-keys to avoid the streaming-unfriendly
hash-the-content-to-get-the-key-and-storageindex pass).

Now, there are good arguments to allow alternative peer-selection schemes (as
we've discussed in section 3 of source:docs/specifications/outline.txt), but
there are many desireable properties to the approach we're using now. Most
peer-selection schemes that will work for a large number of servers (where
"work" means the downloader usually doesn't have to ask every single server
whether they have a share or not, and hopefully can find enough shares in a
minimum number of roundtrips) require some sort of peer-selection-index to be
encoded in the filecap.

One approach we discussed was a split index, in which the filecap has two
index fields: one for peer-selection, and a second for share-on-peer. The
peer-selection part could be randomly generated: it doesn't need to be
cryptographically secure, merely long enough to give us good load-balancing
properties.. 10 or 20 bits would probably be enough. (and note that it
wouldn't necessarily have to involve a permuted list: we could pick a random
10-bit starting point on the ring, then select servers in strict clockwise
nodeid order, or something). The share-on-peer part (which is what the server
thinks of as a storage index) could be determined after the encoding process,
and told to the server in the final close() message that commits the finished
share. This would involve a storage server protocol which has some sort of
temporary upload handle (so subsequent messages could refer to the
previously-uploaded partial share fragments with something other than the
final storage index), but that's not hard to build, and might give us some
useful resume-interrupted-upload properties too.

Hmm, here's a scheme that might work: make the peer-selection-index be a hash
of the readkey, and the share-on-peer index be the UEB hash. This would let
us perform peer selection as quickly as we do now (i.e. one pass for CHK, or
zero passes for random-key). Filecaps would remain the same length (although
they'd need a new prefix, of course). Verify caps would be just a prefix plus
the UEB hash (and k+N+size, probably).

The earlier scheme I was thinking of (in which the filecap would need an
extra field for the peer-selection index) had some downsides, but I don't yet
see any in this scheme. The only one I can think of it that it would obligate
us to have some portion of the filecap (the readkey) which can't be used for
integrity checking (since it needs to be generated before we've encoded the
file), but we're already obligated to have that (the readkey is needed to
encode the file in the first place).

Well, and we lose a little bit of convergence: if I upload the same file
(using CHK) as someone in my convergence domain already uploaded before,
they'll wind up with the same filecap (and therefore peer-selection-index and
storage-index) as before, but I won't be able to learn of that fact (and thus
avoid doing the duplicate upload) until the end of encoding, when the
UEBhash/storage-index is generated. I guess this means we should record a
full-cryptographic-length peer-selection-index with each share, maintain a
table that maps from peer-selection-index to UEBhash/storage-index, and have
the servers do a lookup at allocate() time. If allocate() tells us that they
already have a share for that peer-selection-index, we can ask it for the
UEBhash, then download enough data to compare the file we're thinking about
uploading against the shares that are claimed, and see if the upload can be
skipped. Hm, it might also work to use this peer-selection-index as the
"upload-id", for resuming an upload later.

So, a summary of how we could implement this for immutable files:

when encoding/uploading:
construct the readkey (whether by CHK or randomly, doesn't matter)
hash the readkey to get the "peer-selection-index", use it to find
storage servers who will accept a share of the necessary size (but don't
tell the servers the final storage index yet)
- servers might respond to the allocate(peer-selection-index) request
  with news that they already have shares for that index, and will
  return the UEBhash/storage-index for those shares
encrypt/encode the file, writing shares to servers, building hash trees
compute/write the UEB and its hash
tell storage servers to commit the finished share to a storage-index
equal to the UEB hash
record readkey+UEBhash into the filecap as usual
when downloading:
hash the readkey to get the peer-selection index, build the permuted peer
list
ask servers in the permuted list if they have a share with storage-index
== UEBhash
download/decode shares as usual
new interfaces/formats we'd need:
new URI:CHK2: filecap format, to indicate that downloaders should use
hash(readkey) as peer-selection-index and UEB hash as storage-index
(instead of using hash(readkey) for both)
new server upload API:
"allocate" takes an peer-selection-index/upload-id rather than a
storage-index, maintains a table that maps from that to storage-index.
writebucket.close takes a storage-index argument to tell it where to
file the finished share
maybe some sort of resume-upload method which takes a
peer-selection-index/upload-id
new server-side share version number, to tell the server that the
storage index (i.e. bucketdir name) should be the same as the embedded
UEB hash. Older shares cannot be validated this way (the contents can be
validated against the embedded UEB hash, but that hash cannot be checked
against anything).

In general, I like the idea. I'll have to think about it some more, maybe my notes have some details of the concerns I had. The general categories of concerns were: * size of the storage index, possibly affecting size of the filecaps, also affecting the storage-server's overhead/indexing system. SIs are currently 16 bytes / 128 bits (since they're derived from a 128bit AES key). * how many integrity bits you get out of the storage index: is it enough to replace/equal the UEB? * at what point of the upload/encoding process you get to learn the storage index * how much work the storage servers are obligated to do for each storage operation * how much information storage servers have to do local verification of shares I'd love a scheme that allowed the servers to validate their own shares but which didn't obligate them to do so right away. For our DSA mutable file design, if we had intermediate keys, I think we were able to make the storage index do exactly what we wanted: every bit of the storage index can be used to validate the key, so the server can validate the whole share (and check its signature) all by itself. This enables several things: write-enabler-less publishing (server accepts the write iff the signature is good and the seqnum is higher than the old share), local background Verifier passes (to detect disk errors), buddy-verification (servers find other servers, check each other's shares). If we can't have intermediate keys, I think we have a design that will still allow some portion of the storage index to be used for this purpose, but not the whole thing. (I think we could even find a way to switch to pubkey-based-SI for our existing RSA-based mutable files and still provide backwards compatibility: basically have the server keep a table which maps from pubkey-hash to SI, and add an API that looks for shares by pubkey-hash instead of by SI). For immutable files, I think the prognosis was less cheerful. The storage index (as it stands today) is used for two purposes: peer selection and share indexing. The random distribution of the SI (since it is derived by hashing the writekey) plus the hash-driven permutation of our peer-selection algorithm gives us load-balancing, or as Zandr likes to put it, "cryptographically strong load balancing". (note that this is balancing the inlet rate: the amount of data that is given to each server per unit time.. this may not quite be what you want, since you might want your large servers to fill at a proportionally-faster rate than your smaller servers). Once you know which servers to talk to first, the SI is used to reference a specific share on those servers. The problem was that any integrity information we get out of an immutable file won't be known to us until we've finished encoding the file. So we can't use it for peer-selection, since (to avoid buffering the entire file locally) we must perform peer-selection before encoding. (we're already considering switching from CHK to random-keys to avoid the streaming-unfriendly hash-the-content-to-get-the-key-and-storageindex pass). Now, there are good arguments to allow alternative peer-selection schemes (as we've discussed in section 3 of source:docs/specifications/outline.txt), but there are many desireable properties to the approach we're using now. Most peer-selection schemes that will work for a large number of servers (where "work" means the downloader usually doesn't have to ask every single server whether they have a share or not, and hopefully can find enough shares in a minimum number of roundtrips) require some sort of peer-selection-index to be encoded in the filecap. One approach we discussed was a split index, in which the filecap has two index fields: one for peer-selection, and a second for share-on-peer. The peer-selection part could be randomly generated: it doesn't need to be cryptographically secure, merely long enough to give us good load-balancing properties.. 10 or 20 bits would probably be enough. (and note that it wouldn't necessarily have to involve a permuted list: we could pick a random 10-bit starting point on the ring, then select servers in strict clockwise nodeid order, or something). The share-on-peer part (which is what the server thinks of as a storage index) could be determined after the encoding process, and told to the server in the final close() message that commits the finished share. This would involve a storage server protocol which has some sort of temporary upload handle (so subsequent messages could refer to the previously-uploaded partial share fragments with something other than the final storage index), but that's not hard to build, and might give us some useful resume-interrupted-upload properties too. Hmm, here's a scheme that might work: make the peer-selection-index be a hash of the readkey, and the share-on-peer index be the UEB hash. This would let us perform peer selection as quickly as we do now (i.e. one pass for CHK, or zero passes for random-key). Filecaps would remain the same length (although they'd need a new prefix, of course). Verify caps would be just a prefix plus the UEB hash (and k+N+size, probably). The earlier scheme I was thinking of (in which the filecap would need an extra field for the peer-selection index) had some downsides, but I don't yet see any in this scheme. The only one I can think of it that it would obligate us to have some portion of the filecap (the readkey) which can't be used for integrity checking (since it needs to be generated before we've encoded the file), but we're already obligated to have that (the readkey is needed to encode the file in the first place). Well, and we lose a little bit of convergence: if I upload the same file (using CHK) as someone in my convergence domain already uploaded before, they'll wind up with the same filecap (and therefore peer-selection-index and storage-index) as before, but I won't be able to learn of that fact (and thus avoid doing the duplicate upload) until the end of encoding, when the UEBhash/storage-index is generated. I guess this means we should record a full-cryptographic-length peer-selection-index with each share, maintain a table that maps from peer-selection-index to UEBhash/storage-index, and have the servers do a lookup at allocate() time. If allocate() tells us that they already have a share for that peer-selection-index, we can ask it for the UEBhash, then download enough data to compare the file we're thinking about uploading against the shares that are claimed, and see if the upload can be skipped. Hm, it might also work to use this peer-selection-index as the "upload-id", for resuming an upload later. So, a summary of how we could implement this for immutable files: * when encoding/uploading: * construct the readkey (whether by CHK or randomly, doesn't matter) * hash the readkey to get the "peer-selection-index", use it to find storage servers who will accept a share of the necessary size (but don't tell the servers the final storage index yet) * servers might respond to the allocate(peer-selection-index) request with news that they already have shares for that index, and will return the UEBhash/storage-index for those shares * encrypt/encode the file, writing shares to servers, building hash trees * compute/write the UEB and its hash * tell storage servers to commit the finished share to a storage-index equal to the UEB hash * record readkey+UEBhash into the filecap as usual * when downloading: * hash the readkey to get the peer-selection index, build the permuted peer list * ask servers in the permuted list if they have a share with storage-index == UEBhash * download/decode shares as usual * new interfaces/formats we'd need: * new URI:CHK2: filecap format, to indicate that downloaders should use hash(readkey) as peer-selection-index and UEB hash as storage-index (instead of using hash(readkey) for both) * new server upload API: * "allocate" takes an peer-selection-index/upload-id rather than a storage-index, maintains a table that maps from that to storage-index. * writebucket.close takes a storage-index argument to tell it where to file the finished share * maybe some sort of resume-upload method which takes a peer-selection-index/upload-id * new server-side share version number, to tell the server that the storage index (i.e. bucketdir name) should be the same as the embedded UEB hash. Older shares cannot be validated this way (the contents can be validated against the embedded UEB hash, but that hash cannot be checked against anything).

warner commented

2009-03-09 00:12:21 +00:00

Author

Owner

Also, we might want new server-side share file format, to record the peer-selection-index on the bucket label (the same place that holds the leases). This would be used to rebuild the table from the sharefiles, since we consider the sharefiles to be canonical and all other tables to be caches or performance-improving indices. The peer-selection-index would not be verified like the rest of the share (making it even more appropriate to put on the outside of the container rather than the inside).

note that the peer-selection-index table offers some games to an attacker: they could upload a share for file A and pretend that it has the peer-selection-index for file B, with the goal to disrupt someone who is trying to upload file B (and are incorrectly told that the server already has a share for that file, which requires downloading the entire share to verify). I suspect that this is not a very large problem, though. Also, we might want new server-side share file format, to record the peer-selection-index on the bucket label (the same place that holds the leases). This would be used to rebuild the table from the sharefiles, since we consider the sharefiles to be canonical and all other tables to be caches or performance-improving indices. The peer-selection-index would not be verified like the rest of the share (making it even more appropriate to put on the outside of the container rather than the inside).

zooko commented

2009-03-11 22:19:34 +00:00

Author

Owner

Ooh, here's a blast from the past. I just noticed that ticket #5 was "verifierid as storage index: not the whole story". :-) it was closed as fixed on 2007-09-25.