From f7db5622c6240185cd5d34f8dcdc0705add4715f Mon Sep 17 00:00:00 2001 From: warner <> Date: Tue, 25 Aug 2009 09:59:45 +0000 Subject: [PATCH] new page about immutable encoding design [Imported from Trac: page NewImmutableEncodingDesign, version 1] --- NewImmutableEncodingDesign.md | 101 ++++++++++++++++++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 NewImmutableEncodingDesign.md diff --git a/NewImmutableEncodingDesign.md b/NewImmutableEncodingDesign.md new file mode 100644 index 0000000..f3ef5b0 --- /dev/null +++ b/NewImmutableEncodingDesign.md @@ -0,0 +1,101 @@ +The [NewCapDesign](NewCapDesign) page describes desired features of the next filecap design. +This page is for designing the encoding format for these new immutable files. + +# Features + + * as described on [NewCapDesign#filecaplength,](NewCapDesign#filecaplength,) we probably need 128bit + confidentiality "C" bits, 256bit integrity "I" bits, and 128bit + storage-collision resistance. There are encoding schemes that can combine + the C and I bits (at the expense of convergence, or certain forms of + offline attenutation). + * we may define a "server-selection-index" (which is used to permute or + otherwise narrow the list of servers to be used) to be separate from the + "storage-index" (which is used to identify a specific share on whichever + servers we actually talk to). This may involve a separate field in the + filecap, or it may continue to be derived from the storage index. + * some encoding schemes allow the readcap to be attenuated to a verifycap + offline + * in general, we don't care how long the verifycap is + * the server should be able to validate the entire share by itself, without + the readcap. In general, this means that the storage-index must also be + the verifycap. + * note that this implies that the storage-index cannot be computed until + the end of encoding, when all shares have been generated, the share hash + tree has been built, and its root has been added to the UEB. + * this implies that we can't use the storage-index to detect convergence + with earlier uploads of the same file. To retain convergence may require + a lookup table on the server (mapping hash-of-readkey to storage-index, + or something) + * it also implies that storage-index can't be used as a + server-selection-index, which again points to using hash-of-readkey as + SSI (to retain convergence of server-selection). Setting the + storage-index at the end of upload requires a new uploader protocol, + which uses an "upload handle" for the data transfer, and finishes with a + "now commit this share to storage-index=X" message. + * the original CHK design uses hash-of-readkey as storage-index, which has + all these good properties except server-side full share validation. + (servers can compare share contents against the UEB, and we could put a + copy of the UEB hash into the share, but servers would continue to be + unable to make sure the share was in the right place) + +# Options + +note: all cap-length computations assume the integrity-providing "I" field is +256bits long, and the confidentiality-providing "C" field is 128bits long. If +we decide on different values, the sums below should be updated. + +## One: current CHK design + +Readcaps consist of two main pieces: C bits and I bits, plus: + + * k (which improves the accuracy of the initial number of queries to send + out) + * N (which improves the guessed upper bound on number of queries to send + out, and used to be required by the abandoned [TahoeThree](TahoeThree) algorithm) + * filesize (advisory only, used by deep-size measurements in lieu of + fetching share data to measure filesize) + +SI = H(C), SSI=SI. Verifycap is SI+I. + + * SSI and SI are known ahead of time, uploader protocol starts with SI + * good convergence + * long caps (128+256+len(k+N+filesize)) ~= 400bits + * server cannot verify entire share + +## Two: Zooko's scheme + +Readcaps contain one crypto value that combines C and I fields. (I forget how +this worked.. it was clever, but I think it had some fatal flaw, like not +being able to get a storage-index from the readcap without first retrieving +shares, or something. One of us will dig up the notes on it and describe it +here). + + * short caps + * convergence problems + +## Others? + +## Ideas + +It might be possible to have the uploader give two values to the server, at +different stages of the upload process, which (together) would allow full +validation of the resulting share. Using a single value (the verifycap), as a +storage index, would be cleaner, but might not be strictly necessary. + +The servers could maintain a table, mapping from one sort of index to +another, if that made it easier for the upload process to proceed (or to +achieve convergence). For example, H(readkey) is known at the beginning of +upload, but the I bits aren't known until the end. If the client could use +SSI=H(readkey) and then ask each server to tell them the storage-index of any +shares which used H(readkey), it could achieve convergence and still use the +I bits as the storage-index. The servers would be obligated to maintain a +table with one entry per bucket (so probably ~20M entries), and +errors/malicious behavior in this table would cause convergence failures +(which are hardly fatal). + +The SSI can be much shorter than the SI. It only needs to be long enough to +provide good load-balancing properties. It could be included explicitly in +the filecap. Alternate (non-TahoeTwo) peer-selection strategies could encode +whatever per-file information they needed into the SSI, assuming some sort of +tradeoff between cap length (i.e. SSI length) and work done by the downloader +to find the right servers.