new page about immutable encoding design
[Imported from Trac: page NewImmutableEncodingDesign, version 1]
parent
7fb81cf2ab
commit
f7db5622c6
101
NewImmutableEncodingDesign.md
Normal file
The [NewCapDesign](NewCapDesign) page describes desired features of the next filecap design.
This page is for designing the encoding format for these new immutable files.

# Features
* as described on [NewCapDesign#filecaplength](NewCapDesign#filecaplength), we probably need 128-bit
  confidentiality "C" bits, 256-bit integrity "I" bits, and 128-bit
  storage-collision resistance. There are encoding schemes that can combine
  the C and I bits (at the expense of convergence, or certain forms of
  offline attenuation).
* we may define a "server-selection-index" (which is used to permute or
  otherwise narrow the list of servers to be used) to be separate from the
  "storage-index" (which is used to identify a specific share on whichever
  servers we actually talk to). This may involve a separate field in the
  filecap, or it may continue to be derived from the storage index.
* some encoding schemes allow the readcap to be attenuated to a verifycap
  offline
  * in general, we don't care how long the verifycap is
* the server should be able to validate the entire share by itself, without
  the readcap. In general, this means that the storage-index must also be
  the verifycap.
  * note that this implies that the storage-index cannot be computed until
    the end of encoding, when all shares have been generated, the share hash
    tree has been built, and its root has been added to the UEB.
  * this implies that we can't use the storage-index to detect convergence
    with earlier uploads of the same file. Retaining convergence may require
    a lookup table on the server (mapping hash-of-readkey to storage-index,
    or something similar).
  * it also implies that the storage-index can't be used as a
    server-selection-index, which again points to using hash-of-readkey as
    the SSI (to retain convergence of server-selection). Setting the
    storage-index at the end of upload requires a new uploader protocol,
    which uses an "upload handle" for the data transfer, and finishes with a
    "now commit this share to storage-index=X" message.
* the original CHK design uses hash-of-readkey as storage-index, which has
  all these good properties except server-side full share validation.
  (Servers can compare share contents against the UEB, and we could put a
  copy of the UEB hash into the share, but servers would still be
  unable to make sure the share was in the right place.)

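The "upload handle" protocol mentioned above could look roughly like the following. This is a minimal sketch, not a proposed wire format: the class and method names (`StorageServer`, `open_upload`, `write`, `commit`) are hypothetical, and a plain SHA-256 over the share stands in for the real validation, which would check the share against its hash trees and the UEB.

```python
import hashlib
import secrets


class StorageServer:
    """Toy in-memory server; every name here is hypothetical."""

    def __init__(self):
        self.pending = {}   # upload handle -> bytearray of share data
        self.shares = {}    # storage-index -> committed share bytes

    def open_upload(self):
        # The storage-index is not known yet, so hand out a random handle.
        handle = secrets.token_hex(16)
        self.pending[handle] = bytearray()
        return handle

    def write(self, handle, data):
        self.pending[handle] += data

    def commit(self, handle, storage_index):
        # "now commit this share to storage-index=X".
        # Assumption: SHA-256(share) stands in for full share validation.
        data = bytes(self.pending.pop(handle))
        if hashlib.sha256(data).digest() != storage_index:
            raise ValueError("share does not match storage-index")
        self.shares[storage_index] = data


server = StorageServer()
handle = server.open_upload()
hasher = hashlib.sha256()
for block in (b"block-0", b"block-1"):
    server.write(handle, block)
    hasher.update(block)
si = hasher.digest()        # only known once every block has been sent
server.commit(handle, si)
```

In this sketch the server rejects a commit whose storage-index does not match the uploaded data, which is the property the current CHK design lacks.
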
# Options
Note: all cap-length computations assume the integrity-providing "I" field is
256 bits long, and the confidentiality-providing "C" field is 128 bits long. If
we decide on different values, the sums below should be updated.

## One: current CHK design
Readcaps consist of two main pieces: C bits and I bits, plus:

* k (which improves the accuracy of the initial number of queries to send
  out)
* N (which improves the guessed upper bound on the number of queries to send
  out, and which used to be required by the abandoned [TahoeThree](TahoeThree) algorithm)
* filesize (advisory only, used by deep-size measurements in lieu of
  fetching share data to measure filesize)

SI = H(C), SSI = SI. The verifycap is SI+I.

* SSI and SI are known ahead of time, so the uploader protocol starts with the SI
* good convergence
* long caps: 128+256+len(k+N+filesize) ~= 400 bits
* server cannot verify entire share

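The derivations for this option can be written out as a short sketch. It assumes that H() is SHA-256 truncated to 128 bits and uses a placeholder value for the I bits; neither choice comes from this page, and the function name `storage_index` is made up for illustration.

```python
import hashlib
import os

C_BITS = 128   # confidentiality bits (the read key)
I_BITS = 256   # integrity bits


def storage_index(readkey):
    # Assumption: SHA-256 truncated to 128 bits stands in for H().
    return hashlib.sha256(readkey).digest()[:16]


readkey = os.urandom(C_BITS // 8)
i_bits = hashlib.sha256(b"example UEB").digest()   # placeholder I bits

si = storage_index(readkey)      # SI = H(C), known before upload starts
ssi = si                         # SSI = SI
verifycap = si + i_bits          # verifycap = SI + I
crypto_bits = C_BITS + I_BITS    # 384 bits, before k, N and filesize are added
```

Because SI depends only on the read key, two uploads with the same key land on the same storage-index, which is what gives this option its convergence.
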
## Two: Zooko's scheme
Readcaps contain one crypto value that combines C and I fields. (I forget how
this worked; it was clever, but I think it had some fatal flaw, like not
being able to get a storage-index from the readcap without first retrieving
shares, or something. One of us will dig up the notes on it and describe it
here.)

* short caps
* convergence problems

## Others?
## Ideas
It might be possible to have the uploader give two values to the server, at
different stages of the upload process, which (together) would allow full
validation of the resulting share. Using a single value (the verifycap) as the
storage index would be cleaner, but might not be strictly necessary.

The servers could maintain a table, mapping from one sort of index to
another, if that made it easier for the upload process to proceed (or to
achieve convergence). For example, H(readkey) is known at the beginning of
upload, but the I bits aren't known until the end. If the client could use
SSI=H(readkey) and then ask each server for the storage-index of any
shares which used H(readkey), it could achieve convergence and still use the
I bits as the storage-index. The servers would be obligated to maintain a
table with one entry per bucket (so probably ~20M entries), and
errors or malicious behavior in this table would cause convergence failures
(which are hardly fatal).

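A rough sketch of that lookup table follows. The names (`ConvergenceTable`, `upload`, `encode_and_send`) are hypothetical, SHA-256 is assumed for H(), and the table is a plain in-memory dict rather than anything sized for ~20M entries.

```python
import hashlib


class ConvergenceTable:
    """Hypothetical server-side table mapping H(readkey) to a storage-index."""

    def __init__(self):
        self._si_by_key_hash = {}

    def lookup(self, key_hash):
        # Returns None when no share has been recorded for this key hash.
        return self._si_by_key_hash.get(key_hash)

    def record(self, key_hash, storage_index):
        self._si_by_key_hash[key_hash] = storage_index


def upload(table, readkey, encode_and_send):
    """Return (storage_index, uploaded); skip encoding if the server knows the file."""
    key_hash = hashlib.sha256(readkey).digest()   # known before encoding starts
    existing = table.lookup(key_hash)
    if existing is not None:
        return existing, False
    storage_index = encode_and_send()             # I bits only known at the end
    table.record(key_hash, storage_index)
    return storage_index, True


table = ConvergenceTable()
readkey = b"\x01" * 16
fake_si = hashlib.sha256(b"share data").digest()
first = upload(table, readkey, lambda: fake_si)
second = upload(table, readkey, lambda: fake_si)
```

Here the second upload finds the earlier entry and skips encoding. A wrong or missing entry only costs a redundant upload, which matches the point that table errors cause convergence failures rather than data loss.
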
The SSI can be much shorter than the SI. It only needs to be long enough to
provide good load-balancing properties. It could be included explicitly in
the filecap. Alternate (non-TahoeTwo) peer-selection strategies could encode
whatever per-file information they needed into the SSI, assuming some sort of
tradeoff between cap length (i.e. SSI length) and work done by the downloader
to find the right servers.

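To illustrate how a short SSI can still drive server selection, here is one possible permutation scheme. It is only a sketch: ordering servers by SHA-256(SSI + server id) is an assumption made for this example, and `permute_servers`, the server ids, and the 16-bit SSI are all made up.

```python
import hashlib


def permute_servers(ssi, server_ids):
    """Order servers by SHA-256(ssi + server_id); the same SSI gives the same order."""
    return sorted(server_ids, key=lambda sid: hashlib.sha256(ssi + sid).digest())


servers = [b"server-a", b"server-b", b"server-c", b"server-d"]
short_ssi = bytes.fromhex("9f3c")        # 16 bits, far shorter than a 128-bit SI
order = permute_servers(short_ssi, servers)
first_choices = order[:2]                # where an uploader would start asking
```

The ordering depends only on the SSI and the set of server ids, so an uploader and a later downloader holding the same SSI would query servers in the same order.
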