new page about immutable encoding design
[Imported from Trac: page NewImmutableEncodingDesign, version 1]
The [NewCapDesign](NewCapDesign) page describes desired features of the next filecap design.

This page is for designing the encoding format for these new immutable files.

# Features
* as described on [NewCapDesign#filecaplength](NewCapDesign#filecaplength), we probably need 128-bit confidentiality "C" bits, 256-bit integrity "I" bits, and 128 bits of storage-collision resistance. There are encoding schemes that can combine the C and I bits (at the expense of convergence, or certain forms of offline attenuation).
* we may define a "server-selection-index" (which is used to permute or otherwise narrow the list of servers to be used) to be separate from the "storage-index" (which is used to identify a specific share on whichever servers we actually talk to). This may involve a separate field in the filecap, or it may continue to be derived from the storage index.
* some encoding schemes allow the readcap to be attenuated to a verifycap offline
* in general, we don't care how long the verifycap is
* the server should be able to validate the entire share by itself, without the readcap. In general, this means that the storage-index must also be the verifycap.
  * note that this implies that the storage-index cannot be computed until the end of encoding, when all shares have been generated, the share hash tree has been built, and its root has been added to the UEB.
  * this implies that we can't use the storage-index to detect convergence with earlier uploads of the same file. To retain convergence may require a lookup table on the server (mapping hash-of-readkey to storage-index, or something)
  * it also implies that storage-index can't be used as a server-selection-index, which again points to using hash-of-readkey as SSI (to retain convergence of server-selection). Setting the storage-index at the end of upload requires a new uploader protocol, which uses an "upload handle" for the data transfer, and finishes with a "now commit this share to storage-index=X" message.
* the original CHK design uses hash-of-readkey as storage-index, which has all these good properties except server-side full share validation. (servers can compare share contents against the UEB, and we could put a copy of the UEB hash into the share, but servers would continue to be unable to make sure the share was in the right place)
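
The "upload handle" protocol from the last bullet can be sketched as follows. This is a hypothetical illustration, not an existing Tahoe API: the class and message names (`open_upload`, `write`, `commit`) are invented here. The point is that share data is transferred under a temporary handle, and the share is only bound to a storage-index once encoding has finished and the UEB root is known.

```python
import hashlib
import os

class StorageServer:
    """Illustrative server side of a two-phase upload protocol."""

    def __init__(self):
        self.pending = {}   # upload handle -> accumulated share bytes
        self.shares = {}    # storage-index -> committed share bytes

    def open_upload(self):
        handle = os.urandom(16)   # opaque upload handle
        self.pending[handle] = b""
        return handle

    def write(self, handle, data):
        self.pending[handle] += data

    def commit(self, handle, storage_index):
        # the final "now commit this share to storage-index=X" message
        self.shares[storage_index] = self.pending.pop(handle)

server = StorageServer()
handle = server.open_upload()
server.write(handle, b"share block 1, ")
server.write(handle, b"share block 2")
# the storage-index only becomes known after the UEB root is computed:
si = hashlib.sha256(b"...UEB root hash...").digest()
server.commit(handle, si)
```

Until `commit` arrives, the server has no idea where the share belongs, which is exactly why the protocol needs the handle as a stand-in for the storage-index.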
# Options

note: all cap-length computations assume the integrity-providing "I" field is 256 bits long, and the confidentiality-providing "C" field is 128 bits long. If we decide on different values, the sums below should be updated.
## One: current CHK design

Readcaps consist of two main pieces, C bits and I bits, plus:

* k (which improves the accuracy of the initial number of queries to send out)
* N (which improves the guessed upper bound on the number of queries to send out, and used to be required by the abandoned [TahoeThree](TahoeThree) algorithm)
* filesize (advisory only, used by deep-size measurements in lieu of fetching share data to measure filesize)
SI = H(C), SSI = SI. Verifycap is SI+I.

* SSI and SI are known ahead of time; the uploader protocol starts with SI
* good convergence
* long caps (128+256+len(k+N+filesize)) ~= 400 bits
* server cannot verify entire share
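
A minimal sketch of this layout, using the field widths from the note at the top of this section (C = 128 bits, I = 256 bits). The widths assumed for the k/N/filesize fields are illustrative, not Tahoe's actual wire format:

```python
import hashlib
import os

C_BITS, I_BITS = 128, 256

readkey = os.urandom(C_BITS // 8)                   # C: confidentiality bits
ueb_hash = hashlib.sha256(b"...UEB...").digest()    # I: integrity bits

# SI = H(C) (truncated to 128 bits here), SSI = SI, verifycap = SI + I
storage_index = hashlib.sha256(readkey).digest()[:C_BITS // 8]
ssi = storage_index
verifycap = storage_index + ueb_hash

# the readcap carries C + I plus the small advisory fields;
# one byte each for k and N, 64 bits for filesize (assumed widths)
k_bits, n_bits, filesize_bits = 8, 8, 64
readcap_bits = C_BITS + I_BITS + k_bits + n_bits + filesize_bits
assert readcap_bits == 464   # i.e. on the order of the ~400 bits cited above
```

Note that every value here is computable before upload begins, which is what gives this scheme its "SI known ahead of time" property.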
## Two: Zooko's scheme

Readcaps contain one crypto value that combines the C and I fields. (I forget how this worked... it was clever, but I think it had some fatal flaw, like not being able to get a storage-index from the readcap without first retrieving shares, or something. One of us will dig up the notes on it and describe it here.)

* short caps
* convergence problems
## Others?

## Ideas

It might be possible to have the uploader give two values to the server, at different stages of the upload process, which (together) would allow full validation of the resulting share. Using a single value (the verifycap) as a storage index would be cleaner, but might not be strictly necessary.
The servers could maintain a table, mapping from one sort of index to another, if that made it easier for the upload process to proceed (or to achieve convergence). For example, H(readkey) is known at the beginning of upload, but the I bits aren't known until the end. If the client could use SSI=H(readkey) and then ask each server to tell them the storage-index of any shares which used H(readkey), it could achieve convergence and still use the I bits as the storage-index. The servers would be obligated to maintain a table with one entry per bucket (so probably ~20M entries), and errors/malicious behavior in this table would cause convergence failures (which are hardly fatal).
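
The server-side table described above could be as simple as the following sketch (the class and method names are invented for illustration). A wrong or missing entry only costs convergence, never integrity, since the client still verifies everything against the I bits:

```python
import hashlib

class ConvergenceTable:
    """Maps H(readkey), known at upload start, to the storage-index,
    known only at upload end, so later uploads of the same file converge."""

    def __init__(self):
        self.table = {}  # H(readkey) -> storage-index

    def record(self, readkey_hash, storage_index):
        self.table[readkey_hash] = storage_index

    def lookup(self, readkey_hash):
        # returns the storage-index of any share previously uploaded
        # under this readkey hash, or None if there is no such share
        return self.table.get(readkey_hash)

readkey = b"\x01" * 16
rk_hash = hashlib.sha256(readkey).digest()
si = hashlib.sha256(b"final UEB root").digest()

t = ConvergenceTable()
assert t.lookup(rk_hash) is None   # first upload: nothing to converge with
t.record(rk_hash, si)
assert t.lookup(rk_hash) == si     # second upload of the same file converges
```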
The SSI can be much shorter than the SI. It only needs to be long enough to provide good load-balancing properties. It could be included explicitly in the filecap. Alternate (non-TahoeTwo) peer-selection strategies could encode whatever per-file information they needed into the SSI, assuming some sort of tradeoff between cap length (i.e. SSI length) and work done by the downloader to find the right servers.
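
For TahoeTwo-style permuted peer selection, a short SSI is enough to drive the ordering: rank servers by H(SSI + serverid), so every client holding the same filecap visits servers in the same order. A sketch under that assumption (the helper name and 64-bit SSI width are illustrative):

```python
import hashlib

def permuted_order(ssi, server_ids):
    """Rank servers by H(SSI + serverid), TahoeTwo-style."""
    return sorted(server_ids,
                  key=lambda sid: hashlib.sha256(ssi + sid).digest())

servers = [b"server-%d" % i for i in range(5)]
ssi = hashlib.sha256(b"readkey").digest()[:8]   # a short, 64-bit SSI

order1 = permuted_order(ssi, servers)
order2 = permuted_order(ssi, list(reversed(servers)))
assert order1 == order2   # ordering depends only on the SSI, not input order
```

Because the ordering is a pure function of the SSI and the server ids, the only cost of a shorter SSI is a higher chance that two distinct files share a permutation, which affects load balancing but not correctness.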