reorg, copy relevant notes from referenced tickets (unrolling the loops, so to speak)

[Imported from Trac: page NewCapDesign, version 9]
warner 2009-08-25 10:26:54 +00:00
parent f7db5622c6
commit e5a2340879

@ -6,8 +6,9 @@ across separate tickets: this page is here to consolidate them. We should not
release a new filecap format without checking it against everything on this release a new filecap format without checking it against everything on this
list. list.
There will be a related pair of new encoding designs. The [NewImmutableEncodingDesign](NewImmutableEncodingDesign) There will be a related pair of new encoding designs. The
and [NewMutableEncodingDesign](NewMutableEncodingDesign) pages will hold those design discussions. [NewImmutableEncodingDesign](NewImmutableEncodingDesign) and [NewMutableEncodingDesign](NewMutableEncodingDesign) pages will hold those
design discussions.
Ticket #432 was the starting point: it contained a list of features. Ticket #432 was the starting point: it contained a list of features.
@ -25,8 +26,9 @@ established sense). To make them real, we need to:
necessarily provide enough information to actually access it (i.e. if you necessarily provide enough information to actually access it (i.e. if you
have a URI and somebody pointed you at a file, you could confidently tell have a URI and somebody pointed you at a file, you could confidently tell
them whether or not it was the right file, but if you only have the URI, them whether or not it was the right file, but if you only have the URI,
then you might not be able to find the file without additional information). If then you might not be able to find the file without additional
the cap has both identifying and location information, it's called a URL. information). If the cap has both identifying and location information,
it's called a URL.
* Tahoe filecaps are meant to be URLs (they are intended to provide location * Tahoe filecaps are meant to be URLs (they are intended to provide location
information), but to really make that work, you also need to define which information), but to really make that work, you also need to define which
grid you're talking about. So far this has always been implicit, but that grid you're talking about. So far this has always been implicit, but that
@ -45,8 +47,7 @@ established sense). To make them real, we need to:
them, and that we have a clear procedure for starting with a filecap and them, and that we have a clear procedure for starting with a filecap and
a gateway HTTP URL, and ending with the contents of the file. a gateway HTTP URL, and ending with the contents of the file.
## make them shorter, prettier, and easier to use
## other features
* Short and not so ugly. This is important to enable * Short and not so ugly. This is important to enable
cut-and-paste (see below), but also just because people are cut-and-paste (see below), but also just because people are
@ -54,8 +55,14 @@ established sense). To make them real, we need to:
notes in which dozens of people have spontaneously complained notes in which dozens of people have spontaneously complained
about the current URLs. By contrast, tiny URLs such as about the current URLs. By contrast, tiny URLs such as
tinyurl.com, bit.ly, etc. are ubiquitous nowadays; users have tinyurl.com, bit.ly, etc. are ubiquitous nowadays; users have
no problem with those -- see Twitter. See below for notes on no problem with those -- see Twitter.
cap length. * I (warner) am curious about where the suspicion comes from. Do long URLs
make people think they're being attacked, some sort of browser buffer
overrun thing? Or that they're being phished, with a URL that a human
would evaluate differently than their browser? I agree that people
(including me) don't like long URLs, but I've never pushed anyone to
explain the "suspicion" aspect. One comment in #217 says "smells a bit
spammy", and a later one says "Spooks me every time".
* Enable convenient cut-and-paste. If caps are too long they'll wrap in * Enable convenient cut-and-paste. If caps are too long they'll wrap in
email. If they contain lots of word-breaking characters then you have to email. If they contain lots of word-breaking characters then you have to
drag after you've double clicked (this is probably ok). If the word-broken drag after you've double clicked (this is probably ok). If the word-broken
@ -77,18 +84,55 @@ established sense). To make them real, we need to:
webbrowser (i.e. when you click on `tahoe:foo`, a helper program is webbrowser (i.e. when you click on `tahoe:foo`, a helper program is
launched with `tahoe:foo`, and that in turn launches your web browser launched with `tahoe:foo`, and that in turn launches your web browser
with `<http://localhost:8123/foo>`). (#52) with `<http://localhost:8123/foo>`). (#52)
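
The `tahoe:` helper described above can be sketched as follows. This is a minimal illustration, not an existing Tahoe tool: the function name `tahoe_uri_to_gateway_url` and the `DEFAULT_GATEWAY` constant are hypothetical, and the only detail taken from the text is the example mapping of `tahoe:foo` to `http://localhost:8123/foo`.

```python
from urllib.parse import quote

# Assumption: the local gateway listens where the example above says it does.
DEFAULT_GATEWAY = "http://localhost:8123/"


def tahoe_uri_to_gateway_url(uri, gateway=DEFAULT_GATEWAY):
    """Map 'tahoe:foo' to '<gateway>/foo' (hypothetical helper)."""
    scheme, sep, rest = uri.partition(":")
    if sep != ":" or scheme.lower() != "tahoe":
        raise ValueError("not a tahoe: URI: %r" % (uri,))
    # Percent-encode anything unsafe in a URL path, but keep '/' separators.
    return gateway.rstrip("/") + "/" + quote(rest, safe="/")


if __name__ == "__main__":
    print(tahoe_uri_to_gateway_url("tahoe:foo"))
```

A real helper would presumably hand the resulting URL to the user's browser, for example with the standard library's `webbrowser.open(url)`.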
## make them long enough to be secure
We want filecaps to be as short as possible, but no shorter. There are
several lower bounds on the length:
* confidentiality: A large computing effort should not be able
to obtain the plaintext of a tahoe file without knowing the
readcap. We require reasonable margin against improvements in
hardware speed and organization efficiency/motivation of
distributed efforts (e.g. could a million PS3 owners break a
filecap?). This currently implies a 128 bit confidentiality
field.
* integrity: a large computing effort should not be able to
produce shares which will be accepted by the readcap holder
but which do not result in the same file as created by the
original uploader (and retrieved by other downloaders). We
desire all three of the standard hash properties (collision
resistance, first-pre-image resistance, second-pre-image
resistance) to also apply to tahoe immutable files and their
filecaps. This currently implies a 128bit (or 256bit?)
integrity field.
* variable-length integrity field (#102, comment 16+17),
allowing users to decide between short caps and strong
integrity guarantees
* storage collision resistance (#753): a Tahoe grid should be
able to store trillions of files and still have a vanishingly
small chance of two files using the same storage-index (and
thus confusing each other's shares). The storage-index is
generally compressed out of the filecap, by deriving it with
various hashing stages on the other filecap parameters. The
shortest value in this derivation chain must be at least
128bits long, and preferably about 192bits long.
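
To get a feel for the storage-collision bound, the standard birthday approximation can be evaluated directly. This is a rough sketch, assuming storage indexes behave like uniformly random values of the given width; the function name `collision_probability` is illustrative, and the file counts are examples rather than figures from the tickets.

```python
import math


def collision_probability(num_files, bits):
    """Approximate chance that any two of num_files random bits-wide
    values collide, using p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_files * (num_files - 1) / 2.0 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    trillion = 10 ** 12
    for bits in (64, 128, 192):
        p = collision_probability(trillion, bits)
        print("%3d bits, 10^12 files: p ~= %.3g" % (bits, p))
```

Under this model a 64-bit value is all but certain to collide at a trillion files, while 128 bits leaves the probability vanishingly small and 192 bits adds further margin.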
## other features
* Self-identifying. It should be visually clear what sort of filecap the * Self-identifying. It should be visually clear what sort of filecap the
string represents: read-write or read-only, mutable-or-immutable, string represents: read-write or read-only, mutable-or-immutable,
file-or-directory. This is especially important when sharing tahoe objects file-or-directory. This is especially important when sharing tahoe objects
over out-of-band channels like IM and email: it should be easy for the over out-of-band channels like IM and email: it should be easy for the
user to tell whether they're giving away readonly access or read-write user to tell whether they're giving away readonly access or read-write
access. We've considered prefixes like `DWM..` for "Directory access. We've considered prefixes like `DWM..` for "Directory
Writeable Mutable" and `FRI..` for "File Readonly Immutable". If these Writeable Mutable" and `FRI..` for "File Readonly Immutable" (#102
are jammed against the (base62) crypto bits it may be difficult to tell comment 12). If these are jammed against the (base62) crypto bits it may
where the prefix ends and the crypto bits begin, especially because the be difficult to tell where the prefix ends and the crypto bits begin,
crypto bits will be using the same character set (`FRIDWM...`). It especially because the crypto bits will be using the same character set
might be a good idea to separate the type prefix from the cryptobits: (`FRIDWM...`). It might be a good idea to separate the type prefix
`FRI-cryptobits` or `FRI/cryptobits`. from the cryptobits: `FRI-cryptobits` or `FRI/cryptobits`.
* in addition, tahoe URIs should be distinguishable from local filenames by * in addition, tahoe URIs should be distinguishable from local filenames by
a CLI tool, so that `tahoe cp $CAP local/foo.txt` is unambiguous. a CLI tool, so that `tahoe cp $CAP local/foo.txt` is unambiguous.
(unfortunately, the current practice of using "tahoe:" as a default alias (unfortunately, the current practice of using "tahoe:" as a default alias
@ -110,15 +154,16 @@ established sense). To make them real, we need to:
trivial. Another way to think about this is that if our filecaps were trivial. Another way to think about this is that if our filecaps were
verbose s-expressions, these caps could be expressed as "(readonly verbose s-expressions, these caps could be expressed as "(readonly
(mutable cryptobits))" and "(directory (readonly (mutable cryptobits)))". (mutable cryptobits))" and "(directory (readonly (mutable cryptobits)))".
* provide for verifycaps, repaircaps, and traversalcaps. Repaircaps in * provide for verifycaps, repaircaps, and traversalcaps (#308, #217).
particular may require a grant of storage authority, which might entail a Repaircaps in particular may require a grant of storage authority, which
cap format that can accept arbitrary extra non-hierarchical fields. might entail a cap format that can accept arbitrary extra non-hierarchical
Appendcaps or "drop-box" writecaps might fall into this same space. But fields. Appendcaps or "drop-box" writecaps might fall into this same
remember that URIs should identify objects, not the action that you want space. But remember that URIs should identify objects, not the action that
to do on it: a webapi scheme may use a POST/PUT/DELETE method, or append a you want to do on it: a webapi scheme may use a POST/PUT/DELETE method, or
t=json adverb, or alternatively encode the verb/adverb into the HTTP url append a t=json adverb, or alternatively encode the verb/adverb into the
(think `GET .../filecap/json` or `PUT unlinked/ciphertext`), but HTTP url (think `GET .../filecap/json` or `PUT
these are independent of the underlying filecap. unlinked/ciphertext`), but these are independent of the underlying
filecap.
* provide ciphertext access. Reading from a verifycap should give you * provide ciphertext access. Reading from a verifycap should give you
ciphertext. It should be possible to upload ciphertext directly. ciphertext. It should be possible to upload ciphertext directly.
* provide for a grid-identifier, possibly on the MSB end, e.g. * provide for a grid-identifier, possibly on the MSB end, e.g.
@ -127,36 +172,5 @@ established sense). To make them real, we need to:
mean `tahoe://grid1234/IR/cryptobits`. Something like mean `tahoe://grid1234/IR/cryptobits`. Something like
`tahoe://grid1234/D/MR/cryptobits` should reference `tahoe://grid1234/D/MR/cryptobits` should reference
`tahoe://grid1234/MR/cryptobits`. (#403) `tahoe://grid1234/MR/cryptobits`. (#403)
* #102 and #217 have notes on dircaps * permit multiple encodings of the same file (same k, different N) to use
* #678 (converge same file, same K, different M) each other's shares (#678)
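
The self-identifying and grid-identifier ideas above could be combined in a parser along these lines. This is only a sketch of one possible syntax: the `tahoe://grid/TYPE/cryptobits` layout, the `IR` and `MR` codes and the optional `D` segment follow the examples in the list, while the `MW` code, the `ParsedCap` tuple and both function names are assumptions made for illustration.

```python
from collections import namedtuple

ParsedCap = namedtuple(
    "ParsedCap", "grid is_directory mutable writeable cryptobits")

# Type code -> (mutable, writeable). "MW" is an assumed code, not from the text.
TYPE_CODES = {
    "IR": (False, False),  # immutable, read-only
    "MR": (True, False),   # mutable, read-only
    "MW": (True, True),    # mutable, read-write (hypothetical)
}


def parse_cap(cap):
    """Split a cap such as tahoe://grid1234/D/MR/cryptobits into fields."""
    prefix = "tahoe://"
    if not cap.startswith(prefix):
        raise ValueError("not a tahoe:// cap: %r" % (cap,))
    parts = cap[len(prefix):].split("/")
    grid, rest = parts[0], parts[1:]
    # An optional leading "D" segment marks a directory wrapper.
    is_directory = bool(rest) and rest[0] == "D"
    if is_directory:
        rest = rest[1:]
    if not grid or len(rest) != 2 or rest[0] not in TYPE_CODES or not rest[1]:
        raise ValueError("malformed cap: %r" % (cap,))
    mutable, writeable = TYPE_CODES[rest[0]]
    return ParsedCap(grid, is_directory, mutable, writeable, rest[1])


def underlying_filecap(cap):
    """Return the file-level cap that a (possibly directory) cap refers to."""
    p = parse_cap(cap)
    code = next(c for c, flags in TYPE_CODES.items()
                if flags == (p.mutable, p.writeable))
    return "tahoe://%s/%s/%s" % (p.grid, code, p.cryptobits)


if __name__ == "__main__":
    print(parse_cap("tahoe://grid1234/D/MR/cryptobits"))
    print(underlying_filecap("tahoe://grid1234/D/MR/cryptobits"))
```

Keeping the type code in its own path segment avoids the ambiguity noted earlier, where a prefix like `FRI` jammed against base62 cryptobits cannot be told apart from the cryptobits themselves.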
## filecap length
We want filecaps to be as short as possible, but no shorter. There are
several lower bounds on the length:
* confidentiality: A large computing effort should not be able
to obtain the plaintext of a tahoe file without knowing the
readcap. We require reasonable margin against improvements in
hardware speed and organization efficiency/motivation of
distributed efforts (e.g. could a million PS3 owners break a
filecap?). This currently implies a 128 bit confidentiality
parameter.
* integrity: a large computing effort should not be able to
produce shares which will be accepted by the readcap holder
but which do not result in the same file as created by the
original uploader (and retrieved by other downloaders). We
desire all three of the standard hash properties (collision
resistance, first-pre-image resistance, second-pre-image
resistance) to also apply to tahoe immutable files and their
filecaps. This currently implies a 128bit (or 256bit?) integrity
parameter.
* storage collision resistance (#753): a Tahoe grid should be
able to store trillions of files and still have a vanishingly
small chance of two files using the same storage-index (and
thus confusing each other's shares). The storage-index is
generally compressed out of the filecap, by deriving it with
various hashing stages on the other filecap parameters. The
shortest value in this derivation chain must be at least
128bits long, and preferably about 192bits long.