From e5a234087923ce08abb109224dabddb3b795c029 Mon Sep 17 00:00:00 2001
From: warner <>
Date: Tue, 25 Aug 2009 10:26:54 +0000
Subject: [PATCH] reorg, copy relevant notes from referenced tickets (unrolling the loops, so to speak)

[Imported from Trac: page NewCapDesign, version 9]
---
 NewCapDesign.md | 126 +++++++++++++++++++++++++++---------------------
 1 file changed, 70 insertions(+), 56 deletions(-)

diff --git a/NewCapDesign.md b/NewCapDesign.md
index 5f850e0..b8ec4e5 100644
--- a/NewCapDesign.md
+++ b/NewCapDesign.md
@@ -6,8 +6,9 @@
 across separate tickets: this page is here to consolidate them. We should not release a new filecap format without checking it against everything on this list.
-There will be a related pair of new encoding designs. The [NewImmutableEncodingDesign](NewImmutableEncodingDesign)
-and [NewMutableEncodingDesign](NewMutableEncodingDesign) pages will hold those design discussions.
+There will be a related pair of new encoding designs. The
+[NewImmutableEncodingDesign](NewImmutableEncodingDesign) and [NewMutableEncodingDesign](NewMutableEncodingDesign) pages will hold those
+design discussions.
 Ticket #432 was the starting point: it contained a list of features.
@@ -25,8 +26,9 @@ established sense). To make them real, we need to:
 necessarily provide enough information to actually access it (i.e. if you have a URI and somebody pointed you at a file, you could confidently tell them whether or not it was the right file, but if you only have the URI,
- then you might not be able to find the file without additional information). If
- the cap has both identifying and location information, it's called a URL.
+ then you might not be able to find the file without additional
+ information). If the cap has both identifying and location information,
+ it's called a URL.
 * Tahoe filecaps are meant to be URLs (they are intended to provide location information), but to really make that work, you also need to define which grid you're talking about.
So far this has always been implicit, but that
@@ -45,8 +47,7 @@ established sense). To make them real, we need to:
 them, and that we have a clear procedure for starting with a filecap and a gateway HTTP URL, and ending with the contents of the file.
-
-## other features
+## make them shorter, prettier, and easier to use
 * Short and not so ugly. This is important to enable cut-and-paste (see below), but also just because people are
@@ -54,8 +55,14 @@ established sense). To make them real, we need to:
 notes in which dozens of people have spontaneously complained about the current URLs. By contrast, tiny URLs such as tinyurl.com, bit.ly, etc. are ubiquitous nowadays; users have
- no problem with those -- see Twitter. See below for notes on
- cap length.
+ no problem with those -- see Twitter.
+ * I (warner) am curious about where the suspicion comes from. Do long URLs
+ make people think they're being attacked, some sort of browser buffer
+ overrun thing? Or that they're being phished, with a URL that a human
+ would evaluate differently than their browser? I agree that people
+ (including me) don't like long URLs, but I've never pushed anyone to
+ explain the "suspicion" aspect. One comment in #217 says "smells a bit
+ spammy", and a later one says "Spooks me every time".
 * Enable convenient cut-and-paste. If caps are too long they'll wrap in email. If they contain lots of word-breaking characters then you have to drag after you've double clicked (this is probably ok). If the word-broken
@@ -77,18 +84,55 @@ established sense). To make them real, we need to:
 webbrowser (i.e. when you click on `tahoe:foo`, a helper program is launched with `tahoe:foo`, and that in turn launches your web browser with ``). (#52)
+
+## make them long enough to be secure
+
+We want filecaps to be as short as possible, but no shorter.
There are
+several lower bounds on the length:
+
+ * confidentiality: A large computing effort should not be able
+ to obtain the plaintext of a tahoe file without knowing the
+ readcap. We require reasonable margin against improvements in
+ hardware speed and organization efficiency/motivation of
+ distributed efforts (e.g. could a million PS3 owners break a
+ filecap?). This currently implies a 128 bit confidentiality
+ field.
+ * integrity: a large computing effort should not be able to
+ produce shares which will be accepted by the readcap holder
+ but which do not result in the same file as created by the
+ original uploader (and retrieved by other downloaders). We
+ desire all three of the standard hash properties (collision
+ resistance, first-pre-image resistance, second-pre-image
+ resistance) to also apply to tahoe immutable files and their
+ filecaps. This currently implies a 128bit (or 256bit?)
+ integrity field.
+ * variable-length integrity field (#102, comment 16+17),
+ allowing users to decide between short caps and strong
+ integrity guarantees
+ * storage collision resistance (#753): a Tahoe grid should be
+ able to store trillions of files and still have a vanishingly
+ small chance of two files using the same storage-index (and
+ thus confusing each other's shares). The storage-index is
+ generally compressed out of the filecap, by deriving it with
+ various hashing stages on the other filecap parameters. The
+ shortest value in this derivation chain must be at least
+ 128bits long, and preferably about 192bits long.
+
+
+## other features
+
 * Self-identifying. It should be visually clear what sort of filecap the string represents: read-write or read-only, mutable-or-immutable, file-or-directory. This is especially important when sharing tahoe objects over out-of-band channels like IM and email: it should be easy for the user to tell whether they're giving away readonly access or read-write access.
We've considered prefixes like `DWM..` for "Directory
- Writeable Mutable" and `FRI..` for "File Readonly Immutable". If these
- are jammed against the (base62) crypto bits it may be difficult to tell
- where the prefix ends and the crypto bits begin, especially because the
- crypto bits will be using the same character set (`FRIDWM...`). It
- might be a good idea to separate the type prefix from the cryptobits:
- `FRI-cryptobits` or `FRI/cryptobits`.
+ Writeable Mutable" and `FRI..` for "File Readonly Immutable" (#102
+ comment 12). If these are jammed against the (base62) crypto bits it may
+ be difficult to tell where the prefix ends and the crypto bits begin,
+ especially because the crypto bits will be using the same character set
+ (`FRIDWM...`). It might be a good idea to separate the type prefix
+ from the cryptobits: `FRI-cryptobits` or `FRI/cryptobits`.
 * in addition, tahoe URIs should be distinguishable from local filenames by a CLI tool, so that `tahoe cp $CAP local/foo.txt` is unambiguous. (unfortunately, the current practice of using "tahoe:" as a default alias
@@ -110,15 +154,16 @@ established sense). To make them real, we need to:
 trivial. Another way to think about this is that if our filecaps were verbose s-expressions, these caps could be expressed as "(readonly (mutable cryptobits))" and "(directory (readonly (mutable cryptobits)))".
- * provide for verifycaps, repaircaps, and traversalcaps. Repaircaps in
- particular may require a grant of storage authority, which might entail a
- cap format that can accept arbitrary extra non-hierarchical fields.
- Appendcaps or "drop-box" writecaps might fall into this same space.
But
- remember that URIs should identify objects, not the action that you want
- to do on it: a webapi scheme may use a POST/PUT/DELETE method, or append a
- t=json adverb, or alternatively encode the verb/adverb into the HTTP url
- (think `GET .../filecap/json` or `PUT unlinked/ciphertext`), but
- these are independent of the underlying filecap.
+ * provide for verifycaps, repaircaps, and traversalcaps (#308, #217).
+ Repaircaps in particular may require a grant of storage authority, which
+ might entail a cap format that can accept arbitrary extra non-hierarchical
+ fields. Appendcaps or "drop-box" writecaps might fall into this same
+ space. But remember that URIs should identify objects, not the action that
+ you want to do on it: a webapi scheme may use a POST/PUT/DELETE method, or
+ append a t=json adverb, or alternatively encode the verb/adverb into the
+ HTTP url (think `GET .../filecap/json` or ```PUT
+ unlinked/ciphertext```), but these are independent of the underlying
+ filecap.
 * provide ciphertext access. Reading from a verifycap should give you ciphertext. It should be possible to upload ciphertext directly.
 * provide for a grid-identifier, possibly on the MSB end, e.g.
@@ -127,36 +172,5 @@ established sense). To make them real, we need to:
 mean `tahoe://grid1234/IR/cryptobits`. Something like `tahoe://grid1234/D/MR/cryptobits` should reference `tahoe://grid1234/MR/cryptobits`. (#403)
- * #102 and #217 have notes on dircaps
- * #678 (converge same file, same K, different M)
-
-## filecap length
-
-We want filecaps to be as possible, but no shorter. There are
-several lower bounds on the length:
-
- * confidentiality: A large computing effort should not be able
- to obtain the plaintext of a tahoe file without knowing the
- readcap. We require reasonable margin against improvements in
- hardware speed and organization efficiency/motivation of
- distributed efforts (e.g. could a million PS3 owners break a
- filecap?).
This currently implies a 128 bit confidentiality
- parameter.
- * integrity: a large computing effort should not be able to
- produce shares which will be accepted by the readcap holder
- but which do not result in the same file as created the
- original uploader (and retrieved by other downloaders). We
- desire all three of the standard hash properties (collision
- resistance, first-pre-image resistance, second-pre-image
- resistance) to also apply to tahoe immutable files and their
- filecaps. This currently implies a 128bit (or 256bit?) integrity
- parameter.
- * storage collision resistance (#753): a Tahoe grid should be
- able to store trillions of files and still have a vanishingly
- small chance of two files using the same storage-index (and
- thus confusing each other's shares). The storage-index is
- generally compressed out of the filecap, by deriving it with
- various hashing stages on the other filecap parameters. The
- shortest value in this derivation chain must be at least
- 128bits long, and preferably about 192bits long.
-
+ * permit multiple encodings of the same file (same k, different N) to use
+ each other's shares (#678)