tahoe-lafs/trac-2024-07-25

build "sharing slots" / use mutable files as primitives for sharing messages #152

Open

opened 2007-09-27 19:16:28 +00:00 by warner · 5 comments

warner commented

2007-09-27 19:16:28 +00:00

Owner

We were talking with Peter yesterday about what sort of sharing UI he'd like
to use. In exchanging documents with a colleague, he said he'd like to take
the spreadsheet that he's editing and push a button that says "Share This
File", and immediately get a window with a string that he can IM or email to
somebody. He doesn't want to wait for a file to finish uploading or even
encoding, because he wants to be able to walk away from the process once he's
IM'ed this string to his friend.

We can do this. The requirements are that his computer stays online until the
upload finishes, and that his friend might not be able to download the file
right away (i.e. if he uses the IM'ed string too quickly). If the download is
not yet available, the friend should get an ETA or some sort of progress
message to let them know when they should start downloading it, so that they
can plan their time ("do I go get coffee, or go out to lunch, or come back
tomorrow?").

To build this, I'm thinking we start with an SSK-based mutable slot. The
"Share This File" button creates an SSK slot, fills it with some starting
data, and displays the SSK URI to the originating user. The slot is filled
with:

- suggested filename
- file length
- one of:
  - "upload in progress"
    - bytes uploaded so far
    - ETA
  - "complete"
    - CHK URI

The originating client will modify the SSK slot every once in a while
(perhaps once every 10 to 60 seconds?) to update the ETA, and will eventually
fill in the URI.

The recipient's GUI should accept an SSK URI (with some framing information
to suggest that it is filled with data in this format) and read the slot to
see whether the file is available yet or not. There should be a "Retrieve
Shared File" button to which you can paste or drag the SSK URI, and it either
produces a window with "waiting for upload to complete: NN%, ETA XX", or
"downloading: NN%, ETA XX", or a file icon ready to be dragged somewhere.

These SSK slots should expire after a while, maybe a week or a month (perhaps
the "share this file" button should have an option somewhere to specify how
long the file will be available). The CHK file needs to last at least the
same duration, so perhaps it needs an extra purely-time-based lease (still
accounted to the originator, but not cancelled if they remove the file from
their vdrive (or never added it in the first place)).
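The slot contents described above could be sketched roughly as follows. This is only an illustration: the `SharingSlot` class, its field names, the JSON encoding, and the `describe` helper are all assumptions made for this example, not an existing Tahoe format or API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SharingSlot:
    """Hypothetical slot record; field names are illustrative only."""
    suggested_filename: str
    file_length: int
    status: str                          # "upload in progress" or "complete"
    bytes_uploaded: Optional[int] = None
    eta_seconds: Optional[float] = None
    chk_uri: Optional[str] = None

    def to_bytes(self) -> bytes:
        # The originator would write these bytes into the mutable slot.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "SharingSlot":
        # The recipient would parse whatever it reads back from the slot.
        return cls(**json.loads(data.decode("utf-8")))


def describe(slot: SharingSlot) -> str:
    """Render the message the recipient's GUI might show."""
    if slot.status == "complete":
        return f"ready: {slot.suggested_filename} ({slot.chk_uri})"
    done = slot.bytes_uploaded or 0
    percent = 100 * done // slot.file_length if slot.file_length else 0
    eta = "unknown" if slot.eta_seconds is None else f"{slot.eta_seconds:.0f}s"
    return f"waiting for upload to complete: {percent}%, ETA {eta}"
```

The originating client would rewrite the slot with a fresh `SharingSlot` every 10 to 60 seconds while uploading, then once more with `status="complete"` and the CHK URI filled in.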

tahoe-lafs added the

labels 2007-09-27 19:16:28 +00:00

tahoe-lafs added this to the eventually milestone 2007-09-27 19:16:28 +00:00

zooko commented

2007-09-27 21:11:16 +00:00

Author

Owner

How big are big spreadsheets? I have some small spreadsheets that are about 20 KB. If a file is less than a couple hundred KB, the upload of the file itself might complete faster than Peter can cut-and-paste the string and IM it to his friend. (Back-of-envelope 1s per file plus 23 KB/s, so maybe 2 seconds for a 40 KB file.)

But for sufficiently large files, this feature sounds cool.

Hm, actually, why doesn't his friend start downloading the file before Peter's computer has finished uploading the file? So the progress meter isn't telling you how far to go until you can start downloading, it is telling you how far to go until the file is completely downloaded. Also if the file is useful when incomplete (such as a movie or audio file), then the friend can start using it as soon as Peter's computer starts uploading it.
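The back-of-envelope figure can be written out as a small calculation. The function name is made up for this example, and the defaults are simply the 1 second per file and 23 KB/s quoted above, not measurements.

```python
def estimated_upload_seconds(size_kb: float,
                             per_file_overhead_s: float = 1.0,
                             rate_kb_per_s: float = 23.0) -> float:
    """Fixed per-file overhead plus transfer time at a constant rate."""
    return per_file_overhead_s + size_kb / rate_kb_per_s


for size_kb in (20, 40, 200, 4000):
    print(f"{size_kb:>5} KB: {estimated_upload_seconds(size_kb):.1f} s")
```

With these numbers a 40 KB file comes out at roughly 2.7 seconds, the same order as the rough figure above, while a file of a few megabytes takes minutes.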

warner commented

2007-09-28 01:56:39 +00:00

Author

Owner

I guess I'm assuming that Microsoft products are incapable of creating any
file smaller than a few megabytes. I'm also assuming slow consumer-grade ADSL
uplinks.

I'd think that the user should be able to wait up to, say, 15 seconds (from
the time they push the button to the time they get an IM-able string). If
it's less than 2 seconds, then it will feel like their file is being
instantly transmitted, at least from the sender's point of view. The burden
of waiting is really being transferred to their friend, but most of that
latency is hidden from both parties by their own natural sloth :-). (the
longer they procrastinate before pushing the "download this file" link, the
better we look).

If the only thing we need to do is to generate a unique string (like a
Storage Index), then we can respond in a few milliseconds. I think we should
evaluate this time in absolute terms rather than how long it takes Peter to
subsequently cut-and-paste the string, since Peter is waiting on us before
that point, and only on himself after that point. I.e., he can't blame us for
how long it takes him to manipulate his IM client.

Starting the download before the upload finishes would be really slick. It
also won't work at all for our current CHK format, unless we allow the
recipient to download unverified data and keep it quarantined somewhere until
the hashes are uploaded and downloaded and checked. The CHK format has only
one place for verification data (the UEB hash inside the URI), and we can't
generate it until the very end.

Doing download-before-upload on SSK would need some clever work too.. like
signing each segment separately. Or, we could make the validation section
contain a hash tree over just the segments that have been encoded thus far,
with a signature on the root. As we encode more segments, we keep replacing
this tree with a larger one that covers more segments. When we finish
uploading, we'll have a bunch of segments, a complete merkle tree of hashes
(covering all segments), and a single signature on the root.

If this is an important use case, we should keep it in mind when we design
the SSK format. We've talked in the past about designing SSKs that can handle
large amounts of data (using FEC instead of simple replication); if we also
design them to handle partial-upload (with the merkle tree and a variable
number of segments), then we can implement this very nifty feature. (and if
we do this, then the "sharing slot" might just be the SSK itself.. this would
require a place to store "expected file size" or "expected number of
segments", and then we'd probably need to put the suggested file name in the
metadata that wraps the SSK URI and gets pasted or IM'ed to the recipient).
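A minimal sketch of the "growing hash tree" idea, assuming SHA-256, a simple duplicate-the-last-node rule for odd levels, and made-up function names. None of this is the actual SSK format. The signature on the root is left out: the uploader would re-sign the root each time the tree is rebuilt, and the recipient would check that signature before trusting the root.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(segments: List[bytes]) -> List[List[bytes]]:
    """Build the tree over the segments encoded so far (at least one).

    Returns every level, leaves first; the root is levels[-1][0].
    """
    level = [_h(b"leaf:" + s) for s in segments]
    levels = [level]
    while len(level) > 1:
        # Pair an odd trailing node with itself.
        padded = level if len(level) % 2 == 0 else level + [level[-1]]
        level = [_h(b"node:" + padded[i] + padded[i + 1])
                 for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def proof_for(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes needed to check one segment against the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index          # this node was paired with itself
        proof.append(level[sibling])
        index //= 2
    return proof


def verify(segment: bytes, index: int, proof: List[bytes], root: bytes) -> bool:
    """Recompute the path from one segment up to the (signed) root."""
    node = _h(b"leaf:" + segment)
    for sibling in proof:
        if index % 2 == 0:
            node = _h(b"node:" + node + sibling)
        else:
            node = _h(b"node:" + sibling + node)
        index //= 2
    return node == root
```

While the upload is in progress the uploader would call `merkle_levels` on the segments encoded so far and publish the new root. A recipient holding that root can already validate any of the covered segments with `verify`, instead of quarantining unverified data until the end.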


zooko commented

2007-10-21 01:53:46 +00:00

Author

Owner

Now we're designing SSKs, and I still think that this is a valuable use case, so I'm posting this comment to remind us to think about this while designing SSKs.

tahoe-lafs added 0.6.1 and removed 0.5.1 labels 2007-10-21 01:53:46 +00:00

warner commented

2008-03-28 19:44:21 +00:00

Author

Owner

We were chatting with Ping at the hackfest last night, explaining how I was
guessing that sharing would work, specifically the idea of having a pair-wise
directory: when Alice wants to give something to Bob, she creates a new
directory, links its write-cap to "outbox/to-Bob" in her vdrive, puts the
file/files she wants to share in the dir, then mails him the directory's
read-cap. Bob links the read-cap to "inbox/from-Alice". Then Alice can
"revoke" the grant by just deleting the file from that directory, and she has
a record of what she's shared.

Ping was surprised by the idea that we'd re-use this directory. He suggested
that we treat the directory like a one-time "Purse" (from the Mint example,
either from erights.org or Tyler's IOU protocol). The specific thing that he
thought would be confusing was that Bob might come to assume that the file
would remain forever in that inbox (that he "owns" the inbox), and therefore
he would be upset if Alice removed something from his space. Likewise Bob
might be upset to think that Alice could add things to his vdrive at will.
Using the same directory for multiple files would increase the utility of
this inbox, increasing the chances that Bob would keep using things in-place
rather than copying them elsewhere, increasing the surprise/upset.

The other realization we had was that the #217 elliptic-curve-based,
DSA-based mutable files would have smaller write-caps than read-caps: with
some tricks, we could get them down to 96 bits (plus prefix), so about 15
characters of base-62. If we use a separate mutable file per act of sharing,
then we could give the recipient the full write-cap instead of the (longer)
read-cap. Then we wouldn't need to treat the gift as a directory at all, we
could just use it as a "channel" that the two parties can use to communicate
about this gift.

For example, we could define a human-shareable cap format (i.e. printable,
short enough to avoid wrapping, and with an http prefix) specifically for
sharing things, with a prefix character of "S" (as opposed to "D" for
directory and "F" for file). The rest of the cap would be a mutable-file
write-cap, but the "S" would indicate that we want to treat the contents
specially.

The contents would contain a message from the giver to the recipient. It
would include a list of file/directory caps (with names), the nickname of the
sender, heck it could include the public key of the sender and the rest of
the body could be signed (allowing the recipient to assign a petname to the
sender). Higher-level code would accept the gift, look up the mutable file,
read and parse the contents, then offer the user the choice of what to do
with the gift. The response channel could just be writing a timestamp and a
short note into the slot, saying "got it.. thanks". The revocation action
would be to have the writer erase the slot, replacing it with a type byte
that says "this gift was revoked" or something.

The key insight is to use mutable files as a primitive, and to use
higher-level protocols to generate and interpret their contents.
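One way the gift message, the "got it" response, and the revocation marker might be laid out inside such a slot is sketched below. The type bytes, JSON body, and function names are invented for this example. The sender's public key and the signature over the body are omitted, and reading or writing the actual mutable file is left to the caller.

```python
import json
import time
from typing import List, Optional, Tuple

# Hypothetical type bytes; the real values would be part of the protocol design.
TYPE_GIFT = b"G"
TYPE_REVOKED = b"R"


def make_gift(sender_nickname: str, caps: List[Tuple[str, str]],
              note: str = "") -> bytes:
    """Giver side: build the slot contents for a new gift."""
    body = {
        "sender": sender_nickname,
        "caps": [{"name": name, "cap": cap} for name, cap in caps],
        "note": note,
        "ack": None,
    }
    return TYPE_GIFT + json.dumps(body, sort_keys=True).encode("utf-8")


def read_gift(slot: bytes) -> Optional[dict]:
    """Parse slot contents; returns None if the gift was revoked."""
    kind, payload = slot[:1], slot[1:]
    if kind == TYPE_REVOKED:
        return None
    if kind != TYPE_GIFT:
        raise ValueError("unknown slot type")
    return json.loads(payload.decode("utf-8"))


def acknowledge(slot: bytes, note: str, now: Optional[float] = None) -> bytes:
    """Recipient side: write a timestamp and a short note back into the slot."""
    body = read_gift(slot)
    if body is None:
        raise ValueError("gift was revoked")
    body["ack"] = {"timestamp": time.time() if now is None else now,
                   "note": note}
    return TYPE_GIFT + json.dumps(body, sort_keys=True).encode("utf-8")


def revoke() -> bytes:
    """Giver side: replace the slot contents with a bare 'revoked' type byte."""
    return TYPE_REVOKED
```

Because both parties hold the write-cap in this design, either side can overwrite the slot. The leading type byte lets higher-level code decide how to interpret whatever it finds there.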

tahoe-lafs changed title from ~~build "sharing slots"~~ to build "sharing slots" / use mutable files as primitives for sharing messages

2008-03-28 19:44:21 +00:00

tahoe-lafs modified the milestone from eventually to undecided

2008-06-01 20:58:26 +00:00

davidsarah commented

2009-12-18 00:09:34 +00:00

Author

Owner

This doesn't have to be restricted to mutable files; the ability to generate a file cap before the file has been fully uploaded has also been discussed for immutable files in the new cap protocol. That is possible if we use public key crypto for immutable files (the integrity and confidentiality of the file would still only depend on symmetric crypto). See http://allmydata.org/pipermail/tahoe-dev/2009-October/002962.html

tahoe-lafs added major and removed minor labels 2009-12-18 00:09:34 +00:00