CHK-URIs: derive storage index from readkey to make the URI shorter #3

Closed
opened 2007-04-24 02:31:05 +00:00 by warner · 3 comments
warner commented 2007-04-24 02:31:05 +00:00
Owner

The URI currently contains separate readkey and StorageIndex fields. We should redefine the read-cap CHK file URI to include just the readkey and derive the StorageIndex from it by hashing.
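A minimal sketch of what "derive the StorageIndex from the readkey by hashing" could look like. The tag string, the choice of SHA-256, and the function name are assumptions made for illustration; this is not the hash construction the project actually adopted.

```python
import hashlib

# Hypothetical domain-separation tag, so that this hash cannot be confused
# with other hashes of the same key. Not taken from the Tahoe source.
TAG = b"example-storage-index-v1:"


def derive_storage_index(readkey: bytes) -> bytes:
    """Derive a storage index from a read key by hashing (illustrative only).

    Anyone holding the readkey can recompute the storage index, so the
    URI no longer needs to carry it as a separate field.
    """
    return hashlib.sha256(TAG + readkey).digest()
```

Because the derivation is deterministic, a reader who holds only the readkey can locate the shares, while a storage server that sees only the storage index cannot invert the hash to recover the key.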

tahoe-lafs added the minor and defect labels 2007-04-24 02:31:05 +00:00
tahoe-lafs added the code label 2007-04-28 19:17:51 +00:00
warner commented 2007-06-29 18:49:20 +00:00
Author
Owner

We've addressed the most immediate problem here by moving many of the pieces off to the URIExtension and including the hash of that data block in the URI itself. This makes the URIs smaller, at the expense of slightly increasing the storage overhead (about 200 bytes per share) and slightly worsening the alacrity (you have to pull 200 bytes from one shareholder before you can verify the first segment).

Once we also switch to deriving the StorageIndex from the readkey, this will shrink the URI down to the following pieces:

  • readkey (base32-encoded 16 or 32 byte value)
  • URIExtension hash (base32-encoded 32 byte value)
  • needed_shares/total_shares (two small integers, normally "25" and "100")
  • filesize (3-7 bytes, really just for quicker UI purposes)

(at the moment, we track the readkey and the storage index separately, so our URIs are another 53 characters longer than this)
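One possible accounting for the 53-character figure, as a rough sketch. It assumes unpadded base32, a 32-byte storage index, and a single separator character in front of the field; none of those three details is stated above, so treat the match as plausible rather than confirmed.

```python
def base32_len(num_bytes: int) -> int:
    """Characters needed to base32-encode num_bytes without '=' padding.

    Each base32 character carries 5 bits, so this is ceil(8 * n / 5).
    """
    return (num_bytes * 8 + 4) // 5


READKEY_BYTES = 16          # the shorter of the two readkey sizes listed above
STORAGE_INDEX_BYTES = 32    # assumption: storage index was a 32-byte value
SEPARATOR_CHARS = 1         # assumption: one delimiter precedes the field

readkey_chars = base32_len(READKEY_BYTES)
storage_index_chars = base32_len(STORAGE_INDEX_BYTES)
chars_saved = storage_index_chars + SEPARATOR_CHARS

print(readkey_chars, storage_index_chars, chars_saved)
```

Under these assumptions the storage index field costs 52 characters plus one separator, which is consistent with the 53 characters quoted above.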

I'm redefining this ticket to be about reducing the size of the URI by deriving the StorageIndex from the readkey. The issue of algorithmically generating things like segment size and encoding parameters from the filesize is less important, in my opinion, now that those values have been pushed out to the URIExtension.

tahoe-lafs added the 0.3.0 label 2007-06-29 18:49:20 +00:00
tahoe-lafs changed title from URIs are too big to URIs could be a bit smaller 2007-06-29 18:49:20 +00:00
tahoe-lafs added 0.4.0 and removed 0.3.0 labels 2007-07-02 19:46:31 +00:00
tahoe-lafs changed title from URIs could be a bit smaller to CHK-URIs: derive storage index from readkey to make the URI shorter 2007-07-02 19:46:31 +00:00
tahoe-lafs added enhancement and removed defect labels 2007-07-02 19:46:38 +00:00
warner commented 2007-07-22 01:24:44 +00:00
Author
Owner

I decided to go ahead and do this now (changeset:81a99044554f72ef), since I changed the URI header anyway (from a bare "URI:" to "URI:CHK:") in the process of refactoring URI processing.

This brings the URI for a 28kB file down from 165 characters to 108.

We still need to talk about some crypto stuff: we certainly want the storage index to be unique, it might be nice to have it be unguessable, and we should think about how the Birthday Attack affects this. Given that there are half as many bits in the readkey as there were in the storage index, we're working with less entropy than we used to, and it might be sensible to put a 32-byte value into the URI, truncate it for use as the readkey, and hash the whole thing to generate the storage index.
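To make the trade-off concrete, here is a rough sketch of the alternative floated above (a 32-byte value in the URI, truncated for the readkey and hashed whole for the storage index), together with the usual birthday-bound approximation. The tag, the use of SHA-256, and both function names are assumptions for illustration, not the project's code.

```python
import hashlib
from typing import Tuple

# Hypothetical domain-separation tag; not taken from the Tahoe source.
TAG = b"example-storage-index-v1:"


def split_secret(secret: bytes) -> Tuple[bytes, bytes]:
    """Return (readkey, storage_index) from a 32-byte URI secret.

    The readkey is the first 16 bytes of the secret; the storage index
    is a hash over all 32 bytes, so it keeps the secret's full entropy.
    """
    if len(secret) != 32:
        raise ValueError("secret must be exactly 32 bytes")
    readkey = secret[:16]
    storage_index = hashlib.sha256(TAG + secret).digest()
    return readkey, storage_index


def birthday_collision_estimate(num_items: int, bits: int) -> float:
    """Approximate chance that any two of num_items random values collide.

    Uses the standard estimate n(n-1)/2 / 2^bits, which is only
    meaningful while the result is much smaller than 1.
    """
    return num_items * (num_items - 1) / 2 / 2**bits


# Compare a 128-bit index with a 256-bit one for 2^40 stored files.
print(birthday_collision_estimate(2**40, 128))
print(birthday_collision_estimate(2**40, 256))
```

The estimate shows why the bit count matters: halving the number of bits does not halve the collision resistance, it takes its square root, since collisions become likely at around 2^(bits/2) items.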

tahoe-lafs added this to the 0.5.0 milestone 2007-07-22 01:24:44 +00:00
warner commented 2007-07-24 18:20:19 +00:00
Author
Owner

We decided to truncate the storage index to the same 128 bits that are present in the AES key that it's derived from, to make it clear that we understand our basic information theory.

Finally closing this one.

tahoe-lafs added the fixed label 2007-07-24 18:20:19 +00:00
warner closed this issue 2007-07-24 18:20:19 +00:00
Reference: tahoe-lafs/trac-2024-07-25#3