Cater to rsync as a target Tahoe client. #78

Open
opened 2007-07-05 21:46:52 +00:00 by nejucomo · 4 comments
nejucomo commented 2007-07-05 21:46:52 +00:00
Owner

Imagine a scenario where a sysadmin of a large enterprise network needs to perform routine backups, and does so by rsyncing from many clients to one large RAID storage device.

What if they could replace the single large RAID with a vdrive, run Tahoe storage nodes on each workstation, and have all of the client-side rsync automation work without change?

If this use case is as common, and the Tahoe replacement as useful, as I believe, it would behoove Tahoe to cater to rsync for both publication and retrieval.

One sufficient form of support would be file-system emulation (FUSE, WebDAV, ...), which rsync can already use. However, it may also be worthwhile to implement an rsync-specialized interface to Tahoe if the tradeoff between efficiency gains and development time is right.

tahoe-lafs added the code, major, enhancement, 0.4.0 labels 2007-07-05 21:46:52 +00:00
tahoe-lafs added this to the eventually milestone 2007-07-05 21:46:52 +00:00
warner commented 2007-07-25 03:06:50 +00:00

This means being able to efficiently modify files in place, right? And/or recording rsync's coarse hashes somewhere, so that the update code could decide which blocks need to be modified without actually having to download them all?

To support this, we'd probably need to use something other than CHK.
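The following is a minimal sketch of the second idea. It assumes that per-block hashes of the old file were recorded somewhere alongside it, and compares them with hashes of the new file's blocks to find which blocks would need rewriting without downloading the old data. The block size, the function names, and the use of SHA-256 in place of rsync's MD4-based strong hash are all assumptions for illustration; Tahoe implements none of this. The sketch also compares blocks only at fixed offsets. Real rsync uses a rolling weak checksum so that it can still find matching blocks after data has been inserted or deleted.

```python
import hashlib

# Hypothetical block size, kept tiny so the example is easy to follow;
# real rsync uses much larger blocks.
BLOCK_SIZE = 4


def block_signatures(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block. SHA-256 stands in for rsync's strong hash."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(stored_sigs: list, new_data: bytes,
                   block_size: int = BLOCK_SIZE) -> list:
    """Return indices of blocks in new_data whose hash differs from the stored one.

    Blocks beyond the end of the old file count as changed. A file that
    shrank would also need truncating, which this sketch ignores.
    """
    new_sigs = block_signatures(new_data, block_size)
    return [
        i for i, sig in enumerate(new_sigs)
        if i >= len(stored_sigs) or stored_sigs[i] != sig
    ]


old = b"aaaabbbbccccdddd"
new = b"aaaaXbbbccccdddd"
print(changed_blocks(block_signatures(old), new))  # [1]
```

Because this version compares fixed offsets, inserting a single byte near the start of a file would mark every later block as changed. Handling that case is the job of rsync's rolling search.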

tahoe-lafs added minor and removed major labels 2007-07-25 03:06:50 +00:00
zooko commented 2008-03-20 02:48:08 +00:00

Nowadays we have two kinds of files: Small Decentralized Mutable Files (SDMF) and Immutable Files (which Brian called "CHKs" in the previous message). The former might support rsync adequately for sufficiently small files. There is currently a hard limit of 3.5 MB, which ought to be raised, but a couple of soft limits will remain; see #359 (eliminate hard limit on size of SDMFs).

tahoe-lafs modified the milestone from eventually to undecided 2008-06-01 20:53:01 +00:00
davidsarah commented 2009-11-23 03:34:17 +00:00

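rsync's coarse hashes are documented at <http://klubkev.org/rsync/>. Note that they're not secure hashes (the algorithm uses MD4 and a variant of Adler-32), so they must be treated as confidential: if they were stored in the clear, they could reveal information about the plaintext blocks they summarize.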
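For reference, here is a sketch of the weak rolling checksum, following the formulation in Tridgell and Mackerras's rsync technical report. It keeps two 16-bit sums, one plain and one position-weighted, and both can be updated in constant time as the window slides forward one byte. The real rsync code differs in details such as how it treats byte values, so this is not byte-for-byte compatible with rsync, and the function names are hypothetical. Because the checksum is a simple linear function of the block's bytes, it is easy to find or confirm candidate blocks that produce a given value. That is one reason such hashes would need to stay confidential.

```python
from typing import Iterator, Tuple

M = 1 << 16  # both checksum components are kept modulo 2**16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Return (a, b): a is the sum of the bytes, b is a position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n and the last byte gets weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def combine(a: int, b: int) -> int:
    """Pack the two 16-bit halves into one 32-bit value."""
    return a + (b << 16)


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window by one byte (drop out_byte, append in_byte) in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def rolling_checksums(data: bytes, n: int) -> Iterator[int]:
    """Yield the combined checksum of every n-byte window of data, in order."""
    if len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield combine(a, b)
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        yield combine(a, b)


data = b"the quick brown fox"
rolled = list(rolling_checksums(data, 4))
direct = [combine(*weak_checksum(data[i:i + 4])) for i in range(len(data) - 3)]
print(rolled == direct)  # True
```

The rolling update is what lets rsync test every byte offset of a file against the stored block checksums cheaply. It computes the expensive strong hash only when the weak checksum matches.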

warner commented 2009-11-24 06:39:38 +00:00

Hm, perhaps a more open-ended file format could have room for features like this in the UEB hash. For example, our current immutable UEB (URI Extension Block) is specified to be a dictionary in which, e.g., the `["crypttext_hash"]` key contains the flat SHA256d hash of the ciphertext. If the share format (or the post-decode, pre-decrypt ciphertext format) could also be expanded, we could add a section for `["encrypted_rsync_hashes"]`, covered by a UEB key named `["encrypted_rsync_hash_root"]`. Older clients would ignore it, but more advanced clients could use it for whatever an rsync hash would be useful for.

(BTW, we've of course discussed elsewhere the security implications of an extensible format, and the possible benefits of explicitly *disallowing* extensions like this.)
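Here is a minimal sketch of the extension idea, reusing the key names from the comment above. A newer client adds `encrypted_rsync_hash_root` to the UEB dictionary. An older client that keeps only the keys it knows sees the same dictionary as before, and a newer client can check a downloaded hash section against the root. Several parts are assumptions made for illustration: the helper functions, the "ignore unknown keys" policy, and the flat, untagged SHA256d. A real design would likely use a hash-tree root and Tahoe's tagged hashes, and the real UEB carries many more fields than shown here.

```python
import hashlib
from typing import Optional

# Hypothetical: the only UEB key this "older client" understands.
KNOWN_UEB_KEYS = {"crypttext_hash"}


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, without the tag prefixes Tahoe's real hashes use."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_ueb(ciphertext: bytes, rsync_section: Optional[bytes] = None) -> dict:
    """Build a toy UEB, adding the hypothetical rsync key only if a section exists."""
    ueb = {"crypttext_hash": sha256d(ciphertext)}
    if rsync_section is not None:
        # A flat hash stands in for a hash-tree root over the encrypted rsync hashes.
        ueb["encrypted_rsync_hash_root"] = sha256d(rsync_section)
    return ueb


def older_client_view(ueb: dict) -> dict:
    """An older client keeps the keys it knows and silently ignores the rest."""
    return {k: v for k, v in ueb.items() if k in KNOWN_UEB_KEYS}


def verify_rsync_section(ueb: dict, rsync_section: bytes) -> bool:
    """A newer client checks a downloaded section against the UEB entry."""
    root = ueb.get("encrypted_rsync_hash_root")
    return root is not None and root == sha256d(rsync_section)


ct = b"...ciphertext..."
section = b"...encrypted rsync hashes..."
new_ueb = build_ueb(ct, section)
print(older_client_view(new_ueb) == build_ueb(ct))  # True
print(verify_rsync_section(new_ueb, section))  # True
```

Whether unknown keys should be silently ignored, as here, or rejected outright is the extensibility question raised in the aside.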

Reference: tahoe-lafs/trac-2024-07-25#78