add an option for "tahoe manifest" to not skip duplicates, or a --recursive option to "tahoe ls" #662

Open
opened 2009-03-13 07:49:25 +00:00 by warner · 2 comments
warner commented 2009-03-13 07:49:25 +00:00
Owner

My current job involves tools which modify a directory tree [...], and I'd like to use "tahoe manifest" to compare the before- and after- trees to make sure they're the same. Unfortunately, "tahoe manifest"'s cycle-avoidance code (which simply ignores files or directories that it's seen before) is causing me trouble, since an object that's referenced by multiple places in the tree will appear in the manifest output at only one of them, and that location will depend upon the traversal order. (I just pushed a patch to make deep_traverse at least sort the child names before walking them, so it should now be consistent).

I'm thinking that it might be nice to have a flag to "tahoe manifest" that tells it not to suppress duplicates like this. The cycle-avoidance code would need to change: instead of keeping a set of nodes that have already been visited, it should just keep a list of the ancestors of the current node. A cycle should be declared if the child node we're considering entering appears on its own ancestor list.
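To make the ancestor-list idea concrete, here is a minimal sketch. It is not the real deep_traverse code: the directory tree is modelled as a plain dict, and the names `walk_keeping_duplicates` and `example_graph` are hypothetical, made up for illustration.

```python
def walk_keeping_duplicates(graph, root):
    """Return (path, node) for every path that reaches a node.

    `graph` is a toy stand-in for a dirnode tree (an assumption, not
    Tahoe's API): it maps a directory id to {child_name: child_id}.
    Ids with no entry are treated as files.
    """
    entries = []

    def visit(node, path, ancestors):
        entries.append(("/".join(path), node))
        # The lineage is the chain from the root down to this node.
        lineage = ancestors + (node,)
        # Sort child names so the traversal order is deterministic.
        for name in sorted(graph.get(node, {})):
            child = graph[node][name]
            if child in lineage:
                # The child is one of its own ancestors: a cycle. Skip it.
                continue
            visit(child, path + (name,), lineage)

    visit(root, (), ())
    return entries


# "file1" is shared by two directories, and "dirB/loop" points back to root.
example_graph = {
    "root": {"a": "dirA", "b": "dirB"},
    "dirA": {"shared": "file1"},
    "dirB": {"shared": "file1", "loop": "root"},
}
entries = walk_keeping_duplicates(example_graph, "root")
```

With this approach the shared file shows up under both `a/shared` and `b/shared`, while the `loop` edge back to the root is still cut off. One cost worth noting: a heavily shared subtree gets walked once per path that reaches it.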

It might also be useful to have two sets of stats: one that includes shared objects, and one that does not.
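As a rough illustration of the two sets of stats, assuming the manifest is available as a list of (path, cap) pairs. The function name and the stat keys below are hypothetical and are not the names used by the existing deep-stats code.

```python
def count_stats(manifest_entries):
    """Count objects two ways from a list of (path, cap) pairs."""
    caps = [cap for _path, cap in manifest_entries]
    return {
        # Every path counts, so a shared object is counted once per path.
        "count-including-shared": len(caps),
        # Each distinct cap counts once, however many paths reach it.
        "count-distinct": len(set(caps)),
    }


sample_entries = [("a/shared", "cap1"), ("b/shared", "cap1"), ("c", "cap2")]
stats = count_stats(sample_entries)
```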

tahoe-lafs added the
code-dirnodes
major
enhancement
1.3.0
labels 2009-03-13 07:49:25 +00:00
tahoe-lafs added this to the undecided milestone 2009-03-13 07:49:25 +00:00
warner commented 2009-03-13 18:35:38 +00:00
Author
Owner

Oh, I should mention that partly this is the result of changing goals/definitions of "tahoe manifest". Originally, it was intended purely as a set of verifycaps: the idea being that you'd compute your manifest and then hand it to a separate Verifier service, which would take responsibility for checking up on all of them. It was also the intention that a verifycap be usable as a repaircap, so the Verifier service could be a Verifier/Repairer service. In these cases, we don't care about duplicates: we just want the minimum-size set of verifycaps, and it doesn't matter what path or paths were used to store each one.

Later, "tahoe manifest" acquired path information, because that made it easier to backtrack and find a parent directory for any object which was later found to have problems. Around the same time, the definition of "manifest" started changing, and now we sort of think about it as a list of (path,cap) tuples.

So maybe we need to be more clear about our definitions, and perhaps create a separate API for each one.

Incidentally, the cycle handling code on the "list of (path,cap) tuples" API could respond to cycles by emitting a special marker: (type="cycle", cap), and maybe include otherpath= too. The program which is receiving the manifest could conceivably use this information to stitch together the cycle somehow.
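A sketch of what the cycle marker could look like, again over a toy dict-based tree rather than real dirnodes. The record layout (dicts with "type", "path", "cap", "otherpath" keys) and the function name are assumptions for illustration; only type="cycle", cap, and otherpath= come from the suggestion above.

```python
def manifest_with_cycle_markers(graph, root):
    """Return manifest records, emitting a marker wherever a cycle is cut.

    `graph` maps a directory cap to {child_name: child_cap}; this is a
    stand-in for the real dirnode API, not a part of it.
    """
    records = []

    def visit(cap, path, lineage):
        here = "/".join(path)
        records.append({"type": "node", "path": here, "cap": cap})
        # Map each ancestor cap to the path where it was entered.
        lineage = dict(lineage)
        lineage[cap] = here
        for name in sorted(graph.get(cap, {})):
            child = graph[cap][name]
            child_path = "/".join(path + (name,))
            if child in lineage:
                # Do not descend; record where the cycle points instead.
                records.append({
                    "type": "cycle",
                    "path": child_path,
                    "cap": child,
                    "otherpath": lineage[child],
                })
                continue
            visit(child, path + (name,), lineage)

    visit(root, (), {})
    return records


cyclic_graph = {
    "root": {"sub": "dirA"},
    "dirA": {"back": "root", "f": "file1"},
}
records = manifest_with_cycle_markers(cyclic_graph, "root")
```

Here `sub/back` produces a cycle record whose otherpath is the empty string (the root), which is the piece a receiving program would need to stitch the cycle back together.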

kmarkley86 commented 2013-09-02 17:33:32 +00:00
Author
Owner

I tried using manifest as a sort of recursive ls, and immediately ran into this issue that it wasn't showing duplicates. Unless there's recursive ls behavior available somewhere else, it would be great to fix this.

tahoe-lafs changed title from change "tahoe manifest" to not skip duplicates to add an option for "tahoe manifest" to not skip duplicates, or a --recursive option to "tahoe ls" 2013-09-04 20:35:36 +00:00
Reference: tahoe-lafs/trac-2024-07-25#662