Magic-folder uploads/downloads do not preserve filesystem metadata #2882

Closed
opened 2017-06-22 05:37:54 +00:00 by cypher · 6 comments
cypher commented 2017-06-22 05:37:54 +00:00

Unlike the tahoe backup command, magic-folder uploads/downloads do not preserve filesystem metadata (such as mtime, file attributes, etc.); when Bob's shared magic-folder downloads a file uploaded by Alice, that file's mtime will be set to the time at which the download completed rather than the (arguably more desirable) times at which that file was originally modified/created on Alice's computer.

Preserving this metadata would be desirable not only for traditional archival purposes (since, e.g., the date on which a user took certain pictures or wrote certain documents is often important) but also for helping users resolve conflicts produced by magic-folder itself in a collaborative scenario (since, e.g., with the current implementation, Bob has no straightforward way of knowing whether the conflicting file produced on his computer by Alice was the one she edited last week or this morning).
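
As a rough sketch of what preserving mtime could look like on the download side: after writing the downloaded bytes (which bumps mtime to "now"), reset the mtime with os.utime. This assumes the uploader's mtime is available as remote_mtime_ns, a hypothetical value that magic-folder does not currently carry; the function name is also hypothetical.

```python
import os


def write_downloaded_file(path: str, data: bytes, remote_mtime_ns: int) -> None:
    """Write downloaded contents, then restore the uploader's mtime.

    remote_mtime_ns (nanoseconds since the epoch) is a hypothetical value
    assumed to travel with the file; it is not part of magic-folder today.
    """
    with open(path, "wb") as f:
        f.write(data)
    # Writing the data set mtime to "download completed"; overwrite it,
    # keeping the current atime and using the uploader's mtime.
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, remote_mtime_ns))
```

Setting the timestamp after the write matters: any later write to the file would reset mtime again.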

(Also, maybe this should be an enhancement instead of defect?)

tahoe-lafs added the unknown, normal, defect, and 1.12.1 labels 2017-06-22 05:37:54 +00:00
tahoe-lafs added this to the undecided milestone 2017-06-22 05:37:54 +00:00
tahoe-lafs added the major label and removed the normal label 2017-09-19 19:40:19 +00:00
meejah commented 2018-01-05 00:27:08 +00:00

I'm not convinced (trying to) preserve UID or GID is a good idea; these IDs will often vary between computers (even if the very same human set them up). Even if the same human did set them up, they'd probably want the symbolic names preserved/set (i.e. "alice" or "a_group" rather than the 1000 or whatever).

I think trying to preserve mtime should work, but we should double-check that magic-folder doesn't depend on these timestamps when searching for updates. Can we actually mess with ctime directly?
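
On the ctime question: as far as I know, POSIX offers no call that sets ctime directly. os.utime accepts only atime and mtime, and the system updates ctime itself whenever inode metadata changes (including the utime call). A small sketch to check this on a given machine (the helper name is hypothetical; on Windows, st_ctime reports creation time instead):

```python
import os
from typing import Tuple


def set_mtime_and_report(path: str, mtime_ns: int) -> Tuple[int, int]:
    """Set mtime via os.utime and return (st_mtime_ns, st_ctime_ns) afterwards."""
    st = os.stat(path)
    # Only atime and mtime can be passed; there is no ctime argument.
    os.utime(path, ns=(st.st_atime_ns, mtime_ns))
    after = os.stat(path)
    return after.st_mtime_ns, after.st_ctime_ns
```

Setting mtime a year into the past leaves ctime at roughly "now", so ctime would always reflect when magic-folder touched the file.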

A different reason people might look at the timestamps could be to answer the question "when did magic-folder update this file" rather than "when did Alice update this file"; do we have any real-user feedback indicating which one is less surprising?

Consider an example: Alice adds a new file to her magic-folder that hasn't been modified for a year (but she happens to "cp" it into her magic directory). Thus, from Linux's perspective, the mtime will be "now" (while Alice might expect it to be "a year ago", because she forgot to do "cp -a"). Which one does Bob want to see as "mtime"? (One of three choices: when his Tahoe client downloaded it; Alice's "now"; or "a year ago".)

We can't actually choose "a year ago" in the above example, because Alice accidentally destroyed that metadata. So, presuming she didn't (because she used cp -a), we could choose that as "the" answer.
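
The "cp" versus "cp -a" distinction can be reproduced with Python's shutil, which is a quick way to see which mtime an uploader would observe. shutil.copy copies contents and permission bits, so the copy gets a fresh mtime; shutil.copy2 also copies timestamps via shutil.copystat. This is only an approximation of cp -a (copy2 does not preserve owner or group), and the helper name is hypothetical.

```python
import os
import shutil


def copy_into_folder(src: str, dst: str, preserve_times: bool) -> float:
    """Copy src to dst and return the mtime the destination ends up with."""
    if preserve_times:
        # Like "cp -a" for timestamps: the year-old mtime survives.
        shutil.copy2(src, dst)
    else:
        # Like plain "cp": dst gets the current time as its mtime.
        shutil.copy(src, dst)
    return os.stat(dst).st_mtime
```

With preserve_times=False the original "a year ago" is gone before magic-folder ever sees the file, which is why it cannot be recovered downstream.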

cypher commented 2018-01-05 19:08:30 +00:00

> I'm not convinced (trying to) preserve UID or GID is a good idea; these IDs will often vary between computers (even if the very same human set them up). Even if the same human did set them up, they'd probably want the symbolic names preserved/set (i.e. "alice" or "a_group" rather than the 1000 or whatever)

Yes, you're completely right. My bad; I'm not sure why I even included "uid/gid" here originally. I'll remove that from the ticket.

> A different reason people might look at the timestamps could be to answer the question "when did magic-folder update this file" rather than "when did Alice update this file"; do we have any real-user feedback indicating which one is least surprising?

Not from our sessions, no; however, this is a great question that certainly warrants further exploration and user testing.

For what it's worth, other file-sync applications that I've used/tried in the past (Dropbox, Seafile, Syncthing) have indeed preserved/propagated the mtime of the original file (i.e., setting mtime to "when Alice updated the file" instead of "when $APPLICATION updated the file"). When I first tried magic-folder, I was surprised and disappointed to learn that it did not work the same way, and I could not use it because I depended heavily on mtime to provide additional time/date context for academic research notes, in-progress papers, student assignments, etc. Because of this, for most of my use cases, I'd rather use rsync -a than magic-folder to synchronize directories across machines. Obviously my own needs don't necessarily reflect those of others, but I'd be very curious to hear additional perspectives on what the preferred/least-surprising behavior should be.

It is probably worth pointing out, however, that there's a major difference in expectations between the single-user case (i.e., Bob syncing a folder from his desktop to his laptop) and the multi-user case (i.e., Bob syncing a folder from his laptop to Alice's). I'd argue that, in the single-user case, knowing when magic-folder updated my files is largely irrelevant: I already know that I am the only person editing the files, so I would much rather have the metadata on my machines reflect the record of my own actions, not the actions of magic-folder (which I, as a solitary end-user, don't really care about). The "Documents" folder on my laptop should magically look and work just like the "Documents" folder on my desktop (including being sortable in the same way). If, for some reason, I ever really need to know when magic-folder updated a file, I can/should be able to get that information from magic-folder/tahoe itself.

Admittedly, the multi-user scenario is more complicated, but I'm still not convinced that setting the mtime for a given file to when magic-folder updated it is really what I (as a non-technical end-user) would want. When collaborating with others, what I really want to know (in a broader sense) is that somebody who is not me made some change to a shared folder/file (and, ideally, who that person is, if that information can be provided). mtimes alone obviously do not provide this sort of information; an ls -l by itself can't tell me whether it was magic-folder that made a change or some other application on my computer, or whether it was me who made the change or some other person (all arguably important pieces of information in multi-user/shared situations). That said, in the multi-user situation, what I probably really want is an immediate notification telling me that a file was updated (e.g., "Alice updated 'Proposal.doc'") and/or the ability to see a log/listing of such events should I need it later (e.g., via the CLI, WUI, or something like Gridsync). Dropbox and Seafile both work roughly this way (though they both have stronger, "account"-centric notions of identity than magic-folder).
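
The notification/log idea does not need filesystem timestamps at all: the sync client can keep its own record of who changed what and when it applied the change. A minimal sketch of such an event log; SyncEvent, SyncLog, and their fields are hypothetical and not part of magic-folder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class SyncEvent:
    participant: str  # who made the change, e.g. "Alice"
    relpath: str      # path relative to the magic folder
    action: str       # e.g. "added", "updated", "deleted"
    when: datetime    # when the local client applied the change

    def describe(self) -> str:
        return f"{self.participant} {self.action} '{self.relpath}'"


@dataclass
class SyncLog:
    events: List[SyncEvent] = field(default_factory=list)

    def record(self, participant: str, relpath: str, action: str) -> SyncEvent:
        event = SyncEvent(participant, relpath, action, datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def history(self, relpath: str) -> List[SyncEvent]:
        """All recorded events for one file, oldest first."""
        return [e for e in self.events if e.relpath == relpath]
```

For example, SyncLog().record("Alice", "Proposal.doc", "updated").describe() produces the notification text "Alice updated 'Proposal.doc'", while the file's mtime stays free to carry Alice's own modification time.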

All that being said, I guess my overall point here is that if the end-user's goal is to know when $APPLICATION updates a file, they should get that information from $APPLICATION itself; we should not rely on filesystem timestamps to provide a record of this information since doing so is neither reliable nor convenient for this purpose (and, as suggested above, in some cases, altering mtimes in such a way or having them mean something different in comparison to other applications can become confusing or inconvenient).

meejah commented 2018-01-08 21:22:23 +00:00

From the "earth dragons" part of the magic-folder spec, it seems we should be fine to set a modified-time "at least as old as (now - T) seconds" (but we must be careful not to blindly write the remote timestamp, as Alice's computer clock could be set in the future).
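
One way to read that constraint: use the uploader's mtime when it is already at least T seconds old by our clock, and otherwise clamp it to (now - T), which also covers a remote clock that is ahead of ours. A minimal sketch of that clamp; the function name is hypothetical, and the T values in the tests below are arbitrary placeholders, not the value from the spec.

```python
import time
from typing import Optional


def choose_local_mtime(remote_mtime: float, t_seconds: float,
                       now: Optional[float] = None) -> float:
    """Pick the mtime to write for a downloaded file.

    Keep the uploader's mtime when plausible, but never use anything newer
    than (now - T): the remote clock may be skewed into the future.
    """
    if now is None:
        now = time.time()
    return min(remote_mtime, now - t_seconds)
```

Because the result is never later than (now - T), a downloaded file can never look newer than local edits made in the last T seconds, no matter what Alice's clock says.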

However, Tahoe's notion of the "creation time" is actually "when it got uploaded into Tahoe" (which of course makes some sense), so there's not currently any metadata about "what time the client computer thought the modified time was when the file was uploaded"...
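
Filling that gap would mean capturing the client's mtime at upload time and storing it next to Tahoe's own fields. A sketch under the assumption that per-file metadata is a plain dict; the key name "user_mtime" and both helpers are hypothetical, not existing Tahoe metadata fields or APIs.

```python
import os
from typing import Any, Dict, Optional

# Hypothetical key: Tahoe's existing "creation time" records when the file
# was uploaded, not when the client last modified it.
USER_MTIME_KEY = "user_mtime"


def metadata_for_upload(path: str,
                        base: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of base with the client's current mtime recorded."""
    md = dict(base or {})
    md[USER_MTIME_KEY] = os.stat(path).st_mtime
    return md


def recorded_mtime(md: Dict[str, Any]) -> Optional[float]:
    """Return the uploader's mtime, or None for files uploaded without it."""
    value = md.get(USER_MTIME_KEY)
    return float(value) if value is not None else None
```

Returning None for older uploads lets a downloader fall back to the current behavior instead of guessing a timestamp.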

meejah commented 2018-01-09 08:57:20 +00:00
(https://github.com/tahoe-lafs/tahoe-lafs/pull/457)
meejah commented 2018-01-11 23:10:18 +00:00

Needs a review of the PR.

tahoe-lafs added the code-frontend-magic-folder label and removed the unknown label 2019-01-29 15:57:52 +00:00
exarkun commented 2020-06-30 13:42:16 +00:00

magic-folder has been split out into a separate project - https://github.com/leastauthority/magic-folder

tahoe-lafs added the somebody else's problem label 2020-06-30 13:42:16 +00:00
exarkun closed this issue 2020-06-30 13:42:16 +00:00
Reference: tahoe-lafs/trac-2024-07-25#2882