"tahoe backup" thinks "ctime" means "creation time" #897
Reference: tahoe-lafs/trac#897
backupdb seems to think "ctime" means "creation time", which it does, but only on Windows.

This means that "tahoe backup" is unnecessarily re-uploading files in the case that the ownership or permission bits have changed but the file contents haven't, and that "tahoe backup" is incorrectly mapping between "unix change time" and "file creation time" when used on Windows. So this ticket is for two bugs, but they are closely related and should probably be fixed at once.

I noticed in source:docs/backupdb.txt@4111#L84 that the backupdb docs mention "creation time". POSIX doesn't provide a "creation time" but it does provide a "change time", abbreviated "ctime", which most people mistakenly think is a "creation time". Windows does provide a "creation time", and unfortunately Python provides unix "change time" and Windows "creation time" in the same slot -- the `st_ctime` slot of the `stat` module. Here is my bug report saying that the Python stdlib is wrong to do this, and that any Python code which uses the Python stdlib is wrong unless it immediately disambiguates.

In particular, it is a bug for any Tahoe-LAFS code to read the `st_ctime` member without immediately switching on whether the current platform is Windows or not. If you read the `st_ctime` member and do not use the current platform to disambiguate, then you have a value whose semantics are uninterpretable without guessing what platform that value was generated on.

In particular, for "tahoe backup" purposes, it is probably a mistake to say that a new `ctime` means that the file needs to be uploaded again. Unix and Windows both guarantee that the `mtime` will be changed if the file contents have changed, and therefore if `mtime` is unchanged then the file contents are unchanged, even if the `ctime` has changed. On the other hand the `ctime` changes on Unix even when the file contents have not changed, such as if ownership or permission bits have changed. So if only the `ctime` has changed then "tahoe backup" might want to set the new `ctime` value on the link leading to that file, but it should not reupload the file contents.

In addition, I think "tahoe backup" should disambiguate between "unix change time" and "creation time" in the metadata that it stores. Why not change the name of the metadata stored in the tahoe-lafs filesystem edge from the ambiguous and widely misunderstood "ctime" to something like "unix change time"? Then if you are on non-Windows you can set that from the local filesystem's `ctime` on upload and set the local filesystem's `ctime` from that on download. On the other hand, if you are on Windows then it is a bug to set the "unix change time" from the local filesystem's `ctime`, although it would be correct to set a different metadata entry named `file creation time` from the local filesystem's `ctime`.

See also #628, which is about the same issue in "tahoe cp", includes a taxonomy of filesystem "ctime" semantics, and includes a satisfactory backward-compatible solution that was shipped in Tahoe-LAFS v1.4.1.
I'm tagging this ticket with "forward-compatibility" because we'll eventually have to clarify these semantics and the longer we ship a tool that uploads ambiguous data the harder it will be to fix.
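A minimal sketch of the policy argued for in this description, assuming the backup tool keeps the previous stat values for each file. The names `Seen`, `seen_from_path` and `classify_change`, and the returned labels, are hypothetical and not part of Tahoe-LAFS. Comparing the size alongside the mtime is an extra check added here. Note that a later comment in this thread describes a case (`mv` over a file with the same size and mtime) in which this policy would miss a content change.

```python
import os
import sys
from collections import namedtuple

# Hypothetical record of what was observed at the previous backup run.
Seen = namedtuple("Seen", ["size", "mtime", "ctime"])


def seen_from_path(path):
    """Build a Seen record from the local filesystem."""
    st = os.stat(path)
    return Seen(st.st_size, st.st_mtime, st.st_ctime)


def classify_change(old, new, platform=sys.platform):
    """Return "upload", "metadata-only" or "unchanged" (hypothetical labels).

    old is the Seen record from the previous run (or None), new is the
    current one. platform is only a parameter so both branches can be tested.
    """
    if old is None:
        return "upload"  # never backed up before
    if (old.size, old.mtime) != (new.size, new.mtime):
        return "upload"  # the OS reports a content modification
    if old.ctime != new.ctime:
        if platform.startswith("win"):
            # On Windows st_ctime is the creation time: a new value
            # suggests the file was replaced, so re-upload.
            return "upload"
        # On Unix only ownership/permissions (or similar) changed.
        return "metadata-only"
    return "unchanged"
```

For example, `classify_change(old, seen_from_path(fn))` would be called once per file, with `old` loaded from wherever the previous values are kept.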
I suggest naming as few things as possible “ctime”.
:-)
Even though Mac OS X is a Unix, it keeps file creation time metadata, at least on its native HFS+ filesystems. I would guess it does not have the `st_ctime` confusion, but I don't know how the creation time actually is accessed. If Tahoe backups have a field for creation time, it would be good to preserve this information (I often find it useful as a user, and would be irritated if my hypothetical Tahoe-based personal backups failed to preserve it).

Hrm.
using ctime/mtime in backupdb
So, first, let's make the docs (source:docs/backupdb.txt#L84) clearer,
by replacing the reference to "creation time, and modification time"
with just "ctime/mtime". The backupdb does not care about the semantics
of these timestamps. All it cares about is having a cheap
sometimes-false-positive proxy for detecting changes to file contents.
In particular, I'm not worried about trying to avoid re-uploading in the
face of user-triggered changes to metadata that doesn't actually change
file contents. If someone does a "chown" or "chmod" or "touch" on a
bunch of files, I think they'll accept the fact that "tahoe backup" will
subsequently do more work on those files than if they had not gone and
run those commands.
So I think that comparing the (size/ctime/mtime) tuple (specifically the
`(stat.ST_SIZE, stat.ST_MTIME, stat.ST_CTIME)` tuple) will serve
this purpose, regardless of what `os.stat(fn)[stat.ST_CTIME]`
actually means. We could change the backupdb to record more
semantically-accurate fields, and fill in some but not others depending
upon which platform we were using, but since we're only comparing this
data against itself, I don't see enough value in adding that complexity.
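The comparison described above can be sketched as follows. This is an illustration only, not the actual backupdb code: the helper names `stat_fingerprint` and `probably_unchanged` are made up, and a plain dict stands in for the database.

```python
import os
import stat


def stat_fingerprint(path):
    """Return the (size, mtime, ctime) tuple for path, uninterpreted."""
    s = os.stat(path)
    return (s[stat.ST_SIZE], s[stat.ST_MTIME], s[stat.ST_CTIME])


def probably_unchanged(db, path):
    """True if path still has the fingerprint recorded in db.

    db is a plain dict mapping path -> fingerprint (a stand-in for the
    real backupdb). A False result may be a false positive for "changed":
    chmod, chown or touch alter the tuple without altering the contents.
    """
    return db.get(path) == stat_fingerprint(path)
```

Because the tuple is only ever compared against an earlier tuple from the same file on the same machine, the code never needs to know whether the third element is a change time or a creation time.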
putting timestamp metadata into backups created by "tahoe backup"
As a separate issue, I guess I'm +0 on changing the metadata that "tahoe
backup" creates to have more accurate names. Thanks to the patch from
#628, "tahoe backup" is actually the only place that even reads local
filesystem metadata (i.e. `find src -name '*.py' |xargs grep os.stat`
is almost all tahoe internal files). "tahoe backup" currently
does the simplistic thing of copying `stat.st_ctime` into
`metadata["ctime"]`, etc.

I'm not sure how to value timestamps (or other metadata) in backups.
When you restore from a backup, do you expect all of the files to have
the same creation/modification timestamps as they did on the original
disk? The same permission bits? The same owner? The same inode numbers?
The same `atime`? (I'd guess a survey would show users expecting
these properties in descending order, from like 70% of users for
timestamps to 1% of users for atime).
But I think most users of a "tahoe cp" tool would expect the
newly-generated local files to have all timestamps set to the present
moment (as /bin/cp does), and for permission bits/owner to be set by the
current umask setting/login.
Other tools that I use for backup purposes (like version-control
systems) don't record this metadata, because it doesn't generally make
sense to restore it (when I do an 'svn update', I really don't want the
timestamps of the newly-modified files to wind up in the past, because
then my builds will get messed up. Likewise, changing the mode bits,
other than sometimes the execute bit, is probably a bad idea).
So this suggests that we'd need a special "tahoe restore" (or maybe an
option on "tahoe cp", like /bin/cp's --preserve) to use this extended
metadata. And then, if we had that, it would make sense for "tahoe
backup" to record more accurate information about platform-specific
timestamps, such that "tahoe cp --preserve tahoe:backups/Latest
./local-restore" could take your Unix-generated backup and copy it onto
your windows box and reset as much metadata as made sense.
Eh, I dunno.
Incidentally, part of the "timestamps are unimportant" philosophy
described above is embedded in "tahoe backup"'s design: if the local
timestamps have changed but file contents have not, we won't upload
anything new, so the backup snapshot will continue to have the same
timestamps from the original upload. This may mean that you shouldn't
put too much trust in the tahoe-side timestamp metadata anyways. We
could change this to upload more frequently, but personally I prefer the
performance wins of sharing directories between snapshots.
Ok, Zooko and I had a long discussion about this in IRC. There's a bit of
tension between three goals:
future developers can figure out where the timestamps came from
"ctime"
Goal 1 is about not trying to be too clever. The original problem here is
that Python tries to be too clever and reports a windows os.stat field (named
`ftCreationTime` in the underlying API) as `st_ctime`, the same way
that POSIX's st_ctime is reported. This decision was probably based on
mistakenly believing that they have the same semantics, and a desire to hide
irrelevant platform details from developers who shouldn't have to care.
However, if they hadn't done that (i.e. if they had instead reported
`st_creationtime` on windows and `st_ctime` on unix), then we'd have
less-convenient but less-ambiguous os.stat results.
Systems which try to hide details from developers can cause frustration,
especially if the developers understand the quirks and foibles of the
underlying system, because then the "helpful" intermediate layers are really
just getting in the way.
To implement goal 1, we would copy all of the `os.stat()` fields into the
metadata as-is, and probably include an extra field (perhaps labeled
`st_platform`) as a hint to cyber-historians who know better than we do
what os.stat returns on various platforms, and how to interpret it.
Goal 2 would be accomplished by never using the word "ctime" in our metadata,
even though it's used in two other places (`os.stat` return value, and
POSIX's stat(2) call). Evidence suggests that the majority of developers
believe the wrong thing about what POSIX's ctime means (and I've certainly
been in this camp). So giving them a word other than "ctime" will either be
more meaningful (e.g. if we called it posix-metadata-change-time) or will
force them to look up our actual definition (e.g. if we called it
tahoe-bagel-kumquat and dared them to search webapi.txt for details).
Goal 3 would be accomplished by using a common, easy-to-understand word like
"changetime" or "creationtime" for all platforms, despite whatever name is
used by the underlying system call. POSIX and windows return "mtime" values
with (as far as I've been told) the same semantics. So it's probably fair to
say that the fact that (A: POSIX stat() returns it in st_mtime, while B:
windows returns it in ftModificationTime or something) is an "irrelevant
platform detail", and that developers' lives are easier if this distinction is
hidden from them.
So, as a compromise between these goals, we settled on the following keys:
posix-change-time)
windows-creation-time)
The synthetic "st_platform" key will contain `sys.platform`, so something
like "linux2" or "darwin" or "win32". The hope is that this is a cheap way
to provide some useful information to future developers and cyber-historians
to interpret the rest of the st_* fields in some meaningful way.
st_dev, st_mode, etc, will be copied directly from the os.stat call. Other
attributes (perhaps platform-specific fields like OS-X's st_creator and
st_type) will be copied here too.
`modification-time` will be copied from st_mtime on all platforms, based
on the conclusion that it represents the same concept on all platforms: the
most recent time that the file's contents have been modified.

`posix-change-time` will be present for files that came from a POSIX
filesystem, and will be copied from st_ctime.

`windows-creation-time` will be present for files that came from a
windows filesystem, and will be copied from st_ctime.
Having longer and more-detailed names for the ctime values will help with
goal 2 (help developers correctly interpret this field). Not calling them
"ctime" will help developers who would otherwise misinterpret
`posix-change-time` as if it were the mythical "posix-creation-time" that
everyone really wants. We cannot provide goal 3 here, because there is no
common semantic between POSIX and windows.
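The key scheme described above might look like this in code. This is a sketch under assumptions: the function name `backup_metadata` is hypothetical, checking `sys.platform.startswith("win")` is one guess at how to disambiguate, and only a few of the `st_*` fields are copied for brevity.

```python
import os
import sys


def backup_metadata(path):
    """Build a metadata dict for path that never uses the bare name "ctime"."""
    st = os.stat(path)
    md = {
        # Hint for future readers about how to interpret the st_* values.
        "st_platform": sys.platform,
        # Raw fields copied directly from os.stat (subset shown).
        "st_dev": st.st_dev,
        "st_ino": st.st_ino,
        "st_mode": st.st_mode,
        "st_size": st.st_size,
        # Same meaning on all platforms: last content modification.
        "modification-time": st.st_mtime,
    }
    # st_ctime means different things, so store it under a different key
    # depending on where the value was produced.
    if sys.platform.startswith("win"):
        md["windows-creation-time"] = st.st_ctime
    else:
        md["posix-change-time"] = st.st_ctime
    return md
```

Exactly one of the two ctime-derived keys is present in any given dict, so a reader of the metadata never has to guess which semantics apply.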
(note for future discussion: some POSIX-ish filesystems do provide
creation-time, in the form of OS-X's st_birthtime, and supposedly something
that ZFS offers. If we can determine that the semantics of these are the
same, it could be argued that windows-creation-time should be renamed
`creation-time`, and only populated on platforms that offer it, which
would be st_birthtime from HFS+/OS-X, st_ctime on windows, and something else
on ZFS)
(and note that, if we cannot determine that the semantics are the same,
then we should probably refrain from trying to coerce them into the same
field, lest we make the same mistake that Python's os.stat did, making life
more difficult for somebody in the future who is trying to figure out whether
a given file's so-called "creation-time" was really the ZFS notion, or the
HFS+ notion, or whatever).
Replying to warner:
I know Tahoe doesn't attempt to provide anonymity of file uploaders, but from a privacy point of view, why should the holder of the directory read cap have this information?
I think they are the same. Also I think `posix-change-time` should be `metadata-change-time`, since a system that provides it isn't necessarily a POSIX system.

The only argument for `st_platform` is to give readers (including dirreadcap holders) a hedge against failures in our current understanding of what these fields mean (well, our+python's understanding). I'm +0 on it now.. I could be talked out of including it.

I suspect that our inclusion of st_ino and st_dev and the other fields will reveal much information about the value of st_platform anyways, since I imagine that windows filesystems use these fields in very different ways. An enthusiastic reader who is trying to un-"fix" our+python's attempt at being helpful might be able to recognize these characteristic st_ino/st_dev values and improve the accuracy of their unwinding without being told explicitly what st_platform is. On the other hand, st_platform may not be enough information to do this hypothetical job well, and that future reader may complain to our ghosts that we should have included even more information.
So I'm ok with not including st_platform.
I suspect they are the same too, but I'm afraid of committing the same mistake that Python did, especially when OS-X is the only documentation that I've seen personally, and nobody that I know has even seen the hypothetical ZFS docs (and ZFS is losing ground now that Apple abandoned it). Again, I'm willing to go this way, but I'm going to be awfully embarrassed if our attempt to fix python's semantics proves (years from now) to be adding yet another layer of brokenness.
I'm also willing to go along with this one, but I'm even more hesitant. First, I think the casual reader who sees "metadata-change-time" will incorrectly assume that it is referring to the Tahoe metadata in which this key is embedded, rather than the original disk filesystem from which the file was copied. Second, do we have examples of non-POSIX systems which provide this "ctime" that behave exactly like POSIX ones? We'd be making a bolder claim by going with "metadata-change-time".. there are surely some pieces of metadata that might be changed without updating this timestamp.. if some other system uses a different set of metadata than our POSIX systems, would we ignore the differences and represent both as "metadata-change-time"? Or add a "non-POSIX-metadata-change-time"?
Oh, incidentally, the wikipedia page on ext3 uses the term "attribute modification" to describe ctime. Maybe "attribute-change-time" would be a suitable replacement for "metadata-change-time"?
It isn't "change time from a POSIX system" it is "change time matching the POSIX semantics". Does that answer your objection David-Sarah?
You have a point about `posix-change-time` -- it's a POSIXism that is fairly unlikely to be duplicated precisely. For creation time, I'll try and find the relevant ZFS and OS-X docs.

I think we are out of time to do this for 1.7.0. Brian is busy with new-downloader (unless he decides to save new-downloader for after the v1.7.0 release and work on something else for v1.7.0). David-Sarah and I are also busy with other 1.7.0 work.

I'm putting this in 1.8.0 instead of in "eventually" because it is a `forward-compatibility` issue and I really like to fix those as early as possible...

Since this is a forward-compatibility issue I'm still interested in getting it fixed. By the way, Linux is growing a creation-time field:
Hm. What's the status of this ticket? It is assigned to Brian. Is Brian intending to do anything with it? Did we achieve consensus on what should be done? Do we all agree with what Brian wrote in comment:375459 plus the various follow-ups? I no longer remember. Maybe someone should write up a new summary of what we intend to do. Brian: if you don't intend to write up a new summary (or otherwise move this ticket forward) then please assign it to me.
Replying to warner:
But why should we re-upload those files? The unix operating system is asserting, by giving us a changed `change time` and an unchanged `modify time`, that the file contents have not changed. If we are relying on the operating system about this sort of thing, for improved efficiency, then why not believe it about this and skip the re-upload?

Obviously if the operating system asserts that the `creation time` has changed, then we should re-upload.

Zooko reminded me of this ticket in IRC today, so I re-read
everything. I think we have the following tasks to finish for this
ticket:
`st_platform` in the `tahoe-backup` metadata
`posix-change-time` in `tahoe-backup` to record the keys described in
comment:375459 (modification-time, windows-creation-time or
and then a separate ticket can be created to build some sort of
restore command (maybe an option for `tahoe cp`, maybe a
separate `tahoe restore`) that reads this metadata and applies
it to the resulting files.
See also #2250 (don't re-use metadata from earlier snapshots, in a "tahoe backup").
Replying to [zooko]comment:18:
It is possible for `ctime` to change but `mtime` not, when the file contents change. In particular, suppose there are two files `foo` and `bar` that have different contents but the same size and `mtime` (because they were last modified at the same time to within the `mtime` resolution). Then, `mv bar foo` will not change `foo`'s `mtime`, but will set its `ctime` to the current time. (Verified on Linux.)
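The scenario above can be reproduced with a short script. This is a sketch assuming a POSIX filesystem: `os.replace` stands in for `mv bar foo`, `os.utime` forces the two files to share an mtime, and the function name `demo_rename_keeps_mtime` is made up for illustration.

```python
import os
import tempfile


def demo_rename_keeps_mtime():
    """Rename bar over foo and return (stat before, stat after, new contents)."""
    d = tempfile.mkdtemp()
    foo = os.path.join(d, "foo")
    bar = os.path.join(d, "bar")
    with open(foo, "w") as f:
        f.write("aaaa")
    with open(bar, "w") as f:
        f.write("bbbb")  # different contents, same size
    # Force identical mtimes, as if both files were written within one
    # tick of the filesystem's mtime resolution.
    os.utime(foo, (1000000000, 1000000000))
    os.utime(bar, (1000000000, 1000000000))
    before = os.stat(foo)
    os.replace(bar, foo)  # the equivalent of "mv bar foo"
    after = os.stat(foo)
    with open(foo) as f:
        contents = f.read()
    return before, after, contents
```

After the call, the path `foo` has new contents while `st_size` and `st_mtime` are the same as before, so a check based only on size and mtime would not notice the change. Comparing `before.st_ctime` with `after.st_ctime` shows whether the platform updated the ctime on rename.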