make sure the new MDMF extension field is forward-compatible and safe #1526
Labels
No labels
c/code
c/code-dirnodes
c/code-encoding
c/code-frontend
c/code-frontend-cli
c/code-frontend-ftp-sftp
c/code-frontend-magic-folder
c/code-frontend-web
c/code-mutable
c/code-network
c/code-nodeadmin
c/code-peerselection
c/code-storage
c/contrib
c/dev-infrastructure
c/docs
c/operational
c/packaging
c/unknown
c/website
kw:2pc
kw:410
kw:9p
kw:ActivePerl
kw:AttributeError
kw:DataUnavailable
kw:DeadReferenceError
kw:DoS
kw:FileZilla
kw:GetLastError
kw:IFinishableConsumer
kw:K
kw:LeastAuthority
kw:Makefile
kw:RIStorageServer
kw:StringIO
kw:UncoordinatedWriteError
kw:about
kw:access
kw:access-control
kw:accessibility
kw:accounting
kw:accounting-crawler
kw:add-only
kw:aes
kw:aesthetics
kw:alias
kw:aliases
kw:aliens
kw:allmydata
kw:amazon
kw:ambient
kw:annotations
kw:anonymity
kw:anonymous
kw:anti-censorship
kw:api_auth_token
kw:appearance
kw:appname
kw:apport
kw:archive
kw:archlinux
kw:argparse
kw:arm
kw:assertion
kw:attachment
kw:auth
kw:authentication
kw:automation
kw:avahi
kw:availability
kw:aws
kw:azure
kw:backend
kw:backoff
kw:backup
kw:backupdb
kw:backward-compatibility
kw:bandwidth
kw:basedir
kw:bayes
kw:bbfreeze
kw:beta
kw:binaries
kw:binutils
kw:bitcoin
kw:bitrot
kw:blacklist
kw:blocker
kw:blocks-cloud-deployment
kw:blocks-cloud-merge
kw:blocks-magic-folder-merge
kw:blocks-merge
kw:blocks-raic
kw:blocks-release
kw:blog
kw:bom
kw:bonjour
kw:branch
kw:branding
kw:breadcrumbs
kw:brians-opinion-needed
kw:browser
kw:bsd
kw:build
kw:build-helpers
kw:buildbot
kw:builders
kw:buildslave
kw:buildslaves
kw:cache
kw:cap
kw:capleak
kw:captcha
kw:cast
kw:centos
kw:cffi
kw:chacha
kw:charset
kw:check
kw:checker
kw:chroot
kw:ci
kw:clean
kw:cleanup
kw:cli
kw:cloud
kw:cloud-backend
kw:cmdline
kw:code
kw:code-checks
kw:coding-standards
kw:coding-tools
kw:coding_tools
kw:collection
kw:compatibility
kw:completion
kw:compression
kw:confidentiality
kw:config
kw:configuration
kw:configuration.txt
kw:conflict
kw:connection
kw:connectivity
kw:consistency
kw:content
kw:control
kw:control.furl
kw:convergence
kw:coordination
kw:copyright
kw:corruption
kw:cors
kw:cost
kw:coverage
kw:coveralls
kw:coveralls.io
kw:cpu-watcher
kw:cpyext
kw:crash
kw:crawler
kw:crawlers
kw:create-container
kw:cruft
kw:crypto
kw:cryptography
kw:cryptography-lib
kw:cryptopp
kw:csp
kw:curl
kw:cutoff-date
kw:cycle
kw:cygwin
kw:d3
kw:daemon
kw:darcs
kw:darcsver
kw:database
kw:dataloss
kw:db
kw:dead-code
kw:deb
kw:debian
kw:debug
kw:deep-check
kw:defaults
kw:deferred
kw:delete
kw:deletion
kw:denial-of-service
kw:dependency
kw:deployment
kw:deprecation
kw:desert-island
kw:desert-island-build
kw:design
kw:design-review-needed
kw:detection
kw:dev-infrastructure
kw:devpay
kw:directory
kw:directory-page
kw:dirnode
kw:dirnodes
kw:disconnect
kw:discovery
kw:disk
kw:disk-backend
kw:distribute
kw:distutils
kw:dns
kw:do_http
kw:doc-needed
kw:docker
kw:docs
kw:docs-needed
kw:dokan
kw:dos
kw:download
kw:downloader
kw:dragonfly
kw:drop-upload
kw:duplicity
kw:dusty
kw:earth-dragon
kw:easy
kw:ec2
kw:ecdsa
kw:ed25519
kw:egg-needed
kw:eggs
kw:eliot
kw:email
kw:empty
kw:encoding
kw:endpoint
kw:enterprise
kw:enum34
kw:environment
kw:erasure
kw:erasure-coding
kw:error
kw:escaping
kw:etag
kw:etch
kw:evangelism
kw:eventual
kw:example
kw:excess-authority
kw:exec
kw:exocet
kw:expiration
kw:extensibility
kw:extension
kw:failure
kw:fedora
kw:ffp
kw:fhs
kw:figleaf
kw:file
kw:file-descriptor
kw:filename
kw:filesystem
kw:fileutil
kw:fips
kw:firewall
kw:first
kw:floatingpoint
kw:flog
kw:foolscap
kw:forward-compatibility
kw:forward-secrecy
kw:forwarding
kw:free
kw:freebsd
kw:frontend
kw:fsevents
kw:ftp
kw:ftpd
kw:full
kw:furl
kw:fuse
kw:garbage
kw:garbage-collection
kw:gateway
kw:gatherer
kw:gc
kw:gcc
kw:gentoo
kw:get
kw:git
kw:git-annex
kw:github
kw:glacier
kw:globalcaps
kw:glossary
kw:google-cloud-storage
kw:google-drive-backend
kw:gossip
kw:governance
kw:grid
kw:grid-manager
kw:gridid
kw:gridsync
kw:grsec
kw:gsoc
kw:gvfs
kw:hackfest
kw:hacktahoe
kw:hang
kw:hardlink
kw:heartbleed
kw:heisenbug
kw:help
kw:helper
kw:hint
kw:hooks
kw:how
kw:how-to
kw:howto
kw:hp
kw:hp-cloud
kw:html
kw:http
kw:https
kw:i18n
kw:i2p
kw:i2p-collab
kw:illustration
kw:image
kw:immutable
kw:impressions
kw:incentives
kw:incident
kw:init
kw:inlineCallbacks
kw:inotify
kw:install
kw:installer
kw:integration
kw:integration-test
kw:integrity
kw:interactive
kw:interface
kw:interfaces
kw:interoperability
kw:interstellar-exploration
kw:introducer
kw:introduction
kw:iphone
kw:ipkg
kw:iputil
kw:ipv6
kw:irc
kw:jail
kw:javascript
kw:joke
kw:jquery
kw:json
kw:jsui
kw:junk
kw:key-value-store
kw:kfreebsd
kw:known-issue
kw:konqueror
kw:kpreid
kw:kvm
kw:l10n
kw:lae
kw:large
kw:latency
kw:leak
kw:leasedb
kw:leases
kw:libgmp
kw:license
kw:licenss
kw:linecount
kw:link
kw:linux
kw:lit
kw:localhost
kw:location
kw:locking
kw:logging
kw:logo
kw:loopback
kw:lucid
kw:mac
kw:macintosh
kw:magic-folder
kw:manhole
kw:manifest
kw:manual-test-needed
kw:map
kw:mapupdate
kw:max_space
kw:mdmf
kw:memcheck
kw:memory
kw:memory-leak
kw:mesh
kw:metadata
kw:meter
kw:migration
kw:mime
kw:mingw
kw:minimal
kw:misc
kw:miscapture
kw:mlp
kw:mock
kw:more-info-needed
kw:mountain-lion
kw:move
kw:multi-users
kw:multiple
kw:multiuser-gateway
kw:munin
kw:music
kw:mutability
kw:mutable
kw:mystery
kw:names
kw:naming
kw:nas
kw:navigation
kw:needs-review
kw:needs-spawn
kw:netbsd
kw:network
kw:nevow
kw:new-user
kw:newcaps
kw:news
kw:news-done
kw:news-needed
kw:newsletter
kw:newurls
kw:nfc
kw:nginx
kw:nixos
kw:no-clobber
kw:node
kw:node-url
kw:notification
kw:notifyOnDisconnect
kw:nsa310
kw:nsa320
kw:nsa325
kw:numpy
kw:objects
kw:old
kw:openbsd
kw:openitp-packaging
kw:openssl
kw:openstack
kw:opensuse
kw:operation-helpers
kw:operational
kw:operations
kw:ophandle
kw:ophandles
kw:ops
kw:optimization
kw:optional
kw:options
kw:organization
kw:os
kw:os.abort
kw:ostrom
kw:osx
kw:osxfuse
kw:otf-magic-folder-objective1
kw:otf-magic-folder-objective2
kw:otf-magic-folder-objective3
kw:otf-magic-folder-objective4
kw:otf-magic-folder-objective5
kw:otf-magic-folder-objective6
kw:p2p
kw:packaging
kw:partial
kw:password
kw:path
kw:paths
kw:pause
kw:peer-selection
kw:performance
kw:permalink
kw:permissions
kw:persistence
kw:phone
kw:pickle
kw:pip
kw:pipermail
kw:pkg_resources
kw:placement
kw:planning
kw:policy
kw:port
kw:portability
kw:portal
kw:posthook
kw:pratchett
kw:preformance
kw:preservation
kw:privacy
kw:process
kw:profile
kw:profiling
kw:progress
kw:proxy
kw:publish
kw:pyOpenSSL
kw:pyasn1
kw:pycparser
kw:pycrypto
kw:pycrypto-lib
kw:pycryptopp
kw:pyfilesystem
kw:pyflakes
kw:pylint
kw:pypi
kw:pypy
kw:pysqlite
kw:python
kw:python3
kw:pythonpath
kw:pyutil
kw:pywin32
kw:quickstart
kw:quiet
kw:quotas
kw:quoting
kw:raic
kw:rainhill
kw:random
kw:random-access
kw:range
kw:raspberry-pi
kw:reactor
kw:readonly
kw:rebalancing
kw:recovery
kw:recursive
kw:redhat
kw:redirect
kw:redressing
kw:refactor
kw:referer
kw:referrer
kw:regression
kw:rekey
kw:relay
kw:release
kw:release-blocker
kw:reliability
kw:relnotes
kw:remote
kw:removable
kw:removable-disk
kw:rename
kw:renew
kw:repair
kw:replace
kw:report
kw:repository
kw:research
kw:reserved_space
kw:response-needed
kw:response-time
kw:restore
kw:retrieve
kw:retry
kw:review
kw:review-needed
kw:reviewed
kw:revocation
kw:roadmap
kw:rollback
kw:rpm
kw:rsa
kw:rss
kw:rst
kw:rsync
kw:rusty
kw:s3
kw:s3-backend
kw:s3-frontend
kw:s4
kw:same-origin
kw:sandbox
kw:scalability
kw:scaling
kw:scheduling
kw:schema
kw:scheme
kw:scp
kw:scripts
kw:sdist
kw:sdmf
kw:security
kw:self-contained
kw:server
kw:servermap
kw:servers-of-happiness
kw:service
kw:setup
kw:setup.py
kw:setup_requires
kw:setuptools
kw:setuptools_darcs
kw:sftp
kw:shared
kw:shareset
kw:shell
kw:signals
kw:simultaneous
kw:six
kw:size
kw:slackware
kw:slashes
kw:smb
kw:sneakernet
kw:snowleopard
kw:socket
kw:solaris
kw:space
kw:space-efficiency
kw:spam
kw:spec
kw:speed
kw:sqlite
kw:ssh
kw:ssh-keygen
kw:sshfs
kw:ssl
kw:stability
kw:standards
kw:start
kw:startup
kw:static
kw:static-analysis
kw:statistics
kw:stats
kw:stats_gatherer
kw:status
kw:stdeb
kw:storage
kw:streaming
kw:strports
kw:style
kw:stylesheet
kw:subprocess
kw:sumo
kw:survey
kw:svg
kw:symlink
kw:synchronous
kw:tac
kw:tahoe-*
kw:tahoe-add-alias
kw:tahoe-admin
kw:tahoe-archive
kw:tahoe-backup
kw:tahoe-check
kw:tahoe-cp
kw:tahoe-create-alias
kw:tahoe-create-introducer
kw:tahoe-debug
kw:tahoe-deep-check
kw:tahoe-deepcheck
kw:tahoe-lafs-trac-stream
kw:tahoe-list-aliases
kw:tahoe-ls
kw:tahoe-magic-folder
kw:tahoe-manifest
kw:tahoe-mkdir
kw:tahoe-mount
kw:tahoe-mv
kw:tahoe-put
kw:tahoe-restart
kw:tahoe-rm
kw:tahoe-run
kw:tahoe-start
kw:tahoe-stats
kw:tahoe-unlink
kw:tahoe-webopen
kw:tahoe.css
kw:tahoe_files
kw:tahoewapi
kw:tarball
kw:tarballs
kw:tempfile
kw:templates
kw:terminology
kw:test
kw:test-and-set
kw:test-from-egg
kw:test-needed
kw:testgrid
kw:testing
kw:tests
kw:throttling
kw:ticket999-s3-backend
kw:tiddly
kw:time
kw:timeout
kw:timing
kw:to
kw:to-be-closed-on-2011-08-01
kw:tor
kw:tor-protocol
kw:torsocks
kw:tox
kw:trac
kw:transparency
kw:travis
kw:travis-ci
kw:trial
kw:trickle
kw:trivial
kw:truckee
kw:tub
kw:tub.location
kw:twine
kw:twistd
kw:twistd.log
kw:twisted
kw:twisted-14
kw:twisted-trial
kw:twitter
kw:twn
kw:txaws
kw:type
kw:typeerror
kw:ubuntu
kw:ucwe
kw:ueb
kw:ui
kw:unclean
kw:uncoordinated-writes
kw:undeletable
kw:unfinished-business
kw:unhandled-error
kw:unhappy
kw:unicode
kw:unit
kw:unix
kw:unlink
kw:update
kw:upgrade
kw:upload
kw:upload-helper
kw:uri
kw:url
kw:usability
kw:use-case
kw:utf-8
kw:util
kw:uwsgi
kw:ux
kw:validation
kw:variables
kw:vdrive
kw:verify
kw:verlib
kw:version
kw:versioning
kw:versions
kw:video
kw:virtualbox
kw:virtualenv
kw:vista
kw:visualization
kw:visualizer
kw:vm
kw:volunteergrid2
kw:volunteers
kw:vpn
kw:wapi
kw:warners-opinion-needed
kw:warning
kw:weapi
kw:web
kw:web.port
kw:webapi
kw:webdav
kw:webdrive
kw:webport
kw:websec
kw:website
kw:websocket
kw:welcome
kw:welcome-page
kw:welcomepage
kw:wiki
kw:win32
kw:win64
kw:windows
kw:windows-related
kw:winscp
kw:workaround
kw:world-domination
kw:wrapper
kw:write-enabler
kw:wui
kw:x86
kw:x86-64
kw:xhtml
kw:xml
kw:xss
kw:zbase32
kw:zetuptoolz
kw:zfec
kw:zookos-opinion-needed
kw:zope
kw:zope.interface
p/blocker
p/critical
p/major
p/minor
p/normal
p/supercritical
p/trivial
r/cannot reproduce
r/duplicate
r/fixed
r/invalid
r/somebody else's problem
r/was already fixed
r/wontfix
r/worksforme
t/defect
t/enhancement
t/task
v/0.2.0
v/0.3.0
v/0.4.0
v/0.5.0
v/0.5.1
v/0.6.0
v/0.6.1
v/0.7.0
v/0.8.0
v/0.9.0
v/1.0.0
v/1.1.0
v/1.10.0
v/1.10.1
v/1.10.2
v/1.10a2
v/1.11.0
v/1.12.0
v/1.12.1
v/1.13.0
v/1.14.0
v/1.15.0
v/1.15.1
v/1.2.0
v/1.3.0
v/1.4.1
v/1.5.0
v/1.6.0
v/1.6.1
v/1.7.0
v/1.7.1
v/1.7β
v/1.8.0
v/1.8.1
v/1.8.2
v/1.8.3
v/1.8β
v/1.9.0
v/1.9.0-s3branch
v/1.9.0a1
v/1.9.0a2
v/1.9.0b1
v/1.9.1
v/1.9.2
v/1.9.2a1
v/cloud-branch
v/unknown
No milestone
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: tahoe-lafs/trac#1526
Loading…
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
In #393 we've added "extension fields" to the MDMF caps. If I recall correctly, the original motivation was that future writers of MDMFs might want to include the
K
andsegsize
parameters in the cap itself (in addition to inside the share data) and future readers might want to take advantage of that information so that they don't need to do an extra round trip to learn that information by fetching some of the share data.It is important that such future writers of MDMFs don't exclude older (v1.9) readers of MDMFs from being able to read those caps or even (gah!) cause them to get the wrong data or to incur an error when they try to read those caps. Therefore we need a way to include data in the caps which older (v1.9) readers will reliably and safely ignore (and still be able to read the file data correctly) but future readers can use if they want.
I thought we had decided to make a generic field for "extensions" in the MDMF caps, and not to make the current (1.9) reader or writer actually use this extension field yet. But the current code in trunk constrains that field instead of allowing it to be generically extensible, and it seems to try to use the numbers contained therein for its
K
andsegsize
values in some (?) cases.The MDMF filecap format is currently defined as:
and all existing caps are created with two extension fields: the first is K (as an integer), the second is segsize (also as an integer). The intention is to allow additional extension fields to be added in the future, perhaps hints that could speed up a download.
But the current code first constrains each extension field to [contain only the characters 0-9 and :]source:trunk/src/allmydata/uri.py?annotate=blame&rev=5220#L31, and then it requires there to be [exactly two such fields]source:trunk/src/allmydata/mutable/filenode.py?annotate=blame&rev=5227#L120.
Future versions of MDMF writers can't use the extensions then, to communicate anything other than the currently defined K and segsize fields. They might allow some future use if each field can have a larger set of characters, and if there is space for messages that Tahoe-LAFS v1.9 readers will parse and then ignore.
I'd like to have this in 1.9 because then 1.9 will be tolerant of caps generated by future versions that have a different number of extension fields.
As for the use of that field to initialize the
K
andsegsize
values, I haven't read through the code carefully enough to see if it does that correctly and if it has good tests. This is potentially complicated.What, for example, happens if the
segsize
indicated in the cap and thesegsize
indicated [in the version info]source:trunk/src/allmydata/mutable/filenode.py?annotate=blame&rev=5227#L1107 differ? Can thesegsize
or theK
change in different versions of the same MDMF? (I'm pretty sure it can't, but if it can't then maybe the value in the cap should be the only place thatK
orsegsize
exist.) Does the current trunk MDMF reader actually really use this value? Scanning through the code, I don't think so but I'm not 100% sure yet.There are three possibilities that I think we should consider for v1.9.0:
Proposal 1 (extension field for future use—currently unused)
K:segsize
when generating a URL. (The wayK
andsegsize
are encoded into the extension field has, of course, to fit into the constraints of the extension field. In addition to that, it should not consume the entire extension field, but should allow a safe and easy way for other fields to be added into the extension field such that they can be unambiguously parsed apart from theK
andsegsize
fields.)Proposal 2 (
K
andsegsize
in cap):K
andsegsize
. This is in fact exactly the same as the "extension field" in the current trunk, but we stop calling it the "extension field" and start calling itK
andsegsize
.K
andsegsize
values from anywhere other than the cap when reading.Proposal 3 (
K
andsegsize
in cap, plus an extension field):K
andsegsize
into the cap, and then also do proposal 1.a. to provide an extension field for future use.[Zooko was talking below about slightly different versions of the proposals -- he wasn't sure whether the existing code included K and segsize, which in fact it does.]Note:
I think that Proposal 1 is the least likely to delay or destabilize Tahoe-LAFS v1.9
especially if we leave out the optional 1.c. step. If v1.9 does not attempt to use the extension field in any way other than telling where it begins and ends, then future MDMF users will not have to worry that what they put in there will cause problems for old v1.9 users. By removing all the code that does anything with the extension field (aside from the regex which allows the extension field to be present in an MDMF cap), we can simplify the current 1.9 alpha code for easier review.I'd like to hear your opinion about this! (Especially if you are Kevan, David-Sarah, or Brian.)
Current trunk does put K and segsize into the extension field when generating caps (source:src/allmydata/mutable/filenode.py@5227#L702).
also see #1509 ("MDMF filecaps should tolerate extra parameters"). This ticket contains more information than #1509, though, so if we close one as a dup, I'd close #1509.
I like the conservatism of precisely specified and restrictive extension fields; it makes me more comfortable with the idea of extension fields, since it's easier for me to reason about the form of a valid cap and feel confident that we haven't inadvertently allowed caps that we don't want to allow. It is kind of silly to add a restriction like that before we even start writing the code that might one day use the extensions, though, and my objection isn't anything more than a vague unease with the idea.
I think that the code at the moment simply populates the k and segsize values -- it doesn't try to use them. Actually, aside from the restrictive nature of the extension fields at the moment, proposal 1 is what the current code does.
As I understand it, the extension fields are meant to advisory. That is, any part of Tahoe-LAFS that uses them to work with a file should tolerate cases in which they're wrong. So we'd treat the remote share's representation of the segsize, k, or whatever else we choose to stick in the cap as canonical, and defer to it in the event of an inconsistency.
I'm a bit confused as to what optimizations are available from including K and segsize in the cap. Doesn't the downloader get K and segsize on the first round-trip anyway? Why would it be helpful to know them before that?
For any future extensions, I would prefer each field to be labelled. A one-letter label as the first character of the field would probably be sufficient. The advantage of this is that if we decide that a particular field is no longer necessary, we can just omit it without making the parse ambiguous.
[edited to merge in comments from the duplicate #1509, and take into account comment:386739.]Description
on IRC just now, I concluded that a "k" hint is useful to the mapupdate step (so it can make a reasonable number of parallel DYHB/checkstring requests), but that the "segsize" hint isn't so useful.
Ideally, the hints would let us build a one-round-trip downloader. The share contains a bunch of static data (whose size depends upon "N" and the keysize, but not the current filesize), followed by the share data, finally followed by the block hash tree (which depends upon the current filesize). This order lets us grow the file without needing to move all the sharedata. So 1 RTT is really hard: you have to correctly guess where the block hash tree data is, which means knowing the filesize to within a segment, in addition to knowing the segsize.
So 2 RTT is a more reasonable goal: the first request tells you the current filesize and the offset table, which lets you make an accurate second request. For the current Retrieve code, the first request is done during the mapupdate phase, which gets the checkstring and offset table for each known share. Then the retrieve phase can make an accurate initial request, and return the first segment of data in just 2 RTT.
So anyways, I think "k" is a useful hint, but I'm no longer so sure about including "segsize". I think "k" is so useful that I'm willing to make it mandatory, or at least untagged, so the MDMF filecap would be
URI:MDMF:$writekey:$verfkeyfingerprint[:$k][:$EXTNAME=$VALUE]*
.The downside of making "k" so non-optional is that a repairer/reencoder which changes the file's encoding (replacing every single share) would then cause the filecap's "k" value to be stale. The retrieve code only treats it as a hint, of course (includinging the mapupdate step but not e.g. decoding), so it couldn't hurt anything but efficiency (the mapupdate's first batch of requests might not be enough to find enough shares, and we'll have to wait for the first batch to return before we learn of the larger real "k" and send out a second batch). But I think that's not a significant issue.
After talking with zooko on IRC, I'm in favor of making 1.9 tolerate and ignore everything past the fingerprint. That puts no constraints on the extension fields (although 1.10 or some later version might impose some, and including "k" will be the first extension I'd recommend).
If we're really not using those hints at all right now, then the code changes to support this should be pretty simple. I think we may be parsing 'k' and 'segsize' and populating the Filenode with them, in which case it's possible for behavior to be affected, and we should change it to not pre-populate those values.
I'll figure out how to implement this: I think it means a test that the URI parser accepts (and equates) filecaps with extra stuff after the fingerprint (so
URI:MDMF:$writekey:$fingerprint:blahblah
), changes to the URI parser's regexp, and changes to the Filenode constructor (to not extract+populate the extension fields). It might also involve changes to the URI constructor, to not provide those fields either.Kevan: I'd like to talk with you about this. Maybe a phone call would work better than trac comments. Maybe IRC.
Attachment no-extensions.diff (43052 bytes) added
tolerate-but-ignore extensions in MDMF filecaps: remove extension-handling code and tests
In changeset:416701e404c74a3e:
In [5393/ticket999-S3-backend]:
In changeset:de00b277cc9adfb0:
In [5418/ticket999-S3-backend]:
I reviewed changeset:0716c496c8a39758 and it looks good to me (except see below). It has a net reduction of source code, which is always good.
One thing I noticed was that with this patch, if you have a cap with an extension field, and you parse it using [uri.py]source:trunk/src/allmydata/uri.py?annotate=blame&rev=5305 and then produce a cap from the resulting Python object, the cap produced will not have the extension on it. Could this be a problem if, in the future, someone sends a cap with an extension, and a user of Tahoe-LAFS v1.9 inputs that cap into their Tahoe-LAFS implementation? Regardless of whether their Tahoe-LAFS v1.9 will show the extension field in copies of the cap that it produces/exports/displays, it will certainly not use the extension field for anything, so the only question is whether it is okay for the Tahoe-LAFS v1.9 implementation to omit the extension, which it is ignoring, from productions/exports/displays of that cap.
Kevan: I would appreciate it if you would review this issue, and this patch, even though (or especially because) you earlier said you preferred the original behavior of parsing the extension field as a two-tuple of integers.
David-Sarah: I would appreciate it if you would review this because you did such a great job of reasoning about forward-compatibility in previous iterations.
I'm inclined to think that we should only be parsing caps with
uri.py
if we're about to consume them somehow, and never re-stringify them unless we've just created one (i.e. uploaded a new file). Then we won't be obligated to haveuri.py
retain information that it doesn't actually use.Things like directory-listing should just dump the filecap strings.
In fact, I kind of regret having URI objects in the first place.. they're really just a was to encapsulate the parsing/serialization functions. Having a separate class has caused a lot of confusion in things like
filenode.py
'sget_uri()
method (does it return a string? a URI object?). So I'd like to reduce their use, and eventually get rid of them completely, somehow.Re: comment:386755 that sounds good! It means that code which is going to consume caps and later produce them again has to keep a verbatim copy of the (string) cap that they started with. But, this is a question of how storage and reproduction of caps should be implemented -- it is orthogonal to the question of whether that storage and reproduction should include or omit the extension field.