Tue Feb 24 18:30:00 CET 2009  Alberto Berti
  * Half fix for bug #641.

  Instead of raising an exception and exit when a non backuppable file is
  encountered if the '--skip-problematic' or '-s' is specified on the command
  line the file will be skipped and backup will continue. Test not yet added
  because i'm unable to think to a platform-agnostic way to test it. As soon
  as i've understood test_backup, i'll add one for linux platform.

Tue Feb 24 18:30:42 CET 2009  Alberto Berti
  * Added doc for '--skip-problematic' cli backup command.

New patches:

[Half fix for bug #641.
Alberto Berti **20090224173000
 Ignore-this: 195d29650b661bca0d68d1daa2da79cb

 Instead of raising an exception and exit when a non backuppable file is
 encountered if the '--skip-problematic' or '-s' is specified on the command
 line the file will be skipped and backup will continue. Test not yet added
 because i'm unable to think to a platform-agnostic way to test it. As soon
 as i've understood test_backup, i'll add one for linux platform.
] {
hunk ./src/allmydata/scripts/cli.py 210
         ("verbose", "v", "Be noisy about what is happening."),
         ("no-backupdb", None, "Do not use the SQLite-based backup-database (always upload all files)."),
         ("ignore-timestamps", None, "Do not use backupdb timestamps to decide if a local file is unchanged."),
+        ("skip-problematic", "s", "Skip non backuppable files like dangling symlinks, devices, etc. instead of stop backup processing"),
         ]

     vcs_patterns = ('CVS', 'RCS', 'SCCS', '.git', '.gitignore', '.cvsignore', '.svn',
hunk ./src/allmydata/scripts/tahoe_backup.py 1
-
 import os.path
 import time
 import urllib
hunk ./src/allmydata/scripts/tahoe_backup.py 132
         self.directories_created = 0
         self.directories_reused = 0
         self.directories_checked = 0
+        self.problematic_skipped = 0

     def run(self):
         options = self.options
hunk ./src/allmydata/scripts/tahoe_backup.py 206
             latest_backup_dircap = str(archives_dir[archive_name][1]["ro_uri"])

         # third step: process the tree
-        new_backup_dircap = self.process(options.from_dir, latest_backup_dircap)
+        new_backup_dircap = self.process(options.from_dir, latest_backup_dircap,
+                                         self.options['skip-problematic'])

         # fourth: attach the new backup to the list
         new_readonly_backup_dircap = readonly(new_backup_dircap)
hunk ./src/allmydata/scripts/tahoe_backup.py 221
         if self.verbosity >= 1:
             print >>stdout, (" %d files uploaded (%d reused), "
-                             "%d directories created (%d reused)"
+                             "%d directories created (%d reused), "
+                             "%d problematic files skipped"
                              % (self.files_uploaded,
                                 self.files_reused,
                                 self.directories_created,
hunk ./src/allmydata/scripts/tahoe_backup.py 226
-                                self.directories_reused))
+                                self.directories_reused,
+                                self.problematic_skipped))
         if self.verbosity >= 2:
             print >>stdout, (" %d files checked, %d directories checked, "
                              "%d directories read"
hunk ./src/allmydata/scripts/tahoe_backup.py 242
         if self.verbosity >= 2:
             print >>self.options.stdout, msg

-    def process(self, localpath, olddircap):
+    def process(self, localpath, olddircap, skip_problematic=False):
         # returns newdircap

         self.verboseprint("processing %s, olddircap %s" % (localpath, olddircap))
hunk ./src/allmydata/scripts/tahoe_backup.py 259
                 if olddircontents is not None and child in olddircontents:
                     oldchildcap = olddircontents[child][1]
                 # recurse on the child directory
-                newchilddircap = self.process(childpath, oldchildcap)
+                newchilddircap = self.process(childpath, oldchildcap, skip_problematic)
                 newdircontents[child] = ("dirnode", newchilddircap, metadata)
             elif os.path.isfile(childpath):
                 newfilecap, metadata = self.upload(childpath)
hunk ./src/allmydata/scripts/tahoe_backup.py 265
                 newdircontents[child] = ("filenode", newfilecap, metadata)
             else:
-                raise BackupProcessingError("Cannot backup this file %r" % childpath)
+                if skip_problematic:
+                    self.problematic_skipped += 1
+                    self.verboseprint("skipping problematic file %s.." % childpath)
+                else:
+                    raise BackupProcessingError("Cannot backup this file %r" % childpath)

         if (olddircap
             and olddircontents is not None
}
[Added doc for '--skip-problematic' cli backup command.
Alberto Berti **20090224173042
 Ignore-this: 76399da70f07042cffa6d07d77400580
] hunk ./docs/frontends/CLI.txt 390
   * .hgignore
   * _darcs

+tahoe backup --skip-problematic ~ work:backups
+
+ With '--skip-problematic' (or '-s') option added, the backup command
+ tree walker will skip files or filetypes that it can't backup yet,
+ instead of surrender and stop the backup process, exiting with an
+ error.
+
 == Virtual Drive Maintenance ==

 tahoe manifest tahoe:

Context: [test_web: add (disabled) test to see what happens when deep-check encounters an unrecoverable directory. We still need code changes to improve this behavior. warner@lothar.com**20090224214017 Ignore-this: e839f1b0ec40f53fedcd809c2a30d5f9 ] [test_repairer: change to use faster no_network.GridTestMixin, split Verifier tests into separate cases, refactor judgement funcs into shared methods warner@lothar.com**20090224041506 Ignore-this: 584ce72d6276da5edc00562793d4ee53 ] [immutable/checker.py: trap ShareVersionIncompatible too. Also, use f.check warner@lothar.com**20090224041405 Ignore-this: b667e8d3192116293babcacdeed42898 instead of examining the value returned by f.trap, because the latter appears to squash exception types down into their base classes (i.e. 
since ShareVersionIncompatible is a subclass of LayoutInvalid, f.trap(Failure(ShareVersionIncompatible)) == LayoutInvalid). All this resulted in 'incompatible' shares being misclassified as 'corrupt'. ] [immutable/layout.py: wrap to 80 cols, no functional changes warner@lothar.com**20090224005837 Ignore-this: 40019480180ec34141506a28d7711608 ] [test_repairer: change Repairer to use much-faster no_network.GridTestMixin. As a side-effect, fix what I think was a bug: some of the assert-minimal-effort-expended checks were mixing write counts and allocate counts warner@lothar.com**20090223234227 Ignore-this: d58bd0a909f9939775730cda4a858cae ] [test/no_network.py: add a basic stats provider warner@lothar.com**20090223233937 Ignore-this: c9f3cc4eed99cfc36f68938ceff4162c ] [tests: stop using setUpClass/tearDownClass, since they've been deprecated in Twisted-8.2.0 warner@lothar.com**20090223204312 Ignore-this: 24c6592141cf64103530c024f93a5b88 ] [test_checker: improve test coverage for checker results warner@lothar.com**20090223201943 Ignore-this: 83e173602f0f4c811a7a9893d85385df ] [Two small fixes on documentation for cli backup command. Alberto Berti **20090224223634 Ignore-this: 5634a6dadad6e4e43a112de7fe5c74c ] [Add elapsed timestamp to cli backup command final summary. Alberto Berti **20090224171425 Ignore-this: 9a042d11f95ee9f6858a5096d513c0bc ] [Added documentation for '--exclude' and friends cli backup command. 
Alberto Berti **20090224153049 Ignore-this: bbc791fa56e38535bb82cc3077ffde90 ] [misc/*: remove RuntimeError too warner@lothar.com**20090222233401 Ignore-this: b76f8a184f75bb28eb9d8002f957936a ] [scripts: stop using RuntimeError, for #639 warner@lothar.com**20090222233106 Ignore-this: 686a424442670fffbd4d1816c284a601 ] [mutable/publish: stop using RuntimeError, for #639 warner@lothar.com**20090222233056 Ignore-this: 2a80a661c7850d97357caddad48c6e9d ] [remove more RuntimeError from unit tests, for #639 warner@lothar.com**20090222232855 Ignore-this: 1a1c3e1457f3f29ba7101fe406ee5f43 ] [stop using RuntimeError in unit tests, for #639 warner@lothar.com**20090222232722 Ignore-this: 475ce0c0dcd7a1f5ed83ef460312efea ] [ftpd/sftpd: stop using RuntimeError, for #639 warner@lothar.com**20090222232426 Ignore-this: 97001362c4ba9e94b2e254e229b79987 ] [docs: CREDITS to Alberto Berti zooko@zooko.com**20090222193314 Ignore-this: 74d370ada3234cce9e58aec15d739f71 ] [Fixed tests again so they will pass on windows. Alberto Berti **20090223003502 Ignore-this: 80d5074e7153642a2fa2a77958bfb50d ] [Added tests for the cse when listdir is an iterator Alberto Berti **20090222224356 Ignore-this: 218fb2aba02c28b4b1e5324bdb5adeaa ] [Fixed tests so that they pass also on buildbots. Alberto Berti **20090222224311 Ignore-this: fcb91cd6acf028382411d23d380a4576 ] [Use failUnlessEqual instead of failUnless(a == b) Alberto Berti **20090222224214 Ignore-this: 8f9144632e3ac9acb4726fb48a083bf4 ] [Better implementation of filtering algorithm. 
Alberto Berti **20090222224049 Ignore-this: 67a8bd2f99bcc87ca2443bef13370a87 ] [Removed '.hgrags' from vcs excludes Alberto Berti **20090222223946 Ignore-this: 3e94c22fc9d85f380ee11fb8bdb4d1e9 ] [docs: move many specification-like documents into specifications/ warner@lothar.com**20090222054054 Ignore-this: a4110cc478198c0611205aba1ccf54f4 ] [test_web.py: increase test coverage of web.status.plural() warner@lothar.com**20090222000116 Ignore-this: 3138c9d5d2410d8e1121e9b2ed694169 ] [crawler: fix performance problems: only save state once per timeslice (not after every bucket), don't start the crawler until 5 minutes after node startup warner@lothar.com**20090221205649 Ignore-this: e6551569982bd31d19779ff15c2d6f58 ] [test_system: oops, don't assume that all files in storage/ are in a deep storage/shares/prefix/si/shnum path, since now the crawler pickle has a short path warner@lothar.com**20090221061710 Ignore-this: fde76d0e5cae853014d1bb18b5f17dae ] [crawler: tolerate low-resolution system clocks (i.e. 
windows) warner@lothar.com**20090221061533 Ignore-this: 57286a3abcaf44f6d1a78c3c1ad547a5 ] [BucketCountingCrawler: store just the count, not cycle+count, since it's too easy to make usage mistakes otherwise warner@lothar.com**20090221035831 Ignore-this: 573b6f651af74380cdd64059fbbdda4b ] [test_storage: startService the server, as is now the standard practice warner@lothar.com**20090221035755 Ignore-this: 3999889bd628fe4039bbcf1b29160453 ] [crawler: load state from the pickle in init, rather than waiting until startService, so get_state() can be called early warner@lothar.com**20090221035720 Ignore-this: ecd128a5f4364c0daf4b72d791340b66 ] [BucketCountingCrawler: rename status and state keys to use 'bucket' instead of 'share', because the former is more accurate warner@lothar.com**20090221034606 Ignore-this: cf819f63fac9506c878d6c9715ce35b7 ] [storage: also report space-free-for-root and space-free-for-nonroot, since that helps users understand the space-left-for-tahoe number better warner@lothar.com**20090221032856 Ignore-this: 9fdf0475f758acd98b73026677170b45 ] [storage: add bucket-counting share crawler, add its output (number of files+directories maintained by a storage server) and status to the webapi /storage page warner@lothar.com**20090221030408 Ignore-this: 28761c5e076648026bc5f518506db65c ] [storage: move si_b2a/si_a2b/storage_index_to_dir out of server.py and into common.py warner@lothar.com**20090221030309 Ignore-this: 645056428ab797f0b542831c82bf192a ] [crawler: add get_progress, clean up get_state warner@lothar.com**20090221002743 Ignore-this: 9bea69f154c75b31a53425a8ea67789b ] [Added --exclude, --exclude-from and --exclude-vcs options to backup command. Alberto Berti **20090222170829 Ignore-this: 4912890229cd54a2f61f14f06bc4afcc It is still impossible to specify absolute exclusion path, only relative. I must check with tar or rsync how they allow them to be specified. 
] [Raise a more explanatory exception for errors encountered during backup processing. Alberto Berti **20090222170252 Ignore-this: f6b8ffe2a903ba07a2c1c59130dac1e4 ] [Added tests for the --exclude* options of backup command. Alberto Berti **20090222165106 Ignore-this: f1b931cf2e7929ce47b737c022bca707 ] [Added tests for the fixed alias related command's synopsis Alberto Berti **20090222163732 Ignore-this: 4432b4e88e990ba53a5b3fe0f12db2ac ] [web/storage: make sure we can handle platforms without os.statvfs too warner@lothar.com**20090220220353 Ignore-this: 79d4cb8482a8543b9759dc949c86c587 ] [crawler: provide for one-shot crawlers, which stop after their first full cycle, for share-upgraders and database-populaters warner@lothar.com**20090220211911 Ignore-this: fcdf72c5ffcafa374d376388be6fa5c5 ] [web: add Storage status page, improve tests warner@lothar.com**20090220202926 Ignore-this: e34d5270dcf0237fe72f573f717c7a4 ] [storage: include reserved_space in stats warner@lothar.com**20090220202920 Ignore-this: b5b480fe0abad0148ecad0c1fb47ecae ] [web/check_results: sort share identifiers in the sharemap display warner@lothar.com**20090220182922 Ignore-this: 5c7bfcee3e15c7082c3653eb8a460960 ] [webapi: pass client through constructor arguments, remove IClient, should make it easier to test web renderers in isolation warner@lothar.com**20090220181554 Ignore-this: e7848cd1bee8faf2ce7aaf040b9bf8e3 ] [test/no_network: do startService on the storage servers, make it easier to customize the storage servers warner@lothar.com**20090220022254 Ignore-this: e62f328721c007e4c5ee023a6efdf66d ] [crawler: modify API to support upcoming bucket-counting crawler warner@lothar.com**20090220013142 Ignore-this: 808f8382837b13082f8b245db2ebee06 ] [test_backupdb: make the not-a-database file larger, since the older sqlite-2.3.2 on OS-X is easily fooled warner@lothar.com**20090220000409 Ignore-this: 694d2ca5053bb96e91670765d0cedf2e ] [web/reliability: add parameter descriptions, adapted from a 
patch from Terrell Russell. warner@lothar.com**20090219222918 Ignore-this: 835f5ab01e1aff31b2ff9febb9a51f3 ] [test_crawler: hush pyflakes warner@lothar.com**20090219202340 Ignore-this: 765d22c9c9682cc86c5205dc130500af ] [test_crawler: disable the percentage-of-cpu-used test, since it is too unreliable on our slow buildslaves. But leave the code in place for developers to run by hand. warner@lothar.com**20090219201654 Ignore-this: ff7cf5cfa79c6f2ef0cf959495dd989a ] [reliability.py: fix the numpy conversion, it was completely broken. Thanks to Terrell Russell for the help. warner@lothar.com**20090219195515 Ignore-this: f2b1eb65855111b338e1487feee1bbcf ] [reliability: switch to NumPy, since Numeric is deprecated warner@lothar.com**20090219074435 Ignore-this: f588a68e9bcd3b0bc3653570882b6fd5 ] [setup.py: fix pyflakes complaints warner@lothar.com**20090219073643 Ignore-this: a314e5456b0a796bc9f70232a119ec68 ] [move show-tool-versions out of setup.py and into a separate script in misc/ , since setuptools is trying to build and install a bunch of stuff first warner@lothar.com**20090219073558 Ignore-this: 9e56bc43026379212e6b6671ed6a1fd4 ] [test_crawler: don't require >=1 cycle on cygwin warner@lothar.com**20090219065818 Ignore-this: b8d2d40f26aeb30a7622479840a04635 ] [setup.py: add show_tool_versions command, for the benefit of a new buildbot step warner@lothar.com**20090219062436 Ignore-this: 21d761c76a033e481831584bedc60c86 ] [setup.py: wrap to 80 cols, no functional changes warner@lothar.com**20090219055751 Ignore-this: d29e57c6ee555f2ee435667b7e13e60b ] [crawler: use fileutil.move_info_place in preference to our own version warner@lothar.com**20090219051342 Ignore-this: ee4e46f3de965610503ba36b28184db9 ] [fileutil: add move_into_place(), to perform the standard unix trick of atomically replacing a file, with a fallback for windows warner@lothar.com**20090219051310 Ignore-this: c1d35e8ca88fcb223ea194513611c511 ] [crawler: fix problems on windows and our slow cygwin 
slave warner@lothar.com**20090219042431 Ignore-this: 8019cb0da79ba00c536183a6f57b4cab ] [#633: first version of a rate-limited interruptable share-crawler warner@lothar.com**20090219034633 Ignore-this: 5d2d30c743e3b096a8e775d5a9b33601 ] [change StorageServer to take nodeid in the constructor, instead of assigning it later, since it's cleaner and because the original problem (Tubs not being ready until later) went away warner@lothar.com**20090218222301 Ignore-this: 740d582f20c93bebf60e21d9a446d3d2 ] [test_system: split off checker tests to test_deepcheck.py, this file is too big warner@lothar.com**20090218214234 Ignore-this: 82bf8db81dfbc98224bbf694054a8761 ] [break storage.py into smaller pieces in storage/*.py . No behavioral changes. warner@lothar.com**20090218204655 Ignore-this: 312d408d1cacc5a764d791b53ebf8f91 ] [immutable/layout: minor change to repr name warner@lothar.com**20090218204648 Ignore-this: c8781ef15b7dea63b39236a1899b86ce ] [docs: add lease-tradeoffs diagram warner@lothar.com**20090218204137 Ignore-this: c22a589ad465dac846da834c30dc4083 ] [Add missing synopsis and descriptions for alias commands. Alberto Berti **20090221003106 Ignore-this: 8aedd03d36d92d912102c7f29e4ca697 ] [interfaces.py: allow add/renew/cancel-lease to return Any, so that 1.3.1 clients (the first to use these calls) can tolerate future storage servers which might return something other than None warner@lothar.com**20090218192903 Ignore-this: dcbb704a05416ecc66d90fb486c3d75b ] [docs/debian.txt: minor edit warner@lothar.com**20090218032212 Ignore-this: 64ff1fb163ffca4bcfd920254f1cf866 ] [add --add-lease to 'tahoe check', 'tahoe deep-check', and webapi. 
warner@lothar.com**20090218013243 Ignore-this: 176b2006cef5041adcb592ee83e084dd ] [change RIStorageServer.remote_add_lease to exit silently in case of no-such-bucket, instead of raising IndexError, because that makes the upcoming --add-lease feature faster and less noisy warner@lothar.com**20090218013053 Ignore-this: 6fdfcea2c832178f1ce72ab0ff510f3a ] [CLI #590: convert 'tahoe deep-check' to streaming form, improve display, add tests warner@lothar.com**20090217231511 Ignore-this: 6d88eb94b1c877eacc8c5ca7d0aac776 ] [interfaces.py: document behavior of add_lease/renew_lease/cancel_lease, before I change it warner@lothar.com**20090217194809 Ignore-this: 703c6712926b8edb19d55d790b65a400 ] [test_backupdb: improve error messages if the test fails warner@lothar.com**20090217170838 Ignore-this: ef657e87c66e4304d3e0aca9831b84c ] [webapi #590: add streaming deep-check. Still need a CLI tool to use it. warner@lothar.com**20090217053553 Ignore-this: a0edd3d2a531c48a64d8397f7e4b208c ] [test_web.Grid: change the CHECK() function to make it easier to test t= values with hyphens in them warner@lothar.com**20090217050034 Ignore-this: 410c08735347c2057df52f6716520228 ] [test_web: improve checker-results coverage with a no-network -based test, enhance no-network harness to assist, fix some bugs in web/check_results.py that were exposed warner@lothar.com**20090217041242 Ignore-this: fe54bb66a9ae073c002a7af51cd1e18 ] [web: fix handling of reliability page when Numeric is not available warner@lothar.com**20090217015658 Ignore-this: 9d329182f1b2e5f812e5e7eb5f4cf2ed ] [test/no_network: update comments with setup timing: no_network takes 50ms, SystemTestMixin takes 2s (on my laptop) warner@lothar.com**20090217000643 Ignore-this: cc778fa3219775b25057bfc9491f8f34 ] [test_upload: rewrite in terms of no-network GridTestMixin, improve no_network.py as necessary warner@lothar.com**20090216234457 Ignore-this: 80a341d5aa3036d24de98e267499d70d ] [test_download: rewrite in terms of no-network 
GridTestMixin, improve no_network.py as necessary warner@lothar.com**20090216233658 Ignore-this: ec2febafd2403830519120fb3f3ca04e ] [test_dirnode.py: convert Deleter to new no-network gridtest warner@lothar.com**20090216232348 Ignore-this: 8041739442ec4db726675e48f9775ae9 ] [test_cli.py: modify to use the new 'no-network' gridtest instead of SystemTestMixin, which speeds it up from 73s to 43s on my system warner@lothar.com**20090216232005 Ignore-this: ec6d010c9182aa72049d1fb894cf890e ] [tests: fix no_network framework to work with upload/download and checker warner@lothar.com**20090216231947 Ignore-this: 74b4dbd66b8384ae7c7544969fe4f744 ] [client.py: improve docstring warner@lothar.com**20090216231532 Ignore-this: bbaa9e3f63fdb0048e3125c4681b2d1f ] [test_cli: add test coverage for help strings warner@lothar.com**20090216210833 Ignore-this: d2020849107f687448e159a19d0e5dab ] [test/no_network: new test harness, like system-test but doesn't use the network so it's faster warner@lothar.com**20090216205844 Ignore-this: 31678f7bdef30b0216fd657fc6145534 ] [interfaces.py: minor docstring edit warner@lothar.com**20090216205816 Ignore-this: cec3855070197f7920b370f95e8b07bd ] [setup: if you sdist_dsc (to produce the input files for dpkg-buildpackage) then run darcsver first zooko@zooko.com**20090216201558 Ignore-this: b85be51b3d4a9a19a3366e690f1063e2 ] [doc: a few edits to docs made after the 1.3.0 release zooko@zooko.com**20090216201539 Ignore-this: dbff3b929d88134d862f1dffd1ef068a ] [test_cli: improve test coverage slightly warner@lothar.com**20090216030451 Ignore-this: e01ccc6a6fb44aaa4fb14fe8669e2065 ] [test_util: get almost full test coverage of dictutil, starting with the original pyutil tests as a base. 
The remaining three uncovered lines involve funny cases of ValueOrderedDict that I can't figure out how to get at warner@lothar.com**20090216023210 Ignore-this: dc1f0c6d8c003c0ade38bc8f8516b04d ] [provisioning/reliability: add tests, hush pyflakes, remove dead code, fix web links warner@lothar.com**20090215222451 Ignore-this: 7854df3e0130d9388f06efd4c797262f ] [util/statistics: add tests, fix mean_repair_cost warner@lothar.com**20090215222326 Ignore-this: c576eabc74c23b170702018fc3c122d9 ] [test_repairer: hush pyflakes warner@lothar.com**20090215222310 Ignore-this: 875eb52e86077cda77efd02da77f8cfa ] [lossmodel.lyx: move draft paper into docs/proposed/, since it's unfinished warner@lothar.com**20090215221905 Ignore-this: 7f7ee204e47fd66932759c94deefe68 ] [build a 'reliability' web page, with a simulation of file decay and repair over time warner@lothar.com**20090213234234 Ignore-this: 9e9623eaac7b0637bbd0071f082bd345 ] [More lossmodel work, on repair. Shawn Willden **20090116025648] [Loss model work (temp1) Shawn Willden **20090115030058] [Statistics module Shawn Willden **20090114021235 Added a statistics module for calculating various facets of share survival statistics. ] [docs: relnotes-short.txt zooko@zooko.com**20090215163510 Ignore-this: 683649bb13499bbe0e5cea2e1716ff59 linkedin.com imposed a strict limit on the number of characters I could post. This forced me to prune and prune and edit and edit until relnotes.txt was a quarter of its former size. Here's the short version. ] [TAG allmydata-tahoe-1.3.0 zooko@zooko.com**20090214000556 Ignore-this: aa6c9a31a14a58ad2298cb7b08d3ea70 ] Patch bundle hash: 4a6659f167f9c01bd644641e5785bda6724beddc
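
Note on testing the first patch: its commit message says no test was added for want of a platform-agnostic approach. As a rough sketch only (not part of the bundle, and written for Python 3 rather than the Python 2 used by tahoe_backup.py), the skip-or-raise branch could be exercised on Linux with a dangling symlink, for which os.path.isdir and os.path.isfile both return False. TreeWalker and make_tree_with_dangling_symlink are hypothetical stand-ins for illustration. Only BackupProcessingError, problematic_skipped and the skip_problematic argument come from the diff, and the upload and dircap handling is left out entirely.

```python
import os
import tempfile


class BackupProcessingError(Exception):
    """Stand-in for the exception raised by process() in the patch."""


class TreeWalker:
    """Hypothetical, much-reduced model of the backup tree walker."""

    def __init__(self):
        self.files_seen = 0
        self.problematic_skipped = 0

    def process(self, localpath, skip_problematic=False):
        for child in sorted(os.listdir(localpath)):
            childpath = os.path.join(localpath, child)
            if os.path.isdir(childpath):
                # recurse on the child directory, passing the flag down
                self.process(childpath, skip_problematic)
            elif os.path.isfile(childpath):
                # the real code uploads the file here; we only count it
                self.files_seen += 1
            elif skip_problematic:
                # neither file nor directory: count it and move on
                self.problematic_skipped += 1
            else:
                raise BackupProcessingError(
                    "Cannot backup this file %r" % childpath)


def make_tree_with_dangling_symlink(root):
    """Create two regular files and one symlink whose target is missing."""
    with open(os.path.join(root, "regular.txt"), "w") as f:
        f.write("data")
    subdir = os.path.join(root, "subdir")
    os.mkdir(subdir)
    with open(os.path.join(subdir, "nested.txt"), "w") as f:
        f.write("more data")
    # os.symlink may be unavailable or need privileges on some platforms
    os.symlink(os.path.join(root, "does-not-exist"),
               os.path.join(root, "broken-link"))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    make_tree_with_dangling_symlink(root)
    walker = TreeWalker()
    walker.process(root, skip_problematic=True)
    print(walker.files_seen, walker.problematic_skipped)
```

Calling process() on the same tree without skip_problematic should raise BackupProcessingError at the dangling link, which matches the behaviour before the patch.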