"cannot convert float NaN to integer" in next_power_of_k, during upload via helper #1418

Closed
opened 2011-06-14 11:13:29 +00:00 by rycee · 59 comments
rycee commented 2011-06-14 11:13:29 +00:00
Owner

While performing a backup just a few minutes ago, I got the following exception:

Traceback (most recent call last):
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/support/bin/tahoe", line 9, in <module>
    load_entry_point('allmydata-tahoe==1.8.2', 'console_scripts', 'tahoe')()
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/runner.py", line 113, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/runner.py", line 99, in runner
    rc = cli.dispatch[command](so)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/cli.py", line 540, in backup
    rc = tahoe_backup.backup(options)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/tahoe_backup.py", line 325, in backup
    return bu.run()
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/tahoe_backup.py", line 194, in process
    childcap, metadata = self.upload(childpath)
  File "/home/rycee/bin/allmydata-tahoe-1.8.2/src/allmydata/scripts/tahoe_backup.py", line 305, in upload
    raise HTTPError("Error during file PUT", resp)
allmydata.scripts.common_http.HTTPError: Error during file PUT: 500 Internal Server Error

The remote exception was:

Traceback (most recent call last):
Failure: foolscap.tokens.RemoteException: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/home/rycee/allmydata-tahoe-1.8.2/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/call.py", line 674, in _done
    self.request.complete(res)
  File "/home/rycee/allmydata-tahoe-1.8.2/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/call.py", line 60, in complete
    self.deferred.callback(res)
  File "/home/rycee/allmydata-tahoe-1.8.2/support/lib/python2.6/site-packages/Twisted-10.2.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/home/rycee/allmydata-tahoe-1.8.2/support/lib/python2.6/site-packages/Twisted-10.2.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/home/rycee/allmydata-tahoe-1.8.2/support/lib/python2.6/site-packages/Twisted-10.2.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 924, in locate_all_shareholders
    num_segments, n, k, desired)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 225, in get_shareholders
    None)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 88, in make_write_bucket_proxy
    num_share_hashes, uri_extension_size_max, nodeid)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 108, in __init__
    effective_segments = mathutil.next_power_of_k(num_segments,2)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 35, in next_power_of_k
    x = int(math.log(n, k) + 0.5)
exceptions.ValueError: cannot convert float NaN to integer

I have previously performed successful backups with this precise setup; no configuration has been changed and no servers have been restarted recently. The storage node has ample available space.

I tried running the backup several times and the error occurs each time. However, on a second run the file on which the first run failed is reported as already uploaded, and the second run then fails on the next file. Running a check on the reported cap gives:

$ tahoe check --verify --raw URI:CHK:2pqpbe...
{
 "results": {
  "needs-rebalancing": false, 
  "count-shares-expected": 3, 
  "healthy": false, 
  "count-unrecoverable-versions": 1, 
  "count-shares-needed": 1, 
  "sharemap": {}, 
  "count-recoverable-versions": 0, 
  "servers-responding": [
   "5yea4my3w3frgp524lgthrb7rdd6frtr", 
   "44g5kkgwulzrrrntdzci7jtt5rgt6nuo", 
   "bzyf23mghgxycnr34pdkqdmybnevf4ks"
  ], 
  "count-good-share-hosts": 0, 
  "count-wrong-shares": 0, 
  "count-shares-good": 0, 
  "count-corrupt-shares": 3, 
  "list-corrupt-shares": [
   [
    "bzyf23mghgxycnr34pdkqdmybnevf4ks", 
    "barr7kgra3pst6icbtnmxrggca", 
    0
   ], 
   [
    "44g5kkgwulzrrrntdzci7jtt5rgt6nuo", 
    "barr7kgra3pst6icbtnmxrggca", 
    1
   ], 
   [
    "5yea4my3w3frgp524lgthrb7rdd6frtr", 
    "barr7kgra3pst6icbtnmxrggca", 
    2
   ]
  ], 
  "recoverable": false
 }, 
 "storage-index": "barr7kgra3pst6icbtnmxrggca", 
 "summary": "Not Healthy: 0 shares (enc 1-of-3)"
}

Without using --verify:

$ tahoe check --raw URI:CHK:2pqp...
{
 "results": {
  "needs-rebalancing": false, 
  "count-shares-expected": 3, 
  "healthy": true, 
  "count-unrecoverable-versions": 0, 
  "count-shares-needed": 1, 
  "sharemap": {
   "0": [
    "bzyf23mghgxycnr34pdkqdmybnevf4ks"
   ], 
   "1": [
    "44g5kkgwulzrrrntdzci7jtt5rgt6nuo"
   ], 
   "2": [
    "5yea4my3w3frgp524lgthrb7rdd6frtr"
   ]
  }, 
  "count-recoverable-versions": 1, 
  "servers-responding": [
   "5yea4my3w3frgp524lgthrb7rdd6frtr", 
   "44g5kkgwulzrrrntdzci7jtt5rgt6nuo", 
   "bzyf23mghgxycnr34pdkqdmybnevf4ks"
  ], 
  "count-good-share-hosts": 3, 
  "count-wrong-shares": 0, 
  "count-shares-good": 3, 
  "count-corrupt-shares": 0, 
  "list-corrupt-shares": [], 
  "recoverable": true
 }, 
 "storage-index": "barr7kgra3pst6icbtnmxrggca", 
 "summary": "Healthy"
}

Both client and server nodes are running 1.8.2 on Debian stable. Sorry for the big pastes; if there is any other information you would like, please let me know.

tahoe-lafs added the
c/unknown
p/major
t/defect
v/1.8.2
labels 2011-06-14 11:13:29 +00:00
tahoe-lafs added this to the undecided milestone 2011-06-14 11:13:29 +00:00
rycee commented 2011-06-14 11:16:29 +00:00
Author
Owner

I'm currently running a fsck on the storage disks. Maybe a disk corruption would have this effect?

No, the fsck runs came back clean (I run two storage nodes on the computer).


It seems as though mathutil.next_power_of_k(num_segments, 2) was probably called with num_segments = NaN. (The next power of 2 is also undefined for negative numbers and infinity, but those give different exception messages.)

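For reference, that exact ValueError is what CPython raises whenever int() is handed a NaN; a minimal demonstration (a hypothetical REPL session, not from the affected machine):

>>> int(float('nan'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer

so the expression math.log(n, k) + 0.5 at mathutil.py line 35 must be evaluating to NaN.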

Attachment preconditions-about-ticket1418.diff (3698 bytes) added


Dear rycee: thank you for the good bug report! I scanned through the code shown by the stack trace and didn't see the problem. Along the way I added preconditions which should identify the problem more precisely. Could you please apply attachment:preconditions-about-ticket1418.diff to your 1.8.2 source tree and reproduce the problem? Thank you very much!

I wonder if it could stem from a configuration change you've made in tahoe.cfg, perhaps to the "max segment size" setting. If so, this patch will tell us that this is the problem.

rycee commented 2011-06-22 08:55:30 +00:00
Author
Owner

With zooko's patch I get pretty much the same thing as earlier (do I need to give python some special flags to enable the preconditions?):

[snip]
--- <exception caught here> ---
  File "/home/rycee/allmydata-tahoe-1.8.2.foo/support/lib/python2.6/site-packages/Twisted-10.2.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 926, in locate_all_shareholders
    num_segments, n, k, desired)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 225, in get_shareholders
    None)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 88, in make_write_bucket_proxy
    num_share_hashes, uri_extension_size_max, nodeid)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 108, in __init__
    effective_segments = mathutil.next_power_of_k(num_segments,2)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 35, in next_power_of_k
    x = int(math.log(n, k) + 0.5)
exceptions.ValueError: cannot convert float NaN to integer

Please note that upload.py:926 now appears in the stack, compared to upload.py:924 in the initial stack, so the patch really was applied! I checked this many times to convince myself that I'm not going insane :-)

After a bit of further investigation it seems like the problem only occurs when I'm using a helper. When uploading directly from my client node it works perfectly fine. When I realized this, I also realized that it is possible that, since upgrading the helper node to 1.8.2, I have only used the client without the helper! I did successfully use the helper when it was running 1.8.1 though. I don't know if this helps you narrow down the problem.

Disabling the use of helper and uploading a random file gives:

$ dd if=/dev/urandom of=rnd0 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.256792 s, 4.1 MB/s
$ tahoe cp rnd0 priv:
Success: files copied
$ tahoe check priv:rnd0
Summary: Healthy
 storage index: fpeoa7vqrcrjny354zmru3gqda
 good-shares: 3 (encoding is 1-of-3)
 wrong-shares: 0
$ tahoe check --verify priv:rnd0
Summary: Healthy
 storage index: fpeoa7vqrcrjny354zmru3gqda
 good-shares: 3 (encoding is 1-of-3)
 wrong-shares: 0

which is good. However, when I enable the use of a helper and restart the client node I get:

$ dd if=/dev/urandom of=rnd1 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.256289 s, 4.1 MB/s
$ tahoe cp rnd1 priv:
Success: files copied
$ tahoe check priv:rnd1
Summary: Healthy
 storage index: rzlouseev5hbrte62facntsj4i
 good-shares: 3 (encoding is 1-of-3)
 wrong-shares: 0
$ tahoe check --verify priv:rnd1
Summary: Not Healthy: 0 shares (enc 1-of-3)
 storage index: rzlouseev5hbrte62facntsj4i
 good-shares: 0 (encoding is 1-of-3)
 wrong-shares: 0
 corrupt shares:
  server bzyf23mghgxycnr34pdkqdmybnevf4ks, SI rzlouseev5hbrte62facntsj4i, shnum 2
  server 44g5kkgwulzrrrntdzci7jtt5rgt6nuo, SI rzlouseev5hbrte62facntsj4i, shnum 0
  server 5yea4my3w3frgp524lgthrb7rdd6frtr, SI rzlouseev5hbrte62facntsj4i, shnum 1

which is bad.

rycee commented 2011-06-22 09:01:52 +00:00
Author
Owner

Attachment tahoe.cfg (391 bytes) added

The tahoe.cfg of my storage/helper node (the non-helper nodes just have enabled=false in the [helper] section).

rycee commented 2011-06-22 09:04:06 +00:00
Author
Owner

Attachment tahoe-client.cfg (340 bytes) added

My client config with helper enabled.

daira added
c/code
and removed
c/unknown
labels 2011-06-22 09:29:49 +00:00
daira changed title from "cannot convert float NaN to integer" in next_power_of_k to "cannot convert float NaN to integer" in next_power_of_k, during upload via helper 2011-06-22 09:29:49 +00:00

Replying to rycee:

With zooko's patch I get pretty much the same thing as earlier (do I need to give python some special flags to enable the preconditions?)

No, the precondition() function (source:trunk/src/allmydata/util/assertutil.py) runs the tests regardless of any flags. (There is a flag in Python to disable assert, but precondition() doesn't use assert or respect that flag.)
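
For illustration, such a helper can be as simple as the following sketch (an assumed shape for exposition, not the actual assertutil code):

def precondition(test, *extra):
    # A plain function call runs unconditionally; unlike the `assert`
    # statement, it is not stripped out by `python -O`.
    if not test:
        raise AssertionError("precondition failed", extra)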

Oh, I guess my patch was written under the mistaken assumption that this NaN was arising because one of the inputs to math.log() was not an int or long. I'll need to think harder to provide a more useful diagnostics patch, probably starting by re-reading davidsarah's comment:384951.

However, when I enable the use of a helper and restart the client node I get:

$ dd if=/dev/urandom of=rnd1 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.256289 s, 4.1 MB/s
$ tahoe cp rnd1 priv:
Success: files copied
$ tahoe check priv:rnd1
Summary: Healthy
 storage index: rzlouseev5hbrte62facntsj4i
 good-shares: 3 (encoding is 1-of-3)
 wrong-shares: 0
$ tahoe check --verify priv:rnd1
Summary: Not Healthy: 0 shares (enc 1-of-3)
 storage index: rzlouseev5hbrte62facntsj4i
 good-shares: 0 (encoding is 1-of-3)
 wrong-shares: 0
 corrupt shares:
  server bzyf23mghgxycnr34pdkqdmybnevf4ks, SI rzlouseev5hbrte62facntsj4i, shnum 2
  server 44g5kkgwulzrrrntdzci7jtt5rgt6nuo, SI rzlouseev5hbrte62facntsj4i, shnum 0
  server 5yea4my3w3frgp524lgthrb7rdd6frtr, SI rzlouseev5hbrte62facntsj4i, shnum 1

which is bad.

Wait a minute, are you saying that tahoe cp reports Success: files copied even though the helper encountered this uncaught exception!? That seems unlikely. Which process encounters the uncaught exception and in which step of the transcript above?


Attachment diagnostics-about-NaN-from-npok.diff (1105 bytes) added


Okay, this patch will definitely give actually useful diagnostics! attachment:diagnostics-about-NaN-from-npok.diff Please try it out.


Oh, by the way, are you applying these diagnostic patches to the helper source code, too? And restarting the helper? The code in the stack trace would be running in the helper, not in the client, I believe.

rycee commented 2011-06-23 08:56:45 +00:00
Author
Owner

Yes, I have applied the patches you've given on the helper node, not the client node. With the new patch I did indeed get some output, but being a Python novice I feel more confused, not less. The stack trace says:

  File "/home/rycee/allmydata-tahoe-1.8.2-bug1418/support/lib/python2.6/site-packages/Twisted-10.2.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 926, in locate_all_shareholders
    num_segments, n, k, desired)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 225, in get_shareholders
    None)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 88, in make_write_bucket_proxy
    num_share_hashes, uri_extension_size_max, nodeid)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 108, in __init__
    effective_segments = mathutil.next_power_of_k(num_segments,2)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 49, in next_power_of_k
    return next_power_of_k_math(n, k)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 35, in next_power_of_k_math
    x = int(math.log(n, k) + 0.5)
exceptions.ValueError: ('cannot convert float NaN to integer', 30L, 2, 32)
]'>

and in the node's twistd.log I found

2011-06-23 08:19:10+0200 [Negotiation,1,46.10.48.88] XXX n: 30 :: <type 'long'>, k: 2 :: <type 'int'>, next_power_of_k_alt: 32

In the python REPL on the same computer I get

$ python
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.log(30L, 2)
4.9068905956085187
>>> int(math.log(30L, 2) + 0.5)
5
>>> 2**5
32

This, together with the Success: files copied, makes me quite confused. Feels like the NaN error is a decoy put under our noses while the real problem slips quietly into the night.

I also tried creating a completely pristine 1.8.2 build on my helper node and now it also fails in the same way as the cp did, i.e., it claims success and the CHK caps reported by a verbose backup are OK when running check but fail when running check --verify. For example, backup says

/home/rycee/photos.new/2011/05/22/IMGP4679.JPG -> URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153

and checking gives

$ tahoe check URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153
Summary: Healthy
 storage index: 7sxgsu3edgy43at77tuskouuay
 good-shares: 3 (encoding is 1-of-3)
 wrong-shares: 0
$ tahoe check --verify URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153
Summary: Not Healthy: 0 shares (enc 1-of-3)
 storage index: 7sxgsu3edgy43at77tuskouuay
 good-shares: 0 (encoding is 1-of-3)
 wrong-shares: 0
 corrupt shares:
  server bzyf23mghgxycnr34pdkqdmybnevf4ks, SI 7sxgsu3edgy43at77tuskouuay, shnum 2
  server 44g5kkgwulzrrrntdzci7jtt5rgt6nuo, SI 7sxgsu3edgy43at77tuskouuay, shnum 0
  server 5yea4my3w3frgp524lgthrb7rdd6frtr, SI 7sxgsu3edgy43at77tuskouuay, shnum 1

I will attach the result of a diff -ur between the version giving the exception (including zooko's patches from this bug) and the pristine version. Note that the version giving the exception contains edits of mine that change minimum_cycle_time in crawler.py and expirer.py, but I have had those changes in place without trouble for a long time (many versions).

Oh, finally. Since next_power_of_k_alt returns a sensible result, I tried making next_power_of_k return that value in my build that produces exceptions. Running a backup then proceeds in the same way as the pristine version, i.e., the client reports success but check --verify fails.
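
For context, an integer-only next-power computation avoids floating point entirely; here is a minimal sketch of what an _alt-style variant could look like (an assumption for illustration, not the actual mathutil code):

def next_power_of_k_int(n, k):
    # Multiply up instead of calling math.log(), so no floating-point
    # state can affect the result.
    assert n >= 0 and k > 1
    p = 1
    while p < n:
        p *= k
    return p

With this, next_power_of_k_int(30, 2) returns 32, matching the next_power_of_k_alt value in the twistd.log line above.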

rycee commented 2011-06-23 08:58:49 +00:00
Author
Owner

Attachment 1.8.2.diff (17244 bytes) added

diff -ur between build throwing NaN exception and pristine build


Yes, this is getting weird. It looks like next_power_of_k_math raised this exception when its inputs were n=30, k=2, but when you tried it yourself in the REPL, the same calls to math.log() on the same computer worked. It could be a failure in your CPU's floating point support, but I would have expected that to be sporadic or permanent, rather than to fail every time under tahoe and work every time under the REPL! I wonder if executing tahoe is somehow changing the floating point mode of your CPU...

Maybe there's a bug in next_power_of_k_math(). Could you please try something like:

HACK zompu:~/playground/tahoe-lafs/trunk$ PYTHONPATH=src python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from allmydata.util import mathutil
>>> mathutil.next_power_of_k(30, 2)
32
>>> mathutil.next_power_of_k_math(30, 2)
32
>>> mathutil.next_power_of_k_alt(30, 2)
32

A next step for me is to try to understand why the upload succeeds but verification then fails. I guess this error is happening during the write. If the error is causing it to get an incorrect value for effective_segments:

  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 108, in __init__
    effective_segments = mathutil.next_power_of_k(num_segments,2)

Then it might write the data out incorrectly. If the file you are uploading was previously uploaded then the deduplication will prevent your gateway from uploading a new copy. This would explain why changing it to use next_power_of_k_alt() and then uploading and verifying previously written files got the same failure-to-verify. Oh wait, did you upload a new random file instead of a previously uploaded file, when you did the experiment that showed next_power_of_k_alt had the same problem? Gah! If you already tried that, then perhaps there are two bugs here -- the NaN exception and a different bug that is corrupting files on write.

Anyway it wouldn't make sense for this NaN exception to result in an incorrect value of effective_segments, when we can clearly see that what results is an uncaught exception!

Here are a couple of ideas you could try:

  1. Run next_power_of_k() (the version that uses _math and then uses _alt if _math raised an exception) in a tight loop, possibly with multiple processes doing it, and leave those running and reporting if they got any exceptions. (See the sketch after this list.)

  2. Edit the code to use next_power_of_k_alt exclusively and mv your entire storage directory aside, or create an entirely separate storage server and introducer for testing, and upload and then verify a random file. (If you haven't already done this.)

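For idea 1, the stress loop could look something like this (a minimal sketch; it assumes being run with PYTHONPATH=src so that allmydata is importable):

import itertools
from allmydata.util import mathutil

# Hammer the suspect function and report the first failure, if any.
for i in itertools.count():
    try:
        mathutil.next_power_of_k(30, 2)
    except Exception, e:  # Python 2 syntax, matching this environment
        print "iteration %d failed: %r" % (i, e)
        break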

Two more ideas:
3. Run trial --until-failure allmydata.test and just leave it running.
4. memtest86 :-)

rycee commented 2011-06-23 19:36:08 +00:00
Author
Owner

I don't have physical access to the machine but I ran memtester for quite some time and it didn't show anything. Neither did some stress testing of math.log. But running the trial thingy was good: I noticed that allmydata.test.test_cli.Backup.test_backup failed with an exception on the x = int(math.log(n, k) + 0.5) line. And I found something that seems rather curious to me: putting print math.log(10) first in next_power_of_k and running test_backup gives

...
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 32, in next_power_of_k
    print math.log(10)
exceptions.ValueError: math domain error

Yeah, that's right. Running math.log(10) gives an exception, and since it's only run once I can't imagine it's from the CPU being overloaded or anything. Further:

$ python
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.log(10)
2.3025850929940459

If I change math.log(10) to math.log10(10) it still gives the error. But – and hang on to your hat – if I use math.sqrt(10) it works: not only does the sqrt work, but the math.log(n, k) a few lines down does as well! With next_power_of_k defined

def next_power_of_k(n, k):
    hmmmm = math.sqrt(10)
    if n == 0:
        x = 0
    else:
        x = int(math.log(n, k) + 0.5)
    if k**x < n:
        return k**(x+1)
    else:
        return k**x

the unit test passes!

Now, while this elegant fix removes the exception during backup, the uploaded files unfortunately still fail check --verify.

Anyway, obviously something strange happens with some floating point operations and like you said, some floating point mode is probably being altered when running tahoe but not otherwise.

I also tried adding a print of effective_segments right after its assignment. It turns out that this variable is set to 1 when I cp a 1 kB file and to 128 with a 10 MB file. I don't know if these values are OK, but regardless, it appears that next_power_of_k is OK to call sometimes but not otherwise.

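For what it's worth, those two values look plausible. Assuming the default maximum segment size of 128 KiB (an assumption about this configuration), the arithmetic works out:

>>> 1024 <= 131072               # a 1 kB file fits in a single segment
True
>>> 10 * 1024 * 1024 / 131072    # a 10 MB file needs 80 segments
80
>>> 2 ** 7                       # next power of 2 at or above 80
128

so effective_segments values of 1 and 128 are exactly what next_power_of_k should produce for those files.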

Could you please run this single unit test that fails -- allmydata.test.test_cli.Backup.test_backup -- with the math.log(10) in place, under valgrind?

http://valgrind.org

Valgrind is very good. You'll need to use the Python suppressions file with it:

http://svn.python.org/projects/python/trunk/Misc/valgrind-python.supp

Another idea suggested by Riastradh on IRC is to turn on floating point exceptions: http://docs.python.org/library/fpectl.html


Could you please attach the output of:

$ python misc/build_helpers/show-tool-versions.py 

and

$ ./bin/tahoe --version
rycee commented 2011-06-23 19:59:57 +00:00
Author
Owner

I'll attach the show-tool-versions output as a file (it's a bit messy).

$ ./bin/tahoe --version
allmydata-tahoe: 1.8.2,
foolscap: 0.6.1,
pycryptopp: 0.5.17,
zfec: 1.4.10,
Twisted: 10.2.0,
Nevow: 0.10.0,
zope.interface: unknown,
python: 2.6.6,
platform: Linux-debian_6.0.1-i686-32bit_ELF,
pyOpenSSL: 0.10,
simplejson: 2.1.6,
pycrypto: 2.1.0,
pyasn1: unknown,
mock: 0.6.0,
sqlite3: 2.4.1 [sqlite 3.7.3],
setuptools: 0.6c16dev3
rycee commented 2011-06-23 20:00:36 +00:00
Author
Owner

Attachment show-tool-versions-output.txt (11059 bytes) added

Output from show-tool-versions

rycee commented 2011-06-23 20:01:05 +00:00
Author
Owner

Attachment valgrind-output.txt (84317 bytes) added

Output from valgrind.

rycee commented 2011-06-23 20:12:03 +00:00
Author
Owner

About the fpectl library: I tried adding fpectl.turnon_sigfpe() to the beginning of the test_backup method, which gave me

$ bin/tahoe debug trial allmydata.test.test_cli.Backup.test_backup
allmydata.test.test_cli
  Backup
    test_backup ... Fatal Python error: Unprotected floating point exception

which seems … interesting.

I don't know how to associate this with a source location though, so I dunno if it corresponds to the math.log(10) or some other piece of code.


Sigh. The Valgrind Python suppressions file that you used is apparently not working. Perhaps it was written for a 64-bit architecture and has a bunch of "8" in it which need to be changed to "4" to make it suppress all of those false positives in attachment:valgrind-output.txt?

All of those "Conditional jump or move depends on uninitialised value(s)" and "Use of uninitialised value of size 4" emitting from PyObject_Free or PyObject_Realloc are false positives. You can generate new suppression entries to put into a suppressions file by executing

valgrind --gen-suppressions=yes bin/tahoe debug trial allmydata.test.test_cli.Backup.test_backup

You might also want to report a bug to the distributor of your valgrind-python.supp file. If you do, please let me know the URL to the ticket.
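
For reference, a hand-written suppression entry for the PyObject_Free false positives would look roughly like this (the entry name is arbitrary):

{
   pyobject-free-false-positive
   Memcheck:Cond
   fun:PyObject_Free
}

with a corresponding Memcheck:Value4 entry for the "Use of uninitialised value of size 4" reports.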

Sigh. The Valgrind Python suppressions file that you used is apparently not working. Perhaps it was written for 64-bit architecture and has a bunch of "8" in it which need to be changed to "4" to make it suppress all of those false positives in [attachment:valgrind-output.txt](/tahoe-lafs/trac/attachments/000078ac-e3ed-3963-c28f-bae4c5724209) ? All of those "Conditional jump or move depends on uninitialised value(s)" and "Use of uninitialised value of size 4" emitting from `PyObject_Free` or `PyObject_Realloc` are false positives. You can generate new suppression entries to put into a suppressions file by executing ``` valgrind --gen-suppressions=yes bin/tahoe debug trial allmydata.test.test_cli.Backup.test_backup ``` You might also want to report a bug to the distributor of your valgrind-python.supp file. If you do, please let me know the URL to the ticket.

Hm, judging from http://docs.python.org/library/fpectl.html , any Python code (in the CPython interpreter or in an extension module) which does floating point math is supposed to wrap the calls to floating point functions in macros named PyFPE_START_PROTECT and PyFPE_END_PROTECT in order to trap the exception and raise a nice Python exception. Apparently this hasn't happened!

Hopefully you can get a run under valgrind without all the noise from false positives. Maybe whatever problem ends in the floating point exception starts with a memory misuse that valgrind can report.

Jason Riedy on identi.ca wrote: "If the 30L somehow is pushed on a stack as an integer, the resulting corruption likely would produce a NaN." -- http://identi.ca/notice/77048991

rycee commented 2011-06-23 20:55:41 +00:00
Author
Owner

As suggested by Riastradh in #tahoe-lafs, I ran it under strace. Using strace -f python bin/tahoe debug trial allmydata.test.test_cli.Backup.test_backup I got a 3.3MB file which I can attach if requested. The last 20 lines are

[pid  3482] gettimeofday({1308861861, 391151}, NULL) = 0
[pid  3482] gettimeofday({1308861861, 391248}, NULL) = 0
[pid  3482] gettimeofday({1308861861, 391487}, NULL) = 0
[pid  3482] gettimeofday({1308861861, 391770}, NULL) = 0
[pid  3482] gettimeofday({1308861861, 391924}, NULL) = 0
[pid  3482] gettimeofday({1308861861, 392020}, NULL) = 0
[pid  3482] --- SIGFPE (Floating point exception) @ 0 (0) ---
[pid  3482] rt_sigaction(SIGFPE, {0xb77028d0, [], 0}, {0xb77028d0, [], 0}, 8) = 0
[pid  3482] write(2, "Fatal Python error: Unprotected "..., 57) = 57
[pid  3482] rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
[pid  3482] tgkill(3482, 3482, SIGABRT) = 0
[pid  3482] --- SIGABRT (Aborted) @ 0 (0) ---
Process 3481 resumed
Process 3482 detached
[pid  3487] +++ killed by SIGABRT +++
<... waitpid resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGABRT}], 0) = 3482
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGINT, {SIG_DFL, [], 0}, {0x8110720, [], 0}, 8) = 0
exit_group(-6)                          = ?
rycee commented 2011-06-23 21:29:16 +00:00
Author
Owner

After some help I also managed to catch the SIGFPE in gdb. Since I don't have debug symbols in my python (simply the version found in Debian Squeeze), I don't know how useful it is. Anyway, I pasted the stack trace to http://hpaste.org/48228

rycee commented 2011-06-24 06:48:28 +00:00
Author
Owner

Riastradh suggested that the cause could be a cosmic ray flipping a bit in memory, especially after seeing strange output from info float in gdb (http://hpaste.org/48231). I tried flushing the file cache, rebooting, reinstalling python 2.6, uninstalling python 2.5 (in case it was somehow interfering), rebuilding tahoe, and building tahoe 1.8.1. I still get an exception from x = int(math.log(n, k) + 0.5). Note, I also get an exception from my newly built 1.8.1. I have previously used 1.8.1 reliably as a helper, so I suppose the problem might be caused by some newer version of a library pulled in during python setup.py build?


Replying to [rycee](/tahoe-lafs/trac/issues/1418#issuecomment-384972):

> I have previously used 1.8.1 reliably as a helper, so I suppose the problem might be caused by some newer version of a library pulled in during `python setup.py build`?

Yes, Tahoe-LAFS itself is written in pure Python. I suspect this problem is due to memory corruption or some other incorrect C/C++ code, so it is probably due to one of the dependencies. I really want to track down which one it is, so please try to continue maintaining a "paper trail" of what you changed at each step of this. Thanks!

To see which dependencies have native code in them, look at this table: <http://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-lafs-dep-eggs/README.html> Every column there is a library that uses native (C/C++) code.

If I were you, my next step would be to clear the false positives out of valgrind (as I described in [comment:384968](/tahoe-lafs/trac/issues/1418#issuecomment-384968)). valgrind will either identify the problem, or if it gives a clean report then we can be confident that there is *no* branching on uninitialized memory, writing to memory out of bounds, or other of the memory errors that valgrind checks for.

If you're not interested in doing that then say so and let's think of some other approaches to track down this bug.


Oh, I have an idea. Turn on tracing -- <http://www.doughellmann.com/PyMOTW/trace/> -- then you'll know what line of source code triggered the FP exception. (Caveat: the Python process might buffer the output and then die from the FP exception before it flushes the line naming the Python code that ran immediately before the exception. So set the `PYTHONUNBUFFERED=1` env var or pass `-u` to `python`.)

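For example, something along these lines might work -- a minimal sketch, assuming the stdlib `trace` module can drive the `bin/tahoe` script directly (run it with `python -u` so the output is unbuffered):

```
import sys, trace

# trace=1 prints every line as it executes; count=0 disables coverage
# counting -- the last line printed before the SIGFPE names the culprit
tracer = trace.Trace(trace=1, count=0)
sys.argv = ['bin/tahoe', 'debug', 'trial',
            'allmydata.test.test_cli.Backup.test_backup']
tracer.run("execfile('bin/tahoe')")   # assumes bin/tahoe is a Python script
```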
rycee commented 2011-06-24 15:49:39 +00:00
Author
Owner

Turns out Debian stable has a python2.6-dbg package. I installed that and caught the SIGFPE when running the allmydata.test.test_upload.EncodingParameters.test_upper_limit_on_readonly_queries test (it also fails and runs quicker than the test_backup test). Anyway, I get

```
Program received signal SIGFPE, Arithmetic exception.
0x0807de7c in int_float (v=0xa65a168) at ../Objects/intobject.c:945
945	../Objects/intobject.c: No such file or directory.
	in ../Objects/intobject.c
(gdb) info float
  R7: Valid   0x400e9c42000000000000 +40002                     
  R6: Valid   0x400e9c42000000000000 +40002                     
  R5: Zero    0x00000000000000000000 +0                         
  R4: Valid   0x40029000000000000000 +9                         
  R3: Zero    0x00000000000000000000 +0                         
  R2: Zero    0x00000000000000000000 +0                         
  R1: Special 0xffff6edb1408f6ce3bbe Unsupported
=>R0: Special 0xffffe649a0c99e4d8714 QNaN

Status Word:         0xc2e1   IE             PE   ES   SF      C1    C3
                       TOP: 0
Control Word:        0x1372      DM       UM PM
                       PC: Extended Precision (64-bits)
                       RC: Round to nearest
Tag Word:            0x045a
Instruction Pointer: 0x73:0x0807de79
Operand Pointer:     0x7b:0x0a65a170
Opcode:              0xdb40
(gdb) where
#0  0x0807de7c in int_float (v=0xa65a168) at ../Objects/intobject.c:945
#1  0x08062c78 in PyNumber_Float (o=40000) at ../Objects/abstract.c:1749
#2  0x0807d277 in float_new (type=0x822cf20, args=(40000,), kwds=0x0)
    at ../Objects/floatobject.c:1657
#3  0x080accbd in type_call (type=0x822cf20, args=(40000,), kwds=0x0)
    at ../Objects/typeobject.c:726
#4  0x0806232a in PyObject_Call (func=<type at remote 0x822cf20>, arg=(40000,), 
    kw=0x0) at ../Objects/abstract.c:2492
#5  0x080e016b in do_call (f=
    Frame 0xac1249c, for file /home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py, line 1331, in _got (params=(3, 1, 10, 40002), k=3, happy=1, n=10, segsize=40002, f=<cStringIO.StringI at remote 0xa5c83f8>, enckey_hasher=<_SHA256d_Hasher(_digest=None, h=<_sha256.SHA256 at remote 0xb73bd700>, truncate_to=16) at remote 0x9c5f4cc>, BLOCKSIZE=65536, bytes_read=40000, data='datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata...(truncated), throwflag=0)
    at ../Python/ceval.c:3968
Quit
(gdb) print v
$1 = (PyIntObject *) 0xa65a168
(gdb) print *v
$2 = {ob_refcnt = 2, ob_type = 0x822da20, ob_ival = 40000}
```

The failing line in `intobject.c` is `return PyFloat_FromDouble((double)(v -> ob_ival));`, so presumably it is the cast to double that fails. But since `ob_ival = 40000` I can't see any plausible reason why. The failing line in Tahoe is `self._status.set_progress(0, float(bytes_read)/self._size)` in `immutable/upload.py`.

If I have time, then I'll try the valgrind thingy but at the moment I'm pretty much ready to chalk it all up to hexed hardware.

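For what it's worth, the gdb dump above is consistent with some native code having unmasked FPU exceptions behind Python's back: the control word `0x1372` has the invalid-operation mask bit (bit 0) clear, so operations that would normally just produce a NaN trap instead. A minimal sketch of that failure mode (hypothetical, Linux/glibc only; `feenableexcept` is a GNU extension and the `FE_INVALID` value is taken from x86 `bits/fenv.h`):

```
from ctypes import cdll
from ctypes.util import find_library

libm = cdll.LoadLibrary(find_library('m'))
FE_INVALID = 0x01   # x86 value from bits/fenv.h

# unmask the "invalid operation" exception, as a buggy library might
libm.feenableexcept(FE_INVALID)

# inf - inf raises "invalid operation": instead of returning NaN, the
# process now dies with SIGFPE, much like the helper did
print float('inf') - float('inf')
```

Once the mask bit is cleared, even an innocent-looking conversion like `float(bytes_read)` can trap if the x87 register stack is also in a bad state.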

It can't be hexed hardware, because it is reproducible (at least mostly?) and it doesn't happen with any other executables, nor with other Python experiments. Say, here's an idea. My best guess is that one of the native-code dependencies (hopefully not zfec or pycryptopp -- the ones I'm responsible for!) is doing something bad. You could test this by running some complicated floating point math, then importing each native code dependency (see comment:23) and then running the floating point math again.

Also, the tracing approach that I suggested in [comment:384974](/tahoe-lafs/trac/issues/1418#issuecomment-384974) is quite likely to give us useful information. If you're not comfortable with all the Python coding suggested on <http://www.doughellmann.com/PyMOTW/trace/> then I can whip up a script that'll do it for you. :-)

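A sketch of that import-by-import check (hypothetical; the module list is a guess at which dependencies carry native code -- substitute the columns from the dependency table linked above):

```
import math

def fp_sanity():
    # the two operations observed to fail in this ticket
    assert float(40000) == 40000.0
    assert int(math.log(40000, 3) + 0.5) == 10

fp_sanity()
for mod in ['zfec', 'pycryptopp', 'OpenSSL', 'Crypto']:  # illustrative list
    __import__(mod)
    fp_sanity()   # a failure here points at the module just imported
    print mod, 'ok'
```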

Chad Austin had a suggestion on twitter:

> Did you try asserting on the x87 and SSE control words periodically? I put that assertion after every test in our test runner.

<https://twitter.com/#!/chadaustin/status/84508454357696514>


I asked Chad Austin on twitter how to detect the x87 and SSE control words. He replied:

On Windows with Python 2.6, this seems to work:

```
from ctypes import *
x87_control_word = c_uint()
sse_control_word = c_uint()
print 'success?', cdll.msvcr90.__control87_2(0, 0, pointer(x87_control_word), pointer(sse_control_word))
print 'x87', hex(x87_control_word.value)
print 'sse', hex(sse_control_word.value)
```

The meanings of the control registers: <http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz/>


Kludgy but more cross-platform way to get the FP environment (assuming `sizeof(fenv_t) <= 100`):

```
from ctypes import cdll, create_string_buffer
from ctypes.util import find_library
libm = cdll.LoadLibrary(find_library('m'))
b = create_string_buffer(100)
assert libm.fegetenv(b) == 0
b.raw
```

which on my system gives

```
'\x7f\x03\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xa0\x1f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
```

To decode the structure, see <http://linux.die.net/include/bits/fenv.h>. Note that we probably only need to detect when it changes during a unit test. For that, I would try hacking the `run` method of `TestCase` in `twisted.trial.unittest` so that it calls `libm.fegetenv(b)` before and after running the test code (the `result.startTest(self)` line) and prints the values of `b.raw` if they are different.

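Here is a rough sketch of that `TestCase.run` hack, written as a monkey-patch instead of editing Twisted in place (untested, and it assumes the 100-byte bound on `sizeof(fenv_t)` from above):

```
from ctypes import cdll, create_string_buffer
from ctypes.util import find_library
from twisted.trial import unittest

libm = cdll.LoadLibrary(find_library('m'))

def fenv():
    b = create_string_buffer(100)   # assumes sizeof(fenv_t) <= 100
    assert libm.fegetenv(b) == 0
    return b.raw

_original_run = unittest.TestCase.run

def run_checking_fenv(self, result):
    before = fenv()
    rv = _original_run(self, result)
    after = fenv()
    if before != after:
        # name the test during which the FP environment changed
        print 'FP environment changed during', self.id()
        print repr(before)
        print repr(after)
    return rv

unittest.TestCase.run = run_checking_fenv
```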

There was a bug in the Linux kernel that would cause floating point state to be wrong "on 32-bit linux on modern CPUs":

<http://article.gmane.org/gmane.linux.kernel/1258780>

[comment:384966](/tahoe-lafs/trac/issues/1418#issuecomment-384966) says that rycee's machine was in 32-bit mode. rycee's [show-tool-versions output](/attachment/ticket/1418/show-tool-versions-output.txt) says that the kernel was 2.6.32-5 (from Debian). I wasn't able to tell from the kernel patch comments when this bug was introduced into the kernel.

Anyway, since rycee appears to have lost interest in debugging this, I suggest we close this ticket as unreproducible until rycee or someone else who can reproduce it reappears.

rycee commented 2012-02-28 17:23:20 +00:00
Author
Owner

Yeah, sorry about not being active on this bug. I've been happy with using my tahoe grid without helper so I haven't felt very motivated to dig around.

zooko: The Linux bug you linked seems unrelated, since the CPU in this computer is an old Pentium III-M, 1.2 GHz, and the bug was for newer, presumably 64-bit, CPUs.

But I have interesting news: I just tried downloading and building Tahoe 1.9.1. When I ran `allmydata.test.test_cli.Backup.test_backup` it was successful! I thought it might be related to some glibc update or something, so I also downloaded both 1.8.2 and 1.8.3, built them, and ran the same test. They both failed in the same way as described in my previous comments. So I assume that some change between 1.8.3 and 1.9.0 must have fixed my problem.

Just to double check that 1.9.1 actually fixes my problem, I enabled the helper functionality and tried to upload through it; indeed, it was successful.


rycee: oh boy! You've managed to reproducibly isolate a way to make the bug happen or not happen! Now here's a question: can you run `git bisect` and get it to automatically run the unit tests in between each version change so that it finds exactly what patch makes the difference? Or do the equivalent manually -- try different versions and narrow down exactly what causes the problem?

Honestly, I'm still skeptical that anything we do in our 100% Python code could cause this bug, but perhaps something about the sequence of instructions that we're sending through the CPU accidentally avoids or accidentally triggers it...

Also, I am still willing to help you use some diagnostic tools like valgrind or the fp env from [comment:384979](/tahoe-lafs/trac/issues/1418#issuecomment-384979)...

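A sketch of automating that bisection with `git bisect run` (a hypothetical helper script; because we are hunting the commit that *fixed* the failure, the usual good/bad labels get inverted: mark an old failing release as "good", a new passing one as "bad", and exit non-zero when the test passes):

```
#!/usr/bin/env python
# invoke as: git bisect run python bisect_helper.py
import subprocess, sys

if subprocess.call(['python', 'setup.py', 'build']) != 0:
    sys.exit(125)   # 125 asks git bisect to skip an unbuildable revision

rc = subprocess.call(['bin/tahoe', 'debug', 'trial',
                      'allmydata.test.test_cli.Backup.test_backup'])
sys.exit(1 if rc == 0 else 0)   # inverted: a passing test counts as "bad"
```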

Also you might want to try applying that one patchset from linux to change the way it manages floating point state during context switches and see if that makes the problem go away. I don't know for sure that the linux bug doesn't apply on an old Pentium III -- it was just something from, IIRC, gregkh's announcement that mentioned "newer CPUs".

rycee commented 2012-02-28 18:49:58 +00:00
Author
Owner

Ok, gave bisect a try and it told me that the issue was fixed in commit [095efbf7a5f8ee375a81136f87ba3943ec995fc6](https://github.com/warner/tahoe-lafs/commit/095efbf7a5f8ee375a81136f87ba3943ec995fc6). This change makes Tahoe depend on pycryptopp version 0.5.20 or later on my machine; before this change it was satisfied with version 0.5.14 or later.

On my Debian Squeeze installation, the python-pycryptopp package is of version 0.5.17-1+b1, satisfying the dependency before davidsarah's change but not after. Having found this out, I of course tried uninstalling the python-pycryptopp Debian package and rebuilding a pristine Tahoe 1.8.2 source tree. The failing test now happily passes!

About the kernel patch: I'm reluctant to try it, since I have no physical access to the machine and rely on it staying up.


Aha. It seems quite likely that one of the bugs fixed between pycryptopp ~~0.5.14~~ 0.5.17-1+b1 and 0.5.20 could have been responsible. I doubt it has anything to do with the kernel bug.

rycee commented 2012-02-28 19:06:56 +00:00
Author
Owner

I guess we can raise the lower bound to exclude pycryptopp version 0.5.17-1+b1, since that's what my Debian box was using when the test was failing. I don't know whether it is worth trying to determine the problem in pycryptopp…

Anyway, thank you David-Sarah for making the commit 095efbf7a5f8ee375a81136f87ba3943ec995fc6 that got us closer to the underlying cause :-)


Note that it's possible that the Debian cryptopp was broken or miscompiled or included a bad Debian-specific patch, rather than this being a fix between upstream pycryptopp 0.5.17 and 0.5.20.

(Can you tell I don't really trust the Debian packaging policies or practices? We'd have a lot fewer problems if they would just follow upstream rather than picking out individual patches without really understanding them.)


Oh, also note that the Debian pycryptopp links against a non-embedded Crypto++, so we actually have no idea what version of Crypto++ that is (unless rycee can tell what it was).


I considered closing this ticket as fixed, since it seems relatively certain from the bisection that this is a fixed pycryptopp or Crypto++ issue. However, since we don't know whether Debian added a bad patch to their pycryptopp or Crypto++, I'll leave it open until we can determine that.


Yes, it is also possible that there was a bug in upstream pycryptopp itself. rycee: can you experiment with swapping in the old Debian version of pycryptopp versus the old upstream version of pycryptopp, maybe?


Replying to [zooko](/tahoe-lafs/trac/issues/1418#issuecomment-384990):

> Yes, it is also possible that there was a bug in upstream pycryptopp itself.

One reason I thought that less likely is that I reviewed [all the patches between 0.5.17 and 0.5.20](https://tahoe-lafs.org/trac/pycryptopp/log/trunk/?action=stop_on_copy&mode=stop_on_copy&rev=726&stop_rev=674&limit=100&verbose=on) and didn't see any fixes that could have caused this bug.

> rycee: can you experiment with swapping in the old Debian version of pycryptopp versus the old upstream version of pycryptopp, maybe?

Yes, that's a good test.

rycee commented 2012-02-29 07:19:39 +00:00
Author
Owner

I added 0.5.17 as an upper bound in `_auto_deps.py` and running `python setup.py build` now prints

```
…
Searching for pycryptopp>=0.5.14,<=0.5.17
Reading http://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-dep-sdists/
Reading http://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-lafs-dep-eggs/
Reading http://pypi.python.org/simple/pycryptopp/
Reading http://allmydata.org/source/pycryptopp
Reading http://allmydata.org/trac/pycryptopp
Reading http://tahoe-lafs.org/trac/pycryptopp
Reading http://allmydata.org/source/zfec
Best match: pycryptopp 0.5.17
Downloading http://pypi.python.org/packages/source/p/pycryptopp/pycryptopp-0.5.17.tar.gz#md5=b3d19e7203531f8bd241ae58062f99e4
Processing pycryptopp-0.5.17.tar.gz
Running pycryptopp-0.5.17/setup.py -q bdist_egg --dist-dir /tmp/easy_install-SlyAet/pycryptopp-0.5.17/egg-dist-tmp-2wdZ7B
…
```

and it goes on to spend quite some time building the Crypto++ library. After finishing, this build successfully runs the previously failing test even though it's Tahoe version 1.8.2. If I then replace the `_pycryptopp.so` file from inside this Tahoe tree with the corresponding file from the Debian package, the error comes back. The Debian version of pycryptopp is dynamically linked against [crypto++ version 5.6.0-6](http://packages.debian.org/squeeze/libcrypto++8).

So, finally, leaving Debian's `_pycryptopp.so` in place, I tried downloading and building a dynamic library from upstream crypto++ 5.6.0 and 5.6.1 and substituting Debian's copy. Curiously, the test failed with 5.6.0 and worked with 5.6.1…


Okay, this is very good bug isolation. I didn't quite follow the last bit though -- the bug is in *upstream* Crypto++ 5.6.0 and not *upstream* Crypto++ 5.6.1? Or is it in Debian Crypto++ 5.6.0 and not Debian Crypto++ 5.6.1?

By the way, we have a "quick startup self-test" built into pycryptopp, which is an attempt to catch the most egregious of these sorts of failures at run-time. I wonder if it would be possible to make that quick startup self-test more rigorous so that it would catch this one. I guess I'll be better able to tell once we really isolate this bug.

rycee commented 2012-03-01 06:16:03 +00:00
Author
Owner

Sorry about the vagueness. I used upstream `cryptopp560.zip` and `cryptopp561.zip` downloaded from <http://www.cryptopp.com/>. The 5.6.0 makefile didn't have a target for a shared library, so I copied that build target from 5.6.1; this is the only change I made to them. Building the `cryptest.exe` program and running it with `./cryptest.exe v` didn't show any problems. Similarly, there were no errors reported for `./cryptest.exe tv TestVectors/all.txt`.


Hm, well okay, so we're pretty sure there is a bug in Crypto++ 5.6.0 that is fixed in Crypto++ 5.6.1, right? Maybe we could let it go at that, since it is apparently fixed. The steps to reproduce are:

  1. Get and build Crypto++ 5.6.0
  2. Build pycryptopp with the `--disable-embedded-cryptopp` option so that it will link against the .so generated in step 1 instead of using its own bundled copy of Crypto++ source code
  3. Run `python setup.py test -s allmydata.test.test_cli.Backup.test_backup`

On your machine this 100% reproducibly triggers the problem, and swapping in Crypto++ 5.6.1 for 5.6.0 makes it stop happening?

Okay, one more thing: how do I know I don't have the same bug in the copy of Crypto++ that is bundled inside pycryptopp? I don't think we've upgraded the embedded copy of Crypto++ to a newer version of Crypto++ since this patch [91f454b47f66636fffde68e8a8bf4364bed7516e]:

```
commit 91f454b47f66636fffde68e8a8bf4364bed7516e
Author: weidai <weidai>
Date:   Sat Feb 2 00:40:27 2008 -0800
```

That was four years ago, before Crypto++ 5.6.0 [was released](http://cryptopp.com/#news). Since then we've just been cherry-picking patches from upstream Crypto++. So if this bug was present in the version that we copied, and if we didn't realize that we needed to cherry-pick the patch that fixes it, then it could be present in the embedded copy of Crypto++ inside pycryptopp.

Could you please redo step 2 above, this time omitting the `--disable-embedded-cryptopp` flag to force pycryptopp to use its embedded version of Crypto++? I think you've probably done that already in this process, but if you could do it again now just to be sure, that would make me feel better.

rycee commented 2012-03-01 20:55:57 +00:00
Author
Owner

Ok, I downloaded pycryptopp version 0.5.29 and did two builds with it, once with `--disable-embedded-cryptopp` and once without. I then ran `python setup.py test` on both. For the build that used the embedded version of Crypto++ this finished without any errors. For the one using the dynamically linked Crypto++ (v 5.6.0) it ended in failure, and I'll make an attachment containing the output of the failed run.

Note, when I used `--disable-embedded-cryptopp` it was built against the Debian version of Crypto++ 5.6.0, since I don't know how to convince `setup.py` to look for headers and libraries in a non-standard directory. It still fails in the same way, though, if I use `LD_LIBRARY_PATH` to force it to load the upstream Crypto++ 5.6.0. Using the same technique to force it to use Crypto++ 5.6.1 results in an undefined symbol error.

I'm not entirely sure how to use a custom pycryptopp in a Tahoe build, so I didn't try to use my two pycryptopp builds in Tahoe, but I suppose the pycryptopp test is telling enough. Also, when I run the Tahoe build now (once I uninstalled the Debian pycryptopp) it downloads and builds pycryptopp using the embedded Crypto++; and as previously seen those builds are doing just fine.

Anyway, I think it's safe to say that the Crypto++ embedded in pycryptopp doesn't exhibit the problem I encountered.

rycee commented 2012-03-01 20:57:17 +00:00
Author
Owner

**Attachment** pycryptopp-dynlink-test-failure.txt (58050 bytes) added

Output from pycryptopp test when built with `--disable-embedded-cryptopp`


Replying to [rycee](/tahoe-lafs/trac/issues/1418#issuecomment-384996):

> Note, when I used `--disable-embedded-cryptopp` it was built against the Debian version of Crypto++ 5.6.0, since I don't know how to convince `setup.py` to look for headers and libraries in a non-standard directory. It still fails in the same way, though, if I use `LD_LIBRARY_PATH` to force it to load the upstream Crypto++ 5.6.0.

This is almost sufficient to convince me that this is a fixed bug in Crypto++ (despite it not being clear which change fixed it). However, we haven't excluded the possibility that the problem interacts with the dynamic linking, i.e. that it only fails in non-embedded mode.

> Using the same technique to force it to use Crypto++ 5.6.1 results in an undefined symbol error.

It sounds like we should open a new (pycryptopp) ticket for that, since at some point Debian or Ubuntu will upgrade to 5.6.1 and then it may fail with the same error.


I agree to close this ticket as "invalid".

About the undefined symbol error, it sounds like rycee compiled it against the Crypto++ 5.6.0 headers provided by Debian, and then used `LD_LIBRARY_PATH` to force it to load the Crypto++ 5.6.1 dynamic link library. That's not a safe thing to do, and nobody, or at least not us, can do anything about that giving undefined symbol errors. Am I right, or am I confused?

rycee commented 2012-03-02 10:13:45 +00:00
Author
Owner

Replying to davidsarah (comment:49):

> Replying to rycee:
> [...]
> This is almost sufficient to convince me that this is a fixed bug in Crypto++ (despite it not being clear which change fixed it). However, we haven't excluded the possibility that the problem interacts with the dynamic linking, i.e. that it only fails in non-embedded mode.

Ok, so I tried a new way. I unpacked pycryptopp twice. In one of the trees I replaced the `embeddedcryptopp` directory with a pristine upstream Crypto++ 5.6.0, and in the other I replaced `embeddedcryptopp` with a pristine upstream Crypto++ 5.6.1. I then did a `python setup.py build` followed by `python setup.py test` in each of the two trees. The pycryptopp statically built with 5.6.0 failed the test (in the same way as the previously attached test log) while the build using Crypto++ 5.6.1 ran all tests successfully.

I'm a bit curious, however, as to why the `cryptest.exe` program from Crypto++ 5.6.0 didn't fail; maybe the failing test from pycryptopp could be ported to a test in `cryptest.exe`?

> > Using the same technique to force it to use Crypto++ 5.6.1 results in an undefined symbol error.
>
> It sounds like we should open a new (pycryptopp) ticket for that, since at some point Debian or Ubuntu will upgrade to 5.6.1 and then it may fail with the same error.

It's like [zooko said](/tahoe-lafs/trac/issues/1418#issuecomment-384998). Since I don't know how to make `setup.py` use a custom header directory, it built against the standard Debian Crypto++ 5.6.0 headers.

However, curiously enough, when I used Debian's pycryptopp -- which is dynamically linked against Debian's Crypto++ -- I *was* able to swap out the `.so` file from 5.6.0 to 5.6.1 without problem. Doing this is how I found out that 5.6.1 worked. I suppose it has something to do with the patches and how the Debian pycryptopp and Crypto++ packages are built.


Replying to rycee (comment:51):

> Replying to davidsarah (comment:49):
> [...]
> I'm a bit curious, however, as to why the `cryptest.exe` program from Crypto++ 5.6.0 didn't fail; maybe the failing test from pycryptopp could be ported to a test in `cryptest.exe`?

The bug seems to affect the floating point state, and it's quite plausible that `cryptest.exe` doesn't test for that. A regression test would indeed be nice -- if you're sufficiently motivated then I suggest asking on the Crypto++ mailing list.

> > > Using the same technique to force it to use Crypto++ 5.6.1 results in an undefined symbol error.
> >
> > It sounds like we should open a new (pycryptopp) ticket for that, since at some point Debian or Ubuntu will upgrade to 5.6.1 and then it may fail with the same error.
>
> It's like zooko said. Since I don't know how to make `setup.py` use a custom header directory, it built against the standard Debian Crypto++ 5.6.0 headers.

Yes, as Zooko said, this is an invalid test.

> However, curiously enough, when I used Debian's pycryptopp -- which is dynamically linked against Debian's Crypto++ -- I *was* able to swap out the `.so` file from 5.6.0 to 5.6.1 without problem. Doing this is how I found out that 5.6.1 worked. I suppose it has something to do with the patches and how the Debian pycryptopp and Crypto++ packages are built.

Ah, so if that worked then it is not the dynamic linking. Okay, let's stop worrying about this one.


Okay, so we think this is a bug that was introduced into Crypto++ after we stopped updating pycryptopp's embedded copy of Crypto++ to track upstream Crypto++, and was then released in Crypto++ v5.6.0, and then was fixed in Crypto++ v5.6.1. It would be nice if we knew more precisely what bug was introduced and what patch fixed it, or if we at least had a reliable regression test that would tell us if we later introduce this bug into pycryptopp, but I'm not sure how to go about that and it might not be worth the effort.

So, shall we close this ticket now?

rycee commented 2012-03-06 09:03:11 +00:00
Author
Owner

This bug should certainly be closed as invalid or something similar. I may explore Crypto++ a bit further as time and interest permits but will raise such a discussion in a more suitable forum.

Thanks for your help in tracking this down. It's been a tricky issue and I'm happy it resolved itself so pleasantly.


Okay, thanks a lot for your help!

zooko added the
r/invalid
label 2012-03-06 11:55:58 +00:00
zooko closed this issue 2012-03-06 11:55:58 +00:00
r/wontfix
r/worksforme
t/defect
t/enhancement
t/task
v/0.2.0
v/0.3.0
v/0.4.0
v/0.5.0
v/0.5.1
v/0.6.0
v/0.6.1
v/0.7.0
v/0.8.0
v/0.9.0
v/1.0.0
v/1.1.0
v/1.10.0
v/1.10.1
v/1.10.2
v/1.10a2
v/1.11.0
v/1.12.0
v/1.12.1
v/1.13.0
v/1.14.0
v/1.15.0
v/1.15.1
v/1.2.0
v/1.3.0
v/1.4.1
v/1.5.0
v/1.6.0
v/1.6.1
v/1.7.0
v/1.7.1
v/1.7β
v/1.8.0
v/1.8.1
v/1.8.2
v/1.8.3
v/1.8β
v/1.9.0
v/1.9.0-s3branch
v/1.9.0a1
v/1.9.0a2
v/1.9.0b1
v/1.9.1
v/1.9.2
v/1.9.2a1
v/cloud-branch
v/unknown
Reference: tahoe-lafs/trac#1418