Tahoe reports catch-up incidents to a log gatherer with a Unicode filename, which results in them being dropped #1725

Closed
opened 2012-04-29 01:54:29 +00:00 by davidsarah · 6 comments
davidsarah commented 2012-04-29 01:54:29 +00:00
Owner

At source:src/allmydata/node.py@5469#L349, we have:

        incident_dir = os.path.join(self.basedir, "logs", "incidents")
        # this doesn't quite work yet: unit tests fail
        foolscap.logging.log.setLogDir(incident_dir)

(ignore the comment; it's not relevant to this ticket).

Since self.basedir is Unicode, so is incident_dir. foolscap mostly tolerates this, but sometimes ends up sending a Unicode filename to the log gatherer, which causes a type Violation, e.g.:

q2z53drs#188 17:29:18.675: Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.3-py2.6.egg/foolscap/eventual.py", line 26, in _turn
    cb(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.3-py2.6.egg/foolscap/logging/publish.py", line 106, in subscribe
    self.catch_up(since)
  File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.3-py2.6.egg/foolscap/logging/publish.py", line 114, in catch_up
    self.observer.callRemoteOnly("new_incident", name, trigger)
  File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.3-py2.6.egg/foolscap/referenceable.py", line 422, in callRemoteOnly
    *args, **kwargs)
--- <exception caught here> ---
  File "/usr/local/lib/python2.6/dist-packages/Twisted-11.1.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.3-py2.6.egg/foolscap/referenceable.py", line 482, in _callRemote
    methodSchema.checkAllArgs(args, kwargs, False)
  File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.3-py2.6.egg/foolscap/remoteinterface.py", line 284, in checkAllArgs
    constraint.checkObject(argvalue, inbound)
  File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.3-py2.6.egg/foolscap/constraint.py", line 220, in checkObject
    raise Violation("'%r' is not a bytestring" % (obj,))
foolscap.tokens.Violation: Violation (RILogObserver.foolscap.lothar.com.new_incident(name=)): ("'u'incident-2012-04-28--21-28-05Z-q3rwjdq'' is not a bytestring",)

The code in foolscap that creates the Unicode filenames is LogPublisher.list_incident_names in foolscap/logging/publish.py. Due to Python 2.x's implicit unicode<->str conversions (booo!) and "do what I thought you wanted" behaviour of the filesystem APIs, there is no Python type error.
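The mechanism can be illustrated with a small sketch. It is written for Python 3, where the text type is `str` and the bytestring type is `bytes` (Python 2's `unicode` and `str` respectively), and it does not use foolscap itself. The helper name `listed_name_types` and the incident filename suffix are assumptions made for illustration. The point it shows is that `os.listdir` returns names of the same type as the directory argument, so a text log directory quietly yields text filenames with no type error.

```python
import os
import tempfile


def listed_name_types(directory_text):
    """Return the sets of name types produced by listing one directory
    via a text path and via the equivalent bytes path.

    Hypothetical helper for illustration only; it is not part of foolscap.
    """
    via_text = os.listdir(directory_text)
    via_bytes = os.listdir(os.fsencode(directory_text))
    return {type(n) for n in via_text}, {type(n) for n in via_bytes}


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file named like an incident (the suffix is an assumption).
    incident_path = os.path.join(tmp, "incident-2012-04-28--21-28-05Z-q3rwjdq.flog")
    open(incident_path, "w").close()
    text_types, bytes_types = listed_name_types(tmp)

# The listing follows the type of the path that was passed in.
print(text_types, bytes_types)
```

Nothing in the listing step complains about the text path, which is why the mismatch only surfaces later, at the remote interface's constraint check.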

The effect is that if a log-gatherer was down when incidents occurred and subsequently tries to catch up, those incidents will be dropped.

This is a regression that was introduced with the Unicode basedir changes released in 1.8 (specifically changeset:618db4867c68a6f9).

tahoe-lafs added the
code-nodeadmin
major
defect
1.9.1
labels 2012-04-29 01:54:29 +00:00
tahoe-lafs added this to the 1.9.2 milestone 2012-04-29 01:54:29 +00:00
davidsarah commented 2012-04-29 02:25:05 +00:00
Author
Owner

Attachment fix-and-test-1725.darcs.patch (96657 bytes) added

Make sure that foolscap.logging.log.setLogDir is called with a str (not unicode) path. Includes test. fixes #1725
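A minimal sketch of the idea behind this fix, not the contents of the attached patch: coerce the path to a bytestring before handing it to the logging setup. It is written for Python 3, where `bytes` plays the role of Python 2's `str`. Both `to_bytes_path` and `fake_set_log_dir` are hypothetical names; the latter is only a stand-in that enforces the implicit contract and is not `foolscap.logging.log.setLogDir`. Using `os.fsencode` for the conversion is likewise an assumption.

```python
import os


def to_bytes_path(path):
    """Hypothetical helper: return `path` as a bytestring.

    Text paths are encoded with the filesystem encoding via os.fsencode;
    bytes paths are returned unchanged.
    """
    if isinstance(path, bytes):
        return path
    return os.fsencode(path)


def fake_set_log_dir(directory):
    """Stand-in for a setLogDir-like function that expects a bytestring.

    This is NOT the foolscap function; it only models the contract.
    """
    if not isinstance(directory, bytes):
        raise TypeError("log directory must be a bytestring, got %r" % (type(directory),))
    return directory


basedir = "node-basedir"  # a text basedir, as in the ticket description
incident_dir = os.path.join(basedir, "logs", "incidents")

# Convert once at the boundary, so everything derived from the log
# directory downstream is a bytestring as well.
accepted = fake_set_log_dir(to_bytes_path(incident_dir))
```

Converting at the single call site keeps the rest of the node free to use text paths internally.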

zooko commented 2012-04-29 02:39:10 +00:00
Author
Owner

One possible change would be to extend RILogObserver.new_incident's type-checking to allow unicode in addition to str. The old way of thinking is that things which are only ever going to be ASCII should be str, and things which might have non-ASCII chars should be unicode. The new way of thinking (exemplified by Python 3) is that things which contain non-human-meaningful binary data should be str (soon to be known as bytestring) and things which contain human-meaningful characters should be unicode. (Even if those human-meaningful characters will never be any but the characters found in ASCII.)

So, if you feel like playing along with the Python way of doing things it makes sense to define the name variable (which looks like 'incident-TIMESTAMP-UNIQUE') as unicode.
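As a rough sketch of what such a relaxed check might look like (written for Python 3, where `str` is text and `bytes` is the bytestring; this is not foolscap's constraint machinery, and `check_incident_name` is a hypothetical name), the receiver could accept either type and normalize to text:

```python
def check_incident_name(name):
    """Hypothetical relaxed check: accept text or ASCII bytes, return text.

    Illustrates the proposal only; it is not the RILogObserver schema.
    """
    if isinstance(name, bytes):
        # Incident names look like 'incident-TIMESTAMP-UNIQUE', so ASCII
        # decoding is assumed to be sufficient here.
        return name.decode("ascii")
    if isinstance(name, str):
        return name
    raise TypeError("incident name must be text or bytes, got %r" % (type(name),))


from_text = check_incident_name("incident-2012-04-28--21-28-05Z-q3rwjdq")
from_bytes = check_incident_name(b"incident-2012-04-28--21-28-05Z-q3rwjdq")
```

Both inputs end up as the same text value, so a sender using either type would no longer trigger a Violation under this assumed scheme.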

zooko commented 2012-04-29 02:44:36 +00:00
Author
Owner

Well, I reviewed the patch -- fix-and-test-1725.darcs.patch -- and I agree that it will cause setLogDir to be called with a str argument. I don't know what all the effects are of making that argument be str on all platforms. Presumably it works fine, because that's the old way of doing things and foolscap and Twisted know how to handle it. So, +0. I see no bug.

davidsarah commented 2012-04-29 02:52:18 +00:00
Author
Owner

Replying to zooko:

> One possible change would be to extend RILogObserver.new_incident's type-checking to allow unicode in addition to str. The old way of thinking is that things which are only ever going to be ASCII should be str, and things which might have non-ASCII chars should be unicode.

Well, maybe, but it's a Tahoe bug that it failed to adhere to the implicit contract of setLogDir as taking a str. If we wanted to pass a Unicode path, we'd need to update foolscap to accept that (also for the "logport-furlfile" tub option), then change Tahoe to depend on that version of foolscap. And then foolscap would probably still end up converting it to a str to preserve wire protocol compatibility with log gatherers running an earlier version. Too much hassle IMHO.

Thanks for the review.

david-sarah@jacaranda.org commented 2012-04-29 02:53:26 +00:00
Author
Owner

In changeset:a5553369105d6c9f:

Make sure that foolscap.logging.log.setLogDir is called with a str (not unicode) path, v2. Includes test. fixes #1725
tahoe-lafs added the
fixed
label 2012-04-29 02:53:26 +00:00
david-sarah@jacaranda.org closed this issue 2012-04-29 02:53:26 +00:00
david-sarah@jacaranda.org commented 2012-04-29 02:56:28 +00:00
Author
Owner

In changeset:5646/ticket999-S3-backend:

Make sure that foolscap.logging.log.setLogDir is called with a str (not unicode) path, v2. Includes test. fixes #1725
Reference: tahoe-lafs/trac-2024-07-25#1725