errors during add-lease cause checker false-negatives #875

Closed
opened 2009-12-29 21:26:43 +00:00 by warner · 1 comment
warner commented 2009-12-29 21:26:43 +00:00
Owner

Francois, on the 25c3 grid, observed failures (looking very much like the exceptions in #786) that occurred when "check --repair --add-lease" was done on a directory for which all of the shares were stored on an old (tahoe-1.2.0) server.

The implementation of the mutable checker's add-lease code was such that any unexpected errors in the add-lease call would cause the checker to ignore any shares reported by the simultaneous readv call. tahoe-1.2.0 had a bug in the latency-measure code such that add-lease always threw a KeyError (I'd mis-analyzed the relative levels of support in my notes in the NEWS file).

So when he ran "check --add-lease --repair", the add-lease bug made the checker think there were no shares available. It then attempted repair, and the lack of any shares caused repair to fail with the weird TypeError that is the focus of #786.

The fix is to separate out the add-lease response/errback path from the readv path. I want to record some errors but ignore the ones that I think are harmless and noisy, like the known limitations of older tahoe versions.
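
A rough sketch of that separation, not the actual changeset: the "errback" wording suggests Twisted Deferreds, but this uses asyncio from the standard library so it stays self-contained. The names `readv`, `add_lease`, `query_server`, `HARMLESS`, and the dict-shaped "server" are all hypothetical stand-ins, and treating `KeyError` as the harmless old-server case is an assumption drawn from the description above.

```python
import asyncio

# Assumption: errors considered known, harmless noise from older servers.
HARMLESS = (KeyError,)

async def readv(server):
    # Hypothetical stand-in for the remote readv call: returns the shares held.
    return server["shares"]

async def add_lease(server):
    # Hypothetical stand-in for the remote add-lease call: may fail on its own.
    if server["lease_error"] is not None:
        raise server["lease_error"]

async def query_server(server, log):
    # Both requests are still issued together (pipelined), but each outcome
    # is collected and handled on its own path.
    shares, lease = await asyncio.gather(
        readv(server), add_lease(server), return_exceptions=True
    )
    if isinstance(lease, Exception) and not isinstance(lease, HARMLESS):
        # Unexpected add-lease failure: record it, but keep the shares.
        log.append((server["name"], repr(lease)))
    if isinstance(shares, Exception):
        # Only a readv failure affects what the checker sees.
        raise shares
    return shares

def demo(lease_error):
    log = []
    server = {"name": "old", "shares": {0: b"data"}, "lease_error": lease_error}
    shares = asyncio.run(query_server(server, log))
    return shares, log
```

With this shape, `demo(KeyError("latency"))` still returns the shares with nothing logged, while `demo(RuntimeError("boom"))` returns the shares and leaves one log entry.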

The immutable checker code uses the same pipelined add-lease/DYHB queries, and needs to be fixed also.

tahoe-lafs added the code-encoding, major, defect, 1.5.0 labels 2009-12-29 21:26:43 +00:00
tahoe-lafs added this to the undecided milestone 2009-12-29 21:26:43 +00:00
warner commented 2009-12-29 23:11:58 +00:00
Author
Owner

Ok, changeset:794e32738fc654ae should fix this one (both mutable and immutable).

tahoe-lafs added the fixed label 2009-12-29 23:11:58 +00:00
tahoe-lafs modified the milestone from undecided to 1.6.0 2009-12-29 23:11:58 +00:00
warner closed this issue 2009-12-29 23:11:58 +00:00
Reference: tahoe-lafs/trac-2024-07-25#875