mutable file: survive encoding variations #312

Closed
opened 2008-02-13 19:25:55 +00:00 by warner · 7 comments
warner commented 2008-02-13 19:25:55 +00:00
Owner

The current mutable.py has a nasty bug lurking: since the encoding parameters
(k and N) are not included in the URI, a copy is put in each share. The
Retrieve code latches on to the first version it sees, and ignores the values
from all subsequently-fetched shares. If (for whatever reason) some clients
have uploaded the file with different parameters (specifically different
values of k, say 3-of-10 vs 2-of-6), then we could wind up feeding 3-of-10
shares into a zfec decoder configured for 2-of-6, which would cause silent
data corruption.

The first fix for this is to reject, with a `CorruptShareError`, any shares
whose encoding parameters differ from the values that we pulled from the
first share. That will at least prevent the possible data corruption.
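
As a rough sketch of that check (the `CorruptShareError` name comes from this ticket, but the helper and the share fields shown here are hypothetical, not the real mutable.py code):

```python
# Hypothetical sketch of the first fix: compare each share's embedded
# encoding parameters against the ones latched from the first share,
# and reject mismatches instead of feeding them to the zfec decoder.

class CorruptShareError(Exception):
    """Raised when a share disagrees with the encoding parameters in use."""

def check_encoding_parameters(share, expected_k, expected_n):
    # share.k and share.n stand in for whatever fields the real
    # share-parsing code exposes for the embedded parameters.
    if share.k != expected_k or share.n != expected_n:
        raise CorruptShareError(
            "share claims %d-of-%d but we expected %d-of-%d"
            % (share.k, share.n, expected_k, expected_n))
```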

The longer-term fix is to refactor Retrieve to treat k and N as part of the
'verinfo' index, along with seqnum and roothash and the salt. This
refactoring also calls for building up a table of available versions, and
then deciding which one (or ones) to decode on the basis of available shares
and highest seqnum. The new Retrieve class should be able to return multiple
versions, or indicate the presence of newer versions (that might not be
recoverable).
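
A sketch of what that verinfo bookkeeping might look like (the tuple layout and share attributes are illustrative assumptions, not the actual Retrieve internals):

```python
# Hypothetical sketch of the refactored bookkeeping: shares are grouped
# by their full verinfo tuple, and a version counts as recoverable once
# its bucket holds at least k valid shares.

from collections import defaultdict

def classify_shares(shares):
    # verinfo = (seqnum, roothash, salt, k, n); attribute names are made up.
    buckets = defaultdict(list)
    for share in shares:
        verinfo = (share.seqnum, share.roothash, share.salt, share.k, share.n)
        buckets[verinfo].append(share)
    return buckets

def best_recoverable_version(buckets):
    # Prefer the highest seqnum among versions with enough shares to decode;
    # verinfo tuples start with seqnum, so max() sorts by seqnum first.
    recoverable = [v for v, shares in buckets.items() if len(shares) >= v[3]]
    return max(recoverable, default=None)
```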

tahoe-lafs added the code-encoding, critical, defect, 0.7.0 labels 2008-02-13 19:25:55 +00:00
tahoe-lafs added this to the 0.8.0 (Allmydata 3.0 Beta) milestone 2008-02-13 19:25:55 +00:00
warner commented 2008-02-13 20:11:29 +00:00
Author
Owner

I've pushed the first fix for this. We still need to come up with a unit testing scheme for this stuff, addressed in #207.

warner commented 2008-02-13 20:12:24 +00:00
Author
Owner

Having that first fix in place addresses the immediate problem, so I'm lowering the severity and pushing the rest of this ticket out a release.

tahoe-lafs added major and removed critical labels 2008-02-13 20:12:24 +00:00
tahoe-lafs modified the milestone from 0.8.0 (Allmydata 3.0 Beta) to 0.9.0 (Allmydata 3.0 final) 2008-02-13 20:12:24 +00:00
tahoe-lafs modified the milestone from 0.9.0 (Allmydata 3.0 final) to undecided 2008-03-08 04:13:31 +00:00
warner commented 2008-03-08 07:11:12 +00:00
Author
Owner

If we want #332 to go into the 0.9.0 release, then we also need to fix #312. Do you agree? My concern is that existing dirnodes will wind up with multiple encodings, but maybe I'm wrong.

zooko commented 2008-03-08 14:31:17 +00:00
Author
Owner

Hm... yes it would be good to fix this, so that dirnodes produced by v0.8.0 can survive into v0.9.0 and get converted into K=1 dirnodes.

This is our first backwards compatibility decision. :-)

tahoe-lafs modified the milestone from undecided to 0.9.0 (Allmydata 3.0 final) 2008-03-08 14:31:17 +00:00
warner commented 2008-03-11 08:48:56 +00:00
Author
Owner

Fixed, in changeset:10d3ea504540ae2f. This retains the property that Retrieve will return with whatever version was recoverable first: it classifies all shares that it sees into buckets indexed by their full "verinfo" tuple: seqnum, roothash, encoding parameters. Whichever bucket gets enough valid shares to decode first will win.

The rest of the refactoring (to actually fetch and return multiple versions, and handle the "epsilon" anti-rollback parameter, etc) is left for ticket #205.
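
A minimal sketch of the "first recoverable version wins" behavior described above (the `decode` callable and share attributes are stand-ins, not the real code):

```python
# Hypothetical sketch: classify shares by verinfo as they arrive, and hand
# the first bucket that collects k valid shares to the decoder.

def retrieve_first_recoverable(incoming_shares, decode):
    buckets = {}
    for share in incoming_shares:
        verinfo = (share.seqnum, share.roothash, share.salt, share.k, share.n)
        bucket = buckets.setdefault(verinfo, [])
        bucket.append(share)
        if len(bucket) >= share.k:  # enough shares to decode this version
            return decode(verinfo, bucket)
    return None  # no version became recoverable
```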

tahoe-lafs added the fixed label 2008-03-11 08:48:56 +00:00
warner closed this issue 2008-03-11 08:48:56 +00:00
warner commented 2008-03-11 08:51:50 +00:00
Author
Owner

Oh, also note that this change does nothing whatsoever about "rebalancing" mutable files to use more shares upon each successive update. In fact the code retains the behavior that shares are always updated in place rather than being moved, so if you upload 10 shares when there are only three peers on the network, those shares will remain bunched up on those three peers even after more peers have been added.
I don't know if we have an enhancement ticket to rebalance bunched-up shares when we find enough peers to do so.

zooko commented 2008-04-14 16:32:51 +00:00
Author
Owner

This was fixed in changeset:791482cf8de84a91 (the Trac changeset now known as changeset:791482cf8de84a91 was formerly known as changeset:10d3ea504540ae2f; until now the Trac timeline listed two patches, which have since been obliterated from our trunk).
