mutable file: survive encoding variations #312

Closed
opened 2008-02-13 19:25:55 +00:00 by warner · 7 comments
warner commented 2008-02-13 19:25:55 +00:00
Owner

The current mutable.py has a nasty bug lurking: since the encoding parameters
(k and N) are not included in the URI, a copy is put in each share. The
Retrieve code latches on to the first version it sees, and ignores the values
from all subsequently-fetched shares. If (for whatever reason) some clients
have uploaded the file with different parameters (specifically different
values of k, say 3-of-10 vs 2-of-6), then we could wind up feeding 3-of-10
shares into a zfec decoder configured for 2-of-6, which would cause silent
data corruption.

The first fix for this is to reject, with a `CorruptShareError`, any shares
whose encoding parameters differ from the values that we pulled from the
first share. That will at least prevent the possible data corruption.
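
As a rough sketch of that check (the `CorruptShareError` name comes from this ticket, but the helper and the share fields shown here are hypothetical, not the real mutable.py code):

```python
# Hypothetical sketch of the first fix: compare each share's embedded
# encoding parameters against the ones latched from the first share,
# and reject mismatches instead of feeding them to the zfec decoder.

class CorruptShareError(Exception):
    """Raised when a share disagrees with the encoding parameters in use."""

def check_encoding_parameters(share, expected_k, expected_n):
    # share.k and share.n stand in for whatever fields the real
    # share-parsing code exposes for the embedded parameters.
    if share.k != expected_k or share.n != expected_n:
        raise CorruptShareError(
            "share claims %d-of-%d but we expected %d-of-%d"
            % (share.k, share.n, expected_k, expected_n))
```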

The longer-term fix is to refactor Retrieve to treat k and N as part of the
'verinfo' index, along with seqnum and roothash and the salt. This
refactoring also calls for building up a table of available versions, and
then deciding which one (or ones) to decode on the basis of available shares
and highest seqnum. The new Retrieve class should be able to return multiple
versions, or indicate the presence of newer versions (that might not be
recoverable).
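
A sketch of what that verinfo bookkeeping might look like (the tuple layout and share attributes are illustrative assumptions, not the actual Retrieve internals):

```python
# Hypothetical sketch of the refactored bookkeeping: shares are grouped
# by their full verinfo tuple, and a version counts as recoverable once
# its bucket holds at least k valid shares.

from collections import defaultdict

def classify_shares(shares):
    # verinfo = (seqnum, roothash, salt, k, n); attribute names are made up.
    buckets = defaultdict(list)
    for share in shares:
        verinfo = (share.seqnum, share.roothash, share.salt, share.k, share.n)
        buckets[verinfo].append(share)
    return buckets

def best_recoverable_version(buckets):
    # Prefer the highest seqnum among versions with enough shares to decode;
    # verinfo tuples start with seqnum, so max() sorts by seqnum first.
    recoverable = [v for v, shares in buckets.items() if len(shares) >= v[3]]
    return max(recoverable, default=None)
```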

tahoe-lafs added the code-encoding, critical, defect, 0.7.0 labels 2008-02-13 19:25:55 +00:00
tahoe-lafs added this to the 0.8.0 (Allmydata 3.0 Beta) milestone 2008-02-13 19:25:55 +00:00
warner commented 2008-02-13 20:11:29 +00:00
Author
Owner

I've pushed the first fix for this. We still need to come up with a unit testing scheme for this stuff, addressed in #207.

warner commented 2008-02-13 20:12:24 +00:00
Author
Owner

Having that first fix in place addresses the immediate problem, so I'm lowering the severity and pushing the rest of this ticket out a release.

tahoe-lafs added major and removed critical labels 2008-02-13 20:12:24 +00:00
tahoe-lafs modified the milestone from 0.8.0 (Allmydata 3.0 Beta) to 0.9.0 (Allmydata 3.0 final) 2008-02-13 20:12:24 +00:00
tahoe-lafs modified the milestone from 0.9.0 (Allmydata 3.0 final) to undecided 2008-03-08 04:13:31 +00:00
warner commented 2008-03-08 07:11:12 +00:00
Author
Owner

If we want #332 to go into the 0.9.0 release, then we also need to fix #312. Do you agree? My concern is that existing dirnodes will wind up with multiple encodings, but maybe I'm wrong.

zooko commented 2008-03-08 14:31:17 +00:00
Author
Owner

Hm... yes it would be good to fix this, so that dirnodes produced by v0.8.0 can survive into v0.9.0 and get converted into K=1 dirnodes.

This is our first backwards compatibility decision. :-)

tahoe-lafs modified the milestone from undecided to 0.9.0 (Allmydata 3.0 final) 2008-03-08 14:31:17 +00:00
warner commented 2008-03-11 08:48:56 +00:00
Author
Owner

Fixed, in changeset:10d3ea504540ae2f. This retains the property that Retrieve will return with whatever version was recoverable first: it classifies all shares that it sees into buckets indexed by their full "verinfo" tuple: seqnum, roothash, encoding parameters. Whichever bucket gets enough valid shares to decode first will win.

The rest of the refactoring (to actually fetch and return multiple versions, and handle the "epsilon" anti-rollback parameter, etc) is left for ticket #205.
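
A minimal sketch of the "first recoverable version wins" behavior described above (the `decode` callable and share attributes are stand-ins, not the real code):

```python
# Hypothetical sketch: classify shares by verinfo as they arrive, and hand
# the first bucket that collects k valid shares to the decoder.

def retrieve_first_recoverable(incoming_shares, decode):
    buckets = {}
    for share in incoming_shares:
        verinfo = (share.seqnum, share.roothash, share.salt, share.k, share.n)
        bucket = buckets.setdefault(verinfo, [])
        bucket.append(share)
        if len(bucket) >= share.k:  # enough shares to decode this version
            return decode(verinfo, bucket)
    return None  # no version became recoverable
```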

tahoe-lafs added the fixed label 2008-03-11 08:48:56 +00:00
warner closed this issue 2008-03-11 08:48:56 +00:00
warner commented 2008-03-11 08:51:50 +00:00
Author
Owner

Oh, also note that this change does nothing whatsoever about "rebalancing" mutable files to use more shares upon each successive update. In fact the code retains the behavior that shares are always updated in place rather than being moved, so if you upload 10 shares when there are only three peers on the network, those shares will remain bunched up on those three peers even after more peers have been added.
I don't know if we have an enhancement ticket to rebalance bunched-up shares when we find enough peers to do so.

zooko commented 2008-04-14 16:32:51 +00:00
Author
Owner

This was fixed in changeset:791482cf8de84a91 (the Trac changeset now known as changeset:791482cf8de84a91 was formerly known as changeset:10d3ea504540ae2f; until now the Trac timeline listed two patches, which have since been obliterated from our trunk).
