move to Tahoe2 peer selection algorithm #16

Closed
opened 2007-04-28 19:06:21 +00:00 by zooko · 10 comments
zooko commented 2007-04-28 19:06:21 +00:00

Implement the [TahoeTwo](wiki/TahoeTwo) peer-selection algorithm for share upload. This is the
first-N-peers-in-the-permuted-list approach, as opposed to the [TahoeThree](wiki/TahoeThree)
"shares in baskets" scheme that's present in 0.5.0. [TahoeThree](wiki/TahoeThree) can result in
severe non-uniform distribution of shares (in some cases, all shares being
given to the same node), which hurts reliability, and the problem is worse in
small networks.

The share download is still ask-everybody, which will work up to maybe 100
peers. Eventually we'll move that to [TahoeTwo](wiki/TahoeTwo) as well.

In the longer run, we'll need to move to [DenverAirport](wiki/DenverAirport) to get beyond a few
hundred peers, but we're thinking post-0.6.0 for that.
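
For illustration, a minimal sketch of the first-N-peers-in-the-permuted-list idea (the hash choice, key material, and data structures here are assumptions for the example, not the actual Tahoe code):

```python
import hashlib

def permuted_peers(storage_index, peer_ids):
    """Order peers by sha256(peerid + storage_index); every node computes
    the same ordering for a given file, so upload and download agree.
    peer_ids and storage_index are assumed to be bytes."""
    return sorted(peer_ids,
                  key=lambda pid: hashlib.sha256(pid + storage_index).digest())

def tahoe_two_upload_map(storage_index, peer_ids, num_shares):
    """Give one share to each of the first peers in the permuted list,
    wrapping around when there are fewer peers than shares."""
    ring = permuted_peers(storage_index, peer_ids)
    placements = {}
    for shnum in range(num_shares):
        peer = ring[shnum % len(ring)]
        placements.setdefault(peer, []).append(shnum)
    return placements
```

On a three-peer grid with 100 shares this gives a 34/33/33 split, which is the uniformity property at issue in the comments below.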

tahoe-lafs added the minor, defect labels 2007-04-28 19:06:21 +00:00
tahoe-lafs added the documentation label 2007-04-28 19:17:26 +00:00
zooko commented 2007-05-16 15:20:58 +00:00

At this time, I'm not able to remember the reasons either. I distinctly recall them seeming like good reasons at the time. ;-)

I'll try to remember.

warner commented 2007-05-24 01:25:12 +00:00

It occurred to me, while pondering the consequences of varying our encoding parameters, that the [PeerSelection](wiki/PeerSelection)/TahoeTwo first-100-peers approach works very well if you know the verifierid but not the number of shares you need to get (you just ask successive peers until someone has some data, ask them for the encoding parameters, and then you know how many more peers you need to talk to).

On the other hand, the [PeerSelection](wiki/PeerSelection)/TahoeThree (100-uniformly-distributed-clock-hands) approach fails very badly in this case (at least once the mesh has significantly more than 100 peers).
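
A rough sketch of that download-side walk (the peer and share attributes are hypothetical stand-ins, just to show the control flow):

```python
import hashlib

def tahoe_two_download(verifierid, peers):
    """Walk the permuted peer list until enough shares are in hand.

    The first share found carries the encoding parameters, so k (the
    number of shares needed) is learned on the fly rather than carried
    in the URI.  peer.peerid, peer.get_shares(), and
    share.encoding_params are hypothetical."""
    ring = sorted(peers,
                  key=lambda p: hashlib.sha256(p.peerid + verifierid).digest())
    shares, needed = [], None
    for peer in ring:
        found = peer.get_shares(verifierid)
        shares.extend(found)
        if found and needed is None:
            needed = found[0].encoding_params.k
        if needed is not None and len(shares) >= needed:
            return shares   # enough shares; hand off to the decoder
    raise LookupError("not enough shares found for %r" % (verifierid,))
```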

zooko commented 2007-05-24 02:55:35 +00:00

Oh I see. You mean if you know the verifierid but not the encoding parameters. Interesting! It's not such a bad failure, though, if you think of it as just enlarging the URI a bit (by the addition of encoding parameters).

Which reminds me that in zfec I figured out a way to compress the parameters into a mere 2, 3, or 4 bytes...
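
This is not the actual zfec header layout, just an illustration of why a few bytes suffice, assuming k and m are at most 256 and the segment size is a power of two:

```python
import struct

def pack_params(k, m, segsize_exp=None):
    """Pack (k-1) and (m-1) into one byte each (valid while
    1 <= k <= m <= 256); an optional power-of-two segment size
    adds a third byte."""
    assert 1 <= k <= m <= 256
    out = struct.pack("BB", k - 1, m - 1)
    if segsize_exp is not None:
        out += struct.pack("B", segsize_exp)  # segment size = 2**segsize_exp
    return out

def unpack_params(data):
    k, m = data[0] + 1, data[1] + 1
    segsize = 2 ** data[2] if len(data) > 2 else None
    return k, m, segsize
```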

warner commented 2007-05-24 18:07:15 +00:00

Well, what I'm thinking is that the encoding parameters might change over time (perhaps because we decide that we don't need so much redundancy, perhaps because we switch to a new encoding mechanism). Having the encoding parameters in the URI means that we retain forwards-compatibility with these changes (depending upon how we compress/derive them, of course), but having them in the "share index" means that future people who encode the file one way won't discover the alternate encoding, either during upload (when they might refrain from making the new encoding because the old one is good enough), or during download (when they might find more shares from an alternate encoding than the one that they remember uploading).

It's kind of in the space of convergent encoding, I suppose: the mapping between crypttext (i.e. verifierid) and sets of shares. A one-to-one mapping is the most efficient (there will only be one encoded version of any given file), but one-to-many is likely to occur if we want to take advantage of our flexibility in choosing the encoding parameters.

zooko commented 2007-05-24 19:14:21 +00:00

Good points.

zooko commented 2007-06-04 22:17:54 +00:00

I think perhaps Brian has done this task by updating the [PeerSelection](wiki/PeerSelection) and [PeerSelection](wiki/PeerSelection)/TahoeThree pages?

warner commented 2007-06-05 02:54:20 +00:00

Nope... we still need a description of **why** we implemented tahoe3 instead of tahoe2. I've described the differences between them on those pages, but I can't remember our motivations that week, and I'm starting to believe that tahoe2 is a better choice than tahoe3. So I'm looking for history and explanation of our decision, as I can't come up with anything.

If zooko can't remember our reasons, then we should sit down and have a new talk about tahoe2 vs tahoe3.

warner commented 2007-07-08 08:40:36 +00:00

I'm finding more reasons to prefer [TahoeTwo](wiki/TahoeTwo) over [TahoeThree](wiki/TahoeThree): for small meshes, the random distribution of shares to nodes is very non-uniform in [TahoeThree](wiki/TahoeThree), which means that one node winds up with a lot more of the shares than the others. In our three-node testnet, I'm seeing some files drop 90 (out of 100) shares on a single host. That blows our failure probabilities completely: some files will permute in such a way that a single lost node will lose the entire file.

I think we might have picked [TahoeThree](wiki/TahoeThree) because we were working on implementation and wanted behavior that was easy to understand on both large meshes and small ones. [TahoeThree](wiki/TahoeThree) has the advantage that you can usually do your bucket-allocation request in a single message per peer, which improves network efficiency (and latency) considerably for small meshes. In [TahoeTwo](wiki/TahoeTwo) you basically have to send out a separate message for each share, and you need some extra logic to handle small meshes where you have to swing around to the same peer multiple times.

But I'm starting to think that the uniform-distribution properties of [TahoeTwo](wiki/TahoeTwo) are very important. Perhaps we can take some shortcuts in [TahoeTwo](wiki/TahoeTwo) to avoid the network overhead: say, we start by offering one share per node, but when we realize we've looped around and still have 97 shares to go, we divvy them up evenly among the peers who are still in the running, and the next time we ask one, we try to allocate a whole bunch of shares at once. If they refuse some, we equalize out the requests among all the remaining nodes.
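
A rough sketch of that shortcut, with a hypothetical peer.allocate(shnums) call that returns whichever share numbers the peer accepted:

```python
def allocate_with_batching(peers, num_shares):
    """First pass: one share per peer, to spread shares out.  Later
    passes: split the leftovers evenly among the peers still accepting,
    asking for whole batches at a time."""
    remaining = list(range(num_shares))
    placements = {}
    peers = list(peers)
    for peer in list(peers):                   # pass 1: one share each
        if not remaining:
            return placements
        accepted = peer.allocate(remaining[:1])
        if accepted:
            placements.setdefault(peer, []).extend(accepted)
            del remaining[0]
        else:
            peers.remove(peer)                 # full peer: out of the running
    while remaining and peers:                 # later passes: even batches
        batch = max(1, len(remaining) // len(peers))
        for peer in list(peers):
            if not remaining:
                break
            ask = remaining[:batch]
            accepted = peer.allocate(ask)
            placements.setdefault(peer, []).extend(accepted)
            remaining = [s for s in remaining if s not in accepted]
            if len(accepted) < len(ask):
                peers.remove(peer)             # refused some: stop asking it
    return placements
```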

I'm raising this one to 'major', because the potential for severe non-uniformity means we'll fail to meet our reliability goals for one of our significant use cases (small meshes among friends).

tahoe-lafs added the major label and removed the minor label 2007-07-08 08:40:36 +00:00
tahoe-lafs added the code-peerselection, 0.5.0 labels and removed the documentation label 2007-08-20 19:18:56 +00:00
tahoe-lafs changed title from explain/justify our current Peer Selection to consider moving to Tahoe2 peer selection algorithm 2007-08-20 19:18:56 +00:00
warner commented 2007-08-20 20:45:10 +00:00

In chatting with zooko, I've decided to add this to the list of stuff to do for 0.6, because the non-uniformity problem is important in my mind. Beyond 0.6, we probably need to sit down and work on DEN, because both [TahoeTwo](wiki/TahoeTwo) and [TahoeThree](wiki/TahoeThree) require a fully-connected mesh, and I'd be leery of using that beyond a few hundred nodes.

tahoe-lafs added this to the 0.6.0 milestone 2007-08-20 20:45:10 +00:00
tahoe-lafs changed title from consider moving to Tahoe2 peer selection algorithm to move to Tahoe2 peer selection algorithm 2007-08-20 20:45:10 +00:00
warner commented 2007-09-16 08:34:35 +00:00

Done, in changeset:979d12cd42f95e36 and changeset:baa16087cd7616a2.

At some point in the future, we should sit down and figure out what happens when we go beyond a few hundred nodes, and what we think we can do about it.

tahoe-lafs added the fixed label 2007-09-16 08:34:35 +00:00
warner closed this issue 2007-09-16 08:34:35 +00:00