downloader: coordinate crypttext_hash_tree requests #1544

Open
opened 2011-09-26 00:16:48 +00:00 by warner · 0 comments
warner commented 2011-09-26 00:16:48 +00:00
Owner

One performance improvement idea from the #1264 and Performance/Sep2011 analysis work is to reduce the number of read() requests roughly in half by introducing cross-Share coordination of crypttext_hash_tree node fetches.

Each share should have identical copies of the crypttext_hash_tree (and, if we ever bring it back, the plaintext_hash_tree too). To produce a validated copy of segment0, we need to fetch the crypttext_hash_tree nodes that form the Merkle-tree "uncle chain" for seg0. That means fetching log2(numsegs) hash nodes.
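
For concreteness, here is a minimal sketch (not Tahoe code; in the real downloader the shared IncompleteHashTree decides which nodes are still needed) showing that the uncle chain for segment 0 is just the sibling of each node on the path from that leaf up to the root, i.e. log2(numsegs) nodes:

```python
def uncle_chain(leafnum, num_leaves):
    """Return the flat-array indices of the hash nodes needed to verify one
    leaf against the root of a complete binary Merkle tree.

    The tree is stored breadth-first in a flat list: node 0 is the root and
    the children of node i are 2*i+1 and 2*i+2.  num_leaves is assumed to be
    a power of two (padded if necessary), so the leaves occupy indices
    num_leaves-1 .. 2*num_leaves-2.
    """
    needed = []
    i = (num_leaves - 1) + leafnum              # flat index of the leaf
    while i > 0:
        sibling = i + 1 if (i % 2) else i - 1   # odd indices are left children
        needed.append(sibling)
        i = (i - 1) // 2                        # move up to the parent
    return needed

# A 64-segment file needs log2(64) = 6 uncle nodes to validate segment 0:
print(uncle_chain(0, 64))   # -> [64, 32, 16, 8, 4, 2]
```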

At present, each Share treats this task as its own personal duty: when calculating the "desire" bitmap, the Share checks the common IncompleteHashTree to see which nodes are still needed, then sends enough requests to fetch all of the missing nodes. Because each Share performs this calculation at about the same time (before any server responses have come back), all Shares will conclude that they need the full uncle chain. So there will be lots of parallel requests that will return the same data. All but the first will be discarded.
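
As a toy illustration of the redundancy (reusing the uncle_chain() sketch above): with three Shares being read in parallel from a 64-segment file, each Share computes the same six-node uncle chain before any response has come back, so eighteen hash-node reads go out and twelve of the answers are discarded.

```python
shares = ["sh0", "sh1", "sh2"]               # three Shares fetched in parallel
needed = uncle_chain(0, 64)                  # the same 6 nodes, computed by each Share
requests = [(sh, node) for sh in shares for node in needed]
print(len(requests))                         # 18 reads issued; only 6 are ever used
```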

The improvement would be to have the Shares coordinate these overlapping reads. The first Share to check the hashtree should somehow "claim" the hash node: it will send a request, and other Shares will refrain from sending that request; instead they'll use a Deferred or Observer or something to find out when the uncle chain is available. If the first Share's request fails, then some other Share should be elected to send its own request. Ideally this would prefer a different server than the first one (if there are two Shares on the same server, and the first one failed to provide the hash node, the second one is not very likely to work either).
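
A minimal sketch of the claim/observe idea, assuming Twisted Deferreds (which the downloader already uses). The HashNodeCoordinator name and its methods are hypothetical, not existing Tahoe APIs; the real thing would live next to the shared IncompleteHashTree:

```python
from twisted.internet import defer

class HashNodeCoordinator(object):
    def __init__(self):
        self._claimed = set()    # node numbers some Share has promised to fetch
        self._observers = {}     # node number -> list of Deferreds to fire

    def claim(self, nodenum):
        """Return True if the caller should fetch this node itself,
        False if another Share has already claimed it."""
        if nodenum in self._claimed:
            return False
        self._claimed.add(nodenum)
        return True

    def when_available(self, nodenum):
        """Return a Deferred that fires with (nodenum, hashvalue) once
        any Share delivers the node."""
        d = defer.Deferred()
        self._observers.setdefault(nodenum, []).append(d)
        return d

    def node_received(self, nodenum, hashvalue):
        """Called by whichever Share's read() succeeds first."""
        for d in self._observers.pop(nodenum, []):
            d.callback((nodenum, hashvalue))

    def claim_failed(self, nodenum):
        """Called when the claiming Share's request fails; releasing the
        claim lets another Share (ideally on a different server) retry."""
        self._claimed.discard(nodenum)
```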

Also, there needs to be a timeout/impatience mechanism: if the first Share hasn't yielded a result by the time the other data blocks have arrived, we should consider sending extra requests.
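
One way to express the impatience timer, again only a sketch on top of the hypothetical coordinator above: if the claimed node hasn't shown up shortly after our own data block did, release the claim and re-request through another Share. The one-second delay is an arbitrary placeholder, and resend_desire() is a made-up hook for "recompute the desire bitmap and send new requests".

```python
from twisted.internet import reactor

def block_arrived(share, coordinator, nodenum, still_missing):
    """Called when a Share's data block arrives but a claimed hash node
    for the same segment is still outstanding."""
    def impatient():
        if still_missing(nodenum):
            coordinator.claim_failed(nodenum)   # let a different Share claim it
            share.resend_desire()               # hypothetical: recompute and resend requests
    reactor.callLater(1.0, impatient)
```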

This isn't trivial, because it requires new code that can coordinate between otherwise-independent Shares. The performance improvement is considerable as long as the downloader lacks readv() support. Once that's in place, the marginal improvement provided by coordinated requests may be too small to be worth the effort: less I/O and less data transmitted (it scales with N but is still a small fraction of the total data sent), but no fewer remote_readv() messages.

tahoe-lafs added the code-encoding, minor, enhancement, 1.9.0a2 labels 2011-09-26 00:16:48 +00:00
tahoe-lafs added this to the undecided milestone 2011-09-26 00:16:48 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1544