start downloading as soon as you know where to get K shares #928

Closed
opened 2010-01-27 06:07:03 +00:00 by zooko · 23 comments
zooko commented 2010-01-27 06:07:03 +00:00
Owner

The current code (changeset:2a2cc5c4b8a490ae) performs immutable file download in four phases. In phase 1 it sends a get_buckets() remote invocation message to every storage server that it knows about ([CiphertextDownloader._get_all_shareholders()]source:src/allmydata/immutable/download.py@4164#L871). In phase 2 it requests the CEB (a.k.a. UEB) from each server in turn until it gets a valid CEB ([CiphertextDownloader._obtain_uri_extension()]source:src/allmydata/immutable/download.py@4164#L956). In phase 3 it requests the crypttext hash tree from each server in turn until it gets a valid hash tree ([CiphertextDownloader._get_crypttext_hash_tree()]source:src/allmydata/immutable/download.py@4164#L1002). In phase 4 is actually downloads the blocks of data from servers in parallel. Out of the shares which it learned about in phase 1, which shares does it choose? It chooses "primary shares" first (their sharenum is < K) because those can be erasure-decoded at no computational cost, then it chooses randomly from among secondary shares until it has K.

Now currently phase 1 does not end, and therefore phase 2 does not begin, until all servers have answered the get_buckets()! To close this ticket, make it so phase 1 ends as soon as at least K buckets have been found.

The nice non-invasive way to do this is to replace the DeferredList in [CiphertextDownloader._get_all_shareholders()]source:src/allmydata/immutable/download.py@4164#L871 with an object that acts like a DeferredList but fires once at least K shares (or possibly once at least K servers) have been found.

The current code (changeset:2a2cc5c4b8a490ae) performs immutable file download in four phases. In phase 1 it sends a `get_buckets()` remote invocation message to every storage server that it knows about ([CiphertextDownloader._get_all_shareholders()]source:src/allmydata/immutable/download.py@4164#L871). In phase 2 it requests the CEB (a.k.a. UEB) from each server in turn until it gets a valid CEB ([CiphertextDownloader._obtain_uri_extension()]source:src/allmydata/immutable/download.py@4164#L956). In phase 3 it requests the crypttext hash tree from each server in turn until it gets a valid hash tree ([CiphertextDownloader._get_crypttext_hash_tree()]source:src/allmydata/immutable/download.py@4164#L1002). In phase 4 is actually downloads the blocks of data from servers in parallel. Out of the shares which it learned about in phase 1, which shares does it choose? It chooses "primary shares" first (their sharenum is < K) because those can be erasure-decoded at no computational cost, then it chooses randomly from among secondary shares until it has K. Now currently phase 1 does not end, and therefore phase 2 does not begin, until all servers have answered the `get_buckets()`! To close this ticket, make it so phase 1 ends as soon as at least `K` buckets have been found. The nice non-invasive way to do this is to replace the [DeferredList](wiki/DeferredList) in [CiphertextDownloader._get_all_shareholders()]source:src/allmydata/immutable/download.py@4164#L871 with an object that acts like a [DeferredList](wiki/DeferredList) but fires once at least `K` shares (or possibly once at least `K` servers) have been found.
tahoe-lafs added the
code-peerselection
major
defect
1.5.0
labels 2010-01-27 06:07:03 +00:00
tahoe-lafs added this to the 1.6.0 milestone 2010-01-27 06:07:03 +00:00
zooko commented 2010-01-27 16:18:22 +00:00
Author
Owner

Here's a patch with no tests. David-Sarah offered to write tests, last night. If they don't, I'll write tests this evening.

Here's a patch with no tests. David-Sarah offered to write tests, last night. If they don't, I'll write tests this evening.
zooko commented 2010-01-27 16:18:59 +00:00
Author
Owner

Attachment dontwait.darcspatch.txt (46922 bytes) added

**Attachment** dontwait.darcspatch.txt (46922 bytes) added
zooko commented 2010-01-27 16:20:32 +00:00
Author
Owner

That patch had a typo that was caught by the current unit tests. Here is the patch with the typo fixed.

That patch had a typo that was caught by the current unit tests. Here is the patch with the typo fixed.
zooko commented 2010-01-27 16:20:46 +00:00
Author
Owner

Attachment dontwait2.darcspatch.txt (46923 bytes) added

**Attachment** dontwait2.darcspatch.txt (46923 bytes) added
zooko commented 2010-01-27 17:40:29 +00:00
Author
Owner

Amber, driving me to work and listening to me talk about this patch, pointed out that we need to count number of unique share numbers instead of total number of buckets. For example, if we contact all servers, and they all reply "Oh yes! I do have a bucket -- here it, for share number 0.", then with the current patch -- dontwait2.darcspatch.txt -- it will think that we got enough shares to download the file although we didn't. Here is a new patch named dontwait3.darcspatch.txt which fixes this.

Looking at the code to fix this, it appears to me that the current trunk has a similar same problem. It measures len(self_share_buckets) to determine if it should abort or not in [CiphertextDownloader._got_all_shareholders()]source:src/allmydata/immutable/download.py@4164#L938, and if there are insufficient buckets then it errors out with Failed to get enough shareholders: have %d, need %d. However, if it has enough buckets but then it turns out that it doesn't have enough unique shares then this will be caught later, in,
len(self._shares[source:src/allmydata/immutable/download.py@4164#L1025 CiphertextDownloader._activate_enough_buckets()], and reported with the error message ```Unable to activate enough shares: have %d, need %d.

I think this patch will eliminate the possibility of that second case, so all such failures will be detected and reported in the same way.

We should add a test of this case.

Amber, driving me to work and listening to me talk about this patch, pointed out that we need to count number of unique share numbers instead of total number of buckets. For example, if we contact all servers, and they all reply "Oh yes! I do have a bucket -- here it, for share number 0.", then with the current patch -- dontwait2.darcspatch.txt -- it will think that we got enough shares to download the file although we didn't. Here is a new patch named dontwait3.darcspatch.txt which fixes this. Looking at the code to fix this, it appears to me that the current trunk has a similar same problem. It measures `len(self_share_buckets)` to determine if it should abort or not in [CiphertextDownloader._got_all_shareholders()]source:src/allmydata/immutable/download.py@4164#L938, and if there are insufficient buckets then it errors out with `Failed to get enough shareholders: have %d, need %d`. However, if it has enough buckets but then it turns out that it doesn't have enough unique shares then this will be caught later, in, `len(self._shares[source:src/allmydata/immutable/download.py@4164#L1025 CiphertextDownloader._activate_enough_buckets()], and reported with the error message ```Unable to activate enough shares: have %d, need %d`. I think this patch will eliminate the possibility of that second case, so all such failures will be detected and reported in the same way. We should add a test of this case.
zooko commented 2010-01-27 19:06:40 +00:00
Author
Owner

So here is the latest version. This fails the current tests, not to mention the new tests that we need to write for it, possibly because the deferred returned by CiphertextDownloader._got_all_shareholders() can now errback in addition to callback, where on trunk it can only callback. Looking at the chain in [CiphertextDownloader.start()]source:src/allmydata/immutable/download.py?rev=4164#L833, it seems to me like an errback from _get_all_shareholders() should be nicely handled by _finished() and then by _failed(). Hm.

So here is the latest version. This fails the current tests, not to mention the new tests that we need to write for it, possibly because the deferred returned by `CiphertextDownloader._got_all_shareholders()` can now errback in addition to callback, where on trunk it can only callback. Looking at the chain in [CiphertextDownloader.start()]source:src/allmydata/immutable/download.py?rev=4164#L833, it seems to me like an errback from `_get_all_shareholders()` should be nicely handled by `_finished()` and then by `_failed()`. Hm.
zooko commented 2010-01-27 19:07:34 +00:00
Author
Owner

Attachment dontwait3.darcspatch.txt (49287 bytes) added

**Attachment** dontwait3.darcspatch.txt (49287 bytes) added
warner commented 2010-01-27 19:53:10 +00:00
Author
Owner

FWIW, my Downloader rewrite will include this feature.

Also, for the currently-proposed patch, it would be a good idea to make sure that we grab the UEB from each share (and validate it) before starting the download. Specifically I want to make sure that if the storage server can open the file (and therefore returns YES to the get_buckets query), but then experiences errors or corruption when actually reading the share file, that the downloader will correctly switch over to a different share. I don't know if the current downloader can switch to a new share after it starts the download process for real.

FWIW, my Downloader rewrite will include this feature. Also, for the currently-proposed patch, it would be a good idea to make sure that we grab the UEB from each share (and validate it) before starting the download. Specifically I want to make sure that if the storage server can open the file (and therefore returns YES to the get_buckets query), but then experiences errors or corruption when actually reading the share file, that the downloader will correctly switch over to a different share. I don't know if the current downloader can switch to a new share after it starts the download process for real.
zooko commented 2010-01-27 20:53:55 +00:00
Author
Owner

I figured your Downloader rewrite would. Also, probably lots of other improvements such as more pipelining. Is there a list of your intended improvements to Downloader? I have vague recollections of lots of good ideas you had for Downloader improvements.

I still think it is a good idea to apply this patch now, however, (after thorough tests and code review) because:

  1. Now I understand that the bad behavior I've been seeing (especially on the allmydata.com prod grid) in which downloads hang is not caused solely by a server failing during a download, as I formerly thought (#287), but is caused by there being any server connected to the network which is in the hung state (such that it maintains its TCP connections but refuses to answer get_buckets()). With current trunk, as long as there is any such server connected to a grid then all downloads from that grid will hang.
  2. Likewise, with current trunk, the slowest server (even if it isn't completely hung) determines the alacrity of beginning an immutable file download. This explains the behavior that I've observed in which all downloads take a few seconds to start (because there is one server on that grid which is slow or overloaded).
  3. With this patch, you'll download from the K servers that answered the get_buckets first (assuming only one share per server) instead of the K servers that have primary shares (or, in the case that you don't get K servers with primary shares, random servers with secondary shares). This sounds potentially a nice performance improvement, especially for heterogeneous and geographically spread-out grids.
  4. This patch is nicely self-contained, as I hope you (Brian) will take the time to verify by reviewing it. It could be made more self-contained by changing it to callback instead of errback when K buckets couldn't be located (as described in comment:116725), and I should probably do so out of an abundance of caution, but I intend to first examine why the errback doesn't do what I expect. I guess it could also be made smaller by taking out the part that changes reporting of status from "responses received/queries sent" to "responses received+queries failed/queries sent". I changed that only because it seemed slightly inaccurate to omit the queries failed in the reporting, but it isn't really necessary for this patch.
I figured your Downloader rewrite would. Also, probably lots of other improvements such as more pipelining. Is there a list of your intended improvements to Downloader? I have vague recollections of lots of good ideas you had for Downloader improvements. I still think it is a good idea to apply this patch now, however, (after thorough tests and code review) because: 1. Now I understand that the bad behavior I've been seeing (especially on the allmydata.com prod grid) in which downloads hang is *not* caused solely by a server failing *during* a download, as I formerly thought (#287), but is caused by there being any server connected to the network which is in the hung state (such that it maintains its TCP connections but refuses to answer `get_buckets()`). With current trunk, as long as there is any such server connected to a grid then all downloads from that grid will hang. 2. Likewise, with current trunk, the slowest server (even if it isn't completely hung) determines the alacrity of beginning an immutable file download. This explains the behavior that I've observed in which all downloads take a few seconds to start (because there is one server on that grid which is slow or overloaded). 3. With this patch, you'll download from the K servers that answered the `get_buckets` first (assuming only one share per server) instead of the K servers that have primary shares (or, in the case that you don't get K servers with primary shares, random servers with secondary shares). This sounds potentially a nice performance improvement, especially for heterogeneous and geographically spread-out grids. 4. This patch is nicely self-contained, as I hope you (Brian) will take the time to verify by reviewing it. It could be made *more* self-contained by changing it to callback instead of errback when K buckets couldn't be located (as described in [comment:116725](/tahoe-lafs/trac-2024-07-25/issues/928#issuecomment-116725)), and I should probably do so out of an abundance of caution, but I intend to first examine why the errback doesn't do what I expect. I guess it could also be made smaller by taking out the part that changes reporting of status from "responses received/queries sent" to "responses received+queries failed/queries sent". I changed that only because it seemed slightly inaccurate to omit the queries failed in the reporting, but it isn't really necessary for this patch.
zooko commented 2010-01-27 21:06:53 +00:00
Author
Owner

Here is a version of the patch which omits the status reporting change and makes the deferred returned by _get_all_shareholders() callback instead of errback when it runs out of outstanding queries.

Here is a version of the patch which omits the status reporting change and makes the deferred returned by `_get_all_shareholders()` callback instead of errback when it runs out of outstanding queries.
zooko commented 2010-01-27 21:07:30 +00:00
Author
Owner

Attachment dontwait4.darcspatch.txt (49992 bytes) added

**Attachment** dontwait4.darcspatch.txt (49992 bytes) added
zooko commented 2010-01-27 21:12:43 +00:00
Author
Owner

Replying to warner:

Also, for the currently-proposed patch, it would be a good idea to make sure that we grab the UEB from each share (and validate it) before starting the download. Specifically I want to make sure that if the storage server can open the file (and therefore returns YES to the get_buckets query), but then experiences errors or corruption when actually reading the share file, that the downloader will correctly switch over to a different share. I don't know if the current downloader can switch to a new share after it starts the download process for real.

The current downloader cannot switch to a new share after it starts the download process. What you suggest here would be a good idea, but it would require a bigger, more disruptive patch than the current patch. The current patch basically only changes one thing: when does the deferred returned from _get_all_shareholders() fire. It changes it from firing once all queries have resolved, to firing as soon as K shares are found (or all queries are resolved). To make the improvement you suggest here we would have to chain the verification of UEB (and presumably also of ciphertext hash tree) to the remote get_buckets query which would reorder the control flow.

Now I don't think it is a big loss if we get only K chances to find an uncorrupted UEB (and ciphertext hash tree) versus getting ~N chances. In practice I guess we've never seen a corrupted UEB, so we always succeed on the first UEB we try. But the issue that this ticket is intended to address is a big issue in practice, so I would be happy with the trade-off.

Better handling of corrupted UEBs, ciphertext hash trees, and blocks would then be an improvement for your Downloader rewrite.

Replying to [warner](/tahoe-lafs/trac-2024-07-25/issues/928#issuecomment-116726): > Also, for the currently-proposed patch, it would be a good idea to make sure that we grab the UEB from each share (and validate it) before starting the download. Specifically I want to make sure that if the storage server can open the file (and therefore returns YES to the get_buckets query), but then experiences errors or corruption when actually reading the share file, that the downloader will correctly switch over to a different share. I don't know if the current downloader can switch to a new share after it starts the download process for real. The current downloader cannot switch to a new share after it starts the download process. What you suggest here would be a good idea, but it would require a bigger, more disruptive patch than the current patch. The current patch basically only changes one thing: when does the deferred returned from `_get_all_shareholders()` fire. It changes it from firing once all queries have resolved, to firing as soon as K shares are found (or all queries are resolved). To make the improvement you suggest here we would have to chain the verification of UEB (and presumably also of ciphertext hash tree) to the remote `get_buckets` query which would reorder the control flow. Now I don't think it is a big loss if we get only K chances to find an uncorrupted UEB (and ciphertext hash tree) versus getting ~N chances. In practice I guess we've never seen a corrupted UEB, so we always succeed on the first UEB we try. But the issue that this ticket is intended to address *is* a big issue in practice, so I would be happy with the trade-off. Better handling of corrupted UEBs, ciphertext hash trees, and blocks would then be an improvement for your Downloader rewrite.
zooko commented 2010-01-27 21:14:19 +00:00
Author
Owner

There was an edit-o mistake in a comment in the previous version of the patch. Here it is with that edit-o fixed.

There was an edit-o mistake in a comment in the previous version of the patch. Here it is with that edit-o fixed.
zooko commented 2010-01-27 21:14:38 +00:00
Author
Owner

Attachment dontwait5.darcspatch.txt (49989 bytes) added

**Attachment** dontwait5.darcspatch.txt (49989 bytes) added
zooko commented 2010-01-27 23:47:24 +00:00
Author
Owner

Brian reviewed this and made a couple of comments on IRC which I took into account in this next version (6) of the patch.

Brian reviewed this and made a couple of comments on IRC which I took into account in this next version (6) of the patch.
zooko commented 2010-01-27 23:48:43 +00:00
Author
Owner

Attachment dontwait6.darcspatch.txt (51432 bytes) added

**Attachment** dontwait6.darcspatch.txt (51432 bytes) added
davidsarah commented 2010-01-28 06:54:04 +00:00
Author
Owner

Attachment no-network-hang-diff.txt (2259 bytes) added

Support for hanging servers until a Deferred fires in a NoNetworkGrid

**Attachment** no-network-hang-diff.txt (2259 bytes) added Support for hanging servers until a Deferred fires in a [NoNetworkGrid](wiki/NoNetworkGrid)
davidsarah commented 2010-01-28 06:56:16 +00:00
Author
Owner

Attachment test_hung_server.py (8458 bytes) added

Incomplete tests for download when some servers are hung

**Attachment** test_hung_server.py (8458 bytes) added Incomplete tests for download when some servers are hung
davidsarah commented 2010-01-29 12:47:54 +00:00
Author
Owner

Attachment new-tests-for-928-darcspatch.txt (67419 bytes) added

New tests for #928 (patch bundle includes dontwait6). Problem with errback instead of callback fixed; all existing tests run green. Known bug in _copy_all_shares_from.

**Attachment** new-tests-for-928-darcspatch.txt (67419 bytes) added New tests for #928 (patch bundle includes dontwait6). Problem with errback instead of callback fixed; all existing tests run green. Known bug in _copy_all_shares_from.
zooko commented 2010-01-29 18:54:37 +00:00
Author
Owner

Attachment dontwait7.darcspatch.txt (75983 bytes) added

**Attachment** dontwait7.darcspatch.txt (75983 bytes) added
zooko commented 2010-01-29 18:55:03 +00:00
Author
Owner

Here is a fix (dontwait7.darcspatch.txt) for the problem in _copy_share(), rewrite line-endings to by '\n', and add a comment. (Perhaps you should re-record these patches into a single patch that we can then apply to close this ticket.)

Brian: David-Sarah's patch, included in "new-tests-for-928-darcspatch.txt" fixes two bugs in my patch and is cleaner without being more invasive. Please re-review! (I've already reviewed it and will commit it, but it wouldn't hurt for you to review it as well.)

The two bugs were:

  1. In my patch I errback()'ed in one place when I should have callback()'ed in all three places. (David-Sarah's patch improves this by coalescing two of those paths so they result in a single function.)
  2. In my patch I checked for out-of-servers condition inside the for sharenum, bucket in buckets.iteritems():, so if the final server answered the query but said it had zero buckets then that check would be wrongly omitted.
Here is a fix (dontwait7.darcspatch.txt) for the problem in _copy_share(), rewrite line-endings to by '\n', and add a comment. (Perhaps you should re-record these patches into a single patch that we can then apply to close this ticket.) Brian: David-Sarah's patch, included in "new-tests-for-928-darcspatch.txt" fixes two bugs in my patch and is cleaner without being more invasive. Please re-review! (I've already reviewed it and will commit it, but it wouldn't hurt for you to review it as well.) The two bugs were: 1. In my patch I errback()'ed in one place when I should have callback()'ed in all three places. (David-Sarah's patch improves this by coalescing two of those paths so they result in a single function.) 2. In my patch I checked for out-of-servers condition *inside* the `for sharenum, bucket in buckets.iteritems():`, so if the final server answered the query but said it had zero buckets then that check would be wrongly omitted.
davidsarah commented 2010-01-30 06:58:14 +00:00
Author
Owner

Attachment dontwait8.darcspatch.txt (90010 bytes) added

All patches for #928

**Attachment** dontwait8.darcspatch.txt (90010 bytes) added All patches for #928
zooko commented 2010-01-30 07:07:54 +00:00
Author
Owner

fixed by changeset:2bd9dfa5bdacb381, changeset:baa11a0ad4f4bafe, changeset:d62428c1e6e291da, changeset:37a242e01af6cf76. Thank you, David-Sarah!

fixed by changeset:2bd9dfa5bdacb381, changeset:baa11a0ad4f4bafe, changeset:d62428c1e6e291da, changeset:37a242e01af6cf76. Thank you, David-Sarah!
tahoe-lafs added the
fixed
label 2010-01-30 07:07:54 +00:00
zooko closed this issue 2010-01-30 07:07:54 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: tahoe-lafs/trac-2024-07-25#928
No description provided.