pipeline download blocks for better performance #1110

Open
opened 2010-07-06 21:28:53 +00:00 by zooko · 5 comments
zooko commented 2010-07-06 21:28:53 +00:00
Owner

As Brian and I have discussed in person, downloads would probably be a bit faster for some users if we pipelined requests for successive blocks. Brian and I casually agreed that a pipeline depth of 2 would probably be pretty good for lots of users.
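To make the idea concrete, here is a minimal sketch of a bounded request pipeline. It is not the Tahoe-LAFS downloader: it uses `asyncio` from the standard library rather than the project's own networking code, and `fetch_block` and `download_pipelined` are hypothetical names invented for this illustration. It only shows what "a pipeline depth of 2" means: at most two block requests are outstanding at any time, and blocks are still delivered in order.

```python
import asyncio
from collections import deque


async def fetch_block(blocknum, rtt=0.01):
    # Hypothetical stand-in for a remote block read: one simulated round trip.
    await asyncio.sleep(rtt)
    return b"block-%d" % blocknum


async def download_pipelined(num_blocks, depth=2, fetch=fetch_block):
    """Return blocks 0..num_blocks-1 in order, keeping at most `depth`
    requests in flight at once (depth=1 is the unpipelined case)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    in_flight = deque()   # outstanding requests, oldest first
    results = []
    next_block = 0
    while next_block < num_blocks or in_flight:
        # Top up the pipeline before waiting on the oldest request.
        while next_block < num_blocks and len(in_flight) < depth:
            in_flight.append(asyncio.create_task(fetch(next_block)))
            next_block += 1
        # Always wait for the oldest request, so output order is preserved.
        results.append(await in_flight.popleft())
    return results


if __name__ == "__main__":
    blocks = asyncio.run(download_pipelined(5, depth=2))
    print(len(blocks), "blocks fetched")
```

Waiting only on the oldest request keeps the logic simple and the output ordered; a real implementation would also need error handling and cancellation of outstanding requests, which this sketch leaves out.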

tahoe-lafs added the code-network, major, enhancement, and 1.7.0 labels 2010-07-06 21:28:53 +00:00
tahoe-lafs added this to the 1.8.0 milestone 2010-07-06 21:28:53 +00:00
tahoe-lafs modified the milestone from 1.8.0 to eventually 2010-08-05 21:29:25 +00:00
zooko commented 2010-08-05 21:38:07 +00:00
Author
Owner

(http://tahoe-lafs.org/pipermail/tahoe-dev/2010-August/004909.html) says:

This is a good example of how pipelining download of blocks (#1110) could help. Previously I thought of it as a performance improvement when downloading successive blocks of the same file. Therefore I figured that if you were doing streaming processing of the file, such as if it was a movie and you were playing it out at normal speed, then a sufficiently large segment size would make the download faster than your playout speed, so pipelining would not matter for that. But this example shows that for some cases the segment size is irrelevant—in this case (if Brian's guess is correct) a read-block-pipeline depth of >= 2 would take one round-trip off of the startup time.
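The round-trip saving can be illustrated with a toy timing model. This is a sketch under simplifying assumptions that are not taken from the thread: the server answers instantly, blocks arrive one after another over a single link, and a new request is issued as soon as a pipeline slot frees up. The function `download_time` and its parameters are hypothetical and do not model the actual downloader or the specific case in the linked message.

```python
def download_time(num_blocks, depth, rtt, xfer):
    """Time until the last block has arrived, in the same units as rtt/xfer.

    Toy model (assumptions, not measurements):
      - a request takes one round trip (rtt) before its data starts arriving
      - each block then takes xfer to transfer, and blocks arrive sequentially
      - request i is issued once block i - depth has fully arrived
    """
    finish = []  # finish[i] = arrival time of the last byte of block i
    for i in range(num_blocks):
        issued = 0 if i < depth else finish[i - depth]
        prev_done = finish[i - 1] if i > 0 else 0
        start = max(issued + rtt, prev_done)
        finish.append(start + xfer)
    return finish[-1] if finish else 0


# Two blocks, 100 ms round trip, 50 ms transfer per block:
unpipelined = download_time(2, depth=1, rtt=100, xfer=50)  # 300
pipelined = download_time(2, depth=2, rtt=100, xfer=50)    # 200
print(unpipelined - pipelined)  # 100, i.e. one round trip
```

In this model the saving for two dependent-free requests is exactly one round trip regardless of block size, which is the sense in which segment size stops mattering.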

zooko commented 2010-09-06 06:44:40 +00:00
Author
Owner

(http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005151.html)

Kyle's benchmarks and discussion and links to code:

Intriguing! It looks like upload typically took about 150 seconds and download took at least 850! Upload has pipelining (source:trunk/src/allmydata/immutable/layout.py@4655#L118) and download doesn't (source:trunk/src/allmydata/immutable/downloader/share.py@4707#L181). I wonder if that could account for all of that large difference!

This is issue #1110. It would probably make an excellent first hack for a new Tahoe-LAFS coder in the v1.9 timeframe. :-)

davidsarah commented 2010-09-07 00:44:37 +00:00
Author
Owner

#1187 is a more ambitious generalization of this ticket. If you pipeline successive shares, but still download a fixed set of shares per segment per server, then the potential gain is limited by the fact that for each segment, you still have to wait for the server that finishes last. What #1187 proposes would tend to keep the pipe from each server as full as possible, by downloading as many shares from each server as bandwidth allows.

It may be worth doing this ticket first, but with an eye to how to extend it.
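A small numeric sketch of the "wait for the server that finishes last" limit. The per-server timings below are invented for illustration, and both function names are hypothetical. The second function is only an idealized bound for servers that never wait on each other; it is not a model of what #1187 actually proposes.

```python
def lockstep_time(times):
    """Total time when every segment waits for its slowest server.

    times[s][k] is the time server s needs to deliver its share of segment k.
    """
    return sum(max(per_segment) for per_segment in zip(*times))


def unsynchronized_time(times):
    """Idealized bound: each server streams its shares back to back,
    so the download ends when the busiest server finishes."""
    return max(sum(per_server) for per_server in times)


# Two servers, three segments; each server is slow on a different segment.
times = [
    [1, 3, 1],  # server A
    [3, 1, 1],  # server B
]
print(lockstep_time(times))        # 3 + 3 + 1 = 7
print(unsynchronized_time(times))  # max(5, 5) = 5
```

The gap between the two numbers is the time lost to per-segment synchronization in this toy example.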

zooko commented 2010-09-07 16:10:09 +00:00
Author
Owner

Marking this as 1.9.0 and "unfinished-business" as mentioned in http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005163.html.

tahoe-lafs modified the milestone from eventually to 1.9.0 2010-09-07 16:10:09 +00:00
tahoe-lafs modified the milestone from 1.9.0 to soon 2011-07-27 18:22:37 +00:00
daira commented 2014-03-02 13:50:21 +00:00
Author
Owner

Eek, I'm shocked this isn't fixed yet.

Reference: tahoe-lafs/trac-2024-07-25#1110