Error in Latency Report on MDMF Retrieve #1505

Closed
opened 2011-08-26 18:21:55 +00:00 by zancas · 6 comments
zancas commented 2011-08-26 18:21:55 +00:00
Owner

I viewed a write cap for an MDMF with an associated 8 MB file in the WUI. The retrieve operation reported a 41-year latency in Per-Server Fetch Response Times. All servers were running on the same machine (my laptop).

tahoe-lafs added the
unknown
major
defect
1.9.0a1
labels 2011-08-26 18:21:55 +00:00
tahoe-lafs added this to the undecided milestone 2011-08-26 18:21:55 +00:00
zancas commented 2011-08-26 18:22:07 +00:00
Author
Owner

Attachment response_time_error.html (3197 bytes) added

kevan commented 2011-08-27 16:59:40 +00:00
Author
Owner

This specific issue is due to a bug around line 806 of `retrieve.py`:

         # validate this block, then generate the block hash root.
         self.log("validating share %d for segment %d" % (reader.shnum,
                                                              segnum))
         self._status.add_fetch_timing(reader.peerid, started)
         self._status.set_status("Valdiating blocks for segment %d" % segnum)

`started` is the time at which the fetch was started. `RetrieveStatus.add_fetch_timing` wants the elapsed time, though, so what we really want is `time.time() - started`. I'll attach a patch with this change.
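
To see where the 41-year figure comes from: `started` is an absolute Unix timestamp, roughly 1.3 billion seconds as of 2011, so reporting it as a duration comes out to about 41 years. A minimal, self-contained sketch (illustrative only, not the patch itself):

```
import time

# `started` is an absolute Unix timestamp (seconds since 1970),
# roughly 1.3e9 seconds as of 2011.
started = time.time()
# ... block fetch happens here ...
elapsed = time.time() - started

# Passing `started` as if it were a duration renders as ~41 years in the WUI:
print("wrong:", started / (365.25 * 24 * 3600), "years")
# The intended per-server fetch latency is the elapsed time:
print("right:", elapsed, "seconds")
```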

While investigating this bug, I noticed several other rough spots in how the retrieve process interacts with its status object. In particular,

  • The progress field is set to 0.0 in Retrieve.__init__, then never touched again, so it says 0.0% even though the retrieve is finished.
  • The decrypt and decode timing doesn't take multiple segments into account: later segments clobber the timing data for earlier segments, so the report page only shows decrypt and decode timing for the last segment.
  • The status field says "decrypting" even though the retrieval process has finished.

It's easy enough to be attentive about updating the status and progress fields. We can measure cumulative decryption and decoding time to address the second point; this requires minor modifications to the retrieve code and status code, and a wording tweak on the status template.
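
A rough sketch of the cumulative approach, using hypothetical names rather than the actual `Retrieve`/`RetrieveStatus` internals:

```
import time

class SegmentTimings:
    """Hypothetical accumulator standing in for the timing fields on RetrieveStatus."""
    def __init__(self):
        self.decrypt_time = 0.0
        self.decode_time = 0.0

    def add_decrypt_time(self, elapsed):
        # Accumulate across segments instead of overwriting, so a
        # multi-segment retrieve reports total decrypt time.
        self.decrypt_time += elapsed

    def add_decode_time(self, elapsed):
        self.decode_time += elapsed


def process_segment(data):
    # Placeholder for the real per-segment decode/decrypt work.
    return bytes(b ^ 0xFF for b in data)


timings = SegmentTimings()
for segment in [b"seg1", b"seg2", b"seg3"]:
    start = time.time()
    process_segment(segment)
    timings.add_decrypt_time(time.time() - start)

print("total decrypt time:", timings.decrypt_time)
```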

(Do these issues deserve their own ticket? I'll open one if so.)

davidsarah commented 2011-08-28 03:39:19 +00:00
Author
Owner

Replying to kevan:

Do these issues deserve their own ticket? I'll open one if so.

I don't think so, this one is short enough.

warner commented 2011-08-28 23:25:42 +00:00
Author
Owner

I've been in a fix-everything-in-sight mood this weekend, so I took care of this one in changeset:9756146d61673552.

tahoe-lafs modified the milestone from undecided to 1.9.0 2011-08-28 23:27:07 +00:00
warner commented 2011-08-29 06:37:08 +00:00
Author
Owner

hm, not sure why the auto-closer didn't notice the comment in changeset:9756146d61673552. Closing it manually.

tahoe-lafs added the
fixed
label 2011-08-29 06:37:08 +00:00
warner closed this issue 2011-08-29 06:37:08 +00:00
davidsarah commented 2011-08-29 14:38:01 +00:00
Author
Owner

Replying to warner:

hm, not sure why the auto-closer didn't notice the comment in changeset:9756146d61673552.

I think the action name ("fixes") is case-sensitive, so it didn't match "Fixes". Code [here](http://trac.edgewall.org/browser/trunk/tracopt/ticket/commit_updater.py).
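
A minimal sketch of the kind of keyword matching involved, assuming a regex-based matcher along the lines of Trac's commit_updater (the keyword list and pattern here are illustrative, not Trac's actual code); compiling with `re.IGNORECASE` would accept both spellings:

```
import re

# Illustrative command keywords; the real commit_updater reads its configured list.
commands = ["close", "closes", "closed", "fix", "fixes", "fixed"]
pattern = re.compile(
    r"\b(?P<action>%s)\b\s*#(?P<ticket>\d+)" % "|".join(commands),
    re.IGNORECASE,  # without this flag, "Fixes #1505" would not match "fixes"
)

for message in ("fixes #1505", "Fixes #1505"):
    match = pattern.search(message)
    if match:
        print(message, "-> would close ticket", match.group("ticket"))
    else:
        print(message, "-> no match")
```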
