Error in Latency Report on MDMF Retrieve #1505

Closed
opened 2011-08-26 18:21:55 +00:00 by zancas · 6 comments
zancas commented 2011-08-26 18:21:55 +00:00
Owner

I viewed a write cap for an MDMF with an associated 8 MB file in the WUI. The retrieve operation reported a 41-year latency in Per-Server Fetch Response Times. All servers were running on the same machine (my laptop).

tahoe-lafs added the
unknown
major
defect
1.9.0a1
labels 2011-08-26 18:21:55 +00:00
tahoe-lafs added this to the undecided milestone 2011-08-26 18:21:55 +00:00
zancas commented 2011-08-26 18:22:07 +00:00
Author
Owner

Attachment response_time_error.html (3197 bytes) added

kevan commented 2011-08-27 16:59:40 +00:00
Author
Owner

This specific issue is due to a bug around line 806 of `retrieve.py`:

         # validate this block, then generate the block hash root.
         self.log("validating share %d for segment %d" % (reader.shnum,
                                                              segnum))
         self._status.add_fetch_timing(reader.peerid, started)
         self._status.set_status("Valdiating blocks for segment %d" % segnum)

`started` is the time at which the fetch was started. `RetrieveStatus.add_fetch_timing` wants the elapsed time, though, so what we really want is `time.time() - started`. I'll attach a patch with this change.
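
To see where the 41-year figure comes from: `started` is an absolute Unix timestamp, roughly 1.3 billion seconds as of 2011, so reporting it as a duration comes out to about 41 years. A minimal, self-contained sketch (illustrative only, not the patch itself):

```
import time

# `started` is an absolute Unix timestamp (seconds since 1970),
# roughly 1.3e9 seconds as of 2011.
started = time.time()
# ... block fetch happens here ...
elapsed = time.time() - started

# Passing `started` as if it were a duration renders as ~41 years in the WUI:
print("wrong:", started / (365.25 * 24 * 3600), "years")
# The intended per-server fetch latency is the elapsed time:
print("right:", elapsed, "seconds")
```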

While investigating this bug, I noticed several other rough spots in how the retrieve process interacts with its status object. In particular,

  • The progress field is set to 0.0 in Retrieve.__init__, then never touched again, so it says 0.0% even though the retrieve is finished.
  • The decrypt and decode timing doesn't take multiple segments into account: later segments clobber the timing data for earlier segments, so the report page only shows decrypt and decode timing for the last segment.
  • The status field says "decrypting" even though the retrieval process has finished.

It's easy enough to be attentive about updating the status and progress fields. We can measure cumulative decryption and decoding time to address the second point; this requires minor modifications to the retrieve code and status code, and a wording tweak on the status template.
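
A rough sketch of the cumulative approach, using hypothetical names rather than the actual `Retrieve`/`RetrieveStatus` internals:

```
import time

class SegmentTimings:
    """Hypothetical accumulator standing in for the timing fields on RetrieveStatus."""
    def __init__(self):
        self.decrypt_time = 0.0
        self.decode_time = 0.0

    def add_decrypt_time(self, elapsed):
        # Accumulate across segments instead of overwriting, so a
        # multi-segment retrieve reports total decrypt time.
        self.decrypt_time += elapsed

    def add_decode_time(self, elapsed):
        self.decode_time += elapsed


def process_segment(data):
    # Placeholder for the real per-segment decode/decrypt work.
    return bytes(b ^ 0xFF for b in data)


timings = SegmentTimings()
for segment in [b"seg1", b"seg2", b"seg3"]:
    start = time.time()
    process_segment(segment)
    timings.add_decrypt_time(time.time() - start)

print("total decrypt time:", timings.decrypt_time)
```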

(Do these issues deserve their own ticket? I'll open one if so.)

davidsarah commented 2011-08-28 03:39:19 +00:00
Author
Owner

Replying to kevan:

Do these issues deserve their own ticket? I'll open one if so.

I don't think so, this one is short enough.

warner commented 2011-08-28 23:25:42 +00:00
Author
Owner

I've been in a fix-everything-in-sight mood this weekend, so I took care of this one in changeset:9756146d61673552.

tahoe-lafs modified the milestone from undecided to 1.9.0 2011-08-28 23:27:07 +00:00
warner commented 2011-08-29 06:37:08 +00:00
Author
Owner

hm, not sure why the auto-closer didn't notice the comment in changeset:9756146d61673552. Closing it manually.

tahoe-lafs added the
fixed
label 2011-08-29 06:37:08 +00:00
warner closed this issue 2011-08-29 06:37:08 +00:00
davidsarah commented 2011-08-29 14:38:01 +00:00
Author
Owner

Replying to warner:

hm, not sure why the auto-closer didn't notice the comment in changeset:9756146d61673552.

I think the action name ("fixes") is case-sensitive, so it didn't match "Fixes". Code [here](http://trac.edgewall.org/browser/trunk/tracopt/ticket/commit_updater.py).
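
A minimal sketch of the kind of keyword matching involved, assuming a regex-based matcher along the lines of Trac's commit_updater (the keyword list and pattern here are illustrative, not Trac's actual code); compiling with `re.IGNORECASE` would accept both spellings:

```
import re

# Illustrative command keywords; the real commit_updater reads its configured list.
commands = ["close", "closes", "closed", "fix", "fixes", "fixed"]
pattern = re.compile(
    r"\b(?P<action>%s)\b\s*#(?P<ticket>\d+)" % "|".join(commands),
    re.IGNORECASE,  # without this flag, "Fixes #1505" would not match "fixes"
)

for message in ("fixes #1505", "Fixes #1505"):
    match = pattern.search(message)
    if match:
        print(message, "-> would close ticket", match.group("ticket"))
    else:
        print(message, "-> no match")
```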
