tahoe-lafs/trac-2024-07-25

show full, explorable details about check and repair operations #1821


Open

opened 2012-10-04 15:39:41 +00:00 by zooko · 3 comments

zooko commented

2012-10-04 15:39:41 +00:00

Owner

On Mon, Jul 9, 2012 at 10:39 AM, Brad Rupp <bradrupp@gmail.com> wrote:
>
> The output from repair #1:
>
> repair successful
> done: 11801 objects checked
>  pre-repair: 11725 healthy, 76 unhealthy
>  76 repairs attempted, 76 successful, 0 failed
>  post-repair: 11801 healthy, 0 unhealthy
>
> The output from repair #2:
>
> done: 11801 objects checked
>  pre-repair: 11789 healthy, 12 unhealthy
>  12 repairs attempted, 11 successful, 1 failed
>  post-repair: 11800 healthy, 1 unhealthy
>
> As you can see, the first repair found and fixed 76 unhealthy objects. The
> second repair, approximately 12 hours later, found 12 unhealthy objects and
> fixed 11 of them.
>
> Why would the second repair find 12 unhealthy objects?  I would have
> expected it to find 0 unhealthy objects given that the first repair was
> performed only 12 hours earlier.

Wouldn't it be great if the text that said "12 repairs attempted, 11 successful, 1 failed" had hyperlinks to web pages that listed all of the repair attempts, where you could see which file was not healthy, which servers the repair job attempted to use to repair the file, and what happened with each server that led to success or failure?

Providing such a web page would mostly just be a matter of "web programming" -- generating HTML that shows the contents of the Python objects in memory which contain that data.

See [//pipermail/tahoe-dev/2012-July/007544.html this thread on the tahoe-dev list].
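As a rough illustration of the "web programming" described above, here is a minimal sketch that renders in-memory repair records as HTML. The `RepairAttempt` and `ServerAttempt` classes and the `render_repair_report` function are hypothetical names invented for this example; they are not Tahoe-LAFS's actual checker or repairer objects, which would have to be adapted to whatever data the repair job really keeps.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class ServerAttempt:
    """Hypothetical record of what happened with one server during a repair."""
    server_id: str
    outcome: str  # e.g. "share uploaded" or an error message


@dataclass
class RepairAttempt:
    """Hypothetical record of one repair attempt on one file."""
    path: str
    successful: bool
    server_attempts: List[ServerAttempt] = field(default_factory=list)


def render_repair_report(attempts):
    """Return an HTML fragment: a summary line, then per-file, per-server details."""
    ok = sum(1 for a in attempts if a.successful)
    failed = len(attempts) - ok
    parts = [
        f"<h1>{len(attempts)} repairs attempted, {ok} successful, {failed} failed</h1>"
    ]
    for a in attempts:
        status = "successful" if a.successful else "failed"
        # Escape anything that came from file names or server responses.
        parts.append(f"<h2>{escape(a.path)}: {status}</h2>")
        parts.append("<ul>")
        for s in a.server_attempts:
            parts.append(f"<li>{escape(s.server_id)}: {escape(s.outcome)}</li>")
        parts.append("</ul>")
    return "\n".join(parts)
```

A real page would presumably link each summary count to a separate detail view, but the per-file, per-server listing is the part that would answer "what happened with each server".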



tahoe-lafs added this to the undecided milestone 2012-10-04 15:39:41 +00:00


davidsarah commented

2012-10-11 04:31:19 +00:00

Author

Owner

I think this is a good idea.

tahoe-lafs modified the milestone from undecided to eventually

2012-10-11 04:31:19 +00:00

zooko commented

2013-12-05 16:56:08 +00:00

Author

Owner

related tickets: #1596, #1116, #2101, #2130

daira commented

2014-12-11 23:26:39 +00:00

Author

Owner

#2130 was a duplicate. Its description was:

> In today's Weekly Dev Chat, nejucomo said that in addition to synthetic metrics like "recoverable, healthy, happy, and needs-rebalancing", he wants to see the complete list of which servers are holding which shares. That sounds like a great idea! To close this ticket, make it so that checker results contain that information.
>
> related tickets: #1821, #1596, #1116
>
> especially related ticket: #2101, which is the same as this ticket except #2101 is about presenting this information in an error message and this ticket is about presenting it in a checker-results.
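To make "the complete list of which servers are holding which shares" concrete, here is a small sketch of one way such a map could be built and displayed. The function names and the `(server_id, share_number)` input format are assumptions made for this example, not the format of Tahoe-LAFS's actual checker results.

```python
from collections import defaultdict


def servers_by_share(holdings):
    """Group hypothetical (server_id, share_number) pairs by share number."""
    sharemap = defaultdict(set)
    for server_id, sharenum in holdings:
        sharemap[sharenum].add(server_id)
    # Sort both shares and servers so the output is stable between runs.
    return {n: sorted(sharemap[n]) for n in sorted(sharemap)}


def format_sharemap(sharemap):
    """Render the map as one line of text per share."""
    return "\n".join(
        f"share {n}: {', '.join(servers)}" for n, servers in sharemap.items()
    )


holdings = [("srv-a", 0), ("srv-b", 0), ("srv-b", 1)]
print(format_sharemap(servers_by_share(holdings)))
# share 0: srv-a, srv-b
# share 1: srv-b
```

Listing the raw placement alongside the synthetic metrics would let a reader see, for example, that share 1 lives on only one server.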
