allmydata.org source repository is broken #1017

Closed
opened 2010-04-08 23:28:04 +00:00 by davidsarah · 5 comments
davidsarah commented 2010-04-08 23:28:04 +00:00
Owner

Changes to the main trunk repository, on "hanford" (a.k.a. dev.allmydata.org), are normally mirrored to another repository on allmydata.org that is used by the darcs trac plugin to implement "Browse Source". However, the script that does this is not working, possibly due to disk problems on allmydata.org.

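Apparently for the same or a related reason (?), none of the buildbots are able to check out the source from allmydata.org -- for example see [this log](http://allmydata.org/buildbot/builders/hardy-amd64/builds/425/steps/darcs/logs/stdio) and also [this one](http://allmydata.org/buildbot/builders/hardy-amd64/builds/426/steps/darcs/logs/stdio) (two different errors).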

The script on hanford that is failing when trying to push changes to allmydata.org is /home/source/bin/mirror-to-org.sh, which is invoked by the post-commit hook, /home/darcs/tahoe/trunk-posthook.sh with argument tahoe/trunk. It fails with the message

darcs failed:  Not a repository:
source@allmydata.org:darcs/tahoe/trunk ((scp) failed to fetch:
source@allmydata.org:darcs/tahoe/trunk/_darcs/inventory)

The error on checking out the source from allmydata.org is currently:

darcs: failed to read patch in get_extra:
Sun Feb 21 12:36:26 PST 2010  freestorm77@gmail.com
  * munin-tahoe_storagespace
  Ignore-this: 14d6d6a587afe1f8883152bf2e46b4aa
  
  Plugin configuration rename
  
Perhaps this is a 'partial' repository?

Note that in a previous build there was a different error:

Invalid repository:  http://allmydata.org/source/tahoe/distribute

darcs failed:  Failed to download URL http://allmydata.org/source/tahoe/distribute/_darcs/inventory : HTTP error (404?)

The patch mentioned in the first checkout error above, which is also the only current difference in the hanford repository relative to allmydata.org, is [the one](http://allmydata.org/trac/tahoe-lafs/attachment/ticket/968/munin-tahoe_storagespace.darcspatch.txt) attached to #968. I think this was pushed at approx. 23:30 UTC on April 3. It is a very minimal patch: it only fixes a typo in a comment [here](http://allmydata.org/trac/tahoe-lafs/browser/misc/munin/tahoe_storagespace#L13). But we should avoid pushing other patches until this issue has been fixed.

tahoe-lafs added the dev-infrastructure, supercritical, defect, n/a labels 2010-04-08 23:28:04 +00:00
tahoe-lafs added this to the undecided milestone 2010-04-08 23:28:04 +00:00
davidsarah commented 2010-04-08 23:29:20 +00:00
Author
Owner

Attachment darcspush.txt (1565 bytes) added

Output of darcs push when mirroring script failed.

zooko commented 2010-04-09 03:17:43 +00:00
Author
Owner

Hm, I wonder if this was a transient failure of "allmydata.org". It seems to be working okay now:

 Wonwin-McBrootles-Computer:~$ ssh zooko@allmydata.org "ls -lL /home/source/darcs/tahoe/trunk"
total 200
-rw-rw-r--  1 source source 18249 May  1  2008 COPYING.GPL
-rw-rw-r--  1 source source 11258 May  1  2008 COPYING.TGPPL.html
-rw-rw-r--  1 source source  2707 Mar  3 18:05 CREDITS
-rw-rw-r--  1 source source 15070 Feb  3 10:32 Makefile
-rw-rw-r--  1 source source 51865 Feb 26 23:31 NEWS
-rw-rw-r--  1 source source   422 Mar  3 15:29 README
-rw-rw-r--  1 source source    72 May  1  2008 Tahoe.home
-rw-rw-r--  1 source source  5194 Feb 14 21:15 _auto_deps.py
drwxrwsr-x  6 source source  4096 Mar  9 10:52 _darcs
drwxrwsr-x  2 source source  4096 Feb 11  2009 bin
drwxrwsr-x  3 source source  4096 Jun  8  2008 contrib
drwxrwsr-x  7 source source  4096 Mar  3 18:05 docs
-rw-rw-r--  1 source source  7683 Feb  5  2009 ez_setup.py
drwxrwsr-x  4 source source  4096 Sep 24  2009 mac
drwxrwsr-x 10 source source  4096 Mar  3 15:29 misc
-rw-rw-r--  1 source source  1510 Feb 23 23:01 relnotes-short.txt
-rw-rw-r--  1 source source  6166 Feb 23 23:06 relnotes.txt
-rw-rw-r--  1 source source  2949 Jul 16  2009 setup.cfg
-rw-rw-r--  1 source source 15355 Sep 20  2009 setup.py
drwxrwsr-x  3 source source  4096 May  1  2008 src
drwxrwsr-x  3 source source  4096 May  1  2008 twisted
drwxrwsr-x  2 source source  4096 Jan 25 20:34 windows

On the other hand, I can't check on the script on dev.allmydata.com because dev.allmydata.com is currently unreachable:

 Wonwin-McBrootles-Computer:~$ ping -c 3 dev.allmydata.com
PING hanford.allmydata.com (207.7.153.140): 56 data bytes

--- hanford.allmydata.com ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

I suspect that in the near future we'll move to allmydata.org -- I guess the "new" allmydata.org -- being the canonical repository and forget about dev.allmydata.com.

davidsarah commented 2010-04-09 03:47:45 +00:00
Author
Owner

hanford is reachable as dev.allmydata.**org**. I can ssh to it without problems.

You can tell that the source mirror is still not up-to-date by looking at http://allmydata.org/trac/tahoe-lafs/browser/misc/munin/tahoe_storagespace#L13 -- it still shows tahoe-storagespace instead of tahoe_storagespace. The corresponding file on hanford (/home/darcs/tahoe/trunk/misc/munin/tahoe_storagespace) has the patch applied. I don't have an account on allmydata.org, but if you do:

ssh zooko@allmydata.org "cat /home/source/darcs/tahoe/trunk/misc/munin/tahoe_storagespace"

that should confirm the problem.

I just tried running the /home/source/bin/mirror-to-org.sh script manually again on hanford, and it failed in the same way. I don't think it's a permissions problem on hanford, because that's not consistent with the error message, and in any case the script that actually does the mirroring is run via suid_exec.

We could try pushing another trivial patch, but I'm fairly sure that will also fail.

davidsarah commented 2010-04-12 23:52:33 +00:00
Author
Owner

Checkouts by buildbots are affected as well. I'd bump up the priority of this ticket, but it is already supercritical :-)

If the problem is the disk failure on allmydata.org, then perhaps:

  • mount /home, or at least /home/darcs, from a different disk.
  • move aside any repos that might be corrupted and pull them again from hanford.
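
The second step could be sketched roughly as follows, assuming the disk-failure hypothesis holds. This is illustrative only, not a procedure anyone in this ticket ran: the function names and the `.corrupt-` suffix are hypothetical, and the source URL for the hanford repository would need to be filled in. The suspect directory is renamed rather than deleted so it can still be inspected, and the `darcs get` command is only built, not executed.

```python
import os
import time


def move_aside(repo_path, suffix=None):
    """Rename repo_path to a timestamped sibling and return the new path."""
    if suffix is None:
        suffix = time.strftime("%Y%m%d-%H%M%S")
    # normpath drops any trailing slash so the suffix lands on the directory name
    repo_path = os.path.normpath(repo_path)
    aside = f"{repo_path}.corrupt-{suffix}"
    os.rename(repo_path, aside)
    return aside


def darcs_get_command(source_repo, dest_path):
    """Build (but do not run) the darcs command that re-fetches the repository."""
    return ["darcs", "get", source_repo, dest_path]
```

The returned list can be passed to `subprocess.run` once the source repository location on hanford has been confirmed.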
tahoe-lafs changed title from Mirroring of source to allmydata.org trac is broken to allmydata.org source repository is broken 2010-04-12 23:52:33 +00:00
tahoe-lafs modified the milestone from undecided to soon (release n/a) 2010-04-13 00:20:59 +00:00
secorp commented 2010-04-16 22:15:32 +00:00
Author
Owner

This problem stemmed from the /etc/resolv.conf file on dev.allmydata.com not having the proper DNS server for name resolution. As a result, allmydata.org did not resolve, which caused the darcs push command to time out. After the /etc/resolv.conf file was updated (necessary because the machines had been moved and re-IPed), david-sarah verified that the pushes were working, and the buildslaves appear to be working as well.
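
A failure like this can be narrowed down by checking name resolution before suspecting the repository itself. The sketch below is illustrative only and is not what was run on hanford: it lists the `nameserver` entries in resolv.conf-style text and tests whether a hostname resolves. Both function names are hypothetical, and the resolver is a parameter so the check can be exercised without network access.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field never matches
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the resolver finds an address for hostname."""
    try:
        resolver(hostname, None)
    except socket.gaierror:
        return False
    return True
```

On the affected machine, one would pass the contents of `/etc/resolv.conf` to `parse_nameservers` and call `can_resolve("allmydata.org")`. An empty or stale nameserver list together with a failed lookup points at DNS rather than at darcs.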

tahoe-lafs added the fixed label 2010-04-16 22:15:32 +00:00
secorp closed this issue 2010-04-16 22:15:32 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1017