#1017 closed defect (fixed)

allmydata.org source repository is broken

Reported by: davidsarah Owned by: somebody
Priority: supercritical Milestone: soon (release n/a)
Component: dev-infrastructure Version: n/a
Keywords: trac darcs Cc:
Launchpad Bug:

Description (last modified by davidsarah)

Changes to the main trunk repository, on "hanford" (a.k.a. dev.allmydata.org), are normally mirrored to another repository on allmydata.org that is used by the darcs trac plugin to implement Browse Source. However, the script that does this is not working, possibly because of disk problems on allmydata.org.

Apparently for the same or a related reason (?), none of the buildbots are able to check out the source from allmydata.org -- for example see this log and also this one (two different errors).

The script on hanford that is failing when trying to push changes to allmydata.org is /home/source/bin/mirror-to-org.sh, which is invoked by the post-commit hook, /home/darcs/tahoe/trunk-posthook.sh, with argument tahoe/trunk. It fails with the message:

darcs failed:  Not a repository:
source@allmydata.org:darcs/tahoe/trunk ((scp) failed to fetch:
source@allmydata.org:darcs/tahoe/trunk/_darcs/inventory)
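
The message above suggests that darcs treats a failure to fetch _darcs/inventory as "Not a repository". As a rough illustration only, the same probe can be run against a local copy of a repository. The function name `has_inventory` is hypothetical, and the check covers only the file named in the error message, so it is a sketch of the idea rather than darcs's actual logic.

```python
import os


def has_inventory(repo_path):
    """Return True if repo_path contains the _darcs/inventory file.

    Assumption: this mirrors only the file named in the error message
    above; it is not a full validity check of a darcs repository.
    """
    return os.path.isfile(os.path.join(repo_path, "_darcs", "inventory"))
```

A False result here would point at a missing or damaged repository, whereas a True result on allmydata.org itself would suggest that the fetch, not the repository, is what failed.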

The error on checking out the source from allmydata.org is currently:

darcs: failed to read patch in get_extra:
Sun Feb 21 12:36:26 PST 2010  freestorm77@gmail.com
  * munin-tahoe_storagespace
  Ignore-this: 14d6d6a587afe1f8883152bf2e46b4aa
  
  Plugin configuration rename
  
Perhaps this is a 'partial' repository?

Note that in a previous build there was a different error:

Invalid repository:  http://allmydata.org/source/tahoe/distribute

darcs failed:  Failed to download URL http://allmydata.org/source/tahoe/distribute/_darcs/inventory : HTTP error (404?)

The patch mentioned in the first checkout error above, which is also the only current difference in the hanford repository relative to allmydata.org, is the one attached to #968. I think this was pushed at approx. 23:30 UTC on April 3. It is a very minimal patch: it only changes a typo in a comment here. But we should avoid pushing other patches until this issue has been fixed.

Attachments (1)

darcspush.txt (1.5 KB) - added by davidsarah at 2010-04-08T23:29:20Z.
Output of darcs push when mirroring script failed.


Change History (9)

Changed at 2010-04-08T23:29:20Z by davidsarah

Output of darcs push when mirroring script failed.

comment:1 Changed at 2010-04-08T23:29:44Z by davidsarah

  • Description modified (diff)

comment:2 Changed at 2010-04-09T03:17:43Z by zooko

Hm, I wonder if this was a transient failure of "allmydata.org". It seems to be working okay now:

 Wonwin-McBrootles-Computer:~$ ssh zooko@allmydata.org "ls -lL /home/source/darcs/tahoe/trunk"
total 200
-rw-rw-r--  1 source source 18249 May  1  2008 COPYING.GPL
-rw-rw-r--  1 source source 11258 May  1  2008 COPYING.TGPPL.html
-rw-rw-r--  1 source source  2707 Mar  3 18:05 CREDITS
-rw-rw-r--  1 source source 15070 Feb  3 10:32 Makefile
-rw-rw-r--  1 source source 51865 Feb 26 23:31 NEWS
-rw-rw-r--  1 source source   422 Mar  3 15:29 README
-rw-rw-r--  1 source source    72 May  1  2008 Tahoe.home
-rw-rw-r--  1 source source  5194 Feb 14 21:15 _auto_deps.py
drwxrwsr-x  6 source source  4096 Mar  9 10:52 _darcs
drwxrwsr-x  2 source source  4096 Feb 11  2009 bin
drwxrwsr-x  3 source source  4096 Jun  8  2008 contrib
drwxrwsr-x  7 source source  4096 Mar  3 18:05 docs
-rw-rw-r--  1 source source  7683 Feb  5  2009 ez_setup.py
drwxrwsr-x  4 source source  4096 Sep 24  2009 mac
drwxrwsr-x 10 source source  4096 Mar  3 15:29 misc
-rw-rw-r--  1 source source  1510 Feb 23 23:01 relnotes-short.txt
-rw-rw-r--  1 source source  6166 Feb 23 23:06 relnotes.txt
-rw-rw-r--  1 source source  2949 Jul 16  2009 setup.cfg
-rw-rw-r--  1 source source 15355 Sep 20  2009 setup.py
drwxrwsr-x  3 source source  4096 May  1  2008 src
drwxrwsr-x  3 source source  4096 May  1  2008 twisted
drwxrwsr-x  2 source source  4096 Jan 25 20:34 windows

On the other hand, I can't check on the script on dev.allmydata.com because dev.allmydata.com is currently unreachable:

 Wonwin-McBrootles-Computer:~$ ping -c 3 dev.allmydata.com
PING hanford.allmydata.com (207.7.153.140): 56 data bytes

--- hanford.allmydata.com ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

I suspect that in the near future we'll make allmydata.org (I guess the "new" allmydata.org) the canonical repository and forget about dev.allmydata.com.

comment:3 Changed at 2010-04-09T03:47:45Z by davidsarah

hanford is reachable as dev.allmydata.org. I can ssh to it without problems.

You can tell that the source mirror is still not up-to-date by looking at http://allmydata.org/trac/tahoe-lafs/browser/misc/munin/tahoe_storagespace#L13 -- it still shows [tahoe-storagespace] instead of [tahoe_storagespace]. The corresponding file on hanford (/home/darcs/tahoe/trunk/misc/munin/tahoe_storagespace) has the patch applied. I don't have an account on allmydata.org, but if you do:

ssh zooko@allmydata.org "cat /home/source/darcs/tahoe/trunk/misc/munin/tahoe_storagespace"

that should confirm the problem.
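
The check described above can be sketched as a small helper, assuming the text of the file has already been fetched (for example with the ssh command shown). The function name `mirror_state` and its return values are hypothetical; the two section names come from the comment above.

```python
def mirror_state(file_text):
    """Classify a copy of tahoe_storagespace by the section name it contains.

    "[tahoe_storagespace]" is the name after the patch and
    "[tahoe-storagespace]" is the name before it, as described above.
    """
    has_new = "[tahoe_storagespace]" in file_text
    has_old = "[tahoe-storagespace]" in file_text
    if has_new and not has_old:
        return "patched"
    if has_old and not has_new:
        return "unpatched"
    # Neither or both names present: the text does not settle the question.
    return "unknown"
```

If the copy on allmydata.org is classified as "unpatched" while the copy on hanford is "patched", the mirror is still out of date.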

I just tried running the /home/source/bin/mirror-to-org.sh script manually again on hanford, and it failed in the same way. I don't think it's a permissions problem on hanford, because that's not consistent with the error message, and in any case the script that actually does the mirroring is run via suid_exec.

We could try pushing another trivial patch, but I'm fairly sure that will also fail.

comment:4 Changed at 2010-04-12T23:05:03Z by davidsarah

  • Description modified (diff)

comment:5 Changed at 2010-04-12T23:52:33Z by davidsarah

  • Description modified (diff)
  • Summary changed from Mirroring of source to allmydata.org trac is broken to allmydata.org source repository is broken

Checkouts by buildbots are affected as well. I'd bump up the priority of this ticket, but it is already supercritical :-)

If the problem is the disk failure on allmydata.org, then perhaps:

  • mount /home, or at least /home/darcs, from a different disk.
  • move aside any repos that might be corrupted and pull them again from hanford.
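
A minimal sketch of the second suggestion, assuming it would be carried out with `mv` and `darcs get`. The function only builds the command lists and runs nothing; `plan_refetch`, the `.broken` suffix, and the `source_url` parameter are hypothetical, and the real location of the hanford repository would have to be supplied by whoever runs it.

```python
def plan_refetch(repo_path, source_url, suffix=".broken"):
    """Return the commands to move a repo aside and fetch a fresh copy.

    Nothing is executed; the caller can review the plan first.
    source_url is a placeholder for the upstream repository location.
    """
    repo = repo_path.rstrip("/")
    aside = repo + suffix
    return [
        ["mv", repo, aside],                   # keep the suspect copy for inspection
        ["darcs", "get", source_url, repo],    # fresh copy from upstream
    ]
```

Returning the plan instead of running it keeps the suspect repository untouched until someone with access to allmydata.org has reviewed the commands.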

comment:6 Changed at 2010-04-13T00:20:59Z by davidsarah

  • Milestone changed from undecided to soon (release n/a)

comment:7 Changed at 2010-04-16T22:00:27Z by davidsarah

  • Description modified (diff)

comment:8 Changed at 2010-04-16T22:15:32Z by secorp

  • Resolution set to fixed
  • Status changed from new to closed

This problem stemmed from the /etc/resolv.conf file on dev.allmydata.com not listing the proper DNS server for name resolution. As a result, allmydata.org did not resolve, which caused the darcs push command to time out. After the /etc/resolv.conf file was updated (necessary after the machines were moved and re-IPed), david-sarah verified that the pushes were working, and it looks like the buildslaves are working too.
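
Since the root cause was name resolution, a quick resolver check on the pushing host might have narrowed this down sooner. The following is a minimal sketch, not something that was run for this ticket; the function name `unresolved_hosts` and the injectable `resolve` parameter are hypothetical, and it relies only on the standard `socket.getaddrinfo` call.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.getaddrinfo):
    """Return the hostnames from the input that fail name resolution.

    resolve defaults to socket.getaddrinfo, which uses the system
    resolver configuration (on Unix-like hosts, /etc/resolv.conf).
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name, None)
        except socket.gaierror:
            # The resolver could not map this name to an address.
            failed.append(name)
    return failed
```

Calling `unresolved_hosts(["allmydata.org"])` on the machine that runs the mirroring script would return a non-empty list if that machine cannot resolve the name.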
