the github "attic" repo

Brian Warner warner at lothar.com
Fri Apr 1 08:12:33 UTC 2016


TL;DR:

* https://github.com/tahoe-lafs-attic/tahoe-lafs.git is the "Attic"
* I'll be moving old branches out of the official repo, into the Attic
* please create branches in your personal github repo, not official
* please delete branches when you're done with them, but feel free to
  push them into the Attic first


Now that the 1.11 release is out the door, I'm doing a lot of
housecleaning. One thing I've been meaning to do for a while is clean up
the massive collection of stale branches and tags that are littering our
official Github repo (https://github.com/tahoe-lafs/tahoe-lafs).

The tags were pretty straightforward. I moved 12 tags into the Attic
that were leftovers from our darcs migration: they had names like
"trac-4400", and pointed at the git revision that matches the darcs
revision which Trac serialized as commit #4400. During the migration I
wrote a script to
rewrite all the Trac hyperlinks into git hashes, so there should be
nothing left that references them. Two more tags ("pre-393" and
"post-393") were involved in the intricate merge of MDMF: I've moved
those into the Attic too. That leaves 55 real tags, all of which point
to some alpha/beta/rc/final release.

The bigger issue is that we've got 202 branches. I think our official
repo should have just one ("master"), plus maybe a few oddball ones like
"rel-1.9.1" and "rel-1.9.2" (which deal with the fact that
1.9.2 was released from darcs, after the git migration, and is not
well-represented in our git repo).

The other 199 branches are development work for various tickets,
multiplied by various rebases. Several developers currently use the
following workflow:

* push the proposed changes to a new branch in the official repo with a
  name like "195-windows-packaging-0", then file a pull-request from
  there into "master" (or discuss it on Trac)
* receive feedback, add commits to the branch, sometimes merge trunk in
* when things look more ready, clone the branch to a new name (like
  "195-windows-packaging-1"), rebase to current master, and re-push
* repeat when it doesn't get merged right away
* when/if the PR is merged, leave all the branches around, for posterity

The result is that we've got e.g. 11 branches for ticket #195, at most
one of which (the final merged one) could be reachable from master. (In
fact none of the #195 branches are reachable from master: the relevant
commits were merged in a different way.)

The consequence of this workflow is that everyone who forks the official
repo winds up with a snapshot of all these vestigial work-in-progress
branches. In my mind, this clutters up their forks and makes it
difficult to figure out what branches to pay attention to. I can't tell
the difference between active development and ancient history.

When I tell GitX to show me the entire "railroad" diagram of branches
from the people that I follow, I get something completely
incomprehensible like this:


https://gist.githubusercontent.com/warner/571a2c14a98964d1c6406369ec4f638d/raw/ffc4620b40c6cd7faff97e287ff13f264da423fd/git-chaos.png

To be fair, we've got a lot going on, so it's not reasonable to expect
*too* much simplicity:

* we have some wonderfully prolific developers
* they build on each other's work, so the branches are flying back and
  forth furiously
* there are several large projects that have spanned months or years,
  have evolved and been re-designed multiple times, *and* build upon
  other such projects: servers-of-happiness, Accounting, the improved
  leasedb, LAE's cloud-backend, and LAE's magic-folders, all of which
  are intimidatingly large review/merge tasks that only get worse as
  time goes on and trunk evolves out from under them.


So I'm trying to figure out how to tame this chaos. We've talked about
this on the weekly devchats a couple of times, and learned that a couple
of us disagree on whether to keep old branches around (both ones that
have been obsoleted due to rebasing/rewriting, and ones that have been
merged but serve as reminders of where the merge parent came from).

The compromise that we reached was that we'd set up an "Attic"
repository, and we could move all old branches into it, and remove them
from the main repo that people clone. I've created a new Github
pseudo-user named "tahoe-lafs-attic", and it has forked the official
tahoe-lafs repository. You can add the "attic" remote to your local git
checkout with:

 git remote add attic https://github.com/tahoe-lafs-attic/tahoe-lafs.git

If you're into nostalgia and dusty old branches, "git fetch attic" will
satisfy your need for history.

I'll be moving branches out of the official repo (tahoe-lafs/tahoe-lafs)
and into the attic (tahoe-lafs-attic/tahoe-lafs) on an ongoing basis. I
won't move something if there's an open pull-request for it or if it's
the current branch for an open ticket.

I have a few requests for my fellow developers to consider:

* For future development, please host the branch in your personal repo,
  rather than in the official one. Make the pull-request be e.g.
  (warner/1234-feature.0 -> tahoe-lafs/master), instead of
  (tahoe-lafs/1234-feature.0 -> tahoe-lafs/master).

* When you're done with a branch, please delete it. If this feels deeply
  wrong to you, please consider pushing it to the Attic and then
  deleting it. (ping me and I'll add you as a collaborator so you can
  push directly). I'm working on some scripts to automate this, but it's
  basically just 'git push attic $NAME && git push $ME :$NAME' (the '&&'
  makes sure nothing is deleted unless the Attic push succeeded). To
  save yourself from the local clutter too, add 'git branch -D $NAME'.
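
As a rough sketch of the archive-then-delete steps above (this is not
the script I mentioned, just an illustration): the function names
"run" and "retire_branch", the DRY_RUN switch, and the default
personal remote name "origin" are all assumptions made for this
example. It assumes you have already added the "attic" remote.

```shell
# Hypothetical helper (names are illustrative only).
# run: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# retire_branch NAME [REMOTE]
# Push NAME to the Attic, then delete it from your personal remote and
# from the local repo. Each step runs only if the previous one succeeded.
retire_branch() {
    name="$1"
    me="${2:-origin}"   # assumption: your personal remote is "origin"
    if [ -z "$name" ]; then
        echo "usage: retire_branch NAME [REMOTE]" >&2
        return 1
    fi
    run git push attic "$name" &&
        run git push "$me" ":$name" &&
        run git branch -D "$name"
}
```

Try it with DRY_RUN=1 first, e.g. 'DRY_RUN=1; retire_branch
1234-feature.0 warner', to see the three commands without running them.
Note that 'git branch -D' will refuse to delete the branch you currently
have checked out, so switch to master before retiring a branch.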

Also, I think maybe we should delete the tags from our personal repos
('git ls-remote --tags $ME', 'git push $ME :refs/tags/$TAG'). Let the official
repo be responsible for advertising what e.g. tahoe-lafs-1.11.0 points
to. Of course, keep copies in your local repo (no need to 'git tag -d
$NAME'). This isn't such a big deal, and Github doesn't help (it copies
all tags and branches when you fork a repo), but I think it'd be tidier.
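
If you want to do the tag cleanup in bulk, something like the following
might help. It is only a sketch: "tag_delete_commands" is a made-up
name, and it deliberately prints the deletion commands instead of
running them, so you can review the list before anything is removed
from your personal repo. It reads the output of 'git ls-remote --tags'
on stdin.

```shell
# Hypothetical helper (name is illustrative only).
# tag_delete_commands [REMOTE]
# Reads `git ls-remote --tags` output on stdin and prints one
# "git push REMOTE :refs/tags/NAME" command per tag. Runs nothing itself.
tag_delete_commands() {
    me="${1:-origin}"   # assumption: your personal remote is "origin"
    # Each input line is "<sha><TAB><ref>". Keep only refs/tags/ entries,
    # and skip the "^{}" lines, which are the peeled form of annotated tags.
    awk -v me="$me" '
        index($2, "refs/tags/") == 1 && substr($2, length($2) - 2) != "^{}" {
            print "git push " me " :" $2
        }'
}
```

Usage would look like 'git ls-remote --tags $ME | tag_delete_commands
$ME'. Once the printed list looks right, pipe it to 'sh' to actually
delete the tags. Your local tags are not touched.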

BTW, one other historical constraint was the buildbot. Unlike Travis-CI,
buildbot does not know how to create confined containers for each build,
so it's not safe to test arbitrary pull-requests on our buildslaves. We
whitelisted the official repo and a handful of core developers, but
before we set up Travis, the only way to get any test coverage *before*
your work landed on trunk was to push the branch to the official repo.

Now that we have Travis set up, I think it provides good-enough coverage
of pull-requests (and the workflow is pretty polished). So I think the
buildbot should only get used for trunk (in fact it's probably
configured this way now), and test coverage should no longer be a
constraint on where in-progress branches must live.


thanks,
 -Brian

