[tahoe-dev] the good thing about "darcs replace"

Mon Aug 1 17:10:53 PDT 2011

Folks:

darcs has its problems (mainly to do with performance, and also the
lack of a short secure efficiently-checkable identifier for a
filesystem state), but one good thing about it is "darcs replace".

If you're going to rename a bunch of instances of one word to another
word, then if you do that with "darcs replace" instead of by editing
the file it can avoid merge conflicts. Here is an example in the form
of the notes I took while merging #1363 and #1382. These two branches
are classic examples of this, because in #1363 Brian renamed a few
variables, and in #1382 Kevan rewrote some of the code that used those
variables. Neither was aware of the other's branch when they were
doing this.

David-Sarah helped me with part of this process and they said that
their opinion of using "darcs replace" was changed by doing this
experiment.

I rerecorded the patches from Brian's #1363 branch to use "darcs
replace" instead of hunk editing for any file in which doing so would
result in fewer total hunks. If the thing being renamed was a
relatively rare token, like "contacted_trackers", then using darcs
replace eliminates the hunks that manually edited that name and does
not require any new manually edited hunks. If the thing being renamed
was a common token, like "name", then darcs replace might also change
some instances of it that shouldn't be changed (such as in comments or
string literals, or unrelated variables that are also named "name").
In that case, I counted, for each file, how many hunks it would take
to put back all the bogus changes after darcs replace versus how many
hunks it would take to make the desired changes without using darcs
replace. Whichever approach yielded the fewest hunks is the one I
took. This is because I regarded every additional hunk as threatening
to cause later merge conflicts.

I pulled the first two patches. The second one—"upload.py: apply
David-Sarah's advice rename (un)contacted(2) trackers to
first_pass/second_pass/next_pass"—caused conflicts. There were 12
conflicts in the no-darcs-replace world, involving 318 lines of code,
and 7 conflicts in the darcs-replace world, involving 109 lines of
code.

pulled-first-and-second-patches/no-darcs-replace$ darcs whatsnew | wc
    318    1922   15227
pulled-first-and-second-patches/no-darcs-replace$ darcs whatsnew |
grep "^\+v v v v " | wc -l
12
pulled-first-and-second-patches/no-darcs-replace$ cd ../yes-darcs-replace/
pulled-first-and-second-patches/yes-darcs-replace$ darcs whatsnew | wc
    109     709    5738
pulled-first-and-second-patches/yes-darcs-replace$ darcs whatsnew |
grep "^\+v v v v " | wc -l
7

(See the exhibits [1] in the directory named
"pulled-first-and-second-patches", which show the results of the
respective darcs pulls.)

Then I manually merged both sides, with David-Sarah looking over my
shoulder and helping for part of it (pair merging).

Then I diffed the two sides. I found a number of places where I had
merged the two differently. In each case I had merged the
no-darcs-replace version incorrectly and the yes-darcs-replace version
corretly. There were two causes of differences. First, I had lost a
rename ("contacted_trackers" => "second_pass_trackers") in the
no-darcs-replace version resulting in five places (five hunks in the
diff) that had the wrong name. Second, I had left in a large function
(_loop), all uses of which had been removed by Kevan's patch. I
remember being a bit confused about the hunks that removed that
function when merging and I speculate that the added cognitive
overhead of sorting through the hunks caused by renaming conflicts
made it easier for me to do the wrong thing with this hunk. (See the
exhibits in "merged-second-patch", which show the results of the
manual pair merging.)

Then I manually fixed all the mistakes that I had made in the world
without darcs replace.

fixed-after-merged-second-patch/no-darcs-replace$ darcs whatsnew | wc
    390    2263   21427
fixed-after-merged-second-patch/no-darcs-replace$ darcs whatsnew |
grep "^\+v v v v " | wc -l
0
fixed-after-merged-second-patch/no-darcs-replace$ cd ../yes-darcs-replace/
fixed-after-merged-second-patch/yes-darcs-replace$ darcs whatsnew | wc
     91     503    5459
fixed-after-merged-second-patch/yes-darcs-replace$ darcs whatsnew |
grep "^\+v v v v " | wc -l
0

(See the exhibits in "fixed-after-merged-second-patch", which show the
results of the manual fixing of the bugs inserted while merging.)

Then pulled then next patch—"replace IServer.name() with get_name(),
and get_longname()". In the no-darcs-replace world, this caused more
merge conflicts (6 more conflicts, touching 398 lines of code). In the
yes-darcs-replace world, it merged cleanly.

pulled-next-patch-after-merging-first-conflicting-patch/no-darcs-replace$
darcs whatsnew | wc
    398    2312   21216
pulled-next-patch-after-merging-first-conflicting-patch/no-darcs-replace$
darcs whatsnew | grep "^\+v v v v " | wc -l
6
pulled-next-patch-after-merging-first-conflicting-patch/no-darcs-replace$
cd ../yes-darcs-replace/
pulled-next-patch-after-merging-first-conflicting-patch/yes-darcs-replace$
darcs whatsnew | wc
     91     503    5467
pulled-next-patch-after-merging-first-conflicting-patch/yes-darcs-replace$
darcs whatsnew | grep "^\+v v v v " | wc -l
0

(See the exhibits in
"pulled-next-patch-after-merging-first-conflicting-patch", which show
the results of the darcs pulls.)

Then I manually merged the conflicts from the no-darcs world.

Then I diffed the two sides. Again I found a number of bugs that had
resulted from the pull followed by my manual merge in the
no-darcs-replace world. I had chosen the #1382 side of each
conflicting hunk, but some non-conflicting hunks from #1382 had added
uses of ".name". In the yes-darcs-replace world those non-conflicting
hunks had been automatically converted to ".get_name", in the
no-darcs-replace world, they still said ".name" and were wrong. When I
was merging conflicts I only looked at the conflicting hunks and
didn't look at the other non-conflicting changes.

(See exhibits in "merged-third-patch".)

After fixing that, the rest of the patches merged cleanly in both
worlds. I had thought that in the no-darcs-replace world that more of
the other patches would also happen to touch places that had
previously conflicted and so they would "re-open old wounds" and
require another round of manual merging, but that didn't happen—the
rest of the patches didn't touch any of the same spots.

The bottom line: if you are doing a renaming, and if neither the old
nor new name are a common token that occurs frequently in contexts
where it shouldn't be renamed (such as in docstrings, comments, or
other things that have the same token but aren't the same variable
that you wish to rename), then you can spare significant time and
bug-creation for people who merge your patch if you use darcs replace
instead of recording a series of hunk changes.

Regards,

Zooko

P.S. The weird hunks that confused David-Sarah and Zancas in a recent
"darcs replace" patch (ticket #1120) do not occur if you don't use
"darcs replace --force" to go ahead with a darcs replace even when the
new token already appears somewhere in the file. Maybe I should use
darcs replace only when the new token doesn't appear in the file in
order to avoid those confusing hunks.

[1] http://zooko.com/pubscratch/exhibits.tar.lz
note: file size = 23 MB

http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1120# simplify Unicode
support by assuming that argv and output encodings are the same
http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1363# refactor
storage_client.py, use IServer objects instead of rrefs
http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1382# immutable peer
selection refactoring and enhancements