badmerge -- concrete version with good semantics

See also the abstract version, in which every line of the file is simply an arbitrary character, to focus attention on the question of "Where does this change go?". This concrete version is to motivate the issue -- this kind of issue can happen in practice, and that if it does, the merge algorithm's behavior can have serious consequences.

    a
   / \
  b1  c1
  |    
  b2

Now the task is to merge b2 and c1. As far as I know all revision control tools such as git, Svn, Mercurial, and Bzr treat this as a 3-merge of (a, b2, c1). This uses less information than the weave-based tools (SCCS, BitKeeper, Vesta, and Codeville), and Darcs use.

The svn 3-merge algorithm and the GNU diff3 algorithm apply the changed line from c1 to the wrong place in b2, resulting in a wrong but apparently clean result.

merged by SVN v1.2.0 or GNU diff3 v3.8.1

Whereas darcs applies the changed from from c1 to the right place in b2, resulting in a clean merge with the correct result.

merged by darcs v1

The difference between what svn does and what darcs does here is not that darcs makes a better or luckier guess as to where the line from c1 should go, but that darcs uses information that svn does not use -- namely the information in b1 -- to learn that the location has moved and precisely to where it has moved. This is contrary to popular belief -- some people assume that the darcs merge algorithm must be applying a more clever heuristic to the same information.

Any algorithm which merges c1 with b2 by examining (a, b2, c1) but without reference to the information in b1 is doomed to do worse than darcs does in this sort of case. (Where a silent but incorrect merge such as SVN currently does is worse than raising it to the user as a conflict, which is worse than merging it correctly.)

It is important to understand that darcs merge performs better in this example not because it is "luckier" or "more complicated" than 3-merge (indeed darcs merge is actually a much simpler algorithm than 3-merge is for this example) but because darcs merge takes advantage of more information -- information which is already present in other systems such as SVN but which is not used in the current SVN merge operation.

code listings

listing a, C code


int square(int x) {
	int y = x;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++) y += x;
	return y;
}

listing b1, C code


int very_slow_square(int x) {
	int y = 0;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++)
		for (int j = 0; j < x; j++)
			y += 1;
	return y;
}

int square(int x) {
	int y = x;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++) y += x;
	return y;
}

listing b2, C code


int square(int x) {
	int y = x;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	return y * x;
}

int very_slow_square(int x) {
	int y = 0;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++)
		for (int j = 0; j < x; j++)
			y += 1;
	return y;
}

int slow_square(int x) {
	int y = x;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++) y += x;
	return y;
}

listing c1, C code


int square(int x) {
	int y = 0;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++) y += x;
	return y;
}

listing three-way merged, C code


int square(int x) {
	int y = 0;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	return y * x;
}

int very_slow_square(int x) {
	int y = 0;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++)
		for (int j = 0; j < x; j++)
			y += 1;
	return y;
}

int slow_square(int x) {
	int y = x;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++) y += x;
	return y;
}

listing darcs merged, C code


int square(int x) {
	int y = x;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	return y * x;
}

int very_slow_square(int x) {
	int y = 0;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++)
		for (int j = 0; j < x; j++)
			y += 1;
	return y;
}

int slow_square(int x) {
	int y = 0;
	/* Update y to equal the result. */
	/* Question: what is the order of magnitude of this algorithm with respect to x? */
	for (int i = 0; i < x; i++) y += x;
	return y;
}

zooko
Last modified: 2008-02-25