[tahoe-dev] What are the common variable names for erasure coding parameters?
zooko
zooko at zooko.com
Fri Dec 21 08:47:19 PST 2007
Folks:
I just noticed, once again, the inconvenience of Brian (and the Tahoe
codebase) using "k" for the number of primary shares and "n" for the
total number of shares, while I (and the zfec library) use "k" for
the number of primary shares and "m" for the total number of shares.
So I set out to learn which variable names are most common in the
wider world, to help me decide whether zfec (and I) should change to
"k" and "n" or Tahoe (and Brian) should change to "k" and "m".
Mojo Nation, Mnet, and zfec use "k" for the number of primary
shares and "m" for the total number of shares, so m-k is the number
of check shares, k/m is the "rate", and m/k is the "expansion
factor". (We probably inherited this from Doug Barnes, who wrote the
first implementation of erasure coding for Mojo Nation; at that time
it was called "Information Dispersal", following Rabin. I'm not sure
where Doug got his variable names.)
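To make that arithmetic concrete, here is a minimal Python sketch that derives these quantities from k and m in the Mojo Nation/zfec naming. The function name code_params is hypothetical and is not part of zfec's API; the 3-of-10 values below are chosen only for illustration.

```python
from fractions import Fraction


def code_params(k: int, m: int) -> dict:
    """Derived quantities for a k-of-m erasure code (hypothetical helper).

    Naming follows Mojo Nation / Mnet / zfec:
      k = number of primary shares (enough to reconstruct the data)
      m = total number of shares produced
    """
    if not 1 <= k <= m:
        raise ValueError("need 1 <= k <= m")
    return {
        "check_shares": m - k,                # redundant shares beyond the primary ones
        "rate": Fraction(k, m),               # fraction of stored shares that are primary
        "expansion_factor": Fraction(m, k),   # storage blow-up relative to the original data
    }


print(code_params(3, 10))
```

For a 3-of-10 code this gives 7 check shares, a rate of 3/10, and an expansion factor of 10/3. Using Fraction keeps the ratios exact rather than rounding them to floats.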
Flud [1] uses "m" for the number of primary shares and "k" for the
number of check shares, so m+k is the number of total shares.
Luigi Rizzo's influential paper [2] uses "k" for the number of
primary shares and "n" for the number of total shares.
James Plank's tutorial [3] uses "n" for the number of primary shares
and "m" for the number of check shares.
Wikipedia [4] uses "n" for the number of required shares (which for
us is always equal to the number of primary shares) and "r" for the
rate, so n/r is the total number of shares.
The top two hits on Google for "erasure coding" (excluding the
Wikipedia hit) are scientific papers by systems/p2p researchers:
"Erasure Coding vs. Replication: A Quantitative Comparison" by Hakim
Weatherspoon and John D. Kubiatowicz, and "High Availability in DHTs:
Erasure Coding vs. Replication" by Rodrigo Rodrigues and Barbara
Liskov. Both use "m" for the number of primary shares and "n" for
the total number of shares.
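Since every source above describes the same two quantities under different names, a rough Python sketch can normalize each convention to a common (primary shares, total shares) pair. The convention labels and the normalize helper are hypothetical, invented for illustration; the mappings simply follow the descriptions in this message, and the Wikipedia-style rate is assumed to be passed as an exact int or Fraction rather than a float.

```python
from fractions import Fraction
from typing import Callable, Dict, Tuple, Union

Number = Union[int, Fraction]


def _from_rate(required: int, rate: Number) -> Tuple[int, int]:
    # Wikipedia style: total = required / rate, which must be a whole number.
    # Assumes rate is exact (int or Fraction), not a float.
    total = Fraction(required) / Fraction(rate)
    if total.denominator != 1:
        raise ValueError("required / rate is not a whole number of shares")
    return required, int(total)


# Hypothetical labels; each function takes that source's two parameters in the
# order the source names them and returns (primary_shares, total_shares).
CONVENTIONS: Dict[str, Callable[[int, Number], Tuple[int, int]]] = {
    "mojo_mnet_zfec": lambda k, m: (k, m),          # k primary, m total
    "flud": lambda m, k: (m, m + k),                # m primary, k check
    "rizzo": lambda k, n: (k, n),                   # k primary, n total
    "plank": lambda n, m: (n, n + m),               # n primary, m check
    "wikipedia": _from_rate,                        # n required, r rate
    "weatherspoon_rodrigues": lambda m, n: (m, n),  # m primary, n total
}


def normalize(convention: str, a: int, b: Number) -> Tuple[int, int]:
    """Return (primary_shares, total_shares) for parameters given in `convention`."""
    return CONVENTIONS[convention](a, b)


# The same 3-of-10 code written in each convention's own terms.
examples = [
    ("mojo_mnet_zfec", 3, 10),
    ("flud", 3, 7),
    ("rizzo", 3, 10),
    ("plank", 3, 7),
    ("wikipedia", 3, Fraction(3, 10)),
    ("weatherspoon_rodrigues", 3, 10),
]
for name, a, b in examples:
    print(f"{name:24} -> {normalize(name, a, b)}")
```

Every line should print (3, 10): the six conventions differ only in which letter names which quantity, and in whether they count check shares, total shares, or the rate.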
Okay, there isn't really a consensus, but "n" is more popular than
"m" for the total number of shares (Rizzo and both of those papers
use it), so I will start using it, and I might someday get around to
changing the zfec API, docs, and code from "m" to "n".
Regards,
Zooko
[1] http://www.flud.org/wiki/Story_of_a_File
[2] http://citeseer.ist.psu.edu/rizzo97effective.html
[3] http://citeseer.ist.psu.edu/41070.html
[4] http://en.wikipedia.org/wiki/Erasure_coding