http://pypi.python.org/pypi/dupfilefind

{{{
usage: dupfilefind [-h] [-V] [-v] [-I IGNORE_DIRS] [-H] [-D] [-m m] [-M M]
                   [-p] [--include-names-in-profiles] [-n]
                   [dir [dir ...]]

Find files with identical contents.

positional arguments:
  dir                   directories to recursively examine (default '.')

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         Print out the version of dupfilefind.
  -v, --verbose         Emit more information.
  -I IGNORE_DIRS, --ignore-dirs IGNORE_DIRS
                        comma-separated list of directories to skip (if you
                        need to name a directory which has a comma in its name
                        then escape that name twice) (this does what you would
                        expect with relative vs. absolute paths) (default
                        _darcs,.svn,.git,.bzr,/proc,/sys,/dev,/tmp,/var/tmp,/lib64/udev
  -H, --hard-link-them  Whenever a file is found with identical contents to a
                        previously discovered file, replace the new one with a
                        hard link to the old one. This option is very
                        dangerous because hard links are confusing and
                        dangerous things to have around.
  -D, --delete-them     Whenever a file is found with identical contents to a
                        previously discovered file, delete the new one. This
                        option is dangerous.
  -m m, --min-size m    Ignore files smaller than this (default 1024).
  -M M, --max-size M    Hash only the first this many bytes of the file, or -1
                        to hash all bytes of the file (default -1).
  -p, --profiles        Print out the md5sum and size in bytes of every file.
                        This could be useful for a p2p storage project to
                        measure how valuable convergent encryption is.
  --include-names-in-profiles
                        Print out the file name in addition to the other
                        information from --profiles, for each file.
  -n, --no-follow-symlinks
                        Do not follow symlinks.
}}}

== Starting Points ==

 * TracGuide -- Built-in Documentation
 * [http://trac.edgewall.org/ The Trac project] -- Trac Open Source Project
 * [http://trac.edgewall.org/wiki/TracFaq Trac FAQ] -- Frequently Asked Questions
 * TracSupport -- Trac Support

For a complete list of local wiki pages, see TitleIndex.
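The dupfilefind help text describes what the tool does, not how it does it. The following is a minimal sketch of content-based duplicate detection in Python, written for illustration and not taken from dupfilefind's source. `file_digest` and `find_duplicates` are hypothetical names whose parameters loosely mirror `-I`, `-m`, `-M` and `-n`; the rule for matching ignored directories and the choice to group files by (size, MD5) are assumptions.

```python
# Hypothetical sketch of duplicate detection; not dupfilefind's implementation.
import hashlib
import os
from collections import defaultdict


def file_digest(path, max_size=-1, chunk_size=1 << 16):
    """MD5 hex digest of a file; if max_size >= 0 only the first max_size bytes are hashed."""
    h = hashlib.md5()
    remaining = max_size  # -1 means "hash the whole file" (like -M -1)
    with open(path, "rb") as f:
        while remaining != 0:
            n = chunk_size if remaining < 0 else min(chunk_size, remaining)
            chunk = f.read(n)
            if not chunk:
                break  # end of file
            h.update(chunk)
            if remaining > 0:
                remaining -= len(chunk)
    return h.hexdigest()


def find_duplicates(roots, ignore_dirs=(), min_size=1024, max_size=-1,
                    follow_symlinks=True):
    """Return {(size, digest): [paths, ...]} for every group of 2+ matching files."""
    # Assumed matching rule: relative names such as ".git" match any directory
    # with that name; absolute entries such as "/proc" match only that path.
    ignore_names = {d for d in ignore_dirs if not os.path.isabs(d)}
    ignore_paths = {os.path.abspath(d) for d in ignore_dirs if os.path.isabs(d)}
    groups = defaultdict(list)
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
            # Prune in place so os.walk never descends into ignored directories.
            dirnames[:] = [
                d for d in dirnames
                if d not in ignore_names
                and os.path.abspath(os.path.join(dirpath, d)) not in ignore_paths
            ]
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not follow_symlinks and os.path.islink(path):
                    continue  # analogous to -n
                if not os.path.isfile(path):  # skip FIFOs, sockets, broken links
                    continue
                try:
                    size = os.path.getsize(path)
                    if size < min_size:  # analogous to -m
                        continue
                    key = (size, file_digest(path, max_size))
                except OSError:
                    continue  # unreadable or vanished file
                groups[key].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```

Keying on size as well as digest means that, with a non-negative `max_size`, two files of equal length that share their first `max_size` bytes are still reported as duplicates even if they differ later on; that is the speed-for-accuracy trade-off a partial hash implies.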
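The `-H` and `-D` options act on each duplicate once it has been found. A minimal sketch of both actions, assuming the duplicates have already been collected into a dict mapping some key to a list of paths with identical contents, with the first path treated as the "previously discovered" file. `replace_with_hard_link` and `dedupe` are hypothetical helpers, not dupfilefind functions, and linking under a temporary name before renaming over the duplicate is one possible safety measure, not necessarily what dupfilefind does.

```python
# Hypothetical sketch of the -H / -D actions; not dupfilefind's implementation.
import os
import uuid


def replace_with_hard_link(original, duplicate):
    """Replace `duplicate` with a hard link to `original`."""
    if os.path.samefile(original, duplicate):
        return  # already the same inode; renaming would leave a stray link behind
    dup_dir = os.path.dirname(os.path.abspath(duplicate))
    tmp_path = os.path.join(
        dup_dir, ".%s.%s.tmp" % (os.path.basename(duplicate), uuid.uuid4().hex))
    # os.link raises OSError if the files are on different filesystems.
    os.link(original, tmp_path)
    try:
        # Atomic rename: `duplicate` is never missing, even if something fails.
        os.replace(tmp_path, duplicate)
    except OSError:
        os.unlink(tmp_path)
        raise


def dedupe(groups, action):
    """Apply "hardlink" (like -H) or "delete" (like -D) to every later copy."""
    for paths in groups.values():
        original, *rest = paths
        for dup in rest:
            if action == "hardlink":
                replace_with_hard_link(original, dup)
            elif action == "delete":
                os.remove(dup)
            else:
                raise ValueError("unknown action: %r" % action)
```

After `-H`, both names refer to the same inode, so writing through either name changes the contents seen through the other; that shared-state behaviour is why the help text warns that hard links are dangerous to have around.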
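The `-p` description ties file profiles to convergent encryption, a scheme in which each file's encryption key is derived from its own contents, so identical files produce identical ciphertexts and a storage system only needs to keep one copy. Assuming a profile is simply a sequence of (md5sum, size) pairs (the exact output format of `-p` is not shown on this page), the potential saving can be estimated as below. `profile` and `dedup_savings` are hypothetical helpers, not part of dupfilefind.

```python
# Hypothetical sketch: estimating deduplication savings from (md5, size) profiles.
import hashlib
import os


def profile(paths):
    """Yield (md5_hex, size) for each path, roughly the data -p reports."""
    for path in paths:
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        yield h.hexdigest(), os.path.getsize(path)


def dedup_savings(records):
    """Return (total_bytes, unique_bytes) for a sequence of (md5_hex, size) records.

    Each distinct (digest, size) pair is counted once in unique_bytes, which is
    what a store that deduplicates identical ciphertexts would need to keep.
    """
    total = 0
    unique = {}
    for digest, size in records:
        total += size
        unique[(digest, size)] = size
    return total, sum(unique.values())
```

For example, profiles of two identical 100-byte files and one distinct 50-byte file give `(250, 150)`: 250 bytes stored naively, 150 bytes with convergent deduplication.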