As a hobby project, I'm working on a binary diff tool[1] that might someday compete with xdelta[2] or even (if I'm lucky) bsdiff[3]. Since binary diffing, like file compression, ought to work well on a variety of inputs, I'm looking for some "representative" source/target pairs that I could add to my test suite and distribute so that others can replicate my findings.
File-compression hobbyists have the Calgary Corpus[4] and the Canterbury Corpus[5], among others, but I can't find an equivalent corpus for binary diffing or delta encoding. So far I've mostly been working with fan translations of video games, which are good exercise for my code, but which I can't legally share as test data.
Any suggestions?
[1]: https://gitorious.org/python-blip
[2]: http://code.google.com/p/xdelta/
[3]: http://www.daemonology.net/bsdiff/
[4]: http://en.wikipedia.org/wiki/Calgary_Corpus
[5]: http://corpus.canterbury.ac.nz/purpose.html