path: root/src/hashsig.h
Commit message | Author | Age | Files | Lines
* Refine pluggable similarity API | Russell Belfer | 2013-02-20 | 1 | -2/+4
This plugs in the three basic similarity strategies for handling whitespace via internal use of the pluggable API. In so doing, I realized that the use of git_buf in the hashsig API was not needed and actually just made it harder to use, so I tweaked that API as well. Note that the similarity metric is still not hooked up in the find_similarity code; this is just setting out the function that will be used.
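To illustrate the idea of plugging whitespace strategies in behind one function-pointer interface, here is a minimal sketch. The commit message does not name the three strategies, so the ones below (keep as-is, collapse runs, strip entirely) are an assumption, and every identifier (ws_filter_fn, ws_keep, ws_collapse, ws_strip, ws_apply) is hypothetical rather than taken from hashsig.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter signature: copy src into dst (dst must hold at
 * least len bytes) and return the number of bytes written. */
typedef size_t (*ws_filter_fn)(char *dst, const char *src, size_t len);

/* Assumed strategy 1: leave whitespace untouched. */
static size_t ws_keep(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    return len;
}

/* Assumed strategy 2: collapse each run of whitespace to one space. */
static size_t ws_collapse(char *dst, const char *src, size_t len)
{
    size_t i, out = 0;
    int in_ws = 0;
    for (i = 0; i < len; i++) {
        if (isspace((unsigned char)src[i])) {
            if (!in_ws)
                dst[out++] = ' ';
            in_ws = 1;
        } else {
            dst[out++] = src[i];
            in_ws = 0;
        }
    }
    return out;
}

/* Assumed strategy 3: drop all whitespace. */
static size_t ws_strip(char *dst, const char *src, size_t len)
{
    size_t i, out = 0;
    for (i = 0; i < len; i++)
        if (!isspace((unsigned char)src[i]))
            dst[out++] = src[i];
    return out;
}

/* The caller picks a strategy; the code that consumes the filtered
 * bytes never needs to know which one was chosen. */
static size_t ws_apply(ws_filter_fn filter, char *dst,
                       const char *src, size_t len)
{
    assert(filter != NULL && dst != NULL && src != NULL);
    return filter(dst, src, len);
}
```

A similarity routine could run either input through ws_apply before hashing it, so that the choice of whitespace handling stays separate from the metric itself.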
* Change similarity metric to sampled hashes | Russell Belfer | 2013-02-20 | 1 | -0/+70
This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be enough samples to give a fairly accurate measurement while limiting the amount of data we keep for file signatures, no matter how large the file is.
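The sampling approach described above can be sketched as follows. This is not the code from hashsig.h: the window width, the polynomial base, the scoring formula, and all names (sketch_sig, sketch_sig_build, sketch_sig_compare) are assumptions chosen for illustration. Only the idea of a rolling hash that keeps the 100 smallest and 100 largest values comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_SAMPLES 100  /* hashes kept at each end, per the commit message */
#define SIG_WINDOW  8    /* rolling window width in bytes (assumed) */
#define SIG_BASE    257u /* polynomial base (assumed) */

typedef struct {
    uint32_t mins[SIG_SAMPLES]; size_t nmins; /* smallest hashes, ascending */
    uint32_t maxs[SIG_SAMPLES]; size_t nmaxs; /* largest hashes, ascending */
} sketch_sig;

/* Insert h into the ascending array a, skipping duplicates. Once the
 * array is full, keep_low != 0 evicts the largest value, else the smallest. */
static void sig_insert(uint32_t *a, size_t *n, uint32_t h, int keep_low)
{
    size_t i = 0;
    while (i < *n && a[i] < h)
        i++;
    if (i < *n && a[i] == h)
        return;                          /* already present */
    if (*n < SIG_SAMPLES) {
        memmove(a + i + 1, a + i, (*n - i) * sizeof *a);
        a[i] = h;
        (*n)++;
    } else if (keep_low) {
        if (i == SIG_SAMPLES)
            return;                      /* larger than everything kept */
        memmove(a + i + 1, a + i, (SIG_SAMPLES - 1 - i) * sizeof *a);
        a[i] = h;
    } else {
        if (i == 0)
            return;                      /* smaller than everything kept */
        memmove(a, a + 1, (i - 1) * sizeof *a);
        a[i - 1] = h;
    }
}

/* Slide a polynomial hash (mod 2^32) over buf and sample its extremes. */
static void sketch_sig_build(sketch_sig *sig, const unsigned char *buf, size_t len)
{
    uint32_t hash = 0, top = 1; /* top = SIG_BASE^(SIG_WINDOW - 1) */
    size_t i;
    assert(sig != NULL);
    sig->nmins = sig->nmaxs = 0;
    for (i = 1; i < SIG_WINDOW; i++)
        top *= SIG_BASE;
    for (i = 0; i < len; i++) {
        if (i >= SIG_WINDOW)
            hash -= top * buf[i - SIG_WINDOW]; /* byte leaving the window */
        hash = hash * SIG_BASE + buf[i];       /* byte entering the window */
        if (i + 1 >= SIG_WINDOW) {
            sig_insert(sig->mins, &sig->nmins, hash, 1);
            sig_insert(sig->maxs, &sig->nmaxs, hash, 0);
        }
    }
}

/* Count values present in both ascending arrays. */
static size_t sig_common(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0, hits = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { hits++; i++; j++; }
    }
    return hits;
}

/* Score from 0 (no shared samples) to 100 (identical sample sets). */
static int sketch_sig_compare(const sketch_sig *a, const sketch_sig *b)
{
    size_t total = a->nmins + a->nmaxs + b->nmins + b->nmaxs;
    size_t hits = sig_common(a->mins, a->nmins, b->mins, b->nmins)
                + sig_common(a->maxs, a->nmaxs, b->maxs, b->nmaxs);
    if (total == 0)
        return 100; /* both inputs shorter than one window */
    return (int)(200 * hits / total);
}
```

Whatever the input size, each signature in this sketch holds at most 200 32-bit values, which is the bounded-storage property the commit message is after. Sorted arrays with insertion are used here only for brevity; a pair of heaps would be the more usual choice for large inputs.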