Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Refine pluggable similarity API | Russell Belfer | 2013-02-20 | 1 | -2/+4 |
This plugs in the three basic similarity strategies for handling whitespace via internal use of the pluggable API. In so doing, I realized that the use of git_buf in the hashsig API was not needed and actually just made it harder to use, so I tweaked that API as well. Note that the similarity metric is still not hooked up in the find_similarity code - this is just setting out the function that will be used. |
* | Change similarity metric to sampled hashes | Russell Belfer | 2013-02-20 | 1 | -0/+70 |
This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be sufficient samples to give a fairly accurate measurement while limiting the amount of data we keep for file signatures no matter how large the file is. |
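
The sampled-hash idea in the last commit message can be illustrated with a minimal Python sketch. This is not the code from the commit (which is C); it only shows the general technique: hash every fixed-size window with a Rabin-Karp rolling hash, then keep just the 100 largest and 100 smallest distinct hashes as the signature. The window size, hash base, modulus, the overlap score, and every function name here are assumptions made for the example; the commit message does not specify them.

```python
import heapq

WINDOW = 8            # assumed window size (not given in the commit message)
BASE = 257            # assumed polynomial base
MOD = (1 << 61) - 1   # assumed modulus
KEEP = 100            # top 100 and bottom 100, as in the commit message


def rolling_hashes(data: bytes, window: int = WINDOW):
    """Yield a Rabin-Karp hash for every window-length slice of data."""
    if len(data) < window:
        return
    high = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    yield h
    for i in range(window, len(data)):
        h = (h - data[i - window] * high) % MOD  # drop the oldest byte
        h = (h * BASE + data[i]) % MOD           # append the newest byte
        yield h


def _offer(heap, members, value, keep):
    """Track the `keep` largest distinct values seen so far in a min-heap."""
    if value in members:
        return
    if len(heap) < keep:
        heapq.heappush(heap, value)
        members.add(value)
    elif value > heap[0]:
        evicted = heapq.heappushpop(heap, value)
        members.discard(evicted)
        members.add(value)


def signature(data: bytes, keep: int = KEEP) -> frozenset:
    """Return at most 2 * keep sampled hashes, however large data is."""
    top, top_members = [], set()
    bottom, bottom_members = [], set()  # stores negated hashes
    for h in rolling_hashes(data):
        _offer(top, top_members, h, keep)
        _offer(bottom, bottom_members, -h, keep)
    return frozenset(top_members | {-v for v in bottom_members})


def similarity(sig_a: frozenset, sig_b: frozenset) -> float:
    """Hypothetical score: shared samples divided by all samples."""
    if not sig_a and not sig_b:
        return 1.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)
```

Because each signature holds at most 200 hashes, comparing two files costs the same whether they are a few kilobytes or many megabytes, which is the bounded-storage property the commit message describes.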