<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/libgit2.git/tests/core/memmem.c, branch ethomson/https_proxy</title>
<subtitle>github.com: libgit2/libgit2.git
</subtitle>
<link rel='alternate' type='text/html' href='http://trove.baserock.org/cgit/delta/libgit2.git/'/>
<entry>
<title>util: provide `git__memmem` function</title>
<updated>2018-10-25T10:52:47+00:00</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2018-10-18T14:08:46+00:00</published>
<link rel='alternate' type='text/html' href='http://trove.baserock.org/cgit/delta/libgit2.git/commit/?id=83e8a6b36acc67f2702cbbc7d4e334c7f7737719'/>
<id>83e8a6b36acc67f2702cbbc7d4e334c7f7737719</id>
<content type='text'>
Unfortunately, neither the `memmem` nor the `strnstr` functions are part
of any C standard but are merely extensions of C that are implemented by
e.g. glibc. Thus, there is no standardized way to search for a string in
a block of memory with a limited size, and using `strstr` is to be
considered unsafe in case where the buffer has not been sanitized. In
fact, there are some uses of `strstr` in exactly that unsafe way in our
codebase.

Provide a new function `git__memmem` that implements the `memmem`
semantics. That is in a given haystack of `n` bytes, search for the
occurrence of a byte sequence of `m` bytes and return a pointer to the
first occurrence. The implementation chosen is the "Not So Naive"
algorithm from [1]. It was chosen as the implementation is comparably
simple while still being reasonably efficient in most cases.
Preprocessing happens in constant time and space, searching has a time
complexity of O(n*m) with a slightly sub-linear average case.

[1]: http://www-igm.univ-mlv.fr/~lecroq/string/
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Unfortunately, neither the `memmem` nor the `strnstr` functions are part
of any C standard but are merely extensions of C that are implemented by
e.g. glibc. Thus, there is no standardized way to search for a string in
a block of memory with a limited size, and using `strstr` is to be
considered unsafe in case where the buffer has not been sanitized. In
fact, there are some uses of `strstr` in exactly that unsafe way in our
codebase.

Provide a new function `git__memmem` that implements the `memmem`
semantics. That is in a given haystack of `n` bytes, search for the
occurrence of a byte sequence of `m` bytes and return a pointer to the
first occurrence. The implementation chosen is the "Not So Naive"
algorithm from [1]. It was chosen as the implementation is comparably
simple while still being reasonably efficient in most cases.
Preprocessing happens in constant time and space, searching has a time
complexity of O(n*m) with a slightly sub-linear average case.

[1]: http://www-igm.univ-mlv.fr/~lecroq/string/
</pre>
</div>
</content>
</entry>
</feed>
