summaryrefslogtreecommitdiff
path: root/grep.h
Commit message (Collapse)AuthorAgeFilesLines
* grep: replace grep_read_mutex by internal obj read lockMatheus Tavares2020-01-171-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git-grep uses 'grep_read_mutex' to protect its calls to object reading operations. But these have their own internal lock now, which ensures a better performance (allowing parallel access to more regions). So, let's remove the former and, instead, activate the latter with enable_obj_read_lock(). Sections that are currently protected by 'grep_read_mutex' but are not internally protected by the object reading lock should be surrounded by obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion with object reading operations, keeping the current behavior and avoiding race conditions. Namely, these places are: In grep.c: - fill_textconv() at fill_textconv_grep(). - userdiff_get_textconv() at grep_source_1(). In builtin/grep.c: - parse_object_or_die() and the submodule functions at grep_submodule(). - deref_tag() and gitmodules_config_oid() at grep_objects(). If these functions become thread-safe, in the future, we might remove the locking and probably get some speedup. Note that some of the submodule functions will already be thread-safe (or close to being thread-safe) with the internal object reading lock. However, as some of them will require additional modifications to be removed from the critical section, this will be done in its own patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'cb/pcre2-chartables-leakfix'Junio C Hamano2019-10-231-0/+2
|\ | | | | | | | | | | | | | | | | Leakfix. * cb/pcre2-chartables-leakfix: grep: avoid leak of chartables in PCRE2 grep: make PCRE2 aware of custom allocator grep: make PCRE1 aware of custom allocator
| * grep: avoid leak of chartables in PCRE2Carlo Marcelo Arenas Belón2019-10-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced a small memory leak visible with valgrind in t7813. Complete the creation of a PCRE2 specific variable that was missing from the original change and free the generated table just like it is done for PCRE1. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * grep: make PCRE2 aware of custom allocatorCarlo Marcelo Arenas Belón2019-10-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include a way to override the system allocator, and so it is incompatible with custom allocators (e.g. nedmalloc). This problem became obvious when we tried to plug a memory leak by `free()`ing a data structure allocated by PCRE2, triggering a segfault in Windows (where we use nedmalloc by default). PCRE2 requires the use of a general context to override the allocator and therefore, there is a lot more code needed than in PCRE1, including a couple of wrapper functions. Extend the grep API with a "destructor" that could be called to cleanup any objects that were created and used globally. Update `builtin/grep.c` to use that new API, but any other future users should make sure to have matching `grep_init()`/`grep_destroy()` calls if they are using the pattern matching functionality. Move some of the logic that was before done per thread (in the workers) into an earlier phase to avoid degrading performance, but as the use of PCRE2 with custom allocators is better understood it is expected more of its functions will be instructed to use the custom allocator as well as was done in the original code[1] this work was based on. [1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'cb/pcre1-cleanup'Junio C Hamano2019-10-111-11/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | PCRE fixes. * cb/pcre1-cleanup: grep: refactor and simplify PCRE1 support grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1
| * | grep: refactor and simplify PCRE1 supportCarlo Marcelo Arenas Belón2019-08-261-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code used both a macro and a variable to keep track if JIT support was desired and relied on the fact that a non JIT enabled library will ignore a request for JIT compilation (as defined by the second parameter of the call to pcre_study) Cleanup the multiple levels of macros used and call pcre_study with the right parameter after JIT support has been confirmed and unless it was requested to be disabled with NO_LIBPCRE1_JIT Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1Carlo Marcelo Arenas Belón2019-08-261-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | e87de7cab4 ("grep: un-break building with PCRE < 8.32", 2017-05-25) added a restriction for JIT support that is no longer needed after pcre_jit_exec() calls were removed. Reorganize the definitions in grep.h so that JIT support could be detected early and NO_LIBPCRE1_JIT could be used reliably to enforce JIT doesn't get used. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'ab/pcre-jit-fixes'Junio C Hamano2019-10-111-11/+2
|\ \ \ | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A few simplification and bugfixes to PCRE interface. * ab/pcre-jit-fixes: grep: under --debug, show whether PCRE JIT is enabled grep: do not enter PCRE2_UTF mode on fixed matching grep: stess test PCRE v2 on invalid UTF-8 data grep: create a "is_fixed" member in "grep_pat" grep: consistently use "p->fixed" in compile_regexp() grep: stop using a custom JIT stack with PCRE v1 grep: stop "using" a custom JIT stack with PCRE v2 grep: remove overly paranoid BUG(...) code grep: use PCRE v2 for optimized fixed-string search grep: remove the kwset optimization grep: drop support for \0 in --fixed-strings <pattern> grep: make the behavior for NUL-byte in patterns sane grep tests: move binary pattern tests into their own file grep tests: move "grep binary" alongside the rest grep: inline the return value of a function call used only once t4210: skip more command-line encoding tests on MinGW grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>" log tests: test regex backends in "--encode=<enc>" tests
| * | grep: create a "is_fixed" member in "grep_pat"Ævar Arnfjörð Bjarmason2019-07-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | This change paves the way for later using this value the regex compile functions themselves. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | grep: stop using a custom JIT stack with PCRE v1Ævar Arnfjörð Bjarmason2019-07-261-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simplify the PCRE v1 code for the same reasons as for the PCRE v2 code in the last commit. Unlike with v2 we actually used the custom stack in v1, but let's use PCRE's built-in 32 KB one instead, since experience with v2 shows that's enough. Most distros are already using v2 as a default, and the underlying sljit code is the same. Unfortunately we can't just pass a NULL to pcre_jit_exec() as with pcre2_jit_match(). Unlike the v2 function it doesn't support that. Instead we need to use the fatter pcre_exec() if we'd like the same behavior. This will make things slightly slower than on the fast-path function, but it's OK since we care less about v1 performance these days since we have and recommend v2. Running a similar performance test as what I ran in fbaceaac47 ("grep: add support for the PCRE v1 JIT API", 2017-05-25) via: GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE1=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst' ./run HEAD~ HEAD p7820-grep-engines.sh Gives us this, just the /perl/ results: Test HEAD~ HEAD --------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.19(0.67+0.52) 0.19(0.65+0.52) +0.0% 7820.7: perl grep '^how to' 0.19(0.78+0.44) 0.19(0.72+0.49) +0.0% 7820.11: perl grep '[how] to' 0.39(2.13+0.43) 0.40(2.10+0.46) +2.6% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.44(2.55+0.37) 0.45(2.47+0.41) +2.3% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.23(1.06+0.42) 0.22(1.03+0.43) -4.3% It will also implicitly re-enable UTF-8 validation for PCRE v1. As noted in [1] we now have cases as a result where PCRE v1 is more eager to error out. Subsequent patches will fix that for v2, and I think it's fair to tell v1 users "just upgrade" and not worry about that edge case for v1. 1. https://public-inbox.org/git/CAPUEsphZJ_Uv9o1-yDpjNLA_q-f7gWXz9g1gCY2pYAYN8ri40g@mail.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | grep: stop "using" a custom JIT stack with PCRE v2Ævar Arnfjörð Bjarmason2019-07-261-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported in [1] the code I added in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) to use a custom JIT stack has never worked. It was incorrectly copy/pasted from code I added in fbaceaac47 ("grep: add support for the PCRE v1 JIT API", 2017-05-25), which did work. Thus our intention of starting with 1 byte of stack at a maximum of 1 MB didn't happen, we'd always use the 32 KB stack provided by PCRE v2's jit_machine_stack_exec()[2]. The reason I allocated a custom stack at all was this advice in pcrejit(3) (same in pcre2jit(3)): "By default, it uses 32KiB on the machine stack. However, some large or complicated patterns need more than this" Since we've haven't had any reports of users running into PCRE2_ERROR_JIT_STACKLIMIT in the wild I think we can safely assume that we can just use the library defaults instead and drop this code. This won't change with the wider use of PCRE v2 in ed0479ce3d ("Merge branch 'ab/no-kwset' into next", 2019-07-15), a fixed string search is not a "large or complicated pattern". For good measure I ran the performance test noted in 94da9193a6, although the command is simpler now due to my 0f50c8e32c ("Makefile: remove the NO_R_TO_GCC_LINKER flag", 2019-05-17): GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE2=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst' ./run HEAD~ HEAD p7820-grep-engines.sh Just the /perl/ results are: Test HEAD~ HEAD --------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.17(0.27+0.65) 0.17(0.24+0.68) +0.0% 7820.7: perl grep '^how to' 0.16(0.23+0.66) 0.16(0.23+0.67) +0.0% 7820.11: perl grep '[how] to' 0.18(0.35+0.62) 0.18(0.33+0.65) +0.0% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.17(0.45+0.54) 0.17(0.49+0.50) +0.0% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.16(0.33+0.58) 0.16(0.29+0.62) +0.0% So, as expected there's no change, and running with valgrind reveals that we have fewer allocations now. As noted in [3] there are known regexes that will fail with the lower stack limit, the way GNU grep fixed it is interesting, although I believe the implementation is overly verbose, they could make PCRE v2 handle that gradual re-allocation, that's what min/max memory is for. So we might end up bringing this back, I'm more inclined to just kick such cases upstairs to PCRE maintainers as a bug, perhaps they'll add some overall "just allocate more then" flag to make this easier. In any case there's no functional change here, we didn't have a custom stack, so let's apply this first, we can always revert it later. 1. https://public-inbox.org/git/20190721194052.15440-1-carenas@gmail.com/ 2. I didn't really intend to start with 1 byte, looking at the PCRE v2 code again what happened is that I cargo-culted some of PCRE v2's own test code which was meant to test re-allocations. It's more sane to start with say 32 KB with a max of 1 MB, as pcre2grep.c does. 3. https://public-inbox.org/git/CAPUEspjj+fG8QDmf=bZXktfpLgkgiu34HTjKLhm-cmEE04FE-A@mail.gmail.com/ Reported-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | grep: remove the kwset optimizationÆvar Arnfjörð Bjarmason2019-07-011-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A later change will replace this optimization with optimistic use of PCRE v2. I'm completely removing it as an intermediate step, as opposed to replacing it with PCRE v2, to demonstrate that no grep semantics depend on this (or any other) optimization for the fixed backend anymore. For now this is mostly (but not entirely) a performance regression, as shown by this hacky one-liner: for opt in '' ' -i' do GIT_PERF_7821_GREP_OPTS=$opt GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE=YesPlease' ./run origin/master HEAD -- p7821-grep-engines-fixed.sh done && for opt in '' ' -i' do GIT_PERF_4221_LOG_OPTS=$opt GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE=YesPlease' ./run origin/master HEAD -- p4221-log-grep-engines-fixed.sh done Which produces: plain grep: Test origin/master HEAD ------------------------------------------------------------------------- 7821.1: fixed grep int 0.55(1.60+0.63) 0.82(3.11+0.51) +49.1% 7821.2: basic grep int 0.62(1.68+0.49) 0.85(3.02+0.52) +37.1% 7821.3: extended grep int 0.61(1.63+0.53) 0.91(3.09+0.44) +49.2% 7821.4: perl grep int 0.55(1.60+0.57) 0.41(0.93+0.57) -25.5% 7821.6: fixed grep uncommon 0.20(0.50+0.44) 0.35(1.27+0.42) +75.0% 7821.7: basic grep uncommon 0.20(0.49+0.45) 0.35(1.29+0.41) +75.0% 7821.8: extended grep uncommon 0.20(0.45+0.48) 0.35(1.25+0.44) +75.0% 7821.9: perl grep uncommon 0.20(0.53+0.41) 0.16(0.24+0.49) -20.0% 7821.11: fixed grep æ 0.35(1.27+0.40) 0.25(0.82+0.39) -28.6% 7821.12: basic grep æ 0.35(1.28+0.38) 0.25(0.75+0.44) -28.6% 7821.13: extended grep æ 0.36(1.21+0.46) 0.25(0.86+0.35) -30.6% 7821.14: perl grep æ 0.35(1.33+0.34) 0.16(0.26+0.47) -54.3% grep with -i: Test origin/master HEAD ----------------------------------------------------------------------------- 7821.1: fixed grep -i int 0.61(1.84+0.64) 1.11(4.12+0.64) +82.0% 7821.2: basic grep -i int 0.72(1.86+0.57) 1.15(4.48+0.49) +59.7% 7821.3: extended grep -i int 0.94(1.83+0.60) 1.53(4.12+0.58) +62.8% 7821.4: perl grep -i int 0.66(1.82+0.59) 0.55(1.08+0.58) -16.7% 7821.6: fixed grep -i uncommon 0.21(0.51+0.44) 0.44(1.74+0.34) +109.5% 7821.7: basic grep -i uncommon 0.21(0.55+0.41) 0.44(1.72+0.40) +109.5% 7821.8: extended grep -i uncommon 0.21(0.57+0.39) 0.42(1.64+0.45) +100.0% 7821.9: perl grep -i uncommon 0.21(0.48+0.48) 0.17(0.30+0.45) -19.0% 7821.11: fixed grep -i æ 0.25(0.73+0.45) 0.25(0.75+0.45) +0.0% 7821.12: basic grep -i æ 0.25(0.71+0.49) 0.26(0.77+0.44) +4.0% 7821.13: extended grep -i æ 0.25(0.75+0.44) 0.25(0.74+0.46) +0.0% 7821.14: perl grep -i æ 0.17(0.26+0.48) 0.16(0.20+0.52) -5.9% plain log: Test origin/master HEAD --------------------------------------------------------------------------------- 4221.1: fixed log --grep='int' 7.31(7.06+0.21) 8.11(7.85+0.20) +10.9% 4221.2: basic log --grep='int' 7.30(6.94+0.27) 8.16(7.89+0.19) +11.8% 4221.3: extended log --grep='int' 7.34(7.05+0.21) 8.08(7.76+0.25) +10.1% 4221.4: perl log --grep='int' 7.27(6.94+0.24) 7.05(6.76+0.25) -3.0% 4221.6: fixed log --grep='uncommon' 6.97(6.62+0.32) 7.86(7.51+0.30) +12.8% 4221.7: basic log --grep='uncommon' 7.05(6.69+0.29) 7.89(7.60+0.28) +11.9% 4221.8: extended log --grep='uncommon' 6.89(6.56+0.32) 7.99(7.66+0.24) +16.0% 4221.9: perl log --grep='uncommon' 7.02(6.66+0.33) 6.97(6.54+0.36) -0.7% 4221.11: fixed log --grep='æ' 7.37(7.03+0.33) 7.67(7.30+0.31) +4.1% 4221.12: basic log --grep='æ' 7.41(7.00+0.31) 7.60(7.28+0.26) +2.6% 4221.13: extended log --grep='æ' 7.35(6.96+0.38) 7.73(7.31+0.34) +5.2% 4221.14: perl log --grep='æ' 7.43(7.10+0.32) 6.95(6.61+0.27) -6.5% log with -i: Test origin/master HEAD ------------------------------------------------------------------------------------ 4221.1: fixed log -i --grep='int' 7.40(7.05+0.23) 8.66(8.38+0.20) +17.0% 4221.2: basic log -i --grep='int' 7.39(7.09+0.23) 8.67(8.39+0.20) +17.3% 4221.3: extended log -i --grep='int' 7.29(6.99+0.26) 8.69(8.31+0.26) +19.2% 4221.4: perl log -i --grep='int' 7.42(7.16+0.21) 7.14(6.80+0.24) -3.8% 4221.6: fixed log -i --grep='uncommon' 6.94(6.58+0.35) 8.43(8.04+0.30) +21.5% 4221.7: basic log -i --grep='uncommon' 6.95(6.62+0.31) 8.34(7.93+0.32) +20.0% 4221.8: extended log -i --grep='uncommon' 7.06(6.75+0.25) 8.32(7.98+0.31) +17.8% 4221.9: perl log -i --grep='uncommon' 6.96(6.69+0.26) 7.04(6.64+0.32) +1.1% 4221.11: fixed log -i --grep='æ' 7.92(7.55+0.33) 7.86(7.44+0.34) -0.8% 4221.12: basic log -i --grep='æ' 7.88(7.49+0.32) 7.84(7.46+0.34) -0.5% 4221.13: extended log -i --grep='æ' 7.91(7.51+0.32) 7.87(7.48+0.32) -0.5% 4221.14: perl log -i --grep='æ' 7.01(6.59+0.35) 6.99(6.64+0.28) -0.3% Some of those, as noted in [1] are because PCRE is faster at finding fixed strings. This looks bad for some engines, but in the next change we'll optimistically use PCRE v2 for all of these, so it'll look better. 1. https://public-inbox.org/git/87v9x793qi.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"Ævar Arnfjörð Bjarmason2019-06-281-0/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bug introduced in 18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was missed due to a blindspot in our tests, as discussed in the previous commit. I then blindly copied the same bug in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) when adding the PCRE v2 code. We should not tell PCRE that we're processing UTF-8 just because we're dealing with non-ASCII. In the case of e.g. "log --encoding=<...>" under is_utf8_locale() the haystack might be in ISO-8859-1, and the needle might be in a non-UTF-8 encoding. Maybe we should be more strict here and die earlier? Should we also be converting the needle to the encoding in question, and failing if it's not a string that's valid in that encoding? Maybe. But for now matching this as non-UTF8 at least has some hope of producing sensible results, since we know that our default heuristic of assuming the text to be matched is in the user locale encoding isn't true when we've explicitly encoded it to be in a different encoding. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: skip UTF8 checks explicitlyCarlo Marcelo Arenas Belón2019-09-091-0/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | 18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was released with git 2.10 added the PCRE_UTF8 flag to PCRE1 matching including a call to has_non_ascii() to try to avoid breakage if there was non-utf8 encoded content in the haystack. Usually PCRE is compiled with JIT support (even if is not the default), and therefore the codepath used includes calling pcre_jit_exec, which skips UTF-8 validation by design (which might result in crashes or hangs) but when JIT support wasn't compiled we use pcre_exec instead with the posibility that grep might be aborted if invalid UTF-8 is found in the haystack. PCRE1 provides a flag since Mar 5, 2007 that could be used to skip the checks explicitly so use that to make both codepaths equivalent (the flag is ignored by pcre1_jit_exec) this fix is only implemented for PCRE1 because PCRE2 is likely to have a better solution (without the risks) instead in the future Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* *.[ch]: remove extern from function declarations using spatchDenton Liu2019-05-051-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There has been a push to remove extern from function declarations. Remove some instances of "extern" for function declarations which are caught by Coccinelle. Note that Coccinelle has some difficulty with processing functions with `__attribute__` or varargs so some `extern` declarations are left behind to be dealt with in a future patch. This was the Coccinelle patch used: @@ type T; identifier f; @@ - extern T f(...); and it was run with: $ git ls-files \*.{c,h} | grep -v ^compat/ | xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place Files under `compat/` are intentionally excluded as some are directly copied from external sources and we should avoid churning them as much as possible. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* grep: remove #ifdef NO_PTHREADSNguyễn Thái Ngọc Duy2018-11-051-6/+0
| | | | | | | | This is a faithful conversion without attempting to improve anything. That comes later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* userdiff.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy2018-09-211-1/+2
| | | | | | | | [jc: squashed in missing forward decl in userdiff.h found by Ramsay] Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* grep.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy2018-09-211-2/+5
| | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'tb/grep-only-matching'Junio C Hamano2018-08-021-0/+1
|\ | | | | | | | | | | | | | | "git grep" learned the "--only-matching" option. * tb/grep-only-matching: grep.c: teach 'git grep --only-matching' grep.c: extract show_line_header()
| * grep.c: teach 'git grep --only-matching'Taylor Blau2018-07-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach 'git grep --only-matching', a new option to only print the matching part(s) of a line. For instance, a line containing the following (taken from README.md:27): (`man gitcvs-migration` or `git help cvs-migration` if git is Is printed as follows: $ git grep --line-number --column --only-matching -e git -- \ README.md | grep ":27" README.md:27:7:git README.md:27:16:git README.md:27:38:git The patch works mostly as one would expect, with the exception of a few considerations that are worth mentioning here. Like GNU grep, this patch ignores --only-matching when --invert (-v) is given. There is a sensible answer here, but parity with the behavior of other tools is preferred. Because a line might contain more than one match, there are special considerations pertaining to when to print line headers, newlines, and how to increment the match column offset. The line header and newlines are handled as a special case within the main loop to avoid polluting the surrounding code with conditionals that have large blocks. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'tb/grep-column'Junio C Hamano2018-07-181-0/+2
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | "git grep" learned the "--column" option that gives not just the line number but the column number of the hit. * tb/grep-column: contrib/git-jump/git-jump: jump to exact location grep.c: add configuration variables to show matched option builtin/grep.c: add '--column' option to 'git-grep(1)' grep.c: display column number of first match grep.[ch]: extend grep_opt to allow showing matched column grep.c: expose {,inverted} match column in match_line() Documentation/config.txt: camel-case lineNumber for consistency
| * grep.[ch]: extend grep_opt to allow showing matched columnTaylor Blau2018-06-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | To support showing the matched column when calling 'git-grep(1)', teach 'grep_opt' the normal set of options to configure the default behavior and colorization of this feature. Now that we have opt->columnnum, use it to disable short-circuiting over ORs and ANDs so that col and icol are always filled with the earliest matches on each line. In addition, don't return the first match from match_line(), for the same reason. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: keep all colors in an arrayNguyễn Thái Ngọc Duy2018-05-291-8/+13
|/ | | | | | | | This is more inline with how we handle color slots in other code. It also allows us to get the list of configurable color slots later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'ab/pcre-v2'Junio C Hamano2017-11-151-2/+3
|\ | | | | | | | | | | | | Building with NO_LIBPCRE1_JIT did not disable it, which has been fixed. * ab/pcre-v2: grep: fix NO_LIBPCRE1_JIT to fully disable JIT
| * grep: fix NO_LIBPCRE1_JIT to fully disable JITCharles Bailey2017-11-131-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you have a pcre1 library which is compiled with JIT enabled then PCRE_STUDY_JIT_COMPILE will be defined whether or not the NO_LIBPCRE1_JIT configuration is set. This means that we enable JIT functionality when calling pcre_study even if NO_LIBPCRE1_JIT has been explicitly set and we just use plain pcre_exec later. Fix this by using own macro (GIT_PCRE_STUDY_JIT_COMPILE) which we set to PCRE_STUDY_JIT_COMPILE only if NO_LIBPCRE1_JIT is not set and define to 0 otherwise, as before. Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: recurse in-process using 'struct repository'Brandon Williams2017-08-021-1/+0
| | | | | | | | | | | | | | | | Convert grep to use 'struct repository' which enables recursing into submodules to be handled in-process. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: remove regflags from the public grep_opt APIÆvar Arnfjörð Bjarmason2017-06-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor calls to the grep machinery to always pass opt.ignore_case & opt.extended_regexp_option instead of setting the equivalent regflags bits. The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log: make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was really just plastering over the code smell which this change fixes. The reason for adding the extensive commentary here is that I discovered some subtle complexity in implementing this that really should be called out explicitly to future readers. Before this change we'd rely on the difference between `extended_regexp_option` and `regflags` to serve as a membrane between our preliminary parsing of grep.extendedRegexp and grep.patternType, and what we decided to do internally. Now that those two are the same thing, it's necessary to unset `extended_regexp_option` just before we commit in cases where both of those config variables are set. See 84befcd0a4 ("grep: add a grep.patternType configuration setting", 2012-08-03) for the code and documentation related to that. The explanation of why the if/else branches in grep_commit_pattern_type() are ordered the way they are exists in that commit message, but I think it's worth calling this subtlety out explicitly with a comment for future readers. Even though grep_commit_pattern_type() is the only caller of grep_set_pattern_type_option() it's simpler to reset the extended_regexp_option flag in the latter, since 2/3 branches in the former would otherwise need to reset it, this way we can do it in one place. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'bw/object-id'Junio C Hamano2017-06-191-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conversion from uchar[20] to struct object_id continues. * bw/object-id: (33 commits) diff: rename diff_fill_sha1_info to diff_fill_oid_info diffcore-rename: use is_empty_blob_oid tree-diff: convert path_appendnew to object_id tree-diff: convert diff_tree_paths to struct object_id tree-diff: convert try_to_follow_renames to struct object_id builtin/diff-tree: cleanup references to sha1 diff-tree: convert diff_tree_sha1 to struct object_id notes-merge: convert write_note_to_worktree to struct object_id notes-merge: convert verify_notes_filepair to struct object_id notes-merge: convert find_notes_merge_pair_ps to struct object_id notes-merge: convert merge_from_diffs to struct object_id notes-merge: convert notes_merge* to struct object_id tree-diff: convert diff_root_tree_sha1 to struct object_id combine-diff: convert find_paths_* to struct object_id combine-diff: convert diff_tree_combined to struct object_id diff: convert diff_flush_patch_id to struct object_id patch-ids: convert to struct object_id diff: finish conversion for prepare_temp_file to struct object_id diff: convert reuse_worktree_file to struct object_id diff: convert fill_filespec to struct object_id ...
| * grep: convert to struct object_idBrandon Williams2017-06-021-1/+1
| | | | | | | | | | | | | | Convert the remaining parts of grep to use struct object_id. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: add support for PCRE v2Ævar Arnfjörð Bjarmason2017-06-021-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for v2 of the PCRE API. This is a new major version of PCRE that came out in early 2015[1]. The regular expression syntax is the same, but while the API is similar, pretty much every function is either renamed or takes different arguments. Thus using it via entirely new functions makes sense, as opposed to trying to e.g. have one compile_pcre_pattern() that would call either PCRE v1 or v2 functions. Git can now be compiled with either USE_LIBPCRE1=YesPlease or USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a synonym for the former. Providing both is a compile-time error. With earlier patches to enable JIT for PCRE v1 the performance of the release versions of both libraries is almost exactly the same, with PCRE v2 being around 1% slower. However after I reported this to the pcre-dev mailing list[2] I got a lot of help with the API use from Zoltán Herczeg, he subsequently optimized some of the JIT functionality in v2 of the library. Running the p7820-grep-engines.sh performance test against the latest Subversion trunk of both, with both them and git compiled as -O3, and the test run against linux.git, gives the following results. Just the /perl/ tests shown: $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~5 HEAD~ HEAD p7820-grep-engines.sh [...] Test HEAD~5 HEAD~ HEAD ----------------------------------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.31(1.10+0.48) 0.21(0.35+0.56) -32.3% 0.21(0.34+0.55) -32.3% 7820.7: perl grep '^how to' 0.56(2.70+0.40) 0.24(0.64+0.52) -57.1% 0.20(0.28+0.60) -64.3% 7820.11: perl grep '[how] to' 0.56(2.66+0.38) 0.29(0.95+0.45) -48.2% 0.23(0.45+0.54) -58.9% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.02(5.77+0.42) 0.31(1.02+0.54) -69.6% 0.23(0.50+0.54) -77.5% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.38(1.57+0.42) 0.27(0.85+0.46) -28.9% 0.21(0.33+0.57) -44.7% See commit ("perf: add a comparison test of grep regex engines", 2017-04-19) for details on the machine the above test run was executed on. Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine mentioning p7820-grep-engines.sh for more details on the test setup. For ease of readability, a different run just of HEAD~ (PCRE v1 with JIT v.s. PCRE v2), again with just the /perl/ tests shown: [...] Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.21(0.42+0.52) 0.21(0.31+0.58) +0.0% 7820.7: perl grep '^how to' 0.25(0.65+0.50) 0.20(0.31+0.57) -20.0% 7820.11: perl grep '[how] to' 0.30(0.90+0.50) 0.23(0.46+0.53) -23.3% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.30(1.19+0.38) 0.23(0.51+0.51) -23.3% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.27(0.84+0.48) 0.21(0.34+0.57) -22.2% I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead, when it does it's around 20% faster. A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3) the compiled pattern can be shared between threads, but not some of the JIT context, however the grep threading support does all pattern & JIT compilation in separate threads, so this code doesn't need to concern itself with thread safety. See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the initial addition of PCRE v1. This change follows some of the same patterns it did (and which were discussed on list at the time), e.g. mocking up types with typedef instead of ifdef-ing them out when USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the program, but makes the code look nicer. 1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html 2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: un-break building with PCRE >= 8.32 without --enable-jitÆvar Arnfjörð Bjarmason2017-06-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions later than 8.31 compiled without --enable-jit. As explained in that change and a later compatibility change in this series ("grep: un-break building with PCRE < 8.32", 2017-05-10) the pcre_jit_exec() function is a faster path to execute the JIT. Unfortunately there's no compatibility stub for that function compiled into the library if pcre_config(PCRE_CONFIG_JIT, &ret) would return 0, and no macro that can be used to check for it, so the only portable option to support builds without --enable-jit is via a new NO_LIBPCRE1_JIT=UnfortunatelyYes Makefile option[1]. Another option would be to make the JIT opt-in via USE_LIBPCRE1_JIT=YesPlease, after all it's not a default option of PCRE v1. I think it makes more sense to make it opt-out since even though it's not a default option, most packagers of PCRE seem to turn it on by default, with the notable exception of the MinGW package. Make the MinGW platform work by default by changing the build defaults to turn on NO_LIBPCRE1_JIT=UnfortunatelyYes. It is the only platform that turns on USE_LIBPCRE=YesPlease by default, see commit df5218b4c3 ("config.mak.uname: support MSys2", 2016-01-13) for that change. 1. "How do I support pcre1 JIT on all versions?" (https://lists.exim.org/lurker/thread/20170601.103148.10253788.en.html) 2. https://github.com/Alexpux/MINGW-packages/blob/master/mingw-w64-pcre/PKGBUILD (referenced from "Re: PCRE v2 compile error, was Re: What's cooking in git.git (May 2017, #01; Mon, 1)"; <alpine.DEB.2.20.1705021756530.3480@virtualbox>) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: un-break building with PCRE < 8.20Ævar Arnfjörð Bjarmason2017-05-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.20. The 8.20 release was the first release to have JIT & pcre_jit_stack in the headers, so a mock type needs to be provided for it on those releases. Now git should compile with all PCRE versions that it supported before my JIT change. I've tested it as far back as version 7.5 released on 2008-01-10, once I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1, and I couldn't be bothered to anything older than 7.5 as I'm confident that if the build breaks on those older versions it's not because of my JIT change. See the "un-break" change in this series ("grep: un-break building with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the main PCRE JIT commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: un-break building with PCRE < 8.32Ævar Arnfjörð Bjarmason2017-05-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.32. The JIT support was added in version 8.20 released on 2011-10-21, but it wasn't until 8.32 released on 2012-11-30 that the fast code path to use the JIT via pcre_jit_exec() was added[1] (see also [2]). This means that versions 8.20 through 8.31 could still use the JIT, but supporting it on those versions would add to the already verbose macro soup around JIT support it, and I don't expect that the use-case of compiling a brand new git against a 5 year old PCRE is particularly common, and if someone does that they can just get the existing pre-JIT slow codepath. So just take the easy way out and disable the JIT on any version older than 8.32. The reason this change isn't part of the initial change PCRE JIT support is to have a cleaner history showing which parts of the implementation are only used for ancient PCRE versions. This also makes it easier to revert this change if we ever decide to stop supporting those old versions. 1. http://www.pcre.org/original/changelog.txt ("28. Introducing a native interface for JIT. Through this interface, the compiled[...]") 2. https://bugs.exim.org/show_bug.cgi?id=2121 Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: add support for the PCRE v1 JIT APIÆvar Arnfjörð Bjarmason2017-05-261-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the grep PCRE v1 code to use JIT when available. When PCRE support was initially added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into 8.20 released on 2011-10-21. Enabling JIT support usually improves performance by more than 40%. The pattern compilation times are relatively slower, but those relative numbers are tiny, and are easily made back in all but the most trivial cases of grep. Detailed benchmarks & overview of compilation times is at: http://sljit.sourceforge.net/pcre.html With this change the difference in a t/perf/p7820-grep-engines.sh run is, with just the /perl/ tests shown: $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh Test HEAD~ HEAD --------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.35(1.11+0.43) 0.23(0.42+0.46) -34.3% 7820.7: perl grep '^how to' 0.64(2.71+0.36) 0.27(0.66+0.44) -57.8% 7820.11: perl grep '[how] to' 0.63(2.51+0.42) 0.33(0.98+0.39) -47.6% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.17(5.61+0.35) 0.34(1.08+0.46) -70.9% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.43(1.52+0.44) 0.30(0.88+0.42) -30.2% The conditional support for JIT is implemented as suggested in the pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's not present. The implementation is relatively verbose because even if PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine if the JIT is available, and if so the faster pcre_jit_exec() function should be called instead of pcre_exec(), and a different (but not complimentary!) function needs to be called to free pcre1_extra_info. There's no graceful fallback if pcre_jit_stack_alloc() fails under PCRE_CONFIG_JIT, instead the program will simply abort. I don't think this is worth handling gracefully, it'll only fail in cases where malloc() doesn't work, in which case we're screwed anyway. That there's no assignment of `p->pcre1_jit_on = 0` when PCRE_CONFIG_JIT isn't defined isn't a bug. The create_grep_pat() function allocates the grep_pat allocates it with calloc(), so it's guaranteed to be 0 when PCRE_CONFIG_JIT isn't defined. I you're bisecting and find this change, check that your PCRE isn't older than 8.32. This change intentionally broke really old versions of PCRE, but that's fixed in follow-up commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: change internal *pcre* variable & function names to be *pcre1*Ævar Arnfjörð Bjarmason2017-05-261-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Change the internal PCRE variable & function names to have a "1" suffix. This is for preparation for libpcre2 support, where having non-versioned names would be confusing. An earlier change in this series ("grep: change the internal PCRE macro names to be PCRE1", 2017-04-07) elaborates on the motivations behind this change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: change the internal PCRE macro names to be PCRE1Ævar Arnfjörð Bjarmason2017-05-261-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the internal USE_LIBPCRE define, & build options flag to use a naming convention ending in PCRE1, without changing the long-standing USE_LIBPCRE Makefile flag which enables this code. This is for preparation for libpcre2 support where having things like USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely need to for backwards compatibility with old Makefile arguments would be confusing. In some ways it would be better to change everything that now uses USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their build scripts. However I'd like to leave the door open to making USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease, i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it the default. This code and the USE_LIBPCRE Makefile argument was added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was no indication that the PCRE project would release an entirely new & incompatible API around 3 years later. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* grep: add submodules as a grep source typeBrandon Williams2016-12-221-0/+1
| | | | | | | | | | | | | | Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new type in the various switch statements in grep.c. When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the identifier can either be NULL (to indicate that the working tree will be used) or a SHA1 (the REV of the submodule to be grep'd). If the identifier is a SHA1 then we want to fall through to the `GREP_SOURCE_SHA1` case to handle the copying of the SHA1. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'jc/grep-commandline-vs-configuration'Junio C Hamano2016-08-041-1/+0
|\ | | | | | | | | | | | | | | | | "git -c grep.patternType=extended log --basic-regexp" misbehaved because the internal API to access the grep machinery was not designed well. * jc/grep-commandline-vs-configuration: grep: further simplify setting the pattern type
| * grep: further simplify setting the pattern typejc/grep-commandline-vs-configurationJunio C Hamano2016-07-251-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When c5c31d33 (grep: move pattern-type bits support to top-level grep.[ch], 2012-10-03) introduced grep_commit_pattern_type() helper function, the intention was to allow the users of grep API to having to fiddle only with .pattern_type_option (which can be set to "fixed", "basic", "extended", and "pcre"), and then immediately before compiling the pattern strings for use, call grep_commit_pattern_type() to have it prepare various bits in the grep_opt structure (like .fixed, .regflags, etc.). However, grep_set_pattern_type_option() helper function the grep API internally uses were left as an external function by mistake. This function shouldn't have been made callable by the users of the API. Later when the grep API was used in revision traversal machinery, the caller then mistakenly started calling the function around 34a4ae55 (log --grep: use the same helper to set -E/-F options as "git grep", 2012-10-03), instead of setting the .pattern_type_option field and letting the grep_commit_pattern_type() to take care of the details. This caused an unnecessary bug that made a configured grep.patternType take precedence over the command line options (e.g. --basic-regexp, --fixed-strings) in "git log" family of commands. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep/pcre: prepare locale-dependent tables for icase matchingNguyễn Thái Ngọc Duy2016-07-011-0/+1
|/ | | | | | | | | The default tables are usually built with C locale and only suitable for LANG=C or similar. This should make case insensitive search work correctly for all single-byte charsets. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* grep: add color.grep.matchcontext and color.grep.matchselectedrs/grep-color-wordsRené Scharfe2014-10-281-1/+2
| | | | | | | | | | | | | | The config option color.grep.match can be used to specify the highlighting color for matching strings. Add the options matchContext and matchSelected to allow different colors to be specified for matching strings in the context vs. in selected lines. This is similar to the ms and mc specifiers in GNU grep's environment variable GREP_COLORS. Tests are from Zoltan Klinger's earlier attempt to solve the same issue in a different way. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* grep: allow to use textconv filtersJeff King2013-05-101-0/+1
| | | | | | | | | | | | | | | | | | | | Recently and not so recently, we made sure that log/grep type operations use textconv filters when a userfacing diff would do the same: ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28) b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28) 0508fe5 (combine-diff: respect textconv attributes, 2011-05-23) "git grep" currently does not use textconv filters at all, that is neither for displaying the match and context nor for the actual grepping, even when requested by --textconv. Introduce an option "--textconv" which makes git grep use any configured textconv filters for grepping and output purposes. It is off by default. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* fix clang -Wtautological-compare with unsigned enumAntoine Pelisse2013-02-251-1/+2
| | | | | | | | | | | | | | Create a GREP_HEADER_FIELD_MIN so we can check that the field value is sane and silence the clang warning. Clang warning happens because the enum is unsigned (this is implementation-defined, and there is no negative fields) and the check is then tautological. Signed-off-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: John Keeping <john@keeping.me.uk> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'nd/grep-true-path'Jeff King2012-10-291-1/+3
|\ | | | | | | | | | | | | | | | | "git grep -e pattern <tree>" asked the attribute system to read "<tree>:.gitattributes" file in the working tree, which was nonsense. * nd/grep-true-path: grep: stop looking at random places for .gitattributes
| * grep: stop looking at random places for .gitattributesNguyễn Thái Ngọc Duy2012-10-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | grep searches for .gitattributes using "name" field in struct grep_source but that field is not real on-disk path name. For example, "grep pattern rev" fills the field with "rev:path", and Git looks for .gitattributes in the (non-existent but exploitable) path "rev:path" instead of "path". This patch passes real paths down to grep_source_load_driver() when: - grep on work tree - grep on the index - grep a commit (or a tag if it points to a commit) so that these cases look up .gitattributes at proper paths. .gitattributes lookup is disabled in all other cases. Initial-work-by: Jeff King <peff@peff.net> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: move pattern-type bits support to top-level grep.[ch]Junio C Hamano2012-10-091-0/+2
| | | | | | | | | | | | | | | | | | | | Switching between -E/-G/-P/-F correctly needs a lot more than just flipping opt->regflags bit these days, and we have a nice helper function buried in builtin/grep.c for the sole use of "git grep". Extract it so that "log --grep" family can also use it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: move the configuration parsing logic to grep.[ch]Junio C Hamano2012-10-091-0/+4
| | | | | | | | | | | | | | | | The configuration handling is a library-ish part of this program, that is not specific to "git grep" command. It should be reusable by "log" and others. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | log --grep-reflog: reject the option without -gJunio C Hamano2012-09-291-0/+1
| | | | | | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | revision: add --grep-reflog to filter commits by reflog messagesNguyễn Thái Ngọc Duy2012-09-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | Similar to --author/--committer which filters commits by author and committer header fields. --grep-reflog adds a fake "reflog" header to commit and a grep filter to search on that line. All rules to --author/--committer apply except no timestamp stripping. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | grep: prepare for new header field filterNguyễn Thái Ngọc Duy2012-09-291-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | grep supports only author and committer headers, which have the same special treatment that later headers may or may not have. Check for field type and only strip_timestamp() when the field is either author or committer. GREP_HEADER_FIELD_MAX is put in the grep_header_field enum to be calculated automatically, correctly, as long as it's at the end of the enum. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>