summaryrefslogtreecommitdiff
path: root/userdiff.c
Commit message (Collapse)AuthorAgeFilesLines
* attr: teach "--attr-source=<tree>" global option to "git"John Cai2023-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish, 2023-01-14) taught "git check-attr" the "--source=<tree>" option to allow it to read attribute files from a tree-ish, but did so only for the command. Just like "check-attr" users wanted a way to use attributes from a tree-ish and not from the working tree files, users of other commands (like "git diff") would benefit from the same. Undo most of the UI change the commit made, while keeping the internal logic to read attributes from a given tree-ish. Expose the internal logic via a new "--attr-source=<tree>" command line option given to "git", so that it can be used with any git command that runs as part of the main git process. Additionally, add an environment variable GIT_ATTR_SOURCE that is set when --attr-source is passed in, so that subprocesses use the same value for the attributes source tree. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'rs/userdiff-multibyte-regex'Junio C Hamano2023-04-201-2/+29
|\ | | | | | | | | | | | | | | | | The userdiff regexp patterns for various filetypes that are built into the system have been updated to avoid triggering regexp errors from UTF-8 aware regex engines. * rs/userdiff-multibyte-regex: userdiff: support regexec(3) with multi-byte support
| * userdiff: support regexec(3) with multi-byte supportRené Scharfe2023-04-071-2/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 1819ad327b (grep: fix multibyte regex handling under macOS, 2022-08-26) we use the system library for all regular expression matching on macOS, not just for git grep. It supports multi-byte strings and rejects invalid multi-byte characters. This broke all built-in userdiff word regexes in UTF-8 locales because they all include such invalid bytes in expressions that are intended to match multi-byte characters without explicit support for that from the regex engine. "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" is added to all built-in word regexes to match a single non-space or multi-byte character. The \xNN characters are invalid if interpreted as UTF-8 because they have their high bit set, which indicates they are part of a multi-byte character, but they are surrounded by single-byte characters. Replace that expression with "|[^[:space:]]" if the regex engine supports multi-byte matching, as there is no need to have an explicit range for multi-byte characters then. Check for that capability at runtime, because it depends on the locale and thus on environment variables. Construct the full replacement expression at build time and just switch it in if necessary to avoid string manipulation and allocations at runtime. Additionally the word regex for tex contains the expression "[a-zA-Z0-9\x80-\xff]+" with a similarly invalid range. The best replacement with only valid characters that I can come up with is "([a-zA-Z0-9]|[^\x01-\x7f])+". Unlike the original it matches NUL characters, though. Assuming that tex files usually don't contain NUL this should be acceptable. Reported-by: D. Ben Knoble <ben.knoble@gmail.com> Reported-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'en/header-cleanup'Junio C Hamano2023-03-171-1/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code clean-up to clarify the rule that "git-compat-util.h" must be the first to be included. * en/header-cleanup: diff.h: remove unnecessary include of object.h Remove unnecessary includes of builtin.h treewide: replace cache.h with more direct headers, where possible replace-object.h: move read_replace_refs declaration from cache.h to here object-store.h: move struct object_info from cache.h dir.h: refactor to no longer need to include cache.h object.h: stop depending on cache.h; make cache.h depend on object.h ident.h: move ident-related declarations out of cache.h pretty.h: move has_non_ascii() declaration from commit.h cache.h: remove dependence on hex.h; make other files include it explicitly hex.h: move some hex-related declarations from cache.h hash.h: move some oid-related declarations from cache.h alloc.h: move ALLOC_GROW() functions from cache.h treewide: remove unnecessary cache.h includes in source files treewide: remove unnecessary cache.h includes treewide: remove unnecessary git-compat-util.h includes in headers treewide: ensure one of the appropriate headers is sourced first
| * | alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren2023-02-231-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'jc/diff-algo-attribute'Junio C Hamano2023-02-271-1/+3
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | The "diff" drivers specified by the "diff" attribute attached to paths can now specify which algorithm (e.g. histogram) to use. * jc/diff-algo-attribute: diff: teach diff to read algorithm from diff driver diff: consolidate diff algorithm option parsing
| * | diff: teach diff to read algorithm from diff driverJohn Cai2023-02-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It can be useful to specify diff algorithms per file type. For example, one may want to use the minimal diff algorithm for .json files, another for .c files, etc. The diff machinery already checks attributes for a diff driver. Teach the diff driver parser a new type "algorithm" to look for in the config, which will be used if a driver has been specified through the attributes. Enforce precedence of the diff algorithm by favoring the command line option, then looking at the driver attributes & config combination, then finally the diff.algorithm config. To enforce precedence order, use a new `ignore_driver_algorithm` member during options parsing to indicate the diff algorithm was set via command line args. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | userdiff: support Java sealed classesAndrei Rybak2023-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new kind of class was added in Java 17 -- sealed classes.[1] This feature includes several new keywords that may appear in a declaration of a class. New modifiers before name of the class: "sealed" and "non-sealed", and a clause after name of the class marked by keyword "permits". The current set of regular expressions in userdiff.c already allows the modifier "sealed" and the "permits" clause, but not the modifier "non-sealed", which is the first hyphenated keyword in Java.[2] Allow hyphen in the words that precede the name of type to match the "non-sealed" modifier. In new input file "java-sealed" for the test t4018-diff-funcname.sh, use a Java code comment for the marker "RIGHT". This workaround is needed, because the name of the sealed class appears on the line of code that has the "ChangeMe" marker. [1] Detailed description in "JEP 409: Sealed Classes" https://openjdk.org/jeps/409 [2] "JEP draft: Keyword Management for the Java Language" https://openjdk.org/jeps/8223002 Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | userdiff: support Java record typesAndrei Rybak2023-02-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new kind of class was added in Java 16 -- records.[1] The syntax of records is similar to regular classes with one important distinction: the name of the record class is followed by a mandatory list of components. The list is enclosed in parentheses, it may be empty, and it may immediately follow the name of the class or type parameters, if any, with or without separating whitespace. For example: public record Example(int i, String s) { } public record WithTypeParameters<A, B>(A a, B b, String s) { } record SpaceBeforeComponents (String comp1, int comp2) { } Support records in the builtin userdiff pattern for Java. Add "record" to the alternatives of keywords for kinds of class. Allowing matching various possibilities for the type parameters and/or list of the components of a record has already been covered by the preceding patch. [1] detailed description is available in "JEP 395: Records" https://openjdk.org/jeps/395 Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | userdiff: support Java type parametersAndrei Rybak2023-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A class or interface in Java can have type parameters following the name in the declared type, surrounded by angle brackets (paired less than and greater than signs).[2] The type parameters -- `A` and `B` in the examples -- may follow the class name immediately: public class ParameterizedClass<A, B> { } or may be separated by whitespace: public class SpaceBeforeTypeParameters <A, B> { } A part of the builtin userdiff pattern for Java matches declarations of classes, enums, and interfaces. The regular expression requires at least one whitespace character after the name of the declared type. This disallows matching for opening angle bracket of type parameters immediately after the name of the type. Mandatory whitespace after the name of the type also disallows using the pattern in repositories with a fairly common code style that puts braces for the body of a class on separate lines: class WithLineBreakBeforeOpeningBrace { } Support matching Java code in more diverse code styles and declarations of classes and interfaces with type parameters immediately following the name of the type in the builtin userdiff pattern for Java. Do so by just matching anything until the end of the line after the keywords for the kind of type being declared. [1] Since Java 5 released in 2004. [2] Detailed description is available in the Java Language Specification, sections "Type Variables" and "Parameterized Types": https://docs.oracle.com/javase/specs/jls/se17/html/jls-4.html#jls-4.4 Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | attr: add flag `--source` to work with tree-ishKarthik Nayak2023-01-141-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The contents of the .gitattributes files may evolve over time, but "git check-attr" always checks attributes against them in the working tree and/or in the index. It may be beneficial to optionally allow the users to check attributes taken from a commit other than HEAD against paths. Add a new flag `--source` which will allow users to check the attributes against a commit (actually any tree-ish would do). When the user uses this flag, we go through the stack of .gitattributes files but instead of checking the current working tree and/or in the index, we check the blobs from the provided tree-ish object. This allows the command to also be used in bare repositories. Since we use a tree-ish object, the user can pass "--source HEAD:subdirectory" and all the attributes will be looked up as if subdirectory was the root directory of the repository. We cannot simply use the `<rev>:<path>` syntax without the `--source` flag, similar to how it is used in `git show` because any non-flag parameter before `--` is treated as an attribute and any parameter after `--` is treated as a pathname. The change involves creating a new function `read_attr_from_blob`, which given the path reads the blob for the path against the provided source and parses the attributes line by line. This function is plugged into `read_attr()` function wherein we go through the stack of attributes files. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Toon Claes <toon@iotcl.com> Co-authored-by: toon@iotcl.com Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | userdiff: mark unused parameter in internal callbackJeff King2022-12-131-1/+2
|/ | | | | | | | | | | | | | Since f12fa9ee6c (userdiff: add and use for_each_userdiff_driver(), 2021-04-08), lookup of userdiffs is done with a generic for_each_userdiff_driver(). But the name lookup doesn't use the "type" field, of course. We can't get rid of that field from the generic interface because it is used by t/helper/test-userdiff.c. So mark it as unused in this instance to silence -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'jd/userdiff-kotlin'Junio C Hamano2022-03-231-0/+12
|\ | | | | | | | | | | | | A new built-in userdiff driver for kotlin. * jd/userdiff-kotlin: userdiff: add builtin diff driver for kotlin language.
| * userdiff: add builtin diff driver for kotlin language.Jaydeep P Das2022-03-121-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The xfuncname pattern finds func/class declarations in diffs to display as a hunk header. The word_regex pattern finds individual tokens in Kotlin code to generate appropriate diffs. This patch adds xfuncname regex and word_regex for Kotlin language. Signed-off-by: Jaydeep P Das <jaydeepjd.8914@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | userdiff.c: use designated initializers for "struct userdiff_driver"Ævar Arnfjörð Bjarmason2022-02-241-14/+22
|/ | | | | | | | | | | | | | | Change the "struct userdiff_driver" assignmentns to use designated initializers, but let's keep the PATTERNS() and IPATTERN() convenience macros to avoid churn, but have them defined in terms of designated initializers. For the "driver_true" and "driver_false" let's have the compiler implicitly initialize most of the fields, but let's leave a redundant ".binary = 0" for "driver_true" to make it obvious that it's the opposite of the the ".binary = 1" for "driver_false". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* userdiff-cpp: back out the digit-separators in numbersJohannes Sixt2021-10-251-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The implementation of digit-separating single-quotes introduced a note-worthy regression: the change of a character literal with a digit would splice the digit and the closing single-quote. For example, the change from 'a' to '2' is now tokenized as '[-a'-]{+2'+} instead of '[-a-]{+2+}'. The options to fix the regression are: - Tighten the regular expression such that the single-quote can only occur between digits (that would match the official syntax). - Remove support for digit separators. I chose to remove support, because - I have not seen a lot of code make use of digit separators. - If code does use digit separators, then the numbers are typically long. If a change in one of the segments occurs, it is actually better visible if only that segment is highlighted as the word that changed instead of the whole long number. This choice does introduce another minor regression, though, which is highlighted in the test case: when a change occurs in the second or later segment of a hexadecimal number where the segment begins with a digit, but also has letters, the segment is mistaken as consisting of a number and an identifier. I can live with that. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* userdiff-cpp: learn the C++ spaceship operatorJohannes Sixt2021-10-101-1/+1
| | | | | | | | Since C++20, the language has a generalized comparison operator <=>. Teach the cpp driver not to separate it into <= and > tokens. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* userdiff-cpp: permit the digit-separating single-quote in numbersJohannes Sixt2021-10-101-3/+3
| | | | | | | | | | | | | | Since C++17, the single-quote can be used as digit separator: 3.141'592'654 1'000'000 0xdead'beaf Make it known to the word regex of the cpp driver, so that numbers are not split into separate tokens at the single-quotes. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* userdiff-cpp: tighten word regexJohannes Sixt2021-10-081-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generally, word regex can be written such that they match tokens liberally and need not model the actual syntax because it can be assumed that the regex will only be applied to syntactically correct text. The regex for cpp (C/C++) is too liberal, though. It regards these sequences as single tokens: 1+2 1.5-e+2+f and the following amalgams as one token: .l as in str.length .f as in str.find .e as in str.erase Tighten the regex in the following way: - Accept + and - only in one position in the exponent. + and - are no longer regarded as the sign of a number and are treated by the catcher-all that is not visible in the driver's regex. - Accept a leading decimal point only when it is followed by a digit. For readability, factor hex- and binary numbers into an own term. As a drive-by, this fixes that floating point numbers such as 12E5 (with upper-case E) were split into two tokens. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'uk/userdiff-php-enum'Junio C Hamano2021-09-101-1/+1
|\ | | | | | | | | | | | | Update the userdiff pattern for PHP. * uk/userdiff-php-enum: userdiff: support enum keyword in PHP hunk header
| * userdiff: support enum keyword in PHP hunk headerUSAMI Kenta2021-08-311-1/+1
| | | | | | | | | | | | | | | | "enum" keyword will be introduced in PHP 8.1. https://wiki.php.net/rfc/enumerations Signed-off-by: USAMI Kenta <tadsan@zonu.me> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'th/userdiff-more-java'Junio C Hamano2021-08-301-1/+5
|\ \ | | | | | | | | | | | | | | | | | | The userdiff pattern for "java" language has been updated. * th/userdiff-more-java: userdiff: improve java hunk header regex
| * | userdiff: improve java hunk header regexTassilo Horn2021-08-111-1/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the git diff hunk headers show the wrong method signature if the method has a qualified return type, an array return type, or a generic return type because the regex doesn't allow dots (.), [], or < and > in the return type. Also, type parameter declarations couldn't be matched. Add several t4018 tests asserting the right hunk headers for different cases: - enum constant change - change in generic method with bounded type parameters - change in generic method with wildcard - field change in a nested class Signed-off-by: Tassilo Horn <tsdh@gnu.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'jc/userdiff-pattern-hint'Junio C Hamano2021-08-301-0/+10
|\ \ | |/ |/| | | | | | | | | | | | | Remind developers that the userdiff patterns should be kept simple and permissive, assuming that the contents they apply are always syntactically correct. * jc/userdiff-pattern-hint: userdiff: comment on the builtin patterns
| * userdiff: comment on the builtin patternsJunio C Hamano2021-08-111-0/+10
| | | | | | | | | | | | | | | | | | | | | | Remind developers that they do not need to go overboard to implement patterns to prepare for invalid constructs. They only have to be sufficiently permissive, assuming that the payload is syntactically correct, and that may allow them to be simpler. Text stolen mostly from, and further improved by, Johannes Sixt. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | userdiff: add support for C# record typesJulian Verdurmen2021-06-161-1/+1
|/ | | | | | | | | | | | | | | Records are added in C# 9 Code example : public record Person(string FirstName, string LastName); For more information, see: * https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-9 Signed-off-by: Julian Verdurmen <julian.verdurmen@outlook.com> Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'ab/userdiff-tests'Junio C Hamano2021-04-201-57/+112
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bit of code clean-up and a lot of test clean-up around userdiff area. * ab/userdiff-tests: blame tests: simplify userdiff driver test blame tests: don't rely on t/t4018/ directory userdiff: remove support for "broken" tests userdiff tests: list builtin drivers via test-tool userdiff tests: explicitly test "default" pattern userdiff: add and use for_each_userdiff_driver() userdiff style: normalize pascal regex declaration userdiff style: declare patterns with consistent style userdiff style: re-order drivers in alphabetical order
| * userdiff: add and use for_each_userdiff_driver()Ævar Arnfjörð Bjarmason2021-04-081-12/+58
| | | | | | | | | | | | | | | | | | | | | | | | Refactor the userdiff_find_by_namelen() function so that a new for_each_userdiff_driver() API function does most of the work. This will be useful for the same reason we've got other for_each_*() API functions as part of various APIs, and will be used in a follow-up commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * userdiff style: normalize pascal regex declarationÆvar Arnfjörð Bjarmason2021-04-081-3/+2
| | | | | | | | | | | | | | | | | | | | Declare the pascal pattern consistently with how we declare the others, not having "\n" on one line by itself, but as part of the pattern, and when there are alterations have the "|" at the start, not end of the line. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * userdiff style: declare patterns with consistent styleÆvar Arnfjörð Bjarmason2021-04-081-5/+15
| | | | | | | | | | | | | | | | | | | | Change those patterns which were declared with a regex on the same line as the "PATTERNS()" line to put that regex on the next line, and add missing "/* -- */" separator comments between the pattern and word_regex. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * userdiff style: re-order drivers in alphabetical orderÆvar Arnfjörð Bjarmason2021-04-081-38/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Address some old code smell and move around the built-in userdiff drivers so they're both in alphabetical order, and now in the same order they appear in the gitattributes(5) documentation. The two started drifting in be58e70dba (diff: unify external diff and funcname parsing code, 2008-10-05), and then even further in 80c49c3de2 (color-words: make regex configurable via attributes, 2009-01-17) when the "cpp" pattern was added. There are no functional changes here, and as --color-moved will show only moved existing lines. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | userdiff: add support for SchemeAtharva Raykar2021-04-081-0/+9
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a diff driver for Scheme-like languages which recognizes top level and local `define` forms, whether it is a function definition, binding, syntax definition or a user-defined `define-xyzzy` form. Also supports R6RS `library` forms, `module` forms along with class and struct declarations used in Racket (PLT Scheme). Alternate "def" syntax such as those in Gerbil Scheme are also supported, like defstruct, defsyntax and so on. The rationale for picking `define` forms for the hunk headers is because it is usually the only significant form for defining the structure of the program, and it is a common pattern for schemers to have local function definitions to hide their visibility, so it is not only the top level `define`'s that are of interest. Schemers also extend the language with macros to provide their own define forms (for example, something like a `define-test-suite`) which is also captured in the hunk header. Since it is common practice to extend syntax with variants of a form like `module+`, `class*` etc, those have been supported as well. The word regex is a best-effort attempt to conform to R7RS[1] valid identifiers, symbols and numbers. [1] https://small.r7rs.org/attachment/r7rs.pdf (section 2.1) Signed-off-by: Atharva Raykar <raykar.ath@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 've/userdiff-bash'Junio C Hamano2020-11-021-0/+21
|\ | | | | | | | | | | | | | | The userdiff pattern learned to identify the function definition in POSIX shells and bash. * ve/userdiff-bash: userdiff: support Bash
| * userdiff: support BashVictor Engmark2020-10-221-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support POSIX, bashism and mixed function declarations, all four compound command types, trailing comments and mixed whitespace. Even though Bash allows locale-dependent characters in function names <https://unix.stackexchange.com/a/245336/3645>, only detect function names with characters allowed by POSIX.1-2017 <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235> for simplicity. This should cover the vast majority of use cases, and produces system-agnostic results. Since a word pattern has to be specified, but there is no easy way to know the default word pattern, use the default `IFS` characters for a starter. A later patch can improve this. Signed-off-by: Victor Engmark <victor@engmark.name> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'sd/userdiff-css-update'Junio C Hamano2020-10-271-1/+1
|\ \ | | | | | | | | | | | | | | | | | | Userdiff for CSS update. * sd/userdiff-css-update: userdiff: expand detected chunk headers for css
| * | userdiff: expand detected chunk headers for cssSohom Datta2020-10-081-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The regex used for the CSS builtin diff driver in git is only able to show chunk headers for lines that start with a number, a letter or an underscore. However, the regex fails to detect classes (starts with a .), ids (starts with a #), :root and attribute-value based selectors (for example [class*="col-"]), as well as @based block-level statements like @page,@keyframes and @media since all of them, start with a special character. Allow the selectors and block level statements to begin with these special characters. Signed-off-by: Sohom Datta <sohom.datta@learner.manipal.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'kb/userdiff-rust-macro-rules'Junio C Hamano2020-10-271-1/+1
|\ \ | | | | | | | | | | | | | | | | | | Userdiff for Rust update. * kb/userdiff-rust-macro-rules: userdiff: recognize 'macro_rules!' as starting a Rust function block
| * | userdiff: recognize 'macro_rules!' as starting a Rust function blockKonrad Borowski2020-10-071-1/+1
| |/ | | | | | | | | Signed-off-by: Konrad Borowski <konrad@borowski.pw> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | userdiff: PHP: catch "abstract" and "final" functionsJavier Spagnoletti2020-10-071-1/+1
|/ | | | | | | | | | | | | | PHP permits functions to be defined like final public function foo() { } abstract protected function bar() { } but our hunk header pattern does not recognize these decorations. Add "final" and "abstract" to the list of function modifiers. Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Javier Spagnoletti <phansys@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* userdiff: improve Fortran xfuncname regexPhilippe Blain2020-08-131-1/+1
| | | | | | | | | | | | | | | | | | The third part of the Fortran xfuncname regex wants to match the beginning of a subroutine or function, so it allows for all characters except `'`, `"` or whitespace before the keyword 'function' or 'subroutine'. This is meant to match the 'recursive', 'elemental' or 'pure' keywords, as well as function return types, and to prevent matches inside strings. However, the negated set does not contain the `!` comment character, so a line with an end-of-line comment containing the keyword 'function' or 'subroutine' followed by another word is mistakenly chosen as a hunk header. Improve the regex by adding `!` to the negated set. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* userdiff: add tests for Fortran xfuncname regexPhilippe Blain2020-08-131-0/+4
| | | | | | | | | | | | | | | | | The Fortran userdiff patterns, introduced in 909a5494f8 (userdiff.c: add builtin fortran regex patterns, 2010-09-10), predate the test infrastructure for xfuncname patterns, introduced in bfa7d01413 (t4018: an infrastructure to test hunk headers, 2014-03-21). Add tests for the Fortran xfuncname patterns. The test 't/t4018/fortran-comment-keyword' documents a shortcoming of the regex that is fixed in a subsequent commit. While at it, add descriptive comments for the different parts of the regex. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'ah/userdiff-markdown'Junio C Hamano2020-05-081-0/+3
|\ | | | | | | | | | | | | The userdiff patterns for Markdown documents have been added. * ah/userdiff-markdown: userdiff: support Markdown
| * userdiff: support MarkdownAsh Holland2020-05-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's typical to find Markdown documentation alongside source code, and having better context for documentation changes is useful; see also commit 69f9c87d4 (userdiff: add support for Fountain documents, 2015-07-21). The pattern is based on the CommonMark specification 0.29, section 4.2 <https://spec.commonmark.org/> but doesn't match empty headings, as seeing them in a hunk header is unlikely to be useful. Only ATX headings are supported, as detecting setext headings would require printing the line before a pattern matches, or matching a multiline pattern. The word-diff pattern is the same as the pattern for HTML, because many Markdown parsers accept inline HTML. Signed-off-by: Ash Holland <ash@sorrel.sh> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | parse_config_key(): return subsection len as size_tJeff King2020-04-101-2/+2
|/ | | | | | | | | | | | | | | | | | | | | | We return the length to a subset of a string using an "int *" out-parameter. This is fine most of the time, as we'd expect config keys to be relatively short, but it could behave oddly if we had a gigantic config key. A more appropriate type is size_t. Let's switch over, which lets our callers use size_t as appropriate (they are bound by our type because they must pass the out-parameter as a pointer). This is mostly just a cleanup to make it clear this code handles long strings correctly. In practice, our config parser already chokes on long key names (because of a similar int/size_t mixup!). When doing an int/size_t conversion, we have to be careful that nobody was trying to assign a negative value to the variable. I manually confirmed that for each case here. They tend to just feed the result to xmemdupz() or similar; in a few cases I adjusted the parameter types for helper functions to make sure the size_t is preserved. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'ln/userdiff-elixir'Junio C Hamano2019-12-251-1/+2
|\ | | | | | | | | | | | | Hotfix. * ln/userdiff-elixir: userdiff: remove empty subexpression from elixir regex
| * userdiff: remove empty subexpression from elixir regexEd Maste2019-12-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The regex failed to compile on FreeBSD. Also add /* -- */ mark to separate the two regex entries given to the PATTERNS() macro, to make it consistent with patterns for other content types. Signed-off-by: Ed Maste <emaste@FreeBSD.org> Reviewed-by: Jeff King <peff@peff.net> Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'jh/userdiff-python-async'Junio C Hamano2019-12-051-1/+1
|\ \ | |/ |/| | | | | | | | | | | The userdiff machinery has been taught that "async def" is another way to begin a "function" in Python. * jh/userdiff-python-async: userdiff: support Python async functions
| * userdiff: support Python async functionsJosh Holland2019-11-201-1/+1
| | | | | | | | | | | | | | | | | | | | Python's async functions (declared with "async def" rather than "def") were not being displayed in hunk headers. This commit teaches git about the async function syntax, and adds tests for the Python userdiff regex. Signed-off-by: Josh Holland <anowlcalledjosh@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | userdiff: add Elixir to supported userdiff languagesŁukasz Niemier2019-11-101-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | Adds support for xfuncref in Elixir[1] language which is Ruby-like language that runs on Erlang[3] Virtual Machine (BEAM). [1]: https://elixir-lang.org [2]: https://www.erlang.org Signed-off-by: Łukasz Niemier <lukasz@niemier.pl> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | userdiff: fix some corner cases in dts regexStephen Boyd2019-10-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While reviewing some dts diffs recently I noticed that the hunk header logic was failing to find the containing node. This is because the regex doesn't consider properties that may span multiple lines, i.e. property = <something>, <something_else>; and it got hung up on comments inside nodes that look like the root node because they start with '/*'. Add tests for these cases and update the regex to find them. Maybe detecting the root node is too complicated but forcing it to be a backslash with any amount of whitespace up to an open bracket seemed OK. I tried to detect that a comment is in-between the two parts but I wasn't happy so I just dropped it. Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>