diff options
author | Luca Di Sera <luca.disera@qt.io> | 2022-03-18 14:49:15 +0100 |
---|---|---|
committer | Luca Di Sera <luca.disera@qt.io> | 2022-03-21 10:27:09 +0100 |
commit | c25ae7e86f20a7bd9a65933bec8bff4f9d9d48cd (patch) | |
tree | ca3018dda1b4ff37dec93fe0ebb4a6e6bc035e75 /src/qdoc/docparser.cpp | |
parent | ea1d75d4daa1268c2887d2f65dfdb4bc45507ed1 (diff) | |
download | qttools-c25ae7e86f20a7bd9a65933bec8bff4f9d9d48cd.tar.gz |
qdoc: Stop considering special characters for auto-linking
While parsing a QDoc block-comment, QDoc evaluates some sequences of
characters as candidates for auto-linking.
A sequence of characters that is considered for auto-linking will be
tried to be correlated with some documented element and presented, in
the final documentation, as a link to that element, if any such element
exists.
Up to now, the rules for auto-linking would qualify as a candidate
sequences of special characters such as "_", "____", "@_" or "()" to be
auto-linking candidate.
The production of those candidates could, in some currently unknown
conditions, be correlated with some documented elements, producing an
incorrect link in the output documentation.
See for example QTBUG-100562.
To avoid this issue, the rules were modified to such that identifier
that are composed of special characters require at least some part
composed of lowercase or uppercase characters to be considered as
candidates for auto-linking.
For example, `_` would not qualify for auto-linking while "QT_DEBUG"
would still qualify.
Code-wise, the method responsible for identifying auto-linking
candidates, `DocParser::isAutoLink`, would keep a count of the amount
of characters of a certain category that was encountered during parsing.
If any character in the category "StrangeSymbol", such as '_', was
encountered, in any position that was parsed, `isAutoLink` would
consider the passed in word as a valid candidate for auto-linking.
The condition was expanded to require at least one character in the
"lowercase" or "uppercase" category to be encountered together with the
"StrangeSymbol" for the passed in word to be considered a valid
candidate.
As a consequence of modifying the validation with regards to
"StrangeSymbol"s, part of the code was able to be simplified.
Before, it was required to check that the sequence "()", which is a
"StrangeSymbol", did not appear as the starting sequence of the word,
such as to avoid the recognition of the word "()".
Now, since a "StrangeSymbol" does not qualify for auto-linking, the
branching for the "()" case could be flattened as a word "()" will not
be considered a candidate and the parsing, as before, will stop after
encountering that sequence.
Some documentation for the current rules was embedded in the codebase.
Fixes: QTBUG-101711
Change-Id: I5c47ece85ac6c419b7fe048d55becb52ba61601b
Reviewed-by: Topi Reiniƶ <topi.reinio@qt.io>
Diffstat (limited to 'src/qdoc/docparser.cpp')
-rw-r--r-- | src/qdoc/docparser.cpp | 156 |
1 files changed, 144 insertions, 12 deletions
diff --git a/src/qdoc/docparser.cpp b/src/qdoc/docparser.cpp index d22a32244..b5bedb0a9 100644 --- a/src/qdoc/docparser.cpp +++ b/src/qdoc/docparser.cpp @@ -1153,6 +1153,8 @@ void DocParser::parse(const QString &source, DocPrivate *docPrivate, } } } else if (isAutoLinkString(cmdStr)) { + // [possibly-incorrect-parsing][auto-linking] + // See [function-dependent-on-context][auto-linking] appendWord(cmdStr); } else { if (!cmdStr.endsWith("propertygroup")) { @@ -1473,8 +1475,126 @@ bool DocParser::openCommand(int cmd) return ok; } +// TODO: Consider modifying this to avoid the possibility of +// misunderstanding the output of the first overload. +// There are different possibilities but some of the obvious one may +// result in a pessimization of `isAutoLinking`, whose impact must be +// considered. + +// REMARK: [function-dependent-on-context][auto-linking] +// The following overload is insidious as it is not in reality +// total on its domain. +// Supposedly, with a good degree of certainty, this overload was +// intended to support the ability to auto-link for those sequences of +// characters that are already parsed. +// Considering how the parser for auto-linking is implemented, this is +// inerehently unsafe unless certain conditions are met. +// At the current time of writing, this is safe as its only use is on +// certain strings that respect the command grammr, which is a safe +// subset for which the function is total. +// See [possibly-incorrect-parsing][auto-linking] for such cases. +// +// Nonetheless, it must be noted that any additional call to this +// overload must always be audited for this issue and any existing +// call must be reaudited every time the format of its input +// parameters changes to assume a new grammar. + /*! Returns \c true if \a word qualifies for auto-linking. + + A word qualifies for auto-linking if either: + + \list + \li It is composed of only upper and lowercase characters + \li AND It contains at least one uppercase character that is not + the first character of word + \li AND it contains at least two lowercase characters + \endlist + + Or + + \list + \li It is composed only of uppercase characters, lowercase + characters, characters in [_@] and the \c {"::"} sequence. + \li It contains at least one uppercase character that is not + the first character of word or it contains at least one + lowercase character + \li AND it contains at least one character in [_@] or it + contains at least one \c {"::"} sequence. + \endlist + + Inserting or suffixing, but not prefixing, any sequence in [0-9]+ + in a word that qualifies for auto-linking by the above rules + preserves the auto-linkability of the word. + + Suffixing the sequence \c {"()"} to a word that qualifies for + auto-linking by the above rules preserves the auto-linkability of + a word. + + FInally, a word qualifies for auto-linking if: + + \list + \li It is composed of only uppercase characters, lowercase + characters and the sequence \c {"()"} + \li AND it contains one lowercase character and a sequence of zero, one + or two upper or lowercase characters + \li AND it contains exactly one sequence \c {"()"} + \li AND it contains one sequence \c {"()"} as the last two + characters of word + \endlist + + For example, \c {"fOo"}, \c {"FooBar"} and \c {"foobaR"} qualify + for auto-linking by the first rule. + + \c {"QT_DEBUG"}, \c {"::Qt"} and \c {"std::move"} qualifies for + auto-linking by the second rule. + + \c {"SIMDVector256"} qualifies by suffixing \c {"SIMDVector"}, + which qualifies by the first rule, with the sequence \c {"256"} + + \c {"FooBar::Bar()"} qualifies by suffixing \c {"FooBar::Bar"}, + which qualifies by the first and second rule, with the sequence \c + {"()"}. + + \c {"Foo()"} and \c {"a()"} qualifies by the last rule. + + Instead, \c {"Q"}, \c {"flower"}, \c {"_"} and \c {"()"} do not + qualify for auto-linking. + + The rules are intended as a heuristic to catch common cases in the + Qt documentation where a word might represent an important + documented element such as a class or a method that could be + linked to while at the same time avoiding catching common words + such as \c {"A"} or \c {"Nonetheless"}. + + The heuristic assumes that Qt's codebase respects a style where + camelCasing is the standard for most of the elements, a function + call is identified by the use of parenthesis and certain elements, + such as macros, might be fully uppercase. + + Furthemore, it assumes that the Qt codebase is written in a + language that has an identifier grammar similar to the one for + C++. + + This overload is intended to be used only on words that are + guaranteed to either entirely qualify for autolinking, such that + they can be produced from the above rules, or to no contain any + prefix that qualifies for auto-linking. For example, words whose + grammar is [a-z][A-Z][a-z]+. + + That is, consider a word W that qualifies for auto-linking and any + word Z. + + The concatenation of W followed by Z will be considred as + qualifying for auto-linking independently of the structure of Z. + + For example, \c {"foo()?ajEjd!!!!&a"} would qualify the whole word + for auto-linking albeit only the prefix \c {"foo()"} actually + respects the auto-linking rules. + + Any call of this overload with a word that does not respect this + constraint will not be considered to guarantee that \a word + qualifies for auto-linking. */ inline bool DocParser::isAutoLinkString(const QString &word) { @@ -1482,6 +1602,22 @@ inline bool DocParser::isAutoLinkString(const QString &word) return isAutoLinkString(word, start); } +/*! + Returns \c true if a prefix of \a word qualifies for auto-linking. + + Respects the same parsing rules as the unary overload. + + \a curPos defines the offset, from the first character of \ word, + at which parsing is initiated. + + When the call completes, \a curPos represents the offset, from the + first character of word, that is the successor of the amount of + characters of word that were parsed. + + If the return value of the call is \c true, it is guaranteed that + the prefix of \word that contains the characters up to and not + including \a curPos qualifies for auto-linking. +*/ bool DocParser::isAutoLinkString(const QString &word, qsizetype &curPos) { qsizetype len = word.size(); @@ -1512,22 +1648,18 @@ bool DocParser::isAutoLinkString(const QString &word, qsizetype &curPos) ++numStrangeSymbols; curPos += 2; } else if (latin1Ch == '(') { - if (curPos > startPos) { - if ((curPos < len - 1) && (word.at(curPos + 1) == QLatin1Char(')'))) { - ++numStrangeSymbols; - m_position += 2; - break; - } else { - break; - } - } else { - break; + if ((curPos < len - 1) && (word.at(curPos + 1) == QLatin1Char(')'))) { + ++numStrangeSymbols; + m_position += 2; } - } else { + + break; + } else { break; } } - return ((numUppercase >= 1 && numLowercase >= 2) || numStrangeSymbols > 0); + + return ((numUppercase >= 1 && numLowercase >= 2) || (numStrangeSymbols > 0 && (numUppercase + numLowercase >= 1))); } bool DocParser::closeCommand(int endCmd) |