QDoc: Add a warning for tokens that are too long to be parsed

`Tokenizer` uses a fixed-size buffer when parsing the sources for tokens. When it encounters a token that is too long, all characters that do not fit into the buffer are discarded and parsing continues. When this happens, QDoc may issue misleading warnings, because the sources it sees are only partially correct. For example, a comment block that does not fit into the buffer might be seen as invalid even when it is not.

To ease the debugging of such problems, a warning is now issued when a character is read while the buffer is already full.

To avoid issuing the same warning for each character, `Tokenizer` now uses a boolean flag that is set when such a warning is issued and is reset when a new token is requested. Resetting the flag when a new token is requested ensures that every token of this kind encountered during the parsing phase is reported.

Some documentation for the new warning was added to the `qdoc-warnings` page.

Change-Id: I2ca4d86c201a8009d3f1b6760ed8c28f4401e114
Reviewed-by: Paul Wicking <paul.wicking@qt.io>
(cherry picked from commit c358b0942f30606472f94249ca2795c721328a06)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
author: Luca Di Sera <luca.disera@qt.io> 2021-09-30 12:59:54 +0200
committer: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org> 2021-09-30 16:50:35 +0000
commit: 8b06b962ea99755b727ef85c855e86b5e43a01b7 (patch)
tree: fa99f64889539ffac5f4b4c1154c18e1d80e5ab0
parent: 9b2214467138aae827b1ca645467d1767d1afda8 (diff)
download: qttools-8b06b962ea99755b727ef85c855e86b5e43a01b7.tar.gz
3 files changed, 50 insertions, 3 deletions
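The mechanism described in the commit message can be modeled in isolation: a fixed-size buffer drops characters that do not fit and records at most one warning per token. The C++ below is a simplified, self-contained sketch, not QDoc's `Tokenizer`. The class `LexemeBuffer` and its methods `startToken()`, `append()`, `lexeme()` and `warnings()` are hypothetical names, and warnings are collected in a vector instead of being reported through `Location::warning()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of a fixed-size lexeme buffer (hypothetical, not QDoc code).
// Characters that do not fit are discarded, and at most one warning is
// recorded per token.
class LexemeBuffer
{
public:
    explicit LexemeBuffer(std::size_t capacity) : m_buf(capacity, '\0')
    {
        // One slot is reserved for the terminating '\0', as in the patch,
        // so at most capacity - 1 characters are kept.
        assert(capacity > 1);
    }

    // Counterpart of the reset at the top of Tokenizer::getToken():
    // a new token starts empty and may trigger the warning again.
    void startToken()
    {
        m_len = 0;
        m_buf[0] = '\0';
        m_warned = false;
    }

    // Append one character, dropping it if the buffer is already full.
    void append(char ch)
    {
        if (m_len < m_buf.size() - 1) {
            m_buf[m_len++] = ch;
            m_buf[m_len] = '\0';
        } else if (!m_warned) {
            m_warnings.push_back("The content is too long.");
            m_warned = true; // suppress repeats for the rest of this token
        }
    }

    std::string lexeme() const { return std::string(m_buf.data(), m_len); }
    const std::vector<std::string> &warnings() const { return m_warnings; }

private:
    std::vector<char> m_buf;
    std::size_t m_len = 0;
    bool m_warned = false;
    std::vector<std::string> m_warnings;
};
```

In this sketch, `startToken()` plays the role of the flag reset in `Tokenizer::getToken()`. Without the flag, one warning would be recorded for every discarded character. Without the reset, only the first oversized token of a run would ever be reported.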
diff --git a/src/qdoc/doc/qdoc-warnings.qdoc b/src/qdoc/doc/qdoc-warnings.qdoc
index b57c2a5e0..198a9990d 100644
--- a/src/qdoc/doc/qdoc-warnings.qdoc
+++ b/src/qdoc/doc/qdoc-warnings.qdoc
@@ -770,4 +770,32 @@
     \endcode
 
     QDoc will issue this warning if a certain title is used in more than one page.
+
+
+    \section1 The content is too long
+
+    QDoc uses a fixed-size buffer when tokenizing source files. If any single
+    token in the file has more characters than the maximum limit, QDoc will
+    issue this warning.
+
+    While QDoc will continue parsing the file, only the part of the token that
+    fits into the buffer is considered, meaning that the output might be
+    mangled.
+
+    To resolve this warning, the relevant content must be reduced in size,
+    either by splitting it, if possible, or by removing some of its parts.
+
+    The maximum amount of characters for a single token is shown alongside
+    the warning, for example:
+
+    \badcode
+        file.qdoc:71154: (qdoc) warning: The content is too long.
+
+        [The maximum amount of characters for this content is 524288.
+        Consider splitting it or reducing its size.]
+    \endcode
+
+    \note Since content that is too long will not be parsed in full, QDoc may
+    issue warnings that are false positives. Resolve all warnings of this type
+    before fixing other warnings.
 */
diff --git a/src/qdoc/tokenizer.cpp b/src/qdoc/tokenizer.cpp
index c61b18eb8..1f9476d45 100644
--- a/src/qdoc/tokenizer.cpp
+++ b/src/qdoc/tokenizer.cpp
@@ -164,6 +164,8 @@ Tokenizer::~Tokenizer()
 
 int Tokenizer::getToken()
 {
+    token_too_long_warning_was_issued = false;
+
     char *t = m_prevLex;
     m_prevLex = m_lex;
     m_lex = t;
diff --git a/src/qdoc/tokenizer.h b/src/qdoc/tokenizer.h
index 3f2303b8f..77b6bb193 100644
--- a/src/qdoc/tokenizer.h
+++ b/src/qdoc/tokenizer.h
@@ -127,9 +127,11 @@ private:
     void init();
     void start(const Location &loc);
     /*
-      This limit on the length of a lexeme seems fairly high, but a
-      doc comment can be arbitrarily long. The previous 65,536 limit
-      was reached by Mark Summerfield.
+     Represents the maximum amount of characters that can appear in a
+     block-comment.
+
+     When a block-comment with more characters than the maximum amount is
+     encountered, a warning is issued.
     */
     enum { yyLexBufSize = 524288 };
 
@@ -142,6 +144,14 @@ private:
         if (m_lexLen < yyLexBufSize - 1) {
             m_lex[m_lexLen++] = (char)m_ch;
             m_lex[m_lexLen] = '\0';
+        } else if (!token_too_long_warning_was_issued) {
+            location().warning(
+                u"The content is too long.\n"_qs,
+                u"The maximum amount of characters for this content is %1.\n"_qs.arg(yyLexBufSize) +
+                "Consider splitting it or reducing its size."
+            );
+
+            token_too_long_warning_was_issued = true;
         }
         m_curLoc.advance(QChar(m_ch));
         int ch = getch();
@@ -174,6 +184,13 @@ private:
     QString m_version {};
     bool m_parsingMacro {};
 
+    // Used to ensure that the warning that is issued when a token is
+    // too long to fit into our fixed sized buffer is not repeated for each
+    // character of that token after the last saved one.
+    // The flag is reset whenever a new token is requested, so as to allow
+    // reporting all such tokens that are too long during a single execution.
+    bool token_too_long_warning_was_issued{false};
+
 protected:
     QByteArray m_in {};
     int m_pos {};
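The new `qdoc-warnings` documentation recommends splitting or shrinking content that triggers the warning. To locate such content before running QDoc, a rough pre-check can scan a file for block comments that are longer than the buffer. The sketch below is hypothetical and not part of QDoc. `findOversizedComments()` and `OversizedComment` are illustrative names. It assumes each `/* ... */` block is read as a single lexeme, which is what the new comment in `tokenizer.h` describes, and it does not recognize `/*` inside string literals or `//` comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pre-check, not part of QDoc.
struct OversizedComment
{
    std::size_t line;   // 1-based line on which the comment starts
    std::size_t length; // characters from the opening "/*" through the closing "*/"
};

// Report every /* ... */ block in `text` that is longer than `limit`.
// Assumption: each block comment is read by the tokenizer as one lexeme.
// Limitation: "/*" inside string literals or // comments is not recognized.
std::vector<OversizedComment> findOversizedComments(const std::string &text, std::size_t limit)
{
    std::vector<OversizedComment> result;
    std::size_t line = 1;
    std::size_t pos = 0;

    while (pos < text.size()) {
        const std::size_t open = text.find("/*", pos);
        if (open == std::string::npos)
            break;

        // Advance the line counter up to the start of the comment.
        for (std::size_t i = pos; i < open; ++i)
            if (text[i] == '\n')
                ++line;
        const std::size_t startLine = line;

        // An unterminated comment runs to the end of the text.
        const std::size_t close = text.find("*/", open + 2);
        const std::size_t end = (close == std::string::npos) ? text.size() : close + 2;

        if (end - open > limit)
            result.push_back({startLine, end - open});

        // Count the newlines inside the comment before moving on.
        for (std::size_t i = open; i < end; ++i)
            if (text[i] == '\n')
                ++line;
        pos = end;
    }
    return result;
}
```

For example, calling `findOversizedComments(contents, 524288)` on the contents of a `.qdoc` file would return the starting line and length of each block comment that does not fit into a buffer of the size used in this patch. Because the sketch only approximates where the tokenizer's lexemes begin and end, a slightly lower limit gives some margin.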