author: Paul Wicking <paul.wicking@qt.io> 2023-05-03 11:00:25 +0200
committer: Paul Wicking <paul.wicking@qt.io> 2023-05-13 22:01:10 +0200
commit: 7057d01fbb9f8f37c707b33e3b92c10a78919ddc
tree: a919ab62d892885bde7d9049011b3b658b920dce /src/qdoc/CMakeLists.txt
parent: 941a9b5e5963f8c0798415e3cb69f031da1f4109

QDoc: Append hash to canonical titles with non-alnum characters

When generating fragment identifiers from a title, QDoc normalizes the string that's used as the fragment identifier. This normalization is done by `Doc::canonicalTitle()`. This method returns a string that is stripped of non-alphanumeric characters, has each run of spaces replaced by a single hyphen, and has any repeating or trailing hyphens removed. This causes the removal of certain characters, such as 'ß', '大', etc. For documentation written in languages that use mostly non-Latin-1 characters, such as Chinese, this means fragment identifiers may be empty, so that links to these anchors (e.g. from a table of contents) lead nowhere.

This patch adds test data to QDoc's generated output test to reproduce the issue. The Chinese test data is courtesy of the bug reporter. The test data also contains other characters from Latin scripts, because these appeared as separate triggers of the misbehavior while a solution to the bug was being investigated. The modified test also serves to catch possible future regressions.

The patch modifies `Doc::canonicalTitle` such that it appends a hash to "canonical" titles that contain characters that are not considered legal in a canonical title. In this context, legal characters are lowercase a-z, digits 0-9, and the dash (`-`). Other symbols and characters are removed. When encountering any character outside the printable ASCII subset (decimal 32-126, inclusive), that is, a non-printable ASCII character or a non-ASCII character, QDoc appends a hash of the original string to the fragment identifier it generates. This means that the canonical title for a string that contains, for example, a mix of allowed and disallowed characters consists of the allowed characters followed by a hash of the original string.

The patch also changes the loop in `canonicalTitle` to a range-based for loop over a const reference, and makes a code comment more precise (the added precision is based on timing one million executions of each of the two implementations of this method).

Finally, the patch adds documentation for `Doc::canonicalTitle`, which previously had none.

[ChangeLog][QDoc] QDoc now appends a hash of the original title to the fragment identifier generated for that title if the title contains non-ASCII characters. This means QDoc now generates fragment identifiers for titles that are written in non-Latin characters.

Fixes: QTBUG-64506
Change-Id: Idc62677b9950becea662d8ff5ead1f631ec26bc3
Reviewed-by: Topi Reiniö <topi.reinio@qt.io>
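
The normalization described in the commit message can be illustrated with a minimal sketch. This is not the code from the patch. It uses `std::string` holding UTF-8 bytes instead of `QString`, the names `canonicalTitleSketch` and `hexHash` are hypothetical, folding uppercase to lowercase is an assumption, and a 32-bit FNV-1a hash stands in for whatever hash function QDoc actually uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in hash (32-bit FNV-1a), rendered as 8 hex digits.
// The hash function QDoc uses is not stated in the commit message.
std::string hexHash(const std::string &input)
{
    std::uint32_t h = 2166136261u;
    for (const unsigned char byte : input) {
        h ^= byte;
        h *= 16777619u;
    }
    char buffer[9];
    std::snprintf(buffer, sizeof buffer, "%08x", static_cast<unsigned>(h));
    return buffer;
}

// Sketch of the described behavior: keep a-z and 0-9 (folding A-Z to
// lowercase, an assumption), collapse every run of other characters into
// one dash, drop leading and trailing dashes, and append a hash of the
// original title if any byte lies outside the range 32-126.
std::string canonicalTitleSketch(const std::string &title)
{
    std::string result;
    result.reserve(title.size());
    bool needsHash = false;
    bool dashAppended = false;
    std::size_t lastAlnum = 0; // length of result up to the last a-z/0-9

    for (const unsigned char byte : title) {
        if (byte < 32 || byte > 126)
            needsHash = true;

        char c = static_cast<char>(byte);
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');

        const bool alnum = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
        if (alnum) {
            result += c;
            dashAppended = false;
            lastAlnum = result.size();
        } else if (!dashAppended && !result.empty()) {
            result += '-';
            dashAppended = true;
        }
    }
    result.resize(lastAlnum); // removes a trailing dash, if any

    if (needsHash) {
        if (!result.empty())
            result += '-';
        result += hexHash(title);
    }
    return result;
}
```

Under these assumptions, `canonicalTitleSketch("Hello, World!")` yields `hello-world` with no hash, because every byte is in the 32-126 range. A title made up only of Chinese characters yields just the 8-digit hash, so the fragment identifier is no longer empty.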