author | Paul Wicking <paul.wicking@qt.io> | 2023-05-03 11:00:25 +0200 |
---|---|---|
committer | Paul Wicking <paul.wicking@qt.io> | 2023-05-13 22:01:10 +0200 |
commit | 7057d01fbb9f8f37c707b33e3b92c10a78919ddc (patch) | |
tree | a919ab62d892885bde7d9049011b3b658b920dce /src/qdoc/CMakeLists.txt | |
parent | 941a9b5e5963f8c0798415e3cb69f031da1f4109 (diff) | |
QDoc: Append hash to canonical titles with non-alnum characters
When generating fragment identifiers from a title, QDoc normalizes the
string that's used as fragment identifier. This normalization is done by
`Doc::canonicalTitle()`. This method returns a string that is stripped
of non-alphanumeric characters, has each run of spaces replaced by a
single hyphen, and has any repeating or trailing hyphens removed.
As a result, characters such as 'ß' and '大' are removed. For
documentation written in languages that consist mostly of non-Latin-1
characters, such as Chinese, this means fragment identifiers may be
empty, so that links to these anchors (e.g. from a table of contents)
lead nowhere.
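The failure mode can be reproduced with a minimal sketch of the
normalization as described above. This is not QDoc's code: it assumes a
UTF-8 encoded `std::string` instead of a `QString`, the name
`canonicalTitleSketch` is hypothetical, and treating an existing hyphen
like a space is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the pre-patch behavior described in the
// commit message; works on UTF-8 bytes, not on QChar.
std::string canonicalTitleSketch(const std::string &title)
{
    std::string result;
    bool pendingHyphen = false; // a separator was seen since the last kept character
    for (const unsigned char c : title) {
        const bool isLower = c >= 'a' && c <= 'z';
        const bool isUpper = c >= 'A' && c <= 'Z';
        const bool isDigit = c >= '0' && c <= '9';
        if (isLower || isUpper || isDigit) {
            // Emit at most one hyphen per run of separators, never a leading one.
            if (pendingHyphen && !result.empty())
                result += '-';
            pendingHyphen = false;
            result += isUpper ? static_cast<char>(c - 'A' + 'a')
                              : static_cast<char>(c);
        } else if (c == ' ' || c == '-') {
            // Assumption: existing hyphens collapse together with spaces.
            pendingHyphen = true;
        }
        // Every other byte, including all bytes of multi-byte UTF-8
        // sequences, is dropped. A trailing separator is never emitted.
    }
    return result;
}
```

With this sketch, "Hello, World!" becomes "hello-world", "Straße" loses
its 'ß' and becomes "strae", and a title made up only of Chinese
characters becomes the empty string, which is the empty fragment
identifier reported in the bug.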
This patch adds test data to QDoc's generated output test to reproduce
the issue. The Chinese test data is courtesy of the bug reporter. The
test data also contains other characters from Latin scripts, because
these surfaced as separate triggers of the misbehavior while a solution
to the bug was being investigated. The modified test also serves to
catch possible future regressions.
The patch modifies `Doc::canonicalTitle` such that it appends a hash to
"canonical" titles that contain characters that are not legal in a
canonical title. In this context, legal characters are lowercase a-z,
digits 0-9, and the dash (`-`). Other symbols and characters are
removed. When QDoc encounters any character outside the printable ASCII
range (decimal 32-126, inclusive), it appends a hash of the original
string to the fragment identifier it generates. This means that the
canonical title for a string that contains a mix of allowed characters
and characters outside that range consists of the allowed characters
followed by a hash of the original string.
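A minimal sketch of the patched behavior, under stated assumptions, may
help illustrate the rule. It is not the code in this patch: it works on
UTF-8 bytes in a `std::string`, uses 32-bit FNV-1a as a stand-in because
this message does not name the hash function, and joins the hash with a
hyphen, which is also an assumption. All function names are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in hash (32-bit FNV-1a). Assumption: the real hash may differ.
std::uint32_t fnv1aSketch(const std::string &s)
{
    std::uint32_t h = 2166136261u;
    for (const unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Formats a 32-bit value as 8 lowercase hex digits, all of which are
// legal characters in a canonical title.
std::string toHexSketch(std::uint32_t v)
{
    static const char digits[] = "0123456789abcdef";
    std::string out(8, '0');
    for (int i = 7; i >= 0; --i) {
        out[i] = digits[v & 0xFu];
        v >>= 4;
    }
    return out;
}

std::string canonicalTitleWithHashSketch(const std::string &title)
{
    std::string result;
    bool pendingHyphen = false;
    bool needsHash = false;
    for (const unsigned char c : title) {
        // Anything outside printable ASCII (32-126) triggers the hash.
        if (c < 32 || c > 126)
            needsHash = true;
        const bool isUpper = c >= 'A' && c <= 'Z';
        const bool isAlnum = isUpper || (c >= 'a' && c <= 'z')
                             || (c >= '0' && c <= '9');
        if (isAlnum) {
            if (pendingHyphen && !result.empty())
                result += '-';
            pendingHyphen = false;
            result += isUpper ? static_cast<char>(c - 'A' + 'a')
                              : static_cast<char>(c);
        } else if (c == ' ' || c == '-') {
            pendingHyphen = true;
        }
    }
    if (needsHash) {
        // Hash the original title, not the stripped result.
        if (!result.empty())
            result += '-'; // separator is an assumption of this sketch
        result += toHexSketch(fnv1aSketch(title));
    }
    return result;
}
```

In this sketch a pure ASCII title such as "Hello, World!" still yields
"hello-world", "Straße" yields "strae-" followed by eight hex digits,
and a title made up only of Chinese characters yields the eight hex
digits alone, so the fragment identifier is no longer empty.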
The patch changes the loop in `canonicalTitle` to a range-based for loop
over a const-ref, and makes a code comment more precise (based on timing
one million executions of each of the two implementations of this
method).
Finally, the patch adds documentation for `Doc::canonicalTitle`, as that
didn't exist previously.
[ChangeLog][QDoc] QDoc now appends a hash of the original title to the
fragment identifier generated for that title if the title contains
non-ASCII characters. This means QDoc now generates fragment identifiers
for titles that are written in non-Latin characters.
Fixes: QTBUG-64506
Change-Id: Idc62677b9950becea662d8ff5ead1f631ec26bc3
Reviewed-by: Topi Reiniö <topi.reinio@qt.io>