author | Haojian Wu <hokein.wu@gmail.com> | 2022-08-16 21:23:11 +0200 |
---|---|---|
committer | Haojian Wu <hokein.wu@gmail.com> | 2022-08-17 14:30:53 +0200 |
commit | 6a9f79e1020db9f581d00791f1f644b64facfebe (patch) | |
tree | 45cfe7ce4c9905c96fa39cb7c24ad82d991919b7 /clang-tools-extra/pseudo | |
parent | d7e06d5675b62b5d3d89e6d6210c34b74a1a8356 (diff) | |
download | llvm-6a9f79e1020db9f581d00791f1f644b64facfebe.tar.gz |
[pseudo] Eliminate the type-name identifier ambiguities in the grammar.
See https://reviews.llvm.org/D130626 for motivation.
Identifiers in the grammar fall into different categories (type-name, template-name,
namespace-name), and resolving them requires semantic information. This patch
eliminates the "local" ambiguities in type-name and namespace-name, which
gives the parser a performance boost:
- eliminate the separate type rules (class-name, enum-name, typedef-name) and
fold them into a unified type-name; this removes the #1 type-name ambiguity and
gives us a big performance boost;
- remove the namespace-alias rules, as they're hard and uninteresting;
Note that we could eliminate more and gain more performance (e.g. by folding template-name,
type-name, and namespace-name together), but at the current stage we'd like to keep all existing
categories of the identifier (as they might assist in correlated disambiguation &
keep the representation of important concepts uniform).
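To illustrate why the first bullet matters, here is a minimal sketch (not part of the patch) that counts how many ways `type-name` can derive a single `IDENTIFIER` token under the old and the new rules. The rule tables are toy excerpts copied from the diff; the helper `count_derivations` is hypothetical and only follows single-symbol rules, which is enough for this case but is not how the GLR parser itself works.

```python
# Toy excerpts of the grammar before and after the patch.
# Each nonterminal maps to a list of alternatives (sequences of symbols).
OLD_RULES = {
    "type-name": [["class-name"], ["enum-name"], ["typedef-name"]],
    "class-name": [["IDENTIFIER"], ["simple-template-id"]],
    "enum-name": [["IDENTIFIER"]],
    "typedef-name": [["IDENTIFIER"], ["simple-template-id"]],
}
NEW_RULES = {
    "type-name": [["IDENTIFIER"], ["simple-template-id"]],
}


def count_derivations(rules, symbol, token="IDENTIFIER"):
    """Count the distinct ways `symbol` derives the single token `token`.

    Hypothetical helper: it only follows single-symbol alternatives and
    assumes the rule table has no cycles, which holds for these excerpts.
    """
    if symbol == token:
        return 1
    total = 0
    for alternative in rules.get(symbol, []):
        if len(alternative) == 1:
            total += count_derivations(rules, alternative[0], token)
    return total


if __name__ == "__main__":
    print("old:", count_derivations(OLD_RULES, "type-name"))
    print("new:", count_derivations(NEW_RULES, "type-name"))
```

Under the old rules every identifier used as a type has three indistinguishable derivations, which corresponds to the three `type-name~IDENTIFIER` alternatives removed from test/glr.cpp; under the new rules there is one.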
| file | ambiguous nodes | forest size | glrParse performance |
|---|---|---|---|
| SemaCodeComplete.cpp | 11k -> 5.7k | 10.4MB -> 7.9MB | 7.1MB/s -> 9.98MB/s |
| AST.cpp | 1.3k -> 0.73k | 0.99MB -> 0.77MB | 6.7MB/s -> 8.4MB/s |
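The relative changes implied by the table can be worked out with a short sketch. The numbers are the rounded figures from the table above, so the results are approximate; the names `MEASUREMENTS` and `relative_change` are illustrative only.

```python
# (before, after) pairs taken from the table in the commit message.
MEASUREMENTS = {
    "SemaCodeComplete.cpp": {
        "ambiguous nodes": (11_000, 5_700),
        "forest size (MB)": (10.4, 7.9),
        "glrParse (MB/s)": (7.1, 9.98),
    },
    "AST.cpp": {
        "ambiguous nodes": (1_300, 730),
        "forest size (MB)": (0.99, 0.77),
        "glrParse (MB/s)": (6.7, 8.4),
    },
}


def relative_change(before, after):
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


if __name__ == "__main__":
    for file_name, metrics in MEASUREMENTS.items():
        for metric, (before, after) in metrics.items():
            change = relative_change(before, after)
            print(f"{file_name:22} {metric:18} {change:+.1%}")
```

For SemaCodeComplete.cpp this works out to roughly 48% fewer ambiguous nodes, a forest about 24% smaller, and about 41% higher parse throughput; for AST.cpp the figures are roughly 44%, 22%, and 25%.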
Differential Revision: https://reviews.llvm.org/D130747
Diffstat (limited to 'clang-tools-extra/pseudo')
-rw-r--r-- | clang-tools-extra/pseudo/lib/cxx/cxx.bnf | 20 |
-rw-r--r-- | clang-tools-extra/pseudo/test/glr.cpp | 10 |
2 files changed, 12 insertions, 18 deletions
diff --git a/clang-tools-extra/pseudo/lib/cxx/cxx.bnf b/clang-tools-extra/pseudo/lib/cxx/cxx.bnf
index bc6599c4e3c4..7221a5086acf 100644
--- a/clang-tools-extra/pseudo/lib/cxx/cxx.bnf
+++ b/clang-tools-extra/pseudo/lib/cxx/cxx.bnf
@@ -34,14 +34,9 @@ _ := statement-seq
 _ := declaration-seq
 
 # gram.key
-typedef-name := IDENTIFIER
-typedef-name := simple-template-id
+#! we don't distinguish between namespaces and namespace aliases, as it's hard
+#! and uninteresting.
 namespace-name := IDENTIFIER
-namespace-name := namespace-alias
-namespace-alias := IDENTIFIER
-class-name := IDENTIFIER
-class-name := simple-template-id
-enum-name := IDENTIFIER
 template-name := IDENTIFIER
 
 # gram.basic
@@ -391,9 +386,12 @@ builtin-type := INT
 builtin-type := FLOAT
 builtin-type := DOUBLE
 builtin-type := VOID
-type-name := class-name
-type-name := enum-name
-type-name := typedef-name
+#! Unlike C++ standard grammar, we don't distinguish the underlying type (class,
+#! enum, typedef) of the IDENTIFIER, as these ambiguities are "local" and don't
+#! affect the final parse tree. Eliminating them gives a significant performance
+#! boost to the parser.
+type-name := IDENTIFIER
+type-name := simple-template-id
 elaborated-type-specifier := class-key nested-name-specifier_opt IDENTIFIER
 elaborated-type-specifier := class-key simple-template-id
 elaborated-type-specifier := class-key nested-name-specifier TEMPLATE_opt simple-template-id
@@ -551,7 +549,7 @@ private-module-fragment := module-keyword : PRIVATE ; declaration-seq_opt
 class-specifier := class-head { member-specification_opt [recover=Brackets] }
 class-head := class-key class-head-name class-virt-specifier_opt base-clause_opt
 class-head := class-key base-clause_opt
-class-head-name := nested-name-specifier_opt class-name
+class-head-name := nested-name-specifier_opt type-name
 class-virt-specifier := contextual-final
 class-key := CLASS
 class-key := STRUCT
diff --git a/clang-tools-extra/pseudo/test/glr.cpp b/clang-tools-extra/pseudo/test/glr.cpp
index 221725c6f089..f805e42ffa6d 100644
--- a/clang-tools-extra/pseudo/test/glr.cpp
+++ b/clang-tools-extra/pseudo/test/glr.cpp
@@ -12,10 +12,7 @@ void foo() {
 // CHECK-NEXT: │ └─; := tok[8]
 // CHECK-NEXT: └─statement~simple-declaration := decl-specifier-seq init-declarator-list ;
 // CHECK-NEXT: ├─decl-specifier-seq~simple-type-specifier := <ambiguous>
-// CHECK-NEXT: │ ├─simple-type-specifier~type-name := <ambiguous>
-// CHECK-NEXT: │ │ ├─type-name~IDENTIFIER := tok[5]
-// CHECK-NEXT: │ │ ├─type-name~IDENTIFIER := tok[5]
-// CHECK-NEXT: │ │ └─type-name~IDENTIFIER := tok[5]
+// CHECK-NEXT: │ ├─simple-type-specifier~IDENTIFIER := tok[5]
 // CHECK-NEXT: │ └─simple-type-specifier~IDENTIFIER := tok[5]
 // CHECK-NEXT: ├─init-declarator-list~ptr-declarator := ptr-operator ptr-declarator
 // CHECK-NEXT: │ ├─ptr-operator~* := tok[6]
@@ -23,12 +20,11 @@ void foo() {
 // CHECK-NEXT: └─; := tok[8]
 }
 
-// CHECK: 3 Ambiguous nodes:
+// CHECK: 2 Ambiguous nodes:
 // CHECK-NEXT: 1 simple-type-specifier
 // CHECK-NEXT: 1 statement
-// CHECK-NEXT: 1 type-name
 // CHECK-EMPTY:
 // CHECK-NEXT: 0 Opaque nodes:
 // CHECK-EMPTY:
-// CHECK-NEXT: Ambiguity: 0.40 misparses/token
+// CHECK-NEXT: Ambiguity: 0.20 misparses/token
 // CHECK-NEXT: Unparsed: 0.00%