| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
As reported on https://pvs-studio.com/en/blog/posts/cpp/0771/ (Snippet 2) - (and mentioned on rGdc4259d5a38409) we are repeating the T1.isNull() check instead of checking T2.isNull() as well, and at this point neither should be null - so we're better off with an assertion.
Differential Revision: https://reviews.llvm.org/D107347
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes an issue where GEP salvaging did not properly account for GEP
instructions which stepped over array elements of width 0 (effectively a
no-op). This unnecessarily produced long expressions by appending
`... + (x * 0)` and potentially extended the number of SSA values used
in the dbg.value. This also erroneously triggered an assert in the
salvage function that the element width would be strictly positive.
These issues are resolved by simply ignoring these useless operands.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D111809
|
|
|
|
|
|
| |
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D111735
|
|
|
|
|
|
|
| |
When this code was added, an unnecessary assertion slipped in which we
now hit in real code.
Add a test to defend against it firing again.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With D110105, the isDebug flag for register uses is now a proxy for whether
the instruction is a debug instruction; that causes DBG_PHIs to have their
operands updated by calls to updateDbgUsersToReg, which is the correct
behaviour. However: that function only expects to receive DBG_VALUE
instructions and asserts such.
This patch splits the updating-action into a lambda, and applies it to the
appropriate operands for each kind of debug instruction. Tested with an
ARM test that stimulates this function: I've added some DBG_PHI
instructions that should be updated in the same way as DBG_VALUEs.
Differential Revision: https://reviews.llvm.org/D108641
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactor ConnectToRemote() to improve readability and make future
changes easier:
1. Replace static buffers with std::string.
2. When handling errors, prefer reporting the actual error over dumb
'connection status is not success'.
3. Move host/port parsing directly into reverse_connection condition
that is its only user, and simplify it to make its purpose (verifying
that a valid port is provided) clear.
4. Use llvm::errs() and llvm::outs() instead of fprintf().
Differential Revision: https://reviews.llvm.org/D11196
|
|
|
|
|
|
|
| |
This reverts commit fa16329ae0721023376f24c7577b9020d438df1a.
See comments in discussion. Merged by mistake, not entirely getting what
the problem was.
|
|
|
|
| |
This file was rewritten in rst format in clang/docs/Block-ABI-Apple.rst
|
|
|
|
|
|
|
|
| |
clang format on some demangling files
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D111934
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using explicit Clang modules, some declarations might unexpectedly become invisible.
This is caused by the mechanism that loads PCM files passed via `-fmodule-file=<path>` and creates an `IdentifierInfo` for the module name. The `IdentifierInfo` creation takes place when the `ASTReader` is in a weird state, with modules that are loaded but not yet set up properly. This patch delays the creation of `IdentifierInfo` until the `ASTReader` is done with reading the PCM.
Note that the `-fmodule-file=<name>=<path>` form of the argument doesn't suffer from this issue, since it doesn't create `IdentifierInfo` for the module name.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D111543
|
| |
|
|
|
|
|
|
| |
gcc11 warns that this counter causes a signed/unsigned comaprison when it's
later compared with a SmallVector::difference_type. gcc appears to be
correct, clang does not warn one way or the other.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D111872
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.
This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.
Differential Revision: https://reviews.llvm.org/D111397
|
|
|
|
|
|
|
|
|
|
| |
Add patterns for i8/i16 local atomic load/store.
Added tests for new patterns.
Copied atomic_[store/load]_local.ll to GlobalISel directory.
Differential Revision: https://reviews.llvm.org/D111869
|
|
|
|
|
|
|
|
| |
Set `HAVE_CXX_ATOMICS_WITHOUT_LIB` or `HAVE_LIBATOMIC` when build LLVM with xlclang. With these macros set, libraries like libLLVMSupport are able to know whether it's necessary to add `-latomic` to dependent system libs. If `HAVE_LIBATOMIC` is set, `llvm-config --system-libs` appends `-latomic` to its output.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D111782
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.
However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).
This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.
Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D111885
|
|
|
|
|
|
|
|
| |
This patch is to order the AVX instructions ahead of AVX512 instructions
in the matching table so that the AVX instructions can be matched first.
Thanks Craig and Shengchen for the idea.
Differential Revision: https://reviews.llvm.org/D111538
|
|
|
|
|
|
|
|
|
|
|
| |
Remove Status::WasInterrupted() that checks whether the underlying error
code matches EINTR. ProcessGDBRemote::ConnectToDebugserver() is its
only call site, and it does not seem correct there. After all, EINTR
is precisely when we want to retry, not stop retrying. Furthermore,
it should not really matter since we should be catching EINTR
immediately via llvm::sys::RetryAfterSignal() but that's another story.
Differential Revision: https://reviews.llvm.org/D111908
|
|
|
|
| |
Precommit tests for D111888.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The change adds divergence predicates for fused logical operations.
The problem with selecting a scalar fused op such as S_NOR_B32 is
that it does not have a VALU counterpart and will be split in
moveToVALU. At the same time it prevents selection of a better
opcode on the VALU side (such as V_OR3_B32) which does not have a
counterpart on SALU side.
XNOR opcodes are left as is and selected as scalar to get advantage
of the SIInstrInfo::lowerScalarXnor() code which can commute
operations to keep one of two opcodes on SALU if possible. See
xnor.ll test for this.
Differential Revision: https://reviews.llvm.org/D111907
|
|
|
|
|
|
|
| |
This is a temporary fix, better would be to avoid including
llvm/Option/ArgList.h from a Support source file.
Differential Revision: https://reviews.llvm.org/D111974
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no reason why this function should be returning a ConstString.
While modifying these files, I also fixed several instances where
GetPluginName and GetPluginNameStatic were returning different strings.
I am not changing the return type of GetPluginNameStatic in this patch, as that
would necessitate additional changes, and this patch is big enough as it is.
Differential Revision: https://reviews.llvm.org/D111877
|
|
|
|
| |
This was introduced in D105168 which added RISCVISAInfo.h.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the outline method definition.
The clang behavior was poor before this patch:
```
void B::foo() override {}
// Before: clang emited "expcted function body after function
// declarator", and skiped all contents until it hits a ";", the
// following function f() is discarded.
// VS
// Now "override is not allowed" with a remove fixit, and following f()
// is retained.
void f();
```
Differential Revision: https://reviews.llvm.org/D111883
|
|
|
|
|
|
|
| |
Create new virtual register for the definition of new AND instruction and
replace old register by the new one to keep SSA form.
Differential Revision: https://reviews.llvm.org/D109963
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kill flags
We did a experiment and observed dramatic decrease on compilation time which spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05
After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags:3.987371e+03
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D111688
|
|
|
|
|
|
| |
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D110855
|
|
|
|
|
| |
The "Fixers" name was a hangover from an earlier draft of the patch. "Visitors"
fits the function name(s).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.
For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.
With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).
This patch was a part of D110922.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev
|
| |
|
| |
|
|
|
|
| |
We have backend optimizations for these, but currently the costmodel doesn't match them
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to reduct the size of D111337. The IfBuilder and the two
utility functions genIsNotNull and genIsNull have been extracted in
a separate patch with dedicated unittests.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: Leporacanthicus
Differential Revision: https://reviews.llvm.org/D111796
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
|
|
|
|
| |
We have backend optimizations for these (like we do for power-of-2 divisions), but currently the costmodel doesn't match them
|
|
|
|
| |
Both ports are required for BitTest ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures and what Intel AoM / Agner reports as well.
|
|
|
|
| |
Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
|
|
|
|
| |
Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
|
|
|
|
| |
Extra 1uop for folded pshufb ops, based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The multiply() implementation is very slow -- it performs six
multiplications in double the bitwidth, which means that it will
typically work on allocated APInts and bypass fast-path
implementations. Add an additional implementation that doesn't
try to produce anything better than a full range if overflow is
possible. At least for the BasicAA use-case, we really don't care
about more precise modeling of overflow behavior. The current
use of multiply() is fine while the implementation is limited to
a single index, but extending it to the multiple-index case makes
the compile-time impact untenable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/9bnKrefcG - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So could pick cost of `40`
For store we have:
https://godbolt.org/z/5s3s14dEY - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So we could pick cost of `40`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111945
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/MTaKboejM - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So could pick cost of `32`
For store we have:
https://godbolt.org/z/v7xPj3Wd4 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `32`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111944
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/11rcvdreP - for intels `Block RThroughput: <=68.0`; for ryzens, `Block RThroughput: <=48.0`
So could pick cost of `68`
For store we have:
https://godbolt.org/z/6aM11fWcP - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `64`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111943
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/s5b6E6jsP - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So could pick cost of `32`
For store we have:
https://godbolt.org/z/efh99d93b - for intels `Block RThroughput: <=48.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `48`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111942
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/YTeT9M7fW - for intels `Block RThroughput: <=212.0`; for ryzens, `Block RThroughput: <=64.0`
So could pick cost of `212`
For store we have:
https://godbolt.org/z/vc954KEGP - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=24.0`
So we could pick cost of `90`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111940
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
```
[5.1] 2.21.2 THREADPRIVATE Directive
A variable that appears in a threadprivate directive must be declared in
the scope of a module or have the SAVE attribute, either explicitly or
implicitly.
A variable that appears in a threadprivate directive must not be an
element of a common block or appear in an EQUIVALENCE statement.
```
This patch supports the following checks for DECLARE TARGET Directive:
```
[5.1] 2.14.7 Declare Target Directive
A variable that is part of another variable (as an array, structure
element or type parameter inquiry) cannot appear in a declare
target directive.
A variable that appears in a declare target directive must be declared
in the scope of a module or have the SAVE attribute, either explicitly
or implicitly.
A variable that appears in a declare target directive must not be an
element of a common block or appear in an EQUIVALENCE statement.
```
As Fortran 2018 standard [8.5.16] states, a variable, common block, or
procedure pointer declared in the scoping unit of a main program,
module, or submodule implicitly has the SAVE attribute, which may be
confirmed by explicit specification.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D109864
|
|
|
|
|
|
|
| |
Previously, we reported the same value as for C17, now we report 202000L, which
is the same value currently used by GCC.
Once C23 ships, this value will be bumped to the correct date.
|
| |
|