Commit message (Author; Age; Files; Lines)
* [Sema] haveSameParameterTypes - replace repeated isNull() test with assertions [5357a98c823a] (Simon Pilgrim; 2021-10-18; 1 file; -1/+2)
| | | | | | As reported on https://pvs-studio.com/en/blog/posts/cpp/0771/ (Snippet 2) - (and mentioned on rGdc4259d5a38409) we are repeating the T1.isNull() check instead of checking T2.isNull() as well, and at this point neither should be null - so we're better off with an assertion. Differential Revision: https://reviews.llvm.org/D107347
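The copy-paste pattern described above can be sketched in isolation. This is a minimal illustration, not the clang code: `TypeHandle`, `eitherIsNullBuggy`, `eitherIsNull`, and `sameIdChecked` are hypothetical names, and the integer id stands in for a real type representation.

```cpp
#include <cassert>

// Hypothetical stand-in for a nullable type handle (not clang's QualType).
// An Id of 0 plays the role of "null".
struct TypeHandle {
  int Id;
  explicit TypeHandle(int Id = 0) : Id(Id) {}
  bool isNull() const { return Id == 0; }
};

// Buggy shape: the second operand repeats T1, so a null T2 is never noticed.
inline bool eitherIsNullBuggy(const TypeHandle &T1, const TypeHandle &T2) {
  (void)T2; // T2 is never inspected, which is the bug being illustrated
  return T1.isNull() || T1.isNull();
}

// What the check would look like if both operands were actually tested.
inline bool eitherIsNull(const TypeHandle &T1, const TypeHandle &T2) {
  return T1.isNull() || T2.isNull();
}

// When callers already guarantee non-null inputs, an assertion documents
// that contract instead of silently handling a case that cannot occur.
inline bool sameIdChecked(const TypeHandle &T1, const TypeHandle &T2) {
  assert(!T1.isNull() && !T2.isNull() && "unexpected null type");
  return T1.Id == T2.Id;
}
```

With `TypeHandle A(1)` and a default-constructed null handle, `eitherIsNullBuggy(A, TypeHandle())` returns false while `eitherIsNull` returns true, which is the gap the assertion closes.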
* [DebugInfo] Correctly handle arrays with 0-width elements in GEP salvaging (Stephen Tozer; 2021-10-18; 2 files; -7/+22)
| | | | | | | | | | | | | | Fixes an issue where GEP salvaging did not properly account for GEP instructions which stepped over array elements of width 0 (effectively a no-op). This unnecessarily produced long expressions by appending `... + (x * 0)` and potentially extended the number of SSA values used in the dbg.value. This also erroneously triggered an assert in the salvage function that the element width would be strictly positive. These issues are resolved by simply ignoring these useless operands. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D111809
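The idea of ignoring zero-width operands can be sketched as follows. This is a simplified model under stated assumptions, not the LLVM salvage code: `VariableIndex`, `ScaledTerm`, and `buildOffsetTerms` are hypothetical names, and each term stands for one `+ (x * width)` piece of the salvaged expression.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one variable GEP index: the SSA value it uses and the
// byte width of the element it steps over.
struct VariableIndex {
  unsigned ValueId;
  uint64_t ElementWidth;
};

// One "+ (value * scale)" term of the salvaged offset expression.
struct ScaledTerm {
  unsigned ValueId;
  uint64_t Scale;
};

// Build the offset terms, skipping indices over zero-width elements. Such an
// index contributes "x * 0", so dropping it keeps the expression short and
// avoids referencing an SSA value that has no effect on the address.
inline std::vector<ScaledTerm>
buildOffsetTerms(const std::vector<VariableIndex> &Indices) {
  std::vector<ScaledTerm> Terms;
  for (const VariableIndex &Idx : Indices) {
    if (Idx.ElementWidth == 0)
      continue; // no-op step: nothing to append
    Terms.push_back({Idx.ValueId, Idx.ElementWidth});
  }
  return Terms;
}
```

For indices with widths 4, 0, and 8, only the first and third produce terms, so the resulting expression uses two SSA values instead of three.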
* [AArch64][SVE][CodeGen] Add tests for RSHRN{T,B} instructions (Peter Waller; 2021-10-18; 1 file; -0/+73)
| | | | | | Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D111735
* [InstCombine][DebugInfo] Remove superfluous assertion, add test (Peter Waller; 2021-10-18; 2 files; -2/+9)
| | | | | | | When this code was added, an unnecessary assertion slipped in which we now hit in real code. Add a test to defend against it firing again.
* [AMDGPU] Remove unused VirtRegMap analysis. NFC. (Jay Foad; 2021-10-18; 1 file; -2/+0)
|
* [DebugInfo][InstrRef] Avoid a crash during DBG_PHI maintenance (Jeremy Morse; 2021-10-18; 2 files; -13/+30)
| | | | | | | | | | | | | | | With D110105, the isDebug flag for register uses is now a proxy for whether the instruction is a debug instruction; that causes DBG_PHIs to have their operands updated by calls to updateDbgUsersToReg, which is the correct behaviour. However: that function only expects to receive DBG_VALUE instructions and asserts such. This patch splits the updating-action into a lambda, and applies it to the appropriate operands for each kind of debug instruction. Tested with an ARM test that stimulates this function: I've added some DBG_PHI instructions that should be updated in the same way as DBG_VALUEs. Differential Revision: https://reviews.llvm.org/D108641
* [lldb] [lldb-server] Refactor ConnectToRemote() (Michał Górny; 2021-10-18; 1 file; -47/+44)
| | | | | | | | | | | | | | | Refactor ConnectToRemote() to improve readability and make future changes easier: 1. Replace static buffers with std::string. 2. When handling errors, prefer reporting the actual error over dumb 'connection status is not success'. 3. Move host/port parsing directly into reverse_connection condition that is its only user, and simplify it to make its purpose (verifying that a valid port is provided) clear. 4. Use llvm::errs() and llvm::outs() instead of fprintf(). Differential Revision: https://reviews.llvm.org/D11196
* Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits" (Max Kazantsev; 2021-10-18; 1 file; -34/+56)
| | | | | | | This reverts commit fa16329ae0721023376f24c7577b9020d438df1a. See comments in discussion. Merged by mistake, not entirely getting what the problem was.
* [NFC] Remove Block-ABI-Apple.txt (Shivam Gupta; 2021-10-18; 1 file; -1/+0)
| | | | This file was rewritten in rst format in clang/docs/Block-ABI-Apple.rst
* [lldb][NFC] clang format change (Lasse Folger; 2021-10-18; 2 files; -7/+7)
| | | | | | | | clang format on some demangling files Reviewed By: teemperor Differential Revision: https://reviews.llvm.org/D111934
* [lldb] Fix SymbolFilePDBTests for a3939e1 (Pavel Labath; 2021-10-18; 1 file; -1/+2)
|
* [clang][modules] Delay creating `IdentifierInfo` for names of explicit modules (Jan Svoboda; 2021-10-18; 5 files; -6/+44)
| | | | | | | | | | | | When using explicit Clang modules, some declarations might unexpectedly become invisible. This is caused by the mechanism that loads PCM files passed via `-fmodule-file=<path>` and creates an `IdentifierInfo` for the module name. The `IdentifierInfo` creation takes place when the `ASTReader` is in a weird state, with modules that are loaded but not yet set up properly. This patch delays the creation of `IdentifierInfo` until the `ASTReader` is done with reading the PCM. Note that the `-fmodule-file=<name>=<path>` form of the argument doesn't suffer from this issue, since it doesn't create `IdentifierInfo` for the module name. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D111543
* [AMDGPU] Add link to bug (Jay Foad; 2021-10-18; 1 file; -0/+1)
|
* Fix signed/unsigned comparison after b5426ced71280 (Jeremy Morse; 2021-10-18; 1 file; -1/+1)
| | | | | | gcc11 warns that this counter causes a signed/unsigned comparison when it's later compared with a SmallVector::difference_type. gcc appears to be correct; clang does not warn one way or the other.
* Remove the verifyAfter mechanism that was replaced by D111397 (Jay Foad; 2021-10-18; 9 files; -50/+38)
| | | | Differential Revision: https://reviews.llvm.org/D111872
* Add new MachineFunction property FailsVerification (Jay Foad; 2021-10-18; 10 files; -1/+37)
| | | | | | | | | | | | | | | | | TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets you skip machine verification after a particular pass. Unfortunately this is used in generic code in TargetPassConfig itself to skip verification after a generic pass, only because some previous target- specific pass damaged the MIR on that specific target. This is bad because problems in one target cause lack of verification for all targets. This patch replaces that mechanism with a new MachineFunction property called "FailsVerification" which can be set by (usually target-specific) passes that are known to introduce problems. Later passes can reset it again if they are known to clean up the previous problems. Differential Revision: https://reviews.llvm.org/D111397
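The shift from a per-pass "skip verification" flag to a property carried by the function can be sketched as below. This is a toy model only: `Property`, `FunctionProperties`, and `shouldVerify` are hypothetical names and do not reproduce the LLVM classes.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical set of per-function properties, loosely in the spirit of the
// property described above.
enum class Property : std::size_t { IsSSA, NoPHIs, FailsVerification, Count };

class FunctionProperties {
  std::bitset<static_cast<std::size_t>(Property::Count)> Bits;

public:
  bool has(Property P) const { return Bits.test(static_cast<std::size_t>(P)); }
  void set(Property P) { Bits.set(static_cast<std::size_t>(P)); }
  void reset(Property P) { Bits.reset(static_cast<std::size_t>(P)); }
};

// The pass manager no longer needs to know which pass damaged the function.
// It only asks the function itself whether verification is expected to pass.
inline bool shouldVerify(const FunctionProperties &Props) {
  return !Props.has(Property::FailsVerification);
}
```

In this model a target-specific pass that knowingly breaks an invariant calls `set(Property::FailsVerification)`, and a later pass that repairs it calls `reset(...)`. Targets that never set the property keep being verified after every pass.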
* [AMDGPU] Add patterns for i8/i16 local atomic load/store (Piotr Sobczak; 2021-10-18; 7 files; -0/+407)
| | | | | | | | | | Add patterns for i8/i16 local atomic load/store. Added tests for new patterns. Copied atomic_[store/load]_local.ll to GlobalISel directory. Differential Revision: https://reviews.llvm.org/D111869
* [AIX][cmake] Set atomics related macros when building with xlclang (Kai Luo; 2021-10-18; 1 file; -2/+2)
| | | | | | | | Set `HAVE_CXX_ATOMICS_WITHOUT_LIB` or `HAVE_LIBATOMIC` when building LLVM with xlclang. With these macros set, libraries like libLLVMSupport are able to know whether it's necessary to add `-latomic` to dependent system libs. If `HAVE_LIBATOMIC` is set, `llvm-config --system-libs` appends `-latomic` to its output. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D111782
* [SelectionDAG] Fix illegal widening of scalable-vector loads (Fraser Cormack; 2021-10-18; 3 files; -1/+34)
| | | | | | | | | | | | | | | | | | | | | | The process of widening simple vector loads attempts to use a load of a wider vector type if the original load is sufficiently aligned to avoid memory faults. However this optimization is only legal when performed on fixed-length vector types. For scalable vector types this is invalid (unless vscale happens to be 1). This patch does increase the likelihood of compiler crashes (from `FindMemType` failing to find a suitable type) but this now better matches how widening non-simple loads, insufficiently-aligned loads, and scalable-vector stores are handled. Patches will be introduced later by which loads and stores can be widened on targets with support for masked or predicated operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111885
* [X86] Prefer VEX encoding in X86 assembler. (Luo, Yuanke; 2021-10-18; 4 files; -18/+16)
| | | | | | | | This patch is to order the AVX instructions ahead of AVX512 instructions in the matching table so that the AVX instructions can be matched first. Thanks Craig and Shengchen for the idea. Differential Revision: https://reviews.llvm.org/D111538
* [lldb] [Utility] Remove Status::WasInterrupted() along with its only use (Michał Górny; 2021-10-18; 3 files; -17/+0)
| | | | | | | | | | | Remove Status::WasInterrupted() that checks whether the underlying error code matches EINTR. ProcessGDBRemote::ConnectToDebugserver() is its only call site, and it does not seem correct there. After all, EINTR is precisely when we want to retry, not stop retrying. Furthermore, it should not really matter since we should be catching EINTR immediately via llvm::sys::RetryAfterSignal() but that's another story. Differential Revision: https://reviews.llvm.org/D111908
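The point that EINTR is a reason to retry rather than to stop can be sketched with a small helper. This is an illustration in the spirit of `llvm::sys::RetryAfterSignal`, not its implementation: `retryOnEINTR` is a hypothetical name, and the failure value is passed explicitly.

```cpp
#include <cerrno>

// Call F repeatedly while it reports failure with errno == EINTR. Any other
// outcome (success, or failure with a different errno) is returned as-is.
template <typename FailT, typename Fn>
inline auto retryOnEINTR(const FailT &Fail, Fn &&F) -> decltype(F()) {
  decltype(F()) Res;
  do {
    errno = 0;  // clear stale values so only this call's errno is inspected
    Res = F();
  } while (Res == Fail && errno == EINTR);
  return Res;
}
```

A typical call site would wrap a system call, for example `retryOnEINTR(-1, [&] { return ::read(FD, Buf, Size); })`, where `FD`, `Buf`, and `Size` are whatever the caller already has.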
* [AArch64][GISel] Add 8/16 bit uaddo lowering tests. (Florian Hahn; 2021-10-18; 1 file; -0/+872)
| | | | Precommit tests for D111888.
* [AMDGPU] Divergence driven selection for fused bitlogic (Stanislav Mekhanoshin; 2021-10-18; 3 files; -11/+16)
| | | | | | | | | | | | | | The change adds divergence predicates for fused logical operations. The problem with selecting a scalar fused op such as S_NOR_B32 is that it does not have a VALU counterpart and will be split in moveToVALU. At the same time it prevents selection of a better opcode on the VALU side (such as V_OR3_B32) which does not have a counterpart on the SALU side. XNOR opcodes are left as is and selected as scalar to take advantage of the SIInstrInfo::lowerScalarXnor() code, which can commute operations to keep one of two opcodes on SALU if possible. See the xnor.ll test for this. Differential Revision: https://reviews.llvm.org/D111907
* Fix bazel build. (Adrian Kuegel; 2021-10-18; 1 file; -0/+2)
| | | | | | | This is a temporary fix, better would be to avoid including llvm/Option/ArgList.h from a Support source file. Differential Revision: https://reviews.llvm.org/D111974
* [lldb] Return StringRef from PluginInterface::GetPluginName (Pavel Labath; 2021-10-18; 205 files; -687/+468)
| | | | | | | | | | | | There is no reason why this function should be returning a ConstString. While modifying these files, I also fixed several instances where GetPluginName and GetPluginNameStatic were returning different strings. I am not changing the return type of GetPluginNameStatic in this patch, as that would necessitate additional changes, and this patch is big enough as it is. Differential Revision: https://reviews.llvm.org/D111877
* Fix cyclic header dependency between Support<->Option due to RISCVISAInfo (Raphael Isemann; 2021-10-18; 2 files; -1/+4)
| | | | This was introduced in D105168 which added RISCVISAInfo.h.
* [Parse] Improve diagnostic and recovery when there is an extra override in the outline method definition. (Haojian Wu; 2021-10-18; 3 files; -0/+34)
| The clang behavior was poor before this patch:
```
void B::foo() override {}
// Before: clang emitted "expected function body after function
// declarator", and skipped all contents until it hits a ";", the
// following function f() is discarded.
// VS
// Now "override is not allowed" with a remove fixit, and following f()
// is retained.
void f();
```
| Differential Revision: https://reviews.llvm.org/D111883
* [AArch64] Fixed a bug on AArch64MIPeepholeOpt (Jingu Kang; 2021-10-18; 1 file; -2/+13)
| | | | | | | Create new virtual register for the definition of new AND instruction and replace old register by the new one to keep SSA form. Differential Revision: https://reviews.llvm.org/D109963
* [MachineSink] Compile time improvement for large testcases which have many kill flags (Bing1 Yu; 2021-10-18; 1 file; -2/+2)
| We did an experiment and observed a dramatic decrease in the compilation time spent on clearing kill flags.
| Before: Number of BasicBlocks: 33357; Number of Instructions: 162067; Number of Cleared Kill Flags: 32869; Time of handling kill flags(ms): 1.607509e+05
| After: Number of BasicBlocks: 33357; Number of Instructions: 162067; Number of Cleared Kill Flags: 32869; Time of handling kill flags: 3.987371e+03
| Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D111688
* [PowerPC] Implement scheduling model for Power10 (Qiu Chaofan; 2021-10-18; 34 files; -677/+3450)
| | | | | | Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D110855
* [JITLink] Add comments, rename types for visitExistingEdges utility. (Lang Hames; 2021-10-17; 1 file; -15/+21)
| | | | | The "Fixers" name was a hangover from an earlier draft of the patch. "Visitors" fits the function name(s).
* [NFC] [LoopPeel] Change the way DT is updated for loop exits (Max Kazantsev; 2021-10-18; 1 file; -56/+34)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When peeling a loop, we assume that the latch has a `br` terminator and that all loop exits are either terminated with an `unreachable` or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support loops with exits that are followed by a block with an `unreachable` or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken. For example, let `Exit1` and `Exit2` be loop exits, each of which unconditionally branches to the same `unreachable` terminated block. Then neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops. With this NFC we just insert edges from cloned exiting blocks to their exits after peeling each iteration (we accumulate the insertion updates and then after peeling apply the updates to DT). This patch was a part of D110922. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D111611 Reviewed By: mkazantsev
* [lldb] Skip target variable test on AS (Jonas Devlieghere; 2021-10-17; 1 file; -0/+2)
|
* [clang] Use llvm::erase_if (NFC) (Kazu Hirata; 2021-10-17; 15 files; -71/+37)
|
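For readers unfamiliar with this kind of cleanup, the shape of the change can be sketched with a local helper. `eraseIf` below is a hypothetical stand-in written against the standard library only; it illustrates the erase-remove idiom that a helper like `llvm::erase_if` wraps, and is not the LLVM implementation.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: remove every element for which P returns true.
// It hides the two-step erase-remove idiom behind a single call.
template <typename Container, typename Pred>
void eraseIf(Container &C, Pred P) {
  C.erase(std::remove_if(C.begin(), C.end(), P), C.end());
}
```

A call site that previously spelled out `V.erase(std::remove_if(V.begin(), V.end(), Pred), V.end())` becomes `eraseIf(V, Pred)`, which is shorter and avoids mistakes such as forgetting the second argument to `erase`.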
* [CostModel][X86] Add mul by positive/negative power-of-2 constants tests (Simon Pilgrim; 2021-10-17; 1 file; -0/+716)
| | | | We have backend optimizations for these, but currently the costmodel doesn't match them
* [fir] Add IfBuilder and utility functions (Valentin Clement; 2021-10-17; 4 files; -0/+197)
| | | | | | | | | | | | | | | In order to reduce the size of D111337, the IfBuilder and the two utility functions genIsNotNull and genIsNull have been extracted into a separate patch with dedicated unittests. This patch is part of the upstreaming effort from the fir-dev branch. Reviewed By: Leporacanthicus Differential Revision: https://reviews.llvm.org/D111796 Co-authored-by: Jean Perier <jperier@nvidia.com> Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
* [CostModel][X86] Add div/rem by negative power-of-2 constants (Simon Pilgrim; 2021-10-17; 2 files; -0/+1237)
| | | | We have backend optimizations for these (like we do for power-of-2 divisions), but currently the costmodel doesn't match them
* [X86][SLM] Fix BitTest+Set uops + port usage (Simon Pilgrim; 2021-10-17; 2 files; -118/+118)
| | | | Both ports are required for BitTest ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures and what Intel AoM / Agner reports as well.
* [X86][SLM] Fix uops for PCMPISTR/PCMPISTR instructions (Simon Pilgrim; 2021-10-17; 2 files; -12/+12)
| | | | Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
* [X86][SLM] Fix uops for PCLMULQDQ (Simon Pilgrim; 2021-10-17; 2 files; -3/+3)
| | | | Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
* [X86][SLM] +1uop for PSHUFBrm xmm (Simon Pilgrim; 2021-10-17; 2 files; -3/+3)
| | | | Extra 1uop for folded pshufb ops, based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
* [ConstantRange] Add fast signed multiply (Nikita Popov; 2021-10-17; 4 files; -1/+39)
| | | | | | | | | | | | | The multiply() implementation is very slow -- it performs six multiplications in double the bitwidth, which means that it will typically work on allocated APInts and bypass fast-path implementations. Add an additional implementation that doesn't try to produce anything better than a full range if overflow is possible. At least for the BasicAA use-case, we really don't care about more precise modeling of overflow behavior. The current use of multiply() is fine while the implementation is limited to a single index, but extending it to the multiple-index case makes the compile-time impact untenable.
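The "full range on possible overflow" strategy can be sketched on a toy model. This is not the `ConstantRange` code: `Range8`, `FullRange`, `fitsInI8`, and `fastSignedMultiply` are hypothetical names, and ranges are simplified to inclusive, non-wrapping 8-bit signed intervals computed in plain `int`.

```cpp
#include <algorithm>

// Hypothetical inclusive interval [Lo, Hi] of 8-bit signed values. Unlike a
// real constant range, it never wraps around.
struct Range8 {
  int Lo;
  int Hi;
};

constexpr Range8 FullRange{-128, 127};

inline bool fitsInI8(int V) { return V >= -128 && V <= 127; }

// Multiply the four corner pairs once, at the native width. If any corner
// overflows 8 bits, give up and return the full range instead of modeling
// the wrapped result precisely. Otherwise the extremes of a product over a
// box lie at its corners, so min/max of the four products is exact.
inline Range8 fastSignedMultiply(Range8 A, Range8 B) {
  const int Corners[4] = {A.Lo * B.Lo, A.Lo * B.Hi, A.Hi * B.Lo, A.Hi * B.Hi};
  for (int V : Corners)
    if (!fitsInI8(V))
      return FullRange;
  return {*std::min_element(Corners, Corners + 4),
          *std::max_element(Corners, Corners + 4)};
}
```

The trade-off is the one the message describes: precision is lost whenever overflow is possible, in exchange for avoiding multiplications at double the bit width.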
* [X86][Costmodel] Load/store i64 Stride=4 VF=16 interleaving costs (Roman Lebedev; 2021-10-17; 5 files; -4/+6)
| | | | | | | | | | | | | | | | | | | | | | | | A few more tuples are being queried after D111546. It might be good to model them; they all require a lot of manual assembly surgery. The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/9bnKrefcG - for Intels `Block RThroughput: =40.0`; for Ryzens, `Block RThroughput: =16.0`. So we could pick a cost of `40`. For store we have: https://godbolt.org/z/5s3s14dEY - for Intels `Block RThroughput: =40.0`; for Ryzens, `Block RThroughput: =16.0`. So we could pick a cost of `40`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111945
* [X86][Costmodel] Load/store i64 Stride=2 VF=32 interleaving costs (Roman Lebedev; 2021-10-17; 5 files; -4/+6)
| | | | | | | | | | | | | | | | | | | | | | | | A few more tuples are being queried after D111546. It might be good to model them; they all require a lot of manual assembly surgery. The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/MTaKboejM - for Intels `Block RThroughput: =32.0`; for Ryzens, `Block RThroughput: <=16.0`. So we could pick a cost of `32`. For store we have: https://godbolt.org/z/v7xPj3Wd4 - for Intels `Block RThroughput: =32.0`; for Ryzens, `Block RThroughput: <=32.0`. So we could pick a cost of `32`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111944
* [X86][Costmodel] Load/store i32 Stride=4 VF=32 interleaving costs (Roman Lebedev; 2021-10-17; 8 files; -7/+9)
| | | | | | | | | | | | | | | | | | | | | | | | A few more tuples are being queried after D111546. It might be good to model them; they all require a lot of manual assembly surgery. The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/11rcvdreP - for Intels `Block RThroughput: <=68.0`; for Ryzens, `Block RThroughput: <=48.0`. So we could pick a cost of `68`. For store we have: https://godbolt.org/z/6aM11fWcP - for Intels `Block RThroughput: <=64.0`; for Ryzens, `Block RThroughput: <=32.0`. So we could pick a cost of `64`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111943
* [X86][Costmodel] Load/store i32 Stride=3 VF=32 interleaving costs (Roman Lebedev; 2021-10-17; 7 files; -6/+8)
| | | | | | | | | | | | | | | | | | | | | | | | A few more tuples are being queried after D111546. It might be good to model them; they all require a lot of manual assembly surgery. The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/s5b6E6jsP - for Intels `Block RThroughput: <=32.0`; for Ryzens, `Block RThroughput: <=24.0`. So we could pick a cost of `32`. For store we have: https://godbolt.org/z/efh99d93b - for Intels `Block RThroughput: <=48.0`; for Ryzens, `Block RThroughput: <=32.0`. So we could pick a cost of `48`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111942
* [X86][Costmodel] Load/store i16 Stride=6 VF=32 interleaving costs (Roman Lebedev; 2021-10-17; 3 files; -2/+4)
| | | | | | | | | | | | | | | | | | | | | | | | A few more tuples are being queried after D111546. It might be good to model them; they all require a lot of manual assembly surgery. The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/YTeT9M7fW - for Intels `Block RThroughput: <=212.0`; for Ryzens, `Block RThroughput: <=64.0`. So we could pick a cost of `212`. For store we have: https://godbolt.org/z/vc954KEGP - for Intels `Block RThroughput: <=90.0`; for Ryzens, `Block RThroughput: <=24.0`. So we could pick a cost of `90`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111940
* This patch supports the following checks for THREADPRIVATE Directive: (PeixinQiao; 2021-10-17; 6 files; -2/+445)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ``` [5.1] 2.21.2 THREADPRIVATE Directive A variable that appears in a threadprivate directive must be declared in the scope of a module or have the SAVE attribute, either explicitly or implicitly. A variable that appears in a threadprivate directive must not be an element of a common block or appear in an EQUIVALENCE statement. ``` This patch supports the following checks for DECLARE TARGET Directive: ``` [5.1] 2.14.7 Declare Target Directive A variable that is part of another variable (as an array, structure element or type parameter inquiry) cannot appear in a declare target directive. A variable that appears in a declare target directive must be declared in the scope of a module or have the SAVE attribute, either explicitly or implicitly. A variable that appears in a declare target directive must not be an element of a common block or appear in an EQUIVALENCE statement. ``` As Fortran 2018 standard [8.5.16] states, a variable, common block, or procedure pointer declared in the scoping unit of a main program, module, or submodule implicitly has the SAVE attribute, which may be confirmed by explicit specification. Reviewed By: kiranchandramohan Differential Revision: https://reviews.llvm.org/D109864
* Bump the value of __STDC_VERSION__ in -std=c2x mode (Aaron Ballman; 2021-10-17; 3 files; -1/+12)
| | | | | | | Previously, we reported the same value as for C17; now we report 202000L, which is the same value currently used by GCC. Once C23 ships, this value will be bumped to the correct date.
* [InstCombine] Add some extra tests for truncated saturates. NFC (David Green; 2021-10-17; 1 file; -0/+585)
|