Commit message | Author | Age | Files | Lines
* [AMDGPU] Save VGPR of whole wave when spilling (dev-main-update) | Sebastian Neubauer | 2021-04-12 | 14 | -611/+1587
  Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot determine whether a VGPR is used in other lanes, so we need to save all lanes of the VGPR. We even need to save the VGPR if it is marked as dead.

  The generated code depends on two things:
  - Can we scavenge an SGPR to save EXEC?
  - Can we scavenge a VGPR?

  If we can scavenge an SGPR, we
  - save EXEC into the SGPR
  - set the needed lane mask
  - save the temporary VGPR
  - write the spilled SGPR into VGPR lanes
  - save the VGPR again to the target stack slot
  - restore the VGPR
  - restore EXEC

  If we were not able to scavenge an SGPR, we do the same operations, but every time the temporary VGPR is written to memory, we
  - write VGPR to memory
  - flip exec (s_not exec, exec)
  - write VGPR again (previously inactive lanes)

  Surprisingly often, we are able to scavenge an SGPR, even though we are at the brink of running out of SGPRs. Scavenging a VGPR does not have a great effect (it saves three instructions if no SGPR was scavenged), but we need to know whether the VGPR we use is live beforehand, otherwise the machine verifier complains.

  Differential Revision: https://reviews.llvm.org/D96336
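The "write, flip exec, write again" sequence can be modelled on the host to see why it covers every lane. This is only a minimal sketch, not the code the patch generates: the 64-lane wave size, the `Vgpr` type and the helper names are assumptions made for illustration, and the final flip that puts EXEC back is also an assumption of this model.

```cpp
#include <array>
#include <cstdint>

constexpr int kLanes = 64; // assumed wave size for this model

using Vgpr = std::array<uint32_t, kLanes>;

// Store only the lanes whose bit is set in exec, like a masked scratch store.
void storeActiveLanes(const Vgpr &reg, uint64_t exec, Vgpr &memory) {
  for (int lane = 0; lane < kLanes; ++lane)
    if ((exec >> lane) & 1)
      memory[lane] = reg[lane];
}

// Save every lane of reg without a spare SGPR to hold EXEC:
// store, invert EXEC, store again, then invert EXEC back.
void saveWholeWave(const Vgpr &reg, uint64_t &exec, Vgpr &memory) {
  storeActiveLanes(reg, exec, memory); // currently active lanes
  exec = ~exec;                        // models s_not exec, exec
  storeActiveLanes(reg, exec, memory); // previously inactive lanes
  exec = ~exec;                        // assumption: flip back to the original mask
}
```

Because a mask and its complement together cover all lanes, the two stores write the whole register regardless of which lanes were active on entry.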
* [OpenCL] Accept .rgba in OpenCL 3.0 | Sven van Haastregt | 2021-04-12 | 3 | -9/+17
  The .rgba vector component accessors are supported in OpenCL C 3.0. Previously, the diagnostic would check `OpenCLVersion` for version 2.2 (value 220) and report that those accessors are an OpenCL 2.2 feature. However, there is no "OpenCL C version 2.2", so change the check and diagnostic text to 3.0 only.

  A spurious `OpenCLVersion` argument was passed into the diagnostic; remove that.

  Differential Revision: https://reviews.llvm.org/D99969
* [AArch64] Adds memory operands for indexed loads. | Stelios Ioannou | 2021-04-12 | 2 | -70/+69
  This patch adds the memory operands for indexed loads so that certain optimizations can take place.

  Differential Revision: https://reviews.llvm.org/D100215/
  Change-Id: I539fcf046ca4ad1e7df1d893f57d751419d8364d
* [DebugInfo] Fix the mismatching between C++ language tags and Dwarf versions. | Esme-Yi | 2021-04-12 | 3 | -3/+16
  Summary: The tags DW_LANG_C_plus_plus_14 and DW_LANG_C_plus_plus_11, introduced in Dwarf-5, are unexpected in previous versions. Fixing the mismatch has no drawbacks for other debuggers, but helps dbx.

  Reviewed By: aprantl, shchenz
  Differential Revision: https://reviews.llvm.org/D99250
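The idea of matching the language tag to the DWARF version can be sketched as follows. The numeric codes are the ones assigned by the DWARF specification; the helper `pickCxxLangTag` and its parameters are hypothetical, and the exact condition used by the patch is not reproduced here.

```cpp
#include <cstdint>

// Language codes from the DWARF specification.
constexpr uint16_t DW_LANG_C_plus_plus = 0x0004;
constexpr uint16_t DW_LANG_C_plus_plus_11 = 0x001a; // introduced in DWARF 5
constexpr uint16_t DW_LANG_C_plus_plus_14 = 0x0021; // introduced in DWARF 5

// Hypothetical helper: choose a C++ language tag that the requested DWARF
// version actually defines. cxxStandard is 98, 11, 14, ...
inline uint16_t pickCxxLangTag(unsigned cxxStandard, unsigned dwarfVersion) {
  if (dwarfVersion >= 5 && cxxStandard != 98) {
    if (cxxStandard >= 14)
      return DW_LANG_C_plus_plus_14;
    if (cxxStandard >= 11)
      return DW_LANG_C_plus_plus_11;
  }
  // Older DWARF versions only know the generic C++ tag.
  return DW_LANG_C_plus_plus;
}
```

With this shape, a C++14 translation unit built for DWARF 4 is described with the generic tag rather than one a DWARF 4 consumer does not expect.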
* [clang][AST] Handle overload callee type in CallExpr::getCallReturnType. | Balázs Kéri | 2021-04-12 | 2 | -0/+74
  The function did not handle every case, which in some cases caused an assertion failure. After the fix, the function returns DependentTy if the exact return type cannot be determined. It seems that clang itself does not call the function in the affected cases, but some checker or other code may call it.

  Reviewed By: hokein
  Differential Revision: https://reviews.llvm.org/D95244
* [NFC][Debug] Fix unnecessary deep-copy for vector to save compiling time | Zhang Qing Shan | 2021-04-12 | 1 | -5/+5
  We saw a big compile-time impact after enabling the debug entry value feature for the X86 platform (D73534). Compile time goes from 900s to 1600s with our testcase. It is caused by repeatedly allocating and freeing memory.

  'using FwdRegWorklist = MapVector<unsigned, SmallVector<FwdRegParamInfo, 2>>;'

  The value type of this map is a vector, and we missed the reference when accessing the element, so the vector was copied each time. The same happens for `auto CalleesMap = MF->getCallSitesInfo();`, which is a DenseMap.

  Reviewed by: djtodoro, flychen50
  Differential Revision: https://reviews.llvm.org/D100162
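The cost of a missing `&` can be shown with standard containers. The sketch below is an illustration only: it uses `std::map` and `std::vector` as stand-ins for LLVM's `MapVector` and `SmallVector`, and `Tracked`, `g_copies` and the two functions are hypothetical names.

```cpp
#include <map>
#include <vector>

int g_copies = 0; // number of element copies made so far

// Element type that counts how often it is copied.
struct Tracked {
  int value = 0;
  Tracked() = default;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked &other) : value(other.value) { ++g_copies; }
  Tracked &operator=(const Tracked &other) {
    value = other.value;
    ++g_copies;
    return *this;
  }
};

using Worklist = std::map<unsigned, std::vector<Tracked>>;

// `auto entry` copies each map entry, which deep-copies the vector inside it.
int sumByValue(const Worklist &work) {
  int sum = 0;
  for (auto entry : work)
    for (const Tracked &t : entry.second)
      sum += t.value;
  return sum;
}

// `const auto &entry` binds to the stored entry and copies nothing.
int sumByReference(const Worklist &work) {
  int sum = 0;
  for (const auto &entry : work)
    for (const Tracked &t : entry.second)
      sum += t.value;
  return sum;
}
```

Both functions return the same sum; only the by-value loop allocates and frees a vector per entry, which is the kind of churn the commit describes.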
* [libtooling][clang-tidy] Fix compiler warnings in testcase [NFC] | Mikael Holmen | 2021-04-12 | 1 | -4/+4
  Without the fix we get:

  06:31:09 In file included from ../../clang-tools-extra/unittests/clang-tidy/ClangTidyDiagnosticConsumerTest.cpp:3:
  06:31:09 ../utils/unittest/googletest/include/gtest/gtest.h:1392:11: error: comparison of integers of different signs: 'const int' and 'const unsigned int' [-Werror,-Wsign-compare]
  06:31:09 if (lhs == rhs) {
  06:31:09 ~~~ ^ ~~~
  06:31:09 ../utils/unittest/googletest/include/gtest/gtest.h:1421:12: note: in instantiation of function template specialization 'testing::internal::CmpHelperEQ<int, unsigned int>' requested here
  06:31:09 return CmpHelperEQ(lhs_expression, rhs_expression, lhs, rhs);
  06:31:09 ^
  06:31:09 ../../clang-tools-extra/unittests/clang-tidy/ClangTidyDiagnosticConsumerTest.cpp:60:3: note: in instantiation of function template specialization 'testing::internal::EqHelper<false>::Compare<int, unsigned int>' requested here
  06:31:09 EXPECT_EQ(4, Errors[0].Message.FileOffset);
  06:31:09 ^
  06:31:09 ../utils/unittest/googletest/include/gtest/gtest.h:1924:63: note: expanded from macro 'EXPECT_EQ'
  06:31:09 EqHelper<GTEST_IS_NULL_LITERAL_(val1)>::Compare, \
  06:31:09 ^
  06:31:09 ../utils/unittest/googletest/include/gtest/gtest.h:1392:11: error: comparison of integers of different signs: 'const int' and 'const unsigned long' [-Werror,-Wsign-compare]
  06:31:09 if (lhs == rhs) {
  06:31:09 ~~~ ^ ~~~
  06:31:09 ../utils/unittest/googletest/include/gtest/gtest.h:1421:12: note: in instantiation of function template specialization 'testing::internal::CmpHelperEQ<int, unsigned long>' requested here
  06:31:09 return CmpHelperEQ(lhs_expression, rhs_expression, lhs, rhs);
  06:31:09 ^
  06:31:09 ../../clang-tools-extra/unittests/clang-tidy/ClangTidyDiagnosticConsumerTest.cpp:64:3: note: in instantiation of function template specialization 'testing::internal::EqHelper<false>::Compare<int, unsigned long>' requested here
  06:31:09 EXPECT_EQ(1, Errors[0].Message.Ranges.size());
  06:31:09 ^
  06:31:09 ../utils/unittest/googletest/include/gtest/gtest.h:1924:63: note: expanded from macro 'EXPECT_EQ'
  06:31:09 EqHelper<GTEST_IS_NULL_LITERAL_(val1)>::Compare, \
  06:31:09 ^
  06:31:09 2 errors generated.
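The usual way to silence this class of warning is to give the expected literal the same signedness as the value under test. The sketch below does not use googletest and is not the patch itself: `sameSignEqual` is a hypothetical stand-in for the comparison helper, and the two check functions only mirror the shapes of the failing `EXPECT_EQ` calls in the log.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a gtest-style comparison helper: it rejects
// operands of mixed signedness at compile time instead of warning.
template <typename L, typename R>
bool sameSignEqual(const L &lhs, const R &rhs) {
  static_assert(std::is_signed<L>::value == std::is_signed<R>::value,
                "comparison of integers of different signs");
  return lhs == rhs;
}

// sameSignEqual(4, fileOffset) would not compile: int versus unsigned.
bool checkOffset(unsigned fileOffset) {
  return sameSignEqual(4u, fileOffset); // unsigned literal matches the field
}

// Container sizes are std::size_t, so the expected value is spelled that way.
bool checkRangeCount(std::size_t rangeCount) {
  return sameSignEqual(std::size_t{1}, rangeCount);
}
```

In a googletest test the same effect is typically obtained by writing the expected value as `4u` or `1u`, assuming the compared field is unsigned as the diagnostics indicate.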
* [NFC] [Clang]: fix spelling mistake in assert message | Jim Lin | 2021-04-12 | 1 | -1/+1
  Reviewed By: Jim
  Differential Revision: https://reviews.llvm.org/D71541
* fix typo in a CMake SANITIZER_CAN_USE_CXXABI variable initial definition | Jim Lin | 2021-04-12 | 1 | -1/+1
  The current variable name isn't used anywhere else, which indicates it's a typo. Let's fix it before someone copy+pastes it somewhere else.

  Reviewed By: Jim
  Differential Revision: https://reviews.llvm.org/D39157
* [X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation | Bing1 Yu | 2021-04-12 | 2 | -45/+333
  Reviewed By: pengfei
  Differential Revision: https://reviews.llvm.org/D99244
* [NARY] Don't optimize min/max if there are side uses | Evgeniy Brevnov | 2021-04-12 | 2 | -1/+39
  Say we have
    %1=min(%a,%b)
    %2=min(%b,%c)
    %3=min(%2,%a)
  The optimization will try to reassociate the last one so that we can rewrite it to %3=min(%1, %c) and remove %2. But if %2 has other uses outside of %3, then we can't remove %2 and end up with:
    %1=min(%a,%b)
    %2=min(%b,%c)
    %3=min(%1, %c)
  This is not harmful by itself, but it is not profitable and changes the IR for no good reason. What is bad is that it triggers the next iteration, which finds that the optimization is applicable to %2 and %3 and generates:
    %1=min(%a,%b)
    %2=min(%b,%c)
    %3=min(%1,%c)
    %4=min(%2,%a)
  and so on...

  The solution is to prevent the optimization in the first place if the intermediate result (%2) has side uses and is known not to be removed.

  Reviewed By: mkazantsev
  Differential Revision: https://reviews.llvm.org/D100170
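The reasoning above has two parts: the rewrite is value-preserving, and it only pays off when the inner min dies. A small sketch of both, with hypothetical function names and a plain use count standing in for whatever check the pass really performs:

```cpp
#include <algorithm>

// %3 = min(%2, %a) with %2 = min(%b, %c)
int originalForm(int a, int b, int c) { return std::min(std::min(b, c), a); }

// %3 = min(%1, %c) with %1 = min(%a, %b)
int reassociatedForm(int a, int b, int c) { return std::min(std::min(a, b), c); }

// Hypothetical profitability guard: the rewrite removes %2 only if %3 is
// its sole user. With side uses, %2 stays alive and nothing is gained.
bool worthReassociating(unsigned usesOfInnerMin) {
  return usesOfInnerMin == 1;
}
```

Since min is associative and commutative, the two forms always agree; the guard is what keeps the pass from rewriting back and forth when %2 cannot be deleted.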
* [X86] Remove FeatureCLWB from FeaturesICLClient | Freddy Ye | 2021-04-12 | 4 | -8/+9
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D100279
* [lld-macho][nfc] Convert tabs to spaces | Jez Ng | 2021-04-11 | 15 | -71/+71
* [Debug-Info] make fortran CHARACTER(1) type as valid unsigned type | Chen Zheng | 2021-04-11 | 2 | -0/+54
  This resolves https://bugs.llvm.org/show_bug.cgi?id=49872

  Reviewed By: aprantl
  Differential Revision: https://reviews.llvm.org/D100015
* [Clang][Coroutine][DebugInfo] In C++ coroutine, clang will emit different debug info variables for parameters and move-parameters. | yifeng.dongyifeng | 2021-04-12 | 7 | -13/+168
  The first one is the real parameter of the coroutine function; the other one is just for copying parameters to the coroutine frame.

  Consider the following C++ code:

  ```
  struct coro {
    ...
  };

  coro foo(struct test & t) {
    ...
    co_await suspend_always();
    ...
    co_await suspend_always();
    ...
    co_await suspend_always();
  }

  int main(int argc, char *argv[]) {
    auto c = foo(...);
    c.handle.resume();
    ...
  }
  ```

  Function foo is a standard coroutine function with only one parameter, named t (ignoring `this` for now). When we compile this function with LLVM, we get the following IR:

  ```
  !2921 = distinct !DISubprogram(name: "foo", linkageName: "_ZN6Object3fooE4test", scope: !2211, file: !45, line: 48, type: !2329, scopeLine: 48, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !44, declaration: !2328, retainedNodes: !2922)
  !2924 = !DILocalVariable(name: "t", arg: 2, scope: !2921, file: !45, line: 48, type: !838)
  ...
  !2926 = !DILocalVariable(name: "t", scope: !2921, type: !838, flags: DIFlagArtificial)
  ```

  There are two identical DIVariables named t in the same DWARF scope for foo.resume. When we use llvm-dwarfdump to dump the DWARF info of this ELF, we get the following output:

  ```
  0x00006684: DW_TAG_subprogram
                DW_AT_low_pc (0x00000000004013a0)
                DW_AT_high_pc (0x00000000004013a8)
                DW_AT_frame_base (DW_OP_reg7 RSP)
                DW_AT_object_pointer (0x0000669c)
                DW_AT_GNU_all_call_sites (true)
                DW_AT_specification (0x00005b5c "_ZN6Object3fooE4test")

  0x000066a5:   DW_TAG_formal_parameter
                  DW_AT_name ("t")
                  DW_AT_decl_file ("/disk1/yifeng.dongyifeng/my_code/llvm/build/bin/coro-debug-1.cpp")
                  DW_AT_decl_line (48)
                  DW_AT_type (0x00004146 "test")

  0x000066ba:   DW_TAG_variable
                  DW_AT_name ("t")
                  DW_AT_type (0x00004146 "test")
                  DW_AT_artificial (true)
  ```

  The ELF also has two 't' in the same scope. Unfortunately, this can confuse the debugger, which then fails to print parameters at O0 or above.

  This patch makes coroutine parameters and move parameters use the same DIVar, to fix the problems described above.

  Test Plan: check-clang

  Reviewed By: aprantl, jmorse
  Differential Revision: https://reviews.llvm.org/D97533
* [PowerPC] Lower f128 SETCC/SELECT_CC as libcall if p9vector disabled | Qiu Chaofan | 2021-04-12 | 4 | -318/+313
  XSCMPUQP is not available for pre-P9 subtargets. This patch lowers f128 SETCC/SELECT_CC into libcalls for correct behavior on power7/power8.

  Reviewed By: steven.zhang
  Differential Revision: https://reviews.llvm.org/D92083
* [RISCV][Clang] Add some RVV Permutation intrinsic functions. | Zakk Chen | 2021-04-11 | 15 | -0/+22700
  Support the following instructions.
  1. Vector Slide Instructions
  2. Vector Register Gather Instructions
  3. Vector Compress Instruction

  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D100127
* [RISCV][Clang] Add all RVV Mask intrinsic functions. | Zakk Chen | 2021-04-11 | 31 | -10/+5871
  1. Redefine the vpopc and vfirst IR intrinsics so that they work with the clang tablegen generator, which always appends a type for vl in IntrinsicType of clang codegen.
  2. Remove the `c` type transformer and add `u` and `l` for the unsigned long and long types.

  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D100120
* [RISCV][Clang] Add more RVV load/store intrinsic functions. | Zakk Chen | 2021-04-11 | 12 | -265/+33246
  Support the following instructions.
  1. Mask load and store
  2. Vector Strided Instructions
  3. Vector Indexed Store Instructions

  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D99965
* [RISCV][Clang] Add all RVV Reduction intrinsic functions. | Zakk Chen | 2021-04-11 | 24 | -0/+21760
  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D99964
* [RISCV][Clang] Add RVV merge intrinsic functions. | Zakk Chen | 2021-04-11 | 3 | -3/+2943
  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D99963
* [RISCV][Clang] Add RVV Type-Convert intrinsic functions. | Zakk Chen | 2021-04-11 | 8 | -4/+8515
  Fix the extension macro condition.

  Support the following instructions:
  1. Single-Width Floating-Point/Integer Type-Convert Instructions
  2. Widening Floating-Point/Integer Type-Convert Instructions
  3. Narrowing Floating-Point/Integer Type-Convert Instructions

  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D99742
* [RISCV][Clang] Add some RVV Floating-Point intrinsic functions. | Zakk Chen | 2021-04-11 | 11 | -5/+2493
  Support vfclass, vfmerge, vfrec7, vfrsqrt7, vfsqrt instructions.

  Reviewed By: craig.topper

  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Differential Revision: https://reviews.llvm.org/D99741
* [RISCV][Clang] Add more RVV Floating-Point intrinsic functions. | Zakk Chen | 2021-04-11 | 43 | -31/+19034
  Support the following instructions.
  1. Vector Widening Floating-Point Add/Subtract Instructions
  2. Vector Widening Floating-Point Multiply
  3. Vector Single-Width Floating-Point Fused Multiply-Add Instructions
  4. Vector Widening Floating-Point Fused Multiply-Add Instructions
  5. Vector Floating-Point Compare Instructions

  Reviewed By: craig.topper, HsiangKai

  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Differential Revision: https://reviews.llvm.org/D99669
* [RISCV][Clang] Add some RVV Floating-Point intrinsic functions. | Zakk Chen | 2021-04-11 | 17 | -0/+9926
  Support the following instructions, which share the same class.
  1. Vector Single-Width Floating-Point Subtract Instructions
  2. Vector Single-Width Floating-Point Multiply/Divide Instructions
  3. Vector Floating-Point MIN/MAX Instructions
  4. Vector Floating-Point Sign-Injection Instructions

  Reviewed By: craig.topper

  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Differential Revision: https://reviews.llvm.org/D99668
* [RISCV][Clang] Add RVV Widening Integer Add/Subtract intrinsic functions. | Zakk Chen | 2021-04-11 | 5 | -1/+14170
  Reviewed By: craig.topper

  Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
  Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

  Differential Revision: https://reviews.llvm.org/D99526
* [RISCV][NFC] Remove unneeded explicit XLenVT type on codegen patterns | Jim Lin | 2021-04-12 | 1 | -106/+106
  The customized SDNodes already specify the explicit XLenVT type.

  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D100190
* [RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned. | Craig Topper | 2021-04-11 | 1 | -3/+3
  According to the 0.10 spec, VLEN is at least 128 bits and is a power of 2.
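The alignment claim follows from simple arithmetic: VLENB is VLEN / 8, so a power-of-two VLEN of at least 128 bits gives a power-of-two byte count of at least 16, whose low four bits are zero. A minimal sketch (the function name is hypothetical and the input is assumed to be a power of two of at least 8):

```cpp
#include <cstdint>

// Count the low bits that are guaranteed zero in VLENB = VLEN / 8.
// Assumes vlenBits is a power of two and at least 8.
unsigned knownZeroLowBits(uint64_t vlenBits) {
  uint64_t vlenBytes = vlenBits / 8;
  unsigned zeros = 0;
  while (vlenBytes != 0 && (vlenBytes & 1) == 0) {
    vlenBytes >>= 1;
    ++zeros;
  }
  return zeros;
}
```

For VLEN = 128 this yields 4, and every larger legal VLEN yields more, which is why known-bits analysis may treat the value as a multiple of 16.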
* [RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64. | Craig Topper | 2021-04-11 | 6 | -113/+113
  We don't need the sign extending behavior here and SLLI/SRLI are able to compress to C.SLLI/C.SRLI.
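A shift pair can replace the mask because shifting left by 48 discards the upper 48 bits of a 64-bit register, and the logical shift right brings the surviving 16 bits back down. The sketch below checks that identity on the host; the function names are hypothetical and the shift amounts are an assumption about how the selected SLLI/SRLI pair is formed.

```cpp
#include <cstdint>

// Reference form: (srl (and X, 0xffff), C) on a 64-bit value.
uint64_t maskThenShift(uint64_t x, unsigned c) { return (x & 0xffff) >> c; }

// Shift-pair form: SLLI by 48, then SRLI by 48 + C. Valid for C in [0, 15].
uint64_t shiftPair(uint64_t x, unsigned c) { return (x << 48) >> (48 + c); }
```

Neither form depends on bits above the low 16, so no sign extension of a 32-bit intermediate is involved.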
* [NFCI][SimplifyCFG] PerformValueComparisonIntoPredecessorFolding(): improve Dominator Tree updating | Roman Lebedev | 2021-04-11 | 1 | -2/+7
  Same as with previous patches.
* [NFCI][SimplifyCFG] mergeEmptyReturnBlocks(): improve Dominator Tree updating | Roman Lebedev | 2021-04-11 | 1 | -3/+7
  Same as with previous patches.
* [NFCI][Local] MergeBasicBlockIntoOnlyPred(): improve Dominator Tree updating | Roman Lebedev | 2021-04-11 | 1 | -5/+8
  Same as with TryToSimplifyUncondBranchFromEmptyBlock()/MergeBlockIntoPredecessor() patch.
* [NFCI][BasicBlockUtils] MergeBlockIntoPredecessor(): improve Dominator Tree updating | Roman Lebedev | 2021-04-11 | 1 | -8/+10
  Same as with TryToSimplifyUncondBranchFromEmptyBlock() patch.
* [NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): improve Dominator Tree updating | Roman Lebedev | 2021-04-11 | 1 | -7/+8
  First, we don't need vector-ness for the predecessor lists.
  Secondly, like elsewhere, do insertions before deletions.
  Lastly, the check that we actually need to insert an edge, that it doesn't exist already, is backwards. Instead of looking at successors of every single 'PredOfBB', just always look at predecessors of the 'Succ'. The result is always the same, but we avoid *really* inefficient code.
* [NFCI][DomTreeUpdater] applyUpdates(): reserve space for updates first | Roman Lebedev | 2021-04-11 | 1 | -0/+1
  While, indeed, we may end up pushing fewer updates than we'd reserve space for, self-dominating updates aren't frequent enough for that to matter. But this should matter for normal updates.
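The trade-off described here is the ordinary reserve-then-filter pattern. The sketch below uses `std::vector` and a pair of integers as a stand-in for dominator tree updates; `Update` and `queueUpdates` are hypothetical names, not the DomTreeUpdater API.

```cpp
#include <utility>
#include <vector>

using Update = std::pair<int, int>; // (from block id, to block id)

// Append updates to the pending list, skipping self-edges.
void queueUpdates(std::vector<Update> &pending,
                  const std::vector<Update> &updates) {
  // Reserve for the worst case up front: at most updates.size() entries are
  // added, so the loop below never reallocates.
  pending.reserve(pending.size() + updates.size());
  for (const Update &u : updates) {
    if (u.first == u.second)
      continue; // self-dominating update, nothing to record
    pending.push_back(u);
  }
}
```

Over-reserving by the few skipped entries costs a little memory, while avoiding repeated growth helps in the common case where nearly every update is kept.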
* [LoopUnroll] Add AArch64 test case with large vector ops. | Florian Hahn | 2021-04-11 | 1 | -0/+92
  Add test case to illustrate over-eager unrolling on AArch64, due to the cost-model not estimating the size of vector loads/stores accurately.
* [VectorCombine] Add tests for load/extract scalarization. | Florian Hahn | 2021-04-11 | 1 | -0/+307
  Add tests where scalarizing a vector load + extract is profitable.
* [X86][AVX512] Fold not(kmov(x)) -> kmov(not(x)) and not(widen_subvector(x)) -> widen_subvector(not(x)) | Simon Pilgrim | 2021-04-11 | 3 | -50/+48
  Improve AVX512 mask inversion; rG38c799bce801 exposed some missing opportunities to move scalar not() back onto the boolvector types for folding with setcc etc.
* [WebAssembly] Update v128.any_true | Thomas Lively | 2021-04-11 | 5 | -40/+48
  In the final SIMD spec, there is only a single v128.any_true instruction, rather than one for each lane interpretation because the semantics do not depend on the lane interpretation.

  Differential Revision: https://reviews.llvm.org/D100241
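The lane independence is easy to check on the host: "some lane is non-zero" is the same question as "some bit is non-zero", whatever the lane width. The sketch below models a 128-bit vector as 16 bytes; `V128` and `anyTrueByLanes` are hypothetical names for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V128 = std::array<uint8_t, 16>; // 128 bits of raw storage

// Report whether any lane of the given width is non-zero.
template <typename Lane>
bool anyTrueByLanes(const V128 &v) {
  constexpr std::size_t laneCount = 16 / sizeof(Lane);
  for (std::size_t i = 0; i < laneCount; ++i) {
    Lane lane;
    std::memcpy(&lane, v.data() + i * sizeof(Lane), sizeof(Lane));
    if (lane != 0)
      return true;
  }
  return false;
}
```

Instantiating the template with `uint8_t`, `uint16_t`, `uint32_t` or `uint64_t` gives the same answer for any input, which is why one instruction suffices.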
* [X86] combineXor - Pull out repeated getOperand() calls. NFCI. | Simon Pilgrim | 2021-04-11 | 1 | -11/+12
* [X86] Fold cmpeq/ne(and(X,Y),Y) --> cmpeq/ne(and(~X,Y),0) | Simon Pilgrim | 2021-04-11 | 5 | -65/+84
  Followup to D100177: handle a similar (De Morgan inverse style) case from PR47797 as well.

  The AVX512 test cases could be further improved if we folded not(iX bitcast(vXi1)) -> (iX bitcast(not(vXi1)))

  Alive2: https://alive2.llvm.org/ce/z/AnA_-W
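Both sides of the fold ask the same question, namely whether every bit set in Y is also set in X. A small host-side sketch of the scalar identity (function names are hypothetical):

```cpp
#include <cstdint>

// cmpeq(and(X, Y), Y): all bits of Y survive the mask with X.
bool originalCompare(uint32_t x, uint32_t y) { return (x & y) == y; }

// cmpeq(and(~X, Y), 0): no bit of Y falls where X is clear.
bool foldedCompare(uint32_t x, uint32_t y) { return (~x & y) == 0; }
```

The inequality variant follows by negating both sides.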
* [RISCV] Drop earlyclobber constraint from vwadd(u).wx, vwsub(u).wx, vfwadd.wf and vfwsub.wf. | Craig Topper | 2021-04-11 | 13 | -316/+158
  The first source has the same EEW as the destination and the other source is a scalar, so the overlap constraints don't apply to the unmasked version. For the masked version, we have a constraint that the destination can't be V0, so that covers the only overlap issue there.

  Reviewed By: khchen
  Differential Revision: https://reviews.llvm.org/D100217
* [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported. | Craig Topper | 2021-04-11 | 5 | -13/+23
  Similar to what we do for zext.w.

  Disable the (srl (and X, 0xffff), C) custom isel when zext.h is available.
* [RISCV] Add i8 and i16 srli and srai tests to Zbb/Zbp test files. NFC | Craig Topper | 2021-04-11 | 2 | -0/+224
  These require the input to be zero or sign extended. If we have sext.b, sext.h or zext.h instructions we can use them. Otherwise we need to use a pair of shifts to accomplish the zero/sign extend and the final shift.

  We don't currently use zext.h when it is available.
* [InstCombine] Improve "get low bit mask up to and including bit X" pattern | Roman Lebedev | 2021-04-11 | 2 | -27/+31
  https://alive2.llvm.org/ce/z/3u-48R
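The commit message gives only an Alive2 link, so the exact before and after forms are not stated here. As an assumption for illustration, the sketch below compares two common ways of building a mask of bits 0 through X on 32-bit values: an "or of the bit with everything below it" form and a single logical shift of all-ones. Function names are hypothetical.

```cpp
#include <cstdint>

// Assumed source form: bit X, or-ed with all bits below X. Valid for X in [0, 31].
uint32_t maskViaOr(unsigned x) {
  uint32_t bit = uint32_t{1} << x;
  return bit | (bit - 1);
}

// Assumed simpler form: shift all-ones right so that X + 1 low bits remain.
uint32_t maskViaShift(unsigned x) { return 0xffffffffu >> (31 - x); }
```

The shift form needs one operation where the other needs three, and it stays well defined for X = 31, where a naive `(1 << (X + 1)) - 1` would shift by the full bit width.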
* [NFC][InstCombine] Add tests for "get low bit mask up to and including bit X" pattern | Roman Lebedev | 2021-04-11 | 1 | -0/+284
* [InstCombine] (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add | Roman Lebedev | 2021-04-11 | 3 | -5/+10
  https://alive2.llvm.org/ce/z/Coc5yf
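An `or` behaves like an `add` exactly when its operands share no set bits, because then no carries can occur. Under that condition the two constants can be combined. A minimal sketch with hypothetical names, using wrapping 32-bit arithmetic:

```cpp
#include <cstdint>

// True when X and C have no bits in common, so (X | C) == (X + C).
bool orIsAdd(uint32_t x, uint32_t c) { return (x & c) == 0; }

// (X | Op01C) + Op1C
uint32_t beforeFold(uint32_t x, uint32_t c1, uint32_t c2) { return (x | c1) + c2; }

// X + (Op01C + Op1C), where the constant sum can be computed at compile time.
uint32_t afterFold(uint32_t x, uint32_t c1, uint32_t c2) { return x + (c1 + c2); }
```

When the precondition fails the forms can differ: with X = 1, Op01C = 1 and Op1C = 0 the first gives 1 and the second gives 2.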
* [NFC][InstCombine] Add a few tests of adding to add-like or | Roman Lebedev | 2021-04-11 | 1 | -0/+39
* [NFC][LoopVectorize] Autogenerate interleaved-accesses.ll | Roman Lebedev | 2021-04-11 | 1 | -36/+36
* [LoopIdiom] left-shift-until-bittest: set all allowed no-wrap flags on add/sub | Roman Lebedev | 2021-04-11 | 2 | -113/+119
  I've checked each one of these with alive2, and this is both correct and precise.