author: pvanhout <pierre.vanhoutryve@amd.com> 2023-02-17 11:18:36 +0100
committer: pvanhout <pierre.vanhoutryve@amd.com> 2023-03-28 09:33:12 +0200
commit: 6b971325e9eac315b83aa474c01da85b81062d17
tree: 392a2659d50e785a280596cbbfae4b081ceb6818 /runtimes
parent: a7dcf39f096a98cd1694d98f6f6b46eaade34a0b
[AMDGPU] Fold more AGPR copies/PHIs in SIFoldOperands

Generalize `tryFoldLCSSAPhi` into `tryFoldPhiAGPR`, which works on any kind of PHI node (not just LCSSA ones) and attempts to create AGPR PHIs more aggressively.

Also add a GFX908-only "cleanup" function, `tryOptimizeAGPRPhis`, which tries to minimize AGPR-to-AGPR copies. GFX908 doesn't have an ACCVGPR MOV instruction, so each AGPR-AGPR copy becomes 2 or 3 instructions because it needs a VGPR temp.

This is needed because D143731 plus the new `tryFoldPhiAGPR` may create many more PHIs (one 32xfloat PHI becomes 32 float PHIs). If each PHI hits the same AGPR (as in `test_mfma_loop_agpr_init`), they will be lowered to 32 copies from the same AGPR, each of which becomes 2-3 instructions. Creating a VGPR cache in this case prevents all those copies from being generated (we have AGPR-VGPR copies instead, which are trivial).

This is a preparation patch intended to prevent regressions in D143731 when AGPRs are involved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144099
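
The instruction-count argument in the message can be sketched with a toy cost model. This is only an illustration and is not code from the patch: the function names are hypothetical, and it assumes that an AGPR-VGPR copy in either direction costs one instruction (the message only calls these copies "trivial") and that the cached VGPR is read once and then copied into each PHI's AGPR.

```python
# Toy cost model for the GFX908 case described in the commit message.
# All names are hypothetical; nothing here is an LLVM API.

def insts_without_cache(num_phis: int, insts_per_agpr_copy: int) -> int:
    """Each PHI gets its own AGPR-to-AGPR copy from the shared source AGPR.

    Per the commit message, one such copy expands to 2 or 3 instructions
    on GFX908 because it has to go through a VGPR temp.
    """
    if insts_per_agpr_copy not in (2, 3):
        raise ValueError("the commit message states 2 or 3 instructions")
    return num_phis * insts_per_agpr_copy


def insts_with_vgpr_cache(num_phis: int) -> int:
    """The source AGPR is copied to a VGPR once, then reused for every PHI.

    Assumption: each AGPR-VGPR copy, in either direction, is one instruction.
    """
    read_into_cache = 1          # AGPR -> VGPR, done once
    writes_from_cache = num_phis  # VGPR -> AGPR, one per PHI
    return read_into_cache + writes_from_cache


if __name__ == "__main__":
    phis = 32  # one 32xfloat PHI split into 32 float PHIs
    print("no cache, 2 insts/copy:", insts_without_cache(phis, 2))
    print("no cache, 3 insts/copy:", insts_without_cache(phis, 3))
    print("with VGPR cache:       ", insts_with_vgpr_cache(phis))
```

Under these assumptions, the 32-PHI case from `test_mfma_loop_agpr_init` drops from 64-96 copy instructions to 33. The real savings depend on how the copies are actually lowered.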
Diffstat (limited to 'runtimes')
0 files changed, 0 insertions, 0 deletions