diff options
author | Eric Anholt <eric@anholt.net> | 2019-10-02 10:59:13 -0700 |
---|---|---|
committer | Eric Anholt <eric@anholt.net> | 2019-11-26 10:13:46 -0800 |
commit | d845dca0f5451331abca250275c3d119f5d98d0b (patch) | |
tree | ddc4f35b15f320f8628d1bc120cd8a6e0b8610eb /src/compiler/nir/nir_search.h | |
parent | 90ad6304bff0e8ba05261c32a5bc964a803868c8 (diff) | |
download | mesa-d845dca0f5451331abca250275c3d119f5d98d0b.tar.gz |
nir: Make algebraic backtrack and reprocess after a replacement.
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to
transform:
result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);
->
temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);
nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.
Back in 2016 in 7be8d0773229 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first. This helped his cases, but I
believe introduced this failure mode. Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things). There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.
Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):
Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
-65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
-68.8066% +/- 6.49523% (was 4.8s)
v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
tricky code. Runtime of uniform_api.random.3 down -0.790299% +/-
0.244213% compred to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
fixing infinite loops.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Diffstat (limited to 'src/compiler/nir/nir_search.h')
-rw-r--r-- | src/compiler/nir/nir_search.h | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h index 9d567f88165..30e8b6ac7f8 100644 --- a/src/compiler/nir/nir_search.h +++ b/src/compiler/nir/nir_search.h @@ -29,6 +29,7 @@ #define _NIR_SEARCH_ #include "nir.h" +#include "nir_worklist.h" #include "util/u_dynarray.h" #define NIR_SEARCH_MAX_VARIABLES 16 @@ -202,7 +203,8 @@ nir_replace_instr(struct nir_builder *b, nir_alu_instr *instr, struct util_dynarray *states, const struct per_op_table *pass_op_table, const nir_search_expression *search, - const nir_search_value *replace); + const nir_search_value *replace, + nir_instr_worklist *algebraic_worklist); bool nir_algebraic_impl(nir_function_impl *impl, const bool *condition_flags, |