summaryrefslogtreecommitdiff
path: root/src/compiler/nir/nir_search.h
diff options
context:
space:
mode:
authorEric Anholt <eric@anholt.net>2019-10-02 10:59:13 -0700
committerEric Anholt <eric@anholt.net>2019-11-26 10:13:46 -0800
commitd845dca0f5451331abca250275c3d119f5d98d0b (patch)
treeddc4f35b15f320f8628d1bc120cd8a6e0b8610eb /src/compiler/nir/nir_search.h
parent90ad6304bff0e8ba05261c32a5bc964a803868c8 (diff)
downloadmesa-d845dca0f5451331abca250275c3d119f5d98d0b.tar.gz
nir: Make algebraic backtrack and reprocess after a replacement.
The algebraic pass was exhibiting O(n^2) behavior in dEQP-GLES2.functional.uniform_api.random.3 and dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with other code-generated tests, and likely real-world loop-unroll cases). In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to transform: result = b2f(a == b); result *= b2f(c == d); ... result *= b2f(z == w); -> temp = (a == b) temp = temp && (c == d) ... temp = temp && (z == w) result = b2f(temp); nir_opt_algebraic, proceeding bottom-to-top, would match and convert the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to be matched by the next fmul down on the next time algebraic got run by the optimization loop. Back in 2016 in 7be8d0773229 ("nir: Do opt_algebraic in reverse order."), Matt changed algebraic to go bottom-to-top so that we would match the biggest patterns first. This helped his cases, but I believe introduced this failure mode. Instead of reverting that, now that we've got the automaton, we can update the automaton's state recursively and just re-process any instructions whose state has changed (indicating that they might match new things). There's a small chance that the state will hash to the same value and miss out on this round of algebraic, but this seems to be good enough to fix dEQP. Effects with NIR_VALIDATE=0 (improvement is better with validation enabled): Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling outliers removed) dEQP-GLES2.functional.uniform_api.random.3 runtime -65.3512% +/- 4.22369% (n=21, was 1.4s) dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime -68.8066% +/- 6.49523% (was 4.8s) v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of tricky code. Runtime of uniform_api.random.3 down -0.790299% +/- 0.244213% compred to v1. v3: Re-add the nir_instr_remove() that I accidentally dropped in v2, fixing infinite loops. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Diffstat (limited to 'src/compiler/nir/nir_search.h')
-rw-r--r--src/compiler/nir/nir_search.h4
1 files changed, 3 insertions, 1 deletions
diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
index 9d567f88165..30e8b6ac7f8 100644
--- a/src/compiler/nir/nir_search.h
+++ b/src/compiler/nir/nir_search.h
@@ -29,6 +29,7 @@
#define _NIR_SEARCH_
#include "nir.h"
+#include "nir_worklist.h"
#include "util/u_dynarray.h"
#define NIR_SEARCH_MAX_VARIABLES 16
@@ -202,7 +203,8 @@ nir_replace_instr(struct nir_builder *b, nir_alu_instr *instr,
struct util_dynarray *states,
const struct per_op_table *pass_op_table,
const nir_search_expression *search,
- const nir_search_value *replace);
+ const nir_search_value *replace,
+ nir_instr_worklist *algebraic_worklist);
bool
nir_algebraic_impl(nir_function_impl *impl,
const bool *condition_flags,