summaryrefslogtreecommitdiff
path: root/gcc/config
diff options
context:
space:
mode:
authorwschmidt <wschmidt@138bc75d-0d04-0410-961f-82ee72b054a4>2014-10-06 15:27:32 +0000
committerwschmidt <wschmidt@138bc75d-0d04-0410-961f-82ee72b054a4>2014-10-06 15:27:32 +0000
commit671b0e9370ecc06d81d49aa97393aee93d36853e (patch)
treeffd14bfdb460ec5093d708d3a41bc7b2d65053fd /gcc/config
parent1b7bd30b5d339d2c23ffb7975a9cee0be35869fb (diff)
downloadgcc-671b0e9370ecc06d81d49aa97393aee93d36853e.tar.gz
2014-10-06 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (analyze_swaps commentary): Add discussion of permutes and why we don't handle them. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@215951 138bc75d-0d04-0410-961f-82ee72b054a4
Diffstat (limited to 'gcc/config')
-rw-r--r--gcc/config/rs6000/rs6000.c47
1 files changed, 47 insertions, 0 deletions
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 489c65ecb98..8b35a0416d1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -33431,6 +33431,53 @@ emit_fusion_gpr_load (rtx target, rtx mem)
than deleting a swap, we convert the load/store into a permuting
load/store (which effectively removes the swap). */
+/* Notes on Permutes
+
+ We do not currently handle computations that contain permutes. There
+ is a general transformation that can be performed correctly, but it
+ may introduce more expensive code than it replaces. To handle these
+ would require a cost model to determine when to perform the optimization.
+ This commentary records how this could be done if desired.
+
+ The most general permute is something like this (example for V16QI):
+
+ (vec_select:V16QI (vec_concat:V32QI (op1:V16QI) (op2:V16QI))
+ (parallel [(const_int a0) (const_int a1)
+ ...
+ (const_int a14) (const_int a15)]))
+
+ where a0,...,a15 are in [0,31] and select elements from op1 and op2
+ to produce in the result.
+
+ Regardless of mode, we can convert the PARALLEL to a mask of 16
+ byte-element selectors. Let's call this M, with M[i] representing
+ the ith byte-element selector value. Then if we swap doublewords
+ throughout the computation, we can get correct behavior by replacing
+ M with M' as follows:
+
+ { M[i+8]+8 : i < 8, M[i+8] in [0,7] U [16,23]
+ M'[i] = { M[i+8]-8 : i < 8, M[i+8] in [8,15] U [24,31]
+ { M[i-8]+8 : i >= 8, M[i-8] in [0,7] U [16,23]
+ { M[i-8]-8 : i >= 8, M[i-8] in [8,15] U [24,31]
+
+ This seems promising at first, since we are just replacing one mask
+ with another. But certain masks are preferable to others. If M
+ is a mask that matches a vmrghh pattern, for example, M' certainly
+ will not. Instead of a single vmrghh, we would generate a load of
+ M' and a vperm. So we would need to know how many xxswapd's we can
+ remove as a result of this transformation to determine if it's
+ profitable; and preferably the logic would need to be aware of all
+ the special preferable masks.
+
+ Another form of permute is an UNSPEC_VPERM, in which the mask is
+ already in a register. In some cases, this mask may be a constant
+ that we can discover with ud-chains, in which case the above
+ transformation is ok. However, the common usage here is for the
+ mask to be produced by an UNSPEC_LVSL, in which case the mask
+ cannot be known at compile time. In such a case we would have to
+ generate several instructions to compute M' as above at run time,
+ and a cost model is needed again. */
+
/* This is based on the union-find logic in web.c. web_entry_base is
defined in df.h. */
class swap_web_entry : public web_entry_base