diff options
author | Andy Polyakov <appro@openssl.org> | 2017-07-20 09:48:35 +0200 |
---|---|---|
committer | Andy Polyakov <appro@openssl.org> | 2017-07-21 14:07:32 +0200 |
commit | 64d92d74985ebb3d0be58a9718f9e080a14a8e7f (patch) | |
tree | 036456c8d139587371300824d273d1c500411d1a /crypto/x86_64cpuid.pl | |
parent | bbb4ceb86eb6ea0300f744443c36fb6e980fff9d (diff) | |
download | openssl-new-64d92d74985ebb3d0be58a9718f9e080a14a8e7f.tar.gz |
x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.
AVX-512 results cover even Skylake-X :-)
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Rich Salz <rsalz@openssl.org>
Diffstat (limited to 'crypto/x86_64cpuid.pl')
-rw-r--r-- | crypto/x86_64cpuid.pl | 17 |
1 files changed, 16 insertions, 1 deletions
diff --git a/crypto/x86_64cpuid.pl b/crypto/x86_64cpuid.pl index 2467af7e9e..a9f93bb2cf 100644 --- a/crypto/x86_64cpuid.pl +++ b/crypto/x86_64cpuid.pl @@ -145,8 +145,19 @@ OPENSSL_ia32_cpuid: or \$0x40000000,%edx # set reserved bit#30 on Intel CPUs and \$15,%ah cmp \$15,%ah # examine Family ID - jne .Lnotintel + jne .LnotP4 or \$0x00100000,%edx # set reserved bit#20 to engage RC4_CHAR +.LnotP4: + cmp \$6,%ah + jne .Lnotintel + and \$0x0ffff0f0,%eax + cmp \$0x00050670,%eax # Knights Landing + je .Lknights + cmp \$0x00080650,%eax # Knights Mill (according to sde) + jne .Lnotintel +.Lknights: + and \$0xfbffffff,%ecx # clear XSAVE flag to mimic Silvermont + .Lnotintel: bt \$28,%edx # test hyper-threading bit jnc .Lgeneric @@ -171,6 +182,10 @@ OPENSSL_ia32_cpuid: mov \$7,%eax xor %ecx,%ecx cpuid + bt \$26,%r9d # check XSAVE bit, cleared on Knights + jc .Lnotknights + and \$0xfff7ffff,%ebx # clear ADCX/ADOX flag +.Lnotknights: mov %ebx,8(%rdi) # save extended feature flags .Lno_extended_info: |