path: root/crypto/x86_64cpuid.pl
Commit message | Author | Age | Files | Lines
* Update copyright year | Matt Caswell | 2021-04-08 | 1 | -1/+1
  Reviewed-by: Tomas Mraz <tomas@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/14801)
* Dual 1024-bit exponentiation optimization for Intel IceLake CPU | Andrey Matyukov | 2021-03-22 | 1 | -1/+1
  with AVX512_IFMA + AVX512_VL instructions, primarily for RSA CRT private
  key operations. It uses 256-bit registers to avoid CPU frequency scaling
  issues. The performance speedup for RSA2k signature on ICL is ~2x.

  Reviewed-by: Paul Dale <pauli@openssl.org>
  Reviewed-by: Matt Caswell <matt@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/13750)
* Update copyright year | Matt Caswell | 2020-04-23 | 1 | -1/+1
  Reviewed-by: Richard Levitte <levitte@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/11616)
* Also check for errors in x86_64-xlate.pl. | David Benjamin | 2020-02-17 | 1 | -1/+1
  In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude
  the perlasm drivers since they aren't opening pipes and do not
  particularly need it, but I only noticed x86_64-xlate.pl, so
  arm-xlate.pl and ppc-xlate.pl got the change. That seems to have been
  fine, so be consistent and also apply the change to x86_64-xlate.pl.
  Checking for errors is generally a good idea.

  Reviewed-by: Richard Levitte <levitte@openssl.org>
  Reviewed-by: David Benjamin <davidben@google.com>
  (Merged from https://github.com/openssl/openssl/pull/10930)
* x86_64: Add endbranch at function entries for Intel CET | H.J. Lu | 2020-02-15 | 1 | -0/+9
  To support Intel CET, all indirect branch targets must start with
  endbranch. Here is a patch to add endbranch to function entries in
  x86_64 assembly codes which are indirect branch targets as discovered by
  running openssl testsuite on Intel CET machine and visual inspection.

  Verified with

      $ CC="gcc -Wl,-z,cet-report=error" ./Configure shared linux-x86_64 -fcf-protection
      $ make
      $ make test

  and

      $ CC="gcc -mx32 -Wl,-z,cet-report=error" ./Configure shared linux-x32 -fcf-protection
      $ make
      $ make test # <<< passed with https://github.com/openssl/openssl/pull/10988

  Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
  Reviewed-by: Richard Levitte <levitte@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/10982)
* Do not silently truncate files on perlasm errors | David Benjamin | 2020-01-22 | 1 | -1/+1
  If one of the perlasm xlate drivers crashes, OpenSSL's build will
  currently swallow the error and silently truncate the output to however
  far the driver got. This will hopefully fail to build, but better to
  check such things.

  Handle this by checking for errors when closing STDOUT (which is a pipe
  to the xlate driver).

  Reviewed-by: Richard Levitte <levitte@openssl.org>
  Reviewed-by: Tim Hudson <tjh@openssl.org>
  Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
  (Merged from https://github.com/openssl/openssl/pull/10883)
* Fix unwind info for some trivial functions | Bernd Edlinger | 2019-12-18 | 1 | -0/+16
  While stack unwinding works with gdb here, the function
  _Unwind_Backtrace gives up when something outside
  .cfi_startproc/.cfi_endproc is found in the call stack, like
  OPENSSL_cleanse, OPENSSL_atomic_add, OPENSSL_rdtsc, CRYPTO_memcmp and
  other trivial functions which don't save anything in the stack.

  Reviewed-by: Richard Levitte <levitte@openssl.org>
  Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
  (Merged from https://github.com/openssl/openssl/pull/10635)
* Unify all assembler file generators | Richard Levitte | 2019-09-16 | 1 | -4/+6
  They now generally conform to the following argument sequence:

      script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \
                $(PROCESSOR) <output file>

  However, in the spirit of being able to use these scripts manually, they
  also allow for no argument, or for only the flavour, or for only the
  output file. This is done by only using the last argument as output file
  if it's a file (it has an extension), and only using the first argument
  as flavour if it isn't a file (it doesn't have an extension).

  While we're at it, we make all $xlate calls the same, i.e. the $output
  argument is always quoted, and we always die on error when trying to
  start $xlate.

  There's a perl lesson in this, regarding operator priority...

  This will always succeed, even when it fails:

      open FOO, "something" || die "ERR: $!";

  The reason is that '||' has higher priority than list operators (a
  function is essentially a list operator and gobbles up everything
  following it that isn't lower priority), and since a non-empty string is
  always true, that ends up being exactly the same as:

      open FOO, "something";

  This, however, will fail if "something" can't be opened:

      open FOO, "something" or die "ERR: $!";

  The reason is that 'or' has lower priority than list operators, i.e.
  it's performed after the 'open' call.

  Reviewed-by: Matt Caswell <matt@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/9884)
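The argument heuristic described in this commit (last argument is the output file only if it has an extension, first argument is the flavour only if it does not look like a file) can be modelled roughly as follows. This is a minimal C sketch for illustration only; the real scripts are Perl, and the names `has_extension`, `split_args` and `struct perlasm_args` are hypothetical, not taken from OpenSSL.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name ends in ".<word characters>". */
int has_extension(const char *name)
{
    const char *dot = strrchr(name, '.');
    const char *p;

    if (dot == NULL || dot[1] == '\0')
        return 0;
    for (p = dot + 1; *p != '\0'; p++)
        if (!isalnum((unsigned char)*p) && *p != '_')
            return 0;
    return 1;
}

/* Hypothetical result type: either field may be NULL when not given. */
struct perlasm_args {
    const char *flavour;
    const char *output;
};

/* argv here holds only the script's own arguments (no program name). */
struct perlasm_args split_args(int argc, const char **argv)
{
    struct perlasm_args r = { NULL, NULL };

    /* The last argument is the output file only if it has an extension. */
    if (argc > 0 && has_extension(argv[argc - 1])) {
        r.output = argv[argc - 1];
        argc--;
    }
    /* The first remaining argument is the flavour only if it has no dot. */
    if (argc > 0 && strchr(argv[0], '.') == NULL)
        r.flavour = argv[0];
    return r;
}
```

Under these assumptions, calling the function with only `"elf"` yields a flavour and no output file, and calling it with only `"out.s"` yields an output file and no flavour, which matches the "manual use" cases the message lists.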
* Following the license change, modify the boilerplates in crypto/ | Richard Levitte | 2018-12-06 | 1 | -1/+1
  [skip ci]

  Reviewed-by: Matt Caswell <matt@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/7827)
* {arm64|x86_64}cpuid.pl: add special 16-byte case to OPENSSL_memcmp. | Andy Polyakov | 2018-06-03 | 1 | -0/+12
  OPENSSL_memcmp is a must in GCM decrypt and general-purpose loop takes
  quite a portion of execution time for short inputs, more than GHASH for
  few-byte inputs according to profiler. Special 16-byte case takes it off
  top five list in profiler output.

  Reviewed-by: Rich Salz <rsalz@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/6312)
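To make the idea of a special 16-byte case concrete, here is a hedged C sketch of a constant-time comparison with a fast path for exactly 16 bytes (the GCM tag length). It is an illustration of the technique only, not the assembly added by this commit; the function name `ct_memcmp16_sketch` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: returns 0 if the buffers are equal, nonzero otherwise.
 * Branching on len is acceptable because the length is not secret.
 */
int ct_memcmp16_sketch(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    unsigned char diff = 0;
    size_t i;

    if (len == 16) {
        uint64_t a0, a1, b0, b1, x;

        /* Two 64-bit loads per buffer instead of a 16-iteration byte loop. */
        memcpy(&a0, pa, 8);
        memcpy(&a1, pa + 8, 8);
        memcpy(&b0, pb, 8);
        memcpy(&b1, pb + 8, 8);
        x = (a0 ^ b0) | (a1 ^ b1);
        /* Fold "x != 0" into 0 or 1 without a data-dependent branch. */
        return (int)((x | (0 - x)) >> 63);
    }

    /* General-purpose loop: accumulate differences over every byte. */
    for (i = 0; i < len; i++)
        diff |= pa[i] ^ pb[i];
    return diff;
}
```

The fast path trades the per-byte loop overhead for four word loads and a handful of logical operations, which is the kind of saving the profiler observation in the message points at for short inputs.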
* Fix issues in ia32 RDRAND asm leading to reduced entropy | Bryan Donlan | 2018-03-08 | 1 | -17/+3
  This patch fixes two issues in the ia32 RDRAND assembly code that result
  in a (possibly significant) loss of entropy.

  The first, less significant, issue is that, by returning success as 0
  from OPENSSL_ia32_rdrand() and OPENSSL_ia32_rdseed(), a subtle bias was
  introduced. Specifically, because the assembly routine copied the
  remaining number of retries over the result when RDRAND/RDSEED returned
  'successful but zero', a bias towards values 1-8 (primarily 8) was
  introduced.

  The second, more worrying issue was that, due to a mixup in registers,
  when a buffer that was not size 0 or 1 mod 8 was passed to
  OPENSSL_ia32_rdrand_bytes or OPENSSL_ia32_rdseed_bytes, the last
  (n mod 8) bytes were all the same value. This issue impacts only the
  64-bit variant of the assembly.

  This change fixes both issues by first eliminating the only use of
  OPENSSL_ia32_rdrand, replacing it with OPENSSL_ia32_rdrand_bytes, and
  fixes the register mixup in OPENSSL_ia32_rdrand_bytes. It also adds a
  sanity test for OPENSSL_ia32_rdrand_bytes and OPENSSL_ia32_rdseed_bytes
  to help catch problems of this nature in the future.

  Reviewed-by: Andy Polyakov <appro@openssl.org>
  Reviewed-by: Rich Salz <rsalz@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/5342)
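The two problems described above can be modelled in plain C. The sketch below is an assumption-laden illustration, not the OpenSSL assembly: the hardware instruction is replaced by a hypothetical callback type `rng_word_fn`, the retry count of 8 is inferred from the "values 1-8" remark, and the two `fake_*` sources exist only to make the behaviour observable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RDRAND/RDSEED: returns 1 on success. */
typedef int (*rng_word_fn)(uint64_t *out);

#define RETRIES 8 /* assumed from the "values 1-8" remark in the message */

/* Deterministic stub sources, for illustration and testing only. */
int fake_zero_source(uint64_t *out)    { *out = 0; return 1; }
int fake_pattern_source(uint64_t *out) { *out = 0x0807060504030201ULL; return 1; }

/*
 * Model of the first issue: one return value carries both status and data,
 * with 0 meaning failure. A legitimate zero result is overwritten by the
 * remaining retry counter, which biases the output towards 1..RETRIES.
 */
uint64_t rand_word_conflated(rng_word_fn src)
{
    uint64_t v = 0;
    unsigned tries;

    for (tries = RETRIES; tries > 0; tries--)
        if (src(&v))
            return v != 0 ? v : tries;
    return 0; /* failure */
}

/*
 * Bytes-oriented interface in the spirit of the fix: status (bytes written)
 * and data are kept apart, and the tail is filled with successive bytes of
 * the word rather than one repeated byte.
 */
size_t rand_bytes_sketch(rng_word_fn src, unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        uint64_t v = 0;
        unsigned tries;
        int ok = 0;
        size_t n, i;

        for (tries = RETRIES; tries > 0 && !ok; tries--)
            ok = src(&v);
        if (!ok)
            break;
        n = len - done < 8 ? len - done : 8;
        for (i = 0; i < n; i++) {
            buf[done++] = (unsigned char)v;
            v >>= 8; /* shift so each tail byte differs */
        }
    }
    return done;
}
```

With the always-zero stub, `rand_word_conflated` returns the retry counter instead of zero, which is the bias the message describes; with the pattern stub, an 11-byte request ends in three distinct bytes rather than three copies of one value.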
* crypto/x86_64cpuid.pl: suppress AVX512F flag on Skylake-X. | Andy Polyakov | 2017-12-08 | 1 | -0/+8
  It was observed that AVX512 code paths can negatively affect overall
  Skylake-X system performance. But we are talking specifically about
  512-bit code, while AVX512VL, 256-bit variant of AVX512F instructions,
  is supposed to fly as smooth as AVX2. Which is why it remains unmasked.

  Reviewed-by: Rich Salz <rsalz@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/4838)
* crypto/x86_64cpuid.pl: fix AVX512 capability masking. | Andy Polyakov | 2017-11-23 | 1 | -4/+5
  Originally it was thought that it's possible to use AVX512VL+BW
  instructions with XMM and YMM registers without kernel enabling ZMM
  support, but it turned to be wrong assumption.

  Reviewed-by: Rich Salz <rsalz@openssl.org>
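The rule implied here (no AVX-512 capability bit may be advertised unless the operating system has enabled the full AVX-512 register state) can be sketched in C as below. The bit positions follow the Intel-documented layout of XCR0 and of CPUID leaf 7 EBX; the macro and function names are hypothetical, and this is not a transcription of the mask the script actually applies.

```c
#include <stdint.h>

/* XCR0 state-component bits (as read with XGETBV). */
#define XCR0_SSE        (1u << 1)
#define XCR0_AVX        (1u << 2)
#define XCR0_OPMASK     (1u << 5)
#define XCR0_ZMM_HI256  (1u << 6)
#define XCR0_HI16_ZMM   (1u << 7)
#define XCR0_AVX512_NEEDED \
    (XCR0_SSE | XCR0_AVX | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* CPUID.(EAX=7,ECX=0):EBX AVX-512 feature bits. */
#define F7_AVX512F    (1u << 16)
#define F7_AVX512DQ   (1u << 17)
#define F7_AVX512IFMA (1u << 21)
#define F7_AVX512PF   (1u << 26)
#define F7_AVX512ER   (1u << 27)
#define F7_AVX512CD   (1u << 28)
#define F7_AVX512BW   (1u << 30)
#define F7_AVX512VL   (1u << 31)
#define F7_AVX512_ALL \
    (F7_AVX512F | F7_AVX512DQ | F7_AVX512IFMA | F7_AVX512PF | \
     F7_AVX512ER | F7_AVX512CD | F7_AVX512BW | F7_AVX512VL)

/*
 * Hypothetical helper: clears every AVX-512 bit, including VL and BW,
 * unless the OS has enabled opmask and both ZMM state components.
 */
uint32_t mask_avx512_caps(uint32_t leaf7_ebx, uint64_t xcr0)
{
    if ((xcr0 & XCR0_AVX512_NEEDED) != XCR0_AVX512_NEEDED)
        leaf7_ebx &= ~F7_AVX512_ALL;
    return leaf7_ebx;
}
```

The point of masking VL and BW together with F is the one made in the message: using those instructions on XMM or YMM registers still depends on the kernel having enabled the ZMM state.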
* OPENSSL_ia32cap: reserve for new extensions. | Andy Polyakov | 2017-11-08 | 1 | -1/+2
  Reviewed-by: Rich Salz <rsalz@openssl.org>
* Fix comment typo. | David Benjamin | 2017-07-26 | 1 | -1/+1
  Reviewed-by: Ben Kaduk <kaduk@mit.edu>
  Reviewed-by: Rich Salz <rsalz@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/4023)
* crypto/x86_64cpuid.pl: fix typo in Knights Landing detection. | Andy Polyakov | 2017-07-25 | 1 | -1/+1
  Thanks to David Benjamin for spotting this!

  Reviewed-by: Rich Salz <rsalz@openssl.org>
  (Merged from https://github.com/openssl/openssl/pull/4009)
* x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results. | Andy Polyakov | 2017-07-21 | 1 | -1/+16
  "Optimize" is in quotes because it's rather a "salvage operation" for
  now. Idea is to identify processor capability flags that drive Knights
  Landing to suboptimal code paths and mask them. Two flags were
  identified, XSAVE and ADCX/ADOX. Former affects choice of AES-NI code
  path specific for Silvermont (Knights Landing is of Silvermont
  "ancestry"). And 64-bit ADCX/ADOX instructions are effectively
  mishandled at decode time. In both cases we are looking at ~2x
  improvement.

  AVX-512 results cover even Skylake-X :-)

  Hardware used for benchmarking courtesy of Atos, experiments run by
  Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

  Reviewed-by: Rich Salz <rsalz@openssl.org>
* crypto/x86*cpuid.pl: move extended feature detection. | Andy Polyakov | 2017-03-13 | 1 | -11/+10
  Extended feature flags were not pulled on AMD processors, as a result a
  number of extensions were effectively masked on Ryzen. Original fix for
  x86_64cpuid.pl addressed this problem, but messed up processor vendor
  detection. This fix moves extended feature detection past basic feature
  detection where it belongs. 32-bit counterpart is harmonized too.

  Reviewed-by: Rich Salz <rsalz@openssl.org>
  Reviewed-by: Richard Levitte <levitte@openssl.org>
* crypto/x86_64cpuid.pl: move extended feature detection upwards. | Andy Polyakov | 2017-03-07 | 1 | -8/+10
  Extended feature flags were not pulled on AMD processors, as a result a
  number of extensions were effectively masked on Ryzen. It should have
  been reported for Excavator since it implements AVX2 extension, but
  apparently nobody noticed or cared...

  Reviewed-by: Rich Salz <rsalz@openssl.org>
* crypto/x86_64cpuid.pl: add CFI annotations. | Andy Polyakov | 2017-02-26 | 1 | -0/+4
  Reviewed-by: Rich Salz <rsalz@openssl.org>
* crypto/x86_64cpuid.pl: detect if kernel preserves %zmm registers. | Andy Polyakov | 2017-02-03 | 1 | -1/+9
  Reviewed-by: Rich Salz <rsalz@openssl.org>
* crypto/x86[_64]cpuid.pl: add OPENSSL_ia32_rd[rand|seed]_bytes. | Andy Polyakov | 2016-07-15 | 1 | -21/+52
  Reviewed-by: Richard Levitte <levitte@openssl.org>
* x86_64 assembly pack: tolerate spaces in source directory name. | Andy Polyakov | 2016-05-29 | 1 | -1/+1
  [as it is now quoting $output is not required, but done just in case]

  Reviewed-by: Richard Levitte <levitte@openssl.org>
* Add assembly CRYPTO_memcmp. | Andy Polyakov | 2016-05-19 | 1 | -0/+22
  GH: #102

  Reviewed-by: Richard Levitte <levitte@openssl.org>
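For readers unfamiliar with what a constant-time comparison such as CRYPTO_memcmp does, the following is a minimal C sketch of the general technique. It is not the assembly added by this commit; the name `ct_memcmp_sketch` is hypothetical, and the `volatile` qualifier is one common way to discourage the compiler from short-circuiting the loop.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: returns 0 if the buffers are equal and nonzero
 * otherwise. Unlike memcmp, it always reads all len bytes, so the running
 * time does not reveal the position of the first mismatch.
 */
int ct_memcmp_sketch(const void *a, const void *b, size_t len)
{
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;
    unsigned char diff = 0;
    size_t i;

    for (i = 0; i < len; i++)
        diff |= pa[i] ^ pb[i];

    return diff;
}
```

Note that the result only distinguishes equal from unequal; it carries no ordering information the way memcmp's return value does.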
* Copyright consolidation: perl files | Rich Salz | 2016-04-20 | 1 | -1/+8
  Add copyright to most .pl files.
  This does NOT cover any .pl file that has other copyright in it.
  Most of those are Andy's but some are public domain.
  Fix typos in some existing files.

  Reviewed-by: Richard Levitte <levitte@openssl.org>
* x86[_64]cpuid.pl: add low-level RDSEED. | Andy Polyakov | 2014-02-14 | 1 | -0/+15
* x86_64 assembly pack: make Windows build more robust. | Andy Polyakov | 2013-01-22 | 1 | -1/+2
  PR: 2963 and a number of others
* Extend OPENSSL_ia32cap_P with extra word to accommodate AVX2 capability. | Andy Polyakov | 2012-11-17 | 1 | -2/+12
* x86_64 assembly pack: make it possible to compile with Perl located on | Andy Polyakov | 2012-06-27 | 1 | -1/+1
  path with spaces.

  PR: 2835
* cryptlib.c, etc.: fix linker warnings in 64-bit Darwin build. | Andy Polyakov | 2011-11-12 | 1 | -1/+1
* x86_64cpuid.pl: fix typo. | Andy Polyakov | 2011-06-04 | 1 | -1/+1
* x86[_64]cpuid.pl: add function accessing rdrand instruction. | Andy Polyakov | 2011-06-04 | 1 | -1/+22
* x86[_64]cpuid.pl: harmonize usage of reserved bits #20 and #30. | Andy Polyakov | 2011-05-27 | 1 | -3/+4
* x86_64cpuid.pl: get AVX masking right. | Andy Polyakov | 2011-05-26 | 1 | -8/+7
* x86_64cpuid.pl: allow shared build to work without -Bsymbolic. | Andy Polyakov | 2011-05-18 | 1 | -0/+4
  PR: 2466
* x86[_64]cpuid.pl: handle new extensions. | Andy Polyakov | 2011-05-16 | 1 | -16/+41
* Multiple assembler packs: add experimental memory bus instrumentation. | Andy Polyakov | 2011-04-17 | 1 | -2/+93
* Revert previous Linux-specific/centric commit#19629. If it really has to | Andy Polyakov | 2010-05-05 | 1 | -7/+0
  be done, it's definitely not the way to do it. So far answer to the
  question was to ./config -Wa,--noexecstack (adopted by RedHat).
* Non-executable stack in asm. | Ben Laurie | 2010-05-05 | 1 | -0/+7
* x86_64cpuid.pl: ml64 is allergic to db on label line. | Andy Polyakov | 2010-04-14 | 1 | -1/+2
* OPENSSL_cleanse to accept zero length parameter [matching C implementation]. | Andy Polyakov | 2010-01-24 | 1 | -1/+3
* x86[_64]cpuid.pl: further refine shared cache detection. | Andy Polyakov | 2009-05-14 | 1 | -3/+31
* x86_64cpuid.pl: refine shared cache detection logic. | Andy Polyakov | 2009-05-12 | 1 | -2/+27
* x86_64 assembler pack to comply with updated styling x86_64-xlate.pl rules. | Andy Polyakov | 2008-11-12 | 1 | -105/+88
* x86_64cpuid.pl cosmetics: harmonize $dir treatment with other modules. | Andy Polyakov | 2008-07-15 | 1 | -2/+1
* Use default value for $dir if it is empty. | Dr. Stephen Henson | 2008-02-25 | 1 | -0/+1
* Make all x86_64 modules independent on current working directory. | Andy Polyakov | 2008-01-13 | 1 | -1/+3
* Make x86_64 modules work under Win64/x64. | Andy Polyakov | 2007-08-23 | 1 | -2/+2
* x86*cpuid update. | Andy Polyakov | 2007-07-21 | 1 | -2/+2
* Flush output in x86_64cpuid.pl. | Andy Polyakov | 2007-06-21 | 1 | -0/+1