path: root/arm/README
Commit message | Author | Age | Files | Lines
* arm: Unify neon asm for big- and little-endian modes | Michael Weiser | 2021-01-13 | 1 | -1/+13
Switch arm neon assembler routines to endianness-agnostic loads and stores where possible to avoid modifications to the rest of the code. This involves switching to vld1.32 for loading consecutive 32-bit words in host endianness as well as vst1.8 for storing back to memory in little-endian order as required by the caller. Where necessary, r3 is used to store the precalculated offset into the source vector for the secondary load operations. vstm is kept for little-endian platforms because it is faster than vst1 on most ARM implementations.

vst1.x (at least on the Allwinner A20 Cortex-A7 implementation) seems to interfere with itself on subsequent calls, slowing it down further. So we reschedule some instructions to do stores as soon as results become available to have some other calculations or loads before the next vst1.x. This reliably saves two additional cycles per block on salsa20 and chacha which would otherwise be incurred.

vld1.x does not seem to suffer from this or at least not to a level where two consecutive vld1.x run slower than an equivalent vldm. Rescheduling them similarly did not improve performance beyond that of vldm.

Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
* Document arm endianness considerations | Michael Weiser | 2018-03-25 | 1 | -1/+68
Extend arm/README to provide some background on considerations to be taken into account when writing assembly routines that are supposed to work with both big- and little-endian memory.
* Reorganization of ARM assembly. | Niels Möller | 2013-04-18 | 1 | -0/+47
Renamed directory armv7 to arm. New subdirectory arm/neon, for files using neon instructions. configure.ac hacked to make use of neon configurable.
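
The 2021-01-13 commit message above describes loading 32-bit words in host endianness and storing the result byte by byte so that memory ends up in little-endian order on either kind of host. The following is only a portable C sketch of that idea, not the neon assembly from the commit; the helper name store_le32 is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the repository): write n 32-bit
 * words to dst in little-endian byte order. Each word is taken apart
 * with shifts, so the result does not depend on how the host lays out
 * a uint32_t in memory. This mirrors the effect of a byte-wise store
 * of words that were loaded in host endianness. */
static void store_le32(uint8_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t w = src[i];
        dst[4 * i + 0] = (uint8_t)(w);        /* least significant byte first */
        dst[4 * i + 1] = (uint8_t)(w >> 8);
        dst[4 * i + 2] = (uint8_t)(w >> 16);
        dst[4 * i + 3] = (uint8_t)(w >> 24);  /* most significant byte last */
    }
}
```

A plain memcpy of the words would give the same bytes only on a little-endian host, which is roughly why the commit keeps the faster vstm on little-endian platforms and uses byte-wise stores elsewhere.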