author Michael Weiser <michael.weiser@gmx.de> 2018-02-13 22:13:14 +0100
committer Niels Möller <nisse@lysator.liu.se> 2018-03-25 11:27:44 +0200
commit 70135c70863eedfd9b300614f4a5535b8b93066c
tree e8230747d2d551418141fce35c12d4939b813ad6 /arm
parent 2644d1ed132f7dad05e165d6c96a68ee66547d32
Document arm endianness considerations
Extend arm/README to provide some background on considerations to be taken into account when writing assembly routines supposed to work in big and little memory endianness.
Diffstat (limited to 'arm')
-rw-r--r-- arm/README 69
1 files changed, 68 insertions, 1 deletions
diff --git a/arm/README b/arm/README
index 9bacd97b..1ba54e0d 100644
--- a/arm/README
+++ b/arm/README
@@ -44,4 +44,71 @@ q12 (d24, d25) Y
q13 (d26, d27) Y
q14 (d28, d29) Y
q15 (d30, d31) Y
-
+
+Endianness
+
+ARM supports big- and little-endian memory access modes. Representation in
+registers stays the same but loads and stores switch bytes. This has to be
+taken into account in various cases.
+
+Two m4 macros are provided to handle these special cases in assembly source:
+IF_LE(<if-true>,<if-false>)
+IF_BE(<if-true>,<if-false>)
+respectively expand to <if-true> if the target system's endianness is
+little-endian or big-endian. Otherwise they expand to <if-false>.
+
+1. ldr/str
+
+Loading and storing 32-bit words will reverse the words' bytes in little-endian
+mode. If the handled data is actually a byte sequence or data in network byte
+order (big-endian), the loaded word needs to be reversed after load to get it
+back into correct sequence. See the LOAD macro in v6/sha1-compress.asm for an
+example.
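
The effect can be modelled in portable C. The following is only a sketch: the
helper names (bswap32, host_is_little_endian, load_be32) are made up for this
illustration and do not appear in the nettle sources. A native word load stands
in for ldr, and the byte reversal stands in for what the assembly does after
the load on a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the four bytes of a 32-bit word. */
static uint32_t bswap32(uint32_t w)
{
  return (w >> 24) | ((w >> 8) & 0x0000ff00u)
       | ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Hypothetical helper: 1 if the lowest-addressed byte of a word is its
   least significant byte, 0 otherwise. */
static int host_is_little_endian(void)
{
  const uint32_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Read four bytes in network byte order using a native word load. */
static uint32_t load_be32(const unsigned char *p)
{
  uint32_t w;
  memcpy(&w, p, sizeof w);          /* native-endian load, like ldr */
  /* Only a little-endian load leaves the bytes in reverse order. */
  return host_is_little_endian() ? bswap32(w) : w;
}
```

With the byte sequence 01 02 03 04 in memory, load_be32 yields 0x01020304 on
either kind of host.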
+
+2. shifts
+
+If data is to be processed with bit operations only, endianness can be ignored
+because byte-swapping on load and store will cancel each other out. Shift
+directions, however, have to be swapped: where little-endian code shifts right,
+big-endian code has to shift left and vice versa. See arm/memxor.asm for an
+example.
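
One situation where the shift direction matters is assembling an unaligned word
from two adjacent aligned words. The C sketch below illustrates the idea only;
it is not a transcription of memxor.asm, and the names unaligned_word and
host_is_little_endian are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
  const uint32_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Combine two natively loaded, adjacent aligned words into the word that
   starts `offset` bytes (1..3) into the first one. */
static uint32_t unaligned_word(uint32_t first, uint32_t second, unsigned offset)
{
  const unsigned s = 8 * offset;
  if (host_is_little_endian())
    /* Lower addresses sit in the low bits: drop them with a right shift. */
    return (first >> s) | (second << (32 - s));
  else
    /* Lower addresses sit in the high bits: drop them with a left shift. */
    return (first << s) | (second >> (32 - s));
}
```

Both branches produce the same word that a native load from the unaligned
address would give; only the shift directions differ.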
+
+3. vld1.8
+
+NEON's vld instruction can be used to produce endianness-neutral code. vld1.8
+will load a byte sequence into a register regardless of memory endianness. This
+can be used to process byte sequences. See arm/neon/umac-nh.asm for an
+example.
+
+4. vldm/vstm
+
+Care has to be taken when using vldm/vstm because they have two non-obvious
+characteristics:
+
+a. vldm/vstm do normal byte-swapping on each value they load. When loading into
+ d (doubleword) registers, this means that bytes, halfwords and words of the
+ doubleword get swapped. When the data loaded actually represents e.g.
+ vectors of 32-bit words this will swap columns.
+b. vldm/vstm on q (quadword) registers get translated into vldm/vstm on the
+ equivalent number of d (doubleword) registers. Instead of a 128-bit load it
+ does two 64-bit loads. When again handling vectors of 32-bit words this will
+ still swap adjacent columns but will not reverse all four columns.
+
+memory adr0: w0 w1 w2 w3
+register q0: w1 w0 w3 w2
+
+See arm/neon/chacha-core-internal.asm for an example.
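
The column swap can be reproduced with a small C model. This is a sketch under
stated assumptions: model_vldm_q is a hypothetical name, lane 0 denotes the
least significant 32-bit lane of the q register, and the big_endian flag stands
in for the memory access mode.

```c
#include <stdint.h>

/* Model of a vldm into one q register holding four 32-bit words.
   The load is performed as two independent 64-bit loads. */
static void model_vldm_q(const uint32_t mem[4], int big_endian,
                         uint32_t lane[4])
{
  for (int d = 0; d < 2; d++) {
    const uint32_t first = mem[2 * d];       /* word at the lower address */
    const uint32_t second = mem[2 * d + 1];  /* word at the higher address */
    /* In big-endian mode the lower address supplies the upper half of
       the doubleword, so the two words trade places. */
    lane[2 * d] = big_endian ? second : first;
    lane[2 * d + 1] = big_endian ? first : second;
  }
}
```

For memory contents w0 w1 w2 w3 the model returns w1 w0 w3 w2 in big-endian
mode, matching the diagram above, and w0 w1 w2 w3 in little-endian mode.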
+
+5. simple byte store
+
+Sometimes it is necessary to store remaining single bytes to memory. A simple
+loop stores the lowest byte of a register, shifts the register right by eight
+bits and starts over until all bytes are stored. Since this constitutes a
+least-significant-byte-first store, the data to be stored needs to be reversed
+first on a big-endian system. See arm/memxor.asm Lmemxor_leftover for an
+example.
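
A C sketch of such a loop follows. It is an illustration rather than a
rendering of Lmemxor_leftover; store_leftover and bswap32 are names invented
here, and the big_endian flag models the target's memory mode.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word. */
static uint32_t bswap32(uint32_t w)
{
  return (w >> 24) | ((w >> 8) & 0x0000ff00u)
       | ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Store the first n (0..4) bytes of a natively loaded word w to dst. */
static void store_leftover(unsigned char *dst, uint32_t w, unsigned n,
                           int big_endian)
{
  /* The loop below emits the least significant byte first, so on a
     big-endian target the word has to be reversed beforehand. */
  if (big_endian)
    w = bswap32(w);
  while (n-- > 0) {
    *dst++ = (unsigned char) (w & 0xff);
    w >>= 8;
  }
}
```

The bytes aa bb cc dd load as 0xddccbbaa in little-endian mode and as
0xaabbccdd in big-endian mode; in both cases storing three bytes writes
aa bb cc.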
+
+6. Function parameters/return values
+
+AAPCS requires 64-bit parameters to be passed to and returned from functions
+"in two consecutive registers [...] as if the value had been loaded from memory
+representation with a single LDM instruction." Since loading a big-endian
+doubleword using ldm transposes its words, the same has to be done when e.g.
+returning a 64-bit value from an assembler routine. See arm/neon/umac-nh.asm
+for an example.
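
The register assignment implied by this rule can be written down as a short C
model. It is a sketch only: split_u64_aapcs is a hypothetical name, r0 and r1
stand for the first register pair, and the big_endian flag models the memory
mode.

```c
#include <stdint.h>

/* Split a 64-bit value into the two registers that carry it, as if it
   had been loaded from memory with a single ldm. */
static void split_u64_aapcs(uint64_t v, int big_endian,
                            uint32_t *r0, uint32_t *r1)
{
  const uint32_t lo = (uint32_t) v;
  const uint32_t hi = (uint32_t) (v >> 32);
  /* ldm fills r0 from the lower address. That address holds the low
     word in little-endian memory and the high word in big-endian memory. */
  *r0 = big_endian ? hi : lo;
  *r1 = big_endian ? lo : hi;
}
```

An assembly routine that computes the low word in r0 and the high word in r1
therefore has to exchange the two registers before returning on a big-endian
target.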