From 88477764f37a8462c7c01f2b235ef4efd08c765f Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Sun, 30 Dec 2018 07:54:48 -0800
Subject: ELF: add support for the ELF "merge" attribute

Add support for the "merge" attribute in ELF, along with the
associated "strings" and size specifier attributes.

Fix a few places where we used "int", but a larger type really ought
to have been used.

Be a bit more lax about respecifying attributes. For example, align=
can be respecified; the highest resulting value is used.
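
As a minimal sketch of how the new attributes might be used (the
section and label names are illustrative assumptions, not part of
this patch):

```nasm
; Mergeable, NUL-terminated byte strings; the entry size defaults
; to byte because "strings" is specified.
        section .rodata.str1 progbits alloc noexec nowrite merge strings
msg_ok:  db "ok", 0
msg_err: db "error", 0

; Mergeable fixed-size constants; a size specifier is required here
; because "strings" is not given.
        section .rodata.cst8 progbits alloc noexec nowrite merge qword
two_pi:  dq 6.283185307179586
```

Whether duplicates are actually merged is up to the linker.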

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 doc/changes.src |  3 +++
 doc/nasmdoc.src | 25 ++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

(limited to 'doc')

diff --git a/doc/changes.src b/doc/changes.src
index a4df0473..6fd19943 100644
--- a/doc/changes.src
+++ b/doc/changes.src
@@ -12,6 +12,9 @@ since 2007.
 \b Suppress nuisance "\c{label changed during code generation}" messages
 after a real error.
 
+\b Add support for the \c{merge} and \c{strings} attributes on ELF
+sections. See \k{elfsect}.
+
 \S{cl-2.14.02} Version 2.14.02
 
 \b Fix crash due to multiple errors or warnings during the code
diff --git a/doc/nasmdoc.src b/doc/nasmdoc.src
index ea6f10f2..bcfcad90 100644
--- a/doc/nasmdoc.src
+++ b/doc/nasmdoc.src
@@ -256,9 +256,12 @@ Object File Format
 \IA{sectalign}{sectalign}
 \IR{solaris x86} Solaris x86
 \IA{standard section names}{standardized section names}
+\IR{strings, elf attribute} \c{strings}
 \IR{symbols, exporting from dlls} symbols, exporting from DLLs
 \IR{symbols, importing from dlls} symbols, importing from DLLs
 \IR{test subdirectory} \c{test} subdirectory
+\IR{thread local storage in elf} thread local storage, in \c{elf}
+\IR{thread local storage in mach-o} thread local storage, in \c{macho}
 \IR{tlink} \c{TLINK}
 \IR{underscore, in c symbols} underscore, in C symbols
 \IR{unicode} Unicode
@@ -5951,6 +5954,26 @@ contents given, such as a BSS section.
 \I{section alignment, in elf}\I{alignment, in elf sections}alignment
 requirements of the section.
 
+\b \c{ent=} or \c{entsize=} specifies the fundamental data item size
+for a section which contains either fixed-sized data structures or
+strings; this is generally used with the \c{merge} attribute (see
+below.)
+
+\b \c{byte}, \c{word}, \c{dword}, \c{qword}, \c{tword}, \c{oword},
+\c{yword}, or \c{zword} are both shorthand for \c{entsize=}, but also
+sets the default alignment.
+
+\b \i{strings, ELF attribute}\c{strings} indicate that this section
+contains exclusively null-terminated strings. By default these are
+assumed to be byte strings, but a size specifier can be used to
+override that.
+
+\b \i\c{merge} indicates that duplicate data elements in this section
+should be merged with data elements from other object files. Data
+elements can be either fixed-sized objects or null-terminated strings
+(with the \c{strings} attribute.) A size specifier is required unless
+\c{strings} is specified, in which case the size defaults to \c{byte}.
+
 \b \i\c{tls} defines the section to be one which contains
 thread local variables.
 
@@ -8213,7 +8236,7 @@ then the correct first instruction in the code section will not be
 seen because the starting point skipped over it. This isn't really
 ideal.
 
-To avoid this, you can specify a `\i\c{synchronisation}' point, or indeed
+To avoid this, you can specify a `\i{synchronisation}' point, or indeed
 as many synchronisation points as you like (although NDISASM can
 only handle 2147483647 sync points internally). The definition of a sync
 point is this: NDISASM guarantees to hit sync points exactly during
-- 
cgit v1.2.1


From b2004511dddeefd7c0866a33ceaa5fa1a6ee0510 Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Tue, 26 Feb 2019 00:02:35 -0800
Subject: ELF: handle more than 32,633 sections

Dead code elimination in ELF uses a separate ELF section for every
function or data item that may be garbage collected. This can end up
being more than 32,633 sections which, when the ELF internal and
relocation sections are added in, can exceed the legacy ELF maximum of
65,279 sections.

Newer versions of the ELF specification have added support for a much
larger number of sections by putting a placeholder value (usually
SHN_XINDEX == 0xffff, but 0 in some cases) into fields where the
section index is a 16-bit value, and storing the full value in a
different place: the ELF header uses entries in section header 0,
the symbol table uses an auxiliary section with the additional
indices; the section header did not need it as the sh_link field is
already 32 bits long.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 doc/changes.src | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'doc')

diff --git a/doc/changes.src b/doc/changes.src
index 6fd19943..1e67bec5 100644
--- a/doc/changes.src
+++ b/doc/changes.src
@@ -15,6 +15,8 @@ after a real error.
 \b Add support for the \c{merge} and \c{strings} attributes on ELF
 sections. See \k{elfsect}.
 
+\b Handle more than 32,633 sections in ELF.
+
 \S{cl-2.14.02} Version 2.14.02
 
 \b Fix crash due to multiple errors or warnings during the code
-- 
cgit v1.2.1


From dc5939b4960e169e19c536e5503ec4487cff550d Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Tue, 26 Feb 2019 01:44:55 -0800
Subject: Handle more ELF section types

note, preinit_array, init_array, and fini_array are ELF section types
that can matter to the assembly programmer.
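
For instance, an initialization routine could be registered roughly
as follows. This is only a sketch: it assumes -f elf64 (use dd rather
than dq for elf32), and my_init is a hypothetical label.

```nasm
        section .text
my_init:                        ; hypothetical routine to run at startup
        ret

; One pointer-sized entry per function to be called.
        section .init_array init_array alloc noexec nowrite pointer
        dq my_init
```

The .preinit_array and .fini_array sections are expected to follow
the same pattern with their respective section types.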

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 doc/changes.src |   3 ++
 doc/nasmdoc.src | 133 ++++++++++++++++++++++++++++++++------------------------
 2 files changed, 80 insertions(+), 56 deletions(-)

(limited to 'doc')

diff --git a/doc/changes.src b/doc/changes.src
index 1e67bec5..d1181971 100644
--- a/doc/changes.src
+++ b/doc/changes.src
@@ -15,6 +15,9 @@ after a real error.
 \b Add support for the \c{merge} and \c{strings} attributes on ELF
 sections. See \k{elfsect}.
 
+\b Add support for the \c{note}, \c{preinit_array}, \c{init_array},
+and \c{fini_array} section types in ELF. See \k{elfsect}.
+
 \b Handle more than 32,633 sections in ELF.
 
 \S{cl-2.14.02} Version 2.14.02
diff --git a/doc/nasmdoc.src b/doc/nasmdoc.src
index bcfcad90..cb58045a 100644
--- a/doc/nasmdoc.src
+++ b/doc/nasmdoc.src
@@ -122,15 +122,14 @@
 \IR{- opunary} \c{-} operator, unary
 \IR{! opunary} \c{!} operator, unary
 \IR{alignment, in bin sections} alignment, in \c{bin} sections
-\IR{alignment, in elf sections} alignment, in \c{elf} sections
+\IR{alignment, in elf sections} alignment, in ELF sections
 \IR{alignment, in win32 sections} alignment, in \c{win32} sections
-\IR{alignment, of elf common variables} alignment, of \c{elf} common
+\IR{alignment, of elf common variables} alignment, of ELF common
 variables
 \IR{alignment, in obj sections} alignment, in \c{obj} sections
 \IR{a.out, bsd version} \c{a.out}, BSD version
 \IR{a.out, linux version} \c{a.out}, Linux version
-\IR{autoconf} Autoconf
-\IR{bin} bin
+\IR{bin} \c{bin} output format
 \IR{bitwise and} bitwise AND
 \IR{bitwise or} bitwise OR
 \IR{bitwise xor} bitwise XOR
@@ -150,8 +149,8 @@ variables
 \IR{codeview} CodeView debugging format
 \IR{common object file format} Common Object File Format
 \IR{common variables, alignment in elf} common variables, alignment
-in \c{elf}
-\IR{common, elf extensions to} \c{COMMON}, \c{elf} extensions to
+in ELF
+\IR{common, elf extensions to} \c{COMMON}, ELF extensions to
 \IR{common, obj extensions to} \c{COMMON}, \c{obj} extensions to
 \IR{declaring structure} declaring structures
 \IR{default-wrt mechanism} default-\c{WRT} mechanism
@@ -165,7 +164,8 @@ in \c{elf}
 \IA{effective address}{effective addresses}
 \IA{effective-address}{effective addresses}
 \IR{elf} ELF
-\IR{elf, 16-bit code and} ELF, 16-bit code and
+\IR{elf, 16-bit code} ELF, 16-bit code
+\IR{elf, debug formats} ELF, debug formats
 \IR{elf shared libraries} ELF, shared libraries
 \IR{elf32} \c{elf32}
 \IR{elf64} \c{elf64}
@@ -181,7 +181,7 @@ in \c{elf}
 \IR{functions, pascal calling convention} functions, Pascal calling
 convention
 \IR{global, aoutb extensions to} \c{GLOBAL}, \c{aoutb} extensions to
-\IR{global, elf extensions to} \c{GLOBAL}, \c{elf} extensions to
+\IR{global, elf extensions to} \c{GLOBAL}, ELF extensions to
 \IR{global, rdf extensions to} \c{GLOBAL}, \c{rdf} extensions to
 \IR{got} GOT
 \IR{got relocations} \c{GOT} relocations
@@ -238,16 +238,16 @@ convention
 Object File Format
 \IR{relocations, pic-specific} relocations, PIC-specific
 \IA{repeating}{repeating code}
-\IR{section alignment, in elf} section alignment, in \c{elf}
+\IR{section alignment, in elf} section alignment, in ELF
 \IR{section alignment, in bin} section alignment, in \c{bin}
 \IR{section alignment, in obj} section alignment, in \c{obj}
 \IR{section alignment, in win32} section alignment, in \c{win32}
-\IR{section, elf extensions to} \c{SECTION}, \c{elf} extensions to
+\IR{section, elf extensions to} \c{SECTION}, ELF extensions to
 \IR{section, macho extensions to} \c{SECTION}, \c{macho} extensions to
 \IR{section, win32 extensions to} \c{SECTION}, \c{win32} extensions to
 \IR{segment alignment, in bin} segment alignment, in \c{bin}
 \IR{segment alignment, in obj} segment alignment, in \c{obj}
-\IR{segment, obj extensions to} \c{SEGMENT}, \c{elf} extensions to
+\IR{segment, obj extensions to} \c{SEGMENT}, ELF extensions to
 \IR{segment names, borland pascal} segment names, Borland Pascal
 \IR{shift command} \c{shift} command
 \IA{sib}{sib byte}
@@ -256,11 +256,10 @@ Object File Format
 \IA{sectalign}{sectalign}
 \IR{solaris x86} Solaris x86
 \IA{standard section names}{standardized section names}
-\IR{strings, elf attribute} \c{strings}
 \IR{symbols, exporting from dlls} symbols, exporting from DLLs
 \IR{symbols, importing from dlls} symbols, importing from DLLs
 \IR{test subdirectory} \c{test} subdirectory
-\IR{thread local storage in elf} thread local storage, in \c{elf}
+\IR{thread local storage in elf} thread local storage, in ELF
 \IR{thread local storage in mach-o} thread local storage, in \c{macho}
 \IR{tlink} \c{TLINK}
 \IR{underscore, in c symbols} underscore, in C symbols
@@ -298,16 +297,16 @@ Object File Format
 
 The Netwide Assembler, NASM, is an 80x86 and x86-64 assembler designed
 for portability and modularity. It supports a range of object file
-formats, including Linux and \c{*BSD} \c{a.out}, \c{ELF}, \c{COFF},
-\c{Mach-O}, 16-bit and 32-bit \c{OBJ} (OMF) format, \c{Win32} and
-\c{Win64}. It will also output plain binary files, Intel hex and
+formats, including Linux and *BSD \c{a.out}, ELF, Mach-O, 16-bit and
+32-bit \c{.obj} (OMF) format, and COFF (including its Win32 and Win64
+variants). It can also output plain binary files, Intel hex and
 Motorola S-Record formats. Its syntax is designed to be simple and
 easy to understand, similar to the syntax in the Intel Software
 Developer Manual with minimal complexity. It supports all currently
 known x86 architectural extensions, and has strong support for macros.
 
-NASM also comes with a set of utilities for handling the \c{RDOFF}
-custom object-file format.
+NASM also comes with a set of utilities for handling its own RDOFF2
+object-file format.
 
 \S{legal} \i{License} Conditions
 
@@ -355,7 +354,7 @@ For example,
 
 \c nasm -f elf myfile.asm
 
-will assemble \c{myfile.asm} into an \c{ELF} object file \c{myfile.o}. And
+will assemble \c{myfile.asm} into an ELF object file \c{myfile.o}. And
 
 \c nasm -f bin myfile.asm -o myfile.com
 
@@ -377,7 +376,7 @@ The option \c{-hf} will also list the available output file formats,
 and what they are.
 
 If you use Linux but aren't sure whether your system is \c{a.out}
-or \c{ELF}, type
+or ELF, type
 
 \c file nasm
 
@@ -4376,7 +4375,7 @@ operating in 16-bit mode, 32-bit mode or 64-bit mode. The syntax is
 \c{BITS XX}, where XX is 16, 32 or 64.
 
 In most cases, you should not need to use \c{BITS} explicitly. The
-\c{aout}, \c{coff}, \c{elf}, \c{macho}, \c{win32} and \c{win64}
+\c{aout}, \c{coff}, \c{elf*}, \c{macho}, \c{win32} and \c{win64}
 object formats, which are designed for use in 32-bit or 64-bit
 operating systems, all cause NASM to select 32-bit or 64-bit mode,
 respectively, by default. The \c{obj} object format allows you
@@ -4653,9 +4652,8 @@ refer to symbols which \e{are} defined in the same module as the
 \c         ; some code
 
 \c{GLOBAL}, like \c{EXTERN}, allows object formats to define private
-extensions by means of a colon. The \c{elf} object format, for
-example, lets you specify whether global data items are functions or
-data:
+extensions by means of a colon. The ELF object format, for example,
+lets you specify whether global data items are functions or data:
 
 \c global  hashlookup:function, hashtable:data
 
@@ -4686,8 +4684,8 @@ at the same piece of memory.
 
 Like \c{GLOBAL} and \c{EXTERN}, \c{COMMON} supports object-format
 specific extensions. For example, the \c{obj} format allows common
-variables to be NEAR or FAR, and the \c{elf} format allows you to
-specify the alignment requirements of a common variable:
+variables to be NEAR or FAR, and the ELF format allows you to specify
+the alignment requirements of a common variable:
 
 \c common  commvar  4:near  ; works in OBJ
 \c common  intarray 100:4   ; works in ELF: 4 byte aligned
@@ -4759,7 +4757,7 @@ For example, when mangling local symbols via the generic namespace:
 This is useful when the directive is needed to be output format
 agnostic.
 
-The example is also euquivalent to this, when the output format is \c{elf}:
+The example is also equivalent to this, when the output format is ELF:
 
 \c      %pragma elf gprefix _
 
@@ -5907,8 +5905,8 @@ Format} Object Files
 The \c{elf32}, \c{elf64} and \c{elfx32} output formats generate
 \c{ELF32 and ELF64} (Executable and Linkable Format) object files, as
 used by Linux as well as \i{Unix System V}, including \i{Solaris x86},
-\i{UnixWare} and \i{SCO Unix}. \c{elf} provides a default output
-file-name extension of \c{.o}.  \c{elf} is a synonym for \c{elf32}.
+\i{UnixWare} and \i{SCO Unix}. ELF provides a default output
+file-name extension of \c{.o}. \c{elf} is a synonym for \c{elf32}.
 
 The \c{elfx32} format is used for the \i{x32} ABI, which is a 32-bit
 ABI with the CPU in 64-bit mode.
@@ -5921,8 +5919,8 @@ target operating system (OSABI).  This field can be set by using the
 system. If this directive is not used, the default value will be "UNIX
 System V ABI" (0) which will work on most systems which support ELF.
 
-\S{elfsect} \c{elf} extensions to the \c{SECTION} Directive
-\I{SECTION, elf extensions to}
+\S{elfsect} ELF extensions to the \c{SECTION} Directive
+\I{SECTION, ELF extensions to}
 
 Like the \c{obj} format, \c{elf} allows you to specify additional
 information on the \c{SECTION} directive line, to control the type
@@ -5947,23 +5945,42 @@ not.
 
 \b \i\c{progbits} defines the section to be one with explicit contents
 stored in the object file: an ordinary code or data section, for
-example, \i\c{nobits} defines the section to be one with no explicit
+example.
+
+\b \i\c{nobits} defines the section to be one with no explicit
 contents given, such as a BSS section.
 
-\b \c{align=}, used with a trailing number as in \c{obj}, gives the
+\b \i\c{note} indicates that this section contains ELF notes. The
+content of ELF notes is specified using normal assembly instructions;
+it is up to the programmer to ensure these are valid ELF notes.
+
+\b \i\c{preinit_array} indicates that this section contains function
+addresses to be called before any other initialization has happened.
+
+\b \i\c{init_array} indicates that this section contains function
+addresses to be called during initialization.
+
+\b \i\c{fini_array} indicates that this section contains function
+pointers to be called during termination.
+
+\b \I{align, ELF attribute}\c{align=}, used with a trailing number as in \c{obj}, gives the
 \I{section alignment, in elf}\I{alignment, in elf sections}alignment
 requirements of the section.
 
-\b \c{ent=} or \c{entsize=} specifies the fundamental data item size
-for a section which contains either fixed-sized data structures or
-strings; this is generally used with the \c{merge} attribute (see
-below.)
-
 \b \c{byte}, \c{word}, \c{dword}, \c{qword}, \c{tword}, \c{oword},
-\c{yword}, or \c{zword} are both shorthand for \c{entsize=}, but also
-sets the default alignment.
-
-\b \i{strings, ELF attribute}\c{strings} indicate that this section
+\c{yword}, or \c{zword} with an optional \c{*}\i{multiplier} specify
+the fundamental data item size for a section which contains either
+fixed-sized data structures or strings; it also sets a default
+alignment. This is generally used with the \c{strings} and \c{merge}
+attributes (see below.) For example \c{byte*4} defines a unit size of
+4 bytes, with a default alignment of 1; \c{dword} also defines a unit
+size of 4 bytes, but with a default alignment of 4. The \c{align=}
+attribute, if specified, overrides this default alignment.
+
+\b \I{pointer, ELF attribute}\c{pointer} is equivalent to \c{dword}
+for \c{elf32} or \c{elfx32}, and \c{qword} for \c{elf64}.
+
+\b \I{strings, ELF attribute}\c{strings} indicates that this section
 contains exclusively null-terminated strings. By default these are
 assumed to be byte strings, but a size specifier can be used to
 override that.
@@ -5983,24 +6000,28 @@ qualifiers are:
 \I\c{.text} \I\c{.rodata} \I\c{.lrodata} \I\c{.data} \I\c{.ldata}
 \I\c{.bss} \I\c{.lbss} \I\c{.tdata} \I\c{.tbss} \I\c\{.comment}
 
-\c section .text    progbits  alloc   exec    nowrite  align=16
-\c section .rodata  progbits  alloc   noexec  nowrite  align=4
-\c section .lrodata progbits  alloc   noexec  nowrite  align=4
-\c section .data    progbits  alloc   noexec  write    align=4
-\c section .ldata   progbits  alloc   noexec  write    align=4
-\c section .bss     nobits    alloc   noexec  write    align=4
-\c section .lbss    nobits    alloc   noexec  write    align=4
-\c section .tdata   progbits  alloc   noexec  write    align=4    tls
-\c section .tbss    nobits    alloc   noexec  write    align=4    tls
-\c section .comment progbits  noalloc noexec  nowrite  align=1
-\c section other    progbits  alloc   noexec  nowrite  align=1
+\c section .text          progbits      alloc   exec    nowrite  align=16
+\c section .rodata        progbits      alloc   noexec  nowrite  align=4
+\c section .lrodata       progbits      alloc   noexec  nowrite  align=4
+\c section .data          progbits      alloc   noexec  write    align=4
+\c section .ldata         progbits      alloc   noexec  write    align=4
+\c section .bss           nobits        alloc   noexec  write    align=4
+\c section .lbss          nobits        alloc   noexec  write    align=4
+\c section .tdata         progbits      alloc   noexec  write    align=4   tls
+\c section .tbss          nobits        alloc   noexec  write    align=4   tls
+\c section .comment       progbits      noalloc noexec  nowrite  align=1
+\c section .preinit_array preinit_array alloc   noexec  nowrite  pointer
+\c section .init_array    init_array    alloc   noexec  nowrite  pointer
+\c section .fini_array    fini_array    alloc   noexec  nowrite  pointer
+\c section .note          note          noalloc noexec  nowrite  align=1
+\c section other          progbits      alloc   noexec  nowrite  align=1
 
 (Any section name other than those in the above table
  is treated by default like \c{other} in the above table.
  Please note that section names are case sensitive.)
 
 
-\S{elfwrt} \i{Position-Independent Code}\I{PIC}: \c{macho} Special
+\S{elfwrt} \i{Position-Independent Code}\I{PIC}: ELF Special
 Symbols and \i\c{WRT}
 
 Since \c{ELF} does not support segment-base references, the \c{WRT}
@@ -6138,7 +6159,7 @@ requires that it be aligned on a 4-byte boundary.
 
 
 \S{elf16} 16-bit code and ELF
-\I{ELF, 16-bit code and}
+\I{ELF, 16-bit code}
 
 The \c{ELF32} specification doesn't provide relocations for 8- and
 16-bit values, but the GNU \c{ld} linker adds these as an extension.
@@ -6148,7 +6169,7 @@ be linked as ELF using GNU \c{ld}. If NASM is used with the
 these relocations is generated.
 
 \S{elfdbg} Debug formats and ELF
-\I{ELF, Debug formats and}
+\I{ELF, debug formats}
 
 ELF provides debug information in \c{STABS} and \c{DWARF} formats.
 Line number information is generated for all executable sections, but please
-- 
cgit v1.2.1


From a8604c83fa8ece9859fb76b328b8753f549b8863 Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Tue, 26 Feb 2019 02:36:15 -0800
Subject: ELF: the .note section should be 4-byte aligned

The ELF .note section consists of 4-byte words and should be aligned
accordingly.
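
To illustrate why the alignment matters, a hand-written note might
look roughly like this. It is a sketch only: the owner name, type
value and descriptor contents are made-up placeholders, and the
layout assumes the usual namesz/descsz/type header followed by the
name and descriptor, each padded to a 4-byte boundary.

```nasm
        section .note note noalloc noexec nowrite align=4
        dd name_end - name      ; namesz, including the terminating NUL
        dd desc_end - desc      ; descsz
        dd 1                    ; note type (placeholder value)
name:   db "Example", 0         ; owner name (placeholder)
name_end:
        align 4, db 0           ; pad name to a 4-byte boundary
desc:   dd 0x12345678           ; descriptor payload (placeholder)
desc_end:
        align 4, db 0           ; pad descriptor to a 4-byte boundary
```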

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 doc/nasmdoc.src | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'doc')

diff --git a/doc/nasmdoc.src b/doc/nasmdoc.src
index cb58045a..8310faac 100644
--- a/doc/nasmdoc.src
+++ b/doc/nasmdoc.src
@@ -6013,7 +6013,7 @@ qualifiers are:
 \c section .preinit_array preinit_array alloc   noexec  nowrite  pointer
 \c section .init_array    init_array    alloc   noexec  nowrite  pointer
 \c section .fini_array    fini_array    alloc   noexec  nowrite  pointer
-\c section .note          note          noalloc noexec  nowrite  align=1
+\c section .note          note          noalloc noexec  nowrite  align=4
 \c section other          progbits      alloc   noexec  nowrite  align=1
 
 (Any section name other than those in the above table
-- 
cgit v1.2.1