debuginfod 2/2: server side

Add the server to the debuginfod/ subdirectory. This is a highly multithreaded c++11 program (still buildable on rhel7's gcc 4.8, which is only partly c++11 compliant). Includes an initial suite of tests, man pages, and a sample systemd service. Signed-off-by: Frank Ch. Eigler <fche@redhat.com> Signed-off-by: Aaron Merey <amerey@redhat.com>
author: Frank Ch. Eigler <fche@elastic.org> 2019-10-28 13:29:26 -0400
committer: Mark Wielaard <mark@klomp.org> 2019-11-22 23:02:29 +0100
commit: e27e30cae0468903473641efe3853c12d9294ac3 (patch)
tree: 1c99041d8d11ee3d620adb08f03616a1fb870a16 /doc
parent: 288f6b199a8b1a60d4fb1f54ca7b883cdd5afca2 (diff)
download: elfutils-e27e30cae0468903473641efe3853c12d9294ac3.tar.gz
2 files changed, 371 insertions, 1 deletions
diff --git a/doc/Makefile.am b/doc/Makefile.am
index 23102604..60e942cb 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -19,10 +19,11 @@
 EXTRA_DIST = COPYING-GFDL README
 dist_man1_MANS=readelf.1 elfclassify.1
 notrans_dist_man3_MANS=elf_update.3 elf_getdata.3 elf_clone.3 elf_begin.3
-
+notrans_dist_man8_MANS=
 notrans_dist_man1_MANS=
 
 if DEBUGINFOD
+notrans_dist_man8_MANS += debuginfod.8
 notrans_dist_man3_MANS += debuginfod_find_debuginfo.3 debuginfod_find_source.3 debuginfod_find_executable.3
 notrans_dist_man1_MANS += debuginfod-find.1
 endif
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
new file mode 100644
index 00000000..eb1d8910
--- /dev/null
+++ b/doc/debuginfod.8
@@ -0,0 +1,369 @@
+'\"! tbl | nroff \-man
+'\" t macro stdmacro
+
+.de SAMPLE
+.br
+.RS 0
+.nf
+.nh
+..
+.de ESAMPLE
+.hy
+.fi
+.RE
+..
+
+.TH DEBUGINFOD 8
+.SH NAME
+debuginfod \- debuginfo-related http file-server daemon
+
+.SH SYNOPSIS
+.B debuginfod
+[\fIOPTION\fP]... [\fIPATH\fP]...
+
+.SH DESCRIPTION
+\fBdebuginfod\fP serves debuginfo-related artifacts over HTTP.  It
+periodically scans a set of directories for ELF/DWARF files and their
+associated source code, as well as RPM files containing the above, to
+build an index by their buildid.  This index is used when remote
+clients use the HTTP webapi, to fetch these files by the same buildid.
+
+If a debuginfod cannot service a given buildid artifact request
+itself, and it is configured with information about upstream
+debuginfod servers, it queries them for the same information, just as
+\fBdebuginfod-find\fP would.  If successful, it locally caches then
+relays the file content to the original requester.
+
+If the \fB\-F\fP option is given, each listed PATH creates a thread to
+scan for matching ELF/DWARF/source files under the given physical
+directory.  Source files are matched with DWARF files based on the
+AT_comp_dir (compilation directory) attributes inside it.  Duplicate
+directories are ignored.  You may use a file name for a PATH, but
+source code indexing may be incomplete; prefer using a directory that
+contains the binaries.  Caution: source files listed in the DWARF may
+be a path \fIanywhere\fP in the file system, and debuginfod will
+readily serve their content on demand.  (Imagine a doctored DWARF file
+that lists \fI/etc/passwd\fP as a source file.)  If this is a concern,
+audit your binaries with tools such as:
+
+.SAMPLE
+% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^File.name.table/p'
+or
+% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^Line.number/p'
+or even use debuginfod itself:
+% debuginfod -vvv -d :memory: -F BINARY 2>&1 | grep 'recorded.*source'
+^C
+.ESAMPLE
+
+If the \fB\-R\fP option is given each listed PATH creates a thread to
+scan for ELF/DWARF/source files contained in matching RPMs under the
+given physical directory.  Duplicate directories are ignored.  You may
+use a file name for a PATH, but source code indexing may be
+incomplete; prefer using a directory that contains normal RPMs
+alongside debuginfo/debugsource RPMs.  Because of complications such
+as DWZ-compressed debuginfo, may require \fItwo\fP scan passes to
+identify all source code.  Source files for RPMs are only served
+from other RPMs, so the caution for \-F does not apply.
+
+If no PATH is listed, or neither \-F nor \-R option is given, then
+\fBdebuginfod\fP will simply serve content that it scanned into its
+index in previous runs: the data is cumulative.
+
+File names must match extended regular expressions given by the \-I
+option and not the \-X option (if any) in order to be considered.
+
+
+.SH OPTIONS
+
+.TP
+.B "\-F"
+Activate ELF/DWARF file scanning threads.  The default is off.
+
+.TP
+.B "\-R"
+Activate RPM file scanning threads.  The default is off.
+
+.TP
+.B "\-d FILE" "\-\-database=FILE"
+Set the path of the sqlite database used to store the index.  This
+file is disposable in the sense that a later rescan will repopulate
+data.  It will contain absolute file path names, so it may not be
+portable across machines.  It may be frequently read/written, so it
+should be on a fast filesytem.  It should not be shared across
+machines or users, to maximize sqlite locking performance.  The
+default database file is $HOME/.debuginfod.sqlite.
+
+.TP
+.B "\-D SQL" "\-\-ddl=SQL"
+Execute given sqlite statement after the database is opened and
+initialized as extra DDL (SQL data definition language).  This may be
+useful to tune performance-related pragmas or indexes.  May be
+repeated.  The default is nothing extra.
+
+.TP
+.B "\-p NUM" "\-\-port=NUM"
+Set the TCP port number on which debuginfod should listen, to service
+HTTP requests.  Both IPv4 and IPV6 sockets are opened, if possible.
+The webapi is documented below.  The default port number is 8002.
+
+.TP
+.B "\-I REGEX"  "\-\-include=REGEX"  "\-X REGEX"  "\-\-exclude=REGEX"
+Govern the inclusion and exclusion of file names under the search
+paths.  The regular expressions are interpreted as unanchored POSIX
+extended REs, thus may include alternation.  They are evaluated
+against the full path of each file, based on its \fBrealpath(3)\fP
+canonicalization.  By default, all files are included and none are
+excluded.  A file that matches both include and exclude REGEX is
+excluded.  (The \fIcontents\fP of RPM files are not subject to
+inclusion or exclusion filtering: they are all processed.)
+
+.TP
+.B "\-t SECONDS"  "\-\-rescan\-time=SECONDS"
+Set the rescan time for the file and RPM directories.  This is the
+amount of time the scanning threads will wait after finishing a scan,
+before doing it again.  A rescan for unchanged files is fast (because
+the index also stores the file mtimes).  A time of zero is acceptable,
+and means that only one initial scan should performed.  The default
+rescan time is 300 seconds.  Receiving a SIGUSR1 signal triggers a new
+scan, independent of the rescan time (including if it was zero).
+
+.TP
+.B "\-g SECONDS" "\-\-groom\-time=SECONDS"
+Set the groom time for the index database.  This is the amount of time
+the grooming thread will wait after finishing a grooming pass before
+doing it again.  A groom operation quickly rescans all previously
+scanned files, only to see if they are still present and current, so
+it can deindex obsolete files.  See also the \fIDATA MANAGEMENT\fP
+section.  The default groom time is 86400 seconds (1 day).  A time of
+zero is acceptable, and means that only one initial groom should be
+performed.  Receiving a SIGUSR2 signal triggers a new grooming pass,
+independent of the groom time (including if it was zero).
+
+.TP
+.B "\-G"
+Run an extraordinary maximal-grooming pass at debuginfod startup.
+This pass can take considerable time, because it tries to remove any
+debuginfo-unrelated content from the RPM-related parts of the index.
+It should not be run if any recent RPM-related indexing operations
+were aborted early.  It can take considerable space, because it
+finishes up with an sqlite "vacuum" operation, which repacks the
+database file by triplicating it temporarily.  The default is not to
+do maximal-grooming.  See also the \fIDATA MANAGEMENT\fP section.
+
+.TP
+.B "\-c NUM"  "\-\-concurrency=NUM"
+Set the concurrency limit for all the scanning threads.  While many
+threads may be spawned to cover all the given PATHs, only NUM may
+concurrently do CPU-intensive operations like parsing an ELF file
+or an RPM.  The default is the number of processors on the system;
+the minimum is 1.
+
+.TP
+.B "\-v"
+Increase verbosity of logging to the standard error file descriptor.
+May be repeated to increase details.  The default verbosity is 0.
+
+.SH WEBAPI
+
+.\" Much of the following text is duplicated with debuginfod-find.1
+
+The debuginfod's webapi resembles ordinary file service, where a GET
+request with a path containing a known buildid results in a file.
+Unknown buildid / request combinations result in HTTP error codes.
+This file service resemblance is intentional, so that an installation
+can take advantage of standard HTTP management infrastructure.
+
+There are three requests.  In each case, the buildid is encoded as a
+lowercase hexadecimal string.  For example, for a program \fI/bin/ls\fP,
+look at the ELF note GNU_BUILD_ID:
+
+.SAMPLE
+% readelf -n /bin/ls | grep -A4 build.id
+Note section [ 4] '.note.gnu.buildid' of 36 bytes at offset 0x340:
+Owner          Data size  Type
+GNU                   20  GNU_BUILD_ID
+Build ID: 8713b9c3fb8a720137a4a08b325905c7aaf8429d
+.ESAMPLE
+
+Then the hexadecimal BUILDID is simply:
+
+.SAMPLE
+8713b9c3fb8a720137a4a08b325905c7aaf8429d
+.ESAMPLE
+
+.SS /buildid/\fIBUILDID\fP/debuginfo
+
+If the given buildid is known to the server, this request will result
+in a binary object that contains the customary \fB.*debug_*\fP
+sections.  This may be a split debuginfo file as created by
+\fBstrip\fP, or it may be an original unstripped executable.
+
+.SS /buildid/\fIBUILDID\fP/executable
+
+If the given buildid is known to the server, this request will result
+in a binary object that contains the normal executable segments.  This
+may be a executable stripped by \fBstrip\fP, or it may be an original
+unstripped executable.  \fBET_DYN\fP shared libraries are considered
+to be a type of executable.
+
+.SS /buildid/\fIBUILDID\fP/source\fI/SOURCE/FILE\fP
+
+If the given buildid is known to the server, this request will result
+in a binary object that contains the source file mentioned.  The path
+should be absolute.  Relative path names commonly appear in the DWARF
+file's source directory, but these paths are relative to
+individual compilation unit AT_comp_dir paths, and yet an executable
+is made up of multiple CUs.  Therefore, to disambiguate, debuginfod
+expects source queries to prefix relative path names with the CU
+compilation-directory, followed by a mandatory "/".
+
+Note: contrary to RFC 3986, the client should not elide \fB../\fP or
+\fB/./\fP or extraneous \fB///\fP sorts of path components in the
+directory names, because if this is how those names appear in the
+DWARF files, that is what debuginfod needs to see too.
+
+For example:
+.TS
+l l.
+#include <stdio.h>	/buildid/BUILDID/source/usr/include/stdio.h
+/path/to/foo.c	/buildid/BUILDID/source/path/to/foo.c
+\../bar/foo.c AT_comp_dir=/zoo/	/buildid/BUILDID/source/zoo//../bar/foo.c
+.TE
+
+.SH DATA MANAGEMENT
+
+debuginfod stores its index in an sqlite database in a densely packed
+set of interlinked tables.  While the representation is as efficient
+as we have been able to make it, it still takes a considerable amount
+of data to record all debuginfo-related data of potentially a great
+many files.  This section offers some advice about the implications.
+
+As a general explanation for size, consider that debuginfod indexes
+ELF/DWARF files, it stores their names and referenced source file
+names, and buildids will be stored.  When indexing RPMs, it stores
+every file name \fIof or in\fP an RPM, every buildid, plus every
+source file name referenced from a DWARF file.  (Indexing RPMs takes
+more space because the source files often reside in separate
+subpackages that may not be indexed at the same pass, so extra
+metadata has to be kept.)
+
+Getting down to numbers, in the case of Fedora RPMs (essentially,
+gzip-compressed cpio files), the sqlite index database tends to be
+from 0.5% to 3% of their size.  It's larger for binaries that are
+assembled out of a great many source files, or packages that carry
+much debuginfo-unrelated content.  It may be even larger during the
+indexing phase due to temporary sqlite write-ahead-logging files;
+these are checkpointed (cleaned out and removed) at shutdown.  It may
+be helpful to apply tight \-I or \-X regular-expression constraints to
+exclude files from scanning that you know have no debuginfo-relevant
+content.
+
+As debuginfod runs, it periodically rescans its target directories,
+and any new content found is added to the database.  Old content, such
+as data for files that have disappeared or that have been replaced
+with newer versions is removed at a periodic \fIgrooming\fP pass.
+This means that the sqlite files grow fast during initial indexing,
+slowly during index rescans, and periodically shrink during grooming.
+There is also an optional one-shot \fImaximal grooming\fP pass is
+available.  It removes information debuginfo-unrelated data from the
+RPM content index such as file names found in RPMs ("rpm sdef"
+records) that are not referred to as source files from any binaries
+find in RPMs ("rpm sref" records).  This can save considerable disk
+space.  However, it is slow and temporarily requires up to twice the
+database size as free space.  Worse: it may result in missing
+source-code info if the RPM traversals were interrupted, so the not
+all source file references were known.  Use it rarely to polish a
+complete index.
+
+You should ensure that ample disk space remains available.  (The flood
+of error messages on -ENOSPC is ugly and nagging.  But, like for most
+other errors, debuginfod will resume when resources permit.)  If
+necessary, debuginfod can be stopped, the database file moved or
+removed, and debuginfod restarted.
+
+sqlite offers several performance-related options in the form of
+pragmas.  Some may be useful to fine-tune the defaults plus the
+debuginfod extras.  The \-D option may be useful to tell debuginfod to
+execute the given bits of SQL after the basic schema creation
+commands.  For example, the "synchronous", "cache_size",
+"auto_vacuum", "threads", "journal_mode" pragmas may be fun to tweak
+via \-D, if you're searching for peak performance.  The "optimize",
+"wal_checkpoint" pragmas may be useful to run periodically, outside
+debuginfod.  The default settings are performance- rather than
+reliability-oriented, so a hardware crash might corrupt the database.
+In these cases, it may be necessary to manually delete the sqlite
+database and start over.
+
+As debuginfod changes in the future, we may have no choice but to
+change the database schema in an incompatible manner.  If this
+happens, new versions of debuginfod will issue SQL statements to
+\fIdrop\fP all prior schema & data, and start over.  So, disk space
+will not be wasted for retaining a no-longer-useable dataset.
+
+In summary, if your system can bear a 0.5%-3% index-to-RPM-dataset
+size ratio, and slow growth afterwards, you should not need to
+worry about disk space.  If a system crash corrupts the database,
+or you want to force debuginfod to reset and start over, simply
+erase the sqlite file before restarting debuginfod.
+
+
+.SH SECURITY
+
+debuginfod \fBdoes not\fP include any particular security features.
+While it is robust with respect to inputs, some abuse is possible.  It
+forks a new thread for each incoming HTTP request, which could lead to
+a denial-of-service in terms of RAM, CPU, disk I/O, or network I/O.
+If this is a problem, users are advised to install debuginfod with a
+HTTPS reverse-proxy front-end that enforces site policies for
+firewalling, authentication, integrity, authorization, and load
+control.
+
+When relaying queries to upstream debuginfods, debuginfod \fBdoes not\fP
+include any particular security features.  It trusts that the binaries
+returned by the debuginfods are accurate.  Therefore, the list of
+servers should include only trustworthy ones.  If accessed across HTTP
+rather than HTTPS, the network should be trustworthy.  Authentication
+information through the internal \fIlibcurl\fP library is not currently
+enabled.
+
+
+.SH "ENVIRONMENT VARIABLES"
+
+.TP 21
+.B DEBUGINFOD_URLS
+This environment variable contains a list of URL prefixes for trusted
+debuginfod instances.  Alternate URL prefixes are separated by space.
+Avoid referential loops that cause a server to contact itself, directly
+or indirectly - the results would be hilarious.
+
+.TP 21
+.B DEBUGINFOD_TIMEOUT
+This environment variable governs the timeout for each debuginfod HTTP
+connection.  A server that fails to respond within this many seconds
+is skipped.  The default is 5.
+
+.TP 21
+.B DEBUGINFOD_CACHE_PATH
+This environment variable governs the location of the cache where
+downloaded files are kept.  It is cleaned periodically as this
+program is reexecuted.  The default is $HOME/.debuginfod_client_cache.
+.\" XXX describe cache eviction policy
+
+.SH FILES
+.LP
+.PD .1v
+.TP 20
+.B $HOME/.debuginfod.sqlite
+Default database file.
+.PD
+
+.TP 20
+.B $HOME/.debuginfod_client_cache
+Default cache directory for content from upstream debuginfods.
+.PD
+
+
+.SH "SEE ALSO"
+.I "debuginfod-find(1)"
+.I "sqlite3(1)"
+.I \%https://prometheus.io/docs/instrumenting/exporters/
author	Frank Ch. Eigler <fche@elastic.org>	2019-10-28 13:29:26 -0400
committer	Mark Wielaard <mark@klomp.org>	2019-11-22 23:02:29 +0100
commit	e27e30cae0468903473641efe3853c12d9294ac3 (patch)
tree	1c99041d8d11ee3d620adb08f03616a1fb870a16 /doc
parent	288f6b199a8b1a60d4fb1f54ca7b883cdd5afca2 (diff)
download	elfutils-e27e30cae0468903473641efe3853c12d9294ac3.tar.gz