path: root/lib/fatal-signal.c
Commit message | Author | Age | Files | Lines
* fatal-signal: Remove snprintf. | William Tu | 2020-04-19 | 1 | -8/+37
    Function snprintf is not async-signal-safe. Replace it with our own
    implementation. Example ovs-vswitchd.log output:

      2020-03-25T01:08:19.673Z|00050|memory|INFO|handlers:2 ports:3
      SIGSEGV detected, backtrace:
      0x4872d9 <fatal_signal_handler+0x49>
      0x7f4e2ab974b0 <killpg+0x40>
      0x7f4e2ac5d74d <__poll+0x2d>
      0x531098 <time_poll+0x108>
      0x51aefc <poll_block+0x8c>
      0x445ca9 <udpif_revalidator+0x289>
      0x5056fd <ovsthread_wrapper+0x7d>
      0x7f4e2b65f6ba <start_thread+0xca>
      0x7f4e2ac6941d <clone+0x6d>
      0x0 <+0x0>

    Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/674901331
    Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
    Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
    Signed-off-by: William Tu <u9012063@gmail.com>
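The following is a minimal sketch of the kind of snprintf() replacement this commit describes. It assumes the handler only needs to print addresses as hexadecimal. `format_hex()` and `write_hex_line()` are hypothetical names, not the functions OVS adds, and the sketch leaves out the symbol names and offsets shown in the log above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not the OVS function): renders 'value' as "0x"
 * followed by lowercase hex digits, without leading zeros, into 'buf',
 * which must hold at least 2 + 2 * sizeof(uintptr_t) + 1 bytes.  It uses no
 * stdio, locale, or heap, so it can run inside a signal handler.  Returns
 * the string length, excluding the terminating NUL. */
static size_t
format_hex(uintptr_t value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0;

    /* Collect digits least-significant first; zero still yields one digit. */
    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    buf[0] = '0';
    buf[1] = 'x';
    for (size_t i = 0; i < n; i++) {
        buf[2 + i] = tmp[n - 1 - i];    /* Most-significant digit first. */
    }
    buf[2 + n] = '\0';
    return 2 + n;
}

/* Writes "0x...\n" for 'value' to 'fd' with a single write(2), which is
 * async-signal-safe. */
static void
write_hex_line(int fd, uintptr_t value)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1];
    size_t len = format_hex(value, buf);
    ssize_t r;

    buf[len++] = '\n';          /* Overwrites the NUL; write() needs none. */
    r = write(fd, buf, len);
    (void) r;                   /* Nothing useful to do on failure here. */
}
```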
* fatal-signal: Fix clang error due to lock. | William Tu | 2020-03-24 | 1 | -6/+2
    Because the lock is not acquired, clang reports:

      lib/vlog.c:618:12: error: reading variable 'log_fd' requires holding
      mutex 'log_file_mutex' [-Werror,-Wthread-safety-analysis]
          return log_fd;

    The patch fixes it by creating a function in vlog.c that writes
    directly to the log file unsafely, without taking the lock.

    Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666165883
    Fixes: ecd4a8fcdff2 ("fatal-signal: Log backtrace when no monitor daemon.")
    Suggested-by: Ilya Maximets <i.maximets@ovn.org>
    Acked-by: Ilya Maximets <i.maximets@ovn.org>
    Signed-off-by: William Tu <u9012063@gmail.com>
* fatal-signal: Log backtrace when no monitor daemon. | William Tu | 2020-03-23 | 1 | -1/+26
    Currently, backtrace logging is only available when the monitor daemon
    is running. This patch enables backtrace logging when no monitor
    daemon exists. In the signal handling context, it detects whether the
    monitor daemon exists. If not, it writes the backtrace directly to the
    vlog fd. Note that the VLOG_* macros don't work here because of their
    buffered I/O, so this patch issues the write() syscall directly on the
    file descriptor.

    Some systems no longer use the monitor daemon and instead use systemd
    to monitor ovs-vswitchd, and thus need this patch.

    Example of ovs-vswitchd.log (note that the backtrace lines have no
    timestamp):

      2020-03-23T14:42:12.949Z|00049|memory|INFO|175332 kB peak resident
      2020-03-23T14:42:12.949Z|00050|memory|INFO|handlers:2 ports:3 reva
      SIGSEGV detected, backtrace:
      0x0000000000486969 <fatal_signal_handler+0x49>
      0x00007f7f5e57f4b0 <killpg+0x40>
      0x000000000047daa8 <pmd_thread_main+0x238>
      0x0000000000504edd <ovsthread_wrapper+0x7d>
      0x00007f7f5f0476ba <start_thread+0xca>
      0x00007f7f5e65141d <clone+0x6d>
      0x0000000000000000 <+0x0>

    Acked-by: Ben Pfaff <blp@ovn.org>
    Signed-off-by: William Tu <u9012063@gmail.com>
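The commit bypasses the buffered VLOG_* path and calls write() on the log file descriptor directly. write() can return after transferring fewer bytes than requested, or fail with EINTR, so a handler-side helper usually loops. The following sketch shows that pattern under those assumptions. `write_fully()` is a hypothetical name, not an OVS function.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes all 'len' bytes of 'data' to 'fd' using only
 * write(2), retrying after short writes and EINTR.  Returns 0 on success or
 * an errno value on failure.  It saves and restores errno so that calling
 * it from a signal handler does not disturb the interrupted code. */
static int
write_fully(int fd, const void *data, size_t len)
{
    const char *p = data;
    int saved_errno = errno;
    int error = 0;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* Interrupted before writing: retry. */
            }
            error = errno;
            break;
        }
        p += n;                     /* Short write: advance and continue. */
        len -= (size_t) n;
    }
    errno = saved_errno;
    return error;
}
```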
* trivial: Fix indentation. | William Tu | 2020-03-20 | 1 | -1/+1
    Add extra space to fix indentation.

    Signed-off-by: William Tu <u9012063@gmail.com>
    Acked-by: Ben Pfaff <blp@ovn.org>
* Avoid clobbered variable warning on ppc64le. | David Wilder | 2019-10-09 | 1 | -1/+2
    Since commit e2ed6fbeb1, CI on ppc64le with Ubuntu 16.04.6 LTS throws
    this error:

      lib/fatal-signal.c: In function 'send_backtrace_to_monitor':
      lib/fatal-signal.c:168:9: error: variable 'dep' might be clobbered by
      'longjmp' or 'vfork' [-Werror=clobbered]
          int dep;

    Declare dep as a volatile int.

    Signed-off-by: David Wilder <dwilder@us.ibm.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
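-Wclobbered exists because C leaves a non-volatile automatic variable with an indeterminate value after longjmp() if the variable was modified between setjmp() and longjmp(). The following sketch illustrates that rule with plain setjmp()/longjmp(). It does not claim that send_backtrace_to_monitor() itself calls them, and `step()` and `count_steps()` are hypothetical.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical stand-in for code that may bail out with longjmp(), for
 * example a stack walk that reaches an unreadable frame. */
static void
step(int depth)
{
    if (depth == 3) {
        longjmp(env, 1);
    }
}

/* Counts how many steps completed before the longjmp().  'depth' changes
 * after setjmp() and is read after longjmp() returns control here, so it
 * must be volatile; without the qualifier its value after the jump is
 * indeterminate, which is what -Wclobbered warns about. */
static int
count_steps(void)
{
    volatile int depth = 0;

    if (setjmp(env) == 0) {
        for (;;) {
            step(depth);
            depth++;
        }
    }
    return depth;
}
```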
* fatal-signal: Catch SIGSEGV and print backtrace. | William Tu | 2019-09-27 | 1 | -1/+50
    The patch catches the SIGSEGV signal and prints the backtrace using
    libunwind at the monitor daemon. This makes debugging easier when there
    is no debug symbol package or gdb installed on production systems.

    The patch works even when ovs-vswitchd is compiled without debug
    symbols (no -g option), because the object files still have function
    symbols. For example:

      |daemon_unix(monitor)|WARN|SIGSEGV detected, backtrace:
      |daemon_unix(monitor)|WARN|0x0000000000482752 <fatal_signal_handler+0x52>
      |daemon_unix(monitor)|WARN|0x00007fb4900734b0 <killpg+0x40>
      |daemon_unix(monitor)|WARN|0x00007fb49013974d <__poll+0x2d>
      |daemon_unix(monitor)|WARN|0x000000000052b348 <time_poll+0x108>
      |daemon_unix(monitor)|WARN|0x00000000005153ec <poll_block+0x8c>
      |daemon_unix(monitor)|WARN|0x000000000058630a <clean_thread_main+0x1aa>
      |daemon_unix(monitor)|WARN|0x00000000004ffd1d <ovsthread_wrapper+0x7d>
      |daemon_unix(monitor)|WARN|0x00007fb490b3b6ba <start_thread+0xca>
      |daemon_unix(monitor)|WARN|0x00007fb49014541d <clone+0x6d>
      |daemon_unix(monitor)|ERR|1 crashes: pid 122849 died, killed \
          (Segmentation fault), core dumped, restarting

    However, if the object files' symbols are stripped, we only get the
    _init function plus an offset. This is still useful when trying to see
    whether two bugs have the same root cause. Example:

      |daemon_unix(monitor)|WARN|SIGSEGV detected, backtrace:
      |daemon_unix(monitor)|WARN|0x0000000000482752 <_init+0x7d68a>
      |daemon_unix(monitor)|WARN|0x00007f5f7c8cf4b0 <killpg+0x40>
      |daemon_unix(monitor)|WARN|0x00007f5f7c99574d <__poll+0x2d>
      |daemon_unix(monitor)|WARN|0x000000000052b348 <_init+0x126280>
      |daemon_unix(monitor)|WARN|0x00000000005153ec <_init+0x110324>
      |daemon_unix(monitor)|WARN|0x0000000000407439 <_init+0x2371>
      |daemon_unix(monitor)|WARN|0x00007f5f7c8ba830 <__libc_start_main+0xf0>
      |daemon_unix(monitor)|WARN|0x0000000000408329 <_init+0x3261>
      |daemon_unix(monitor)|ERR|1 crashes: pid 106155 died, killed \
          (Segmentation fault), core dumped, restarting

    Most C library functions, for example printf() or fflush(), are not
    async-signal-safe, meaning that it is not safe to call them from a
    signal handler. To stay async-signal-safe, the handler only collects
    the stack info using libunwind, which is signal-safe, and issues a
    write() to the pipe, from which the monitor daemon reads and prints to
    ovs-vswitchd.log.

    Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/590503433
    Signed-off-by: William Tu <u9012063@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
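The handoff from the handler to the monitor can be sketched as follows, under several assumptions:

- The record layout `struct bt_record`, the cap `BT_MAX_FRAMES`, and the functions `bt_handler()` and `bt_install()` are hypothetical.
- No real stack walk is done, so libunwind is not used.
- The reader and the writer live in one process rather than in ovs-vswitchd and its monitor.

The point shown is that the handler builds a fixed-size record and passes it on with one write() to a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BT_MAX_FRAMES 8         /* Hypothetical cap on recorded frames. */

/* Hypothetical fixed-size record passed from the handler to the reader. */
struct bt_record {
    int signr;
    int n_frames;
    uintptr_t frames[BT_MAX_FRAMES];
};

/* Pipe writes of at most PIPE_BUF bytes are atomic, so a reader never sees
 * a record interleaved with another writer's data. */
_Static_assert(sizeof(struct bt_record) <= PIPE_BUF, "record too large");

static int bt_fds[2] = { -1, -1 };

/* Signal handler: builds the record on the stack and hands it off with a
 * single write(2).  A real implementation would fill 'frames' by walking
 * the stack; this sketch records no frames. */
static void
bt_handler(int signr)
{
    int saved_errno = errno;    /* write() may change errno. */
    struct bt_record rec = { .signr = signr, .n_frames = 0 };
    ssize_t r = write(bt_fds[1], &rec, sizeof rec);

    (void) r;
    errno = saved_errno;
}

/* Creates the pipe and installs bt_handler() for 'signr'.  Returns 0 or
 * -1. */
static int
bt_install(int signr)
{
    struct sigaction sa;

    if (pipe(bt_fds) != 0) {
        return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = bt_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signr, &sa, NULL);
}
```

The reading side is ordinary code outside signal context, so it is free to format and log the record however it likes.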
* lib: Move lib/poll-loop.h to include/openvswitch | Xiao Liang | 2017-11-03 | 1 | -1/+1
    Poll-loop is the core of the main loop implementation. It should be
    available in libopenvswitch.

    Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* json: Move from lib to include/openvswitch. | Terry Wilson | 2016-07-22 | 1 | -1/+1
    To easily allow both in- and out-of-tree building of the Python wrapper
    for the OVS JSON parser (e.g. w/ pip), move json.h to
    include/openvswitch. This also requires moving lib/{hmap,shash}.h.

    Both hmap.h and shash.h were #include-ing "util.h" even though the
    headers themselves did not use anything from there, but rather from
    include/openvswitch/util.h. Fixing that required including util.h in
    several C files, mostly due to OVS_NOT_REACHED and things like xmalloc.

    Signed-off-by: Terry Wilson <twilson@redhat.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* Move lib/type-props.h to include/openvswitch directory | Ben Warren | 2016-04-14 | 1 | -1/+1
    Signed-off-by: Ben Warren <ben@skyportsystems.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* lib: Move vlog.h to <openvswitch/vlog.h> | Thomas Graf | 2014-12-15 | 1 | -1/+1
    A new function vlog_insert_module() is introduced to avoid using
    list_insert() from the vlog.h header.

    Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* poll-loop: Create Windows event handles for sockets automatically. | Gurucharan Shetty | 2014-06-30 | 1 | -2/+9
    We currently have a poll_fd_wait_event(fd, wevent, events) function
    that is used at places common to Windows and Linux where we have to
    wait on sockets. On Linux, 'wevent' is always set as zero. On Windows,
    for sockets, when we send both 'fd' and 'wevent', we associate them
    with each other for 'events' and then wait on 'wevent'. Also on
    Windows, when we only send 'wevent' to this function, we would simply
    wait for all events for that 'wevent'.

    There is a disadvantage with this approach:

      * Windows clients need to create a 'wevent' and then pass it along.
        This means that at a lot of places where we create sockets, we are
        also forced to create a 'wevent'.

    With this commit, we pass the responsibility of creating a 'wevent' to
    poll_fd_wait() in the case of sockets. That way, a client using
    poll_fd_wait() is only concerned about sockets and not about 'wevents'.
    There is a potential disadvantage with this change in that we create
    events more often, and that may have a performance penalty. If that
    turns out to be the case, we will eventually need to create a pool of
    wevents that can be re-used.

    In Windows, there are cases where we want to wait on an event (not
    associated with any sockets) and then control it using functions like
    SetEvent() etc. For that purpose, introduce a new function
    poll_wevent_wait(). For this function, the client needs to create an
    event and then pass it along as an argument.

    Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
    Acked-By: Ben Pfaff <blp@nicira.com>
* ovs-vsctl.at: Workaround lack of 'kill -l' on Windows. | Gurucharan Shetty | 2014-06-26 | 1 | -0/+1
    Also, fflush(stderr) when we raise a signal. The test this commit is
    changing would fail otherwise.

    Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* daemon-windows: unlink pidfile before stopping the service. | Gurucharan Shetty | 2014-06-24 | 1 | -4/+3
    When an OVS daemon is configured to run as a Windows service and the
    service is stopped by calling service_stop(), the Windows services
    manager does not give enough time to do everything in the atexit
    handler. So call the exit handler directly from service_stop().

    Also add a test case for Windows services that checks for the
    termination of the service by looking at the pidfile cleaned up by the
    exit handler.

    Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* process: block signals while spawning child processes | Ansis Atteka | 2014-05-30 | 1 | -0/+18
    Between the fork() and execvp() calls in the process_start() function,
    the child and parent processes share the same file descriptors. This
    means that if the child process received a signal during this time
    interval, it could potentially write data to a shared file descriptor.

    One such example is the fatal signal handler: if the child process
    received a SIGTERM signal, it would write data into the pipe. A read
    event would then occur on the other end of the pipe, where the parent
    process is listening, and this would make the parent process
    incorrectly believe that it was the one that received SIGTERM. Also,
    since the parent process never reads data from this pipe, this bug
    would make the parent process consume 100% CPU by immediately waking up
    from the event loop.

    This patch helps avoid this problem by blocking signals until the
    child closes all its file descriptors.

    Signed-off-by: Ansis Atteka <aatteka@nicira.com>
    Reported-by: Suganya Ramachandran <suganyar@vmware.com>
    Issue: 1255110
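The following is a minimal sketch of the idea behind this fix, assuming a single-threaded caller. A multithreaded program would use pthread_sigmask() instead of sigprocmask(). The sketch blocks every signal before fork() and restores the mask in the parent immediately. The child restores its mask only after the inherited descriptors have been dealt with. `spawn_with_signals_blocked()` is hypothetical, and the child calls _exit() where real code would call execvp().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that runs 'child_setup' (for example,
 * closing inherited file descriptors) with every blockable signal blocked.
 * Returns the child's pid to the parent, or -1 on failure. */
static pid_t
spawn_with_signals_blocked(void (*child_setup)(void))
{
    sigset_t all, old;
    pid_t pid;

    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, &old) != 0) {
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        /* Child: a signal arriving now stays pending instead of running a
         * handler that could write to descriptors shared with the parent. */
        if (child_setup) {
            child_setup();
        }
        sigprocmask(SIG_SETMASK, &old, NULL);   /* Unblock first: the mask
                                                 * survives exec. */
        _exit(0);                               /* Real code: execvp(). */
    }

    /* Parent, or fork() failure: restore the original mask right away. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```

The parent reaps the child with waitpid() as usual.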
* lib: make wevent static | Andy Zhou | 2014-02-28 | 1 | -1/+1
    Fixed sparse non static symbol warning.

    Signed-off-by: Andy Zhou <azhou@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Gurucharan Shetty <gshetty@nicira.com>
* fatal-signal: SIGPIPE for Windows. | Gurucharan Shetty | 2014-02-26 | 1 | -0/+8
    Windows does not have a SIGPIPE. We ignore SIGPIPE for Linux. To
    compile on Windows, carve out a new function to ignore SIGPIPE on
    Linux.

    Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
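On POSIX systems, ignoring SIGPIPE turns a write to a pipe or socket that has no reader into an ordinary EPIPE error instead of terminating the process. The following sketch shows a platform-split helper. It uses the hypothetical name `ignore_sigpipe()` and assumes `_WIN32` as the platform test.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: on POSIX systems, ignore SIGPIPE so that writing to
 * a pipe or socket whose reader has gone away fails with EPIPE instead of
 * killing the process.  Windows has no SIGPIPE, so there it does nothing. */
static void
ignore_sigpipe(void)
{
#ifndef _WIN32
    signal(SIGPIPE, SIG_IGN);
#endif
}
```

After calling it once at startup, callers check write() results for EPIPE.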
* fatal-signal: Handle SIGINT for Windows. | Gurucharan Shetty | 2014-02-26 | 1 | -0/+15
    Ctrl+C signals are a special case for Windows and can be handled by
    registering a handler through the SetConsoleCtrlHandler() routine. This
    is only useful when we run directly on the console and not as services
    in the background. Once we get a Ctrl+C signal, we call the cleanup
    functions and then exit.

    One thing to know here is that the MinGW terminal handles the Ctrl+C
    signal differently (and looks a little buggy: I see it exiting the
    handler midway with some sort of timeout). So this implementation is
    only useful when run on a Windows terminal. Since we only use MinGW for
    compilation and eventually to run unit tests, it should be okay. (The
    unit tests would ideally use Windows services and not expect Ctrl+C.)

    Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* fatal-signal: Fatal signal handling for Windows. | Gurucharan Shetty | 2014-02-26 | 1 | -1/+28
    Windows does not have a SIGHUP or SIGALRM. It does have a SIGINT and
    SIGTERM. The documentation at MSDN says that SIGINT is not supported
    for Win32 applications because Win32 operating systems generate a new
    thread to specifically handle Ctrl+C.

    This commit handles SIGTERM for Windows. The documentation also states
    that nothing generates SIGTERM in Windows, but one can use
    raise(SIGTERM) to manage it. The idea for handling SIGTERM for Windows
    is just to have a placeholder if there is a need to raise() a signal
    for some other purpose.

    We use SIGALRM in timeval.c if we wake up from a sleep after
    'deadline'. For Windows, print an error message and then use SIGTERM.

    There is an atexit() function for Windows, so we can call cleanup
    functions during exit. An upcoming commit separately handles Ctrl+C so
    that we can call cleanup functions for that use case.

    Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* Rename NOT_REACHED to OVS_NOT_REACHED | Harold Lim | 2013-12-17 | 1 | -1/+1
    This allows util.h to be used by other libraries that have already
    defined NOT_REACHED.

    Signed-off-by: Harold Lim <haroldl@vmware.com>
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* Use "error-checking" mutexes in place of other kinds wherever possible. | Ben Pfaff | 2013-08-20 | 1 | -1/+1
    We've seen a number of deadlocks in the tree since thread safety was
    introduced. So far, all of these are self-deadlocks, that is, a single
    thread acquiring a lock and then attempting to re-acquire the same lock
    recursively. When this has happened, the process simply hung, and it
    was somewhat difficult to find the cause.

    POSIX "error-checking" mutexes check for this specific problem (and
    others). This commit switches from other types of mutexes to
    error-checking mutexes everywhere that we can, that is, everywhere that
    we're not using recursive mutexes. This ought to help find problems
    more quickly in the future.

    There might be performance advantages to other kinds of mutexes in some
    cases. However, the existing mutex type choices were just guesses, so
    I'd rather go for easy detection of errors until we know that other
    mutex types actually perform better in specific cases. Also, I did a
    quick microbenchmark of glibc mutex types on my host and found that the
    error checking mutexes weren't any slower than the other types, at
    least when the mutex is uncontended.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Ethan Jackson <ethan@nicira.com>
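Assuming POSIX threads, the following sketch shows how an error-checking mutex is created and how it reports the self-deadlock described above. `init_errorcheck_mutex()` is a hypothetical helper, not OVS's own mutex wrapper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initializes '*mutex' as a POSIX error-checking
 * mutex.  Such a mutex returns EDEADLK when its owner tries to lock it
 * again, instead of hanging, and EPERM when a thread unlocks a mutex it
 * does not hold.  Returns 0 or an errno value. */
static int
init_errorcheck_mutex(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int error = pthread_mutexattr_init(&attr);

    if (error) {
        return error;
    }
    error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!error) {
        error = pthread_mutex_init(mutex, &attr);
    }
    pthread_mutexattr_destroy(&attr);   /* Does not affect the mutex. */
    return error;
}
```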
* clang: Add annotations for thread safety check. | Ethan Jackson | 2013-07-30 | 1 | -17/+12
    This commit adds annotations for the thread safety check, which can be
    run by using the -Wthread-safety flag in clang.

    Co-authored-by: Alex Wang <alexw@nicira.com>
    Signed-off-by: Alex Wang <alexw@nicira.com>
    Signed-off-by: Ethan Jackson <ethan@nicira.com>
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* fatal-signal: Make thread-safe. | Ben Pfaff | 2013-07-23 | 1 | -9/+47
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* fatal-signal: Remove write-only variable fatal_signal_set. | Ben Pfaff | 2013-07-11 | 1 | -5/+0
    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Ed Maste <emaste@freebsd.org>
* Replace all uses of strerror() by ovs_strerror(), for thread safety. | Ben Pfaff | 2013-06-28 | 1 | -3/+3
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* signals: Make signal_name() thread-safe. | Ben Pfaff | 2013-06-05 | 1 | -1/+3
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* Replace most uses of assert by ovs_assert. | Ben Pfaff | 2013-01-16 | 1 | -2/+1
    This is a straight search-and-replace, except that I also removed
    #include <assert.h> from each file where there were no assert calls
    left.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Ethan Jackson <ethan@nicira.com>
* lib: Add xpipe_nonblocking helper | Ed Maste | 2012-09-28 | 1 | -3/+1
    Signed-off-by: Ed Maste <emaste@adaranet.com>
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* socket-util: New function xset_nonblocking(). | Ben Pfaff | 2012-07-18 | 1 | -2/+2
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* lib: Do not assume sig_atomic_t is int. | Ed Maste | 2012-06-26 | 1 | -2/+2
    On FreeBSD sig_atomic_t is long, which causes the comparison in
    fatal_signal_run to be true when no signal has been reported.

    Signed-off-by: Ed Maste <emaste@freebsd.org>
    Signed-off-by: Ben Pfaff <blp@nicira.com>
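The pitfall can be sketched as follows. The sketch assumes, as this commit and the SIG_ATOMIC_MAX commit below suggest, that the code uses SIG_ATOMIC_MAX as a "no signal yet" sentinel. `stored_sig_nr`, `record_signal()`, and `pending_signal()` are illustrative names. The fix is to keep the local copy in a sig_atomic_t rather than an int.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

/* SIG_ATOMIC_MAX serves as the "no signal recorded" sentinel. */
static volatile sig_atomic_t stored_sig_nr = SIG_ATOMIC_MAX;

/* Would be called from the signal handler. */
static void
record_signal(int sig_nr)
{
    stored_sig_nr = sig_nr;
}

/* Returns true and sets '*sig_nrp' if a signal was recorded.  The local copy
 * is a sig_atomic_t, not an int: where sig_atomic_t is wider than int (long
 * on FreeBSD, per the commit), converting SIG_ATOMIC_MAX to int does not
 * preserve its value, so the sentinel test would wrongly succeed. */
static bool
pending_signal(int *sig_nrp)
{
    sig_atomic_t sig_nr = stored_sig_nr;

    if (sig_nr != SIG_ATOMIC_MAX) {
        *sig_nrp = (int) sig_nr;
        return true;
    }
    return false;
}
```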
* fatal-signal: Log when terminating due to a fatal signal. | Ben Pfaff | 2012-05-14 | 1 | -1/+3
    This makes it easier to diagnose why and when a daemon exited.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
* Global replace of Nicira Networks. | Raju Subramanian | 2012-05-02 | 1 | -1/+1
    Replaced all instances of Nicira Networks(, Inc) with Nicira, Inc.

    Feature #10593

    Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* Add fallback definition of SIG_ATOMIC_MAX | Simon Horman | 2011-09-22 | 1 | -0/+6
    Android appears to lack SIG_ATOMIC_MAX, which is only used in
    fatal-signal.c. Observed when compiling using the Android NDK r6b
    (Android API level 13).

    Patch based on a suggestion by Ben Pfaff.
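If a platform's <stdint.h> lacks SIG_ATOMIC_MAX, the maximum can still be computed from the type itself. The following sketch uses a signed/unsigned formula of the kind found in gnulib's intprops.h. The macro names TYPE_IS_SIGNED and TYPE_MAX are hypothetical, and the sketch does not reproduce the definition the OVS patch actually adds.

```c
#include <limits.h>
#include <signal.h>
#include <stdint.h>

/* Hypothetical helpers: TYPE_IS_SIGNED(t) is 1 for signed integer types;
 * TYPE_MAX(t) is the largest value of integer type 't'.  For an N-bit
 * signed type, the second branch computes 2**(N-1) - 1 without overflow,
 * assuming the type has no padding bits. */
#define TYPE_IS_SIGNED(t) (!((t) 0 < (t) -1))
#define TYPE_MAX(t)                                                 \
    ((t) (!TYPE_IS_SIGNED(t)                                        \
          ? (t) -1                                                  \
          : ((((t) 1 << (sizeof(t) * CHAR_BIT - 2)) - 1) * 2 + 1)))

/* Fallback for platforms whose <stdint.h> lacks SIG_ATOMIC_MAX. */
#ifndef SIG_ATOMIC_MAX
#define SIG_ATOMIC_MAX TYPE_MAX(sig_atomic_t)
#endif
```

Because the macros use casts, they work in ordinary C expressions but not in `#if` directives.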
* Log anything that could prevent a daemon from starting. | Ben Pfaff | 2011-04-04 | 1 | -7/+4
    If a daemon doesn't start, we need to know why. Being able to
    consistently consult the log to find out is helpful.
* Convert shash users that don't use the 'data' value to sset instead. | Ben Pfaff | 2011-03-31 | 1 | -14/+10
    In each of the cases converted here, an shash was used simply to
    maintain a set of strings, with the shash_nodes' 'data' values set to
    NULL. This commit converts them to use sset instead.
* vlog: Make client supply semicolon for VLOG_DEFINE_THIS_MODULE. | Ben Pfaff | 2010-10-29 | 1 | -1/+1
    It's kind of odd for VLOG_DEFINE_THIS_MODULE to supply its own
    semicolon, so this commit switches to the more common form.
* treewide: Remove trailing whitespace | Joe Perches | 2010-08-30 | 1 | -1/+1
    Signed-off-by: Joe Perches <joe@perches.com>
    Acked-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: Jesse Gross <jesse@nicira.com>
* vlog: Introduce VLOG_DEFINE_THIS_MODULE for declaring vlog module in use. | Ben Pfaff | 2010-07-21 | 1 | -2/+2
    Adding a macro to define the vlog module in use adds a level of
    indirection, which makes it easier to change how the vlog module must
    be defined. A followup commit needs to do that, so getting these
    widespread changes out of the way first should make that commit easier
    to review.
* Simplify shash_find() followed by shash_add() into shash_add_once(). | Ben Pfaff | 2010-06-30 | 1 | -3/+1
    This is just a cleanup.
* Make fatal signals cause an exit more promptly in special cases. | Ben Pfaff | 2010-04-13 | 1 | -0/+11
    The fatal-signal library notices and records fatal signals (e.g.
    SIGTERM) and terminates the process on the next trip through
    poll_block(). But some special utilities do not always invoke
    poll_block() promptly, e.g. "ovs-ofctl monitor" does not call
    poll_block() as long as OpenFlow messages are available.

    But these special cases seem like they are all likely to call into
    functions that themselves block (those with "_block" in their names).
    So make a new rule that such functions should always call
    fatal_signal_run(), either directly or through poll_block(). This
    commit implements and documents that rule.

    Bug #2625.
* fatal-signal: Initialize library upon any call to public function. | Ben Pfaff | 2010-03-24 | 1 | -1/+5
    Not calling fatal_signal_init() means that the signal handlers don't
    get registered, so the process won't clean up on fatal signals.
    Furthermore, signal_fds[0] is then 0, which means that
    fatal_signal_wait() waits on stdin, so if you are testing a program
    interactively and accidentally type something on stdin, that program's
    CPU usage jumps to 100%.

    Since poll_block() calls fatal_signal_wait(), this seems like the most
    reliable solution.
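The "initialize on every public call" rule can be sketched as shown next. This 2010 commit predates the 2013 thread-safety work listed above, so the use of pthread_once() here is an assumption; a single-threaded program could use a static flag instead. `lib_init()`, `lib_wait()`, and `lib_run()` are hypothetical stand-ins for the library's public functions.

```c
#include <pthread.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_runs;           /* Counts executions of do_init(). */

/* One-time setup.  In a library like this one, it is where the signal
 * handlers would be installed and the notification pipe created; here it
 * only counts how often it runs. */
static void
do_init(void)
{
    init_runs++;
}

/* Called at the top of every public function, so no caller can forget to
 * initialize the library.  pthread_once() runs do_init() exactly once even
 * if several threads race to get here. */
static void
lib_init(void)
{
    pthread_once(&init_once, do_init);
}

/* Hypothetical public entry points. */
static void
lib_wait(void)
{
    lib_init();
    /* ... would register the pipe's read end with the poll loop ... */
}

static void
lib_run(void)
{
    lib_init();
    /* ... would check for a recorded signal ... */
}
```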
* Merge "master" into "next". | Ben Pfaff | 2010-02-11 | 1 | -2/+2
    The main change here is the need to update all of the uses of UNUSED
    in the next branch to OVS_UNUSED as it is now spelled on "master".
* Rename UNUSED macro to OVS_UNUSED to avoid naming conflict. | Ben Pfaff | 2010-02-11 | 1 | -2/+2
    Requested by Jean Tourrilhes <jt@hpl.hp.com>.
* fatal-signal: After fork, clear hooks instead of disabling them. | Ben Pfaff | 2010-01-15 | 1 | -28/+49
    Until now, fatal_signal_fork() has simply disabled all the fatal signal
    callback hooks. This worked fine, because a daemon process forked only
    once and the parent didn't do much before it exited.

    But upcoming commits will introduce a --monitor option, which requires
    processes to fork multiple times. Sometimes the parent process will
    fork, then run for a while, then fork again. It's not good to disable
    the hooks in the child process in such a case, because that prevents
    e.g. pidfiles from being removed at the child's exit.

    So this commit changes the semantics of fatal_signal_fork() to just
    clearing out hooks. After hooks are cleared, new hooks can be added and
    will be executed on process termination in the usual way.

    This commit also introduces a cancellation callback function so that a
    canceled hook can free resources.
* fatal-signal: Run signal hooks outside of actual signal handlers. | Jesse Gross | 2010-01-06 | 1 | -70/+57
    Rather than running signal hooks directly from the actual signal
    handler, simply record the fact that the signal occurred and run the
    hook next time around the poll loop. This allows significantly more
    freedom as to what can actually be done in the signal hooks.
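The following is a minimal sketch of moving the work out of the handler. It assumes that recording a single signal number is enough. The handler only stores the signal in a volatile sig_atomic_t, and a function called once per loop iteration does the real work. `record_handler()`, `run_deferred()`, and `install_record_handler()` are hypothetical. Waking a blocked poll loop would also need something like the signal pipe the library uses, which this sketch omits.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Signal recorded by the handler; 0 means none. */
static volatile sig_atomic_t pending_signr;

/* The handler does nothing but record which signal arrived. */
static void
record_handler(int signr)
{
    pending_signr = signr;
}

/* Called once per main-loop iteration, outside signal context, where any
 * library function may be used.  Returns the signal it handled, or 0.  A
 * library like this would typically run its cleanup hooks here and then
 * re-raise the signal with the default action; this sketch only clears the
 * flag. */
static int
run_deferred(void)
{
    int signr = pending_signr;

    if (signr) {
        pending_signr = 0;
        /* ... run hooks: unlink temporary files, log, and so on ... */
    }
    return signr;
}

/* Installs record_handler() for 'signr'.  Returns 0 or -1. */
static int
install_record_handler(int signr)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signr, &sa, NULL);
}
```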
* fatal-signal: Add clarifying comments. | Ben Pfaff | 2009-09-21 | 1 | -1/+11
    Suggested by Justin Pettit.
* fatal-signal: New function fatal_signal_unlink_file_now(). | Ben Pfaff | 2009-09-21 | 1 | -0/+18
    This is a helper function that combines two actions that callers
    commonly wanted. It will have an additional user in an upcoming commit.
* fatal-signal: Clean up code by using shash. | Ben Pfaff | 2009-09-21 | 1 | -15/+11
    This simplifies the code here and should speed it up, too, when there
    are lots of files to unlink on a fatal signal.
* Update primary code license to Apache 2.0. | Ben Pfaff | 2009-06-15 | 1 | -10/+10
* Import from old repository commit 61ef2b42a9c4ba8e1600f15bb0236765edc2ad45. (tag: v0.90.0) | Ben Pfaff | 2009-07-08 | 1 | -0/+253