[PATCH] Log message before aborting
Signed-off-by: Mike Crute <mike@crute.us> --- This is a really tiny patch but the behavior of aborting was very confusing to one of the Alpine Linux users and caused them to spend a lot of time creating and installing debug builds and tracing the BIRD process only to find a configuration error. Hopefully this patch will help to prevent such issues for future users while still providing proper debug info for developers. Original Alpine Issue: https://gitlab.alpinelinux.org/alpine/aports/-/issues/14408 Thanks! sysdep/unix/io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sysdep/unix/io.c b/sysdep/unix/io.c index 810e782d..57bed092 100644 --- a/sysdep/unix/io.c +++ b/sysdep/unix/io.c @@ -2129,7 +2129,7 @@ watchdog_sigalrm(int sig UNUSED) io_update_time(); /* We want core dump */ - abort(); + bug("Watchdog timer timed out."); } static inline void -- 2.39.0
On Mon, Jan 09, 2023 at 07:35:57PM -0800, Mike Crute via Bird-users wrote:
Signed-off-by: Mike Crute <mike@crute.us> ---
This is a really tiny patch but the behavior of aborting was very confusing to one of the Alpine Linux users and caused them to spend a lot of time creating and installing debug builds and tracing the BIRD process only to find a configuration error. Hopefully this patch will help to prevent such issues for future users while still providing proper debug info for developers.
Hi Unfortunately it is more complicated: 1) vlog() (which is internally caled from bug()) is not safe to call from a signal handler, as it internally takes a mutex, if the signal is received when the interrupted thread is already in mutex, it will deadlock. 2) This alarm is here to ensure that BIRD does not hang, if there is some issue withing logging (e.g. deadlock or long-term blocking on write), we could hang on the logging, circumventing the watchdog. 3) bug() has slightly different meaning, so it is not really matching here, but that is easily fixed by using log() and abort() separately. These issues could be fixed, but more intricate approach has to be used. -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
On Wed, Jan 11, 2023, at 08:36, Ondrej Zajicek wrote:
1) vlog() (which is internally caled from bug()) is not safe to call from a signal handler, as it internally takes a mutex, if the signal is received when the interrupted thread is already in mutex, it will deadlock.
2) This alarm is here to ensure that BIRD does not hang, if there is some issue withing logging (e.g. deadlock or long-term blocking on write), we could hang on the logging, circumventing the watchdog.
3) bug() has slightly different meaning, so it is not really matching here, but that is easily fixed by using log() and abort() separately.
These issues could be fixed, but more intricate approach has to be used.
Thanks Ondrej. I did not realize that vlog was unsafe in this context. It looks like the debug function is closer to a correct solution except that on error it will call bug and also fputs is not async safe. My goal is to present some log message immediately prior to abort to alert the user that this crash was on purpose. I think this would have helped our Alpine user find their configuration error faster rather than wasting time debugging the bird binary itself. Is this proposed log function a valid solution?: void watchdog_debug(const char *msg) { if (dbgf) { write(dbgf, msg, strlen(msg)); abort(); } } write is async-safe and there is no need for string formatting in this case. Is there any other complexity that I'm missing? ~mike
On Wed, Jan 11, 2023 at 11:51:12AM -0800, Michael Crute wrote:
On Wed, Jan 11, 2023, at 08:36, Ondrej Zajicek wrote:
1) vlog() (which is internally caled from bug()) is not safe to call from a signal handler, as it internally takes a mutex, if the signal is received when the interrupted thread is already in mutex, it will deadlock.
2) This alarm is here to ensure that BIRD does not hang, if there is some issue withing logging (e.g. deadlock or long-term blocking on write), we could hang on the logging, circumventing the watchdog.
3) bug() has slightly different meaning, so it is not really matching here, but that is easily fixed by using log() and abort() separately.
These issues could be fixed, but more intricate approach has to be used.
Thanks Ondrej. I did not realize that vlog was unsafe in this context. It looks like the debug function is closer to a correct solution except that on error it will call bug and also fputs is not async safe.
My goal is to present some log message immediately prior to abort to alert the user that this crash was on purpose. I think this would have helped our Alpine user find their configuration error faster rather than wasting time debugging the bird binary itself.
Is this proposed log function a valid solution?:
void watchdog_debug(const char *msg) { if (dbgf) { write(dbgf, msg, strlen(msg)); abort(); } }
That would work (with some minor modifications - abort() should be out of condition, dbgf is FILE *, not fd, fileno() is not async-safe, so we would need keep dbg_fd). The disadantage is it would not write to log file, but only to debug output (enabled with -d / -D option). If that is acceptable to you, i would apply necessary changes. -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Signed-off-by: Mike Crute <mike@crute.us> --- On Thu, Jan 12, 2023 at 03:32:52AM +0100, Ondrej Zajicek wrote:
That would work (with some minor modifications - abort() should be out of condition, dbgf is FILE *, not fd, fileno() is not async-safe, so we would need keep dbg_fd).
The disadantage is it would not write to log file, but only to debug output (enabled with -d / -D option). If that is acceptable to you, i would apply necessary changes.
I think it's okay that this only prints to the debug log file/stderr. I think a person would try debug mode first when they encouter a crash before doing more invasive debugging. Here's a revised patch that incorporates your notes. Thanks! sysdep/unix/io.c | 2 +- sysdep/unix/log.c | 25 +++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 1 deletion(-) diff --git a/sysdep/unix/io.c b/sysdep/unix/io.c index 810e782d..06930fd7 100644 --- a/sysdep/unix/io.c +++ b/sysdep/unix/io.c @@ -2129,7 +2129,7 @@ watchdog_sigalrm(int sig UNUSED) io_update_time(); /* We want core dump */ - abort(); + watchdog_debug("Watchdog timer timed out.\n"); } static inline void diff --git a/sysdep/unix/log.c b/sysdep/unix/log.c index 4e9df069..54614143 100644 --- a/sysdep/unix/log.c +++ b/sysdep/unix/log.c @@ -31,6 +31,7 @@ #include "lib/lists.h" #include "sysdep/unix/unix.h" +static int dbg_fd; static FILE *dbgf; static list *current_log_list; static char *current_syslog_name; /* NULL -> syslog closed */ @@ -324,6 +325,24 @@ debug(const char *msg, ...) va_end(args); } +/** + * watchdog_debug - async-safe write to debug output + * @msg: a string message + * + * This function prints the message @msg to the debugging output in a + * way that is async safe and can be used in signal handlers. No newline + * character is appended. It terminates the program after writing. + */ +void +watchdog_debug(const char *msg) +{ + if (dbgf) + { + write(dbg_fd, msg, strlen(msg)); + } + abort(); +} + static list * default_log_list(int initial, const char **syslog_name) { @@ -427,7 +446,10 @@ log_init_debug(char *f) if (!f) dbgf = NULL; else if (!*f) + { dbgf = stderr; + dbg_fd = STDERR_FILENO; + } else if (!(dbgf = fopen(f, "a"))) { /* Cannot use die() nor log() here, logging is not yet initialized */ @@ -435,5 +457,8 @@ log_init_debug(char *f) exit(1); } if (dbgf) + { setvbuf(dbgf, NULL, _IONBF, 0); + dbg_fd = fileno(dbgf); + } } -- 2.39.0
On Wed, Jan 11, 2023 at 08:02:23PM -0800, Mike Crute wrote:
I think it's okay that this only prints to the debug log file/stderr. I think a person would try debug mode first when they encouter a crash before doing more invasive debugging.
Here's a revised patch that incorporates your notes. Thanks!
Thanks, merged with minor changes: https://gitlab.nic.cz/labs/bird/-/commit/64a2b7aaa303be0b407508747bfc96c1c65... -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
participants (3)
-
Michael Crute -
Mike Crute -
Ondrej Zajicek