> Also see this in ALPHA at CERN.
> The elogd we use is this: https://bitbucket.org/ritt/elog/commits/44800a769b99599db7620779e2142b1161c694fc?at=master
Okey, found it. waitpid() in my_shell() is not protected against the periodic alarm signal. (UNIX signals are evil).
In the following log file, notice the entries that have "wait_status" of "-1". Those would have generated zombies ("defunct" processes).
Nov 12 03:43:05 alphacpc05 elogd[4809]: WAITPID pid 4873, wait_status 4873, errno 2 (No such file or directory), status 0, command "convert
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04-%d.png'"
Nov 12 03:43:05 alphacpc05 elogd[4809]: WAITPID pid 4880, wait_status 4880, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4890, wait_status 4890, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4896, wait_status -1, errno 4 (Interrupted system call), status 0, command "convert
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05-%d.png'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4896, wait_status 4896, errno 4 (Interrupted system call), status 0, command "convert
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05-%d.png'"
Nov 12 03:43:20 alphacpc05 elogd[4809]: WAITPID pid 4904, wait_status 4904, errno 4 (Interrupted system call), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:48 alphacpc05 elogd[4809]: WAITPID pid 4922, wait_status 4922, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:49 alphacpc05 elogd[4809]: WAITPID pid 4929, wait_status -1, errno 4 (Interrupted system call), status 1302603136, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:49 alphacpc05 elogd[4809]: WAITPID pid 4929, wait_status 4929, errno 4 (Interrupted system call), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:50 alphacpc05 elogd[4809]: WAITPID pid 4935, wait_status 4935, errno 2 (No such file or directory), status 0, command "convert
'/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06-%d.png'"
Nov 12 03:43:50 alphacpc05 elogd[4809]: WAITPID pid 4943, wait_status 4943, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06.pdf[0]'"
The following code is verified to not generate zombies, please apply it to the master branch of elog:
alphadaq.cern.ch:~/packages/elog> git diff
diff --git a/src/elogd.c b/src/elogd.c
index 277ba30..2d9a848 100755
--- a/src/elogd.c
+++ b/src/elogd.c
@@ -892,14 +892,25 @@ int my_shell(char *cmd, char *result, int size)
#ifdef OS_UNIX
pid_t child_pid;
- int fh, status, i;
+ int fh, status, i, wait_status;
char str[1024];
if ((child_pid = fork()) < 0)
return 0;
else if (child_pid > 0) {
/* parent process waits for child */
- waitpid(child_pid, &status, 0);
+
+ while (1) {
+ wait_status = waitpid(child_pid, &status, 0);
+
+ sprintf(str, "WAITPID pid %d, wait_status %d, errno %d (%s), status %d, command \"%s\"", child_pid, wait_status, errno, strerror(errno), status, cmd);
+ write_logfile(NULL, str);
+ eprintf("%s", str);
+
+ if (wait_status == -1 && errno == EINTR)
+ continue;
+ break;
+ }
/* read back result */
memset(result, 0, size);
diff --git a/src/git-revision.h b/src/git-revision.h
K.O. |