Demo Discussion
Forum Config Examples Contributions Vulnerabilities
  Discussion forum about ELOG  Not logged in ELOG logo
icon4.gif   Defunct daemons, posted by David Pilgram on Sat Mar 29 12:14:11 2014 
    icon2.gif   Re: Defunct daemons, posted by David Pilgram on Sat Mar 29 13:14:44 2014 
       icon2.gif   Re: Defunct daemons, posted by Konstantin Olchanski on Wed Nov 12 03:19:17 2014 
          icon2.gif   Re: Defunct daemons, posted by Konstantin Olchanski on Wed Nov 12 03:48:29 2014 
             icon2.gif   Re: Defunct daemons, posted by Stefan Ritt on Mon Nov 24 13:24:27 2014 
Message ID: 67714     Entry time: Wed Nov 12 03:48:29 2014     In reply to: 67713     Reply to this: 67722
Icon: Reply  Author: Konstantin Olchanski  Author Email: olchansk@triumf.ca 
Category: Bug report  OS: Linux  ELOG Version: 2.9.2-a738 
Subject: Re: Defunct daemons 
> Also see this in ALPHA at CERN.
> The elogd we use is this: https://bitbucket.org/ritt/elog/commits/44800a769b99599db7620779e2142b1161c694fc?at=master

Okey, found it. waitpid() in my_shell() is not protected against the periodic alarm signal. (UNIX signals are evil).

In the following log file, notice the entries that have "wait_status" of "-1". Those would have generated zombies ("defunct" processes).

Nov 12 03:43:05 alphacpc05 elogd[4809]: WAITPID pid 4873, wait_status 4873, errno 2 (No such file or directory), status 0, command "convert  
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04-%d.png'"
Nov 12 03:43:05 alphacpc05 elogd[4809]: WAITPID pid 4880, wait_status 4880, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4890, wait_status 4890, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4896, wait_status -1, errno 4 (Interrupted system call), status 0, command "convert  
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05-%d.png'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4896, wait_status 4896, errno 4 (Interrupted system call), status 0, command "convert  
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05-%d.png'"
Nov 12 03:43:20 alphacpc05 elogd[4809]: WAITPID pid 4904, wait_status 4904, errno 4 (Interrupted system call), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:48 alphacpc05 elogd[4809]: WAITPID pid 4922, wait_status 4922, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:49 alphacpc05 elogd[4809]: WAITPID pid 4929, wait_status -1, errno 4 (Interrupted system call), status 1302603136, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:49 alphacpc05 elogd[4809]: WAITPID pid 4929, wait_status 4929, errno 4 (Interrupted system call), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:50 alphacpc05 elogd[4809]: WAITPID pid 4935, wait_status 4935, errno 2 (No such file or directory), status 0, command "convert  
'/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06-%d.png'"
Nov 12 03:43:50 alphacpc05 elogd[4809]: WAITPID pid 4943, wait_status 4943, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06.pdf[0]'"

The following code is verified to not generate zombies, please apply it to the master branch of elog:

alphadaq.cern.ch:~/packages/elog> git diff
diff --git a/src/elogd.c b/src/elogd.c
index 277ba30..2d9a848 100755
--- a/src/elogd.c
+++ b/src/elogd.c
@@ -892,14 +892,25 @@ int my_shell(char *cmd, char *result, int size)
 
 #ifdef OS_UNIX
    pid_t child_pid;
-   int fh, status, i;
+   int fh, status, i, wait_status;
    char str[1024];
 
    if ((child_pid = fork()) < 0)
       return 0;
    else if (child_pid > 0) {
       /* parent process waits for child */
-      waitpid(child_pid, &status, 0);
+
+      while (1) {
+         wait_status = waitpid(child_pid, &status, 0);
+
+         sprintf(str, "WAITPID pid %d, wait_status %d, errno %d (%s), status %d, command \"%s\"", child_pid, wait_status, errno, strerror(errno), status, cmd);
+         write_logfile(NULL, str);
+         eprintf("%s", str);
+
+         if (wait_status == -1 && errno == EINTR)
+            continue;
+         break;
+      }
 
       /* read back result */
       memset(result, 0, size);
diff --git a/src/git-revision.h b/src/git-revision.h

K.O.
ELOG V3.1.5-3fb85fa6