Discussion forum about ELOG
ID   Date   Icon   Author   Author Email   Category   OS   ELOG Version   Subject
  67714   Wed Nov 12 03:48:29 2014   Reply   Konstantin Olchanski   olchansk@triumf.ca   Bug report   Linux   2.9.2-a738   Re: Defunct daemons
> Also see this in ALPHA at CERN.
> The elogd we use is this: https://bitbucket.org/ritt/elog/commits/44800a769b99599db7620779e2142b1161c694fc?at=master

Okay, found it. waitpid() in my_shell() is not protected against the periodic alarm signal. (UNIX signals are evil.)

In the following log file, notice the entries with a "wait_status" of "-1": waitpid() was interrupted (errno 4, EINTR) before it could reap the child. Without a retry, each of those would have left a zombie ("defunct" process).

Nov 12 03:43:05 alphacpc05 elogd[4809]: WAITPID pid 4873, wait_status 4873, errno 2 (No such file or directory), status 0, command "convert  
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04-%d.png'"
Nov 12 03:43:05 alphacpc05 elogd[4809]: WAITPID pid 4880, wait_status 4880, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4890, wait_status 4890, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4896, wait_status -1, errno 4 (Interrupted system call), status 0, command "convert  
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05-%d.png'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4896, wait_status 4896, errno 4 (Interrupted system call), status 0, command "convert  
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05-%d.png'"
Nov 12 03:43:20 alphacpc05 elogd[4809]: WAITPID pid 4904, wait_status 4904, errno 4 (Interrupted system call), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:48 alphacpc05 elogd[4809]: WAITPID pid 4922, wait_status 4922, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:49 alphacpc05 elogd[4809]: WAITPID pid 4929, wait_status -1, errno 4 (Interrupted system call), status 1302603136, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:49 alphacpc05 elogd[4809]: WAITPID pid 4929, wait_status 4929, errno 4 (Interrupted system call), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:50 alphacpc05 elogd[4809]: WAITPID pid 4935, wait_status 4935, errno 2 (No such file or directory), status 0, command "convert  
'/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06-%d.png'"
Nov 12 03:43:50 alphacpc05 elogd[4809]: WAITPID pid 4943, wait_status 4943, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h' 
'/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06.pdf[0]'"

The following patch is verified not to generate zombies; please apply it to the master branch of elog:

alphadaq.cern.ch:~/packages/elog> git diff
diff --git a/src/elogd.c b/src/elogd.c
index 277ba30..2d9a848 100755
--- a/src/elogd.c
+++ b/src/elogd.c
@@ -892,14 +892,25 @@ int my_shell(char *cmd, char *result, int size)
 
 #ifdef OS_UNIX
    pid_t child_pid;
-   int fh, status, i;
+   int fh, status, i, wait_status;
    char str[1024];
 
    if ((child_pid = fork()) < 0)
       return 0;
    else if (child_pid > 0) {
       /* parent process waits for child */
-      waitpid(child_pid, &status, 0);
+
+      while (1) {
+         wait_status = waitpid(child_pid, &status, 0);
+
+         sprintf(str, "WAITPID pid %d, wait_status %d, errno %d (%s), status %d, command \"%s\"", child_pid, wait_status, errno, strerror(errno), status, cmd);
+         write_logfile(NULL, str);
+         eprintf("%s", str);
+
+         if (wait_status == -1 && errno == EINTR)
+            continue;
+         break;
+      }
 
       /* read back result */
       memset(result, 0, size);
diff --git a/src/git-revision.h b/src/git-revision.h

K.O.
  67722   Mon Nov 24 13:24:27 2014   Reply   Stefan Ritt   stefan.ritt@psi.ch   Bug report   Linux   2.9.2-a738   Re: Defunct daemons
> Okay, found it. waitpid() in my_shell() is not protected against the periodic alarm signal. (UNIX signals are evil.)

Acknowledged. Thanks for the fix. I added it to the development branch.

/Stefan
  68729   Thu Feb 1 03:12:03 2018   Question   Yves   vanhaarlemyves@gmail.com   Question   Windows   2.9.2->3.1.3   v3.1.3 does not work with logbooks from v2.9.2?

I have just upgraded elog from 2.9.2 -> 3.1.3.

3.1.3 runs fine with new logbooks. However, when trying to run 3.1.3 with my logbooks created with 2.9.2 things stop working.

Here is the command I run for testing [attachment 1]. First of all, it takes a very long time (~10 minutes) to index the logbooks. Once indexing has finished, I try it out in a web browser, but the page never loads: no error message appears, but no logbook either. After an hour or so elogd crashes without an error message.

When running 2.9.2 on the same machine, everything runs well (attachment 2).

 

cfg file: (I only left in one logbook; they are all configured the same)

[global]
port = 18080
Logging level = 3
Max content length = 500000000
Date format = %A, %d %B %Y


[Logrun - Amptek]
Theme = default
Comment = Logrun Amptec
Reverse sort = 0
Quick filter = Date, Type

 

Any ideas on how to solve this?

  68730   Thu Feb 1 10:14:55 2018   Reply   Andreas Luedeke   andreas.luedeke@psi.ch   Question   Windows   2.9.2->3.1.3   Re: v3.1.3 does not work with logbooks from v2.9.2?
Hi Yves,
just my two pence; maybe it helps you figure out what's going on:
Versions 2.* kept all entries of one logbook in one directory. Versions 3.* create a subdirectory for each year. This was added for me: if you use AFS for logbook storage, there is a limit on how many files you can put into a single directory.
So the first time you start elogd 3.* with data from elogd 2.*, it should move all your logbook entries into subdirectories for each year. If that had happened, you would no longer be able to use these logbook directories with version 2.9.2.
Maybe your logbook client is not allowed to create subdirectories? Although I would guess that it would then just throw an error message and stop.
Cheers, Andreas

  68733   Fri Feb 2 00:02:54 2018   Reply   Yves   vanhaarlemyves@gmail.com   Question   Windows   2.9.2->3.1.3   Re: v3.1.3 does not work with logbooks from v2.9.2? - solved

Hi Andreas,

Thanks, you pointed me in the right direction. It appears that my logbooks were a combination of the two versions: I had all the year directories (version 3) but also all the entry files in the main logbook directory. It seems version 2 does not care, but version 3 does not like it. After carefully checking and removing all the logbook files from the main directory, version 3 now works.

Cheers,

  Yves


  67602   Tue Nov 5 23:21:52 2013   Question   A.G. Schubert   alexis4@stanford.edu   Bug report   Mac OSX   2.9.2-2494   Compilation failure on Mac OSX 10.9

When compiling elog on OSX 10.9 (Mavericks), I get the error below.

Elog will compile without error if I add -D_FORTIFY_SOURCE=0 to CFLAGS in Makefile, but I'm not sure whether this is a good idea.

 

$ make
cc -O3 -funroll-loops -fomit-frame-pointer -W -Wall  -I../mxml  -DHAVE_SSL -w -c -o crypt.o src/crypt.c
cc -O3 -funroll-loops -fomit-frame-pointer -W -Wall  -I../mxml  -DHAVE_SSL -o elog src/elog.c crypt.o -lssl
src/elog.c:125:8: error: expected parameter declarator
size_t strlcpy(char *dst, const char *src, size_t size)
       ^
/usr/include/secure/_string.h:105:44: note: expanded from macro 'strlcpy'
  __builtin___strlcpy_chk (dest, src, len, __darwin_obsz (dest))
                                           ^
/usr/include/secure/_common.h:39:62: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
                                                             ^
/usr/include/secure/_common.h:30:32: note: expanded from macro '_USE_FORTIFY_LEVEL'
#    define _USE_FORTIFY_LEVEL 2
                               ^
src/elog.c:125:8: error: expected ')'
/usr/include/secure/_string.h:105:44: note: expanded from macro 'strlcpy'
  __builtin___strlcpy_chk (dest, src, len, __darwin_obsz (dest))
                                           ^
/usr/include/secure/_common.h:39:62: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
                                                             ^
/usr/include/secure/_common.h:30:32: note: expanded from macro '_USE_FORTIFY_LEVEL'
#    define _USE_FORTIFY_LEVEL 2
                               ^
src/elog.c:125:8: note: to match this '('
/usr/include/secure/_string.h:105:44: note: expanded from macro 'strlcpy'
  __builtin___strlcpy_chk (dest, src, len, __darwin_obsz (dest))
                                           ^
/usr/include/secure/_common.h:39:53: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
                                                    ^

  67603   Wed Nov 6 09:04:32 2013   Reply   Stefan Ritt   stefan.ritt@psi.ch   Bug report   Mac OSX   2.9.2-2494   Re: Compilation failure on Mac OSX 10.9

A.G. Schubert wrote:
> When compiling elog on OSX 10.9 (Mavericks), I get the error below.
> Elog will compile without error if I add -D_FORTIFY_SOURCE=0 to CFLAGS in Makefile, but I'm not sure whether this is a good idea.

All of a sudden gcc comes with its own version of "strlcpy", which I had defined "manually" inside ELOG for many years. Using -D_FORTIFY_SOURCE=0 will do no harm, so you can use it. The "real" solution is to take out ELOG's own strlcpy/strlcat, which I did in the current SVN version.

Best regards,
Stefan 
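
The compiler output above shows why the build breaks: the system header defines strlcpy as a macro, so a function definition with the same name gets macro-expanded into something that is not a valid declaration. For code that still has to carry its own fallback, one way around this, sketched here only as an illustration and not as what was committed to ELOG, is to give the fallback a name that cannot clash. The name elog_strlcpy() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback under a private name, so a system-provided
   strlcpy macro or function cannot collide with it. Copies at most
   size - 1 bytes, always NUL-terminates when size > 0, and returns
   strlen(src) so callers can detect truncation. */
static size_t elog_strlcpy(char *dst, const char *src, size_t size)
{
   size_t len = strlen(src);

   if (size > 0) {
      size_t n = (len >= size) ? size - 1 : len;

      memcpy(dst, src, n);
      dst[n] = '\0';
   }

   return len;
}
```

A return value greater than or equal to the buffer size means the copy was truncated.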

  67605   Thu Nov 7 02:18:17 2013   Reply   A.G. Schubert   alexis4@stanford.edu   Bug report   Mac OSX   2.9.2-2494   Re: Compilation failure on Mac OSX 10.9

Ok, I tried updating my SVN working copy, but I didn't get any updates past elog rev. 2494, mxml rev. 74.  I undid my changes to Makefile, tried to compile, but got the same errors.  

I then pulled down elog and mxml with git, and these are working for me with no errors.  Thanks!
