ID |
Date |
Icon |
Author |
Author Email |
Category |
OS |
ELOG Version |
Subject |
67714
|
Wed Nov 12 03:48:29 2014 |
| Konstantin Olchanski | olchansk@triumf.ca | Bug report | Linux | 2.9.2-a738 | Re: Defunct daemons | > Also see this in ALPHA at CERN.
> The elogd we use is this: https://bitbucket.org/ritt/elog/commits/44800a769b99599db7620779e2142b1161c694fc?at=master
OK, found it. waitpid() in my_shell() is not protected against the periodic alarm signal, so it can return -1 with EINTR before the child has been reaped. (UNIX signals are evil.)
In the following log file, notice the entries that have a "wait_status" of "-1". Without the retry, each of those would have left a zombie (a "defunct" process).
Nov 12 03:43:05 alphacpc05 elogd[4809]: WAITPID pid 4873, wait_status 4873, errno 2 (No such file or directory), status 0, command "convert
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04-%d.png'"
Nov 12 03:43:05 alphacpc05 elogd[4809]: WAITPID pid 4880, wait_status 4880, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4890, wait_status 4890, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4896, wait_status -1, errno 4 (Interrupted system call), status 0, command "convert
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05-%d.png'"
Nov 12 03:43:19 alphacpc05 elogd[4809]: WAITPID pid 4896, wait_status 4896, errno 4 (Interrupted system call), status 0, command "convert
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05-%d.png'"
Nov 12 03:43:20 alphacpc05 elogd[4809]: WAITPID pid 4904, wait_status 4904, errno 4 (Interrupted system call), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:48 alphacpc05 elogd[4809]: WAITPID pid 4922, wait_status 4922, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034304_xvthr04.pdf[0]'"
Nov 12 03:43:49 alphacpc05 elogd[4809]: WAITPID pid 4929, wait_status -1, errno 4 (Interrupted system call), status 1302603136, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:49 alphacpc05 elogd[4809]: WAITPID pid 4929, wait_status 4929, errno 4 (Interrupted system call), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034318_xvthr05.pdf[0]'"
Nov 12 03:43:50 alphacpc05 elogd[4809]: WAITPID pid 4935, wait_status 4935, errno 2 (No such file or directory), status 0, command "convert
'/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06.pdf[0-7]' -thumbnail '600' '/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06-%d.png'"
Nov 12 03:43:50 alphacpc05 elogd[4809]: WAITPID pid 4943, wait_status 4943, errno 2 (No such file or directory), status 0, command "identify -format '%wx%h'
'/home/alpha/online/elog/logbooks/test/141112_034348_xvthr06.pdf[0]'"
The following patch is verified not to generate zombies. Please apply it to the master branch of elog:
alphadaq.cern.ch:~/packages/elog> git diff
diff --git a/src/elogd.c b/src/elogd.c
index 277ba30..2d9a848 100755
--- a/src/elogd.c
+++ b/src/elogd.c
@@ -892,14 +892,25 @@ int my_shell(char *cmd, char *result, int size)
#ifdef OS_UNIX
pid_t child_pid;
- int fh, status, i;
+ int fh, status, i, wait_status;
char str[1024];
if ((child_pid = fork()) < 0)
return 0;
else if (child_pid > 0) {
/* parent process waits for child */
- waitpid(child_pid, &status, 0);
+
+ while (1) {
+ wait_status = waitpid(child_pid, &status, 0);
+
+ sprintf(str, "WAITPID pid %d, wait_status %d, errno %d (%s), status %d, command \"%s\"", child_pid, wait_status, errno, strerror(errno), status, cmd);
+ write_logfile(NULL, str);
+ eprintf("%s", str);
+
+ if (wait_status == -1 && errno == EINTR)
+ continue;
+ break;
+ }
/* read back result */
memset(result, 0, size);
diff --git a/src/git-revision.h b/src/git-revision.h
K.O. |
67722
|
Mon Nov 24 13:24:27 2014 |
| Stefan Ritt | stefan.ritt@psi.ch | Bug report | Linux | 2.9.2-a738 | Re: Defunct daemons | > Okey, found it. waitpid() in my_shell() is not protected against the periodic alarm signal. (UNIX signals are evil).
Acknowledged. Thanks for the fix. I added it to the development branch.
/Stefan |
68729
|
Thu Feb 1 03:12:03 2018 |
| Yves | vanhaarlemyves@gmail.com | Question | Windows | 2.9.2->3.1.3 | v3.1.3 does not work with logbooks from v2.9.2? | I have just upgraded elog from 2.9.2 -> 3.1.3.
3.1.3 runs fine with new logbooks. However, when trying to run 3.1.3 with my logbooks created with 2.9.2 things stop working.
Here is the command I run for testing [attachment 1]. First of all, it takes a very long time (~10 minutes) to index the logbooks. When indexing has finished, I try it out in a web browser, but the page never finishes loading: no error message appears, but no logbook either. After an hour or so elogd crashes without an error message.
When running 2.9.2 on the same machine, everything runs well (attachment 2).
cfg file: (I only left in one logbook - they are all configured the same)
[global]
port = 18080
Logging level = 3
Max content length = 500000000
Date format = %A, %d %B %Y
[Logrun - Amptek]
Theme = default
Comment = Logrun Amptec
Reverse sort = 0
Quick filter = Date, Type
Any ideas on how to solve this? |
68730
|
Thu Feb 1 10:14:55 2018 |
| Andreas Luedeke | andreas.luedeke@psi.ch | Question | Windows | 2.9.2->3.1.3 | Re: v3.1.3 does not work with logbooks from v2.9.2? | Hi Yves,
Just my two pence; maybe it helps you figure out what's going on:
Versions 2.* kept all entries of a logbook in one directory. Versions 3.* create a subdirectory for each year. This was added for me: if you use AFS for logbook storage, there is a limit on how many files you can put into a single directory.
So the first time you start elogd 3.* with data from elogd 2.*, it should move all your logbook entries into subdirectories, one per year. If that had happened, you would no longer be able to use these logbook directories with version 2.9.2.
Maybe your logbook client is not allowed to create subdirectories? Although I would guess that it would then just throw an error message and stop.
Cheers, Andreas
|
68733
|
Fri Feb 2 00:02:54 2018 |
| Yves | vanhaarlemyves@gmail.com | Question | Windows | 2.9.2->3.1.3 | Re: v3.1.3 does not work with logbooks from v2.9.2? - solved | Hi Andreas,
Thanks - you pointed me in the right direction. It appears that my logbooks were a combination of the two layouts: I had all the year directories (version 3) but also all the entry files in the main logbook directory. It seems version 2 does not care, but version 3 does not like it. After carefully checking and removing all the entry files from the main directory, version 3 now works.
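A mixed layout like this can be spotted before starting version 3. The sketch below is only an illustration, not part of ELOG: scan_logbook_dir() is a hypothetical name, and it assumes that entry files end in ".log" and that the per-year subdirectories have four-digit names. It uses the POSIX directory API, so on Windows the same check would need a different directory-listing call.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical check (not part of ELOG): count "*.log" files that sit
   directly in a logbook directory and four-digit subdirectories next
   to them. Both naming patterns are assumptions.
   Returns 0 on success, -1 if the directory cannot be opened. */
static int scan_logbook_dir(const char *path, int *stray_logs, int *year_dirs)
{
   DIR *dir;
   struct dirent *de;
   struct stat st;
   char full[4096];

   *stray_logs = 0;
   *year_dirs = 0;

   dir = opendir(path);
   if (dir == NULL)
      return -1;

   while ((de = readdir(dir)) != NULL) {
      size_t n = strlen(de->d_name);

      snprintf(full, sizeof(full), "%s/%s", path, de->d_name);
      if (stat(full, &st) != 0)
         continue;

      if (S_ISDIR(st.st_mode)) {
         /* a year directory: exactly four decimal digits */
         if (n == 4 && isdigit((unsigned char) de->d_name[0])
             && isdigit((unsigned char) de->d_name[1])
             && isdigit((unsigned char) de->d_name[2])
             && isdigit((unsigned char) de->d_name[3]))
            (*year_dirs)++;
      } else if (S_ISREG(st.st_mode) && n > 4
                 && strcmp(de->d_name + n - 4, ".log") == 0) {
         /* an entry file left in the top-level directory */
         (*stray_logs)++;
      }
   }

   closedir(dir);
   return 0;
}
```

If both counters come back non-zero, the directory holds both layouts at once and is worth inspecting by hand before any files are removed.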
Cheers,
Yves
|
67602
|
Tue Nov 5 23:21:52 2013 |
| A.G. Schubert | alexis4@stanford.edu | Bug report | Mac OSX | 2.9.2-2494 | Compilation failure on Mac OSX 10.9 | When compiling elog on OSX 10.9 (Mavericks), I get the error below.
Elog will compile without error if I add -D_FORTIFY_SOURCE=0 to CFLAGS in Makefile, but I'm not sure whether this is a good idea.
$ make
cc -O3 -funroll-loops -fomit-frame-pointer -W -Wall -I../mxml -DHAVE_SSL -w -c -o crypt.o src/crypt.c
cc -O3 -funroll-loops -fomit-frame-pointer -W -Wall -I../mxml -DHAVE_SSL -o elog src/elog.c crypt.o -lssl
src/elog.c:125:8: error: expected parameter declarator
size_t strlcpy(char *dst, const char *src, size_t size)
^
/usr/include/secure/_string.h:105:44: note: expanded from macro 'strlcpy'
__builtin___strlcpy_chk (dest, src, len, __darwin_obsz (dest))
^
/usr/include/secure/_common.h:39:62: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
^
/usr/include/secure/_common.h:30:32: note: expanded from macro '_USE_FORTIFY_LEVEL'
# define _USE_FORTIFY_LEVEL 2
^
src/elog.c:125:8: error: expected ')'
/usr/include/secure/_string.h:105:44: note: expanded from macro 'strlcpy'
__builtin___strlcpy_chk (dest, src, len, __darwin_obsz (dest))
^
/usr/include/secure/_common.h:39:62: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
^
/usr/include/secure/_common.h:30:32: note: expanded from macro '_USE_FORTIFY_LEVEL'
# define _USE_FORTIFY_LEVEL 2
^
src/elog.c:125:8: note: to match this '('
/usr/include/secure/_string.h:105:44: note: expanded from macro 'strlcpy'
__builtin___strlcpy_chk (dest, src, len, __darwin_obsz (dest))
^
/usr/include/secure/_common.h:39:53: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
^ |
67603
|
Wed Nov 6 09:04:32 2013 |
| Stefan Ritt | stefan.ritt@psi.ch | Bug report | Mac OSX | 2.9.2-2494 | Re: Compilation failure on Mac OSX 10.9 |
A.G. Schubert wrote: |
When compiling elog on OSX 10.9 (Mavericks), I get the error below.
Elog will compile without error if I add -D_FORTIFY_SOURCE=0 to CFLAGS in Makefile, but I'm not sure whether this is a good idea.
|
All of a sudden gcc comes with its own version of "strlcpy", which I have defined "manually" inside ELOG for many years. Using -D_FORTIFY_SOURCE=0 will do no harm, so you can use it. The "real" solution is to take our ELOG's strlcpy/strlcat, which I did on the current SVN version.
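The error arises because the system header turns strlcpy into a macro, so a function definition with the same name no longer parses. One general way to avoid this kind of clash is to give a private copy a project-specific name. The sketch below illustrates that idea only; it is not the change made in ELOG, and elog_strlcpy_sketch() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, project-prefixed bounded copy with strlcpy semantics.
   Because the name is private, a system macro called strlcpy cannot
   rewrite this definition.
   Copies at most size-1 bytes, always NUL-terminates when size > 0,
   and returns strlen(src) so the caller can detect truncation. */
static size_t elog_strlcpy_sketch(char *dst, const char *src, size_t size)
{
   size_t len = strlen(src);

   if (size > 0) {
      size_t n = (len >= size) ? size - 1 : len;

      memcpy(dst, src, n);
      dst[n] = '\0';
   }

   return len;
}
```

A return value greater than or equal to the buffer size means the copy was truncated.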
Best regards,
Stefan |
67605
|
Thu Nov 7 02:18:17 2013 |
| A.G. Schubert | alexis4@stanford.edu | Bug report | Mac OSX | 2.9.2-2494 | Re: Compilation failure on Mac OSX 10.9 |
Stefan Ritt wrote: |
A.G. Schubert wrote: |
When compiling elog on OSX 10.9 (Mavericks), I get the error below.
Elog will compile without error if I add -D_FORTIFY_SOURCE=0 to CFLAGS in Makefile, but I'm not sure whether this is a good idea.
|
All of a sudden gcc comes with its own version of "strlcpy", which I have defined "manually" inside ELOG for many years. Using -D_FORTIFY_SOURCE=0 will do no harm, so you can use it. The "real" solution is to take our ELOG's strlcpy/strlcat, which I did on the current SVN version.
Best regards,
Stefan
|
Ok, I tried updating my SVN working copy, but I didn't get any updates past elog rev. 2494, mxml rev. 74. I undid my changes to Makefile, tried to compile, but got the same errors.
I then pulled down elog and mxml with git, and these are working for me with no errors. Thanks! |
|