Discussion forum about ELOG
ID | Date | Icon | Author | Author Email | Category | OS | ELOG Version | Subject
66189 | Wed Feb 4 18:46:58 2009 | Reply | Edmundo T Rodriguez | edrodrig@chpnet.org | Request | Linux | 2.7.5 | Re: frequent crashes on SL4
> ------
> 
> I plan on letting elogd create a core dump, but so far I haven't managed to change its cwd to a directory elog can write to.
> 
> Please let me know if there is any other information I can provide.  Any suggestions would be greatly appreciated.
> 
> Many thanks,
> Devin 

There are other debuggers ...

  Why don't you give them a try?

  example: Install "strace" (if you don't have it) and do something like ...

           strace   gdb /usr/local/sbin/elogd 6162  -debug 2>debug.out


   Also there is "ltrace", etc.
66190 | Wed Feb 4 19:34:35 2009 | Reply | Stefan Ritt | stefan.ritt@psi.ch | Request | Linux | 2.7.5 | Re: frequent crashes on SL4
> Hi, All.  Ever since upgrading from an old ELOG release on an aging windows machine to the latest version on Scientific Linux 4 (RHEL4), and 
> greatly increasing its use, we have seen frequent crashes of elogd.  This has become very disruptive to operations, and any help would be greatly 
> appreciated.  We are using Apache (running on the same machine as elogd) to secure ELOG using https as per the Administrator's Guide.

Just follow

https://midas.psi.ch/elog/faq.html#19

Crashes with attached images are getting reported more and more these days, but so far I was not able to reproduce them. Maybe it's related to ImageMagick 
somehow, in which case disabling this feature might give some insight. To do so, you have to modify elogd.c and recompile. Change

   /* check for ImageMagick */
   my_shell("convert -version", str, sizeof(str));
   image_magick_exist = (strstr(str, "ImageMagick") != NULL);


to

   /* check for ImageMagick */
   image_magick_exist = 0;
66191 | Wed Feb 4 21:41:46 2009 | Reply | Devin Bougie | dab66@cornell.edu | Request | Linux | 2.7.5 | Re: frequent crashes on SL4
Hi Stefan,

> Just follow
> https://midas.psi.ch/elog/faq.html#19

That's what I attempted to do, but the need to restart ELOG before I could get to the gdb console prevented us from obtaining a stack trace.  I am now setting up a test ELOG server where we will continue 
trying to reproduce our crashes and obtain a stack trace.

> Crashes with attached images are getting reported more and more these days, but so far I was not able to reproduce them. Maybe it's related to ImageMagick 
> somehow, in which case disabling this feature might give some insight. To do so, you have to modify elogd.c and recompile. Change
> 
>    /* check for ImageMagick */
>    my_shell("convert -version", str, sizeof(str));
>    image_magick_exist = (strstr(str, "ImageMagick") != NULL);
> 
> 
> to
> 
>    /* check for ImageMagick */
>    image_magick_exist = 0;

This has now been done and installed on our production server.  I will let you know if we have any more crashes with ImageMagick disabled.

Many thanks,
Devin
66197 | Fri Feb 6 23:43:47 2009 | Reply | Devin Bougie | dab66@lepp.cornell.edu | Request | Linux | 2.7.5 | Re: frequent crashes on SL4
Hi Stefan,

The bad news is that elogd is still crashing even after disabling ImageMagick.  The good news is that this time it was reproducible and I did obtain a
stack trace using gdb.  In this instance, a user was attempting to edit an entry he had just successfully posted.  In the crash shown below, he just clicked
on "Edit" to edit the entry and then "Submit" without changing any text.  In the previous crash (that I don't have a stack trace for), he did actually try
to update the text of the entry.

Please let me know if there is any more information I can provide.

Many thanks,
Devin

------
[root@lnx248 ~]# gdb /usr/local/sbin/elogd 18720
GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /usr/local/sbin/elogd, process 18720
Reading symbols from /lib/libssl.so.4...(no debugging symbols found)...done.
Loaded symbols for /lib/libssl.so.4
Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /usr/lib/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/libresolv.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libcrypto.so.4...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.4
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nis.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
0x007ef7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0087663b in strlen () from /lib/tls/libc.so.6
(gdb) where
#0  0x0087663b in strlen () from /lib/tls/libc.so.6
#1  0x0804a4de in strieq ()
#2  0x636f6c2f in ?? ()
#3  0x00000012 in ?? ()
#4  0x00000003 in ?? ()
#5  0x62676f6c in ?? ()
#6  0x736b6f6f in ?? ()
#7  0xbff0a870 in ?? ()
#8  0x08051ddd in getcfg ()
#9  0xbff87340 in ?? ()
#10 0xbff87340 in ?? ()
#11 0x08051cfc in getcfg ()
#12 0x080c8e3a in __PRETTY_FUNCTION__.2 ()
#13 0xbff84100 in ?? ()
#14 0x00002710 in ?? ()
#15 0x00000000 in ?? ()
(gdb) bt
#0  0x0087663b in strlen () from /lib/tls/libc.so.6
#1  0x0804a4de in strieq ()
#2  0x636f6c2f in ?? ()
#3  0x00000012 in ?? ()
#4  0x00000003 in ?? ()
#5  0x62676f6c in ?? ()
#6  0x736b6f6f in ?? ()
#7  0xbff0a870 in ?? ()
#8  0x08051ddd in getcfg ()
#9  0xbff87340 in ?? ()
#10 0xbff87340 in ?? ()
#11 0x08051cfc in getcfg ()
#12 0x080c8e3a in __PRETTY_FUNCTION__.2 ()
#13 0xbff84100 in ?? ()
#14 0x00002710 in ?? ()
#15 0x00000000 in ?? ()
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/local/sbin/elogd, process 18720
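
Several of the "??" frames in the trace above hold values that look like text rather than code addresses. The helper below is a minimal sketch for checking this and is not part of elogd; the name word_to_ascii is hypothetical. It decodes a 32-bit word byte by byte in little-endian order, as it would sit in memory on this i386 machine.

```c
#include <ctype.h>
#include <stdint.h>

/* Decode a 32-bit word as it would appear in little-endian (i386) memory:
   lowest byte first. Non-printable bytes are shown as '.'.
   The caller supplies a buffer of at least 5 chars. */
void word_to_ascii(uint32_t word, char out[5])
{
   int i;

   for (i = 0; i < 4; i++) {
      unsigned char c = (unsigned char) ((word >> (8 * i)) & 0xFF);
      out[i] = isprint(c) ? (char) c : '.';
   }
   out[4] = '\0';
}
```

Applied to frames #2, #5 and #6 (0x636f6c2f, 0x62676f6c, 0x736b6f6f), this gives "/loc", "logb" and "ooks". One possible reading is that string data, perhaps a path, has overwritten part of the stack, but nothing in this thread confirms that.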
66198 | Sat Feb 7 01:47:07 2009 | Reply | Devin Bougie | dab66@lepp.cornell.edu | Request | Linux | 2.7.5 | Re: frequent crashes on SL4
> The bad news is that elogd is still crashing even after disabling ImageMagick.  The good news is that this time it was reproducible and I did obtain a
> stack trace using gdb.

The seg fault reported above seems to be related to a specific elog entry combined with sending email notifications.  When trying to submit changes to that entry while sending email notifications, elogd seg faults.  We can edit the entry before it and create and edit a new entry after it.  Trying to edit that one entry and send notifications, however, reliably crashes elogd.  If I check "suppress email notification", elogd does not seem to crash.  I have replicated this on separate elog servers running both SL4 and SL5.  To compare with SL4, here is the trace I see on SL5 (Scientific Linux 5).

Once I receive the OK, I will send you the actual file that stores the problematic entry.  Please let me know if there is anything else I can send you.

Many thanks,
Devin

------
[root@lnx767 ~]# gdb /usr/local/sbin/elogd 13328
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".

Attaching to program: /usr/local/sbin/elogd, process 13328
Reading symbols from /lib/libssl.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libssl.so.6
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libkrb5.so.3...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/libresolv.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libcrypto.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.6
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5support.so.0
Reading symbols from /lib/libkeyutils.so.1...
(no debugging symbols found)...done.
Loaded symbols for /lib/libkeyutils.so.1
Reading symbols from /lib/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /lib/libsepol.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libsepol.so.1
Reading symbols from /lib/libnss_files.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1

(no debugging symbols found)
0x008db402 in __kernel_vsyscall ()
(gdb) c
Continuing.
(no debugging symbols found)

Program received signal SIGSEGV, Segmentation fault.
0x0804a125 in strieq ()
(gdb) where
#0  0x0804a125 in strieq ()
#1  0x73207365 in ?? ()
#2  0x00000000 in ?? ()
66199 | Sat Feb 7 01:59:53 2009 | Reply | Devin Bougie | dab66@lepp.cornell.edu | Request | Linux | 2.7.5 | Re: frequent crashes on SL4
Hi Stefan,

I hope I'm not bombarding you, but we seem to be seeing crashes in two separate scenarios.  In addition to the crashes I previously reported (editing a problematic entry and sending notifications), we 
have a (seemingly) separate means of crashing elogd.

One of our users connects from home using a satellite network (bursty and high latency).  Any time he attempts to upload an image from this connection, elogd crashes.  He has not yet seen any 
problems when using the same computer on site.  Here are the stack traces I've obtained, first on SL4, then on SL5.  Again, please let me know if there is any other information I can provide.

Thank you very much for your time and effort.

Sincerely,
Devin

------
[root@lnx100 ~]# gdb /usr/local/sbin/elogd 3728
GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /usr/local/sbin/elogd, process 3728
Reading symbols from /lib/libssl.so.4...(no debugging symbols found)...done.
Loaded symbols for /lib/libssl.so.4
Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /usr/lib/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libkrb5.so.3...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/libresolv.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libcrypto.so.4...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.4
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nis.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
0x008c17a2 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x080af727 in decode_post ()
(gdb) where
#0  0x080af727 in decode_post ()
#1  0x00000000 in ?? ()
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/local/sbin/elogd, process 3728

----

[root@lnx767 ~]# gdb /usr/local/sbin/elogd 13111
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".

Attaching to program: /usr/local/sbin/elogd, process 13111
Reading symbols from /lib/libssl.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libssl.so.6
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libcrypto.so.6...
(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.6
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5support.so.0
Reading symbols from /lib/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libkeyutils.so.1
Reading symbols from /lib/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /lib/libsepol.so.1...
(no debugging symbols found)...done.
Loaded symbols for /lib/libsepol.so.1
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
(no debugging symbols found)
0x00308402 in __kernel_vsyscall ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x080b2f8a in decode_post ()
(gdb) where
#0  0x080b2f8a in decode_post ()
#1  0x00000100 in ?? ()
#2  0x00000000 in ?? ()
66200 | Sat Feb 7 06:26:48 2009 | Reply | Devin Bougie | dab66@lepp.cornell.edu | Request | Linux | 2.7.5 | Re: frequent crashes on SL4
Hi Stefan,

Just in case it helps, I am attaching the file for an entry in our demo logbook.  If I edit this entry and click on submit (without checking "suppress email notification"), I get the seg fault shown below.

If I delete any single character from the entry, I do not see any problems.  If I re-insert a character anywhere, the problem returns.

I hope this is useful.  

Thanks again,
Devin

------
[root@lnx100 ~]# gdb /usr/local/sbin/elogd 6040
GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /usr/local/sbin/elogd, process 6040
Reading symbols from /lib/libssl.so.4...(no debugging symbols found)...done.
Loaded symbols for /lib/libssl.so.4
Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /usr/lib/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libkrb5.so.3...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/libresolv.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libcrypto.so.4...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.4
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nis.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
0x008c17a2 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x080ad900 in interprete ()
(gdb) where
#0  0x080ad900 in interprete ()
#1  0x00000031 in ?? ()
#2  0x000005dc in ?? ()
#3  0x00000000 in ?? ()
Attachment 1: 090206a.log
$@MID@$: 570
Date: Fri, 06 Feb 2009 17:01:25 -0500
Author: Devin Bougie
Area: EXAMPLE
Type: EXAMP
Category: Beam Operation
Subject: beam running
Email: dab66@cornell.edu
Text: test
Record date: 1233939600
Attachment: 
Encoding: plain
========================================
test
ka
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaagggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaafrfff
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaahhhhhhhh
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaajjjjjjjj
aaaaaaaaaaaaa aaaaaaaaaaaaaffffffff
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaam
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
66206 | Thu Feb 12 17:13:05 2009 | Reply | Stefan Ritt | stefan.ritt@psi.ch | Request | Linux | 2.7.5 | Re: frequent crashes on SL4
Hi Devin,

First of all, your stack traces are of only limited use to me. This typically happens 
if you attach gdb to a running process, then you get something like

#0  0x080b2f8a in decode_post ()
#1  0x00000100 in ?? ()
#2  0x00000000 in ?? ()

(note the ??). If you run elogd directly from gdb, the stack trace contains much more information:

[meg@megon elog]# gdb elogd
...

(gdb) run
...
Server listening on port 8080 ...

Program received signal SIGINT, Interrupt.
0x0000003cb48c78d3 in __select_nocancel () from /lib64/libc.so.6
(gdb) where
#0  0x0000003cb48c78d3 in __select_nocancel () from /lib64/libc.so.6
#1  0x000000000046ea51 in server_loop () at src/elogd.c:27688
#2  0x0000000000471de8 in main (argc=1, argv=0x7fffe2b9bf18) at src/elogd.c:29018
(gdb) 

including the line numbers, arguments etc. So please try to start elogd from inside gdb 
and then reproduce your crash.

Your first problem seems to be related to some contents of your elogd.cfg, since in 
one stack dump I saw a 

strlen()
...
getcfg()

Here, the getcfg() function is called to retrieve some configuration from elogd.cfg. 
Maybe you have a very long line, or the file is otherwise corrupt. Please check that
carefully and send me your elogd.cfg so that I can have a look myself. Usually it helps
to remove one line after the other and check when the problem disappears.
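
As a quick way to check for the "very long line" case before removing lines by hand, a small scanner can report the first overlong line in a file. This is only a sketch: the function name first_long_line and the max_len threshold are hypothetical, and the thread does not say what line length, if any, elogd actually has trouble with.

```c
#include <stdio.h>

/* Return the 1-based number of the first line in fp that is longer than
   max_len characters (newline not counted), or 0 if no line is that long. */
long first_long_line(FILE *fp, size_t max_len)
{
   long line = 1;
   size_t len = 0;
   int c;

   while ((c = fgetc(fp)) != EOF) {
      if (c == '\n') {
         line++;        /* next line starts, reset the counter */
         len = 0;
      } else if (++len > max_len) {
         return line;   /* this line exceeds the limit */
      }
   }
   return 0;
}
```

A caller would open elogd.cfg with fopen(), pass the stream in with a limit of its choosing, and look at the reported line first.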

Your other problem which has the decode_post() in the stack dump seems to be related
to the case when you upload an entry (or attachment), and the TCP link breaks in 
the middle. Probably the error handling in such a case is not correct. I will try
to reproduce this, although I don't have a satellite network.
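
To illustrate the kind of error handling meant here, the sketch below reads a request body of known length and reports failure if the connection closes early. It is not taken from elogd; read_exact and its parameters are hypothetical, and the assumption is that the expected length comes from the request's Content-Length header.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly `want` bytes from fd into buf.
   Returns 0 on success, -1 if the peer closed the connection before the
   full body arrived or if a read error occurred. */
int read_exact(int fd, char *buf, size_t want)
{
   size_t got = 0;

   while (got < want) {
      ssize_t n = read(fd, buf + got, want - got);

      if (n > 0)
         got += (size_t) n;
      else if (n == 0)
         return -1;     /* link broke in the middle of the upload */
      else if (errno != EINTR)
         return -1;     /* real I/O error; EINTR is simply retried */
   }
   return 0;
}
```

With a check like this, a caller could drop an incomplete request instead of passing a partially filled buffer on to be decoded.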

Best regards,

  Stefan