I was finally able to catch the crash.
I paste the lldb output below.
It seems to be related to the 'first' logbook, which contains 115 entries and is displayed in 6 pages.
But I do not know how to go any further.
Any idea how to debug from here?
Thanks!
2019-04-30 12:32:27.602782+0200 elogd[19289:1908166] detected source and destination buffer overlap
Process 19289 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff7a1272c6 <+10>: jae 0x7fff7a1272d0 ; <+20>
0x7fff7a1272c8 <+12>: movq %rax, %rdi
0x7fff7a1272cb <+15>: jmp 0x7fff7a121457 ; cerror_nocancel
0x7fff7a1272d0 <+20>: retq
Target 0: (elogd) stopped.
(lldb) thread backtrace all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff7a1dcbf1 libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x00007fff7a0916a6 libsystem_c.dylib`abort + 127
frame #3: 0x00007fff7a091819 libsystem_c.dylib`abort_report_np + 177
frame #4: 0x00007fff7a0b5cb1 libsystem_c.dylib`__chk_fail + 48
frame #5: 0x00007fff7a0b5cc1 libsystem_c.dylib`__chk_fail_overlap + 16
frame #6: 0x00007fff7a0b5ce3 libsystem_c.dylib`__chk_overlap + 34
frame #7: 0x00007fff7a0b5d39 libsystem_c.dylib`__strlcpy_chk + 58
frame #8: 0x000000010006a7ac elogd`build_ref(ref="page6?&sort=Subject", size=256, mode="full", expand="", attach="", new_entries="") at elogd.c:19021:7
frame #9: 0x000000010006aaf6 elogd`show_page_filters(lbs=0x0000000102804308, n_msg=115, page_n=6, mode_commands=YES, mode="Summary") at elogd.c:19072:10
frame #10: 0x00000001000536b8 elogd`show_elog_list(lbs=0x0000000102804308, past_n=0, last_n=0, page_n=6, default_page=NO, info=0x0000000000000000) at elogd.c:21506:10
frame #11: 0x000000010008ee58 elogd`interprete(lbook="first", path="") at elogd.c:28543:7
frame #12: 0x000000010008f096 elogd`decode_get(logbook="first", string="?id") at elogd.c:28583:4
frame #13: 0x00000001000937fd elogd`process_http_request(request="GET /first?id=108&sort=Subject", i_conn=0) at elogd.c:29361:7
frame #14: 0x0000000100097744 elogd`server_loop at elogd.c:30375:20
frame #15: 0x000000010009a073 elogd`main(argc=3, argv=0x00007ffeefbffc20) at elogd.c:31403:4
frame #16: 0x00007fff79fec3d5 libdyld.dylib`start + 1
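For reference, the first log line and frames #4 to #7 show that the abort comes from the overlap check inside `__strlcpy_chk`, which was reached from `build_ref` at elogd.c:19021. `strlcpy` is not defined for overlapping source and destination buffers, and this libc aborts when it detects that case. What line 19021 actually does is not visible in the trace, so the following is only a minimal sketch of the general pattern, assuming the call copies part of a buffer back into the same buffer. `overlap_safe_copy` is a hypothetical helper name, not a function in elogd.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (NOT part of elogd): copy the string src into dst,
 * where dst can hold `size` bytes, even if the two buffers overlap.
 * Like strlcpy, it NUL-terminates the result when size > 0 and returns
 * strlen(src), so the caller can detect truncation (return >= size).
 */
size_t overlap_safe_copy(char *dst, const char *src, size_t size)
{
   /* Measure the source before anything is written to dst. */
   size_t len = strlen(src);

   if (size > 0) {
      /* Copy at most size - 1 bytes, leaving room for the terminator. */
      size_t n = (len < size - 1) ? len : size - 1;

      /* memmove, unlike strcpy/strlcpy/memcpy, allows overlapping ranges. */
      memmove(dst, src, n);
      dst[n] = '\0';
   }

   return len;
}
```

With a 256-byte buffer holding "page6?&sort=Subject" (the `ref` and `size` values shown in frame #8), calling `overlap_safe_copy(buf, buf + 5, sizeof(buf))` leaves "?&sort=Subject" in the buffer without tripping an overlap check. Whether this matches the intent of the code at line 19021 would have to be confirmed against the source.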
Stefan Ritt wrote: |
What you recommend is enough. Just make sure to compile elogd with the flags mentioned before, and when you get the segmentation violation, do a stack trace inside the debugger to learn where the fault happened. Maybe also print the contents of some variables at the current location.
Stefan
Alessio Sarti wrote: |
Thanks for the prompt feedback.
a) I confirm that the problem also shows up when running elog interactively with elogd -p 8080.
b) I am trying to catch the exit using lldb on the Mac. I hope to be able to give you some feedback on that next week (I do not have easy access to the server).
c) What is the clean, recommended way to port everything to the Linux machine and debug there? I would do the following: download and install elog on a Linux server, copy everything that now lives under /usr/local/elog on the Mac to the Linux server, and start elog. Is this OK, or is there anything else that I need to copy from the Mac server to be sure of having the same environment?
Thanks again.
Alessio
Stefan Ritt wrote: |
We typically see this kind of behavior if some elog entry is corrupt. After a few hours you might access this corrupt entry by accident, and then the server stops. If, however, you see this behavior on a fresh logbook with no corrupt entries, then the problem must lie somewhere else.
Do you see the same problem running under linux?
Do you see the same problem if you run elogd interactively (not through launchd)?
If you run elogd inside a debugger (like gdb or lldb), what does the debugger tell you when it crashes and you show the stack frames? Make sure to compile with -O0 and -g flags to include debug information in the executable.
Stefan
Alessio Sarti wrote: |
Dear all.
I am running elog
elogd 3.1.4, revision ead6bbc6
on macOS Mojave
Darwin arpg-serv.ing2.uniroma1.it 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64
I managed to compile and run the elog source code without problems.
It starts properly at boot time. For a few hours after the server boots, elog is available at http://arpg-serv.ing2.uniroma1.it/elog, but then the service stops and elog is no longer accessible.
So far I have been able to trace the problem only as far as the
/var/log/system.log
file, in which I find an unhelpful error message:
E.g.: Apr 23 14:00:46 arpg-serv com.apple.xpc.launchd[1] (ch.psi.elogd[85248]): Service exited with abnormal code: 1
I do not know how I can debug this, nor why the code runs for a few hours without problems. I just re-downloaded the code from scratch today, unloaded and then re-loaded the daemon, but it still fails with the same error.
I am sure that I can get it running again for a few hours by rebooting, but I want to understand the source of the problem. Can anyone help with this long-standing issue?
Thanks