ID |
Date |
Icon |
Author |
Author Email |
Category |
OS |
ELOG Version |
Subject |
68785
|
Sun Apr 15 08:03:21 2018 |
| Michael Kelsey | kelsey@slac.stanford.edu | Bug report | Mac OSX | 3.1.3 | Re: "Slow script" problem posting/editing from Safari -- browser hangs, times out | Thank you for your suggestion, Stefan! The sysadmin who handles our e-Log server implemented your suggestion earlier today (Saturday). I have been able to successfully create and modify e-Log entries with Safari since then. Since the "slow script" issue has been intermittent in the past, I plan to continue testing and monitoring for the next day or so. Nevertheless, it appears that removing the waiting loop has alleviated my problem.
-- Mike Kelsey
Followup Sunday 15 Apr (U.S. Pacific time): I think your suggestion has solved my problem. I've been able to create and modify e-Log entries through our server multiple times over the weekend. There have been no hangs, timeouts, or lost content. Thank you very much for your response!
Stefan Ritt wrote: |
I'm not 100% sure, but I believe it should work without the
while (in_asend);
So can you please remove that line (it's in src/elogd.c) and recompile elogd and test it?
Stefan
Michael Hibbard wrote: |
I don't have a solution, but I just wanted to bring more attention to your post. I too am having the same issue with the ELOG system and the Safari browser. I have noticed that ELOG is most stable and functional on the client end with the IE browser. A few weeks ago I also had to switch from Safari to Firefox for using ELOG on the client end.
Michael Kelsey wrote: |
Hello! The CDMS collaboration is using e-Log as one of its issue tracking systems. In the last few months, I have noticed a problem when either creating or editing entries from my usual Safari browser (currently 11.1 on MacOSX 10.13.4): The [Submit] button triggers a spinning beach ball, with no connection to our e-Log server, and after several minutes, Safari complains that the page had to be reloaded, discarding all of my edits, uploads, whatever. This used to be occasional, but in the past month it has become routine, such that the only way I can edit or create entries is by launching a different browser entirely (Firefox), just for e-Log editing.
Now, I am also seeing the same problems with Firefox, but at the "occasional" level. The difference is that Firefox produces some diagnostic information, which is why I'm posting here. When the browser hangs, after a short while Firefox produces a "Warning: unresponsive script" drop-down box:
Warning: unresponsive script
A script on this page may be busy, or it may have stopped responding. You can stop the script now, open the script in the debugger, or let the script continue.
Script: http://titus.stanford.edu/cdms…/SuperSim/681?cmd=Edit&steal=1:30
[] Don't ask me again
[Debug script] [Stop script] [Continue]
If I use the [Debug script] button, the call stack shows "onclick 681:1" -> "chkform 681:30", and the line-by-line traceback shows the chkform function:
16  var in_asend = false;
17
18  function chkform()
19  {
20     if (last_key == 13) {
21        var ret = confirm('Really submit this entry?');
22        if (!ret) {
23           last_key = 0;
24           return false;
25        }
26     }
27
28     if (autoSaveTimer != null)
29        clearTimeout(autoSaveTimer);
30     while (in_asend);     <=== This is the stuck line
31     submitted = true;
32     return true;
33  }
I presume that in_asend is supposed to get changed from true to false asynchronously, by some other parallel communication with the server. But that doesn't seem to be happening.
Does this look like an issue with the e-Log distribution? Or is there a configuration issue with our e-Log server which we could improve?
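To illustrate why a flag like in_asend may never change while such a loop is spinning, here is a minimal sketch. It is not elogd code: the names inFlight and spinWhileInFlight are made up for the demo, a zero-delay timer stands in for the completion handler of the asynchronous request, and the spin is bounded to 50 ms so that the demo terminates.

```javascript
// Hypothetical demo, not elogd code: a flag that is cleared by a callback
// cannot change while synchronous code is still running.
let inFlight = true; // stands in for in_asend

// Stands in for the completion handler of the asynchronous request.
setTimeout(() => { inFlight = false; }, 0);

// Busy-wait on the flag, but give up after maxMs so the demo terminates.
function spinWhileInFlight(maxMs) {
  const start = Date.now();
  let spins = 0;
  while (inFlight && Date.now() - start < maxMs) {
    spins++;
  }
  return { stillInFlight: inFlight, spins };
}

const result = spinWhileInFlight(50);
console.log(result.stillInFlight); // true: the callback has not been allowed to run

// Only after this script returns to the event loop does the callback fire.
setTimeout(() => console.log(inFlight), 10); // false
```

The timer was due long before the 50 ms spin ended, yet the flag is still set when the loop gives up, because the callback can only run once the synchronous code has returned.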
68786
|
Mon Apr 16 08:19:16 2018 |
| Stefan Ritt | stefan.ritt@psi.ch | Bug report | Mac OSX | 3.1.3 | Re: "Slow script" problem posting/editing from Safari -- browser hangs, times out | Ok, I removed the code from the official code now.
A bit of background: the "autosave" mechanism in elog regularly saves the current content in a "draft" message, so that the data does not get lost if, for example, the browser crashes. The saving is done asynchronously via an AJAX call. This call takes some time, since it's a round trip to the elogd server. If the user hits "submit" during such a save, the second save might be issued before the first one has finished. That's why I had a "while (in_asend)" in the code, where in_asend gets set to true when the AJAX call is started and to false when it completes. Now, JavaScript is not really multi-threaded, so a loop "while (in_asend)" can actually prevent the AJAX request from completing. This might have been different when I implemented that feature, which would explain why it worked before. Without that code, it can now happen that a second HTTP POST is sent before the first request finishes, but I guess this should not be a problem, since both requests arrive sequentially at the elogd server and are executed one after the other. So in the worst case the elog entry text is just saved twice.
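For readers who do want the submit to wait for a pending autosave, one non-blocking possibility is to wait on a Promise instead of spinning on a flag. The following is only a sketch under assumptions and not what elogd does (the fix described above was simply to remove the loop): pendingSave, autosave, submitWhenIdle and fakeSend are hypothetical names, and a 20 ms timer stands in for the round trip to the server.

```javascript
// Hypothetical sketch, not the elogd implementation: all names are assumptions.
let pendingSave = null; // Promise for the autosave currently in flight, or null

// Start an autosave. `send` is an assumed function that posts the draft
// and returns a Promise that settles when the server has answered.
function autosave(send) {
  const p = send().finally(() => {
    if (pendingSave === p) pendingSave = null; // clear only our own marker
  });
  pendingSave = p;
  return p;
}

// Run `submit` once no autosave is in flight. `await` yields to the event
// loop, so the completion handler of the pending request is free to run.
async function submitWhenIdle(submit) {
  if (pendingSave !== null) {
    await pendingSave.catch(() => {}); // a failed autosave must not block submit
  }
  return submit();
}

// Demo with a fake 20 ms round trip standing in for the AJAX call.
const order = [];
const fakeSend = () =>
  new Promise((resolve) =>
    setTimeout(() => { order.push("autosave done"); resolve(); }, 20));

autosave(fakeSend);
const done = submitWhenIdle(() => { order.push("submit"); return true; });
done.then(() => console.log(order.join(" -> "))); // autosave done -> submit
```

In a real form handler this approach would presumably mean cancelling the default submission and submitting the form later from code, which is more invasive than deleting one line. Since a duplicate save is harmless here, removing the loop is the simpler choice.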
68932
|
Tue Apr 23 14:06:36 2019 |
| Alessio Sarti | alessio.sarti@uniroma1.it | Bug report | Mac OSX | 3.1.4 | elogd Service exited with abnormal code: 1 | Dear all.
I am running elog
elogd 3.1.4 , revision ead6bbc6
on Macosx Mojave
Darwin arpg-serv.ing2.uniroma1.it 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64
I managed to compile and run the elog source code without problems.
I can run it and have it come up properly at boot time. For a few hours after the server boots, elog is available at http://arpg-serv.ing2.uniroma1.it/elog, but then the service stops and elog is no longer accessible.
So far I have been able to trace the problem only as far as the
/var/log/system.log
file, in which I find an unhelpful error message:
Eg: Apr 23 14:00:46 arpg-serv com.apple.xpc.launchd[1] (ch.psi.elogd[85248]): Service exited with abnormal code: 1
I do not know how to debug this, nor why the code runs for a few hours without problems. I just re-downloaded the code from scratch today, unloaded and then re-loaded the daemon, but it still fails with the same error.
I am sure that I can get it running again for a few hours by rebooting, but I want to understand the source of the problem. Can anyone help with this long-standing issue?
Thanks |
68933
|
Tue Apr 23 14:26:51 2019 |
| Stefan Ritt | stefan.ritt@psi.ch | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1 | We typically see this kind of behavior if some elog entry is corrupt. After a few hours you might access this corrupt entry by accident, and then the server stops. If, however, you see this behavior on a fresh logbook with no corrupt entries, then the problem must lie somewhere else.
Do you see the same problem running under Linux?
Do you see the same problem if you run elogd interactively (not through launchd)?
If you run elogd inside a debugger (like gdb or lldb), what does the debugger tell you when it crashes and you show the stack frames? Make sure to compile with -O0 and -g flags to include debug information in the executable.
Stefan
68942
|
Thu Apr 25 11:16:06 2019 |
| Alessio Sarti | alessio.sarti@uniroma1.it | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1 | Thanks for the prompt feedback.
a) I confirm that the problem also shows up when running elog interactively through elogd -p 8080
b) I am trying to catch the exit using lldb on the Mac machine. I hope to be able to give you some feedback on that next week (access to the server is not easy).
c) What is the clean, recommended way to port everything to the Linux machine and debug? I would do the following: download and install elog on a Linux server, copy everything that now lives under /usr/local/elog on the Mac to the Linux server, and start elog. Is this OK? Or is there anything else that I need to copy from the Mac server to be sure of having the same environment?
Thanks again.
Alessio
68943
|
Thu Apr 25 11:27:21 2019 |
| Stefan Ritt | stefan.ritt@psi.ch | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1 | What you propose is enough. Just make sure to compile elogd with the flags mentioned before, and when you get the segment violation, do a stack trace inside the debugger to learn where the fault happened. Maybe also print the contents of some variables at the current location.
Stefan
68951
|
Tue Apr 30 12:47:46 2019 |
| Alessio Sarti | alessio.sarti@uniroma1.it | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1 | I was finally able to catch the crash.
I paste the info provided by lldb below.
It seems to have something to do with the 'first' logbook, which contains 115 entries and is displayed in 6 pages.
But I do not know how to go any further.
Any idea how to debug from here?
Thanks!
2019-04-30 12:32:27.602782+0200 elogd[19289:1908166] detected source and destination buffer overlap
Process 19289 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff7a1272c6 <+10>: jae 0x7fff7a1272d0 ; <+20>
0x7fff7a1272c8 <+12>: movq %rax, %rdi
0x7fff7a1272cb <+15>: jmp 0x7fff7a121457 ; cerror_nocancel
0x7fff7a1272d0 <+20>: retq
Target 0: (elogd) stopped.
(lldb) thread backtrace all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff7a1dcbf1 libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x00007fff7a0916a6 libsystem_c.dylib`abort + 127
frame #3: 0x00007fff7a091819 libsystem_c.dylib`abort_report_np + 177
frame #4: 0x00007fff7a0b5cb1 libsystem_c.dylib`__chk_fail + 48
frame #5: 0x00007fff7a0b5cc1 libsystem_c.dylib`__chk_fail_overlap + 16
frame #6: 0x00007fff7a0b5ce3 libsystem_c.dylib`__chk_overlap + 34
frame #7: 0x00007fff7a0b5d39 libsystem_c.dylib`__strlcpy_chk + 58
frame #8: 0x000000010006a7ac elogd`build_ref(ref="page6?&sort=Subject", size=256, mode="full", expand="", attach="", new_entries="") at elogd.c:19021:7
frame #9: 0x000000010006aaf6 elogd`show_page_filters(lbs=0x0000000102804308, n_msg=115, page_n=6, mode_commands=YES, mode="Summary") at elogd.c:19072:10
frame #10: 0x00000001000536b8 elogd`show_elog_list(lbs=0x0000000102804308, past_n=0, last_n=0, page_n=6, default_page=NO, info=0x0000000000000000) at elogd.c:21506:10
frame #11: 0x000000010008ee58 elogd`interprete(lbook="first", path="") at elogd.c:28543:7
frame #12: 0x000000010008f096 elogd`decode_get(logbook="first", string="?id") at elogd.c:28583:4
frame #13: 0x00000001000937fd elogd`process_http_request(request="GET /first?id=108&sort=Subject", i_conn=0) at elogd.c:29361:7
frame #14: 0x0000000100097744 elogd`server_loop at elogd.c:30375:20
frame #15: 0x000000010009a073 elogd`main(argc=3, argv=0x00007ffeefbffc20) at elogd.c:31403:4
frame #16: 0x00007fff79fec3d5 libdyld.dylib`start + 1
68952
|
Tue Apr 30 14:07:52 2019 |
| Alessio Sarti | alessio.sarti@uniroma1.it | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1 | Actually it is a little bit more difficult than that.
I restarted elogd and got another crash, but this time it seems related to a different logbook.
The stack trace is below.
Alessio
2019-04-30 13:58:52.408845+0200 elogd[22152:2009063] detected source and destination buffer overlap
Process 22152 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff7a1272c6 <+10>: jae 0x7fff7a1272d0 ; <+20>
0x7fff7a1272c8 <+12>: movq %rax, %rdi
0x7fff7a1272cb <+15>: jmp 0x7fff7a121457 ; cerror_nocancel
0x7fff7a1272d0 <+20>: retq
Target 0: (elogd) stopped.
(lldb)
error: No auto repeat.
(lldb) thread backtrace all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff7a1dcbf1 libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x00007fff7a0916a6 libsystem_c.dylib`abort + 127
frame #3: 0x00007fff7a091819 libsystem_c.dylib`abort_report_np + 177
frame #4: 0x00007fff7a0b5cb1 libsystem_c.dylib`__chk_fail + 48
frame #5: 0x00007fff7a0b5cc1 libsystem_c.dylib`__chk_fail_overlap + 16
frame #6: 0x00007fff7a0b5ce3 libsystem_c.dylib`__chk_overlap + 34
frame #7: 0x00007fff7a0b5d39 libsystem_c.dylib`__strlcpy_chk + 58
frame #8: 0x00000001000684e3 elogd`subst_param(str="&Type=%5EInfo%24", size=1500, param="last", value="") at elogd.c:18712:7
frame #9: 0x000000010004bbaa elogd`show_elog_list(lbs=0x0000000103801008, past_n=0, last_n=0, page_n=0, default_page=YES, info=0x0000000000000000) at elogd.c:20183:7
frame #10: 0x000000010008ee58 elogd`interprete(lbook="FOOTGsi2019", path="") at elogd.c:28543:7
frame #11: 0x000000010008f096 elogd`decode_get(logbook="FOOTGsi2019", string="?last") at elogd.c:28583:4
frame #12: 0x00000001000937fd elogd`process_http_request(request="GET /FOOTGsi2019/?last=_all_&Type=%5EInfo%24", i_conn=2) at elogd.c:29361:7
frame #13: 0x0000000100097744 elogd`server_loop at elogd.c:30375:20
frame #14: 0x000000010009a073 elogd`main(argc=3, argv=0x00007ffeefbffc20) at elogd.c:31403:4
frame #15: 0x00007fff79fec3d5 libdyld.dylib`start + 1