Discussion forum about ELOG
ID | Date | Icon | Author | Author Email | Category | OS | ELOG Version | Subject
68785 | Sun Apr 15 08:03:21 2018 | Agree | Michael Kelsey | kelsey@slac.stanford.edu | Bug report | Mac OSX | 3.1.3 | Re: "Slow script" problem posting/editing from Safari -- browser hangs, times out

Thank you for your suggestion, Stefan!  The sysadmin who handles our e-Log server implemented your suggestion earlier today (Saturday).  I have been able to successfully create and modify e-Log entries with Safari since then.  Since the "slow script" issue has been intermittent in the past, I plan to continue testing and monitoring for the next day or so.  Nevertheless, it appears that removing the waiting loop has alleviated my problem.

  -- Mike Kelsey

Follow-up Sunday 15 Apr (U.S. Pacific time):  I think your suggestion has solved my problem.  I've been able to create and modify e-Log entries through our server multiple times over the weekend.  There have been no hangs, timeouts, or lost content.  Thank you very much for your response!

Stefan Ritt wrote:

I'm not 100% sure, but I believe it should work without the 

while (in_asend);

So could you please remove that line (it's in src/elogd.c), recompile elogd, and test it?

Stefan

Michael Hibbard wrote:

I don't have a solution, but I just wanted to bring more attention to your post. I am having the same issue with the ELOG system and the Safari browser. I have noticed that, on the client end, ELOG is most stable and functional in the IE browser. A few weeks ago I also had to switch from Safari to Firefox for using ELOG on the client end.

Michael Kelsey wrote:

Hello!  The CDMS collaboration is using e-Log as one of its issue-tracking systems.  In the last few months, I have noticed a problem when either creating or editing entries from my usual Safari browser (currently 11.1 on MacOSX 10.13.4):  the [Submit] button triggers a spinning beach ball, with no connection to our e-Log server, and after several minutes Safari complains that the page had to be reloaded, discarding all of my edits, uploads, whatever.  This used to be occasional, but in the past month it has become routine, such that the only way I can edit or create entries is by launching a different browser entirely (Firefox), just for e-Log editing.

Now, I am also seeing the same problems with Firefox, but at the "occasional" level.  The difference is that Firefox produces some diagnostic information, which is why I'm posting here.  When the browser hangs, after a short while Firefox produces a "Warning: unresponsive script" drop-down box:

Warning: unresponsive script

A script on this page may be busy, or it may have stopped responding.  You can stop the script now, open the script in the debugger, or let the script continue.

Script: http://titus.stanford.edu/cdms…/SuperSim/681?cmd=Edit&steal=1:30

[] Don't ask me again

[Debug script]                       [Stop script]     [Continue]

If I use the [Debug script] button, the call stack shows "onclick 681:1" -> "chkform 681:30", and the line-by-line traceback shows the chkform function:

16   var in_asend = false;
17
18   function chkform()
19   {
20     if (last_key == 13) {
21       var ret = confirm('Really submit this entry?');
22       if (!ret) {
23         last_key = 0;
24         return false;
25       }
26     }
27
28     if (autoSaveTimer != null)
29       clearTimeout(autoSaveTimer);
30     while (in_asend);               <=== This is the stuck line
31     submitted = true;
32     return true;
33   }

I presume that in_asend is supposed to get changed from false to true asynchronously, by some other parallel communication with the server. But that doesn't seem to be happening.
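A minimal Node.js sketch of why such a loop can spin forever, assuming the flag is true while an autosave request is in flight and is cleared by that request's completion callback. The names inAsend, startAutosave and spinWhileSending are hypothetical stand-ins, not elogd code, and setTimeout stands in for the AJAX completion. JavaScript runs callbacks on a single event-loop thread, so the callback that would clear the flag cannot run while a synchronous loop occupies that thread; the sketch bounds the spin so that it terminates.

```javascript
// Hypothetical stand-ins for the autosave state in the chkform() excerpt above;
// this is not elogd's actual code.
let inAsend = false;

// Simulate an asynchronous draft save: the flag is set when the request
// starts and cleared by a completion callback (setTimeout stands in for AJAX).
function startAutosave(delayMs) {
  inAsend = true;
  setTimeout(() => {
    inAsend = false; // can only run once the event loop is idle
  }, delayMs);
}

// Bounded version of `while (in_asend);`: spin for at most maxMs milliseconds
// and report whether the flag was ever seen to clear during the spin.
function spinWhileSending(maxMs) {
  const start = Date.now();
  while (inAsend) {
    if (Date.now() - start > maxMs) {
      return false; // gave up; the completion callback never got to run
    }
  }
  return true;
}

startAutosave(10);                               // "request" finishes after ~10 ms
const clearedDuringSpin = spinWhileSending(200); // spins the full 200 ms anyway
console.log("flag cleared while spinning:", clearedDuringSpin); // false

// Once the synchronous code above returns, the event loop runs the callback.
setTimeout(() => console.log("flag after yielding:", inAsend), 50); // false
```

Without the bound, the loop would never exit, which matches the "unresponsive script" warning pointing at the while line.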

Does this look like an issue with the e-Log distribution? Or is there a configuration issue with our e-Log server which we could improve?

68786 | Mon Apr 16 08:19:16 2018 | Reply | Stefan Ritt | stefan.ritt@psi.ch | Bug report | Mac OSX | 3.1.3 | Re: "Slow script" problem posting/editing from Safari -- browser hangs, times out

OK, I have now removed that line from the official code.

A bit of background: the "autosave" mechanism in elog regularly saves the current content as a "draft" message, so that the data does not get lost if, for example, the browser crashes. The saving is done asynchronously via an AJAX call. This call takes some time, since it's a round trip to the elogd server. If the user hits "submit" during such a save, the second save might be issued before the first one has finished. That's why I had a "while (in_asend)" in the code, where in_asend is set to true when the AJAX call starts and to false when it completes. But JavaScript is not really multi-threaded, so a "while (in_asend)" loop can actually prevent the AJAX request from completing, because its completion handler never gets a chance to run. This might have been different when I implemented that feature, which might be why it worked before. Without that code, a second HTTP POST can now be sent before the first request finishes, but I guess this should not be a problem, since both requests arrive sequentially at the elogd server and are executed one after the other. So in the worst case the elog entry text is just saved twice.
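If one did want submission to wait for an in-flight autosave without blocking the browser, one option is to cancel the submit event and submit again from the save's completion callback. Below is a minimal Node.js sketch of that idea. It is not the change made to elogd, which simply removed the loop. The names saveInFlight, autosave and the submitForm parameter are hypothetical, and a Promise resolved by setTimeout stands in for the AJAX request.

```javascript
// Hypothetical sketch, not elogd's code: track the pending draft save as a
// Promise instead of a boolean flag that has to be polled.
let saveInFlight = null;
let submitted = false;

// Stand-in for the AJAX draft save; resolves after delayMs.
function autosave(delayMs) {
  saveInFlight = new Promise((resolve) => setTimeout(resolve, delayMs))
    .finally(() => {
      saveInFlight = null; // save settled: nothing is pending any more
    });
  return saveInFlight;
}

// Non-blocking replacement for `while (in_asend); submitted = true; return true;`.
// In a browser, submitForm might be () => form.submit().
function chkform(submitForm) {
  const submitNow = () => {
    submitted = true;
    submitForm();
  };
  if (saveInFlight !== null) {
    // Defer until the pending save settles, whether it succeeded or failed.
    saveInFlight.then(submitNow, submitNow);
    return false; // cancel this submit event for now
  }
  submitted = true;
  return true; // no save pending: let the normal submit proceed
}

// Usage: a draft save is in flight when the user clicks Submit.
const events = [];
autosave(20);
const allowedNow = chkform(() => events.push("submitted after save"));
events.push("chkform returned " + allowedNow);
setTimeout(() => console.log(events), 100);
// logs [ 'chkform returned false', 'submitted after save' ]
```

The trade-off is extra complexity in exchange for avoiding the duplicate save. Note also that a programmatic form.submit() does not include the clicked button's name and value in the form data. If the server relies on them, the deferred path would have to add them, for example in a hidden input. This sketch does not model that.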


68932 | Tue Apr 23 14:06:36 2019 | Warning | Alessio Sarti | alessio.sarti@uniroma1.it | Bug report | Mac OSX | 3.1.4 | elogd Service exited with abnormal code: 1

Dear all.

I am running elog 

elogd 3.1.4 , revision ead6bbc6

on macOS Mojave:

Darwin arpg-serv.ing2.uniroma1.it 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64

I compiled and ran the elog source code without problems.

I can run it and have it properly displayed at boot time. For a few hours after the server boots, the elog is available at http://arpg-serv.ing2.uniroma1.it/elog, but then the service stops and the elog is no longer accessible.

So far the only trace of the problem I have found is in

/var/log/system.log

which contains an unhelpful error message, e.g.:

Apr 23 14:00:46 arpg-serv com.apple.xpc.launchd[1] (ch.psi.elogd[85248]): Service exited with abnormal code: 1

I do not know how to debug this, nor why the code runs for a few hours without problems. I re-downloaded the code from scratch today and unloaded and reloaded the daemon, but it still fails with the same error.

I am sure I can get it running again for a few hours by rebooting, but I want to understand the source of the problem. Can anyone help with this long-standing issue?

Thanks

68933 | Tue Apr 23 14:26:51 2019 | Reply | Stefan Ritt | stefan.ritt@psi.ch | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1

We typically see this kind of behavior when some elog entry is corrupt. After a few hours you might access this corrupt entry by accident, and then the server stops. If, however, you see this behavior on a fresh logbook with no corrupt entries, then the problem must lie somewhere else.

Do you see the same problem running under Linux?

Do you see the same problem if you run elogd interactively (not through launchd)?

If you run elogd inside a debugger (such as gdb or lldb), what does the debugger show when it crashes and you print the stack frames? Make sure to compile with the -O0 and -g flags to include debug information in the executable.

Stefan 


68942 | Thu Apr 25 11:16:06 2019 | Reply | Alessio Sarti | alessio.sarti@uniroma1.it | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1

Thanks for the prompt feedback.

a) I confirm that the problem also shows up when running elogd interactively with elogd -p 8080.

b) I am trying to catch the exit using lldb on the Mac. I hope to give you some feedback on that next week (I do not have easy access to the server).

c) What is the clean, recommended way to move everything to a Linux machine and debug there? I would do the following: download and install elog on a Linux server, copy everything that now lives under /usr/local/elog on the Mac to the Linux server, and start elogd. Is this OK, or is there anything else I need to copy from the Mac server to be sure to have the same environment?

Thanks again.

Alessio

68943 | Thu Apr 25 11:27:21 2019 | Reply | Stefan Ritt | stefan.ritt@psi.ch | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1

What you propose is enough. Just make sure to compile elogd with the flags mentioned before, and when you get the segmentation violation, print a stack trace inside the debugger to learn where the fault happened. Maybe also print the contents of some variables at the current location.

Stefan


68951 | Tue Apr 30 12:47:46 2019 | Reply | Alessio Sarti | alessio.sarti@uniroma1.it | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1

I was finally able to catch the crash.

Below is the output from lldb.

It seems to have something to do with the 'first' logbook, which contains 115 entries displayed on 6 pages.

But I do not know how to go any further.

Any ideas on how to debug from here?

Thanks!

2019-04-30 12:32:27.602782+0200 elogd[19289:1908166] detected source and destination buffer overlap
Process 19289 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff7a1272c6 <+10>: jae    0x7fff7a1272d0            ; <+20>
    0x7fff7a1272c8 <+12>: movq   %rax, %rdi
    0x7fff7a1272cb <+15>: jmp    0x7fff7a121457            ; cerror_nocancel
    0x7fff7a1272d0 <+20>: retq
Target 0: (elogd) stopped.

(lldb) thread backtrace all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff7a1dcbf1 libsystem_pthread.dylib`pthread_kill + 284
    frame #2: 0x00007fff7a0916a6 libsystem_c.dylib`abort + 127
    frame #3: 0x00007fff7a091819 libsystem_c.dylib`abort_report_np + 177
    frame #4: 0x00007fff7a0b5cb1 libsystem_c.dylib`__chk_fail + 48
    frame #5: 0x00007fff7a0b5cc1 libsystem_c.dylib`__chk_fail_overlap + 16
    frame #6: 0x00007fff7a0b5ce3 libsystem_c.dylib`__chk_overlap + 34
    frame #7: 0x00007fff7a0b5d39 libsystem_c.dylib`__strlcpy_chk + 58
    frame #8: 0x000000010006a7ac elogd`build_ref(ref="page6?&sort=Subject", size=256, mode="full", expand="", attach="", new_entries="") at elogd.c:19021:7
    frame #9: 0x000000010006aaf6 elogd`show_page_filters(lbs=0x0000000102804308, n_msg=115, page_n=6, mode_commands=YES, mode="Summary") at elogd.c:19072:10
    frame #10: 0x00000001000536b8 elogd`show_elog_list(lbs=0x0000000102804308, past_n=0, last_n=0, page_n=6, default_page=NO, info=0x0000000000000000) at elogd.c:21506:10
    frame #11: 0x000000010008ee58 elogd`interprete(lbook="first", path="") at elogd.c:28543:7
    frame #12: 0x000000010008f096 elogd`decode_get(logbook="first", string="?id") at elogd.c:28583:4
    frame #13: 0x00000001000937fd elogd`process_http_request(request="GET /first?id=108&sort=Subject", i_conn=0) at elogd.c:29361:7
    frame #14: 0x0000000100097744 elogd`server_loop at elogd.c:30375:20
    frame #15: 0x000000010009a073 elogd`main(argc=3, argv=0x00007ffeefbffc20) at elogd.c:31403:4
    frame #16: 0x00007fff79fec3d5 libdyld.dylib`start + 1


68952 | Tue Apr 30 14:07:52 2019 | Reply | Alessio Sarti | alessio.sarti@uniroma1.it | Bug report | Mac OSX | 3.1.4 | Re: elogd Service exited with abnormal code: 1

Actually, it is a little more difficult than that.

I restarted elogd and got another crash, but this time it seems to be related to a different logbook.

Below is the stack trace.

Alessio

2019-04-30 13:58:52.408845+0200 elogd[22152:2009063] detected source and destination buffer overlap
Process 22152 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff7a1272c6 <+10>: jae    0x7fff7a1272d0            ; <+20>
    0x7fff7a1272c8 <+12>: movq   %rax, %rdi
    0x7fff7a1272cb <+15>: jmp    0x7fff7a121457            ; cerror_nocancel
    0x7fff7a1272d0 <+20>: retq
Target 0: (elogd) stopped.
(lldb)
error: No auto repeat.
(lldb) thread backtrace all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff7a1272c6 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff7a1dcbf1 libsystem_pthread.dylib`pthread_kill + 284
    frame #2: 0x00007fff7a0916a6 libsystem_c.dylib`abort + 127
    frame #3: 0x00007fff7a091819 libsystem_c.dylib`abort_report_np + 177
    frame #4: 0x00007fff7a0b5cb1 libsystem_c.dylib`__chk_fail + 48
    frame #5: 0x00007fff7a0b5cc1 libsystem_c.dylib`__chk_fail_overlap + 16
    frame #6: 0x00007fff7a0b5ce3 libsystem_c.dylib`__chk_overlap + 34
    frame #7: 0x00007fff7a0b5d39 libsystem_c.dylib`__strlcpy_chk + 58
    frame #8: 0x00000001000684e3 elogd`subst_param(str="&Type=%5EInfo%24", size=1500, param="last", value="") at elogd.c:18712:7
    frame #9: 0x000000010004bbaa elogd`show_elog_list(lbs=0x0000000103801008, past_n=0, last_n=0, page_n=0, default_page=YES, info=0x0000000000000000) at elogd.c:20183:7
    frame #10: 0x000000010008ee58 elogd`interprete(lbook="FOOTGsi2019", path="") at elogd.c:28543:7
    frame #11: 0x000000010008f096 elogd`decode_get(logbook="FOOTGsi2019", string="?last") at elogd.c:28583:4
    frame #12: 0x00000001000937fd elogd`process_http_request(request="GET /FOOTGsi2019/?last=_all_&Type=%5EInfo%24", i_conn=2) at elogd.c:29361:7
    frame #13: 0x0000000100097744 elogd`server_loop at elogd.c:30375:20
    frame #14: 0x000000010009a073 elogd`main(argc=3, argv=0x00007ffeefbffc20) at elogd.c:31403:4
    frame #15: 0x00007fff79fec3d5 libdyld.dylib`start + 1


ELOG V3.1.5-3fb85fa6