Soren Poulsen wrote: |
Soren Poulsen wrote: |
Soren Poulsen wrote: |
ELOG seems to enter a loop when you do certain operations on certain messages: I moved a message to a different logbook and the daemon just got stuck.
After I restarted the daemon, the message had in fact been moved: I can move it back to its original logbook without problems.
I started elogd in GDB and interrupted it with Ctrl-C when the process got stuck, to be told:
Program received signal SIGINT, Interrupt.
0x000000000040a968 in find_thread_head ()
I then made a core dump.
I put the files here: http://cern.ch/poulsen2/elog-error-report-110430.zip (they are too big to upload).
I run into the same problem in other circumstances, such as when opening some threads (maybe because they contain "Reply-to" references to non-existing messages), but I have trouble reproducing this on the test installation.
I should maybe also submit the offending thread.
Soren
|
1. It appears that find_thread_head is sometimes called with message references that do not exist. That is not good.
I put in a little check like this before testing whether the message has an "in_reply_to" reference:
The line:
if (lbs->el_index[i].in_reply_to)
becomes:
if (i < *lbs->n_el_index && lbs->el_index[i].in_reply_to)
2. The trouble started when I deleted a message in the middle of a thread, which left the thread badly "connected" (references to a deleted message).
3. Also, when a thread is badly connected, it is a problem moving messages to a different logbook. ELOG complains that it cannot access the message (with the invalid reference). But ELOG should ignore it, since the message was deleted.
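The idea behind points 1 to 3 can be sketched as follows. This is only a minimal illustration, not elogd's actual code: the `entry_t` struct and the functions `find_entry` and `thread_head` are hypothetical names made up for this example. It assumes an in-memory index in which each entry stores its own ID and the ID of the entry it replies to (0 if none). A reference to a deleted entry ends the walk instead of being followed, and a hop limit keeps a cyclic reference from looping forever.

```c
/* Hypothetical, simplified index entry; not ELOG's actual struct. */
typedef struct {
    int message_id;   /* ID of this entry */
    int in_reply_to;  /* ID of the parent entry, 0 if there is none */
} entry_t;

/* Return the array position of the entry with the given ID, or -1 if absent. */
static int find_entry(const entry_t *idx, int n, int message_id)
{
    for (int i = 0; i < n; i++)
        if (idx[i].message_id == message_id)
            return i;
    return -1;
}

/* Walk the "in reply to" links upwards, starting at message_id.
   Returns the ID of the topmost reachable entry, or 0 if message_id
   itself is not in the index. */
int thread_head(const entry_t *idx, int n, int message_id)
{
    int i = find_entry(idx, n, message_id);
    if (i < 0)
        return 0;                 /* starting entry does not exist */

    /* A chain without cycles has at most n - 1 links, so n hops suffice. */
    for (int hops = 0; hops < n; hops++) {
        int parent = idx[i].in_reply_to;
        if (parent == 0)
            break;                /* reached a real thread head */
        int p = find_entry(idx, n, parent);
        if (p < 0)
            break;                /* parent was deleted: stop here */
        i = p;
    }
    return idx[i].message_id;
}
```

With entries 1, 2 (reply to 1), 4 (reply to the deleted 3) and 5 (reply to 4), asking for the head of entry 5 stops at entry 4 rather than getting stuck on the missing entry 3.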
Soren
|
It would be nice to have this corrected. The problem occurs when you select (read) a message that refers to another message via "In-reply-to" and that other message does not exist.
Soren
|
Soren, you're not alone! I've had similar problems, as did Sara Vanini (elog:67077).
In my case, it is because the "move" or "copy" function does not move all the messages in very long threads. To be more precise, elog will crash in the attempt to move a long thread - say over 40 replies, I don't know for sure. Sometimes it has already moved the entire thread before it crashes, sometimes not. I'd not flagged it up as an issue because I could not be sure it was not a memory issue with the old (>12 years) Linux box I was using earlier this year, but it still happens on this new (to me, only 3 years old) Linux box.
Whether it is the number of entries, the total memory size of the thread or some combination, I don't know.
I've found that in the "move" case, it has not deleted all the messages from the donor thread, so that there is a semi-thread still hidden there. Should one by chance select that semi-thread (because it is found during a search), elog goes into an infinite loop, which requires a reboot of this Linux box to fix. Certainly, pinning the issue down to the missing entry referenced by an "In reply to:" line explains this part of the issue. Of course, deletion of one entry within a thread, or other adjustments, will do the same thing, just as you (Soren) point out above.
If it happens to me, I will go into the yymmdda.log files and fix the problem, be it deleting the entries of the semi-thread, moving across missing entries from the donor to the acceptor logbooks, or adjusting the "Reply:" and "In reply to:" lines, but that is quite a time-consuming and error-prone exercise. |
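Before repairing such a semi-thread by hand, it may help to list which entries point at a missing parent. The following is a rough sketch under stated assumptions, not part of ELOG: it works on a hypothetical in-memory array of `entry_t` records (own ID plus "In reply to" ID, 0 if none) and does not read the .log files itself. The names `entry_t`, `id_exists` and `find_dangling` are invented for this example.

```c
#include <stddef.h>

/* Hypothetical, simplified record; not ELOG's actual data structure. */
typedef struct {
    int message_id;   /* ID of this entry */
    int in_reply_to;  /* ID of the entry it replies to, 0 if none */
} entry_t;

/* Return 1 if an entry with the given ID is present, 0 otherwise. */
static int id_exists(const entry_t *idx, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (idx[i].message_id == id)
            return 1;
    return 0;
}

/* Collect the IDs of entries whose "in reply to" target is missing.
   At most max_out IDs are stored in out; the return value is the total
   number of dangling entries found, which may exceed max_out. */
size_t find_dangling(const entry_t *idx, size_t n, int *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int parent = idx[i].in_reply_to;
        if (parent != 0 && !id_exists(idx, n, parent)) {
            if (count < max_out)
                out[count] = idx[i].message_id;
            count++;
        }
    }
    return count;
}
```

The entries reported this way are the ones whose "In reply to:" line would need adjusting, or whose missing parent would need to be moved across from the other logbook.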
The simple config file below produces a segmentation fault when elogd is started,
http://localhost/Test/?cmd=New
is opened in the browser, and then e.g. "Entry" is switched to "Problem".
gdb shows the following output:
(gdb) run -c /usr/local/elog/elogd.cfg
Starting program: /usr/local/sbin/elogd -c /usr/local/elog/elogd.cfg
elogd 2.9.0 built Jun 20 2011, 04:57:23 revision 2414
Falling back to default group "elog"
Falling back to default user "elog"
FCKedit detected
Falling back to default group "elog"
Falling back to default user "elog"
ImageMagick detected
Indexing logbooks ... done
Server listening on port 80 ...
Program received signal SIGSEGV, Segmentation fault.
0x080a2940 in get_user_line (lbs=0xae3c1c0, user=0x0, password=0x0, full_name=0xbfca1690 "", email=0x0, email_notify=0x0,
last_logout=0x0, inactive=0x0) at src/elogd.c:24864
24864 if (!str[0] || !user[0])
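The backtrace shows get_user_line being called with user=0x0, so the expression `!user[0]` on line 24864 reads through a null pointer. A minimal sketch of a guard follows, assuming the intent of that check is "skip when either string is empty". The helper names `is_blank` and `should_skip_lookup` are hypothetical and not taken from elogd.c, and whether the right fix is a guard here or in the caller that passes the null pointer cannot be decided from the output above.

```c
#include <stddef.h>

/* Return 1 if s is a null pointer or an empty string, 0 otherwise. */
static int is_blank(const char *s)
{
    return s == NULL || s[0] == '\0';
}

/* Hypothetical stand-in for the test on the crashing line:
   returns 1 when either string is missing or empty, without
   dereferencing a null pointer. */
int should_skip_lookup(const char *str, const char *user)
{
    return is_blank(str) || is_blank(user);
}
```

With this form, a call with user == NULL is treated like an empty user name instead of crashing.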
|