Soren Poulsen wrote: |
Soren Poulsen wrote: |
Soren Poulsen wrote: |
ELOG seems to enter a loop when you do certain operations on certain messages: I moved a message to a different logbook and the daemon just got stuck.
After restarting the daemon, I found that the message had in fact been moved: I can move it back to its original logbook without problems.
I started ELOG under GDB and interrupted it with Ctrl-C when the process got stuck, and was told:
Program received signal SIGINT, Interrupt.
0x000000000040a968 in find_thread_head ()
I then made a core dump.
I put the files here: http://cern.ch/poulsen2/elog-error-report-110430.zip (they are too big to upload).
I run into the same problem in other circumstances, such as when opening some threads (maybe because they contain "Reply-to" references to non-existing messages), but I have trouble reproducing this on the test installation.
Maybe I should also submit the offending thread.
Soren
|
1. It appears that find_thread_head is sometimes called with message references that do not exist. That is not good.
I put in a small bounds check before testing whether the message has an "in_reply_to" reference:
The line:
if (lbs->el_index[i].in_reply_to)
becomes:
if (i < *lbs->n_el_index && lbs->el_index[i].in_reply_to)
2. The trouble started when I deleted a message in the middle of a thread, which left the thread badly "connected" (references to a deleted message).
3. Also, when a thread is badly connected, moving messages to a different logbook is a problem. ELOG complains that it cannot access the message (the one with the invalid reference). But ELOG should ignore it, since the message was deleted.
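For illustration, the idea behind the check in point 1 can be sketched as a stand-alone function. This is only a rough sketch under assumptions, not ELOG's actual code: the type `entry_t` and the functions `find_entry` and `thread_head_guarded` are hypothetical names made up for this example, and the index is reduced to just a message ID and an "in_reply_to" ID. Besides refusing to read past the end of the index, the sketch also caps the number of hops, so that a cyclic chain of references cannot loop forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified index entry; NOT ELOG's real structure. */
typedef struct {
   int message_id;
   int in_reply_to;   /* 0 means "not a reply" */
} entry_t;

/* Return the position of message_id in index[0..n), or -1 if absent. */
static int find_entry(const entry_t *index, int n, int message_id)
{
   int i;

   for (i = 0; i < n; i++)
      if (index[i].message_id == message_id)
         return i;
   return -1;
}

/*
 * Follow "in_reply_to" links towards the head of the thread.
 * - If the starting message is not in the index, its ID is returned as-is.
 * - If a parent is missing (e.g. deleted), the last existing message
 *   is treated as the head.
 * - After n hops the walk gives up, so a cyclic chain terminates.
 */
static int thread_head_guarded(const entry_t *index, int n, int message_id)
{
   int hops, i, current = message_id;

   assert(index != NULL || n == 0);

   for (hops = 0; hops < n; hops++) {
      i = find_entry(index, n, current);
      if (i < 0 || index[i].in_reply_to == 0)
         return current;   /* unknown message, or a real thread head */
      if (find_entry(index, n, index[i].in_reply_to) < 0)
         return current;   /* dangling reference: parent does not exist */
      current = index[i].in_reply_to;
   }
   return current;         /* hop limit reached: references form a cycle */
}
```

With entries 1, 2 (reply to 1), 3 (reply to 2) and 5 (reply to a deleted entry 4), this returns 1 for message 3 and 5 for message 5. Treating the last existing message as the head is a design choice of this sketch; the one-line fix above instead returns the ID of the missing message.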
Soren
|
It would be nice to have this corrected. The problem occurs when you select (read) a message which refers to another message via "In-reply-to", and that other message does not exist.
Soren
|
Soren, you're not alone! I've had similar problems, as did Sara Vanini (elog:67077).
In my case, it is because the "move" or "copy" function does not move all the messages in very long threads. To be more precise, elog will crash while attempting to move a long thread, say one with over 40 replies, though I don't know the limit for sure. Sometimes it has already moved the entire thread before it crashes, sometimes not. I had not flagged it up as an issue because I could not be sure it was not a memory problem with the old (>12 years) Linux box I was using earlier this year, but it still happens on this new (to me, only 3 years old) Linux box.
Whether it is the number of entries, the total memory size of the thread or some combination, I don't know.
I've found that in the "move" case, it has not deleted all the messages from the donor thread, so that there is a semi-thread still hidden there. Should one by chance select that semi-thread (because it is found during a search), elog goes into an infinite loop, which requires a reboot of this Linux box to fix. Certainly, pinning the issue down to the missing entry referenced by an <i>In reply to:</i> explains this part of the issue. Of course, deleting one entry within a thread, or making other adjustments, will do the same thing, just as you (Soren) point out above.
If it happens to me, I go into the yymmdda.log files and fix the problem, be it by deleting the entries of the semi-thread, moving missing entries across from the donor to the acceptor logbook, or adjusting the <i>Reply:</i> and <i>In reply to:</i> lines, but that is quite a time-consuming and error-prone exercise. |
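Before repairing a logbook by hand, it may help to list which entries point at a missing parent. The following is a minimal sketch under assumptions, not part of ELOG: `entry_t`, `entry_exists` and `find_dangling_replies` are hypothetical names, and the sketch assumes the message IDs and "In reply to" IDs have already been read into an array (parsing the yymmdda.log files is deliberately left out).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified index entry; NOT ELOG's real structure. */
typedef struct {
   int message_id;
   int in_reply_to;   /* 0 means "not a reply" */
} entry_t;

/* Return 1 if message_id occurs in index[0..n), otherwise 0. */
static int entry_exists(const entry_t *index, int n, int message_id)
{
   int i;

   for (i = 0; i < n; i++)
      if (index[i].message_id == message_id)
         return 1;
   return 0;
}

/*
 * Collect the IDs of entries whose "in_reply_to" refers to a message
 * that is not in the index. At most max IDs are written to out
 * (out may be NULL to only count). Returns the total number found.
 */
static int find_dangling_replies(const entry_t *index, int n,
                                 int *out, int max)
{
   int i, found = 0;

   assert(index != NULL || n == 0);

   for (i = 0; i < n; i++) {
      if (index[i].in_reply_to != 0 &&
          !entry_exists(index, n, index[i].in_reply_to)) {
         if (out != NULL && found < max)
            out[found] = index[i].message_id;
         found++;
      }
   }
   return found;
}
```

The entries reported this way are the ones whose <i>In reply to:</i> line would need adjusting, or whose missing parent would need to be moved across from the other logbook. The nested scan is quadratic in the number of entries, which keeps the sketch short; a sorted index or hash table would be the obvious refinement for large logbooks.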