Mark Bergman wrote: |
I recently upgraded elog from 2.7.8 to 2.8.0 (and moved servers, removed unused logbooks, etc.). I'm now having a problem where elog consistently crashes when attempting to edit multiple entries. This is a very common use case, as we use a "status" field, set to "open" or "closed" to track problems. When a problem is resolved, we will go to the "list" display, set it to "threaded", "select" the thread, and then edit it, to change the status field for all posts in the thread to "closed".
Now, as soon as the "edit" button is clicked, elog crashes. This happens on every thread and logbook that I've tried. The elog logfile itself doesn't show anything useful.
However, if eLog is run with "-v" in place of "-D", it does not crash.
Environment:
CentOS 5.4
eLog 2.8.0 built Aug 5 2010, 12:24:11
|
I'm now running eLog 2.9.0 and seeing the same crashes. However, I've got some more information that may be helpful.
The crash seems to be directly related to the order of replies in the thread. For example, in this thread I am replying to the original entry. The original entry has 2 children (the entries are siblings) and no grandchildren.
In our installation, eLog crashes consistently under the following conditions:
1. go to the "list" display
2. set it to "threaded"
3. "select" a thread that has siblings at any generation of replies
4. choose "edit"
If the selected thread only has one entry at any generation, eLog does not crash.
Here's a horrible attempt at a display of two message threads. Note that in the first example, there are 2 replies at the same generation (siblings)--both the person who responded and the original submitter replied to the initial submission. After that, all replies were to successive generations.
-------------- Causes eLog to Crash ------------------
! Full Name (submitter) module failure
  => Full Name (submitter) Re: module failure
  => Full Name (replier) Re: module failure
    => Full Name (submitter) Re: Re: module failure
      => Full Name (submitter) Re: Re: Re: module failure
------------------------------------------------------
-------------- No eLog Problem ------------------
! Full Name (submitter) Labwide failure of mcc
  => Full Name (replier) Re: Labwide failure of mcc
    => Full Name (submitter) Re: Re: Labwide failure of mcc
      => Full Name (replier) Re: Re: Re: Labwide failure of mcc
------------------------------------------------------
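To make the "siblings at any generation" condition concrete, here is a minimal sketch in C that checks whether any entry in a thread has more than one direct reply. The ThreadEntry struct, the function names, and the convention that in_reply_to == 0 marks the thread head are all assumptions made for illustration; this is not the elogd data model or source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one entry in a thread (not elogd's struct). */
typedef struct {
    int message_id;   /* ID of this entry */
    int in_reply_to;  /* ID of the parent entry; 0 is assumed to mean "thread head" */
} ThreadEntry;

/* Count the entries that are direct replies to parent_id. */
static int count_replies(const ThreadEntry *t, int n, int parent_id)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (t[i].in_reply_to == parent_id)
            count++;
    return count;
}

/* Return 1 if any entry in the thread has two or more direct replies, else 0. */
static int thread_has_siblings(const ThreadEntry *t, int n)
{
    assert(t != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if (count_replies(t, n, t[i].message_id) > 1)
            return 1;
    return 0;
}
```

Modelling the first thread above as entries 1 to 5, with 2 and 3 both replying to 1, gives 1 (siblings present); the second thread, a straight chain, gives 0. Which of the two siblings the third-level reply hangs under is a guess and does not affect the result.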
|
ELOG seems to enter a loop when you do certain operations on certain messages: I moved a message to a different logbook and the daemon just got stuck.
After I restarted the daemon, the message had in fact been moved: I can move it back to its original logbook without problems.
I started it under GDB and interrupted it with Ctrl-C once the process was stuck, and got:
Program received signal SIGINT, Interrupt.
0x000000000040a968 in find_thread_head ()
I then made a core dump.
I put the files here: http://cern.ch/poulsen2/elog-error-report-110430.zip (they are too big to upload).
I run into the same problem in other circumstances, such as when opening some threads (maybe because they contain "Reply-to" references to non-existing messages), but I have trouble reproducing this on the test installation.
I should maybe also submit the offending thread.
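One way to test the "Reply-to references to non-existing messages" guess would be to scan the index for such entries. The following is only a sketch under stated assumptions: ReplyRef, message_exists and find_dangling_replies are hypothetical names, the index is modelled as a flat array, and 0 is assumed to mean "no parent". None of this is taken from the elogd source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified index entry (not elogd's actual struct). */
typedef struct {
    int message_id;   /* ID of this entry */
    int in_reply_to;  /* ID of the parent entry; 0 is assumed to mean "no parent" */
} ReplyRef;

/* Return 1 if message_id is present in the index, else 0. */
static int message_exists(const ReplyRef *idx, int n, int message_id)
{
    for (int i = 0; i < n; i++)
        if (idx[i].message_id == message_id)
            return 1;
    return 0;
}

/*
 * Find entries whose in_reply_to names a message that is not in the index.
 * Stores at most max_out of their IDs in out and returns the total number found.
 */
static int find_dangling_replies(const ReplyRef *idx, int n, int *out, int max_out)
{
    assert(idx != NULL || n == 0);
    assert(out != NULL || max_out == 0);

    int found = 0;
    for (int i = 0; i < n; i++) {
        int parent = idx[i].in_reply_to;
        if (parent != 0 && !message_exists(idx, n, parent)) {
            if (found < max_out)
                out[found] = idx[i].message_id;
            found++;
        }
    }
    return found;
}
```

Any IDs reported this way would be the first candidates to look at when a thread refuses to open or move.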
Soren
|
1. It appears that find_thread_head is sometimes called with message references that do not exist. That is not good.
I put in a small check, applied before testing whether the message has an "in_reply_to" reference:
The line:
if (lbs->el_index[i].in_reply_to)
becomes:
if (i < *lbs->n_el_index && lbs->el_index[i].in_reply_to)
2. The trouble started when I deleted a message in the middle of a thread, which left the thread badly "connected" (with references to the deleted message).
3. Also, when a thread is badly connected, moving its messages to a different logbook fails: ELOG complains that it cannot access the message named by the invalid reference. ELOG should instead ignore that reference, since the message was deleted.
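To illustrate the idea behind the check in point 1, here is a self-contained sketch of a thread-head lookup that tolerates missing parents. It is not the real find_thread_head: the IndexEntry struct, find_index, and find_thread_head_safe are hypothetical names, 0 is assumed to mean "no parent", and the hop limit is an extra guard I am assuming would help if the hang is an endless walk along broken references.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified index entry; the real elogd structures differ. */
typedef struct {
    int message_id;   /* ID of this entry */
    int in_reply_to;  /* ID of the parent entry; 0 is assumed to mean "thread head" */
} IndexEntry;

/* Return the array position of message_id, or -1 if it is not in the index. */
static int find_index(const IndexEntry *idx, int n, int message_id)
{
    for (int i = 0; i < n; i++)
        if (idx[i].message_id == message_id)
            return i;
    return -1;
}

/*
 * Walk up the in_reply_to chain from message_id and return the ID of the
 * topmost reachable entry. A parent missing from the index (e.g. deleted)
 * ends the walk, and the hop limit ends it if the references form a cycle.
 * Returns 0 if message_id itself is not in the index.
 */
static int find_thread_head_safe(const IndexEntry *idx, int n, int message_id)
{
    assert(idx != NULL || n == 0);

    int i = find_index(idx, n, message_id);
    if (i < 0)
        return 0;

    for (int hops = 0; hops < n; hops++) {
        if (idx[i].in_reply_to == 0)
            break;                      /* reached a real thread head */
        int parent = find_index(idx, n, idx[i].in_reply_to);
        if (parent < 0)
            break;                      /* dangling reference: stop here */
        i = parent;
    }
    return idx[i].message_id;
}
```

With a dangling reference, the walk stops at the orphaned entry and treats it as the head of what remains of the thread, which matches the wish in point 3 that an invalid reference be ignored rather than reported as an error.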
Soren |
I am experiencing exactly the same problem as described above. It only seems to happen when part of the title has changed. I am including my config as an example: make a few entries, then change the "Customer" parameter, then try to edit an entry in the thread.
Enable attachments = 1
Attributes = Employee, Customer, XXX-ID, XXXX Number, SCD, Type, Status, Folder Created, XXXX Received, Equipment Installed, XXX Carrier Up, Customer Carrier Up, QA Completed
Thread display = $Customer - $SLR-ID - $Type - $Status - SCD: $SCD - XXXX: $XXXX Number
Propagate attributes = Status
Locked Attributes = Employee
Options Type = Activation, Termination, Change
Options Folder Created = Yes, No, N/A
Options XXXX Received = Yes, No, N/A
Options Equipment Installed = Yes, No, N/A
Options XXX Carrier Up = Yes, No, N/A
Options Customer Carrier Up = Yes, No, N/A
Options QA Completed = Yes, No
Entries per page = 100
Preset Text = $short_name @ $utcdate -
Append on reply = $short_name @ $utcdate -
Locked Attributes = Employee
Preset Employee = $long_name
Options Status = Open, Closed
Quick Filter = Status, Type
Type SCD = datetime
Summary lines = 0
Enable Attachments = 0
Suppress default = 2
Entries per page = 10
Suppress Email to users = 1
Display mode = threaded
Collapse to last = 1
Expand default = 0
Subst on reply Employee = $long_name
Faulting application elogd.exe, version 0.0.0.0, faulting module elogd.exe, version 0.0.0.0, fault address 0x000646c7. |