I had no intention of causing any offence with my lazy archiving comment - hope I didn't, sorry if I did.
No offence taken :-)
Personally, I would have found it useful to put the attachments into a separate directory - or at least to have that possibility. Elog as it stands can sometimes cope with that and sometimes cannot, and even trying means editing the yymmdda.log files directly. It would have saved me keeping duplicates of the same large attachment in two or three different logbooks if I could always have referenced a single master copy of the attachment. This was at a time when I was severely memory constrained, which in part forced me to change how I operate elog, so for me that need isn't as great as it once was.
David.
You can put a reference to the attachment of the other entry in your logbook: elog:67896/1
Or, if it is an image, you can just include it in your new entry like I did below.
Of course this only works if the other logbook is accessible on-line.
But how would you manage access rights to a common attachment folder?
Probably I just did not understand your idea.
Cheers
Andreas
First, I can only fix problems which are reproducible. Can we do another power outage at PSI?
But seriously, I guess what happened is that elog sees an empty directory when the AFS server
goes down. If this happens, it rebuilds its internal (RAM-based) index and sees no entries there.
So the next entry will get ID 1. That should be independent of the ELOG version. I guess if PSI
had had a power outage a year ago, you would have had the same problem.
I had a problem with AFS a long time ago, where network access blocked the program
for several minutes. I decided then to ONLY use local filesystems for elog servers and to do the
backup via rsync to an AFS account. Since then I have never had problems.
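In case somebody wants to copy that scheme, a minimal sketch of such a backup job is below. It just wraps rsync; the local data directory and the AFS target are placeholders, not the actual paths of any installation:

    #!/usr/bin/env python3
    # Minimal backup sketch: mirror the local elog data to a directory on AFS.
    # Both paths are placeholders - adjust them to your own installation.
    import subprocess
    import sys

    LOCAL_DATA = "/usr/local/elog/logbooks/"            # local disk, used by elogd
    AFS_BACKUP = "/afs/example.org/user/elog-backup/"   # backup target on AFS

    def backup():
        # -a preserves permissions and timestamps, --delete mirrors removals too
        result = subprocess.run(["rsync", "-a", "--delete", LOCAL_DATA, AFS_BACKUP])
        return result.returncode

    if __name__ == "__main__":
        sys.exit(backup())

Run once a night from cron, this keeps a complete copy on AFS while elogd itself only ever touches the local disk.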
Now it is hard for me to develop code which avoids the mentioned problem. I could maybe
check whether a logbook that had many entries all of a sudden has no entries any more, and in
that case have the server just stop with a detailed error message. But it is hard for me to mimic
the AFS server outage. I can try to manually delete elog files and see what happens, but this
only partially mimics network problems.
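Just to illustrate the idea (this is not elogd code), such a check could for now even be done outside the server: a small watchdog which remembers how many yymmdda.log files each logbook directory had, and complains if a directory which used to contain entries suddenly looks empty - which is exactly what the data directory looks like to elogd when the AFS volume vanishes. The paths and the state file below are only placeholders:

    #!/usr/bin/env python3
    # Watchdog sketch: warn if a logbook directory that used to contain
    # entries suddenly appears empty (e.g. because a network filesystem
    # disappeared). Paths and state file are placeholders for this example.
    import glob
    import json
    import os
    import sys

    LOGBOOK_ROOT = "/usr/local/elog/logbooks"        # adjust to your installation
    STATE_FILE = "/var/tmp/elog_entry_counts.json"   # remembers previous counts

    def count_log_files(root):
        counts = {}
        for name in os.listdir(root):
            d = os.path.join(root, name)
            if os.path.isdir(d):
                counts[name] = len(glob.glob(os.path.join(d, "*.log")))
        return counts

    def main():
        old = {}
        if os.path.exists(STATE_FILE):
            with open(STATE_FILE) as f:
                old = json.load(f)
        new = count_log_files(LOGBOOK_ROOT)
        suspicious = [b for b in old if old[b] > 0 and new.get(b, 0) == 0]
        with open(STATE_FILE, "w") as f:
            json.dump(new, f)
        if suspicious:
            print("WARNING: logbooks suddenly look empty:", ", ".join(suspicious))
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Inside elogd the equivalent check would of course have to happen before the index is rebuilt from an empty directory.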
/Stefan
> I have been running ELOG for several years with rather heavy usage.
> Last week I upgraded from 3.0.0 to 3.1.0, and this week I had the same problem twice:
> elogd lost the index for old entries and showed empty logbooks, without having been restarted.
> The logbooks appeared to be empty; new entries started with index "1".
> The first time the origin of the problem was network trouble;
> the second time it was caused by a severe problem with our AFS file system service.
> I never experienced this consequence with ELOG in the past when we had AFS problems.
>
> Since the logbooks are used for the operation log of a user facility, the users continued to make new entries.
> The next day I had to re-number the new entries and restart elogd and everything was fine.
>
> I could understand if elogd crashes when the filesystem of the logbook goes away.
> And if it restarted with a (temporarily) empty filesystem, that would explain what happened.
> But it did not restart and the log file does not contain any information about any problem,
> just that suddenly all new entries in each logbook started with ID "1" again.
>
> Stefan, any idea?
> Anyone else ever experienced that with the new ELOG version (or older ones)?
I have to figure out where elog hangs. I guess it must be some kind of endless loop, triggered by corrupt data in one of the elog entries. Under Linux this is fairly simple: just run elogd under the gdb debugger, wait until it hangs, then press Ctrl-C and enter "where" to see a full stack dump of where elogd is currently executing. Under Windows this is more difficult, since you need Visual C++ from Microsoft to do the debugging. One thing you can do without VC, however, is to check whether elogd consumes 100% of the CPU time, which would indicate an endless loop.
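For reference, such a gdb session would look roughly like this (the paths are only examples, use those of your installation, and start elogd in the foreground, i.e. without the -D daemon flag):

    gdb /usr/local/sbin/elogd
    (gdb) run -c /usr/local/elog/elogd.cfg
    ... wait until the server hangs, then press Ctrl-C ...
    (gdb) where

If elogd is already running and hung, you can instead attach to it with "gdb -p <pid of elogd>" and enter "where" there.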
Stefan
Alan Grant wrote:
I have a very long-standing problem with elog over the last few versions: almost daily the service will hang. I cannot even restart elogd; that just hangs too. Clients get "Page not found". I can only get the service reinitialized by rebooting the VM. I have elog verbose logging on, plus a number of external triage monitors running, but nothing is yielding clues beyond the precise time the hang occurs. Aside from providing the config and log files, what else can I provide to help you assist, and what other triage measures can you suggest I try? FYI, there can be up to 20 users at a time doing searches (not updates), and I've trimmed the depth of log files that can be searched so that the CPU/service doesn't bog down, but that hasn't helped either. Inserts happen in the background using the elog client application (about 2 or 3 inserts per batch at sporadic times).
I have experience of elog hanging (under Linux). I'll describe my situation, although it may not apply to you. I still use elog 2.9.2, but I am not aware of this issue ever having been resolved, although I have mentioned it in the past (possibly because I'm one of the few who runs into this situation). I certainly recall another person who had this as their problem, and my reply on this forum solved it for them. The cause is the following:
1. A thread with a large number of replies - something over 40 I think.
2. This long thread is deleted starting from the first entry. This will crash elog.
3. Once elog is restarted, the later entries of the deleted thread (which survived the deletion attempt when elog crashed) are accessed. This causes elog to go into an endless loop and hang. Until I learnt better, I had to reboot the computer. Under Linux, kill -9 <pid> does the job, but a plain kill <pid> does not.
The problem lies with the first entry that survived the attempted deletion. In the yymmdda.log file it has an "In reply to" line referring to an entry that has now been deleted. Manually editing the yymmdda.log file to remove that line does the trick, and then the surviving entries can be accessed and deleted.
A good work-around, if you are about to delete a long thread, is to delete it in sections, starting at the end. It is useful to note the entry number, or some other way of finding the thread again after each section is deleted, as of course it will then be back in among the even older entries. Or keep two browser tabs open on the same thread.
If you want to move a long thread to another logbook, then to avoid the problem copy the thread and do the deletion in stages. Moving a long thread causes the same crash/hang: the copying part works fine; the deletion part is the problem.
You don't need a large number of replies to an entry to cause the hang under controlled conditions. Simply editing the yymmdda.log file of a new entry and adding an "In reply to" line that refers to an earlier entry number which does not exist is enough to cause the problem when you try to access the thread.
If this is the cause of your issue, the problem is to find the orphan thread that is causing the hang, especially after all this time. Also, you may have more than one orphan thread. Even though I am aware of the problem, I do occasionally find orphan threads in my logbooks. In my case I use the ticketing system, and searching by ticket number will find an orphan thread without hanging the computer, but if you then click on any entry found - hang.
There is a related issue, which I think I have now resolved. If an entry named in the "Reply to" field of the yymmdda.log file does not exist - that is, a later entry (not an earlier one, as above) - elog will make a duplicate entry, always in bold, with entry number 0, appear in the listings. This entry is an artifact of the listings, not a real entry in any yymmdda.log file. Again, finding the rogue entry is the tricky bit.
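To help with the hunt, a small script along the following lines could list references to entries that no longer exist. It is only a sketch and assumes the layout I see in my yymmdda.log files (each entry starts with a "$@MID@$:" line, the header ends with a line of "=" signs, and replies carry "In reply to:" / "Reply to:" attribute lines) - check it against your own files before trusting its output:

    #!/usr/bin/env python3
    # Sketch: scan a logbook directory for "In reply to:" / "Reply to:"
    # references that point to entry IDs which no longer exist.
    # Assumes each entry in a yymmdda.log file starts with "$@MID@$: <id>"
    # and that the header ends with a line of "=" characters.
    import glob
    import os
    import re
    import sys

    def scan(logbook_dir):
        ids = set()    # every entry ID found in the logbook
        refs = []      # (file, referencing entry, referenced entry)
        for path in sorted(glob.glob(os.path.join(logbook_dir, "*.log"))):
            current = None
            with open(path, errors="replace") as f:
                for line in f:
                    m = re.match(r"\$@MID@\$:\s*(\d+)", line)
                    if m:
                        current = int(m.group(1))
                        ids.add(current)
                        continue
                    if line.startswith("===="):
                        current = None   # end of header, skip the body text
                        continue
                    m = re.match(r"(In reply to|Reply to):\s*(.+)", line)
                    if m and current is not None:
                        for ref in re.findall(r"\d+", m.group(2)):
                            refs.append((path, current, int(ref)))
        for path, entry, ref in refs:
            if ref not in ids:
                print(f"{path}: entry {entry} refers to missing entry {ref}")

    if __name__ == "__main__":
        scan(sys.argv[1] if len(sys.argv) > 1 else ".")

Anything it reports can then be fixed by removing the offending line from the file (with elogd stopped), as described above.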
Stefan Ritt wrote:
I have to figure out where elog hangs. I guess it must be some kind of endless loop, triggered by corrupt data in one of the elog entries. Under Linux this is fairly simple: just run elogd under the gdb debugger, wait until it hangs, then press Ctrl-C and enter "where" to see a full stack dump of where elogd is currently executing. Under Windows this is more difficult, since you need Visual C++ from Microsoft to do the debugging. One thing you can do without VC, however, is to check whether elogd consumes 100% of the CPU time, which would indicate an endless loop.
Stefan
Alan Grant wrote:
I have a very long-standing problem with elog over the last few versions: almost daily the service will hang. I cannot even restart elogd; that just hangs too. Clients get "Page not found". I can only get the service reinitialized by rebooting the VM. I have elog verbose logging on, plus a number of external triage monitors running, but nothing is yielding clues beyond the precise time the hang occurs. Aside from providing the config and log files, what else can I provide to help you assist, and what other triage measures can you suggest I try? FYI, there can be up to 20 users at a time doing searches (not updates), and I've trimmed the depth of log files that can be searched so that the CPU/service doesn't bog down, but that hasn't helped either. Inserts happen in the background using the elog client application (about 2 or 3 inserts per batch at sporadic times).