Discussion forum about ELOG
elogd lost entry database without restart during file system trouble, posted by Andreas Luedeke on Thu Apr 30 09:54:17 2015
    Re: elogd lost entry database without restart during file system trouble, posted by Stefan Ritt on Thu Apr 30 13:01:29 2015
Message ID: 67866     Entry time: Thu Apr 30 13:01:29 2015     In reply to: 67865
Icon: Reply  Author: Stefan Ritt  Author Email: stefan.ritt@psi.ch 
Category: Bug report  OS: Linux  ELOG Version: 3.1.0 
Subject: Re: elogd lost entry database without restart during file system trouble 
First, I can only fix problems which are reproducible. Can we have another power outage at PSI?

But seriously, I guess what happened is that elog sees an empty directory when the AFS server
goes down. If this happens, it rebuilds its internal (RAM-based) index and finds no entries there,
so the next entry gets ID 1. That should be independent of the ELOG version. I guess if PSI
had had a power outage a year ago, you could have had the same problem.

I had a problem with AFS a long time ago, where network access blocked the program
for several minutes. I decided then to use ONLY local file systems for elog servers, and to do the
backup via rsync to an AFS account. Since then I have never had problems.

Now it is hard for me to develop code which avoids the mentioned problem. I could maybe
check whether there were many entries and all of a sudden there are none any more, and in that
case have the server just stop with a detailed error message. But it is hard for me to mimic the
AFS server outage. I can try to manually delete elog files and see what happens, but this only
partially mimics network problems.
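
To make the idea concrete, here is a minimal sketch in C of such a check. It is not taken
from the elogd source: the names next_entry_id, check_rebuild and min_entries are hypothetical,
and the assumption that the next ID is simply the highest indexed ID plus one is only my
reading of the behaviour described above.

```c
#include <stdio.h>

/* Hypothetical sketch, NOT the actual elogd implementation. */

enum rebuild_verdict { REBUILD_OK = 0, REBUILD_SUSPICIOUS = 1 };

/* Assumed numbering rule: the next entry gets the highest known ID plus one.
   With an empty index (n_ids == 0) this yields 1, which matches the symptom
   of new entries starting at ID 1 after the directory appeared empty. */
int next_entry_id(const int *ids, int n_ids)
{
    int max_id = 0;

    for (int i = 0; i < n_ids; i++)
        if (ids[i] > max_id)
            max_id = ids[i];

    return max_id + 1;
}

/* Compare the entry count before and after an index rebuild.
   min_entries is a hypothetical threshold for "many entries": a logbook that
   held at least that many entries and now shows none is treated as suspicious,
   so the caller can stop the server instead of accepting new entries. */
enum rebuild_verdict check_rebuild(int entries_before, int entries_after,
                                   int min_entries)
{
    if (entries_before >= min_entries && entries_after == 0) {
        fprintf(stderr,
                "index rebuild found 0 entries (had %d before); "
                "logbook directory may be unavailable\n",
                entries_before);
        return REBUILD_SUSPICIOUS;
    }
    return REBUILD_OK;
}
```

A caller would remember the entry count before each rebuild and refuse to continue when
check_rebuild returns REBUILD_SUSPICIOUS. A logbook that was empty all along still passes,
since entries_before is then below the threshold.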

/Stefan

> I have been running ELOG for several years with rather heavy usage.
> Last week I upgraded from 3.0.0 to 3.1.0, and this week I had the same problem twice:
> elogd lost the index for old entries and showed empty logbooks, without having restarted.
> The logbooks appeared to be empty; new entries started with index "1".
> The first time the origin of the problem was network trouble;
> the second time it was caused by a severe problem with our AFS file system service.
> I never experienced this consequence for ELOG in the past when we had AFS problems.
> 
> Since the logbooks are used for the operation log of a user facility, new entries continued to be made.
> The next day I had to renumber the new entries and restart elogd, and then everything was fine.
> 
> I could understand if elogd crashed when the file system of the logbook went away.
> And if it restarted with a (temporarily) empty file system, that would explain what happened.
> But it did not restart and the log file does not contain any information about any problem, 
> just that suddenly all new entries in each logbook started with ID "1" again.
> 
> Stefan, any idea?
> Has anyone else ever experienced that with the new ELOG version (or older ones)?