We have been experiencing corruption of logbook entries by elogd mirror synchronization. Has anyone else encountered this? Is there a known cause and/or workaround for it?
Details
We have two elog servers set up with identical elogd.cfg and password files, except that one server has "Mirror server" pointing to the other host. There are three logbooks defined: DoubleChooz, BigBrotherTable, and FlushingTable. When the mirror synchronization runs, whether triggered by "Mirror cron" or by an administrator clicking the "Synchronize all logbooks" link, entries requiring synchronization are often corrupted on both servers (not just on the one to which the entry was copied). This is particularly likely to happen if entries have been made on both servers since the previous sync.
Looking at the logbook files themselves, we see that the corrupted entries contain attributes from the wrong logbooks. E.g., we'll see an empty "Barometer: " line in a DoubleChooz logbook file, where "Barometer" is an attribute that exists only in the FlushingTable logbook, or we'll see unexpected DoubleChooz attributes in the FlushingTable files.
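In case it helps anyone check their own files for the same symptom, here is a minimal Python sketch of how one could flag attribute lines that do not belong to a logbook. It is not part of elog. It assumes each entry starts with header lines of the form "Name: value" and that the header ends at a line of "=" characters; the set of built-in field names and the function name foreign_attributes are my own guesses, and "Author" and "Subject" in the usage note are placeholder attribute names, not our real configuration.

```python
import re

# Assumed shape of a header line: "Name: value" starting at column 0.
ATTR_LINE = re.compile(r"^([A-Za-z][A-Za-z0-9 _-]*):[ \t]?(.*)$")

# Fields assumed to be written by elogd itself rather than defined as
# logbook attributes (an assumption; adjust to match your files).
SYSTEM_FIELDS = {"Date", "Reply to", "In reply to", "Attachment",
                 "Encoding", "Locked by"}


def foreign_attributes(entry_text, expected):
    """Return header attribute names that are neither expected nor built in.

    entry_text: raw text of one entry as stored in the logbook file.
    expected:   set of attribute names configured for this logbook.
    """
    found = []
    for line in entry_text.splitlines():
        # Assumed separator between the header and the message body.
        if line.startswith("===="):
            break
        match = ATTR_LINE.match(line)
        if not match:
            continue  # e.g. the message ID line, which starts with "$"
        name = match.group(1)
        if name not in SYSTEM_FIELDS and name not in expected:
            found.append(name)
    return found
```

For example, calling foreign_attributes on a DoubleChooz entry whose header contains a stray "Barometer: " line, with expected set to {"Author", "Subject"}, would return ["Barometer"].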
Strangely, the entries are not identical on the two machines after syncing, and they remain non-identical after further syncs.
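To see which files differ between the two machines, a comparison along the following lines can be run on copies of the two logbook directories. This is only a sketch: the function names file_digests and differing_files are hypothetical, and it simply compares SHA-256 digests of every file under each directory without knowing anything about the elog file format.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def differing_files(dir_a, dir_b):
    """Return sorted relative paths that differ or exist on one side only."""
    a = file_digests(dir_a)
    b = file_digests(dir_b)
    # A file missing on one side yields None from .get(), which never
    # equals a digest string, so it is reported as differing.
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

Running differing_files on the two copied logbook directories before and after a sync shows whether the sync actually brought the files into agreement.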
Most disturbingly, data is lost from entries that were perfectly valid before the sync, on both servers.
This was happening with elogd 2.7.8, and continued to happen after upgrading to 2.8.0. Both servers are running Linux. One is a 32-bit machine and the other is 64-bit, in case that might matter (but read on).
I made copies of both servers' files and ran two elogd servers on my Mac on different ports, compiled from a fresh checkout of 2.8.0, and observed the same behavior as I repeatedly made test entries and synchronized. This suggests the problem isn't specific to Linux or to a particular architecture, 64-bit or otherwise.