Fedora 19/20 logfile explosions

PSA: if you're running Fedora 19 or 20, I highly recommend you stop what you're doing right now and do this instead. There are a couple of unfortunate bugs in F19/F20 right now which may well be screwing the hell out of your log files.

systemd-journald bug combines with new rsyslog to eat your CPU time, make /var/log/messages huge, break journalctl and generally spread woe and misery SELinux denial happening every three seconds to GNOME users

Chapter One

1) Run top and check if 'rsyslogd' is sucking a bunch of your CPU time 2) If so, immediately stop rsyslog.service, do 'yum downgrade rsyslog', and restart it 2b) If not, breathe a sigh of relief, and make a note not to update to rsyslog 7.4.0 until this blows over. But still read the following chapters

EDIT 2.5) There's now a systemd update which should both prevent any further buggy journal entries being written, and cope with reading existing buggy entries better: systemd-204-8.fc19. After updating to that systemd, the infinite loop in buggy journals should no longer occur. If you prune your journal as described in the rest of this post, and update to systemd 204-8, you should then be able to update back to rsyslog 7.4.0 without further problems. Once we get systemd 204-8 pushed stable, new F19 installs should no longer ever be affected by this problem in any way, so please upkarma it.

3) If you got bitten, /var/log/messages is now probably infeasibly huge (mine was 8GB). The easiest thing to do is probably just nuke it. If you have valuable logs, they can probably be extracted from journalctl.



Chapter Two

If you got hit by chapter one, you're now in for some joyous fun. If your system logs are of no particular important to you, you can probably just go to /var/log/journal and blow .journal away, and you should now be more or less happy (though read Chapter Three too). The files in /var/log/journal are the Fabled Systemd-Journald Binary Logfiles themselves: you wipe one, you lose all the logs in it.

If your system logs are important to you, you get to have some fun. Run 'journalctl --no-pager' and wait a while. It'll likely start looping around some very small period of time, endlessly. ctrl-c it, go to /var/log/journal , do 'ls -ltr' and spot the file(s) with a very close timestamp to where the loop happened. Move those files somewhere else (if you want to retain maximum data, you can move them one at a time, doing 'journalctl --no-pager' after each move until you hit the exact combo of files you need to move to unblock it. You may need to rinse and repeat: I had five different loops in mine.

You may now be able to update rsyslog to 7.4.0 again and be okay, but personally, I'm leaving it on the old build till things quiet down a bit.

Chapter Three

You may well still have a giant assload of SELinux alerts about bug #973849 in your logs - "SELinux is preventing /usr/libexec/accounts-daemon from 'read' accesses on the directory /var/log." accounts-daemon tries this read about every three seconds, spamming the logs each time it fails. So first order of business, install selinux-policy-3.12.1-52.fc19 to fix it. Then you can clean up your logs, if you like.

I just gave up and wiped all the really large files in /var/log/journal - if you do an 'ls -lSr' in that directory you should see that most files are a few MB at most, but a few recent ones are tens or hundreds of MB in size. I killed all of those. You can leave 'em, but they eat space and make journalctl really slow. Your choice.

Chapter Four

While doing all this you might notice a bunch of rather old files with the extension "journal~" in /var/log/journal . It seems that journald doesn't rotate these (they're journal files that were uncleanly closed on shutdown). I manually deleted all the really old ones I had.

Chapter Five

If you have gnome-shell-extension-fedmsg installed and you're not really using it, you might want to consider removing it, because it spams the hell out of the journal: see bug #974429. Just something else I noticed on my log sanitization quest.

I am now running a sweepstakes on the first slashdot "systemd SUCKS!" post to cite this little mess as justification for the 'binary logs are awful!' meme...


Christopher Meng wrote on 2013-06-14 08:41:
My 60GB log seems the biggest XD
Chris Seymour wrote on 2013-06-14 13:48:
Mine grew until my partition was full :\
Troy Large wrote on 2013-06-14 14:46:
I tried to download the SELinux update, but it said no such update available - via yum as follows: $ sudo yum install selinux-policy-3.12.1-52.fc19 [sudo] password for troy: Loaded plugins: langpacks, refresh-packagekit No package selinux-policy-3.12.1-52.fc19 available. Error: Nothing to do
Jiri Eischmann wrote on 2013-06-14 15:13:
F19 Beta looked so solid and since then there have been so many regressions...
adamw wrote on 2013-06-14 16:48:
Troy: it's a very new update, you can grab it directly from Koji or run 'bodhi -D selinux-policy-3.12.1-52.fc19'. Jiri: this is one of those 'slow burn' bugs which isn't particularly obvious and then explodes in your face a few weeks later. It's been happening in the background the whole time you've been running F19; the rsyslog update really just exposed it. I had noticed the issue with journalctl a few weeks back and really ought to have looked into it, but it kept slipping down my priority list :(
David Lang wrote on 2013-06-14 20:07:
So if I am reading the bugzilla correctly, the problem is that systemd would write a corrupted journal, and then anything that tried to read from it (either the journalctl tool or rsyslog) would find an endless loop of messages. Is this correct?
adamw wrote on 2013-06-14 20:18:
David Lang: s/systemd/journald (systemd-journald), but basically yes. Well, it wasn't exactly 'corrupted', but it assigned the same IDs to different log messages, and was not able to cope with reading those in some kind of sane way; anything which basically asked journald 'read out all your logs' (which is what journalctl does in most invocations, and what rsyslog 7.4.0 is trying to do) would get stuck in the loop. systemd 204-7 should prevent the writing of any *further* journal entries with duplicate IDs. systemd 204-8 should also make journald's behaviour when reading back a journal that *already contains* duplicate IDs better, though so far we don't have enough data to be confident that it successfully resolves all cases (further testing of this from people who still have, or can recreate, the buggy journals would be exceedingly welcome).
bobby wrote on 2013-06-14 21:57:
thanks for the help nice sizes, but don't forget to mention the CPU temperature... mine was at 95°C when I noticed something was wrong. Strong smell of burned dirt warned me ^^
Paul W. Frields wrote on 2013-06-15 14:35:
As always, Adam, thanks for chasing down this scurrilous behavior and posting an informative solution! I'll pass this URL on elsewhere too.
Rainer Gerhards wrote on 2013-06-16 11:10:
Thanks for the well-researched article. The rsyslog team has also been a bit busy and here is our position on the problem and how we intend to mitigate further issues: http://blog.gerhards.net/2013/06/systemd-journal-causes-loop-in-imjournal.html
Magnum wrote on 2013-06-17 04:08:
Tks man, updating systemd make everything normal again :) Fedora 19 beta.