Fair warning up front: this is going to be one of my long, rambling, deep dives. It doesn't even have a conclusion yet, just lots of messy details about what I'm trying to deal with.
So this week I was aiming to do some work on our openQA stuff - ideally, add some extra tests for Server things. But first, since we finally had working Rawhide images again, I figured I'd check the existing openQA tests are still working.
Unfortunately...nope. There are various problems, but I'm focusing on one particular case here. The 'default install of a live image' test reached the 'Root Password' screen and then failed.
With a bit of careful image comparison, I figured out why openQA wasn't happy. With the Workstation live image, the capital letters in the text - 'R', 'P', 'D' - were all one pixel shorter than on the reference screenshot, which had been taken from a boot.iso install. openQA works on screenshot comparisons; this difference was sufficient for it to decide the screen of the test didn't match the reference screen, and so the test failed. Now I needed to figure out why the fonts rendered ever so slightly differently between the two cases.
I happen to know a bit about font rendering and GNOME internals, so I had an initial theory: the difference was in the font rendering configuration. Somehow some element of this was set differently in the Workstation live environment vs. the traditional installer environment. On the Workstation live, obviously, GNOME is running. On 'traditional install' images - boot.iso (network install) and the Server DVD image - GNOME is not running; you just have anaconda running on metacity on X.
I happen to know that GNOME - via gnome-settings-daemon - has some of its own default settings for font rendering. It defaults to 'medium' hinting, 'greyscale' antialiasing, and a resolution of exactly 96dpi (ignoring hidpi cases for now). You can see the hinting values in this gsettings schema, and the DPI comes in via gsd-xsettings-manager.c.
I also knew that there's a file
/etc/X11/Xresources, which can be used to set the same configuration at the X level in some way. I checked a clean Fedora install and found that file looks like this (minus comments):
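(The snippet itself didn't survive into this copy of the post; going by the description - greyscale antialiasing, medium hinting, exactly 96dpi - the relevant Xft lines would have looked something like this. This is a reconstruction from those defaults, not a verbatim copy of the Fedora file:)

```
Xft.antialias: 1
Xft.hinting: 1
Xft.hintstyle: hintmedium
Xft.rgba: none
Xft.dpi: 96
```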
so...it appeared to be identical to the GNOME defaults. At this point I was a bit stuck, and was very grateful to Matthias Clasen for helping with the next bit.
He told me how to find out what settings a given anaconda instance was actually using. You can press ctrl+shift+d (or ctrl+shift+i) in any running GTK+ 3 app to launch GtkInspector, a rather handy debugging tool. By clicking the GtkSettings 'object' and then going to 'Properties', we can see the relevant settings (down at the bottom): gtk-xft-antialias, gtk-xft-dpi, gtk-xft-hinting and gtk-xft-hintstyle. In my test case - a KVM at 1024x768 resolution - I found that in the Workstation live case, gtk-xft-dpi was 98304 and gtk-xft-hintstyle was 'medium', while in the boot.iso case (and also on KDE live images), gtk-xft-dpi was 98401 and gtk-xft-hintstyle was 'full'.
So then we had to figure out why. gtk-xft-dpi has a multiplication factor of 1024. So on the Workstation image the actual DPI being used was 98304 / 1024 = 96 (precisely), while on boot.iso it was 98401 / 1024 = 96.094726563 . I confirmed that changing the gtk-xft-dpi value to 98304 and gtk-xft-hintstyle to medium made the affected glyphs the same in boot.iso as in Workstation. But why was boot.iso getting different values from Workstation? And where did the rather odd 98401 gtk-xft-dpi value come from?
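The factor-of-1024 arithmetic is easy to check; here's a tiny sketch (the helper name is mine, just illustrating the numbers above):

```python
XFT_DPI_SCALE = 1024  # gtk-xft-dpi stores the DPI multiplied by 1024

def xft_dpi_to_dpi(gtk_xft_dpi):
    """Convert a raw gtk-xft-dpi value to the actual DPI it represents."""
    return gtk_xft_dpi / XFT_DPI_SCALE

print(xft_dpi_to_dpi(98304))  # Workstation live: exactly 96.0
print(xft_dpi_to_dpi(98401))  # boot.iso / KDE: 96.0947265625, just over 96
```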
Matthias again pointed out what actually happens with the
/etc/X11/Xresources values, which I never knew. They get loaded via a command
xrdb, which affects the contents of something called an X resource database. This is reasonably well documented on Wikipedia and in the xrdb man page. Basically, the resource database is used as a store of configuration data. The
/etc/X11/Xresources values (usually...) get loaded into the RESOURCE_MANAGER property of the root window of screen 0 at X startup, and act as defaults, more or less.
There are various paths on which the
/etc/X11/Xresources values actually get loaded into the RESOURCE_MANAGER. There's a file
/etc/X11/xinit/xinitrc-common, which does it: it calls
xrdb -nocpp -merge /etc/X11/Xresources if that file exists. (It also loads in ~/.Xresources afterwards if that exists, allowing a user to override the system-wide config).
xinitrc-common is sourced by
/etc/X11/xinit/Xsession, meaning any way you start X that runs through one of those files will get the
/etc/X11/Xresources values applied. The
startx command runs through
xinit which runs through
xinitrc, so that's covered. Display managers either run through
/etc/X11/xinit/Xsession or implement something rather similar themselves; either way, they all ultimately load in the
Xresources settings via xrdb.
(Sidebar: how gnome-settings-daemon actually implements the configuration that's specified in dconf is to load it into the X resource database and xsettings. So what happens when you boot to GNOME is that gdm loads in the values from
/etc/X11/Xresources, then gnome-settings-daemon overwrites them with whatever it gets from dconf).
In the anaconda traditional installer image case, though, nothing does actually load the settings. X gets started by anaconda itself, which does it simply by calling
/usr/libexec/Xorg directly; it doesn't do anything to load in the
Xresources settings, either then or at any other time.
GTK+ is written to use the values from xsettings or the X resource database, if there are any. But what does it do when nothing has written the settings?
Well, as Matthias and I finally found out, it has its own internal fallbacks. As of Monday, it would default to hintstyle 'full', and for the DPI, it would calculate it from the resolution and the display size reported by X. I found that, on the test VM, with a resolution of 1024x768, xdpyinfo reported the display size as 270x203mm. The calculation uses the vertical values, so we get: 768 / (203 / 25.4) * 1024 = 98400.851231527 , which rounds up to 98401 - the gtk-xft-dpi value we found that boot.iso was using! Eureka.
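That fallback calculation can be reproduced directly - a sketch of the arithmetic only, not GTK+'s actual code:

```python
def gtk_fallback_xft_dpi(height_px, height_mm):
    """Approximate GTK+'s DPI fallback: pixels per inch derived from the
    X-reported physical height, scaled by 1024 and rounded."""
    dpi = height_px / (height_mm / 25.4)  # 25.4 mm per inch
    return round(dpi * 1024)

# 1024x768 on a display X reports as 270x203mm:
print(gtk_fallback_xft_dpi(768, 203))  # 98401, matching the boot.iso value
```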
So finally we understood what was actually going on: nothing was loading in the
Xresources or dconf settings (DPI exactly 96, medium hinting) so GTK+ was falling back on its internal defaults, and getting a DPI of just over 96 and 'full' hinting, and that was causing the discrepancy in rendering.
The next thing I did for extra credit was figure out what's going on with KDE, which had the same rendering as traditional installer images. You'd think the KDE path would load in the
Xresources values, though. And - although at first I guessed the problem was that it was failing to - it does. SDDM, the login manager we use for KDE, certainly does load them in via
xrdb. No, the problem turned out to be a bit subtler.
KDE does something broadly similar to GNOME. It has its own font rendering configuration options stored in its own settings database, and a little widget called 'krdb' which kicks in on session start and loads them into the X resource database. Qt then (I believe) respects the xrdb and xsettings values like GTK+ does.
Like GNOME, KDE overrides any defaults loaded in from
Xresources (or anywhere else) with its own settings. In the KDE control center you can pick hinting and antialiasing settings and a DPI value, and they all get written out via xrdb. However, KDE has one important difference in how it handles the DPI setting. As I said above, GNOME wires it to 96 by default. KDE, instead, tries to entirely unset the
Xft.dpi setting. What KDE expects to happen in that case is that apps/toolkits will calculate a 'correct' DPI for the screen and use that. As we saw earlier, this is indeed what GTK+ attempts to do, and Qt attempts much the same.
As another sidebar, while figuring all this out, Kevin Kofler - who was as helpful on the KDE side of things as Matthias was on the GNOME side - and I worked out that the krdb code is actually quite badly broken here: instead of just unsetting
Xft.dpi, it throws away all existing X resource settings. But even if that bug gets fixed, the fact that it throws out any DPI setting that came from
/etc/X11/Xresources is actually intentional. So in the KDE path, the values from
/etc/X11/Xresources do get loaded (by SDDM), but krdb then throws them away. And so we fall into the same path we did on boot.iso, for anaconda: GTK+ has no X resource or xsetting value for the DPI, so it calculates it, and gets the same 98401 result.
For today's extra credit, I started looking at the DPI calculation aspect of this whole shebang from the other end: where was the 270x203mm 'display size', that results in GTK+ calculating a DPI of just over 96, coming from in the first place?
The (rather funny, to me at least...) answer is that it comes from X!
When you start X, it sets an initial physical screen size - and it doesn't do it the way you might, perhaps, expect. It doesn't look at the results of an EDID monitor probe, or try in some way to combine the results of multiple probes for multiple monitors. Nope, it simply specifies a physical size with the intent of producing a particular DPI value!
Here's the code.
As you can see there are basically three paths: one if monitorResolution is set, one if the output->conf_monitor->mon_width and mon_height values are set, and a fallback if neither of those is true. monitorResolution is set if the Xorg binary was called with the -dpi parameter; the mon_width and mon_height values are set if some X config file or another contains the DisplaySize parameter.
If you specified a dpi on the command line, X will figure out whatever 'physical size' would result in that DPI for the resolution that's in use, and claim that the screen is that size. If you specify a size with the
DisplaySize parameter, X just uses that. And if neither of those is true, X will pick a physical size based on a built-in default DPI, which is...96.
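The fallback path can be sketched roughly like this - a simplified Python rendering of the idea (the real logic is C inside the X server), with integer division standing in for X's int arithmetic:

```python
DEFAULT_DPI = 96

def fake_display_size_mm(width_px, height_px, dpi=DEFAULT_DPI):
    """Invent a 'physical size' (in mm) that yields the target DPI at this
    resolution; 254/10 converts inches to millimetres, and // truncates
    the way C integer division does."""
    width_mm = (width_px * 254) // (dpi * 10)
    height_mm = (height_px * 254) // (dpi * 10)
    return width_mm, height_mm

print(fake_display_size_mm(1024, 768))  # (270, 203) - the size xdpyinfo reported
```

Note that 1024x768 at a true 96dpi would be 270.93x203.2mm; the truncation to 270x203 is exactly why the toolkits later recalculate a DPI that isn't quite 96.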
One important wrinkle is that X uses ints for this calculation. What that means is that the ultimate result (the display size in millimetres) gets rounded to a whole number - truncated, really, since integer division simply drops the fractional part.
You know those bits where GTK+ and Qt try to calculate a 'correct' DPI based on the monitor size? Well (I bet you can guess where this is going), so far as I can tell, the value they use is the value X made up to match some arbitrary DPI in the first place!
Let's have a look at the full series of magnificent inanity that goes on when you boot up a clean Fedora KDE and run an app:
- X is called without -dpi and there is no DisplaySize config option, so X calculates a completely artificial 'physical size' for the screen in order to get a result of DEFAULT_DPI, which is 96. Let's say we get 1024x768 resolution - X figures out that a display of size 270x203mm has a resolution of 96dpi, and sets that as the 'physical size' of the display. (Well...almost. Note that bit about rounding above.)
- sddm loads in the /etc/X11/Xresources DPI value - precisely 96 - to the X resource database.
- krdb says 'no, we don't want some silly hardcoded default DPI! We want to be clever and calculate the 'correct' DPI for the display!' and throws away the Xft.dpi value from the X resource database. (Because it's broken it throws away all the other settings too, but for now let's pretend it just does what it actually means to do.)
- When the app starts up, Qt (or GTK+, they do the same calculation) takes the resolution and the physical display size reported by X and calculates the DPI...which, not surprisingly, comes out to about 96. Only not exactly 96, because of the rounding thing: X rounded off its calculation, so in most cases, the DPI the toolkit arrives at won't be quite exactly 96.
Seems a mite silly, doesn't it? :) There is something of a 'heated debate' about whether it's better to try and calculate a DPI based on screen size or simply assume 96 (with some kind of integer scaling for hidpi displays) - the GNOME code which assumes 96dpi has a long email justification from ajax as a comment block - but regardless of which side of that debate you pick, it's fairly absurd that the toolkits have this code to perform the 'correct' DPI calculation based on a screen size which is almost always going to be a lie intended to produce a specific DPI result!
Someone proposed a patch for X.org back in 2011 which would've used the EDID-reported display size (so KDE would actually get the result it wants), but it seems to have died on the vine. There's some more colorful, er, 'context' in this bug report.
I haven't figured out absolutely every wrinkle of this stuff, yet. There's clearly some code in at least Qt and possibly in X which does...stuff...to at least some 'display size' values when the monitor mode is changed (via RandR) in any way. One rather funny thing is that if you go into the KDE Control Center and change the display resolution, the physical display size reported by
xdpyinfo changes - but it doesn't change to match the correct physical size of the screen, it looks like something else does the same 'report a physical screen size that would be 96dpi at this resolution' calculation and changes the physical size to that. It doesn't quite seem to do the calculation identically to how X does it, though, or something - it comes up with slightly different answers.
So on my 1920x1080 laptop, X initially sets the physical size to 507x285mm (which gives a DPI of 96.252631579). If I use KDE control center to change the resolution to 1400x1050, the physical size reported by xdpyinfo changes to 369x277mm (which gives a DPI of 96.281588448). If I then change it back to 1920x1080, the physical size changes to 506x284mm - nearly the same value X initially set it to, but not quite. That gives a DPI of 96.591549296. Why is that? I'm damned if I know. None of those sizes, of course, resembles the actual size of the laptop's display, which is 294x165mm (as reported by xrandr) - it's a 13" laptop.
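Checking those laptop numbers (just the arithmetic from the paragraph above, wrapped in a throwaway helper):

```python
def dpi_from_size(height_px, height_mm):
    """DPI implied by a reported physical height at a given vertical resolution."""
    return height_px / (height_mm / 25.4)

print(round(dpi_from_size(1080, 285), 2))  # 96.25  - X's initial 507x285mm
print(round(dpi_from_size(1050, 277), 2))  # 96.28  - after switching to 1400x1050
print(round(dpi_from_size(1080, 284), 2))  # 96.59  - after switching back
print(round(dpi_from_size(1080, 165), 2))  # 166.25 - the panel's real 294x165mm size
```

That last line is the punchline: the display's actual DPI is around 166, nowhere near any of the ~96 values being shuffled around.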
None of this happens, though, if I change the resolution via xrandr instead of the KDE control panel: the reported physical size stays at 507x285mm throughout. (So if you change resolutions in KDE with xrandr then launch a new app, it'll be using some different DPI value from your other apps, most likely).
So what the hell do I do to make anaconda look consistent in all cases, for openQA purposes? Well, mclasen made a change to GTK+ which may be controversial, but would certainly solve my problem: he changed its 'fallback' behaviour. If no hinting style is set in the X resource database or xsettings, now it'll default to 'medium' instead of 'full' - that's fairly uncontroversial. But if no specific DPI is configured in xrdb or xsettings, instead of doing the calculation, it now simply assumes 96dpi (pace hidpi scaling). In theory, this could be a significant change - if things all worked as KDE thought they did, there'd now be a significant discrepancy between Qt and GTK+ apps in KDE on higher resolution displays. But in point of fact, since the 'calculated' DPI is going to wind up being ~96dpi in most cases anyway, it's not going to make such a huge difference...
Without the GTK+ change (or if it gets reverted), we'd probably have to make anaconda load the
Xresources settings in when it starts X, so it gets the precise 96dpi value rather than the calculation which ends up giving not-quite-exactly 96dpi. For the KDE case, we'd probably have to futz up the tests a bit to somehow force the 96dpi value into xrdb or xsettings before running anaconda. We're not yet running the openQA tests on the KDE live image, though, so it's not urgent yet.