Fedora QA goings-on: Test Days, Fedora 24 Beta testing, LinuxFest NorthWest and more!

I'm on a train and I haven't been blogging enough lately, so I figured I'd write something up!

We've had a couple of great Test Days in the last couple of weeks: i18n Test Day and Fedora Media Writer Test Day. Thanks a lot to everyone who came out and tested - the attendance was awesome and we got a huge amount of valuable feedback from both events.

This train I'm on is heading to beautiful Bellingham, WA, where I'll be attending the awesome LinuxFest NorthWest, along with several other Fedora and RH folks. There'll be a Fedora booth as always, where I'll be hanging out some of the time, and I'm also giving a joint openQA presentation with openSUSE's Richard Brown, which should be really awesome. That's on Sunday afternoon at 3pm in CC-208, please do come along if you can!

Speaking of openQA, we got a big shiny new box to use for hosting openQA workers, so the production openQA instance now has 18 workers. Which means tests run much faster. Mmmmm, fast tests. We'll be doing interesting stuff with this extra capacity soon, I hope! Both production and staging are also now running a recent git snapshot of openQA, which tweaks a few things here and there and saves me maintaining >20 backported patches.

Fedora 24 has been moving along pretty well recently; we got a big stable push with a bunch of important fixes in it done today, so I'm hoping that in tomorrow's compose, 32-bit images will be working again and so will the Atomic installer image. There are also some useful anaconda fixes, so we should be able to get down to completing the Beta validation tests for tomorrow's nightly compose and finding any remaining lurking blockers.

I've also been keeping an eye on Rawhide and trying to get major bugs fixed lately; there've been some interesting ones like these, but I'm hoping we'll hit a clear patch soon...

We have several interesting automation projects ATM. On the openQA side, I'm working on initial desktop testing, while jsedlak is working on ARM testing and adding KDE and Server upgrade tests. On the taskotron side, there's some interesting work going on to add package ABI diffing using libabigail, where the Fedora QA team is working together with Sinny Kumari and Dodji Seketeli - some awesome collaboration going on there!

Fedora 24 Beta is coming up fast: the Go/No-Go meeting is next Thursday, so we'll be working hard this coming week to try and complete the Beta validation tests and shepherd fixes for the known and surely-yet-to-come blocker bugs. Interested in helping out with this or any of the other fun QA stuff going on? Come help us!

Fedora Media Writer Test Day tomorrow! (2016-04-19)

Hi folks! Just another Test Day notice: tomorrow (Tuesday 2016-04-19) will be Fedora Media Writer Test Day!

As part of this planned Change for Fedora 24, the Fedora graphical USB writing tool - formerly called "Live USB Creator", and still technically called that in terms of source repos and filenames, but in the process of being rebranded as "Fedora Media Writer" - is being extensively revised and rewritten. The idea is the new tool will be sufficiently capable, reliable, and cross-platform usable that it can be the primary download for Fedora Workstation 24: the main 'flow' of the Workstation download page will run through the tool, instead of giving you a download link to the ISO file and various instructions for using it in different ways.

This would be a pretty big change, and of course it would be a bad idea to do it if the tool isn't ready. So this is an important Test Day! We'll be testing the new version of the tool to see whether it's working well enough and catch any remaining issues.

It's also pretty easy to join in: all you'll need is a USB stick you don't mind overwriting, and a system (or ideally more than one!) you can test booting the stick on (but you don't need to make any permanent changes to it).

All the instructions are on the wiki page, so please read through and come help us test tomorrow!

As always, the event will be in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

Fedora nightly image finder

Guess what? I wrote another new thing! Here's what it does: it makes a page where you can easily find Fedora nightly images. Yup, that simple!

Finding nightly Fedora builds has always been a bit of a pain. For quite a while we had this page, which just linked to a couple of canned Koji searches. It kinda worked, but it was terribly slow and the results weren't the nicest thing to look at; it also couldn't find you installer images, as they don't come out of Koji. It doesn't work any more, as the Koji tasks it searches for are no longer correct; it could easily be 'fixed' but it'd still be a bad experience.

We also had a thing called the release engineering dashboard for a while (link is to the code as the page is no longer live, it redirects to PDC), but it was killed recently as it wasn't entirely in line with current workflows, particularly since the Pungi 4 change. It also did live Koji queries when you hit it, I think, which made it rather slow. It covered more stuff than just nightly composes, though, to be fair, and my new thing doesn't do that.

So for the last few weeks we've really had nothing at all to point people to when they say "hey, how can I reliably find a recent Fedora nightly image?", and that's kinda embarrassing. We can just tell you to go look in the development tree, but that doesn't entirely work, because not every image compose succeeds every time, so the image you want might be missing, and anyhow, poking through the tree sucks.

My fedfind tool can find all the images in a given compose and do lots of other handy stuff too, but it can't actually answer a question like, say, "where's the last Workstation live build for Fedora 24?"

At first I thought of extending fedfind to do that, but then I thought it might be more useful to write a little page generator that uses fedfind (and other things) to produce a simple HTML page with the information. So that's what I did!

The tool is called fedora_nightlies. What it basically does is keep a stash of information (as a flat JSON file, for now) on nightly images. This can be seeded with the last X days' worth of composes, and there is a fedmsg consumer which can be used subsequently to update the data store each time a compose completes and each time openQA testing for a compose finishes.

It then produces a pretty basic HTML page from the data. For each 'image group' - like 'Workstation live' or 'Server netinst' - it links to the most recent successful nightly compose for each arch for Rawhide and the current Branched release. For images that are tested by openQA, it also links to the most recent image which passed all image-specific tests: the 'last known good' image.
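To make the 'most recent' versus 'last known good' distinction concrete, here's a minimal sketch of that selection logic. It is not the actual fedora_nightlies code: the record layout (release, group, arch, date, url, passed) is made up for illustration, and dates are assumed to be YYYYMMDD strings so they sort chronologically.

```python
def latest_links(records):
    """Pick the newest image and the newest all-tests-passed image per group.

    `records` is a list of dicts with hypothetical keys: release, group,
    arch, date (YYYYMMDD string), url, and passed (True, False, or None
    when the image is not tested by openQA).
    """
    latest = {}
    last_good = {}
    # Walk the records oldest-first so later ones overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r["date"]):
        key = (rec["release"], rec["group"], rec["arch"])
        latest[key] = rec["url"]
        if rec["passed"]:
            last_good[key] = rec["url"]
    return latest, last_good


records = [
    {"release": "24", "group": "Workstation live", "arch": "x86_64",
     "date": "20160418", "url": "image-0418", "passed": True},
    {"release": "24", "group": "Workstation live", "arch": "x86_64",
     "date": "20160419", "url": "image-0419", "passed": False},
]
latest, last_good = latest_links(records)
```

With these two records, the 'most recent' link points at the 20160419 image while 'last known good' stays on the 20160418 one, because the newer image failed a test.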

And that's it! It's pretty simple, but I'm quite happy with it. I've got various ideas for improvements - I'd like to do 'last known good' for Cloud images via Autocloud fedmsgs, for a start - and hopefully we can move it into an official Fedora domain somewhere or other. There's also an open issue for improving the HTML, so if anyone was looking for a web design project, go ahead =) I do have some restrictions there, though - basically, don't go nuts with the JavaScript.

Fedora 24 Internationalization Test Day on 2016-04-12!

Hi folks! Yep, it's that time again: Test Day time! Next Tuesday (hey yeah, I'm posting the note in good time for a change) will be Fedora 24 Internationalization Test Day. What's 'internationalization'? Well, it's stuff like input methods and locale-specific packaging, so one big topic this time will be the changes to glibc locale packaging in Fedora 24. But don't worry, there will be folks from the Fedora i18n team on hand to help with testing, so please just come along and help out! Whoever you are, you can easily do some testing.

As always, the event will be in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

Translation Test Day tomorrow, Tuesday 2016-03-29!

Hey folks! Sorry for the short notice, but once again it's time for a Fedora Test Day! Tomorrow, 2016-03-29, will be Translation Test Day over in #fedora-test-day on Freenode IRC. Fedora's dedicated g11n (globalization) team is working hard as always to get translation and internationalization in shape for the Fedora 24 release, so please do stop by and help out if you can. If you speak any language other than English, you can help out by checking the translations of some key apps in other languages!

If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

Fedora 24 and Rawhide: What's goin' on (aka why is everything awful)

Hi folks!

Welp, I was doing a Fedora 24 status update in the QA meeting this morning, and figured a quick(ish) summary of what all is going on in Fedora 24 and Rawhide right now might also be of interest to a wider audience.

So, uh, the executive summary is: stuff's busted. Lots of stuff is busted. We are aware of this, and fixing it. Hold onto your hats.

glibc langpacks

A rather big change landed in Fedora 24 and Rawhide last week. glibc locales are now split into subpackages using the 'langpack' mechanism that yum introduced and dnf supports. This lets us drop a somewhat ugly hack we were using to remove unneeded locales from space-sensitive images (Cloud and container images). However, it broke...quite a lot of stuff.

Locales lost on update/upgrade

With the change as it initially landed, Fedora 24 / Rawhide users who update to the new glibc packages lose all their locales; after the update you may have no locales except C and C.UTF-8. Various apps have trouble when you have a locale configured but not available, including but probably not limited to ssh and gnome-terminal. If you've hit this, you'll probably want to do something like dnf install glibc-langpack-en at least (substitute your actual locale group for en). If you just want to have all locales back (so you can test apps in other locales and so on), you can do dnf install glibc-all-langpacks.

Installer doesn't run any more

anaconda tries to set os.environ["LANG"] as the default locale when starting up. There's also no dependency or lorax configuration to pull any glibc langpacks into the installer environment. The result is that in recent Rawhide and F24 nightly installer images, anaconda blows up during startup, trying to set a locale that isn't installed.
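For the curious, here's roughly what that failure mode looks like in plain Python. This is a sketch, not anaconda's actual code: the function name and the fall-back-to-C behaviour are my own illustration. On glibc systems, asking for a locale with no installed data typically raises locale.Error.

```python
import locale


def set_locale_with_fallback(requested, fallback="C"):
    """Try to activate `requested`; fall back if it cannot be set.

    Returns the name of the locale that was actually applied.
    """
    try:
        locale.setlocale(locale.LC_ALL, requested)
        return requested
    except locale.Error:
        # The C library has no data for the requested locale,
        # e.g. because no glibc langpack providing it is installed.
        locale.setlocale(locale.LC_ALL, fallback)
        return fallback
```

Something like set_locale_with_fallback(os.environ.get("LANG", "C")) would keep going in the C locale instead of blowing up when the configured locale is missing.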

Live and cloud images don't build

This is actually a consequence of the previous issue. Cloud images are built via anaconda. As of last week, with the Pungi 4 switchover, live images are also now built using anaconda (via livemedia-creator). anaconda hits the locale bug in both those workflows, blows up, and consequently we get no live or cloud images in current Rawhide and Fedora 24 nightly composes.

Pungi 4

Speaking of Pungi 4...yep, as of the middle of last week, Fedora 24 and Rawhide composes are being done with that tool, as I've been talking about for a while. You can see evidence of this in the Rawhide and Branched trees. They now look more like release trees, with variant directories at the top level and all the regular images produced daily (well, they should be, except see above for why half of them are missing). If you've got scripts or anything which expect a certain layout of these trees, you're probably going to have to update them.

Up until glibc threw a spanner in the works this seems to have turned out quite well, but there are a few known consequences so far. There was a bug with the Server DVD installer image not booting properly due to an incorrect inst.stage2 kernel parameter, but that seems to be fixed now.

No name resolution on live images

If you manage to find a Pungi 4-created live image (from the few days before glibc broke 'em) and get it to boot, you'll probably find networking is busted. In fact basic connectivity works, but name resolution doesn't. This is because /etc/resolv.conf is a dangling symlink. This is the latest incarnation of a longstanding...disagreement among the systemd developers, NetworkManager developers, and everyone else unfortunate enough to get caught up in the crossfire. No doubt it'll get bodged up somehow this time, too, soon enough.

You can easily resolve the problem manually with rm -f /etc/resolv.conf; ln -s /var/run/NetworkManager/resolv.conf /etc/resolv.conf.

The change here isn't Pungi 4 per se, but the fact that under the Pungi 4 regime, live images are now created by livemedia-creator rather than livecd-creator. livecd-creator stuffed a /etc/resolv.conf into the live image it created, which avoided this bug by preventing systemd-tmpfiles from creating it as a dangling symlink on boot. livemedia-creator does not do this, so when the live image boots /etc/resolv.conf does not exist, systemd creates it as a dangling symlink, and NetworkManager refuses to replace the dangling symlink with its own symlink.
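If you want to check whether a system is in this state, a dangling symlink is easy to detect. A minimal sketch (the helper name is made up; it only uses standard os.path calls):

```python
import os


def is_dangling_symlink(path):
    """Return True if `path` is a symlink whose target does not exist."""
    # islink() looks at the link itself; exists() follows it to the target.
    return os.path.islink(path) and not os.path.exists(path)
```

On an affected live image, is_dangling_symlink("/etc/resolv.conf") should be true before you apply the workaround and false afterwards, assuming NetworkManager has written its own resolv.conf.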

Rawhide / Branched reports missing depcheck information

The 'Rawhide report' and 'Branched report' emails are still going out, but they're now generated by a new tool and look a bit different. I kinda like the added information, but some people don't like the new format so much; send patches ;) It is known that at present the new reports are missing information on broken dependencies, and releng are working to get this back ASAP.

compose check report emails not appearing

I've mentioned this before, but briefly, the 'compose check report' emails sent out by my tool aren't happening at all at the moment. The process for producing them runs through fedfind and needed rather a lot of rework for the new Pungi 4-ish world. I have code that works now and am aiming to get it deployed this week. Right now all the reports would basically say "all the tests failed and half the images are missing" due to the above-mentioned problems anyhow.

Long term I'd like to move the image checks from check-compose into compose-utils and thus have the 'missing expected images' and 'image diff to previous compose' bits appear in the 'Rawhide report' / 'Branched report' emails; check-compose would then just generate an openQA test report, basically. Doing that cleanly requires a change to the productmd metadata format, though, which I need to work through with pungi and productmd folks.

Release validation test events not happening

Also due to the compose process changes, we can't really create release validation events at present. Well, we could create nightly ones, but the image download tables would be missing, and we'd have to do it manually; the stuff for creating them automatically is kind of outdated now (it relied on some assumptions about the compose process which no longer really hold true). We can't do Alpha TCs and RCs (and thus the events for them) until we work out with releng how we want to handle TCs and RCs with Pungi 4.

This week I'm aiming to at least update python-wikitcms and relval so we can have proper nightly validation events again and they'll have correct download links. Probably this will just involve three things: changing the page names a bit to add the 'respin' component of Pungi 4 nightly compose IDs (so we'll have e.g. Test Results:Fedora 24 Branched 20160301.0 Installation or Test Results:Fedora 24 Branched 20160301.n.0 Installation instead of Test Results:Fedora 24 Branched 20160301 Installation), tweaking wikitcms a bit to add the 'respin' concept to its event/page versioning design, and writing a fedmsg consumer which replaces the relval nightly --if-needed mode to create the nightly events every so often.
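As a rough illustration of the two candidate naming schemes, here's a sketch of a page-name builder. It is not part of python-wikitcms; the function and its parameters are hypothetical, and it only reproduces the example names given above.

```python
def validation_page_name(release, milestone, date, respin, test_type,
                         nightly_marker=False):
    """Build a validation page name that includes the 'respin' component.

    With nightly_marker=True the compose part looks like 20160301.n.0,
    otherwise like 20160301.0.
    """
    if nightly_marker:
        compose = f"{date}.n.{respin}"
    else:
        compose = f"{date}.{respin}"
    return f"Test Results:Fedora {release} {milestone} {compose} {test_type}"
```

So validation_page_name(24, "Branched", "20160301", 0, "Installation") gives the first form, and passing nightly_marker=True gives the second.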

It'll probably take a bit longer to figure out what we want to do for non-nightly composes.

Other bits: Wayland and SELinux

We also have a couple of other fairly prominent issues related to other changes.

Lives don't boot or don't work properly with SELinux in enforcing mode

A change to systemd seems to result in several things in /run being mislabelled in Fedora live images. (Yeah, yeah, systemd and SELinux...please put down the comment box and step away from the keyboard, trolls, moderation is in effect). With SELinux in enforcing mode (the default), this seems to result in Workstation lives not booting (it sits there looping over failing to set up the live user's session, basically). KDE lives boot, but then lots of stuff is broken (you can't reboot, for instance, and probably lots of other bits, that's just the one our tests noticed). I didn't check other desktops yet.

You can work around this one quite easily by booting with enforcing=0.

Installer doesn't run on Workstation lives

Workstation live images for F24 and Rawhide were flipped over to running on Wayland by default (in most cases) quite recently. Unfortunately, the live installer relies on using consolehelper to run as root, but consolehelper doesn't work on Wayland. So if you find a recent Rawhide / F24 Workstation nightly live, and you get it to boot, and you ignore the fact that networking is busted, you won't be able to install it (bet you were just dying to do that after this blog post, weren't you?) unless you just run /usr/sbin/liveinst directly as root. Well, I mean, I'm not guaranteeing you'll actually be able to install it if you do that. I haven't got that far in testing yet.

IN CONCLUSION

So, um, yeah. We know everything's busted. We know! We're sorry. It's all gettin' fixed. Return to your homes, and your Fedora 23 installs. :)

Pungi 4: the new generation of the Fedora compose tools, and what it means for QA

There's a big change coming to Fedora 24. The way Fedora composes are built is changing.

How things look now

Currently we have three distinct types of Fedora composes. Probably everyone knows about 'nightly composes' and TCs/RCs. You may not know about the post-release nightly Cloud composes. (I'm not counting the live respins, which are demi-semi-official and not produced by releng).

Nightly composes

'Nightly composes' are an interesting concept - in fact, they hardly exist from any perspective but that of the actual compose process. What really happens nightly is:

  1. buildrawhide (or buildbranched, for Branched nightlies) calls pungify (note that all the heavy lifting is done by build-functions.sh, which is sourced by both buildrawhide and buildbranched)
  2. pungify 'pungifies' the Rawhide repositories (it doesn't know anything about variants - that's Workstation / Server / Cloud). That is, it creates network installer images and provides a kernel and initramfs for direct kernel boots, and writes the old-style metadata like the .treeinfo file
  3. buildrawhide / buildbranched calls a livecd script, a cloud image script, an arm image script and a few other bits
  4. buildrawhide / buildbranched syncs the pungified tree to the public mirrors (each day's Rawhide and Branched trees are also kept around here for a bit)

So we wind up with the pungi outputs and a bunch of Koji tasks that try to build some live images, ARM disk images, and Cloud disk images. There's nothing that really ties the Koji tasks to the pungified repositories, and after the fact there's no metadata about the compose as a whole. There are some fedmsg signals sent during the compose process, but there's no fedmsg signal sent after all the Koji tasks complete.

The pungified repos live on the main public Fedora servers for one day, then get replaced with the next day's compose. Older trees are kept around for a few weeks, but the location where they live is not really documented anywhere, and there's no signalling of it via fedmsg. The images built with Koji never get put anywhere else and nothing in the releng process communicates their location (or the Koji task IDs or anything like that) - if you want them, you have to go find them in Koji somehow, and you're on your own with the "how".

Finally, the old compose processes are both fairly slow. A nightly compose takes approximately 9 hours (including Koji tasks) - it usually starts around 0515 UTC with the final Koji task completing around 1415 UTC. TC and RC composes are similar.

TCs / RCs

TCs and RCs are built using a different script from the same releng repo - it uses the same livecd/arm/cloud build scripts, but builds the variant install trees rather than pungifying the main repositories, and builds the Server DVD and Cloud_Atomic installer image as well as network install images. The Koji task outputs and installer trees are then rather messily glommed together and sync'ed out in a way which requires some manual intervention. There are no fedmsg signals sent at all as part of TC/RC creation. There is no useful metadata for the compose as a whole, only the bits of metadata pungi produces in the installer trees.

Post-release nightly Cloud composes

For the last few months we have also had nightly composes built from the current stable release. These composes contain only Cloud images. They are created to support the two week Atomic process; every two weeks one of these composes is 'blessed' and released (yes, the Atomic downloads on the official page are not the ones from the initial Fedora 23 release, but the latest 'two-week Atomic' images). These are created with yet another script.

What's changing

From some time relatively soon (according to Dennis), all Fedora composes will be built with the new Pungi 4 tool and the newer scripts and configuration that go with it. Though it's still called Pungi, this 'new version' is an almost completely different tool (it's actually the union of old-pungi and Red Hat's distribution build tool used for RHEL). With Pungi 4:

  • Composes will happen frequently - multiple times per day - and take less time
  • The same process will be used for all composes (at least, I'm assuming the post-release nightlies will use it too)
  • All composes will (try to) build all images (inc. all the variant installer images)
  • All image builds happen in Koji (even installer media, which did not before)
  • An 'Everything' variant will provide a generic network install image
  • Live images will be created with livemedia-creator
  • fedmsg signals will be sent throughout the process, including one after the whole compose process is done and the compose is available, with the compose ID and location
  • Composes will include much better and more comprehensive metadata and logs. Some metadata is still inside the installer trees (these links are to a current Pungi 4 compose at the time of writing, they will likely go stale in future and may not represent the future state of the metadata)
  • The various reports generated as part of the compose process itself (such as the ones combined into the 'rawhide report' email) will be improved in several ways, for instance when packages change, not only is the new NEVR reported but also the previous one

What it means

Easy task scheduling (RIP fedfind...ish)

The most obvious consequence of the fedmsg and metadata improvements is that it gets much easier to find out when a compose completes and where you can find the bits of it once it's done. We are already scheduling various things to happen on completion of a compose at present (more on exactly what later), but we had to build a whole messy project to make it possible - fedfind.

Fedfind compensates for the lack of fedmsg signals by working out (quite painfully, in the case of nightly composes) when a compose is complete, and compensates for the lack of consistent compose locations, contents and metadata by having lots of hardcoded knowledge about where composes live (which needs updating whenever that changes) and having some quite ugly capabilities for crawling through compose trees and querying Koji and figuring out what images can be considered to be part of a given 'compose'.

With Pungi 4, all of this becomes unnecessary. To find out when a compose is complete, you can simply listen for a fedmsg signal. If you don't want to do that, there's a STATUS file in the compose tree which can tell you the current status of it. As images that come from Koji tasks are properly pulled into the final compose tree, there's no need to go and poke Koji to find lives or Cloud images or ARM images. And as there's comprehensive metadata about the images in the compose, there's no need to crawl through the compose tree to find images and try to infer their identity from their filenames.
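Checking the STATUS file could be as simple as the sketch below. The assumptions here are mine: that the file sits at the top level of the compose directory, that it holds a single status word, and that FINISHED and FINISHED_INCOMPLETE are the words meaning 'done'. Check those against a real compose before relying on them.

```python
from pathlib import Path

# Assumed status words that mean the compose is complete.
DONE_STATES = {"FINISHED", "FINISHED_INCOMPLETE"}


def compose_is_done(compose_dir):
    """Return True if the STATUS file in `compose_dir` reports completion."""
    status_file = Path(compose_dir) / "STATUS"
    if not status_file.is_file():
        # No STATUS file (yet): treat the compose as not complete.
        return False
    return status_file.read_text().strip() in DONE_STATES
```

Polling like this is the fallback option; listening for the fedmsg signal avoids the polling loop entirely.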

This is great news for me, because I no longer need to maintain the messy ball of hacks that is fedfind! Well, mostly - there are some odds and ends in there that we'll probably still need. But it gets much smaller, and we might be able to move the remaining bits into python-fedora or similar.

More frequent and rapid compose testing

With composes happening multiple times a day and taking significantly less time, we can really shorten the time between a package being tagged in Koji and it appearing in a compose. This is of course great news for Rawhide users in general, but for QA it also means we can get finer grained with our compose testing: each compose can be run through the automated tests we have available (e.g. the openQA tests) as soon as it appears, and thus the longest time between a package being tagged and a compose including it being tested might be seven or eight hours rather than over 32.

Validation process changes: the end of TCs?

I recently floated an idea on the mailing lists: dropping TCs. I won't rehash the mail, but basically, if we have regular composes that look just like release composes, there's no meaning to TCs any more. TCs at first existed because we simply didn't have anything like nightly composes - TCs and RCs were all the composes we had. If we wanted to see how a Fedora build would work right now, hey, we went to releng and asked for a TC. Then when we started getting nightly composes, we kept TCs because nightlies still didn't really look like real composes. Pungi 4 solves both those problems, and so there's no real reason to have TCs any more.

No-one seems to be opposed to this, so it looks like we'll be going ahead and killing off TCs once we officially switch to Pungi 4. RCs will likely remain, as there are issues with identification and certain settings that are flipped for 'released' images, but the difference between an RC and a regular compose will be much smaller than before and we can start to think about whether we might want to move away from a milestone-based development / test process in future.

How the release validation process will likely actually work is that we'll keep 'nominating' nightly composes for manual testing (a process which is really just about controlling the compose firehose, because humans can't cope with running complex tests every six hours and we don't really want seven thousand wiki pages per release) all the way up until Alpha RC, then we'll do a series of RC composes just as usual, then after Alpha release we'll switch back to nominating nightlies until Beta RC, and so forth. openQA - and in future any other automated tests we run on composes - will run on every compose that comes out of releng, and report its results to the wiki when a compose is nominated.

Consolidating post-compose tasks

The final thing I wanted to talk about is the fact that this change gives us a great opportunity: we can consolidate all the various things that happen after a compose. Over time, and especially over the last year or so, we've accreted kind of a lot of these. My list - probably not exhaustive - is below. Of course, all of these things should be emitting fedmsgs on start and on completion.

There might be opportunities for reconciling some capabilities of the bits below, especially the Stage 2 bits: I've been working on making check-compose capable of replacing the two-week atomic check bits. But more importantly, I think it would be a good idea to run as many of these things as possible out of Taskotron. One of taskotron's main strengths, after all, is running tasks based on fedmsg messages, and in an ideal world (I reckon) all of these things would run off fedmsg.

Once we get Pungi 4 deployed, I'd really like it if we could work to have a nice clean fedmsg and Taskotron-based process for running all of these various things, so if you're involved in any of them and I haven't talked to you already, I'd love to hear from you! I'd also love to hear about anything that should be listed below but isn't (and any corrections to the things that are listed there, or anything else in this post).

Stage 0

This is the start of the whole process, just here for completeness (and because it's where 'rawhide report' comes from).

  • The compose itself
  • The reports generated as part of the compose: 'rawhide report'

Stage 1

These are the things that happen (or at least ought to happen - not all of them currently do) immediately on compose completion.

Automated testing

We have openQA and autocloud now. Taskotron and Beaker will likely be running automated tests on composes in future (Taskotron already runs several tests, but none of them are part of compose testing). openQA currently uses fedfind to run when composes complete; autocloud listens for Koji task fedmsgs and runs when it sees one that looks like it was for one of the images it tests.

Manual validation test nomination(?)

relval currently does this simply as a cron job: it wakes up at a given time each day and decides whether to nominate that day's compose for manual testing. This is marked ? as it may be appropriate to move it to stage 2, and only nominate composes that pass certain automated tests.

Submission of compose information to PDC(?)

PDC, if you've never heard of it, is a store of information on composes and a web service for accessing it. Information for all Fedora composes should be stored in PDC in future. I don't know how this is handled at present. It is marked ? as it may be appropriate to move it to stage 0 (i.e. have it happen as part of the compose process itself).

Stage 2

These are the things that happen (or ought to happen) after one or more of the things in stage 1.

check-compose

We have a few things that do something along the lines of a "status check" for a compose. check-compose reports when a compose is 'missing' expected images, summarizes the results of automated testing (currently only openQA), and compares the images in the compose to those in the previous compose. It uses fedfind to wait for the compose to complete, and openQA-python-client to wait for openQA tests to be complete.
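The 'missing images' and 'diff to previous compose' checks boil down to set arithmetic. Here's a minimal sketch of the idea, not check-compose's real implementation; it assumes each image can be reduced to some comparable identifier (here, made-up strings).

```python
def compare_images(expected, current, previous):
    """Summarize a compose's images against expectations and the last compose.

    All three arguments are iterables of image identifiers.
    """
    expected, current, previous = set(expected), set(current), set(previous)
    return {
        # Expected images that this compose did not produce.
        "missing": sorted(expected - current),
        # Images present now that the previous compose lacked.
        "added": sorted(current - previous),
        # Images the previous compose had that are gone now.
        "dropped": sorted(previous - current),
    }


report = compare_images(
    expected=["Server dvd x86_64", "Workstation live x86_64"],
    current=["Server dvd x86_64", "KDE live x86_64"],
    previous=["Server dvd x86_64", "Workstation live x86_64"],
)
```

Here the Workstation live image would show up as both missing and dropped, and the KDE live image as added.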

Two-week Atomic check

The two week Atomic push script queries Datagrepper (a cache of fedmsg messages, more or less) to check the autocloud results and find the 'latest successful' post-release nightly compose, in order to release it.
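Conceptually, finding the 'latest successful' compose from a pile of test result messages looks something like the sketch below. This is not the push script's actual logic, and the message layout (compose_id, image, status keys, with "success" as the passing status) is a hypothetical simplification of whatever Datagrepper really returns. Compose IDs are assumed to sort chronologically as strings.

```python
def latest_successful_compose(messages):
    """Return the newest compose ID whose reported image tests all succeeded.

    Each message is a dict with hypothetical keys: compose_id, image, status.
    Returns None if no compose qualifies.
    """
    results = {}
    for msg in messages:
        results.setdefault(msg["compose_id"], []).append(msg["status"])
    good = [
        compose_id
        for compose_id, statuses in results.items()
        if all(status == "success" for status in statuses)
    ]
    return max(good) if good else None
```

Given results for two composes where the newer one has a failed image, this returns the older compose, which is the one you'd want to 'bless'.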

compose-utils(?)

compose-utils has a changelog tool that identifies the packages that changed between two composes, and produces a diff of the changelogs of all changed packages (thus identifying all the changes). This is similar to something the Rawhide compose report does, but that only prints the changelog for each new build (so if that package actually had more than one new build since the previous compose, it does not show the changelog for the earlier new builds). I'm not sure if this is intended for use as a standalone, or if it's expected to be integrated with pungi somehow (and thus should be in stage 0).

Submission of test information to PDC

PDC can also store some information on the test status of a given compose. It's undecided yet whether doing this would be of any use to us, but if so, it should be done after the relevant tests have completed, of course.

DevConf 2016: Pungi 4 and the Fedora compose / validation cycle

Hi folks! Just a quick note for anyone who might be wondering - I'll be at DevConf 2016 in Brno next week. (I'll also be at the mostly-Red Hat-only-I-think QEcamp event before that). I'm expecting to spend most of the time running around like a chicken with its head cut off, trying to talk to people about the pending move to Pungi 4 for Fedora composes and the consequences / opportunities for release validation and so on. There will probably be quite a bit of change, hopefully for the better!

I'm sitting on a more detailed mailing list post, but wanted to run it by Dennis before sending it out, so stay tuned for that. In the meantime, I've been working on rewriting the openQA scheduler bits for Pungi 4 composes. I also have a document up on the Fedora Gobby instance with some fairly inside-baseball, rough notes, so if you kinda know what's going on, you might be able to contribute some thoughts to that - it's called pungi4-qa-integration.

If you're gonna be at DevConf and you'd like to talk to me about that or anything else at all, please do buttonhole me. I'm the one with the vaguely bemused grin who can never remember anyone's name.

As a quick follow-up on my previous post about 'N-1 upgrades' - FESCo approved support for such upgrades in principle, so we're now just finalizing the details of changes to the release criteria. For Fedora 24 and onwards, upgrading from the last-but-one release (so Fedora 22, in the F24 case) will be 'officially supported'! Exciting, right?

Upgrading from the previous stable Fedora release

One of the big topics we're working on in Fedora QA right now is what we sometimes refer to as 'N-1 upgrades'. The Fedora release process is expressly designed such that each release does not go EOL until a short time after the next-but-one release comes out (so Fedora 22 will not go EOL until a month after Fedora 24 comes out). This has a couple of benefits which are generally agreed to be valuable: you always have at least a couple of Fedora stable releases to choose from at any given time (so you have the previous one to fall back on if the current one turns out to be a complete lemon for your purposes), and - theoretically at least - if you maintain long-lived Fedora systems, you don't have to upgrade to each new release if you don't want to; you can always skip one.

A problem for the second case, though, is that we don't 'officially support' upgrades across two releases (e.g. from Fedora 21 to Fedora 23, or Fedora 22 to Fedora 24). As I've mentioned before, defining what 'officially support' means is always a bit tricky when it comes to a free-of-charge community distribution: it is always the case that when anything at all goes wrong with Fedora, we offer a 100% money back guarantee ;). But in this specific case, we can say a couple of things:

  • The release criteria require that upgrade from a clean install of the previous stable release works, but not an upgrade from a clean install of the last-but-one stable release
  • There are no official packaging requirements for 'N-1' upgrade support
  • Until recently, there was no formal testing of 'N-1' upgrades

The official 'story' on this was that even if you wanted to skip a release entirely, you were supposed to upgrade through it - so if you wanted to run Fedora 21 until it went EOL then go to Fedora 23, you were still supposed to upgrade to Fedora 22 first, then straight to Fedora 23. This has long struck many people as a bit odd, though, and we've recently started taking steps to do something about it.

Fedora upgrades in general have certainly become a lot more reliable lately, first with the introduction of fedup, then DNF-based upgrades. We've done some informal, ad-hoc testing of N-1 upgrades for the last couple of releases, and found that in general, they tend to work. So for the Fedora 24 cycle, we're trying to put this on a bit more of a formal, supported basis.

openQA now has two sets of upgrade tests; it will always test upgrades from both the current stable release and the previous stable release, for clean installs of the Workstation and minimal package sets (we will probably extend the coverage to other package sets soon). We have added a second set of upgrade test cases to the release validation test pages, covering upgrades from the previous stable release. And finally, we've been discussing making N-1 upgrade support 'required' in a couple of senses: adding it to the release criteria and packaging guidelines. The current status is that the QA team is agreed that this should happen, but it's not something we can decide alone, so we are working on an FPC request to change the packaging guidelines.
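As a rough illustration of the resulting test coverage, here's a tiny sketch that works out which upgrades get tested for a given release under test. It is not how the openQA scheduler is actually written; the function names and the package set labels are assumptions made up for this example.

```python
# Package sets currently covered by the upgrade tests (labels are illustrative).
PACKAGE_SETS = ("workstation", "minimal")


def upgrade_sources(target):
    """Releases to upgrade from: the current stable (N-1) and the one before it."""
    return [target - 1, target - 2]


def upgrade_matrix(target, package_sets=PACKAGE_SETS):
    """All (source release, package set) pairs to test for `target`."""
    return [
        (source, package_set)
        for source in upgrade_sources(target)
        for package_set in package_sets
    ]
```

For Fedora 24, `upgrade_sources(24)` gives 23 and 22, so the matrix has four entries: Workstation and minimal upgrades from each of those two releases.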

We're certainly interested in hearing feedback on this topic, so if you have any thoughts, please post them to the test@ mailing list, or pass them along on IRC or anywhere else you find Fedora QA folks!

Possible outage notice

Hey folks! Power's down here at Happyassassin Towers. My servers are on UPS for now, but if the outage goes on for more than an hour or so I'll have to take them down to preserve the remaining UPS power to keep the internet connection up (so I can at least use my laptop). So if this site and all other happyassassin bits stop responding in a while, you'll know why!