Fedora openQA now public

Well, I don't know about everyone else, but this is a beautiful sight to me!

As I've written about before, we've been using openQA for some time in Fedora testing - many thanks to the great folks over at openSUSE who work hard on it. It started as a sort of ad-hoc, skunkworks project, so it initially ran on a single box we happened to have lying around in one of Red Hat's offices, running openSUSE rather than Fedora. It's turned out to be quite valuable, and since we want to keep running it at least for the medium term (and potentially longer, if we can find a sensible way to integrate it into Taskotron), it would obviously be against Fedora's spirit to leave it running behind Red Hat's firewall (where non-RH folks couldn't see it) on an openSUSE system.

So we've been working quite hard to get openQA running and cleanly deployable on Fedora, and to set up a new, more official deployment. There are now actually two Fedora openQA deployments: production and staging. Both run on Fedora Infrastructure-managed boxes (running Fedora 23) in the Fedora data centre, and deployment is handled entirely by Ansible - you can see the playbooks in the infrastructure ansible git repo. The staging deployment has a VM acting as the server and one bare metal system hosting the 'workers' (which run the actual tests in VMs); the production deployment has a VM server and two bare metal worker host systems. This gives us rather more test capacity than the old deployment had, so the tests run faster and we can add more without the runs taking too long.

Almost all the packages are in the Fedora repositories, but a few are still outstanding: perl-Mojolicious-Plugin-Bootstrap3 and os-autoinst are in the review process, and the openqa package itself can be reviewed after those two are approved. For now, those three packages are available in a COPR for anyone wanting to set up their own openQA (only Fedora 23 and Rawhide have all the necessary packages at present).
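
If you want to set one up yourself before everything lands in the official repos, it's the usual COPR dance - something like this (the COPR and exact package names here are placeholders/approximations, so check the actual COPR page):

    # Enable the COPR carrying the not-yet-reviewed packages (the repo name
    # here is a placeholder - use the actual COPR mentioned above), then
    # install the openQA server and worker packages.
    sudo dnf copr enable <copr-owner>/<copr-name>
    sudo dnf install openqa openqa-httpd openqa-worker os-autoinst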

The 'compose check report' emails that get sent daily to the test and devel mailing lists are now generated by the new production deployment, and from tomorrow (2015-12-06) will include direct links to all failed tests for that day, so now non-RH folks can actually see what went wrong with the tests!

QA happenings, post-F23

Hi folks! I haven't blogged for a while, so I thought I'd write up a few notes on what's going on in QA now that Fedora 23 has been released.

We've been working for the last few years to improve ongoing validation testing outside of the Alpha/Beta/Final TC/RC system, so of course we're testing Fedora 24 already. openQA is testing Rawhide nightly, and we're getting nightly builds nominated for manual validation testing regularly. These validation test events are announced on test-announce, and you can always find the current validation event summary. We've found several release blocker and showstopper bugs already and gotten a number of them fixed; just today we saw that all the openQA tests had failed, so I checked it out and found that a new dracut build was causing the installer images not to boot.

We also wrote the Fedora 23 Common Bugs page, and have been keeping it up to date (along with the Fedora 22 Common Bugs page). It's always a good idea to check out those pages if you're running into what seems like a major bug - you may find more information and help there.

We've been working on an improved deployment of openQA. As I wrote about before, the current 'semi-official' Fedora openQA instance is running on a machine inside Red Hat's firewall, so only Red Hat folks can see the tests. This isn't what we want, of course, so we've been working for some time to make it possible to deploy openQA in the Fedora infrastructure. This is almost complete: there is in fact a staging openQA instance running in the Fedora data centre, running tests nightly; we're only waiting on some firewall rule changes to make it publicly visible. We should have both staging and production openQA instances fully up and running pretty soon, which will have two major benefits: more capacity (the new production instance should be able to handle 3-4x as many tests as the current one), and of course non-RH folks will be able to see the tests (which means we can link to them in report emails, Bugzilla reports and so forth as well).

We've also been working on improving the blocker bug process, specifically the handling of 'special blockers'. For some years now, we've been finding release blocker bugs for which the fix doesn't actually go onto the new release media; we've informally referred to these as 'special blockers' and had some ad hoc workarounds for dealing with them, but we really need something better. There are two groups of 'special' blockers: bugs where we need a fix to go into the 0-day updates for the new release (the set of updates which is already in the update repository on release day), and bugs where we need a fix to go out as an update for the previous release(s) (so, for the release of Fedora 23, we had a couple of cases where we needed to ship updates for Fedora 22 and Fedora 21). The current process gives us a strong guarantee that the release media won't be built without fixes for all blocker bugs that need to go on the media, but we have no equivalent for either kind of 'special blocker': we often say something like "we'll have to make sure we send out an update for that before the release", yet nothing ensures that actually happens, and sometimes it doesn't.

So we have a mailing list discussion going at present (sorry, I can't link to it, as archives haven't been imported to Hyperkitty yet...) about how we can better track those 'special' blockers, and how the release process could be adjusted to ensure the fixes actually appear in the appropriate places in time for the release date. These changes should start happening pretty soon, well in time for Fedora 24 Alpha.

Taskotron work continues at a good pace, with disposable clients now almost ready for deployment, and several interesting plans for new tests that should help ensure consistent repository quality.

Of course, things like update testing roll on as always, with many QA team members volunteering lots of their time to test updates - this is always a great way to help out the project if you have a little time! If you dropped out of testing while Fedora Easy Karma was broken by the Bodhi 2 transition, good news - it's working again now!

The Fedora 24 Test Day cycle should start ramping up soon with a call for Test Days, and the Heroes of Fedora posts for the Fedora 24 cycle should be coming soon too. Before Fedora 24 testing really ramps up in earnest, we're hoping to be able to automate even more of the release validation tests - in openQA, taskotron, Beaker, or whatever else works best! We're looking at some of the simpler Desktop validation tests, the Base tests, and perhaps some of the Server tests as targets for this cycle.

Cinnamon Test Day tomorrow (2015-10-08)

It's time for the final Test Day of the Fedora 23 cycle: tomorrow, Thursday 2015-10-08, is Cinnamon Test Day. A nice simple one: we'll be testing out the Cinnamon desktop on Fedora 23, particularly the new live spin, and making sure everything works well and looks right. If you're a Cinnamon fan, interested in it, or just a keen tester with a bit of spare time, please come out and help!

The Cinnamon spin maintainer Dan Book (grinnz) will be around, and I'll try to drop in from time to time too.

As always for Test Days, the live action is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

OpenWRT on Zyxel NBG6716 (ar71xx nand) - upgrading to Chaos Calmer final

Well, I've had an interesting day...

I wrote before about using OpenWRT on the Zyxel NBG6716. I've been using it ever since; I updated to newer snapshot Chaos Calmer builds a couple of times to fix some minor wifi bugs, but I'd been sitting on a fairly old snapshot (from March 2015) for a while.

Today I figured it was time to update to the final stable release of Chaos Calmer, which came out not too long ago. Unfortunately it turned out to be quite the experience!

When I upgraded to newer snapshots, the sysupgrade method worked fine. Because of some issue or other in the build scripts, upstream doesn't actually produce images for the NBG6716 right now, but it's easy enough to build them with the image generator - you just grab the ar71xx-nand image builder and run make image PROFILE=NBG6716 PACKAGES="nano htop luci luci-ssl" (or whatever package loadout you want), and you get images in bin/ar71xx. I'd just been copying the sysupgrade.tar file across to the router and running sysupgrade -v on it, and it was fine.
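
For reference, the whole image builder dance is only a few commands - something like this (the tarball name and URL here are from memory, so check the downloads site for the exact ones):

    # Grab and unpack the Chaos Calmer ar71xx-nand image builder (file name
    # and URL approximate - check downloads.openwrt.org for the real ones).
    wget https://downloads.openwrt.org/chaos_calmer/15.05/ar71xx/nand/OpenWrt-ImageBuilder-15.05-ar71xx-nand.Linux-x86_64.tar.bz2
    tar xjf OpenWrt-ImageBuilder-15.05-ar71xx-nand.Linux-x86_64.tar.bz2
    cd OpenWrt-ImageBuilder-15.05-ar71xx-nand.Linux-x86_64
    # Build NBG6716 images with whatever packages you want baked in.
    make image PROFILE=NBG6716 PACKAGES="nano htop luci luci-ssl"
    # The sysupgrade.tar and factory.bin images land in bin/ar71xx.
    ls bin/ar71xx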

However, when I tried that for CC final, it just did not want to work at all. At first it was failing on a pre-flight check:

 /sbin/sysupgrade: eval: line 1: nand_do_platform_check: not found
 Image check 'platform_check_image' failed.

I worked out that the check it was looking for was in a file, /lib/upgrade/nand.sh, which didn't seem to be in my current firmware at all. So I copied it over from the image builder root onto the router and tried again, but now it would fail when trying to actually do the upgrade, with Command failed: Method not found. As near as I could tell, this suggested that some expected capability in OpenWRT's ubus thing was not present in the firmware version I was running. So basically it looks like the upgrade process was simply broken for upgrading from the firmware I had installed to CC final.
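
(For the record, the workaround attempt was just copying the script out of the image builder's root filesystem onto the router and retrying - roughly like so, with approximate paths and OpenWRT's default router address:)

    # Copy the missing upgrade helper onto the router, then retry the
    # upgrade (paths approximate; 192.168.1.1 is just the default address).
    scp build_dir/target-*/root-ar71xx/lib/upgrade/nand.sh root@192.168.1.1:/lib/upgrade/
    ssh root@192.168.1.1 sysupgrade -v /tmp/sysupgrade.tar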

Crap.

So I figured, OK, guess I'll back up my config, flash the new firmware clean, and restore my config. So I copied the factory.bin firmware file over to the router and tried to flash it with mtd...and it was just not having it at all. Every attempt would fail with [e]Failed to get erase block status, which, as near as I can tell, indicates that for some reason mtd was hitting an error when it tried to check whether the flash blocks were bad (it raises that error when it uses the kernel MEMGETBADBLOCK interface to check and gets back a negative result). I tried various stupid things to try and work around that, but no joy.
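
(The invocation in question was the bog-standard one, along these lines - the image filename is approximate and the partition name is from memory:)

    # Flash the factory image directly with mtd, rebooting afterwards (-r);
    # this is the step that kept failing with 'Failed to get erase block
    # status'. Image filename and partition name here are approximate.
    cd /tmp
    mtd -r write openwrt-15.05-ar71xx-nand-nbg6716-squashfs-factory.bin firmware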

So I thought screw it, I'll flash it over TFTP. Read the instructions, set up a box as instructed, rebooted the router with the WPS button held down, and...nada. It just booted apparently normally.

So I kept trying - with the rabbit's foot, fiddling with network cables, and chasing after suspicious-looking TFTP error messages - for hours... Finally, by running a different TFTP server, I could see that it was definitely sending the image to the router, but for some reason the router wasn't flashing it.
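
(If you find yourself doing this, dnsmasq makes a fine ad-hoc TFTP server, along these lines - though the address the bootloader expects the server on, and the filename it requests, are device-specific details I won't swear to:)

    # Serve the image over TFTP with dnsmasq (DNS itself disabled). The
    # static address the Zyxel bootloader looks for the server on is an
    # assumption here - check the device's wiki page for the real details.
    sudo ip addr add 192.168.1.2/24 dev eth0
    sudo mkdir -p /srv/tftp && sudo cp factory.bin /srv/tftp/
    sudo dnsmasq --port=0 --enable-tftp --tftp-root=/srv/tftp --no-daemon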

Finally I twigged: it won't flash an OpenWRT image via TFTP, only a stock firmware!

So I had to grab a stock Zyxel firmware, flash that over TFTP, hard reset it (because for some reason it came up out of the flash with non-default admin credentials...whatever), then flash the clean OpenWRT CC final image from the stock firmware using mtd (whatever mtd the stock firmware ships had no problem doing it...), and finally boot into CC final and reload my configuration backups.

Yeeeeesh.

Cloud Atomic Test Day on Tuesday (2015-09-22)

Hi folks! Just to let everyone know, we have another Test Day coming up on Tuesday 2015-09-22, Cloud Atomic Test Day. The wiki page isn't up yet but should be soon. This will be a combined event with the fine folks over at CentOS, and the idea is to test both Fedora's and CentOS' Atomic host images. This is some cool cutting-edge stuff, so if you're interested in learning more about some cloud technologies (or if you're already an Atomic expert and want to help make sure things are ship-shape), come out and help!

As this is a combined Test Day, it will NOT be in #fedora-test-day, but instead in #atomic on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

Identifying Fedora media redux: What is Atomic?

You may remember my post from yesterday, on whether a 'Fedora Atomic' flavor made sense, and how I think about identifying Fedora media for my fedfind project.

Well, after talking it over with a few folks this morning and thinking some more, I've come to the conclusion that that post was kind of wrong on the major question: what is Atomic?

In that last post I came to the conclusion that Atomic was a deployment method. I still think deployment method is a perfectly viable concept, but I no longer think Atomic really is one. Rather, ostree is a deployment method, and Atomic really is - as I was previously considering it - something more like a payload.

Reading the Project Atomic site, it seems I was kind of working from an outdated understanding of the 'Atomic' concept. I'd always understood it as being more or less a branding of ostree, i.e. an 'Atomic' system was just one that used the ostree deployment method. But it seems the concept has now been somewhat changed upstream. To quote the Introduction to Project Atomic:

"The core of Project Atomic is the Project Atomic Host. This is a lightweight operating system that has been assembled out of upstream RPM content. ...

Project Atomic builds on these features, using the following components, which have been tailored for containerized-application management:

  • Docker, an open source project for creating lightweight, portable, self-sufficient application containers.
  • Kubernetes, an open source project that allows you to manage a cluster of Linux containers as a single system.
  • rpm-ostree, an open source tool for managing bootable, immutable, versioned filesystem trees from upstream RPM content.
  • systemd, an open source system and service manager for Linux."

There is still some conceptual confusion about exactly what Atomic means - for instance, there's a concept called Atomic Apps, and it is apparently fine to deploy Atomic Apps on systems which are not Atomic Hosts. But after discussing it with some folks this morning, I think it's reasonable to take the following as a principle:

Any Fedora image with 'Atomic' in its name deploys as an Atomic Host as defined by Project Atomic.

Assuming that rule holds, for 'image identification' purposes under the fedfind concepts covered in the previous post, atomic really does work best as something along the lines of a flavor, loadout and/or payload. It also implies the deployment method ostree, but I think I'm OK with leaving that concept out of fedfind for now, as it's functionally useless: since all Atomic images have ostree deployment and all non-Atomic images have rpm deployment, distinguishing the two properties tells us nothing at this point. I'm going to hold it in reserve for the future, in case we get other images that use non-RPM deployment methods but are not Atomic hosts.

For now I think I'll more or less go back to the subflavor concept I initially threw out in favour of deployment, and for current purposes, all Atomic images will have flavor cloud and subflavor atomic. The alternative is to have a concept that's a bit more parallel to flavor and loadout which feeds into payload, but for now keeping the interface stable seems better.

Identifying Fedora media

EDIT: With the immensely greater wisdom that comes with being a day older, I'm now more or less convinced this whole post is wrong. But instead of editing it and confusing people who read the original, I've left it here and written a follow-up. Please read that.

Thanks to mizmo for inspiring this post.

On 'Fedora Atomic' as a flavor

Mo wrote that the Cloud WG has decided that, from Fedora 24 onwards, they will focus on Atomic-based images as their primary deliverables. In response to this, Mo and Matt Miller were kicking around the idea of - as I understand it - effectively rebranding the 'Cloud' flavor of Fedora as the 'Atomic' flavor.

This immediately struck me as problematic, and after a bit of thought I was able to identify why.

The current Fedora flavors - Cloud, Server, and Workstation - can be characterized as contexts in which you might want to deploy Fedora. "I want to deploy Fedora on a Cloud!", or "I want to deploy Fedora on a Workstation!", or "I want to deploy Fedora on a Server!" Yes, there's arguably a bit of overlap between 'Cloud' and 'Server', but we do have a reasonable answer to that ('Cloud' is about new-fangled cattle, 'Server' is about old-fashioned pets).

Atomic is not a context in which you might deploy Fedora. You can't say "I want to deploy Fedora on an Atomic!", it just doesn't work. Atomic is rather what I'm referring to as a deployment method: a mechanism by which the system is deployed and updated. Another deployment method - the counterpart to 'Atomic' - would be 'RPM'. Atomic happens to be a deployment method that's quite appropriate for Cloud deployments, but that doesn't mean you can just do s/Cloud/Atomic/g and everything will make sense.

So for me the idea is simply not conceptually compatible with how we define the flavors. We might, for instance, want to build 'Atomic Workstation' instances of Fedora. I believe there's been some interest in doing that before.

Mo's post suggests that we might treat 'cloud' rather like we currently treat 'ARM', as a sort of orthogonal concept to the flavors, with a kind of parallel identity on the download site. I would suggest that it would make more sense to do exactly the opposite: it's Atomic that should be given a sort of parallel existence. We might want to have an Atomic landing page which focused on all the Atomic implementations of Fedora. But it's not, for me, a flavor.

Thinking about identifying Fedora media

So, as you might be aware, I have a little project called fedfind, which is broadly about finding Fedora media. About half of the work fedfind does is finding the media. The other half is identifying them. Fedora 23 Beta RC1 has approximately 59 'images' (the count depends on whether you count boot.iso files separately from their hardlinked alternate names). Trying to come up with a set of attributes that would serve as a sort of conceptual framework for identifying images has been an interesting (and ongoing!) challenge. In fact, as a result of thinking about the 'Atomic' proposal, I've just revised fedfind's approach a little. So, I thought I'd write quickly about how fedfind does it.

I first thought about this stuff before starting fedfind, when I wrote an image naming policy. fedfind initially followed that policy precisely. Events have overtaken it a bit, though, so the current implementation isn't quite the same. And today, I've added the deployment method concept as an attribute in fedfind's system. The changes described here are sitting in my editor right now, but should show up in git master soon!

Some of the attributes fedfind uses are fairly obvious (arch is just the arch, and release, milestone and compose are just version identifiers), so I'll focus on the more interesting ones. fedfind provides a couple of properties - desc and shortdesc - for images which can be used as descriptions; taken together, all these properties allow us to give a unique identity to every one of the 59 images in 23 Beta RC1, as well as covering all historical releases (in my checks anyway). shortdesc contains payload, deployment (if not 'rpm'), imagetype, and imagesubtype (if present); desc adds arch. For any given release's images, there should be no duplicated descs.

imagetype and imagesubtype

These identify the 'type' of the image specifically. What's the difference between the Fedora 22 x86_64 Workstation live image and the Fedora 22 x86_64 Workstation network install image? They're the same version, same arch, same payload (see below) - but different imagetype. A few images also have an imagesubtype. For instance, we provide two types of Vagrant image, for use with libvirt and virtualbox; these have vagrant as the imagetype and libvirt or virtualbox as the imagesubtype.

flavor, loadout and payload

It's intended that almost every fedfind image should have either a flavor or a loadout; for convenience, whichever one it has is also available as the payload. This can be thought of as 'what's the actual content of the image'. Flavors are workstation, server, and cloud; loadouts are things like kde, xfce, mate etc. (for live images and ARM disk images), plus some oddballs: there is a minimal loadout for ARM disk images, and source and desktop loadouts for older releases where we had source CDs/DVDs and 'Desktop' live images. There's one fairly special case: pre-Fedora.next DVDs and network install images, boot.isos, and current nightly images have no flavor or loadout, but have the special payload generic.

Current fedfind releases have a concept of subflavor, which is only used to give some Cloud images a subflavor of atomic. After thinking about this (above) it seemed completely wrong, so today I've changed things around so the subflavor property is gone, and instead we have...

deployment

As discussed above, this is the 'deployment method'. Currently it's either rpm or atomic.
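
To make the composition concrete, here's a toy sketch of how those attributes combine into shortdesc and desc. This is just an illustration of the scheme, not fedfind's actual code:

    # Toy illustration (not fedfind's actual code): shortdesc is payload,
    # plus deployment (if not 'rpm'), plus imagetype, plus imagesubtype
    # (if present); desc adds the arch.
    shortdesc() {
        local payload=$1 deployment=$2 imagetype=$3 imagesubtype=$4
        local out=$payload
        [ "$deployment" != "rpm" ] && out="$out $deployment"
        out="$out $imagetype"
        [ -n "$imagesubtype" ] && out="$out $imagesubtype"
        echo "$out"
    }
    desc() {
        local arch=$1; shift
        echo "$(shortdesc "$@") $arch"
    }
    desc x86_64 workstation rpm live ""        # -> workstation live x86_64
    desc x86_64 cloud atomic vagrant libvirt   # -> cloud atomic vagrant libvirt x86_64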

i18n Test Day coming up tomorrow (2015-09-01)

It's time for another Test Day tomorrow! Continuing the "all those places in the world that aren't America" theme, it's the i18n Test Day! Here we'll be testing things like complex language input, rendering of non-Latin text, and DNF langpack installation. If you have some time to stop by and help out, please do - we always want to make sure everyone using Fedora gets the best possible experience regardless of the language they speak!

As always for Test Days, the live action is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.