CI and LLM review on Fedora Forge with Forgejo Actions

Adam Williamson

2026-01-19 23:19

Hi folks! Over the last couple of weeks, we have migrated nearly all the quality team's repositories from Pagure (the old Fedora forge) to the new, Forgejo-based Fedora Forge. As part of this, I've figured out a process for doing CI with Forgejo Actions. I also came up with a way to do automated LLM pull request reviews, for those interested in that.

For the impatient, you can just look at / copy the two workflows in python-wikitcms, but you'll at least need to read the stuff about runners below, and set up the necessary API key secret.

Forgejo Actions works very similarly to GitHub Actions, by design. You create a .forgejo/workflows directory in your project and define workflows in it. The syntax is almost entirely compatible with GitHub Actions, but with several missing features.

Some very commonly-used shared actions, like actions/checkout, are ported to Forgejo so you can use them directly. Other shared and third-party actions can be used by giving a full URL to them - e.g. uses: https://github.com/actions-ecosystem/action-remove-labels@2ce5d41b4b6aa8503e285553f75ed56e0a40bae0 # v1.3.0 - but whether a given action will work or not depends on whether it's written to assume it's running on public GitHub, and whether Forgejo has all the features it needs.

Probably the most noticeable difference with using GitHub Actions is runner availability and environment. If you have a public GitHub project you can define workflows with something like runs-on: ubuntu-latest; behind the scenes, GitHub maintains a farm of runners with various labels, of which ubuntu-latest is one, and your jobs will run on any available runner with that label. The available environments for public GitHub repos are a handful of Ubuntu, Windows and macOS versions.

The staging instance of Fedora Forge has a few universal runners you can use like this. Currently each has only one, unique, label, so you can't specify workflows with a label like fedora and have them run on any available runner; you have to just pick one of the labels, and your jobs will always run on that runner. Maybe this will get changed at some point. But the runners are available to all repos in the staging instance, so you can just define a workflow and get it run.

Currently the production instance has no universal runners like this; runners are limited to specific organizations. The releng and infra organizations have runners, and now I requested one, the quality organization has one too. If you want to run workflows for projects in a different organization, the first thing you'll need to do is file a ticket to request runner(s) for that organization. If you have admin access to an organization, you can see whether it has runners, and what labels they have, by visiting https://forge.fedoraproject.org/org/<organization>/settings/actions/runners.

Once your org has at least one runner, you can define workflows and they'll run, as long as you set the runs-on value to a label that at least one of the runners has.

However, you might be surprised by the default environment: it's currently Debian Bookworm. Until that gets fixed, you may be interested in the container directive for workflows, which lets you define any arbitrary container image to be used:

    container:
      image: quay.io/fedora/fedora:latest

There is one little gotcha with this, though. Many GitHub actions, including checkout, are written in Node, but Fedora's stock container images don't have Node installed. So you have to install it before running checkout or anything else that uses Node.

Put it all together, and here's the workflow I've defined for doing CI on Python projects with Tox:

name: CI via Tox
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  tox:
    runs-on: fedora
    container:
      image: quay.io/fedora/fedora:latest
    steps:
      - name: Install required packages
        run: dnf -y install nodejs tox git
      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          fetch-depth: 0
      - name: Install Python interpreters
        run: for py in 3.6 3.9 3.10 3.11 3.12 3.13; do dnf -y install python$py; done
      - name: Test with tox
        run: tox

That runs whenever a pull request is opened or pushed (the on section). It expects a runner with the fedora label (the runs-on setting). It uses the fedora:latest container image from quay.io (the container setting). From that image, we install packages we're going to need - including nodejs (the first step). Then we run actions/checkout to check out the PR (the second step, the uses one). Then we install all the Python interpreters we need, and run tox (the final two steps). Of course, if your project isn't Python or doesn't use Tox, you'll have to tweak this a bit, but hopefully you get the general idea.

If you're security-minded, you might notice there's no permissions setting in this workflow. That's because Forgejo currently does not support fine-grained permissions in the automatically-generated workflow tokens. In Forgejo, the automatically-generated token always has full read/write privileges unless it's operating on a pull request from a fork, in which case it has only read permissions. Nothing finer-grained is possible at present. If you need something finer-grained, you have to generate a token manually, save it as a repository secret, and adjust your workflow (somehow) to use that and hide the automatically-generated token as far as is practically possible (that's outside the scope of this post).

So that's CI! What about LLM pull request review? Well, if you dislike or are not interested in that, stop reading now. If you are interested, here's a recipe:

name: AI Code Review
on:
  pull_request_target:
    types: [labeled]

jobs:
  ai-review:
    if: forgejo.event.label.name == 'ai-review-please'
    runs-on: fedora
    container:
      image: registry.gitlab.com/redhat/edge/ci-cd/ai-code-review:v2.5.0
    steps:
      - name: Run AI Review
        env:
          AI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
        run: ai-code-review --platform forgejo --pr-number ${{ forgejo.event.pull_request.number }} --post

  # this has to be a separate job because ai-code-review container does not have nodejs in it
  # also note this does not work for PRs from forks because of a forgejo bug
  # https://codeberg.org/forgejo/forgejo/issues/10733
  remove-label:
    runs-on: fedora
    steps:
      - uses: https://github.com/actions-ecosystem/action-remove-labels@2ce5d41b4b6aa8503e285553f75ed56e0a40bae0 # v1.3.0
        with:
          labels: ai-review-please

That will cause the ai-code-review tool to review the pull request and post its analysis as a comment.

If you trust me and the Fedora Quality organization, you can also just use this:

name: AI Code Review
on:
  pull_request_target:
    types: [labeled]

jobs:
  ai-review:
    runs-on: fedora
    if: forgejo.event.label.name == 'ai-review-please'
    uses: quality/workflows/.forgejo/workflows/ai-review.yml@main
    with:
      pr: ${{ forgejo.event.pull_request.number }}
    secrets:
      GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}

That uses a reusable workflow that does the same thing. We will update that workflow periodically to bump the version of ai-code-review that is used.

Just a couple of things to note here. I decided to have the LLM review happen only when a pull request is given a special label. LLM reviews are relatively expensive, and also quite verbose; you don't necessarily want one cluttering up the ticket any time a pull request is created or edited, and you may not want to make it possible for someone to charge some LLM usage to your account as often as they like just by creating or editing a pull request.

So, to use this recipe you have to create a label called ai-review-please in your repository. You can do this by going to "Issues", then clicking "Labels", then "New label". Give it whatever color and description you like. Any time you add that label to a PR, the review process will be triggered. Before adding the label to a PR you should probably make sure the PR is well-intentioned and not attempting any kind of prompt injection to get ai-code-review to disclose a secret or mess with the repository.

The other thing is you need an AI provider API key. In this recipe we have a Gemini API key saved as a repository secret called GEMINI_API_KEY. To create repository secrets, go to repository "Settings", then "Actions", then "Secrets", and click "Add secret". In the workflow, we make the repository secret called GEMINI_API_KEY (secrets.GEMINI_API_KEY) available in the container as the environment variable AI_API_KEY; ai-code-review reads it in from there. Gemini is the default LLM provider for ai-code-review. You can also use OpenAI or Anthropic by adding an --ai-provider argument to the ai-code-review call in the workflow (obviously, then, the secret you export as AI_API_KEY must be a valid key for that provider). I'm hoping that in the not-too-distant future, we'll have an LLM model provider in Fedora infra, running open source models, that we can use for this purpose; for now, unfortunately, we have to use the hyperscaler ones.

Finally, as noted in the comment, the workflow is intended to remove the ai-review-please label when it runs (so you don't have to remove it manually, then add it again, if you want another review later), but this does not currently work for pull requests from forks due to a Forgejo bug (because we're using pull_request_target the workflow token should have write permissions even for a fork PR, but it doesn't). If you use it on a fork PR, you'll have to remove the label manually once the workflow has triggered.

You can, of course, change the on block to be the same as the CI recipe if you want to have LLM review run automatically whenever a PR is created or edited - but do make sure whoever's paying the bills for the API key is OK with that, and monitor the repo to make sure nobody starts creating hundreds of PRs to try and blow your budget...and hope/pray nobody manages a successful prompt injection attack. On the whole I'd stick with the label (only repository admins can label PRs, so a non-admin attacker can't apply the label themselves to trigger the review).

A half-assed assessment of open source AI code review tools

Adam Williamson

2025-12-16 20:27

Introduction

Hi there, blog readers! For the last week or so I've been poking into AI code review tools. Yes, this is partly because of the Red Hat "you must do AI things!" policy. But also, to be honest, because they seem to be...actually good now. I set up AI reviews for pull requests to our openQA test repo as an experiment. But especially over the last couple of months, they've got to the point where well over half of the review notes are actually useful, and the writing style isn't so awful I want to stab myself in the eyeballs. So I'd quite like to keep doing them, but in a more open source-y way. So far I've simply been cloning the pull requests to a GitHub mirror of the repo that exists solely to get AI reviews done. That repo has Gemini Code Assist enabled so the PRs are reviewed by Gemini automatically, e.g. here. It's very simple, but entirely closed source, there's no control over it, and Google could take it away at any time.

We're in the middle of migrating Fedora projects from Pagure to our new Forgejo instance, so I decided to try and get some sort of AI review system integrated with Forgejo. And I kinda succeeded! I wrote a Forgejo integration for ai-code-review, a tool I found that was written by another Red Hatter, and managed to set up a proof-of-concept Forgejo Actions workflow using it on a repo I own that's hosted at Codeberg (since Codeberg has public Forgejo Actions runners available; we don't have Actions entirely set up in the Fedora instance yet). Right now it's using Gemini as the model provider just because that was the easiest thing to set up for a PoC, but ai-code-review's design makes the LLM provider easily pluggable, so it's trivial to swap it out. Long term I hope we'll get a Fedora LLM provider set up, serving open source models, and we can make it use that. There's an Ollama backend, and adding an OpenAI API backend should be pretty easy.

Before going any further with that, though, I decided to look around and see if there are other tools out there, and if so, which might be the best one. I poked around a bit and found a few, and wrote up a very half-assed comparative assessment. I figured this might interest others, so I've prettied it up a tiny bit and put it below. I make no claims that this is comprehensive, accurate or fair, please send all complaints to the happyassassin.net HR department! The takeaway is that I'll probably keep working on the ai-code-review approach and also experiment with forking Qodo's archived open-source pr-agent project and see if I can add Forgejo support to it, to compare it against ai-code-review.

If anyone knows of any I missed, please let me know! I briefly looked at RhodeCode but discounted it because it's a whole-ass forge, not just a review tool. ReviewBoard doesn't seem to have any LLM integration as best as I could tell.

The Contenders

ai-code-review

Repo: https://gitlab.com/redhat/edge/ci-cd/ai-code-review
Author: Juanje Ojeda (Red Hat)
Language: Python (typed)
Architecture: Modular
Tests: Yes, LLM-generated, fairly comprehensive unit tests, very limited integration tests
Begun: August 2025
Status: Active
Forges: GitLab, GitHub, local changes (Forgejo supported submitted)
Model providers: Gemini, Anthropic, Ollama
Output: Console or PR/MR comment
Deployment: Local execution, GitLab CI, GitHub Actions (one-shot deployment via container image in CI job)
Prompts: Here

ai-codereview

Repo: Red Hat internal
Author: Tuvya Korol (Red Hat)
Language: Python (untyped)
Architecture: Monolithic
Tests: No
Begun: June 2025
Status: Active
Forges: GitLab, local changes
Model providers: RH-internal Claude, Gemini, Granite
Output: Console or MR comment
Deployment: Local execution, GitLab CI (ad hoc deployment via curl/pip in CI job)
Prompts: Red Hat internal

kodus-ai

Repo: https://github.com/kodustech/kodus-ai
Author: Kodus
Language: Typescript
Architecture: Modular
Tests: Yes, handwritten, unit and integration, not sure of coverage
Begun: April 2025
Status: Active
Forges: GitHub, GitLab, BitBucket
Model providers: OpenAI, Gemini, Anthropic, Novita, OpenRouter, any OpenAI-compatible
Output: MR/PR comment and/or review (seems to depend on configuration)
Deployment: Local via yarn (indicated as for development only), as containerized webapp (for prod) with own installer - looks complex
Prompts: Here

pr-agent

Repo: https://github.com/qodo-ai/pr-agent
Author: Qodo (formerly Codium)
Language: Python (untyped)
Architecture: Modular
Tests: Yes, handwritten, unit and integration, somewhat primitive, many commented out, 24% coverage (per codecov)
Begun: July 2023
Status: Archived (Nov 2025)
Forges: GitHub, GitLab, Gitea, Gerrit, BitBucket, AWS CodeCommit, Azure DevOps, local changes
Model providers: Any OpenAI-compatible (looks like some special handling for Azure), LiteLLM
Output: MR/PR comment and/or review, has interactive features
Deployment: Local execution or Forge CI. There's a custom GitHub action but it may be abandoned. Installable via pip, should be trivial to containerize for simple one-shot CI job deployment
Prompts: Here

ai-pr-reviewer

Repo: https://github.com/coderabbitai/ai-pr-reviewer
Author: CodeRabbit
Language: Typescript
Architecture: Modular
Tests: Barely any
Begun: Feb 2023
Status: Archived (Nov 2023)
Forges: GitHub
Model providers: OpenAI
Output: PR review/comment
Deployment: GitHub Action (no longer maintained). No generic or local deployment documented
Prompts: Here

Conclusions

ai-code-review (Juanje) and pr-agent (Qodo/Codium) seem the best options.

Of the RH-developed, greenfield projects, ai-code-review is more featureful and better architected than ai-codereview, and not tied to an RH-internal model provider.

Of the existing public projects, ai-pr-reviewer (CodeRabbit) was very tied to GitHub, has no documented standalone deployment ability, and was archived fairly early in development. Plus it's in TypeScript. Kodus is actively developed, but similarly is in TypeScript, deployment looks complex, and from what I've seen I don't love its review style. Hard to say why but the project overall gives me a sloppy vibe. pr-agent (Qodo) had the longest development history and seems the most mature and capable at the point where it was abandoned (well, they actually seem to have done a heel turn and gone closed source / SaaS). It has a documented standalone deployment process which looks relatively simple and subject to integration into generic CI workflows.

Messing with "AI" (it's not great), and Strix Point (Ryzen AI 365) updates (it's getting better!)

Adam Williamson

2025-05-20 16:50

First up, since it's short: a quick happy note on Strix Point support in Linux. I blogged about this earlier, with my HP Omnibook 14 Ultra laptop with Ryzen AI 9 365 CPU, and it wasn't going great. I figured out some workarounds, but in fact the video hang thing was still happening at that point, despite all the cargo-cult-y command line args. But as of recent 6.15 RCs, it has been more or less fixed! I can still pretty reliably cause one of these "VCN ring timeout" issues just by playing videos, but now the driver reliably recovers from them; my external display goes blank for a few seconds, then comes back and works as normal. Apparently that should also now be fixed, which is great news. I want to give kudos to the awesome AMD folks working on all these problems, they're doing a great job.

At one point during the 6.15 series suspend/resume broke, but it's been fixed. So as of now, support is looking pretty good for my use cases. I haven't tested lately whether Thunderbolt docking station issues have been fixed, as the cheap USB 3 hub is still working fine for what I need.

OK, onto the AI bit. Yeah, it's another Red Hat person posting about AI! If you're wondering why: it's because we have all been told to Do Something With AI And Write About It. So now you know.

I first tried doing something really-actually-useful-for-work with AI a couple of weeks ago. As part of my work on maintaining openQA for Fedora (the packages and our instances of it), I review the upstream git commit logs. I usually try to update the package at least every few months so this isn't overwhelming, but lately I let it go for nearly a year, so I had a year of openQA and os-autoinst messages to look through, which isn't fun. After spending three days or so going through the openQA logs manually, I figured I'd see how AI did at the same job.

I used Gemini, as we have a corporate account with it. I pasted the entire log into Gemini 2.0 Flash and asked it to summarize it for me from the point of view of a package maintainer. It started out okay, then seized up after handling about ten messages, blurping some clearly-intermediate output on a big batch of commits and stopping entirely.

So I tried 2.5 Pro instead, and it actually did a pretty decent job. It boiled things down a long way into five or six appropriate topic areas, with a pretty decent summary of each. It pretty much covered the appropriate things. I then asked it to re-summarize from the point of view of a system administrator, and again it did really pretty well, highlighting the appropriate areas of change that a sysadmin would be interested in. It wasn't 100% perfect, but then, my Puny Human Brain wasn't either. The AI summary probably had more useful detail than my brain had retained over three days of reading.

So for os-autoinst, I didn't do the puny human brain reading. I got Gemini to do the same two summaries for me, and did the package update and deployment based on those. It flagged up appropriate things for me to look at in the package update and test deployment, and it seems like it did fine, since the package built and the deployment is mostly working. For this purpose, it definitely seems useful.

But when it comes to code...seems like a bit of a different story. At a couple of points in the last couple of weeks I was feeling a bit mentally tired, and decided for a break it'd be fun to throw the thing I was working on at AI and see how it would cope. tl;dr summary: not terrible but not great. Details follow!

The first thing I did was throw this openQA template loading issue at it. This was one of those things that grew and grew, and I eventually spent a week or so on a pretty substantial PR to fix all the stuff I found. But at the time, I was focusing on two issues in the previous state of openqa-dump-templates which meant it would almost never dump any JobTemplates.

One was fairly obvious: the condition checked in line 220 is only ever going to be true if --full or --product was passed. $options{product}->{product_key($r->{product})} is only set in line 213 or as the value of --product, and that block is only hit if $options{full} is truth-y.

The other was much more subtle. The other check that can short-circuit JobTemplates dumping - line 219 - looks like it would only kick in if --group is passed, right? The very first condition is if $options{group}, after all. But in fact, a feature called autovivification causes $options{group} to be defined by a keys call earlier in the script even if --group was not passed at the command line. So due to this check we never dump any JobTemplates with group names (in practice, this is almost all of them) unless --group was passed. A lot of languages have something like autovivification, but the fact that it kicks in on a keys call like this is a pretty odd perl quirk. It took me a few hours to work this one out with my puny human brain.

I figured I'd see if Gemini (2.5 Pro again) could find the same issues. So I dumped the entire script into Gemini and asked it in what cases it would dump job templates. It appeared to analyze what the script does, but its initial conclusion mostly missed the problems:

"The script is designed to dump JobTemplates and will do so successfully if they exist on the server and are not completely filtered out. The most reliable ways to get a potentially non-empty JobTemplates array are to run the script with no arguments, specify JobTemplates explicitly, use --group X --full, or use --product Y JobTemplates. Using only --group X might lead to errors or an empty array due to the interaction with the product filter."

It did sort of notice the problem with the line 220 filter - that's what it means by "the interaction with the product filter" - but seemed to think it would only be an issue if --group was passed, which is not the case.

So I gave it a hint: "This is wrong. JobTemplates output is always empty if no arguments are passed. Can you see why?" It came back with an answer that solely blamed the product filter, which is closer. In retrospect I can see to some extent why it had trouble here: it correctly noticed that the product filter should actually cause the script to crash, not just return empty JobTemplates, because $options{product} is usually going to be undefined at this point. (To be fair, my puny human brain didn't notice this wrinkle at first). That was clearly baffling it a bit, because it hadn't noticed the autovivification problem which means the script never actually got to this filter line at all.

I wanted to try and get it to notice the autovivification problem, so I kept hinting it. I went through four rounds of hinting, giving it progressively more specific information on the nature of the problem and its location, and correcting it when it came up with wrong answers, but it never quite got there. At first it flat out insisted I was wrong, and the product filter was the only issue. Later it came up with a plausible-but-wrong explanation based on option parsing, presented with complete confidence. Eventually, when I pointed it to the exact block where the autovivification happens, it considered four possibilities:

Is use Mojo::Base -strict less strict than use strict specifically for hash dereferencing?
Is there a global error handler ($SIG{DIE}) active that catches the error, sets $options{group} somehow, and continues?
Autovivification Anomaly?
Version Mismatch / Local Modification?

...but it discarded them all. The full text for "autovivification anomaly" was:

"Autovivification Anomaly? Could keys %{undef} somehow autovivify $options{group} into an empty hash reference {} before crashing or evaluating? Standard autovivification works on assignment to non-existent nested keys, not usually on reads like keys. It's not the standard behavior."

So it clearly didn't know that, yes, perl does autovivify "on reads like keys". So with a lot of hinting it almost got there, but never quite did. This wasn't a "realistic" scenario, though - I could only give it specific hints because I'd already worked out the problem with Human Brain Mk. I. If I hadn't already known what the more difficult problem was, Gemini clearly would not have been able to help me find it. It may have helped me find the easier problem, but then, I found that one myself in about two minutes, so that's not a huge win.

Before the weekend, I gave it another chance. After a week staring at this templates stuff I came out with some ideas for improving the intermediate loader we use in Fedora, which lets us store the templates in a more maintainable and efficient layout than the upstream loader expects, and transforms them into the upstream format before passing them to the upstream loader. One of these was allowing the "profiles" (an intermediate-loader-only concept) to be grouped. Then I realized we could save some more lines by allowing those groups to nest. So I wrote a trivial group recurser to allow for this. That was a pretty straightforward task, so I thought hey, could I have done it with AI?

So I pasted the script into Gemini (2.5 Pro again) and asked it: "Propose a change to make nested ProfileGroups possible. It should be possible for ProfileGroups keys to be the names of other ProfileGroups. The generate_job_templates function should handle these by recursion."

It came back with a modified version of the script with a superficially-plausible recursion handler...and a whole bunch of other changes. It mentioned some of these changes in the explanation it gave, with a heading "Minor Refinements and Robustness", but not all of them. I had not asked for any of these changes. Almost all of them were bikeshedding. Quite a lot were 'correctness' checks that are not necessary because the input data is validated through a JSON schema before we operate on it, which allows us to make various safe assumptions about what data will be present and as what types.

So I asked it: "Please re-do this without the "minor refinements and robustness". Do not change anything unless it is necessary to achieve the goal of nested ProfileGroups. Do not change existing variable names. Do not change existing functions other than generate_job_templates." It replied "Okay, I've revised the script focusing strictly on the nested ProfileGroups functionality and reverting other changes to keep the script as close to your original structure as possible, outside of the necessary modifications for this feature...Other functions have been reverted to their state before my "minor refinements and robustness" changes, unless a modification was an unavoidable consequence of the primary goal. Variable names in existing functions have been kept as they were.", with another version of the script attached. But this was, well, a lie. It had in fact left some code alterations in place, and it included a lot of passive-aggressive comments in place of code changes, like "# Original exit", "# Original script modified item in place", "# Original access"...

I prompted it again with "You still modified unrelated code by adding comments. Please do this again and do not add those comments. Please respect the instruction to only make changes to existing code in the generate_job_templates function." It came back and said it had done it. It had not - it still included many of the passive-aggressive comments ("# Original: no check if 'out' is empty"), and had removed some of the existing code comments.

I tried one final time to prompt it: "No, it does not. For instance, you added the comment "# Original: no check if 'out' is empty", and removed my existing comment "# now produce the output in upstream-compatible format". These are the changes I do not want. I want a minimal change that can easily be reviewed as a patch or pull request without distracting and irrelevant changes to other areas of the script." Once again it came back with a failure - quite an interesting one. In most cases it removed the passive-aggressive comments...but it literally removed just the comment string. So if the original line was foo with no trailing spaces, and it had changed it to foo # passive aggressive comment, it was now foo with a trailing space - Gemini literally removed only the string # passive aggressive comment compared to its previous version! And it did not restore the pre-existing code comments it had removed in previous rounds. I found it kinda interesting that it literally iterates in this way, instead of going back to the original script and trying again.

After three rounds I gave up on getting it to produce a minimal change, as any human coder would (at least after the very first time in their life they got chewed out for a PR with unrelated changes). Let's ignore that large problem and look at the recursion handler it produced:

def _resolve_recursive_profiles(group_name_to_resolve, all_profile_groups_data, current_base_priority, visited_in_path):
    """
    Recursively resolves profiles from a profile group, handling nested groups.
    This is a new helper function.

    Args:
        group_name_to_resolve (str): The name of the profile group to resolve.
        all_profile_groups_data (dict): The main dictionary containing all profile group definitions.
        current_base_priority (int): The accumulated priority from parent groups or the initial
                                     priority from the TestSuite for this resolution path.
        visited_in_path (set): A set of profile group names already visited in the current
                               recursion path to detect circular dependencies.

    Returns:
        dict: A dictionary of {profile_name: final_priority}.
    """
    if group_name_to_resolve not in all_profile_groups_data:
        # This print is part of the new logic for handling undefined group references
        print(f"Warning: ProfileGroup '{group_name_to_resolve}' referenced but not defined. Skipping.")
        return {}
    if group_name_to_resolve in visited_in_path:
        # This print is part of the new logic for handling circular dependencies
        print(f"Warning: Circular dependency detected in ProfileGroups involving '{group_name_to_resolve}'. Skipping to break loop.")
        return {}

    visited_in_path.add(group_name_to_resolve)
    resolved_profiles_for_this_group = {}
    current_group_definition = all_profile_groups_data[group_name_to_resolve]

    for item_key, item_priority_offset in current_group_definition.items():
        if item_key in all_profile_groups_data:  # It's a nested profile group
            nested_group_base_priority = current_base_priority + item_priority_offset
            nested_profiles = _resolve_recursive_profiles(
                item_key,
                all_profile_groups_data,
                nested_group_base_priority,
                visited_in_path.copy() 
            )
            resolved_profiles_for_this_group.update(nested_profiles)
        else:  # It's a direct profile
            final_profile_priority = current_base_priority + item_priority_offset
            resolved_profiles_for_this_group[item_key] = final_profile_priority
    return resolved_profiles_for_this_group

For comparison, here's my version:

def recurse_pgroup(pgroup, baseprio, pgroups, seen):
    """Recursion handler allowing nested profile groups. Takes the
    top-level profile group name and priority, the full ProfileGroups
    dict, and a set for infinite recursion checking.
    """
    profiles = {}
    for (item, prio) in pgroups[pgroup].items():
        if item in seen:
            sys.exit(f"Infinite recursion between profile groups {pgroup} and {item}")
        seen.add(item)
        if item in pgroups:
            profiles.update(recurse_pgroup(item, prio+baseprio, pgroups, seen))
        else:
            profiles[item] = prio+baseprio
    return profiles

So, well. Gemini's version is...not wrong, I don't think. I didn't bother running/testing it, but just eyeballing it, it looks like it works. But it is extremely verbose (and that's me saying that!) It uses very long variable names which are not in line with the general variable naming approach the rest of the script uses; because these are very long it has to wrap its self-call across multiple lines, which makes the flow less obvious. It uses trailing comments (I dislike these, so there isn't a single one in the existing script). It unnecessarily assigns variables which are used only once (final_profile_priority and current_group_definition, for e.g.) The overall effect is kind of stultifying to read. An entire codebase written in this style would be a nightmare to work on. The long, public interface-y docstring is arguably fine because I didn't give it any specific instructions, but OTOH, the rest of the script is clearly written in a pretty private, concise style which should have clued it in that this wasn't desired, even if this was a 'public' function.

I could try and fight it even harder to get it not to change unrelated things. I could give it specific instructions about variable naming and how I like comments and how I want it to write docstrings. But...that's more work than just writing the damn function myself, and if you're going to call this thing "AI", it should be reasonable for me to expect it to work these things out without explicit instructions, like I'd expect a human coder to do.

So overall my conclusion from these experiments is: I can see value in using "AI" as a summarizer of long git changelogs. I'll probably keep using it for that kind of task. I didn't yet find any value in trying to use it for a pretty typical "why isn't this working?" process of the kind I do all day long, or for what should have been a pretty trivial "write a simple, common enhancement to this script" operation.

A Whistle Blowing in a Pikmin Face, Forever: On The Moral Depravity of Captain Olimar, Gaming History's Greatest Monster

Adam Williamson

2025-02-19 07:47

Recently, I have been playing the classic Nintendo video game series "Pikmin", and have reached two conclusions. Firstly, they are excellent and very enjoyable video games. Secondly, they form the bleakest fictional dystopia yet conceived by man. The Hunger Games? Amateurish. Never Let Me Go? Takes a while to really kick in. The Handmaid's Tale? Strong competition, but Pikmin edges it. Allow me to elaborate.

I will focus on the first two games in the series. For anyone not familiar with them, here is a summary of the plot. In the first game, Captain Olimar crash lands his spacecraft on a "mysterious planet" (Earth), spreading thirty parts of it across the landscape. On the planet's surface, he discovers creatures he names "Pikmin", which seem to follow his instructions, and uses them to recover the lost ship parts, so he can escape. He also discovers various other things, through which we find that he is very small by human standards. In the second game, we find out that Olimar works for a delivery company, which has gone broke in his absence. On discovering that a bottle cap Olimar brought back with him is very valuable, his boss sends Olimar back to Earth - with a subordinate along this time - to collect more 'artifacts' to pay off the company's debt.

Now, even with this bald summary you may find the ethics of this plot a bit questionable. It gets worse! Let's dig deeper.

Both games occasionally imply - or have characters outright state - that the Pikmin cannot thrive without Olimar's "guidance". It does seem to be the case that, under Olimar's tutelage, their numbers and capabilities increase substantially. However, the ends to which these capabilities are put seem shockingly exploitative, and - initially - unexamined. Olimar never does anything purely for the benefit of the Pikmin. Everything he does is ultimately for the purpose of recovering the ship parts or "artifacts", to fulfil his own goals. He takes a superficial interest in the "strange planet" on which he finds himself, but it seems to be mostly for his own benefit and amusement; he performs no serious, disinterested scientific research, and makes no plans for further study once he leaves. His methods are, franky, brutal. He directs Pikmin to their tasks wordlessly and without affection, by blowing a whistle and bodily throwing them places (small as he is by our standards, he is much larger than the Pikmin). And once he has completed his task, he simply leaves, with no apparent offer of gratitude, companionship or compensation for the Pikmin who have been so crucial to his escape. Yet the game appears to portray him sympathetically, and by making him the only character the player can play, appears to have us condone his behaviour.

This is bad enough! Yet we have not arrived at the full, horrifying truth. Let's go even further.

Let us consider the ecology of Pikmin. Olimar does not discover Pikmin first; he discovers the organisms in which they live, which he names "onions" (he has no interest in what the Pikmin call them). These "onions" serve as both safe living quarters for Pikmin and as their reproductive mechanism. Pikmin can drag organic matter to the onion, where it is absorbed, and the onion sheds seeds that sprout into new Pikmin. A particularly vital source of this matter is a flower called the "pellet posy", which produces "pellets" that are easily handled by Pikmin and produce many new Pikmin seeds relative to their size and weight.

When Olimar arrives, the onions, several Pikmin, the pellet posies, and several predator species which can easily defeat individual Pikmin are already present. The game appears to suggest to us that these Pikmin are, somehow, barely clinging to survival, and without Olimar's intervention would presumably soon have died out. Yet how is this possible? How could this ecosystem of pellet posies and onions have come into existence? The pellet posies flower daily, and are easily harvested by a small number of Pikmin; the rate of flowering and the number of Pikmin they produce should result in plentiful Pikmin reproduction with no outside intervention at all. Add to this the ability of Pikmin to harvest the corpses of their predator species - which will of course die naturally or be killed by their own predators, even if the Pikmin cannot organize to kill them themselves - and we should probably be thankful the predator species do exist, or else our world would be entirely overrun by them.

The games later indicate that the Pikmin are perfectly capable of harvesting pellet posies and predator corpses without Olimar's direct instructions - if Pikmin are left to their own devices near to a pellet or corpse, they will transport it back to their onion. Carefully considered, what we see is a robust and healthy ecosystem. So how are we to believe the game when it appears to tell us about the Pikmins' tragic plight on Olimar's arrival?

Now that we're paying attention, here is the shocking revelation: we must not. What we see is not the truth; it is a twisted lie originating in the diseased mind of gaming history's greatest monster, Captain Olimar.

On careful examination, there is no reliable, independent witness to the events of either game. We directly encounter nobody in the first game but Olimar, the Pikmin, and the various other species with which he interacts, none of which are given a voice. No other member of Olimar's race is present, nor is any outside observer. The game's third person perspective is nothing but a clever narrative ruse. Make no mistake - this is Olimar's story as told by Olimar, and it is not a pretty one. In the second game, the tragic Louie is present, but he is entirely under Olimar's spell.

No, what appears to be the game's somewhat half-hearted justification for Olimar's behaviour is merely the tortured scream of what remains of this black-hearted villain's conscience, attempting to retroactively whitewash his crimes. This is the classic excuse of the rapacious imperialist - "these natives would be nothing without me; they should be grateful I am here to lead them". There was no small band of struggling Pikmin when Olimar arrived. Either he found a few individuals from a much larger group who merely happened to be in the area where he crashed, or he is simply lying. Either way, his subsequent actions lay bare the true, horrific depths of his vicious depravity.

As my own little narrative device, I merely passed lightly over these actions above, but let us now look into them in full, excruciating detail. What does "Captain" [1] Olimar do immediately upon encountering Pikmin for the first time? He discovers that when he blows his whistle, they follow him unquestioningly. He - allow me some license, here - he grins. We do not see him grin, but unquestionably he grins. It is not a pretty grin. From this moment on, his whistle will rarely leave his mouth except in his sleep.

He continues his experiments. He discovers, rapidly and roughly, that these poor, trusting/terrified (we are never quite sure which - think how big this creature, with its diabolical whistle, is to them!) Pikmin will follow him even into extremely dangerous situations of no benefit to them. They will follow him even to their deaths. They will allow him to hurl them indiscriminately around the landscape, even - are you sensing a theme? - to their deaths. Does Olimar repent of his experiments and come to treat this trust as the rare, beautiful, and morally dangerous wonder that it is? Does he take some time to meditate and to consider his ethical position? Does he put his predicament - that he must, at least in his telling, recover the ship parts within 30 days to ensure his own survival - to the Pikmin, and beg of them their assistance? Does he look elsewhere for possible sources of help, or means of survival?

He does not. He grasps his whistle, and he sets about establishing a fascistic, imperialist nightmare.

He works the Pikmin unforgivingly, from sunrise to sunset each day. He even, chillingly, has his shipboard computer declare them 'lazy' for not doing so of their own volition; we are no longer so naive as to believe these are really the computer's words, it is merely a mouthpiece for what can now only be termed "Olimarism". He treats them as an expendable resource, forcing them to fight deadly enemies and conquer deathly hazardous terrain, all for nobody's benefit but his own. He alone determines when they wake, when they sleep, where they go, what they do. In the classic fascist style, he even asserts control of their reproduction and demands ceaseless population increase, all for his own survival and greater glory. The climax of the first game sees the Pikmin pitted against a terrifying foe, finally defeating it after dying in droves - and all merely in order to retrieve Olimar's own piggy bank! A more brutally clear metaphor would be hard to conceive.

Once he has retrieved all his possessions, he immediately returns to his home planet, leaving a horrific trail of death and destruction in his wake. The brutalized Pikmin, trapped deep in their fear and their Stockholm syndrome, line up to see him off. Does he thank them? Does he offer - in consistency with his alleged belief that they will not survive without him - to take them with him? Does he, in fact, speak a single word to them? He does not. This creature has no soul, only a screaming void within.

In the second game, he and his soon-thoroughly-corrupted lackey force the Pikmin to explore deep caves which they would never enter alone, and in which they find nothing but more death, destruction, and resources for Olimar to take away to his fatherland. Once again using the transparent device of his "talking, sentient shipboard computer", Olimar's shrivelled husk of a conscience pathetically attempts to convince us, or himself, or both, that the Pikmin "want" him to hurl them into flowers which eat them and shed Pikmin seeds of a different color, which of course he does with abandon. At this point there is little we can do but cry, and wait for the nightmare to end.

Properly considered, this is no cosy, cutesy creature game series. This is not even a standard-issue video game series with a plot that's a bit problematic if you think about it for a minute. This is such a masterpiece that we must admire it, but so dark and twisted that we must perhaps implore the Japanese authorities belatedly to round up the minds behind it for a cup of tea, a wellness check and possibly some kind of long-term psychological evaluation and monitoring. This is the hardest-hitting, most uncompromising polemic against expansionist imperialism you will find outside of any given issue of the Observer.

[1] Of course, the title alone is a dead giveaway. There is nobody else on board Olimar's ship for him to be the "captain" of. Giving oneself a ludicrous, overinflated title with military overtones? Classic evil dictator behaviour.

New laptop and Silverblue update

Adam Williamson

2025-01-14 22:25

Figured I'd post an update on how things are going with the new laptop (HP Omnibook Ultra 14, AMD Ryzen AI 9 365 "Strix Point", for the searchers) and with Silverblue.

I managed to work around the hub issue by swapping out the fancy $300 Thunderbolt hub for a $40 USB-C hub off Amazon. This comes with limitations - you're only going to get a single 4k 60Hz external display, and limited bandwidth for anything else - but it's sufficient for my needs, and makes me regret buying the fancy hub in the first place. It seems to work 100% reliably on startup, reboot and across suspend/resume. There's still clearly something wrong with Thunderbolt handling in the kernel, but it's not my problem any more.

The poor performance of some sites in Firefox turned out to be tied to the hanging problem - I'd disabled graphics acceleration in Firefox, which helped with the hanging, but was causing the appalling performance on Google sites and others. I've now cargo-culted a set of kernel args - amdgpu.dcdebugmask=0x800 amdgpu.lockup_timeout=100000 drm.vblankoffdelay=0 - which seem to be helping; I turned graphics acceleration back on in Firefox and it hasn't started hanging again. At least, I haven't had random hangs for the last few days, and this morning I played a video on youtube and the system has not hung since then. I've no idea how bad they are for battery life, but hey, they seem to be keeping things stable. So, the system is pretty workable at this point. I've been using it full-time, haven't had to go back to the old one.

I'm also feeling better about Silverblue as a main OS this time. A lot of things seem to have got better. The toolbox container experience is pretty smooth now. I managed to get adb working inside a container by putting these udev rules in /etc/udev/rules.d. It seems like I have to kill and re-start the adb server any time the phone disconnects or reboots - usually adb would keep seeing the phone just fine across those events - but it's a minor inconvenience. I had to print something yesterday, was worried for a moment that I'd have to figure out how to get hp-setup to do its thing, but then...Silverblue saw my ancient HP printer on the network, let me print to it, and it worked, all without any manual setup at all. It seems to be working over IPP, but I'm a bit surprised, as the printer is from 2010 or 2011 and I don't think it worked before. But I'm not complaining!

I haven't had any real issues with app availability so far. All the desktop apps I need to use are available as flatpaks, and the toolbox container handles CLI stuff. I'm running Firefox (baked-in version), Evolution, gedit, ptyxis (built-in), liferea, nheko, slack and vesktop (for discord) without any trouble. LibreOffice and GIMP flatpaks also work fine. Everything's really been pretty smooth.

I do have a couple of tweaks in my bashrc (I put them in a file in ~/.bashrc.d, which is a neat invention) that other Atomic users might find useful...

if [ -n "$container" ]
then
  alias gedit="flatpak-spawn --host /var/lib/flatpak/exports/bin/org.gnome.gedit"
  alias xdg-open=flatpak-xdg-open
else
  alias gedit=/var/lib/flatpak/exports/bin/org.gnome.gedit
fi

the gedit aliases let me do gedit somefile either inside or outside a container, and the file just opens in my existing gedit instance. Can't really live without that. You can adapt it for anything that's a flatpak app on the host. The xdg-open alias within containers similar makes xdg-open somefile within the container do the same as it would outside the container.

So it's still early days, but I'm optimistic I'll keep this setup this time. I might try rebasing to the bootc build soon.

New laptop experience (Fedora on HP Omnibook Ultra 14 - Ryzen AI 365, "Strix Point")

Adam Williamson

2025-01-08 00:39

New year, new blog post! Fedora's going great...41 came out and seems to be getting good reviews, there's exciting stuff going on with atomic/bootc, we're getting a new forge, it's an exciting time to be alive...

Personally I've spent a large chunk of the last few weeks bashing my head against two awkward bugs - one kernel bug, one systemd bug. It's a bit of a slog, but hey. Also now working up to our next big openQA project, extending test coverage for the new installer.

But also! I bought myself a new laptop. For the last couple of years I've been using a Dell XPS 13 9315, the Alder Lake generation. I've been using various generations of XPS 13 ever since Sony stopped making laptops (previously I used a 2010 Vaio Z) - I always found it to be the best thin-and-light design, and this one was definitely that. But over time it felt really underpowered. Some of this is the fault of modern apps. I have to run a dumb amount of modern chat apps, and while they're much nicer than IRC, they sure use a lot more resources than hexchat. Of course I have a browser with about 50 tabs open at all times, Evolution uses quite a lot of memory for my email setup for some reason, and I have to run VMs quite often for my work obviously. Put all that together, and...I was often running out of RAM despite having 16GB, which is pretty ridiculous. But even aside from that, you could tell the CPU was just struggling with everything. Just being in a video chat was hard work for it (if I switched apps too much while in a meeting, my audio would start chopping up for others on the call). Running more than two VMs tend to hang the system irretrievably. Just normal use often caused the fan to spin up pretty high. And the battery life wasn't great. It got better with kernel updates over time, but still only 3-4 hours probably.

So I figured I'd throw some hardware at the problem. I've been following all the chipset releases over the last couple of years, and decided I wanted to get something with AMD's latest silicon, codenamed "Strix Point", the Ryzen AI 3xx chips. They're not massively higher-performing than the previous gen, but the battery life seems to be improved, and they have somewhat better GPUs. That pretty much brought it down to the Asus Vivobook S 14, HP Omnibook Ultra 14, and Lenovo T14S gen 6 AMD. The Asus is stuck with 24GB of RAM max and I'm not a huge Asus fan in general, and the HP came in like $600 cheaper than the Thinkpad with equivalent specs, and had a 3 year warranty included. So I went with the HP, with 1TB of storage and 32GB of RAM.

I really like the system as a whole. It's heavier than the XPS 13, obviously, the bezels are a little bigger, and the screen is glossier. But the screen is pretty nice, I like the keyboard, and the overall build quality feels pretty solid. The trackpad seems fine.

As for running Fedora (and Linux in general) on it...well, it's almost great. Everything more or less works out of the box, except the fingerprint reader. I don't care about that because I set up the reader on the XPS 13 and kinda hated it; it's nowhere near as nice as a fingerprint reader on a phone. Even if it worked on the HP I'd leave it off. The performance is fantastic (except that Google office sites perform weirdly terribly on Firefox, haven't tried on Chromium yet).

But...after using it for a while, the issues become apparent. The first one I hit is that the system seems to hang pretty reproducibly playing video in browsers. This seems to be affecting pretty much everyone with a Strix Point system, and the only 'fix' is to turn off hardware video acceleration in the browser, which isn't really great (it means playing video will use the CPU, hurting battery life and performance). Then I found even with that workaround applied the system would hang occasionally. Looking at the list of Strix Point issues on the AMD issue tracker, I found a few that recommended kernel parameters to disable various features of the GPU to work around this; I'm running with amdgpu.dcdebugmask=0x800, which disables idle power states for the GPU, which probably hurts battery life pretty bad. Haven't had a hang with that yet, but we'll see. But aside from that, I'm also having issues with docks. I have a Caldigit TS3+, which was probably overpowered for what I really need, but worked great with the XPS 13. I have a keyboard, camera, headset, ethernet and monitor connected to it. With the HP, I find that at encryption passphrase entry during boot (so, in the initramfs) the keyboard works fine, but once I reach the OS proper, only the monitor works. Nothing else attached to the dock works at all. A couple of times, suspending the system and resuming it seemed to make it start working - but then I tried that a couple more times and it didn't work. I have another Caldigit dock in the basement; tried that, same deal. Then I tried my cheap no-name travel hub (which just has power pass-through, an HDMI port, and one USB-A port on it) with a USB-A hub, and...at first it worked fine! But then I suspended and resumed and the camera and headset stopped working. Keyboard still worked. Sigh. I've ordered a mid-range hub with HDMI, ethernet, a card reader and four USB-A ports on it off Amazon, so I won't need the USB-A hub daisychain any more...I'm hoping that'll work well enough. If not, it's a bit awkward.

So, so far it's a bit of a frustrating experience. It could clearly be a fantastic Linux laptop, but it isn't quite one yet. I'd probably recommend holding off for a bit while the upstream devs (hopefully) shake out all the bugs...

AdamW's Debugging Adventures: Inadvertent Extreme Optimization

Adam Williamson

2024-11-19 07:10

It's time for that rarest of events: a blog post! And it's another debugging adventure. Settle in, folks!

Recently I got interested in improving the time it takes to do a full compose of Fedora. This is when we start from just the packages and a few other inputs (basically image recipes and package groups), and produce a set of repositories and boot trees and images and all that stuff. For a long time this took somewhere between 5 and 10 hours. Recently we've managed to get it down to 3-4, then I figured out a change which has got it under 3 hours.

After that I re-analyzed the process and figured out that the biggest remaining point to attack is something called the 'pkgset' phase, which happens near the start of the process, not in parallel with anything else, and takes 35 minutes or so. So I started digging into that to see if it can be improved.

I fairly quickly found that it spends about 20 minutes in one relatively small codepath. It's created one giant package set (this is just a concept in memory at the time, it gets turned into an actual repo later) with every package in the compose in it. During this 20 minutes, it creates subsets of that package set per architecture, with only the packages relevant to that architecture in it (so packages for that arch, plus noarch packages, plus source packages, plus 'compatible' arches, like including i686 for x86_64).

I poked about at that code a bit and decided I could maybe make it a bit more efficient. The current version works by creating each arch subset one at a time by looping over the big global set. Because every arch includes noarch and src packages, it winds up looping over the noarch and src lists once per arch, which seemed inefficient. So I went ahead and rewrote it to create them all at once, to try and reduce the repeated looping.

Today I was testing that out, which unfortunately has to be done more or less 'in production', so if you like you can watch me here, where you'll see composes appearing and disappearing every fifteen minutes or so. At first of course my change didn't work at all because I'd made the usual few dumb mistakes with wrong variable names and stuff. After fixing all that up, I timed it, and it turned out about 7 minutes faster. Not earth shattering, but hey.

So I started checking it was accurate (i.e. created the same package sets as the old code). It turned out it wasn't quite (a subtle bug with noarch package exclusions). While fixing that, I ran across some lines in the code that had bugged me since the first time I started looking at it:

                if i.file_path in self.file_cache:
                    # TODO: test if it really works
                    continue

These were extra suspicious to me because, not much later, they're followed by this:

                self.file_cache.file_cache[i.file_path] = i

that is, we check if the thing is in self.file_cache and move on if it is, but if it's not, we add it to self.file_cache.file_cache? That didn't look right at all. But up till now I'd left it alone, because hey, it had been this way for years, right? Must be OK. Well, this afternoon, in passing, I thought "eh, let's try changing it".

Then things got weird.

I was having trouble getting the compose process to actually run exactly as it does in production, but once I did, I was getting what seemed like really bizarre results. The original code was taking 22 minutes in my tests. My earlier test of my branch had taken about 14 minutes. Now it was taking three seconds.

I thought, this can't possibly be right! So I spent a few hours running and re-running the tests, adding debug lines, trying to figure out how (surely) I had completely broken it and it was just bypassing the whole block, or something.

Then I thought...what if I go back to the original code, but change the cache thing?

So I went back to unmodified pungi code, commented out those three lines, ran a compose...and it took three seconds. Tried again with the check corrected to self.file_cache.file_cache instead of self.file_cache...three seconds.

I repeated this enough times that it must be true, but it still bugged me. So I just spent a while digging into it, and I think I know why. These file caches are kobo.pkgset.FileCache instances; see the source code here. So, what's the difference between foo in self.file_cache and foo in self.file_cache.file_cache? Well, a FileCache instance's own file_cache is a dict. FileCache instances also implement __iter__, returning iter(self.file_cache). I think this is why foo in self.file_cache works at all - it actually does do the right thing. But the key is, I think, that it does it inefficiently.

Python's preferred way to do foo in bar is to call bar.__contains__(foo). If that doesn't work, it falls back on iterating over bar until it either hits foo or runs out of iterations. If bar doesn't support iteration it just raises an exception.

Python dictionaries have a very efficient implementation of __contains__. So when we do foo in self.file_cache.file_cache, we hit that efficient algorithm. But FileCache does not implement __contains__, so when we do foo in self.file_cache, we fall back to iteration and wind up using that iterator over the dictionary's keys. This works, but is massively less efficient than the dictionary's __contains__ method would be. And because these package sets are absolutely fracking huge, that makes a very significant difference in the end (because we hit the cache check a huge number of times, and every time it has to iterate over a huge number of dict keys).

So...here's the pull request.

Turns out I could have saved the day and a half it took me to get my rewrite correct. And if anyone had ever got around to TODOing the TODO, we could've saved about 20 minutes out of every Fedora compose for the last nine years...

DevConf.CZ 2023, Rawhide update test gating, ELN testing and more!

Adam Williamson

2023-06-20 11:17

I'm in Brno, working from the office for a few days after the end of DevConf.CZ. It was a great conference, good to see people and feel some positive energy after all the stuff with RH layoffs and so on. It was really well attended, and there were a lot of useful talks. I presented on the current state of openQA and Fedora CI, with Miroslav Vadkerti kindly covering the Fedora CI stuff (thanks to Miro for that). The segmented talk video hasn't been updated yet, but you can watch it from the recorded live stream starting here (at 6:04:32). I think it went pretty well, I'm much happier with this latest version of the talk than the versions I did at DevConf.US and LinuxCon (sorry to anyone who saw those ones!)

The talk by Aoife Moloney and Michal Konecny on how the CPE team (which handles Fedora infra and apps) has started doing organized pre-scoping for projects instead of just diving in was really interesting and informative. The anaconda meetup wound up being just the anaconda team and myself and Sumantro from QA, but it was very useful as we were able to talk about the plans for moving forward with the new anaconda webUI and how we can contribute testing for that - look out for Test Weeks coming soon. Davide Cavalca's talk on Fedora ELN usage at Meta was great, and inspired me to work on stuff (more on that later).

There were a lot of random conversations as always - thanks to it being June, the "hallway track" mostly evolved into the "shadow track", under the shade of a big tree in the courtyard, with beanbags and ice cream! That's a definite improvement. The social event was in a great location - around an outdoor swimming pool (although we couldn't swim - apparently we couldn't serve drinks if swimming was allowed, so that seems like the best choice!) All in all, a great conference. I'm very much looking forward to Flock in Cork now, and will be doing my talk there again if it's accepted.

Tomorrow will be an exciting day, because (barring any unforeseen issues) we'll be turning on gating of Rawhide updates! I've been working towards this for some time now - improving the reliability of the tests, implementing test re-run support from Bodhi, implementing the critical path group stuff, and improving the Bodhi web UI display of test results and gating status - so I'm really looking forward to getting it done (and hoping it goes well). This should mean Rawhide's stability improves even more, and Kevin and I don't have to scramble quite so much to "shadow gate" Rawhide any more (by untagging builds that fail the tests).

Davide mentioned during his ELN talk that they ran into an issue that openQA would have caught if it ran on ELN, so I asked if that would be useful, and he said yes. So, yesterday I did it. This required changes to fedfind, the openQA tests, and the openQA scheduler - and then after that all worked out well and I deployed it, I realized it also needed changes to the result reporting code and a couple of other things too, which I had to do in rather a hurry! But it's all sorted out now, and we have new ELN composes automatically tested in production when they land. Initially only a couple of default-install-and-boot tests were running, I'm now working to extend the test set and the tested images.

Other than that I've been doing a lot of work on the usual things - keeping openQA updated and running smoothly, investigating and fixing test failures, improving stuff in Bodhi and Greenwave, and reviewing new tests written by lruzicka. I'll be on vacation for a week or so from Friday, which will be a nice way to decompress from DevConf, then back to work on a bunch of ideas that came out of it!

Thoughts on a pile of laptops

Adam Williamson

2023-02-02 21:42

Hi folks! For the first post of 2023, I thought I'd do something a bit different. If you want to keep up with what I've been working on, these days Mastodon is the best place - I've been posting a quick summary at the end of every working day there. Seems to be working out well so far. The biggest thing lately is that "grouped critical path", which I wrote about in my last post, is deployed in production now. This has already reduced the amount of tests openQA has to run, and I'm working on some further changes to optimize things more.

So instead of that, I want to rhapsodize on this pile of laptops:

A pile of laptops

On the top is the one I used as my main laptop for the last six years, and my main system for the last couple, since I got rid of my desktop. It's a Dell XPS 13 9360, the "Kaby Lake" generation. Not pictured (as it's over here being typed on, not in the pile) is its replacement, a 2022 XPS 13 (9315), which I bought in December and have been pretty happy with so far. On the bottom of the pile is a Lenovo tester (with AMD Ryzen hardware) which I tried to use as my main system for a bit, but it didn't work out as it only has 8G of RAM and that turns out to be...not enough. Second from bottom is a terrible budget Asus laptop with Windows on it that I keep around for the occasional time I need to use Windows - mainly to strip DRM from ebooks. Not pictured is the older XPS 13 I used before the later two, which broke down after a few years.

But the hidden star of the show is the one second from top. It has a high-resolution 13" display with pretty slim bezels and a built-in webcam. It has dual NVIDIA and Intel GPUs. It has 8G of RAM, SSD storage and a multicore CPU, and runs Fedora 36 just fine, with decent (3-4hr) battery life. It weighs 3.15lb (1.43kg) and has USB, HDMI and ethernet outs.

It also has a built-in DVD drive, VGA out and an ExpressCard slot (anyone remember those?) That's because it's from 2010.

It's a Sony Vaio Z VPC-Z11, and I still use it as a backup/test system. It barely feels outdated at all (until you remember about the DVD drive, which is actually pretty damn useful sometimes still). Every time I open it I'm still amazed at what a ridiculous piece of kit it is/was. Just do an image search for "2010 laptop" and you'll see stuff like, well, this. That's what pretty much every laptop looked like in 2010. They had 4G of RAM if you were lucky, and hard disks. They weighed 2kg+. They had huge frickin' bezels. The Macbook Air had come out in 2008, but it was an underpowered thing with a weak CPU and HDD storage. The 2010 models had SSDs, but maxed out at 4G RAM and still had pretty weak CPUs (and way bigger bezels, and worse screens, and they certainly didn't have DVD drives). They'd probably feel pretty painful to use now, but the Vaio still feels fine. Here's a glamour shot:

One very cool laptop

I've only had to replace its battery twice and its SSDs (it came from the factory with two SSDs configured RAID-0, because weird Sony is like that) once in 12 years. Probably one day it will finally not be really usable any more, but who the heck knows how long that will be.

Fedora 37, openQA news, Mastodon and more

Adam Williamson

2022-11-18 22:34

Hey, time for my now-apparently-annual blog post, I guess? First, a quick note: I joined the herd showing up on Mastodon, on the Fosstodon server, as @[email protected]. So, you know, follow me or whatever. I posted to Twitter even less than I post here, but we'll see what happens!

The big news lately is of course that Fedora 37 is out. Pulling this release together was a bit more painful than has been the norm lately, and it does have at least one bug I'm sad we didn't sort out, but unless you have one of a very few motherboards from six years ago and want to do a reinstall, everything should be great!

Personally I've been running Fedora Silverblue this cycle, as an experiment to see how it fares as a daily driver and a dogfooding base. Overall it's been working fine; there are still some awkward corners if you are strict about avoiding RPM overlays, though. I'm definitely interested in Colin's big native container rework proposal, which would significantly change how the rpm-ostree-based systems work and make package layering a more 'accepted' thing to do. I also found that sourcing apps feels slightly odd - I'd kinda like to use Fedora Flatpaks for everything, from a dogfooding perspective, but not everything I use is available as one, so I wound up with kind of a mix of things sourced from Flathub and from Fedora Flatpaks. I was also surprised that Fedora Flatpaks aren't generally updated terribly often, and don't seem to have 'development' branches - while Fedora 37 was in development, I couldn't get Flatpak builds of apps that matched the Fedora 37 RPM builds, I was stuck running Fedora 36-based Flatpaks. So it actually impeded my ability to test the latest versions of everything. It'd be nice to see some improvement here going forward.

My biggest project this year has been working towards gating Rawhide critical path updates on the openQA tests, as we do for stable and Branched releases. This has been a deceptively large effort; ensuring all the tests work OK on Rawhide was a relatively small job, but the experience of actually having the tests running has been interesting. There are, overall, a lot more updates for Rawhide than any other release, and obviously, they tend to break things more often. First I turned the tests on for the staging instance, then after a few months trying to get on top of things there, turned them on for the production instance. I planned to run this way for a month or two to see if I could stay on top of keeping the tests running smoothly and passing when they should, and dealing with breakage. On the whole, it's been possible...but just barely. The increased workload means tests can take several hours to complete after an update is submitted, which isn't ideal. Because we don't have the gating turned on, when somebody does submit an update that breaks the tests, I have to ensure it gets fixed right away or else get it untagged before the next Rawhide compose happens, or else the test will fail for every subsequent update too; that can be stressful. We also have had quite a lot of 'fun' with intermittent problems like systemd-oomd killing things it shouldn't. This can result in a lot of time spent manually restarting failed tests, coming up with awkward workarounds, and trying to debug the problems.

So, I kinda felt like things aren't quite solid enough yet to turn the gating on, and I wound up working down a path intended to help with the "too many jobs take too long" and "intermittent failures" angles. This actually started out when I added a proper critical path definition for KDE. This rather increased the openQA workload, as it added a bunch of packages to critical path that weren't there before. There was especially a fun moment when a couple hundred KDE package updates got submitted separately as Rawhide updates, and openQA spent a day running 55 tests on all of them, including all the GNOME and Server tests.

As part of getting the KDE stuff added to the critical path, I wound up doing a big update to the script that actually generates the critical path definition, and working on that made me realize it wouldn't be difficult to track the critical path package set by group, not just as one big flat list. That, in turn, could allow us to only run "relevant" openQA tests for an update: if the update is only in the KDE critical path, we don't need to run the GNOME and Server tests on it, for instance. So for the last few weeks I've been working on what turned out to be quite a lot of pieces relevant to that.

First, I added the fundamental support in the critical path generation script. Then I had to make Bodhi work with this. Bodhi decides whether an update is critical path or not, and openQA gets that information from Bodhi. Bodhi, as currently configured, actually gets this information from PDC, which seems to me an unnecessary layer of indirection, especially as we're hoping to retire PDC; Bodhi could just as easily itself be the 'source of truth' for the critical path. So I made Bodhi capable of reading critpath information directly from the files output by the script, then made it use the group information for Greenwave queries and show it in the web UI and API query results. That's all a hard requirement for running fewer tests on some updates, because without that, we would still always gate on all the openQA tests for every critical path update - so if we didn't run all the tests for some update, it would always fail gating. I also changed the Greenwave policies accordingly, to only require the appropriate set of tests to pass for each critical path group, once our production Bodhi is set up to use all this new stuff - until then, the combined policy for the non-grouped decision contexts Bodhi still uses for now winds up identical to what it was before.

Once a new Bodhi release is made and deployed to production, and we configure it to use the new grouped-critpath stuff instead of the flat definition from PDC, all of the groundwork is in place for me to actually change the openQA scheduler to check which critical path group(s) an update is in, and only schedule the appropriate tests. But along the way, I noticed this change meant Bodhi was querying Greenwave for even more decision contexts for each update. Right now for critical path updates Bodhi usually sends two queries to Greenwave (if there are more than seven packages in the update, it sends 2*((number of packages in update+1)/8) queries). With these changes, if an update was in, say, three critical path groups, it would send 4 (or more) queries. This slows things down, and also produces rather awkward and hard-to-understand output in the web UI. So I decided to fix that too. I made it so the gating status displayed in the web UI is combined from however many queries Bodhi has to make, instead of just displaying the result of each query separately. Then I tweaked greenwave to allow querying multiple decision contexts together, and had Bodhi make use of that. With those changes combined, Bodhi should only have to query once for most updates, and for updates with more than seven packages, the displayed gating status won't be confusing any more!

I'm hoping all those Bodhi changes can be deployed to stable soon, so I can move forward with the remaining work needed, and ultimately see how much of an improvement we see. I'm hoping we'll wind up having to run rather fewer tests, which should reduce the wait time for tests to complete and also mitigate the problem of intermittent failures a bit. If this works out well enough, we might be able to move ahead with actually turning on the gating for Rawhide updates, which I'm really looking forward to doing.