It's a good thing they pay me to break stuff

Adam Williamson

2014-11-27 18:17

Had an interesting couple of days deep-diving into juicy issues.

Yesterday's was triggered by me messing up the Fedora kernel package git repository - whoops. I keep Fedlet's kernel as a branch that only exists in my checkout; it's not pushed anywhere (I should just push it out on my git server now I have one, but I keep forgetting). I accidentally ran git push from that branch yesterday, and it promptly pushed all the changes on it to master, effectively turning Fedora's kernel into the Fedlet kernel for a few glorious hours until Josh reverted it.

Obviously at least 50% of the blame there lies squarely on me for having set an upstream for my branch and running git push from it at all, but it seemed odd to me that git would run such a push in a fairly 'default' configuration. So I did some digging.

It turns out it's not meant to. I don't have git's push.default setting set locally. So every time I push anything anywhere, I see this message:

warning: push.default is unset; its implicit value has changed in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the traditional behavior, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

When push.default is set to 'matching', git will push local branches
to the remote branches that already exist with the same name.

Since Git 2.0, Git defaults to the more conservative 'simple'
behavior, which only pushes the current branch to the corresponding
remote branch that 'git pull' uses to update the current branch.

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

Yeah, I've been staring at that for like a year or two instead of just picking a default like it tells me to. What can I say, I'm lazy. Anyway, as that clearly says, the intent in this case is that git should use its simple push behaviour. If you go and look at git help config, like it says, you find this interesting snippet:

       ·   upstream - push the current branch back to the branch whose changes
           are usually integrated into the current branch (which is called
           @{upstream}). This mode only makes sense if you are pushing to the
           same repository you would normally pull from (i.e. central
           workflow).

       ·   simple - in centralized workflow, work like upstream with an added
           safety to refuse to push if the upstream branch’s name is different
           from the local one.

That is, simple is supposed to reject the push if the source and target branches don't have the same name. But clearly, in my case, it didn't.

I was now pretty convinced I could blame git for the other 50% of my booboo, so I did some digging, and eventually wound up figuring out what was going on and filing a 'bug report' (mailing list post) with the git folks. Basically, it turns out that they sort of got their wires crossed with a couple of different people working on this code when they implemented the behaviour, and in the case where push.default is not explicitly set, this has been broken all along: in effect, it's been behaving as upstream, not simple, in the case where the default isn't explicitly set. If you explicitly set the default to simple it works as expected - the push is rejected.

I'm mildly surprised more people haven't hit this; I guess not many people do idiotic mistaken pushes like me, and/or most people read the message saying to pick a default behaviour and actually pick one, rather than saying 'meh, maybe next decade'. (Or, other people have been hit by this but didn't feel so convinced git should've saved them from themselves, they just accepted the blame and moved on...)

Anyway, it looks like that'll wind up getting fixed in git upstream, so at least I'll hopefully have saved the next idiot monkey who both procrastinates needlessly and doesn't double check his git pushes. And if you're an idiot monkey like me and have been staring at that same advisory message for months, I'd advise you go right ahead and run git config --global push.default simple right now, and/or be very careful with your git pushes.

Today has been interesting as on the one hand we're in Fedora 21 Final rush mode but on the other hand all the Americans are off for Thanksgiving. Thanks to some heroic work by Kamil Paral and Vratislav Podzimek I was able to file the 21 Final RC1 compose request, which is great news for the chances of getting F21 out on time - it's quite likely we'll wind up needing more RCs, but for a while I was worried we wouldn't even make RC1 until Monday, and with Go/No-Go next Thursday, that would've been yet another week of crazy overnight test runs and lack of sleep. With RC1 composed this afternoon we have tomorrow and the weekend to do really exhaustive testing on it, and if there do turn out to be any more bits to clean up, we'll have much fewer moving parts to keep track of and a much more solid test base to work off next week.

So aside from getting that done, I spent most of the day digging into a bug that was niggling at me. It had come up in blocker review and been rejected because at the time it looked like it was related to installing on a Mac with both the internal storage and an SD card as target disks, which is a pretty obscure corner case, but I wasn't entirely sure that was the whole story.

After an hour or two of running scenarios and several more hours laboriously picking through anaconda code, I'm pretty sure I've figured out what's going on and provided at least a candidate fix - I think the bug in fact affects any install with a 'platform' that uses a partition as the bootloader stage1 target (mainly UEFI, but that also covers one ARM variant, and possibly a few other things like BIOS + GPT and PPC), when you install to multiple disks and want to put the bootloader stage1 partition on a disk other than the first. I'm pretty sure that just doesn't work, and I know more or less why.

That might sound like, you know, kind of a bad bug. You might be sitting there wondering how something that broken happened, and why we hadn't really spotted it until now. But you know what...this is an ideal illustration of the thing we keep banging on about, that writing OS installers is really goddamn complicated. See where I mentioned 'platforms' up there? anaconda supports installing to 10 different 'platforms' - platforms are things like x86 BIOS, x86 UEFI, Mac UEFI, ARM, various ARM variants, PPC and its variants, S390 - all of which require drastically different bootloader configurations. It has (depending on how exactly you count) like 4-6 different interfaces / interface 'workflows' in which partitioning could be defined - kickstart with explicit partition layout. kickstart autopart, text UI, GUI guided, GUI custom+'create partitions for me', GUI custom. The intersection of partitioning and bootloader code has to somehow manage to create valid configurations when using 'guided' or 'automatic' partition creation, and correctly validate the partition layout plus bootloader configuration for all platforms on all workflows, both accepting valid configurations and rejecting invalid ones. I spent like four hours just picking through anaconda and blivet code to even understand the flow from partitioning to bootloader configuration validation before I could touch a line of code, and I'm sure I don't completely grok all the subtleties of it even now. Then I had to try and figure out where exactly in the 15+ source files I have lying around in gedit was the right place to fix it, and try to think through whether the fix not only fixed the case I was trying to fix but didn't break any other cases (with that combinatorial explosion of possible paths I just mentioned to think about). Then I had to test it with a half dozen installs just to feel comfortable in at least submitting it to Bugzilla. And even then I'm only like 70% sure I more or less got things right and my patch is a decent first starting point. I don't know how anyone stands working on anaconda for a living, I'd go insane(r).

And the thing is, you can't even say 'well that's obviously too complex, no-one can deal with that mess, let's simplify it' because, well, Fedora/RH really wants to work on all those platforms and offer all those partitioning workflows. If we don't, people scream. Hell, there are still people complaining about us not letting you install grub to a partition header any more - that was one small thing the devs did to simplify this whole mess (a platform for which the stage1 target can be either a disk or a partition just makes the whole thing even more of a damn nightmare), and just that small change resulted in like weeks of contentious bug reports and ML threads. So we're basically stuck with it. The code is a complex mess, but it's not unnecessarily or egregiously so, it's inevitably so - it's trying to do so much stuff it really can't be otherwise....