Doings (including a not-too-short and possibly wildly inaccurate history of PHP class loading, for some reason), and OpenID is back (for now)

Doings

Edit: As you can probably see, I also decided to switch up my blog theme, finally. Wanted something minimalist and single-column, and this looks decent. I also put a bit of effort into getting half-decent rendering of preformatted/code blocks, and installed an extension to let me write blog entries in Markdown, yay. Sorry for any weirdness you noticed while I was hurriedly rewriting this post and editing the theme.

So I took a look at the bug that seemed to be breaking the OpenID Wordpress plugin for me - php-openid bug here, Wordpress plugin bug here - and found what looks like a simple fix for it, and it seems to work for me, so I've turned OpenID back on. You should be able to post comments without recaptcha by logging in with a valid OpenID now. (Also, I can use https://www.happyassassin.net/ as my OpenID again...)

I've been posting miniblogs to G+ lately, which is bad of me, but it's hard to resist with the F/OSS community there. You can find me on G+ here (separate browser profile recommended when using evil social networks!)

On that topic - I wrote a G+ entry on running things like Facebook, G+ and other sites that do privacy-invading things in a separate profile, conveniently.

I spent about half of the Red Hat holiday shutdown running around fighting fires in Fedora 20 and Rawhide. Mostly I made things better, I think. Well, I broke Rhythmbox, but then I made it better again. I also fixed OwnCloud, fixed some GNOME crashing, fixed gtkpod crashing on start, and fixed the problem with disabling SELinux via the config file not working any more.

The second half of the holiday shutdown I spent, for some completely inexplicable reason, working on OwnCloud - I started out by building OC 6 for Fedora 20 so I could deploy it on my own OC instance and test it, but things sort of branched out crazily from there...

OC is a PHP web app. If you've never had cause to look into that sausage factory - you lucky, lucky person - PHPland is one of those development communities where it's become standard practice to bundle libraries you depend on. In fact, PHP is possibly worse than Java in this regard: you should count yourself lucky if they even bother bundling the entire library, or indeed telling you where the code came from at all. The standard approach of a PHP developer who needs a bit of code to do something and doesn't want to write it appears to be to Google around a bit, find a random PHP file containing a function that does what they want, dump a copy of it into their source tree, and start using it.

I exaggerate only slightly. I think if you show up at a PHP conference and start talking about things like library naming conventions and interface stability and conventions concerning code re-use they'd look at you like you'd sprouted an extra head and run you out of town on a rail.

So, anyway, the major challenge in packaging PHP web apps (and PHP stuff in general) is unbundling. Fedora, being a fairly traditional Linux distribution, is staffed by people who very keenly understand the importance of standardized use of shared resources, and has lots of strict and very sensible policies about same. Mapping PHP development practices onto the policies of Fedora (and most other distributions) is...a fun exercise.

Fedora packages are usually not allowed to bundle shared libraries. If the thing you're packaging wants to use an external library, the policy is that that external library should be packaged as an independent Fedora package, and the dependent package should use the packaged copy of its dependency. Bundling is endemic in PHP packages, and often done very poorly (see above). Usually 90% of the work of packaging anything substantial written in PHP is unpicking the external dependencies.

So it is with OwnCloud. Its use of external libraries is as chaotic as most PHP projects, though to OC's credit, at least its developers recognize that this is a problem and are responsive to submissions to improve the situation. If you take a look at the OwnCloud 6.0.0a tarball, you'll find directories named '3rdparty' tucked away here and there:

./apps/documents/css/3rdparty
./apps/documents/js/3rdparty
./apps/search_lucene/3rdparty
./apps/files_pdfviewer/3rdparty
./apps/bookmarks/3rdparty
./apps/files_external/3rdparty
./apps/files_encryption/3rdparty
./apps/updater/js/3rdparty
./apps/calendar/3rdparty
./3rdparty

These contain OC's external dependencies (some PHP, some Javascript, some CSS, and some stuff like fonts). There are...a lot of them. We care more about unbundling the PHP than the Javascript or the CSS[1], but that still leaves a lot. As with most PHP projects, there is some fairly gross insanity in there, often related to nested bundling: PHP projects often bundle things which bundle other things...sometimes the same thing as is bundled by some other thing.

OwnCloud 6.0.0a, for instance, contains two copies of dompdf and one of tcpdf. It contains two copies of Doctrine's cache component - one in 3rdparty/doctrine/common/lib/Doctrine/Common/Cache, one in apps/files_external/3rdparty/aws-sdk-php/Doctrine/Common/Cache (which as you can see is bundled by aws-sdk-php, which in turn is bundled by an OwnCloud app) - which are nearly, but not quite, the same. It contains version 2.2 of Symfony's routing component in 3rdparty/symfony/routing, version 2.3 of Symfony's console component in 3rdparty/symfony/console, and God-knows-what version of Symfony's classloader and event dispatcher in apps/files_external/3rdparty/aws-sdk-php/Symfony (again, nested via aws-sdk-php). It contains an entire pure PHP Unicode framework - which it uses unconditionally (despite the fact that standard PHP extensions like intl are likely to be available and perfectly capable of doing the job in the majority of cases) to set a UTF-8 locale and do Unicode string normalization. And there's probably more I haven't found yet.

The main OwnCloud package maintainer, Gregor Tätzner, has done yeoman's work on unpicking all this stuff over the years, and with 5.x it was pretty close to being fully unbundled, but there's some new stuff to deal with in 6.x, and some things that still need working on. So I spent some time helping out with that.

My instinctive response to a lot of PHP bundling is "oh god can we kill that with fire?", so rather than just methodically working through bundled stuff and trying to package it for Fedora (the common unbundling approach), I tend to wind up trying to figure out why the code is bundled in the first place. This tends to lead me to poking at various bits of the upstream code, and from there with OwnCloud I wound up sidetracked into submitting various improvements to upstream.

Most of these were fairly trivial - this one looks like a big pull request, for instance, but actually it just tweaks the way several OC apps load external libraries so that they can be more easily unbundled without patching by downstream distributions[2].

But there was one where I kind of surprised myself. Here's where it all started: PHP is a class-based language. If you want to use something (a method, usually) that's in a class that's not actually a part of the file your code is in, you're going to need to load the file that contains that class somehow.

This, in PHPland, is...a topic with a history. Disclaimer: the following is a likely-deeply-flawed explanation of class loading in general and in PHP in particular which is the result of my four-day self-taught crash course in PHP class loading. It incorporates a not-so-brief history of quite a lot of PHP development. If you know much about class loading and/or the history of PHP, you can happily either skim it, skip it, or (preferably) read it and explain to me what I got wrong. If you're as dumb as me, feel free to read the following, feel like you learned something, and go break someone's system. You're welcome!

Class loading background

The 'primitive' way of doing class loading is just to do it manually. In PHP, you can include some other source file with the include, include_once, require and require_once statements. So, the dumb way to do class loading is just to maintain your hierarchy of includes by hand. You still see this in many projects; it's not at all unusual.

It does kind of suck, though, and is often the source of silly bugs and unnecessary handiwork. So over time, there have been various attempts to do something better.

Classes are usually used, whether as the result of conscious planning or just organic development, to define some sort of namespace.

In layman's terms...so, imagine you've got ten pieces of code (could be files within a project, could be different projects, doesn't really matter). They all need a function that strips input in some way or another. Maybe they all call it strip_text().

If you want these pieces to share code in any way, you've now got a bit of a problem, because it's almost certainly the case that not all of those strip_text() functions actually do the same thing. But strip_text is a perfectly decent name for all of them!

When we talk about namespacing we're really just talking about this kind of problem. Say one of the pieces is called foo, and one is called bar. We need a way to say 'this is foo's strip_text() function, and this is bar's strip_text() function'. And that's the Idiot Monkey Guide To Namespacing.

Like anything else people do, there are inevitably five thousand different ways you can do this, with passionate arguments about which is the right one and why all the others are awful and probably cause cancer.

PHP: The Early Years, and PEAR

PHP, per se, doesn't enforce one. For some time, it didn't even endorse one. You put your code in classes, and you named the classes. How you named the classes, and how they related to the layout of your source tree, was your own business. So, of course, everyone named them differently, and laid out their source trees differently.

This is kind of a problem, especially with reusable components. If you're writing code and you want to use several external libraries, it's kind of a pain in the ass to keep track of how each one names its classes and lays out its source tree to make sure you include the right files and don't ever wind up with namespace collisions or anything like that. So long as there is no standard or widely-accepted convention or, generally, coherence at all in naming classes or laying out source trees, it's also almost impossible to come up with anything cleverer than manual inclusion of required files.

Still a long way back, but after PHP had been around for a while, PEAR happened. PEAR was one of the first attempts to impose some kind of order on the chaos of code re-use in PHPland, and was really the only game in town for a long time. PEAR provided a repository, hosting system, and a set of conventions for PHP libraries. It's the conventions we're interested in here, because (AFAIK) PEAR provided the first widely-observed standard or convention for PHP namespacing.

As PHP still didn't have any actual namespacing features at the time PEAR came about, it defined a standard for namespacing via class names. Basically it said classes should be named in the style Project_Name_Subclass, and stated that "The PEAR class hierarchy is also reflected in the class name", without AFAICS quite defining what the "PEAR class hierarchy" is (in practice I believe this was referring to PEAR's standard top-level class prefixes like Crypt_, Auth_ and Net_).

In PEAR, it came to be standard practice - and was later made a policy - to reflect the class hierarchy in the layout of the source tree as delivered by PEAR. So putting together PEAR's package naming, class naming and source layout conventions, you get the quite nice structure you see in /usr/share/pear on a Fedora system. For instance, /usr/share/pear/MDB2.php is part of the "MDB2" package and defines the class MDB2. /usr/share/pear/MDB2/Extended.php is also part of the "MDB2" package and defines the class MDB2_Extended. /usr/share/pear/Net/Curl.php is part of the "Net_Curl" package and defines the class Net_Curl. And so on.
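Put together, the PEAR conventions boil down to a simple mechanical rule: split the class name on underscores and treat each piece as a directory level. Here's a rough sketch of that rule, in Python rather than PHP and purely illustrative: the function name pear_class_to_path is made up, and /usr/share/pear is assumed as the root only because that's where Fedora puts it.

```python
# Hypothetical model of the PEAR naming/layout convention: the class name
# doubles as a namespace, "_" separates levels, and each level becomes a
# directory under the PEAR root.
import posixpath

PEAR_ROOT = "/usr/share/pear"  # assumption: Fedora's PEAR directory


def pear_class_to_path(class_name: str, root: str = PEAR_ROOT) -> str:
    """Map a PEAR-style class name like 'Net_Curl' to the file defining it."""
    parts = class_name.split("_")
    return posixpath.join(root, *parts) + ".php"


for cls in ("MDB2", "MDB2_Extended", "Net_Curl"):
    print(cls, "->", pear_class_to_path(cls))
# MDB2 -> /usr/share/pear/MDB2.php
# MDB2_Extended -> /usr/share/pear/MDB2/Extended.php
# Net_Curl -> /usr/share/pear/Net/Curl.php
```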

The Chaotic Interregnum

Ah, but that's just the first part of our story! Over time, a few things happened. More and more PHP development started happening outside PEAR, for a few reasons, which can be summarized as it erring too far on the side of maintaining central control over things (making it hard to get stuff added or make major changes), not keeping up with the times (they moved to SVN in 2010...they're still moving to git) and not sufficiently accounting for the trend towards dependency bundling (in which regard I'm 100% on their damn side).

This post illustrates what, I think, the 'typical PHP developer' saw as 'the problems with PEAR' (though right around the line "The nail in the coffin for me with PEAR is one of the biggest bug bears of the Gem system: system-wide installation" I develop an irresistible urge to stab the author in the head with a rusty fork).

Also, PHP itself got more mature. Very significantly, PHP 5 introduced frameworks for namespacing and class autoloading. Naturally, as we're dealing with humans here, the now-official way of doing namespacing in the PHP language was not the same as the one existing standard way of doing namespacing in the most widely-used external repository of reusable PHP code.

Since PHP 5.3, you can define namespaces (which can contain classes, interfaces, functions and constants) in PHP by using the namespace keyword. A backslash - \ - is used as a hierarchy separator.

So now we had a sort of ersatz namespacing convention originating in PEAR, based around class names and using underscores as a separator, and a formal namespacing feature of PHP itself, encapsulating class names and using backslashes as a separator.

This was dealt with in PHPland in just about precisely as careful, considered and organized a fashion as you would expect.

PHP 5 also introduced a framework for doing automatic class loading, which is where this post has been going - slowly, oh so very slowly - for a long time.

If you remember a few thousand words back, we noted that the manual approach to including the appropriate files for the external classes you want to use kind of sucks. Class autoloading is The Solution to this.

Basically, if you have a reliable relationship between class names and the layout of the source files, you don't need to do it manually. If you know that the class Foo_Bar is in the file /some/where/Foo/Bar.php and the class Monkeys_Chickens_Scrabble is in /some/where/Monkeys/Chickens/Scrabble.php...well, there's a pretty obvious possibility to add some convenience.

PHP 5 introduced a standard mechanism for implementing autoloading. If you defined a function called __autoload and then tried to use a class which hadn't been defined by any of the files included so far, instead of just dying with an error like it used to, PHP would call __autoload(classname), then try again. If your autoloader function successfully included the correct file for the class, you won. In PHP 5.1.2, they refined this design to allow multiple autoloaders to be stacked (yes, I know, I'm cringing too) with the spl_autoload_register function.
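To make 'stacking' concrete, here's a toy model of the autoload queue, again in Python rather than PHP. register_autoloader stands in for spl_autoload_register, and both loaders are hypothetical; the point is just that PHP tries each registered autoloader in turn, stops as soon as the class exists, and gives up with an error if none of them manage it.

```python
# Toy Python model of PHP's autoload queue. "Loading a file" is simulated
# by recording the class in a dict; both loaders below are hypothetical.
from typing import Callable, Dict, List

defined_classes: Dict[str, str] = {}  # class name -> file it "came from"
autoload_queue: List[Callable[[str], None]] = []


def register_autoloader(loader: Callable[[str], None]) -> None:
    """Stand-in for spl_autoload_register(): append a loader to the queue."""
    autoload_queue.append(loader)


def use_class(name: str) -> str:
    """Return where a class came from, autoloading it on first use."""
    if name not in defined_classes:
        for loader in autoload_queue:  # try loaders in registration order
            loader(name)
            if name in defined_classes:  # first loader to succeed wins
                break
        else:
            raise LookupError(f"Class '{name}' not found")
    return defined_classes[name]


def app_loader(name: str) -> None:
    # Hypothetical project convention: MyApp* classes live in lib/.
    if name.startswith("MyApp"):
        defined_classes[name] = f"lib/{name}.php"


def pear_loader(name: str) -> None:
    # Hypothetical PEAR-style loader: Net_Curl -> /usr/share/pear/Net/Curl.php
    if "_" in name:
        defined_classes[name] = "/usr/share/pear/" + name.replace("_", "/") + ".php"


register_autoloader(app_loader)
register_autoloader(pear_loader)
print(use_class("MyAppRouter"))  # lib/MyAppRouter.php
print(use_class("Net_Curl"))     # /usr/share/pear/Net/Curl.php
```

Two projects with incompatible conventions can coexist this way, which is more or less the whole reason for the queue.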

Anyhow, point is, there were now autoloading and namespacing features of PHP itself, and some large, influential and forward-thinking PHP projects started using them. This starts to stretch my search fu - it helps to have been around at the time - but Symfony, for instance, appears to have grown a basic autoloader in late 2006. Zend and Doctrine have had autoloaders for a long time, too. Smaller PHP projects started to grow their own.

Unfortunately, there was, at this time, no standard in the space, and the different implementations were often incompatible with each other. Not all projects used namespacing the same way, and not all projects mapped namespace and class names to file paths in the same way. PEAR would have been the most likely centre for some sort of standardization to develop, but PEAR was already on its way out at this point. Major frameworks were not part of PEAR, not delivered using PEAR's tools, and didn't comply with PEAR's standards. PEAR never even defined a namespacing standard.

Fast forward a few years and you have the typical PHP chaotic mess. This is, I think, why PHP 5.1.2 let you stack autoloaders. Multiple projects were using different conventions for namespacing, class naming and filesystem layout, and implementing their own autoloaders. If you wanted to depend on two projects which didn't use the same conventions, you needed to be able to use both their autoloaders.

OwnCloud, to come all the way back to where we started for a minute, implements its own autoloading for its own classes using different conventions to any other project I've seen, implements autoloading for one or two of its bundled dependencies with conditionals in its own autoloader implementation, and uses the different autoloaders of some of its other bundled dependencies as an entry point to those dependencies (e.g. php-opencloud).

For instance, remember I noted earlier that OwnCloud core includes some bits of Symfony, while one of the official OwnCloud apps bundles aws-sdk-php, which itself bundles other bits of Symfony? Well, OwnCloud uses its own autoloader to autoload the bits of Symfony it bundles directly, but the copy of aws-sdk-php bundled by one of the OC apps uses its bundled copy of Symfony's autoloader both to load the bits of Symfony it bundles and as an autoloader for aws-sdk-php itself (confused yet?).

A New Standard Emerges: PSR-0

Finally someone tried to do something about this kind of mess, and lo, PSR-0 was born!

(Boy, I recall the halcyon days of last Thursday or so when I first laid eyes on the PSR-0 page; I was still piecing together a lot of this background. It's a whole lot easier to understand in context.)

PSR-0 is presented as an autoloading standard, but in effect it's really a combined standard for doing namespacing, class naming and filesystem layout in order to enable simple cross-compatible autoloading. It's defined by a group called PHP-FIG, the PHP Framework Interoperability Group, which is a consortium of major PHP frameworks and libraries, including Symfony, Doctrine, Zend and other big hitters. The main point of PHP-FIG is to define standards for these big frameworks to interoperate with each other, but given its size, anything it does tends to be watched pretty closely by - and sometimes adopted by - PHPland as a whole, it seems.

What PSR-0 really does in practice - as it looks to this monkey from the outside, anyway - is take the approach Symfony was already taking in practice and boil it up with a bit of backwards compatibility with PEAR. What I think they really wanted PSR-0 to say was this:

Define namespaces and put classes in the namespaces. The path to the class monkeys in the namespace Foo\Bar is Foo/Bar/monkeys.php.

But they wanted PSR-0 compliant autoloaders to be backwards compatible with projects that still use a layout based on the PEAR standards (remember I explained those?), so they baked PEAR compatibility into the standard. This adds a fair bit of complexity, mainly because of the different separators, \ and _. It can't simply unconditionally treat _ as a directory separator because there could be underscores in namespaces. If you define the namespace \Foo_Bar\Monkeys_Love_Peanuts and put the class Really inside it, the filesystem path should be /Foo_Bar/Monkeys_Love_Peanuts/Really.php, not /Foo/Bar/Monkeys/Love/Peanuts/Really.php. So without ever entirely clearly explaining why (which made figuring this out fun...) the PSR-0 standard ensures PEAR compatibility by treating an underscore as a directory separator in class names but not in namespaces.

For a given tree of PHP files to be "PSR-0 compliant", its classes must be defined in at least one level of namespace, and it must be laid out on the filesystem such that the path of the file implementing a class follows from the class's namespace and name, with \ taken as a directory separator in the namespace and _ taken as a directory separator in the class name. PEAR-compliant trees aren't technically "PSR-0 compliant" as they have no namespacing, but PSR-0 is inherently backwards-compatible with the PEAR standards: any PSR-0 compliant autoloader will be able to autoload classes from a PEAR-compliant tree.
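Boiled down, the PSR-0 mapping from a fully-qualified class name to a relative file path is short enough to sketch. This is a Python transliteration of the logic as I understand it, not the PHP reference implementation itself; the result is relative to whatever base directory an autoloader is searching, and the example names are the ones from above plus a couple of made-up ones.

```python
# Sketch of the PSR-0 rule: "\" is a directory separator everywhere,
# "_" only within the final class-name segment.
def psr0_relative_path(fq_class: str) -> str:
    # Drop any leading "\" (fully-qualified names may start with one).
    name = fq_class.lstrip("\\")
    # Split at the LAST backslash: namespace on the left, class on the right.
    namespace, sep, class_name = name.rpartition("\\")
    # Namespace separators become directories; underscores there stay put.
    ns_path = (namespace.replace("\\", "/") + "/") if sep else ""
    # Underscores in the class name become directories (PEAR compatibility).
    return ns_path + class_name.replace("_", "/") + ".php"


examples = [
    r"\Foo_Bar\Monkeys_Love_Peanuts\Really",  # underscores in namespace only
    r"\Doctrine\Common\IsolatedClassLoader",   # plain namespaced class
    r"\namespace\package_name\Class_Name",     # underscores in both
    "Net_Curl",                                # PEAR-style, no namespace
]
for cls in examples:
    print(psr0_relative_path(cls))
# Foo_Bar/Monkeys_Love_Peanuts/Really.php
# Doctrine/Common/IsolatedClassLoader.php
# namespace/package_name/Class/Name.php
# Net/Curl.php
```

The last example is the backwards-compatibility trick in action: a name with no namespace at all falls straight through to the PEAR rule.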

Are We There Yet?

So now we have two conventions for class naming and source tree layout that make it practical to write an efficient and not overly complex autoloader implementation, and a standard which encapsulates them both. It took a while, and a lot of background, but we've arrived somewhere sane, haven't we? Oh god, please say we have.

Well, no, sadly, not entirely. No. That'd (still) be too easy. For a start, PSR-0 adoption seems to have been solid but nowhere near universal. Some people really don't like it. That was kind of a random Google hit, but in practice, there's still a lot of code out there in the wild that isn't either PEAR or PSR-0 compliant. So, that sucks.

But less randomly and more structurally, there's one more big upheaval in PHPland which affects this whole shebang, and we haven't covered it yet. If he's still reading at this point, my good friend Remi Collet is about to reach for his revolver, because it's called Composer (which comes in a package deal with its sidekick, Packagist).

Is It A Bird? Is It A Plane? No, It's Composer!

Taken together, Packagist and Composer are basically an attempt to replace PEAR, while being much, much more friendly to library bundling. In fact, Remi (while waving his revolver) describes them as a bundling machine.

Composer is more or less explicitly a framework you stick into your PHP project to handle your external dependencies. It is designed such that you put Composer in your project, tell it what versions of what external dependencies you want to use, and Composer pulls copies of those into your source tree (from Packagist, which is its actual repository) and provides you with a bunch of convenience functions for updating them and autoloading them.

The metadata for a Composer-delivered project includes a spot where you can define how your project should be autoloaded - you tell it whether you're PSR-0/PEAR compliant, PSR-4 compliant (I'm coming to that, don't worry), or use some other wacky scheme, and Composer takes care of the rest. If you're writing a PHP project and you consistently use Composer to handle your external dependencies, you can pretty much just go ahead and invoke whatever classes you want from those deps and rely on Composer to handle the details for you.

If you're writing a PHP library or framework or whatever, you define some metadata in a json file and bung it into Packagist, and you're now delivered by Composer. It's all set up to be very low-bar-to-entry stuff.

Composer is a pretty good fit for today's favourite buzzwords - it's very decentralized and works well with a git-style workflow, for instance. My impression is that it's rather taken off like wildfire. Composer has become the primary upstream distribution channel for quite a lot of PHP libraries and frameworks (sometimes to the point where if you download a tarball, you just get a frozen copy of a Composer distribution of that project).

This comes with benefits and drawbacks, from a distribution point of view. On the one hand, PEAR was a lot more distribution-friendly, with its implicit assumption that the things distributed by PEAR would be installed system-wide and shared, not copied into each dependent project separately.

Composer - and the widespread adoption of Composer - effectively establishes library bundling as The Way PHP Does Things. If you think library bundling is a terrible long-term idea, like most people involved in Linux distributions tend to, this is obviously not good. And if you're a poor distro packager trying to reconcile PHP's 'let's throw sixteen copies of every library in there, so long as the code runs!' approach with your distro's probably strict policies about bundled code, Composer sure doesn't look like it's making anything easier.

On the other hand, you can look at the chicken and egg as being the other way around. If you take the view that PHPland has just about always had a tendency to bundle libraries, and it really took hold in the Chaotic Interregnum - a view which I think has a lot of truth to it - you could say that Composer isn't really the villain of the piece. This is closer to my view. This 'glass half full' view is that, while it enshrines bundling as accepted practice, Composer at least brings some order to it.

As noted with several examples from OwnCloud above, PHP projects are really bad at bundling things. They're not just addicted to doing it, they tend to do it in a really chaotic way. They throw random bits of other projects into their source trees in random places, often without ever indicating in any reliable way that they've included bits of other projects at all (a ritual of packaging some new PHP thing is to poke through the entire source tree finding the bits of code that have been ripped off from other places). They'll happily strip down external dependencies massively, and fiddle with their filesystem layouts. They'll happily patch their external dependencies, often without any easily locatable explanation of how they've patched them or even that they've patched them at all. It's a horrible, untenable mess.

Big projects can be even worse than small projects in this - some big projects are relatively organized, and arrange and document their external dependencies carefully, but some have accumulated whole piles of external dependencies in different locations in their source trees, each being laid out, documented (if at all) and loaded differently from the others.

If everyone adopts Composer, at least it should bring some order to this chaos. Distributions will still have lots of unbundling messes to unpick, but looking at the Composer metadata of any Composer-using project should at least give us an immediate indication of what external dependencies it actually has, what versions it's using, where it's keeping them, and so on. This makes unbundling somewhat easier, not harder.

Like PEAR, Composer also gives us a widely-observed standard that we can use to do efficiency-improving abstraction at the distribution level. Fedora and other distributions have a whole little infrastructure for packaging PEAR projects efficiently and consistently; packaging something delivered via PEAR is heavily automated. If Composer keeps up its current momentum, we'll get value out of doing the same thing with it - setting up a packaging infrastructure which lets you very easily generate a package for a Composer-delivered component. My other good friend Shawn Iwinski has done some experimental work down this avenue, and it looks pretty promising.

Widespread Composer adoption does somewhat lessen the chances of PHPland ever seeing the light and getting a lot less addicted to bundled dependencies, but frankly, that looked like a pretty long shot before Composer showed up.

PSR-4, Bringing It All Back Home

So, Composer is more or less directly responsible for the final wrinkle (so far...) in PHP class loading: PSR-4. PSR-4 - which was very recently adopted by PHP-FIG, and even more recently merged into Composer - is the second autoloader standard among PHP-FIG's five accepted standards so far, meaning they're batting a solid .400 in the 'produce nothing but autoloader specifications' stakes.

PSR-4 is explicitly designed to solve a fairly superficial problem that shows up when you mix a little PSR-0 and a little Composer. Dependency management frameworks like Composer, you see, have to do namespacing too. It would hardly be practical for Composer to stick every library it downloads into a single big directory. So, Composer came up with a perfectly 'common sense' form of namespacing for its problem space. If you're a package being delivered by Composer, you have a vendor name and a package name, and you're installed in the sub-directory vendorname/packagename/.

Unfortunately, when a Composer-delivered project is PSR-0 compliant, because packages that implement PSR-0 tend to use something very similar to "vendor name" and "package name" in their namespace layouts, you wind up with this kind of thing: my-awesome-php-project/vendor/symfony/routing/Symfony/Component/Routing/Router.php - which would be the file containing the class Router, in the Symfony\Component\Routing namespace. vendor/ is the top-level directory for all Composer-delivered components, the Composer vendor name is 'symfony' and the Composer package name is 'routing'. This is perfectly PSR-0 compliant from the base level my-awesome-php-project/vendor/symfony/routing (Composer's autoloader keeps track of where to start looking for which namespaces; PHP autoloaders tend to implement a 'prefix' or 'base directory' concept for this), but looks pretty absurd.

PSR-4 aims to address this by introducing the concept of mapping subsets of namespaces to arbitrary 'base directories'. To take the example above, with PSR-4, Symfony could declare that the 'base directory' for the 'namespace prefix' \Symfony\Component\Routing is /symfony/routing/, and then set itself up such that when delivered by Composer, it is laid out as my-awesome-php-project/vendor/symfony/routing/Router.php.

Bits of the namespace beyond the 'prefix' are treated as under PSR-0, with \ as a directory separator, so the class \Symfony\Component\Routing\Matcher\UrlMatcher would be found at my-awesome-php-project/vendor/symfony/routing/Matcher/UrlMatcher.php.
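Here's a sketch of the difference using the Symfony routing example, again in Python and again hedged: psr4_path and psr0_relative_path are illustrative helpers of my own, the paths are the hypothetical my-awesome-php-project layout from above, and nothing touches a real filesystem.

```python
# Sketch of PSR-4's prefix -> base-directory mapping next to plain PSR-0.
from typing import Optional


def psr0_relative_path(fq_class: str) -> str:
    namespace, sep, cls = fq_class.lstrip("\\").rpartition("\\")
    ns_path = (namespace.replace("\\", "/") + "/") if sep else ""
    return ns_path + cls.replace("_", "/") + ".php"


def psr4_path(fq_class: str, prefix: str, base_dir: str) -> Optional[str]:
    name = fq_class.lstrip("\\")
    prefix = prefix.strip("\\") + "\\"  # match whole namespace segments only
    if not name.startswith(prefix):
        return None  # not this prefix's namespace
    relative = name[len(prefix):].replace("\\", "/")  # "_" is NOT special
    return f"{base_dir.rstrip('/')}/{relative}.php"


base = "my-awesome-php-project/vendor/symfony/routing"
prefix = r"Symfony\Component\Routing"
print(base + "/" + psr0_relative_path(r"\Symfony\Component\Routing\Router"))
# my-awesome-php-project/vendor/symfony/routing/Symfony/Component/Routing/Router.php
print(psr4_path(r"\Symfony\Component\Routing\Router", prefix, base))
# my-awesome-php-project/vendor/symfony/routing/Router.php
print(psr4_path(r"\Symfony\Component\Routing\Matcher\UrlMatcher", prefix, base))
# my-awesome-php-project/vendor/symfony/routing/Matcher/UrlMatcher.php
```

Appending the separator to the prefix before matching stops a hypothetical \Symfony\Component\RoutingExtra namespace from being caught by the Routing prefix.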

It's arguable whether this 'problem' is significant enough to need a new alternative autoloader standard to solve it, but PHP-FIG decided it was (PSR-4 also drops the PEAR-compatibility stuff, so it doesn't have the added complexity of underscores to deal with, and this also solves a class of cases where PSR-0 compliant classes could be validly mapped to more than one directory path), and now we have one.

So now both PSR-0 and PSR-4 are active PHP-FIG standards and you can choose to lay out your project according to either (or, of course, neither). Composer, as mentioned above, now implements both PSR-4 and PSR-0 and you can mark a Composer project as being compliant with either. Composer's autoloader is a large (and, to be honest, quite impressively written) beast, which I've found invaluable as a reference in figuring all this stuff out. It implements classmap-based, PSR-4, PSR-0 and PEAR autoloading, falling back in that order.

Classmap-based autoloading, which I didn't bother mentioning above, basically involves generating a static map of which files contain which classes and feeding that to the autoloader; some projects use this instead of implementing PSR-0 or PSR-4 or their own deterministic layout.
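A rough sketch of that kind of fallback chain follows. It's a Python model loosely based on the order described above, not Composer's actual ClassLoader code; the classmap entry, the prefixes and the pretend filesystem (a set of paths) are all hypothetical.

```python
# Hypothetical lookup chain: classmap first, then PSR-4 prefixes, then
# PSR-0 prefixes (which also cover PEAR-style names).
from typing import Dict, Optional, Set


def psr0_relative_path(name: str) -> str:
    namespace, sep, cls = name.lstrip("\\").rpartition("\\")
    ns_path = (namespace.replace("\\", "/") + "/") if sep else ""
    return ns_path + cls.replace("_", "/") + ".php"


def find_file(fq_class: str, classmap: Dict[str, str], psr4: Dict[str, str],
              psr0: Dict[str, str], files: Set[str]) -> Optional[str]:
    name = fq_class.lstrip("\\")
    # 1. Classmap: a precomputed class -> file table, no guessing needed.
    if name in classmap:
        return classmap[name]
    # 2. PSR-4: strip the namespace prefix, map the rest onto the base dir.
    for prefix, base in psr4.items():
        if name.startswith(prefix):
            candidate = base + "/" + name[len(prefix):].replace("\\", "/") + ".php"
            if candidate in files:
                return candidate
    # 3. PSR-0: keep the full name, relative to the prefix's base dir.
    for prefix, base in psr0.items():
        if name.startswith(prefix):
            candidate = base + "/" + psr0_relative_path(name)
            if candidate in files:
                return candidate
    return None  # a real chain might try the include path next


classmap = {"LegacyThing": "lib/legacy_thing.php"}
psr4 = {"Symfony\\Component\\Routing\\": "vendor/symfony/routing"}
psr0 = {"Net_": "vendor/pear/net"}
files = {"vendor/symfony/routing/Router.php", "vendor/pear/net/Net/Curl.php"}

for cls in ("LegacyThing", r"\Symfony\Component\Routing\Router", "Net_Curl", "Nope"):
    print(cls, "->", find_file(cls, classmap, psr4, psr0, files))
```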

And....whew. That was Adam's Not-Very-Potted History Of PHP Class Loading. Finally, how does this tie back to my 'vacation' time? Well, I got sucked into this vortex by trying to clean up how OwnCloud loads its external dependencies, as the examples given above may have suggested. OwnCloud currently loads its external dependencies...very messily, and in ways which aren't at all conducive to downstream unbundling. Eventually, in trying to get a handle on how it currently works and how (I think) it ought to work, it turned out to be necessary to figure out absolutely all of the above. After several false starts and bad ideas, what I was able to do with all that learnin' so far is two things:

  1. The above-linked pull request to clean up how several things that load external dependencies manually do so - what I actually did was have them, wherever possible, require_once the PSR-0 compliant relative class path, after adding the appropriate top-level directory to the PHP include path. This means that if a distribution packages the external dependency such that it's PSR-0 compliant with regard to the distribution's PHP include path - as both Fedora and Debian aim to do with PSR-0 compliant PHP packages - and then drops OwnCloud's bundled copy of the dependency, the require_once statement will find the system copy, and the distribution does not need to patch the require_once statement to give the right location.
  2. This pull request, which (in my opinion, anyway...) improves OwnCloud's own autoloader quite a bit. It separates the handling of OwnCloud classes and external classes more clearly, and makes the autoloader path for external classes PSR-0 compliant and very similar to the PSR-0 reference implementation. It has a prefixing mechanism (so OwnCloud can declare that a given set of classes is PSR-0 compliant with a root of 3rdparty/ or 3rdparty/symfony/routing or whatever) and, if the prefixed search doesn't find the class, falls back to simply looking for the PSR-0 class path relative to the include path. This, again, allows for transparent unbundling, so long as the requisite dependency is present in a PSR-0 compliant location relative to the system PHP include path. In other words, Fedora can drop OwnCloud's bundled copy of, say, Symfony, and trust that OwnCloud will successfully autoload the needed classes from the system copy in /usr/share/php/Symfony, which is laid out in a PSR-0 compliant fashion (with /usr/share/php on Fedora's PHP include path).
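The first change boils down to a pattern along these lines. This is a sketch, not the pull request's actual code: the helper name addBundledRoot and the Vendor/Lib/Thing.php path are hypothetical, and appending (rather than prepending) the bundled directory is a choice made for this sketch, so that a system copy earlier on the include path wins if both are present.

```php
<?php
// Sketch only: the helper name and paths are hypothetical, not OwnCloud's code.

// Add a bundled top-level directory (e.g. .../3rdparty) to the include path.
// It is appended, so directories already on the path, such as a distribution's
// /usr/share/php, are searched first and a system copy takes precedence.
function addBundledRoot(string $bundledRoot): void
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($bundledRoot, $paths, true)) {
        set_include_path(get_include_path() . PATH_SEPARATOR . $bundledRoot);
    }
}

// Example use in a file that needs a (hypothetical) bundled Vendor\Lib\Thing:
//
//   addBundledRoot(__DIR__ . '/3rdparty');
//   require_once 'Vendor/Lib/Thing.php'; // PSR-0 relative path, no hard-coded location
//
// A bare relative path like this is looked up along include_path, so the same
// statement works whether the file ships in 3rdparty/ or comes from a
// distribution package installed somewhere on the include path.
```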
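The second change, a prefix map plus an include_path fallback, might look roughly like the following minimal sketch. It's illustrative rather than OwnCloud's real autoloader: Psr0PrefixLoader, addPrefix and psr0Path are hypothetical names, while stream_resolve_include_path is the standard PHP function used here for the fallback search.

```php
<?php
// Hypothetical sketch of a prefixed PSR-0 loader with an include_path
// fallback, in the spirit of the description above. Not OwnCloud's code.
final class Psr0PrefixLoader
{
    /** @var array<string, string> class-name prefix => PSR-0 root directory */
    private $prefixes = [];

    public function addPrefix(string $prefix, string $root): void
    {
        $this->prefixes[$prefix] = rtrim($root, '/');
    }

    // PSR-0: namespace separators, and underscores in the class name, map to '/'.
    public static function psr0Path(string $class): string
    {
        $class = ltrim($class, '\\');
        $pos = strrpos($class, '\\');
        $namespace = $pos === false ? '' : substr($class, 0, $pos + 1);
        $name = $pos === false ? $class : substr($class, $pos + 1);
        return str_replace('\\', '/', $namespace) . str_replace('_', '/', $name) . '.php';
    }

    public function findFile(string $class): ?string
    {
        $relative = self::psr0Path($class);
        $class = ltrim($class, '\\');

        // 1. Declared prefixes: look under the root registered for that prefix.
        foreach ($this->prefixes as $prefix => $root) {
            if (strpos($class, $prefix) === 0 && is_file($root . '/' . $relative)) {
                return $root . '/' . $relative;
            }
        }

        // 2. Fallback: the same relative path against every include_path entry,
        //    so a system copy is found if the bundled one has been removed.
        $resolved = stream_resolve_include_path($relative);
        return $resolved === false ? null : $resolved;
    }

    public function loadClass(string $class): bool
    {
        $file = $this->findFile($class);
        if ($file === null) {
            return false;
        }
        require_once $file;
        return true;
    }

    public function register(): void
    {
        spl_autoload_register([$this, 'loadClass']);
    }
}
```

On a system with /usr/share/php on the include path, registering something like $loader->addPrefix('Symfony\\Component\\Routing', __DIR__ . '/3rdparty/symfony/routing') would check the bundled copy first and, if it has been dropped, fall through to /usr/share/php/Symfony/Component/Routing/... instead.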

The second patch, in particular, is 45 lines of code that took me (the monkey) a mere couple of hours to write (it would take someone who can actually code about ten minutes), but required about three days of research to be relatively sure I was getting it right. I haven't actually done anything with PSR-4 yet, but I sure needed to make sure I understood why it existed and what it was about. All this has merely confirmed to me that people who deal with this kind of stuff for a living are nuts, and PHP developers are - without exception - evil. But I had fun!


  1. Although under the Web Assets change we are supposed to be unbundling those now, which...yeah, ETA 3014. 

  2. It also adds lists of the libraries bundled in several of the 3rdparty directories with lots of details about what versions they are, where they came from, and how they've been modified, which is something I wish all upstreams would do as a matter of course. 

Comments

kparal.wordpress.com/ wrote on 2014-01-07 11:41:
Thanks for all that work. It's stunning (and very unfortunate) how much work needs to be done just to satisfy Fedora's no-bundle policy. And it reminds me why I think this policy actually hinders Fedora. It's the reason why so many applications are not present in Fedora repositories. I'd rather have a suboptimal version of the application than no application at all. (I'm the author of one Java-based app with lots of bundled deps and I'll never waste time unbundling them, because it's pointless. The project is complete, works well, and I was able to create it in a fraction of the time that would have been spent on dealing with almost-dead upstreams and handling local patches they are not interested in accepting. Sometimes this makes sense, sometimes it doesn't. So, the reasonable approach - bundle libs if it's much easier than not bundling - is a win for me and for my users, just not for Fedora. So, I use OpenSUSE Build Service to provide a generic RPM instead. It seems I'm as evil as PHP developers.) It could be the developer's and user's choice, rather than the distribution's. It is similar to writing unit tests - you can do it and have the benefits ("the right way") or not do it ("the easy way", at least initially). We could have a security rating in our software center related to library bundling, and users could decide whether to use software with a low rating or a different, high-rated alternative. And anyway, for end-user apps (not running as root) that use no networking or act only as network clients, library bundling doesn't raise that many security concerns, I believe. Of course, apps running as root or acting as network servers (like ownCloud) are more susceptible to security issues, and this policy might still make sense there. Sorry for my rant about the no-bundle policy, it seemed related. I'm very glad somebody invests so much energy into keeping ownCloud available in Fedora.
adamw wrote on 2014-01-07 17:09:
Well, the Fedora.next-allied thing about 'rings' is trying to deal with that, AIUI. My simplified take on it is that it's a way to say 'okay, sure, whatever' to ecosystems that are now based around bundling, and say 'Fedora provides the stuff underneath rubygems or Composer or whatever, and from there on, we hand over to your secondary dependency manager and its craaaaaaazy policies'. From the point of view of 'make it less painful for Fedora to deal with the way the world is', that works, sure. I'm still not terribly happy with it, though, because I do think distributions can act as a valuable sanity check on upstreams, and waving through library bundling is inevitably going to mean we stop doing that. That's why I peppered examples of OwnCloud insanity throughout the post: the point is that distribution policies really aren't just some relic of the 1990s; it really is the case that if you take the 'easy' way out and just throw in some bundling any time you need an external function, you start off writing working code really quickly and then a couple of years later realize you've created a completely unsustainable mess. I'm worried that's exactly what's happening to some high-profile projects at present. BTW, one of the Composer links I posted - http://philsturgeon.co.uk/blog/2012/03/packages-the-way-forward-for-php - does nail in passing the fact that disorganized bundling in an ecosystem is a cause of "almost-dead upstreams" which are unresponsive to submissions. If the expectation is that everyone just shows up and takes a copy of your shared code into their project and from then on it's their problem, why would you care too much about patch submissions? You wrote the code, you gave it to the world, you're done here, you can move on.
This is another small advantage of organized bundling - if people providing shared libraries via Composer know that the majority of their consumers are consuming them via Composer, they at least know there's value in updating and improving the 'master copy' of the library. The central point and system for bundling, Composer/Packagist, becomes more of a hub of activity, and with any luck, should be a centre of influence against shared code being just dumped into the world and abandoned to be forked forever.
adamw wrote on 2014-01-07 17:49:
Ah, I knew PHPland wouldn't let me down. Don't worry, they've already started doing truly bizarre things to 'comply' with PSR-0: https://github.com/serbanghita/Mobile-Detect/blob/master/namespaced/Detection/MobileDetect.php So...because you called your class Mobile_Detect and put it in Mobile_Detect.php, not Mobile/Detect.php, the solution is obviously to invent a namespace that makes no sense, put a wrapper class into that namespace and then register your wrapper class with the autoloader. That way people can "simply" autoload the Mobile_Detect class by calling \Detection\MobileDetect! What could be easier? Excuse me while I go and drink heavily.
ktdreyer wrote on 2014-01-09 16:58:
"This ‘glass half full’ view is that, while it enshrines bundling as accepted practice, Composer at least brings some order to it." This is exactly what I thought when I first saw Composer, too. I think you've already said this in so many words, but: my hope is that Composer will at least cause PHP developers to think a little more about the concepts of "upstream/downstream", and it might prompt them to spend the extra five minutes to attempt to get their customizations upstream instead of spending 30 seconds tweaking their own private copy. A bundled library that's a vanilla copy from upstream is an order of magnitude easier to remove for Fedora than a bundled library that's been tweaked. But that's just the optimist in me :) By the way, on the subject of bundled libraries, I heard from one of the Puppet guys that the creator of Puppet once had a t-shirt with a phrase "Just /vendor it" as a joke about the Ruby community's rampant tendency to fork and vendor everything. (I've searched all around the internet for a reference to that, and I wish I could find one confirming.) But I bring it up because now that Bundler has gained a lot of traction in Ruby land, I am seeing a tendency to remove things from the /vendor directories and just use Bundler's Gemfile.lock instead. So it's a small shift from private copies of libs dumped into the tree wholesale to libraries that are admittedly still bundled, but are at least vanilla upstream copies. To me, that's baby steps. By the way, to the commenter who said that the policy hinders Fedora, I'll admit that it does make the work in Fedora *harder*, but I think it actually benefits the larger open-source ecosystem overall in the long run when Linux distros encourage upstream projects to reduce bundling and drive contributions to land upstream.