Web

My Experience Writing a Build System

Lately there’s been some interest in build processes among various people — Vellum was announced a while back, Ben has been looking for a tool and looking at Fabric, and Kevin announced Paver. At the same time zc.buildout is starting to gain some users outside of the Zope world, and I noticed Minitage as an abstraction on top of zc.buildout.

A while ago I started working on a build project for Open Plans called fassembler. I think the result has been fairly successful and maintainable, and I thought I’d share some of my own reflections on that tool.

Update: what we were trying to accomplish

I didn’t make it clear in the post just what we were trying to do, and what this build system would accomplish.

Our site (openplans.org) is made up of several separate servers with an HTML-rewriting proxy on the front end. We have a Zope server running a custom application, Apache running WordPress MU, and some servers running Pylons or other Python web applications for portions of our site. We needed a way to consistently reproduce this entire stack, all the pieces, plugged together so that the site would actually work. Two equally important places where we had to reproduce the stack are for developer rigs and the production site.

Our code is primarily Python and we use a lot of libraries, developed both internally and externally. Setting up the site is primarily a matter of installing the right libraries and configuration and setting up any databases (both a ZODB databases and several MySQL databases). We use a few libraries written in C, but distutils handles the compilation of those pretty transparently.

For this case we really don’t care about build tools that focus on compilation. We don’t care about careful dependency tracking because we are compiling very little software.

make doesn’t make sense

Update 2: If you think the make model makes lots of sense, read the preceding section — it makes sense for a different problem set than what we’re doing.

We initially had a system based on BuildIt, which is kind of like make with Python as the control code. It wasn’t really a good basis for our build tool, and I think it added a lot of confusion, compounded by the fact that we weren’t quite sure what we wanted our build to do. Ultimately I think the make model of building doesn’t make sense.

The make model is based on the idea that you really want to save work. So you detect changes and remake things only as necessary. For compilation this might make sense, because you edit code and recompile a lot and it’s tedious to wait. But we are building a website, and installing software, and none of that style of efficiency matters. make-style detection of work to be done doesn’t even save any time. But it does make the build more fragile (e.g., if you define a dependency incorrectly) and much harder to understand, and you constantly find yourself wiping the build and starting from scratch because you don’t trust the system.

The metaphor for the new build system was much simpler: do a list of things, top to bottom. There’s no effort into detecting changes in the build, or changes in the settings, or anything else.

Do things carefully

In the build system almost all actions go through the filemaker module. This is kind of a file abstraction library. But the goals are entirely different than convenience: the goal is transparency and safety. In contrast Paver uses path.py for convenience, but I’m not sure what the win would be if we used a model like that.

filemaker itself is heavily tied to the framework that it’s written for, specifically user interaction and logging. Most tasks just do things, and rely on filemaker to detect problems and ask the user questions. For example, every time a file is written, it checks if the file exists, and if it has the same content. If it exists with other content, it asks the user about what to do. It doesn’t overwrites files without asking (at least by default). I think this makes the tool more humane as the default behavior for a build is to be careful and transparent. The build author has to go out of their way to make things difficult.

Many zc.buildout recipes will blithely overwrite all sorts of files which always made me very uncomfortable with the product. It’s the recipes in zc.buildout which do this, not the buildout framework itself, but because buildout made overwriting the easy thing to do, and didn’t start with humane conventions or tools, this behavior is the norm.

What I think filemaker most accomplished was the ability to do file operations while also asserting the expected state of the system, and so makes build bugs noticeable earlier instead of getting a build process that finishes successfully but creates a buggy build, or having an exception show up far from where the error was originally introduced.

Also, because it won’t overwrite your work in progress this has saved the build from engendering deep feelings of hatred in cases when it might overwrite your work in progress. It’s hard to detect this absence of hatred, but I know that I’ve felt it with other systems.

Update: a corollary: ignore no errors

One question you might wonder about: why not a shell script? We did prototype some things as shell scripts, but we’ve consistently moved to Python at some point, even things that seemed really trivial. The problem with shell scripts is they have horribly bad behavior with respect to errors. Ignoring errors is really really easy, noticing errors is really hard.

This is absolutely unacceptable for builds. Builds must not ignore errors. The build may mostly work despite an error. It might be totally broken, but the error message is lost in all sorts of useless output. The error message probably makes no sense. The context is lost. No suggestion is given to the user.

When builds work, that’s great. Build do not always work. They always fail sometimes, and some poor sucker (usually in some hot potato-like arrangement) has to figure out what went wrong. You have to plan for these problems.

Everything in the build tries to be careful about errors. All places where it is not, it is a bug. The resolution isn’t to see something appear to work, but create a broken build, and say "oh, you forgot to set X". The resolution is to make sure when you forget to set X it gives you an error that tells you to set X.

This is one of the more important and more often ignored principles of a good build/deployment system. Maybe it’s gotten better, but when I first used zc.buildout (very early in its development) the poor handling of errors was by far the biggest problem and it left me with a bad taste in my mouth. easy_install and setuptools in general is also very flawed in this respect.

Log interesting things

I tried to make a compromise between logging very verbosely, and being too quiet. As a user, I want to see everything interesting and leave out everything boring. Determining interesting and boring can be a bit difficult, but really just require some attention and tweaking.

To make it possible to visually parse the output of the tool I found both indentation and color to be very useful. Indentation is used to represent subtasks, and color to make sections and warnings stand out.

The default verbosity setting is not to be completely quiet. Silence is a Unix convention that just doesn’t work for build tools. Silence gets you interactions like this:

$ build-something target-directory/
(much time passes)
Error: cannot write /home/ianb/builds/20080426/target-directory/products/AuxInput/auxinput/config/configuration.xml

Why did it want to write that file? Why can’t it write that file? Is the build buggy? Did I misconfigure it? Does the directory exist?

The typical way of handling this is either to run the build again with logging setup or otherwise make it more verbose, or to get in the habit of always running it verbose.

Mixing code and configuration

BuildIt, which we were using before, had the ability to put variables in settings, and you could read an option from another section with something like ${section/option}. It was limited to simple (but recursive) variable substitution, and had some clever but very confusing rules that created a kind of inheritance.

I liked the ability to do substitution, but wasn’t happy with the compromise BuildIt made. I wasted a lot of time trying to figure out the context of substitutions. So, I saw two directions. One was to remove the cleverness and just do simple substitution. This is the choice zc.buildout made. The other was to go whole-hog. With a bit of trepidation I decided to to go for it, and I made the choice to treat all configuration settings as Tempita templates. All configuration is generally accessed via config.setting_name, and that lazily interpolates the setting (it took me quite a while to figure out how to avoid infinite loops of substitution). Because evaluation is done lazily settings can depend on each other and be overridden and have lots of code in defaults (e.g., a default that is calculated based on the value of another setting), and it works out okay. Most settings just ended up having a smart default, and as a result very little tweaking of the configuration is necessary.

Somewhat ironically the result was a kind of atrophying of the settings, because no one actually set them, instead we just tweaked the defaults to get it right. Now I’m not entirely sure what exactly the "settings" are setting, or who they should really belong to. To the build? To the tasks? While this is conceptually confusing, in practice it isn’t so bad. This mixing of code and configuration has been distinctly useful, and not nearly as problematic to debug as I worried it would be. In some ways it was a way of building lambda into every string, and the lazy evaluation of those strings has been really important. But it’s not clear if they are really settings.

Would normal string interpolation have been enough (e.g., with string.Template)? I’m pretty sure it wouldn’t have been. The ability to do a little math or use functions that read things from the environment has been very important.

Managing Python libraries

fassembler uses virtualenv for building each piece of the stack. Generally it creates several environments and installs things into them — it doesn’t run inside the environments itself. This works fine.

zc.buildout in comparison does some fancy stuff to scripts where specific eggs are enabled when you run a script. Each script has a list of all the eggs to enable. You can’t install things or manage anything manually, even to test — you always have to go through buildout, and it will regenerate the scripts for you. zc.buildout was implemented at the same time as workingenv (the predecessor to virtualenv), and I actually finished virtualenv with fassembler in mind, so I can’t blame zc.buildout for not using virtualenv. That said, I don’t think the zc.buildout system makes any sense. And it’s really complicated and has to access all sorts of not-really-public parts of easy_install to work.

Isolation is only the start. easy_install makes sure each library’s claimed dependencies are satisfied. You might then think easy_install would do all the work to make the stack work. It is nowhere close to making the stack work. setup.py files can/should contain the bare minimum that is known to be necessary to make a package work. But they can’t predict future incompatibilities, and they can’t predict interactions. And you don’t want all your packages changing versions arbitrarily. If you work with a lot of libraries you need those libraries to be pinned, and only update them when you want to update them, not just because an update has been released.

So for each piece of the stack we have a set of "requirements". This is a flat files that indicates all the packages to install. They can have explicit versions, far more restrictive than anything you should put in setup.py. It also can check out from svn, including pinning to revisions. This installation plan can go in svn, you can do diffs on it, you can branch and copy and do whatever. Maybe at some point we could use it to keep cached copies of the libraries. For now it mostly uses easy_install (and python setup.py develop for checkouts).

In parallel we have a command-line program for just installing packages using files like this, called PoachEggs. I want to make this better, and have fassembler use it, but I mostly note it because it implements a feature that can "freeze" all your packages to a requirements file. You take a working build and freeze its requirements, giving explicit (==) versions for packages, and pin all the svn checkouts to a revision, so that the frozen requirements file will install exactly the packages you know work.

An alternative to this is what the Repoze guys are doing, which is to create a custom index that only includes the versions of libraries that you want. You then tell easy_install to use this instead of PyPI. It works with zc.buildout (and anything that uses easy_install), but I can’t get excited about it compared to a simple text file. I also want svn checkouts instead of create tarballs of the checkout — I like an editable environment, because the build is just as much to support developers as to support deployment.

The structure

A big part of the development of fassembler was nailing down the structure of our site, and moving to use tools like supervisor to manage our processes. A lot of these expectations are built into the builds and fassembler itself. This is part of what makes the build Work — the pieces all conform to a common structure with some basic standards. But this isn’t the build tool itself, it’s just a set of conventions.

I don’t know quite what to make of this. Extracting the conventions from the builds leads to a situation where you can more easily misconfigure things, and the installation process ends up being more documentation-based instead of code-based. We do not want to rely on documentation, because documentation is generally because of a flaw in the build process that needs explaining. It’s faster for everyone if the code is just right. Maybe these conventions could be put into code, separate from the build. The abstraction worries me, though — too much to keep track of?

What we don’t get right

The biggest problem is that fassembler is our own system and no one else uses it. If someone wants to use just a piece of our stack they either have to build it manually or they have to use our system which is meant to build all our pieces together with our conventions. There’s some pressure to use zc.buildout to make pieces more accessible to other Zope users. We’ve also found things that build with zc.buildout that we’d like to use (e.g., setups for varnish).

We haven’t figured out how to separate the code for building our stuff from the build software itself. There’s a bootstrapping problem: you need to get the build code to build a project, and so it can’t be part of the project you are building. zc.buildout uses configuration files (that aren’t code, so they lack the bootstrap problem) and it uses recipes (a kind of plugin) and has gone to quite a bit of effort to bootstrap everything. virtualenv also supports a kind of bootstrap which we use to do the initial setup of the environment, but it doesn’t support code organization in the style of zc.buildout.

Builds are also fairly tedious to write. They aren’t horrible, but they feel much longer than they should be. Part of their length, though, is that over time we put in more code to guard against environment differences or build errors, and more code to detect the environment. But compared to zc.buildout’s configuration files, it doesn’t feel quite as nice, and if it’s not as nice sometimes people are lazy and do ad hoc setups.

The future

We haven’t really decided, but as you might have noticed zc.buildout gets a lot of attention here. There’s quite a few things I don’t like about it, but a lot of these have to do with the recipes available. We don’t have to use the standard zc.buildout egg installation recipe. In fact that would be first on the chopping block, replaced with something much simpler that assumes you are running inside a virtualenv environment, and probably something that uses requirement files.

Also, we could extract filemaker into a library and recipes could use that. Possibly logging could be handled the same way (the logging module just isn’t designed for an interactive experience like a build tool). Then if we used other people’s recipes we might feel grumpy, since they’d use neither filemaker or our logging, but it would still work. And our recipes would be full of awesome. The one thing I don’t think we could do is introduce the template-based configuration. Or, if we did, it would be hard.

That said, there is a very different direction we could go, one inspired more by App Engine. In that model we build files under a directory, and that directory is the build. Wherever you build, you get the same files, period. All paths would be relative. All environmental detection would happen in code at runtime. Things that aren’t "files" exactly would simply be standard scripts. E.g., database setup would not be done by the build, but would be a script put in a standard location.

This second file-based model of building is very much different than the principles behind zc.buildout. zc.buildout requires rebuilding when anything changes, and does so without apology. It requires rebuilding to move the directories, or to move to different machines. Using a file-based model requires a lot of push-back into the products themselves. Applications have to be patched to accept smart relative paths. They have to manage themselves a lot more, detect their environment, handle any conflicts or ambiguities, being graceful about stuff like databases, because the files have to be universal. In an extreme case I could imagine going so far as to only keep a template for a configuration file, and write the real configuration file to a temporary location before starting a server (if the server cannot be patched to accept runtime location information).

So this is the choice ahead. I’m not sure when we’ll make this choice (if ever!) — build systems are dull and somewhat annoying, but they are no more dull and annoying than dealing with a poor build system. Actually, they are definitely less dull than working with a build system that isn’t good enough or powerful enough, or one that simply lacks the TLC necessary to keep builds working. So no choice is a choice too, and maybe a bad choice.

Web
Python
Programming

Comments (15)

Permalink

pdb in the browser

People have asked me a few times about evalexception and pdb — they’d like to be able to use something like pdb through the browser, stepping through code.

The technique I used for tracebacks wouldn’t really work for pdb. For a traceback I saved all the information from the frames — mostly just the local variables — and then let the user interact with that through the browser. But with pdb you pause the application part way through waiting for user input, and the routine only completes much later.

While writing WaitForIt I played around with techniques to deal with very slow WSGI applications. Not that hard, really — you launch every request in a new thread, and you manage those requests in an application of its own. So I started thinking about pdb again, and it started seeming feasible. Whenever the app reads from stdin it goes into an interactive mode, showing you what comes out on stdout and letting you add input to stdin. It’s nothing specific to pdb really.

So, with a bit of hacking, I added it into WebError (which is an extraction of the exception handling in Paste). To give the demo a try, do:

hg clone http://knowledgetap.com/hg/weberror/
cd weberror
python setup.py develop
# You need Paste trunk:
easy_install Paste==dev
python weberror/pdbcapture.py

What you’ll see is not polished, it’s just working, but since I mostly did it to see if I could do it, that’s good enough for me.

Web
Python

Comments (1)

Permalink

WebOb Do-It-Yourself Framework

My old do-it-yourself framework tutorial was getting a bit long in the tooth, so I rewrote it to use WebOb. Now: the new do-it-yourself framework.

Web
Python

Comments (8)

Permalink

App Engine and Pylons

So I promised some more technical discussion of App Engine than my last two posts. Here it is:

Google App Engine uses a somewhat CGI-like model. That is, a script is run, and it uses stdin/stdout/environ to handle the requests. To avoid the overhead of CGI a process can be reused by defining __main__.main(). But while a process can be reused, it might not be, and of course it might get run on an entirely separate server. So in many ways it’s like the CGI model, with a small optimization so that, particularly under load, your requests can run with less latency.

This part is all well and good. I’ve already come to terms with servers going up and down without warning. But the environment itself has a number of other restrictions. It seems that App Engine is providing security in the language itself. The interpreter has been modified so that code is sandboxed, with no ability to write to the disk, open sockets, import C extensions, and see quite a few things in its environment. It’s these things that are a bit harder to come to terms with.

While they claim it supports any Python framework, these restrictions don’t actually make it easy. So for the last few days quite a few of us have been hacking various things to get stuff working.

The first thing people noticed is that Mako and Genshi didn’t work, because they use the ast (via the parser module) to handle the templating, and that module has been restricted. Apparently arbitrary bytecode is not safe in this environment, and so anything that can produce bytecode is considered dangerous. From what I understand Philip Jenvy has been working on Mako and the trunk is currently working. He’d already been doing work to get Mako working on Jython, which had similar issues. Genshi is also in progress and fairly close to working, though with some missing features. Genshi has the harder task as Mako was primarily reading the ast, while Genshi was writing it.

The first thing I noticed is that Setuptools doesn’t work. I’m flattered that one of the only 3 libraries included with App Engine is WebOb, but of course I am more enamored of a rich set of reusable libraries. Setuptools didn’t work because several modules and functions have been removed — this like os.open, os.uname, imp.acquire_lock, etc. Some of these are kind of reasonable, while others are not. The removal of many functions from imp doesn’t really make sense, for instance (I think the motivation was the difficulty of auditing the implementation of those functions, not that the functionality itself is dangerous). And while some functions can’t be used in the environment, the fact you can’t import those functions is more problematic. For instance, The Setuptools’ pkg_resources module has support to unzip eggs when they are imported. App Engine doesn’t support importing from zip files at all, and you certainly can’t unzip to a temporary location. But withoutthe necessary modules and objects pkg_resources won’t even import.

To work around this I started a new project: appengine-monkey, which adds several monkeypatches and replacement dummy modules to the environment to simulate a more typical environment. It’s just a small list so far (mostly in this module), but I expect as people experiment with other libraries the list will increase. For example, I would welcome implementations of things like httplib on top of urlfetch in this library. (Implementing httplib and stubbing out parts of socket would probably make urllib run.)

But the good news is that Pylons is pretty much working on App Engine, as is Setuptools and you can manage your libraries using virtualenv.

The instructions are all located in the appengine-monkey Pylons wiki page. Please leave comments if you have improvements or problems with that process. I also welcome contributors and developers to the project itself — this is a project for expediting App Engine development, it is not a project I care to champion or control. Or support to any large degree.

One ticket which is rather important is the apparent maximum number of files and blobs: 1000. Libraries involve lots of files, and the base Pylons install is only barely under this limit. Now I just wish I could use lxml, but that’s probably going to be a long time coming.

Web
Python
Programming

Comments (9)

Permalink

App Engine: Commodity vs. Proprietary

I like this phrasing of the debate about App Engine’s role, from Doug Cutting: Cloud: commodity or proprietary? (via). (Well, I like it more than the sharecropping phrasing referenced in my last post.) I guess I’m excited about this because like Doug I do want a "cloud" of sorts, and this is a move towards that in a way that makes sense to me. Maybe to state my motivations more clearly: I hate computers. I really hate them a lot. I dream of some world of Platonic ideals where software just exists, and existence that state it works. App Engine feels like a strong move in the direction of computers-not-mattering. What does App Engine run on? I don’t care! Where is the server located? I don’t care! What is BigTable? I am comfortable thinking of it only in its abstract sense, an API that works, and I don’t know how, nor do I need to know how. I don’t need to know these things if they just work. Always. Totally reliably. I’m not shy about digging into code. I tend to be light on my reading of documentation, because I’d much rather open up source code and poke around. But when something can work so reliably that I can treat it as completely opaque then it’s a blessing, because I can start to forget about it and think about bigger goals.

There was a time when people were concerned about Big Endian vs. Little Endian in computers. You had to think about this sort of little detail when programming. People formed actual opinions on which way was best. To think! Similarly XML has removed a large number of fairly pointless format decisions people might make. There is progress. Commodity hosting (reliable, consistent hosting, better than what we have now) feels like similar progress.

Unlike Doug I’m optimistic that App Engine is a move in the direction of a commodity cloud. The APIs seem to lack the stench of proprietary APIs. They are based on Google services, but they reasonably abstract and reasonably minimal. This does not seem like some kind of "play" (and the developers’ seem to be reassuring about their intentions). There’s a tendency to be cynical about any company’s work, that it has underlying intentions that are at odds with any competitor (present or future), that anything good is just a loss leader meant to hook you in so they can squeeze you later. Some companies deserve such cynicism. I don’t know that this company, or this team, deserves that.

Mind you, I don’t say this from a Best-Tool-For-The-Job perspective. I believe in the moral foundations of Free Software, not just the technical advantages of its development process. But I’m a Utilitarian, and it doesn’t make me uncomfortable that not everything is Free if I think it’s a step forward for overall freedom. I think App Engine has the potential to be a very powerful tool for enabling people to create and use web applications. If it was great, but still a dead end, then maybe that wouldn’t be good enough. But I don’t think this is a dead end.

Update: Indeed people are reimplementing the interface: see the appdrop.com announcement

Web
Python
Programming

Comments (9)

Permalink

App Engine and Open Source

This is about Google App Engine which probably everyone has read about already.

I’m quite excited about it. Hosting has been the bane of the Python web world for a long time. This provides a very compelling hosting situation for Python applications.

I’m not as interested in this from a competitive perspective as I am from a simple this-is-awesome perspective. Regardless of how this positions Python relative to other languages, this is something Python needs. But even looking beyond that, I think this is something the open source world needs. Open source web development is in a funny place. There’s a lot of reasons why web programming is a good domain for open source. The barrier to entry for web development is extremely low. Developers have choice in their tools, as browsers don’t really care what software you use so long as you serve up HTML. This leads to experimentation and excitement and the kind of self-direction that is very motivating to developers. It leads to the kind of personal excitement that underlies most open source development.

Despite this, open source web application development doesn’t seem sustainable. There’s some applications, sure. WordPress, Trac, MediaWiki, MoinMoin. But most wiki software doesn’t have a vibrant community. Many a bug tracker has fallen by the wayside. Blog software projects have a horrible time building a viable community. Other website software hardly gets anywhere at all. A lot of the development that might appear to be application development really is more like a framework when you look closely (e.g., Plone, Drupal).

I think deployment concerns are a huge part of this. And, given its better deployment story, it’s no surprise PHP is the basis of most viable open source web applications. Being interested in a project requires that you be able to use the product (and usually use it casually, as that’s the point of entry for many developers). Right now most people can’t use open source web applications.

But people can use hosted applications, and that’s where all the effort has gone in the past few years. I am comfortable saying that Trac is a better issue tracker than Google Code’s issue tracker. But I’d probably recommend Google Code to someone starting a new project, because it’s so much less work. Similarly I’d try to dissuade most people from installing their own blog software. I still don’t know what to tell people about a CMS.

Many people are excited about how far up you might be able to scale something based on App Engine, but (like Dave) I’m excited about how far it could be scaled down. For the majority of sites the free quota will be more than enough. But that alone isn’t the point: there’s lots of free services people can use. The difference here is that the free services can be modified and controlled by the anyone who signs up and installs an application.

From the perspective of open source it’s a bit awkward that the platform itself is proprietary. Questions about sharecropping are a valid concern, but I’m optimistic about the ultimate outcome. The SDK is under a permissive open source license, and the APIs are all reasonable enough that they could be reimplemented with open source backends (maybe without the same scalability, but that’s not the aspect I care about anyway). Perhaps the BigTable APIs will serve as the basis for future storage APIs.

But even if other people make compatible implementations of these APIs, would it matter? If Google offers free hosting, is someone else really going to be able to provide a better hosting option? Or would these other implementations just be strawmen, a way to show that It Could Be Done? If the libraries are just written to prove a point, I can’t see them gaining much traction. But I think these could be viable as there are other constraints to the App Engine environment that people may want to escape at some point in their application development.

As to the details of App Engine? Can you run Pylons or Paste on it? Well, that’s a topic for another post.

Update: I wrote up some more thoughts

Web
Python
Programming

Comments (4)

Permalink

JSON-RPC WebOb Example

I just saw this json-rpc recipe go by as a popular link on del.icio.us. It’s yet-another-*Server based recipe (BaseHTTPServer, XMLRPCServer, etc). I don’t know why people keep writing these. WSGI is in all ways easier, clearer, and more useful.

So I figured I’d give it a go myself, using WebOb. Then I figured it might make a good document, and I annotated it and turned it into a tutorial. It’s also an example of using WebOb as a client library.

I’ve added several tutorials in the past months, which I thought I should also point out. The wiki example is fairly pedestrian, but shows how to do typical application-style development with WebOb. A more interesting exampe is the comment middleware example, which shows how much easier it can be to write middleware using WebOb than traditional WSGI middleware.

I think WebOb’s particular strong points are with middleware (where it makes middleware vastly easier to write) and web services of various kinds (RESTful or not).

Web
Python
Programming

Comments (6)

Permalink

HTML Accessibility

So I gave a presentation at PyCon about HTML, which I ended up turning into an XML-sucks HTML-rocks talk. Well that’s a trivialization, but I have the privilege of trivializing my arguments all I want.

Somewhat to my surprise this got me a heckler (of sorts). I think it came up when I was making my <em> lies and <i> is truth argument. That is, presentation and intention are the same. There are those people who feel they can separate the two, creating semantic markup that represents their intent, but they are so few that the reader can never trust that the distinction is intentional, and so <i> and <em> must be treated as equivalent.

Someone then yelled out something like "what about blind people?" The argument being that screen readers would like to distinguish between the two, as not all things we render as italic would be read with emphasis.

It’s not surprising to me that the first time I’ve gotten an actively negative reaction to a talk it was about accessibility. When having technical discussions it’s hard to get that heated up. Is Python or Ruby better? We can talk shit on the web, where all emotions get mixed up and weirded, but in person these discussions tend to be quite calm and reasonable.

Discussions about accessibility, however, have strong moral undertones. This isn’t just What Tool Is Right For The Job. There is a kind of moral certainty to the argument that we should be making a world that is accessible to all people.

I fear this moral certainty has led people self-righteously down unwise paths. They believe — with of course some justification — that the world must be made right. And so many boil-the-ocean proposals are made, and even become codified by standards, but markup standards are useless unless embodied in actual content, and this is where accessibility falls down.

There are two posts that together have greatly eroded my trust in accessibility advocates, so that I feel like I am left adrift, unwilling to jump through the hoops accessibility advocates put up as I strongly suspect they are pointless.

The first post is about the longdesc attribute, an obscure attribute intended to tell the story of a picture. Where alt is typically used as a placeholder for the image, and a short description, longdesc can point to a document that describes the image in length. Empirically they (Ian Hickson in particular) found that the attribute was almost never used in a useful or correct way, rendering it effectively useless. If the discussion had clearly ended at this point, I would have deducted points for those people use advocated longdesc based on bad judgement, but it would not have effected my trust because anyone can mispredict. But the comments just seemed to reinforce the belief that because it should work, that it would work.

The second post was Ian Hickson’s description of using a popular screen reader (JAWS) — you’ll have to dig into the article some, as it’s embedded in other wandering thoughts. In summary, JAWS is a horrible experience, and as an example it didn’t even understand paragraph breaks (where the reader would be expected to pause). What’s the point of semantic markup for accessibility when the most basic markup that is both presentation and semantic (<p>) is ignored? Ian’s brief summary is that if you want to make your page readable in JAWS you’d do better by paying attention to punctuation (which does get read) than to markup. And if you want to help improve accessibility, blind people need a screen reader that isn’t crap.

Months later we started talking a bit about the accessibility of openplans.org. Everyone wants to do the right thing, no? With my trust eroded, I argued strongly that we should only implement accessibility empirically, not based on "best practices". Well, barring some patterns that seem very logical to me, like putting navigation textually at the bottom of the page, and other stuff that any self-respecting web developer does these days. But except for that, if we want to really consider accessibility we should get a tool and use it. But I don’t really know what that tool should be; JAWS is around $1000, all for what sounds like a piece of crap product. We could buy that, even though of course most web developers couldn’t possibly justify the purchase. But is that really the right choice? I don’t know. If we could detect something in the User-Agent string we could see what our users actually use. But I don’t think there’s information there. And I don’t know what people are using. Optimizing for screen magnifiers is much different that optimizing for screen readers.

Another shortcut for accessibility — a shortcut I also distrust — is that to make a site accessible you make sure it works without Javascript. But don’t many screen readers work directly off browsers? Browsers implement Javascript. Do blind users turn Javascript off? I don’t know. If you use no-Javascript as a hint to make the site more accessible, you might just be wasting your effort.

There’s also some weird perspective problems with accessibility. Blind users will always be a small portion of the population. It’s just unreasonable to expect sighted users to write to this small population. Relying on hidden hints in content to provide accessibility just can’t work. Hidden content will be broken, only visible content can be trusted. Admitting this does not mean giving up. As a sighted reader I do not expect the written and spoken word to be equivalent. I don’t think blind listeners lose anything by hearing something that is more a dialect specific to the computer translation of written text to spoken text. (Maybe treating text-to-speech as a translation effort would be more successful anyway?)

A freely available screen reader probably would help a lot as well. I write my markup to render in browsers, not to render to a spec. Anything else is just bad practice. I can’t seriously write my markup for readers based on a spec.

HTML
Web
Programming

Comments (9)

Permalink

A Simple CMS

A while back I was reading this post about creating a simple CMS in Grok. Similar to the author over there I often get questions from friends and family who want to build a simple website about what I’d recommend. I still haven’t figured it out. Most of the big names (Plone, Drupal, etc) feel too big and complex and not content-oriented enough. On the other hand, simple systems like Google Homepages are more like HTML editors and don’t really help with managing content.

I have a vision for something much, much simpler; but I’m not sure it’s a realistic vision. There’s always feature creep. If you implement something simple, what happens when someone wants something more complicated? Do you have to throw it out? That sucks. Do you have to implement the new feature? That’s not very simple.

Here’s what I wish existed:

  • Just edit content. Static files.
  • Probably edit content that is slightly dynamic. Probably PHP code, but without getting too fancy with the PHP. E.g., put <?php include(’header.php’) ?> at the top of each file along with a footer and some variable assignments (like $title).
  • Pages would have simple parent/child relationships.
  • Sidebar navigation would be mostly manually managed. It would be a single PHP file, like any other. The parent/child relationships would help inform and automate aspects of that generation, but only on the UI side. There would be no attempt to incorporate fancy navigation algorithms.
  • The fancy things would be done with Javascript. For instance, if you want to expand navigation depending on the context of the site. This would be easy enough to implement in Javascript, serving everyone the same content and rendering it just slightly differently to emphasize context.
  • Of course it would include the important stuff like versioning and staging. With a file-oriented system you could use an existing version control system, or just simple archiving.

At my last job we wrote a content management system, and one of the last features I added I found really clever; a kind of coding judo. We wanted to add structured page types — things like a news feature with dates summary paragraphs. Instead of adding a fancy structured page system (akin to Archetypes in Plone) I just had the system look for an edit form alongside where the page would go (which itself is just a static file). The edit form has the necessary fields, and when you save the page each field turns into a variable assignment. Then there’s a template that uses these assignments. In PHP you’d have something like this:

<? include('header.php'); ?>
  <h1><? echo $page_title; ?></h1>
  Posted on: <? echo $page_date ?><br />
  <blockquote><? echo $page_summary ?></blockquote>
  <div class="article"><? echo $page_article ?></div>
<? include('footer.php'); ?>

The edit form:

Title: <input type="text" name="title"><br />
Date: <input type="text" name="date"> (YYYY-mm-dd)<br />
Summary:<br />
<textarea name="summary"></textarea> <br />
Article:<br />
<textarea name="article"></textarea>

You would use Javascript to make this form fancy if you were so inclined; adding client-side validation (clients were trusted to submit valid data) or marking things for a WYSIWYG editor (usually just with class="wysiwyg" and rely on unobtrusive Javascript to enable it).

For new pages you just give the form. For editing a page you use htmlfill to fill in the form with existing values.

The thing that is saved looks like:

<?
# This page is automatically generated; do not edit directly!
$page_title = 'A Title';
$page_date = '2008-01-11';
$page_summary = 'A summarynetc';
$page_article = 'The Article';
include('article.php');
?>

The system would know how to parse variable assignments (to get values) and would parse the include statement to determine what editor would be used.

Extending this system required a knowledge of HTML and HTML forms, and a little knowledge of PHP (or SSIs would also work). It fit our company well, as there were quite a few people with that kind of knowledge who would interact with clients, and wouldn’t need to come to the programmers to extend the system. While I’d offer some advice on how to use the system, it was transparent enough that they could debug problems on their own.

This whole concept is not something I can justify spending time on, but it’s a CMS that I wish existed. It’s a CMS I could recommend to my friends.

Web

Comments (26)

Permalink

Documents vs. Objects

Let’s imagine we live in a late binding world, a programming world of messages, what should that look like? What does it look like in the large?

The Smalltalk notion of this is many, many independent referencable objects. Objects all the way down. So you might get something like buddies do: [ buddy | buddy sendMessage: myMessage].

At runtime this means something like:

  • Send the buddies object (some reference we obtained elsewhere) the do: message, with the argument [ buddy | buddy   sendMessage: myMessage]. Except remember that that […] is a block (aka closure), and it’s also an object. Which only makes sense, because buddies doesn’t know what myMessage means. So all buddies knows is that it gets an argument.
  • The buddies object receives this do: message, with an argument we’ll call block (that […] stuff). By convention do: is sent to sequence-like things, and calls its argument multiple times. So buddies figures out some items — maybe it’s a concrete sequence (like an Array) or maybe it looks it up in a database, or does a call over the net, or whatever.
  • Once buddies gets an item, it calls block value: item. That is, it sends a value: message back to the block object, with the argument item.
  • The block has just received that value: message. The block wouldn’t have to be that […] syntactic construct, it just usually is. So it receives that value, and locally binds the buddy variable to the argument and evaluates the inside of that block.
  • Our expression then sends the sendMessage: message to that buddy object, with another object as the argument. That argument might be a string, but it also might be markup, or a richer object that itself does all sorts of magical things. Maybe the buddy object sends the object over the net to your remote buddy, then the remote buddy calls message displayOn: myScreen, and the message object reponds to displayOn: by sending screen showText:   someString withColor: #blue. This can go on for a while.

As you can see, there can be a lot of tracing through code in this style. This has been called Ravioli code. Many people lean heavily on concepts like classes and methods to make sense of this kind of code, though if you look through my description you’ll notice they don’t show up; it is apparently somewhat frustrating for Alan Kay that he first used the term "object-oriented" when he would have preferred this message-oriented concept. Classes and methods are just implementation details. Messages and object references are the true fundamental concept.

Doing this in the large means that a lot of these messages go over the network. It means we can get a handle to an object that lives on another computer. It might result in a system that is very chatty (many systems with transparent RPC do end up being unintentionally chatty). But it doesn’t have to. You can make this asynchronous to facilitate these kinds of efficiencies.

That said, there is another distinct technique for programming in the large: document oriented programming. This is what REST is focused on: transmitting state in as complete a form as feasible, with explicit references whenever referring to another resource.

Resources and objects are in many ways similar, but in other ways completely opposite. An object has behavior but no visible state. It responds to messages, passing more objects along. Resources themselves have no behavior. They are just bare state. REST and the web kind of adds behavior, but strict REST avoids even that. PUT is not behavior. PUT is almost like a declaration of fact. When you PUT to a resource you are claiming that the new state of the resource represents the truth of that resource. Any behavior that occurs because of that is hidden. Other resources may change as a result of the PUT, but this is not exposed in any way. To find out what’s changed you simply have to refresh all the state you’ve acquired — re-GET the resources. (POST, except when used simply to create a resource, is more like a verb than a statement of truth; the HTTP model is not pure in this respect.)

For programming on a web scale resources have become the dominant technique. REST was really an acknowledgement of this idea, not the invention of it. It’s also an advocacy effort that our future work should be built on this past success. It’s been focused very clearly on advocating resources over the object/message-based systems of WS-*, but the WS-* design is hardly the first or only object based system of its kind; anything called "RPC" comes from that line of thinking. But now that WS-* isn’t king of the hill anymore maybe we can more be more thoughtful about how REST relates to other programming techniques.

A way to describe the difference in terms of programming language design is that with resources everything we do is call by value. Resources are values, and we pass them around in their complete form. You can pass around a reference value (a URI), but this is like calling a function with a pointer argument. Object/RPC techniques are call by reference (though at some level there are true "values", things like integers and strings, as things would simply be too chatty otherwise).

One interesting place where objects and resources can overlap is in functional programming. Thinking in terms of call by value or call by reference, in a functional programming language these are identical. Because values are not mutable a value and a copy of the value are equivalent. You can also pass an object between systems by reference, in whole, or some lazy compromise where the object is only incrementally communicated between the processes.

Resources are in a sense immutable regardless of your programming language. When you retrieve a resource you cannot modify it in place, as the result is no longer representative of what the resource really is. You can put a new resource in a named location (PUT), and you can copy resources (maybe making an internal copy). Incidentally this is why I think literal document models are best. It would probably be useful to have a native representation of what a "resource" really is in programming languages, though usually this is not done. A resource should really be a document (content type, some bytes, perhaps a cache of parsed representations) plus the name (URI) and the time. Resources aren’t eternal, but given a time and URI there is (in a REST world) a single canonical representation.

Further investigations of functional programming languages would probably be useful here. I believe strict functional languages have interesting metaphors for handling this kind of information; the eternal immutable sense of what a thing is at a point in time, and a way to reconcile the incomplete knowledge of what a resource is right now. I don’t think this is a clear pattern in Erlang, but is in Haskell.

One useful aspect of resources is the ability to spy on the process, to get a snapshot of it, to understand it in a kind of literal way. This is view source in a browser, or just poking around with curl. You can feel a certain confidence that you can get a real sense of what the truth is using these tools. In an object-based system you can only poke around with a process of 20 questions, hoping that there isn’t some hidden truth that you are unaware of. For instance, talking to a proxy that is almost exactly like the original object, but with some small replacement. Of course internally a server can be peculiarly coupled as well, with relationships between the resources that aren’t explicit or obvious. But at least the faces it projects are fully formed.

I still haven’t fully figured out what distinguishes a document model from an object model, but I think the distinction between the two contains some interesting stuff worth exploring.

Web
Programming

Comments (12)

Permalink