Web

WebOb decorator

Lately I’ve been writing a few applications (e.g., PickyWiki and a revisiting of the request-tracking application VaingloriousEye), and I usually use no framework at all. Pylons would be a natural choice, but given that I am comfortable with all the components, I find myself inclined to assemble the pieces myself.

In the process I keep writing bits of code to make WSGI applications from simple WebOb-based request/response cycles. The simplest form looks like this:

from webob import Request, Response, exc

def wsgiwrap(func):
    # Turn func(req) -> response into a WSGI application
    def wsgi_app(environ, start_response):
        req = Request(environ)
        try:
            resp = func(req)
        except exc.HTTPException as e:
            # WebOb HTTP exceptions are themselves WSGI applications
            resp = e
        return resp(environ, start_response)
    return wsgi_app

@wsgiwrap
def hello_world(req):
    return Response('Hi %s!' % (req.POST.get('name', 'You')))

But each time I write it, I change things slightly, implementing more or fewer features. For instance, handling methods, or coercing other responses, or handling middleware.

Having implemented several of these (and reading other people’s implementations) I decided I wanted WebOb to include a kind of reference implementation. But I don’t like to include anything in WebOb unless I’m sure I can get it right, so I’d really like feedback. (There’s been some less than positive feedback, but I trudge on.)

My implementation is in a WebOb branch, primarily in webob.dec (along with some doctests).

The most prominent way this differs from the example I gave is that it doesn’t change the function signature; instead it adds an attribute .wsgi_app, which is the WSGI application associated with the function. My goal with this is that the decorator isn’t intrusive. Here’s the case where I’ve been bothered:

class MyClass(object):
    @wsgiwrap
    def form(self, req):
        return Response(form_html...)

    @wsgiwrap
    def form_post(self, req):
        handle submission

OK, that’s fine, then I add validation:

@wsgiwrap
def form_post(self, req):
    if req not valid:
        return self.form
    handle submission

This still works, because the decorator allows you to return any WSGI application, not just a WebOb Response object. But that’s not helpful, because I need errors…

@wsgiwrap
def form_post(self, req):
    if req not valid:
        return self.form(req, errors)
    handle submission

That is, I want to have an optional argument to the form method that passes in errors. But I can’t do this with the traditional wsgiwrap decorator; instead I have to refactor the code to have a third method that both form and form_post use. Of course, there’s more than one way to address this issue, but this is the technique I like.
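To make the attribute-based idea concrete, here is a minimal sketch that deliberately avoids WebOb: the raw environ dict stands in for a Request and a plain string stands in for a Response. Only the .wsgi_app attribute name comes from the description above; the rest (module-level functions instead of methods, the QUERY_STRING check, the fixed 200 status) is assumed for illustration and is not the code in the webob.dec branch.

```python
def wsgiwrap(func):
    """Attach a WSGI adapter as func.wsgi_app without changing func's signature."""
    def wsgi_app(environ, start_response):
        # Stand-in for webob.Request(environ): pass the raw environ through.
        body = func(environ)
        data = body.encode('utf-8')
        start_response('200 OK', [
            ('Content-Type', 'text/plain; charset=utf-8'),
            ('Content-Length', str(len(data))),
        ])
        return [data]
    func.wsgi_app = wsgi_app
    return func  # the original function, still callable with extra arguments


@wsgiwrap
def form(req, errors=None):
    # `errors` is the optional argument a signature-changing decorator would hide.
    if errors:
        return 'Please fix: %s' % ', '.join(errors)
    return 'Empty form'


@wsgiwrap
def form_post(req):
    if not req.get('QUERY_STRING'):
        # A plain function call, possible because form's signature is untouched.
        return form(req, errors=['name is required'])
    return 'Saved'
```

A server would be handed form_post.wsgi_app, while other Python code can keep calling form(req, errors) directly.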

The one other notable feature is that you can also make middleware:

@wsgify.middleware
def cap_middleware(req, app):
    resp = app(req)
    resp.body = resp.body.upper()
    return resp

capped_app = cap_middleware(some_wsgi_app)

Otherwise, for some reason I’ve found myself putting an inordinate amount of time into __repr__. Why I’ve done this I cannot say.

Programming
Python
Web

Modern Web Design, I Renounce Thee!

I’m not a designer, but I spend as much time looking at web pages as the next guy. So I took interest when I came upon this post on font size by Wilson Miner, which in turn is inspired by the 100e2r (100% easy to read) standard by Oliver Reichenstein.

The basic idea is simple: we should have fonts at the "default" size, about 16px, no smaller. This is about the size of text in print, read at a reasonable distance (typically closer up than a screen):

http://blog.ianbicking.org/wp-content/uploads/images/typesize_comparison2.jpg

Also it calls out low-contrast color schemes, which I think are mostly passé, and I will not insult you, my reader, by suggesting you don’t entirely agree. Because if you don’t agree, well, I’m afraid I’d have to use some strong words.

I think small fonts, low contrast, huge amounts of whitespace, are a side effect of the audience designers create for.

This makes me think of Modern Architecture:

http://blog.ianbicking.org/wp-content/uploads/images/300px-seagram.jpg

This is a form of architecture popular for skyscrapers and other dramatic structures, with their soaring heights and other such dramatic adjectives. These are buildings designed for someone looking at the building from five hundred feet away. They are not designed for occupants. But that’s okay, because the design isn’t sold to occupants, it is sold to people who look at the sketches and want to feel very dramatic.

Similarly, I think the design pattern of small fonts is something meant to appeal to shallow observation. By deemphasizing the text itself, the design is accentuated. Low-contrast text is even more obviously the domination of design over content. And it may very well look more professional and visually pleasing. But web design isn’t for making sites visually pleasing, it is for making the experience of the content more pleasing. Sites exist for their content, not their design.

In 100e2r he also says let your text breathe. You need whitespace. If you view my site directly, you’ll notice I don’t have big white margins around my text. When you come to my site, it’s to see my words, and that’s what I’m going to give you! When I want to let my text breathe with lots of whitespace this is what I do:

http://blog.ianbicking.org/wp-content/uploads/images/500px-my-white-desktop.jpg

Is a huge block of text hard to read? It is. And yeah, I’ve written articles like that. But the solution?

WRITE BETTER

Similarly, it’s hard to read text if you don’t use paragraphs, but the solution isn’t to increase your line height until every line is like a paragraph of its own.

The solution to the drudgery of large swathes of text is:

  1. Make your blocks of text smaller.
  2. Use something other than paragraphs of text.

Throw in a list. Do some indentation. Toss in even a stupid picture. Personally I try to throw in code examples, because that’s how we roll on this blog.

That’s good writing, that’s content that is easy to read. It’s not easy to write, and I’m sure I miss the mark more often than not. But you can’t design your way to good content. If you want to write like this, if you want to let the flow of your text reflect the flow of your ideas, you need room. Huge margins don’t give you room. They are a crutch for poor writing, and not even a good crutch.

So in conclusion: modern design be damned!

HTML
Non-technical
Web

Atompub as an alternative to WebDAV

I’ve been thinking about an import/export API for PickyWiki; I want something that’s sensible, and works well enough that it can be the basis for things like creating restorable snapshots, integration with version control systems, and being good at self-hosting documentation.

So far I’ve made a simple import/export system based on Atom. You can export the entire site as an Atom feed, and you can import Atom feeds. But whole-site import/export isn’t enough for the tools I’d like to write on top of the API.

WebDAV would seem like a logical choice, as it lets you get and put resources. But it’s not a great choice for a few reasons:

  • It’s really hard to implement on the server.
  • Even clients are hard to implement.
  • It uses GET to get resources. This is probably its most fatal flaw. There is no CMS that I know of (except maybe one) where the thing you view in the browser is the thing that you’d actually edit. To work around this CMSes use User-Agent sniffing or an alternate URL space.
  • WebDAV is worried about "collections" (i.e., directories). The web basically doesn’t know what "collections" are, it only knows paths, and paths are strings.
  • (In summary) WebDAV uses HTTP, but it is not of the web.

I don’t want to invent something new though. So I started thinking of Atom some more, and Atompub.

The first thought is how to fix the GET problem in WebDAV. A web page isn’t an editable representation, but it’s pretty reasonable to put an editable representation into an Atom entry. Clients won’t necessarily understand extensions and properties you might add to those entries, but I don’t see any way around that. An entry might look like:

<entry>
  <content type="html">QUOTED HTML</content>
  ... other normal metadata (title etc) ...
  <privateprop:myproperty xmlns:privateprop="URL" name="foo" value="bar" />
</entry>

While there is special support for HTML, XHTML, and plain text in Atom, you can put any type of content in <content>, encoded in base64.

To find the editable representation, the browser page can point to it. I imagine something like this:

<link rel="alternate" type="application/atom+xml; type=entry"
 href="this-url?format=atom">

The actual URL (in this example this-url?format=atom) can be pretty much anything. My one worry is that this could be confused with feed detection, which looks like:

<link rel="alternate" type="application/atom+xml"
 href="/atom.xml">

The only difference is "; type=entry", which I’m betting a lot of clients don’t pay attention to.

The Atom entries then can have an element:

<link rel="edit" href="this-url" />

This is a location where you can PUT a new entry to update the resource. You could allow the client to PUT directly over the old page, or use this-url?format=atom or whatever is convenient on the server-side. Additionally, DELETE to the same URL would delete.
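As a rough client-side sketch of that step, the following finds the rel="edit" link in an entry and resolves it against the URL the entry was fetched from. The function name, the sample entry, and the wiki.example host are made up for the example; only the link element itself comes from the text above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM_NS = 'http://www.w3.org/2005/Atom'

def find_edit_url(entry_xml, entry_url):
    """Return the absolute URL to PUT or DELETE, or None without a rel="edit" link."""
    entry = ET.fromstring(entry_xml)
    for link in entry.findall('{%s}link' % ATOM_NS):
        if link.get('rel') == 'edit':
            # The href may be relative, so resolve it against the entry's own URL.
            return urljoin(entry_url, link.get('href'))
    return None

# Hypothetical entry, as it might come back from this-url?format=atom
sample_entry = '''<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Front page</title>
  <link rel="edit" href="front?format=atom" />
</entry>'''

edit_url = find_edit_url(sample_entry, 'http://wiki.example/front')
```

The client never has to guess the edit location; it only follows what the entry says.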

This handles updates and deletes, and single-page reads. The next issue is creating pages.

Atompub makes creation fairly simple. First you have to get the Atompub service document. This is a document with the type application/atomsvc+xml and it gives the collection URL. It’s suggested you make this document discoverable like:

<link rel="service" type="application/atomsvc+xml"
 href="/atomsvc.xml">

This document then points to the "collection" URL, which for our purposes is where you create documents. The service document would look like:

<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>SITE TITLE</atom:title>
    <collection href="/atomapi">
      <atom:title>SITE TITLE</atom:title>
      <accept>*/*</accept>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>
</service>

Basically this indicates that you can POST any media to /atomapi (both Atom entries, and things like images).
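A client could read that document along these lines. This is only a sketch: the helper name is invented, and it assumes the service document has already been fetched as a string.

```python
import xml.etree.ElementTree as ET

APP_NS = 'http://www.w3.org/2007/app'

def find_collections(service_xml):
    """Return a list of (href, [accepted media ranges]), one per collection."""
    service = ET.fromstring(service_xml)
    found = []
    for coll in service.iter('{%s}collection' % APP_NS):
        accepts = [a.text.strip()
                   for a in coll.findall('{%s}accept' % APP_NS) if a.text]
        found.append((coll.get('href'), accepts))
    return found

service_doc = '''<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>SITE TITLE</atom:title>
    <collection href="/atomapi">
      <atom:title>SITE TITLE</atom:title>
      <accept>*/*</accept>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>
</service>'''

collections = find_collections(service_doc)
```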

To create a page, a client then does a POST like:

POST /atomapi
Content-Type: application/atom+xml; type=entry
Slug: /page/path

<entry xmlns="...">...</entry>

There’s an awkwardness here, that you can suggest (via the Slug header) what the URL for the new page is. The client can find the actual URL of the new page from the Location header in the response. But the client can’t demand that the slug be respected (getting an error back if it is not), and there’s lots of use cases where the client doesn’t just want to suggest a path (for instance, other documents that are being created might rely on that path for links).

Also, "slug" implies… well, a slug. That is, some path segment probably derived from the title. There’s nothing stopping the client from putting a complete path in there, but it’s very likely to be misinterpreted (e.g. translating /page/path to /2009/01/pagepath).
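To illustrate the creation step and the Slug problem, here is a sketch that builds (but does not send) the POST and then checks whether the server kept the suggested path. The function names and the wiki.example host are assumptions for the example, and the check is simply a comparison against the Location value the server returns.

```python
from urllib.parse import urlparse
from urllib.request import Request

def build_create_request(collection_url, path, entry_xml):
    """Build the POST that asks the server to create a page at `path`."""
    return Request(
        collection_url,
        data=entry_xml.encode('utf-8'),
        headers={
            'Content-Type': 'application/atom+xml; type=entry',
            'Slug': path,  # only a suggestion; the server may ignore it
        },
        method='POST',
    )

def slug_was_honored(location, requested_path):
    """Compare the Location header from the response with the suggested path."""
    return urlparse(location).path == requested_path

req = build_create_request(
    'http://wiki.example/atomapi', '/page/path',
    '<entry xmlns="http://www.w3.org/2005/Atom"/>')
```

A client that depends on the path can only detect a mismatch after the fact, which is the awkwardness described above.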

But I digress. Anyway, you can post every resource as an entry, base64-encoding the resource body, but Atompub also allows POSTing media directly. When you do that, the server puts the media somewhere and creates a simple Atom entry for the media. If you wanted to add properties to that entry, you’d edit the entry after creating it.

The last missing piece is how to get a list of all the pages on a site. Atompub does have an answer for this: just GET /atomapi will give you an Atom feed, and for our purposes we can demand that the feed is complete (using paging so that any one page of the feed doesn’t get too big). But this doesn’t seem like a good solution to me. GData specifies a useful set of queries for feeds, but I’m not sure that this is very useful here; the kind of queries a client needs to do for this use case aren’t things GData was designed for.

The queries that seem most important to me are queries by page path (which allows some sense of "collections" without being formal) and by content type. Also to allow incremental updates on the client side, filtering these queries by last-modified time (i.e., all pages created since I last looked). Reporting queries (date of creation, update, author, last editor, and custom properties) of course could be useful, but don’t seem as directly applicable.

Also, often the client won’t want the complete Atom entry for the pages, but only a list of pages (maybe with minimal metadata). I’m unsure about the validity of abbreviated Atom entries, but it seems like one solution. Any Atom entry can have something like:

<link rel="self" type="application/atom+xml; type=entry"
 href="url?format=atom" />

This indicates where the entry exists, though it doesn’t suggest very forcefully that the actual entry is abbreviated. Anyway, I could then imagine a feed like:

<feed>
  <entry>
    <content type="some/content-type" />
    <link rel="self" href="..." />
    <updated>YYYY-MM-DDTHH:MM:SSZ</updated>
  </entry>
  ...
</feed>

This isn’t entirely valid, however — you can’t just have an empty <content> tag. You can use a src attribute to use indirection for the content, and then add Yet Another URL for each page that points to its raw content. But that’s just jumping through hoops. This also seems like an opportunity to suggest that the entry is incomplete.

To actually construct these feeds, you need some way of getting the feed. I suggest that another entry be added to the Atompub service document, something like:

<cmsapi:feed href="URI-TEMPLATE" />

That would be a URI Template that accepted several known variables (though frustratingly, URI Templates aren’t properly standardized yet). Things like:

  • content-type: the content type of the resource (allowing wildcards like image/*)
  • container: a path to a container, i.e., /2007 would match all pages in /2007/…
  • path-regex: some regular expression to match the paths
  • last-modified: return all pages modified at the given date or later

All parameters would be ANDed together.
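On the server side, ANDing those parameters together could look roughly like this. It is a sketch over an in-memory list of dicts; the page structure, the function name, and the underscore spellings of the parameter names are assumptions, not part of any specification.

```python
import re
from datetime import datetime
from fnmatch import fnmatchcase

def filter_pages(pages, content_type=None, container=None,
                 path_regex=None, last_modified=None):
    """Return the pages that match every parameter that was supplied."""
    result = []
    for page in pages:
        if content_type and not fnmatchcase(page['content_type'], content_type):
            continue  # wildcard match, e.g. image/*
        if container and not page['path'].startswith(container.rstrip('/') + '/'):
            continue  # /2007 matches everything under /2007/
        if path_regex and not re.search(path_regex, page['path']):
            continue
        if last_modified and page['modified'] < last_modified:
            continue  # keep pages modified at the given date or later
        result.append(page)
    return result

pages = [
    {'path': '/2007/trip', 'content_type': 'text/html',
     'modified': datetime(2008, 1, 5)},
    {'path': '/2007/photo.png', 'content_type': 'image/png',
     'modified': datetime(2008, 3, 1)},
    {'path': '/2008/photo.jpg', 'content_type': 'image/jpeg',
     'modified': datetime(2008, 6, 1)},
]

images_2007 = filter_pages(pages, content_type='image/*', container='/2007')
recent = filter_pages(pages, last_modified=datetime(2008, 3, 1))
```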

So, open issues:

  • How to strongly suggest a path when creating a resource (better than Slug)
  • How to rename (move) or copy a page (it’s easy enough to punt on copy, but I’d rather move be a little more formal than just recreating a resource in a new location and deleting the original)
  • How to represent abbreviated Atom entries

With these resolved I think it’d be possible to create a much simpler API than WebDAV, and one that can be applied to existing applications much more easily. (If you think there’s more missing, please comment.)

HTML
Programming
Web

Avoiding Silos: “link” as a first-class object

One of the constant annoyances to me in web applications is the self-proclaimed need for those applications to know about everything and do everything, and only spotty ad hoc techniques for including things from other applications.

An example might be blog navigation or search, where you can only include data from the application itself. Or "Recent Posts" which can only show locally-produced posts. What if I post something elsewhere? I have to create some shoddy placeholder post to refer to it. Bah! Underlying this, the data is usually structured in a specific way, with the HTML being a sort of artifact of the database, the markup transient and a slave to the database’s structure.

An example of this might be a recent post listing like:

<ul>
  for post in recent_posts:
    <li>
      <a href="/post/{{post.year}}/{{post.month}}/{{post.slug}}">
        {{post.title}}</a>
    </li>
</ul>

There’s clearly no room for exceptions in this code. I am thus proposing that any system like this should have the notion of a "link" as a first-class object. The code should look like this:

<ul>
  for post in recent_posts:
    <li>
      {{post.link()}}
    </li>
</ul>

Just like with changing IDs to links in service documents, the template doesn’t actually look any more complicated than it did before (simpler, even). But now we can use simple object-oriented techniques to create first-class links. The code might look like:

class Post(SomeORM):
    def url(self):
        if self.type == 'link':
            return self.body
        else:
            base = get_request().application_url
            return '%s/%s/%s/%s' % (
                base, self.year, self.month, self.slug)

    def link(self):
        return html('<a href="%s">%s</a>') % (
            self.url(), self.title)

The addition of the .url() method has the obvious effect of making these offsite links work. Using a .link() method has the added advantage of allowing things like HTML snippets to be inserted into the system (even though that is not implemented here). By allowing arbitrary HTML in certain places you make it possible for people to extend the site in little ways — possibly adding markup to a title, or allowing an item in the list that actually contains two URLs (e.g., <a href="url1">Some Item</a> (<a href="url2">via</a>)).

In the context of Python I recommend making these into methods, not properties, because it allows you to later add keyword arguments to specialize the markup (like post.link(abbreviated=True)).
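Here is a small self-contained sketch of what such a keyword argument might do. The meaning of abbreviated (truncating the title), the max_length parameter, and the choice to escape both URL and title are assumptions made for the example; a real system that wants HTML in titles would use a markup-aware wrapper instead of escaping.

```python
from html import escape

class Post:
    def __init__(self, title, url):
        self.title = title
        self.url = url

    def link(self, abbreviated=False, max_length=20):
        """Render the post as an anchor; keyword arguments tweak the markup."""
        title = self.title
        if abbreviated and len(title) > max_length:
            # Hypothetical meaning of "abbreviated": shorten long titles.
            title = title[:max_length].rstrip() + '...'
        return '<a href="%s">%s</a>' % (
            escape(self.url, quote=True), escape(title))

post = Post('A & B', 'http://example.com/?x=1&y=2')
full = post.link()
short = Post('A very long title that keeps going', '/p/1').link(abbreviated=True)
```

Because link is a method, the abbreviated case needs no new attribute or template change, just another argument.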

One negative aspect of this is that you cannot affect all the markup through the template alone, you may have to go into the Python code to change things. Anyone have ideas for handling this problem?

HTML
Programming
Python
Web

Javascript Status Message Display

In a little wiki I’ve been playing with I’ve been trying out little ideas that I’ve had but haven’t had a place to actually implement them. One is how notification messages work. I’m sure other people have done the same thing, but I thought I’d describe it anyway.

A common pattern is to accept a POST request and then redirect the user to some page, setting a status message. Typically the status message is either set in a cookie or in the session, then the standard template for the application has some code to check for a message and display it.

The problem with this is that this breaks all caching — at any time any page can have some message injected into it, basically for no reason at all. So I thought: why not do the whole thing in Javascript? The server will set a cookie, but only Javascript will read it.

The code goes like this; on the server (easily translated into any framework):

resp.set_cookie('flash_message', urllib.quote(msg))

I quote the message because it can contain characters unsafe for cookies, and URL quoting is a particularly easy quoting to apply.
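For anyone not using a framework's set_cookie, the same thing can be sketched with the Python 3 standard library alone. The function name and the Path=/ attribute are assumptions for the example; the cookie name and the URL quoting follow the line above.

```python
from http.cookies import SimpleCookie
from urllib.parse import quote, unquote

def flash_cookie_header(msg):
    """Build the value of a Set-Cookie header carrying a URL-quoted message."""
    cookie = SimpleCookie()
    cookie['flash_message'] = quote(msg)  # quoting removes ';', spaces, etc.
    cookie['flash_message']['path'] = '/'
    return cookie['flash_message'].OutputString()

message = 'Page saved; thanks'
header_value = flash_cookie_header(message)
round_trip = unquote(quote(message))  # what the reader of the cookie recovers
```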

Then I have this Javascript (using jQuery):

$(function () {
    // Anything in $(function...) is run on page load
    var flashMsg = readCookie('flash_message');
    if (flashMsg) {
        flashMsg = decodeURIComponent(flashMsg);
        var el = $('<div id="flash-message">'+
          '<div id="flash-message-close">'+
          '<a title="dismiss this message" '+
          'id="flash-message-button" href="#">X</a></div>'+
          flashMsg + '</div>');
        $('a#flash-message-button', el).bind(
          'click', function () {
            $(this.parentNode.parentNode).remove();
        });
        $('#body').prepend(el);
        eraseCookie('flash_message');
    }
});

Note that I’ve decided to treat the flash message as HTML. I don’t see a strong risk of injection attack in this case, though I must admit I’m a little unclear about what the normal policies are for cross-domain cookie setting.

I use these cookie functions because oddly I can’t find cookie handling functions in jQuery. It’s always weird to me how primitive document.cookie is. Anyway, CSS looks like this:

#flash-message {
  margin: 0.5em;
  border: 2px solid #000;
  background-color: #9f9;
  -moz-border-radius: 4px;
  text-align: center;
}

#flash-message-close {
  float: right;
  font-size: 70%;
  margin: 2px;
}

a#flash-message-button {
  text-decoration: none;
  color: #000;
  border: 1px solid #9f9;
}

a#flash-message-button:hover {
  border: 1px solid #000;
  background-color: #009;
  color: #fff;
}

This doesn’t have non-Javascript fallback, but I think that’s okay. This isn’t something that a spider would ever see (since spiders shouldn’t be submitting forms that result in update messages). Accessible browsers generally implement Javascript so that’s also not particularly a problem, though there may be additional hints I could give in CSS or Javascript to help make this more readable (if there’s a message, it should probably be the first thing read on the page).

Another common component of pages that varies separately from the page itself is logged-in status, but that’s more heavily connected to your application. Get both into Javascript and you might be able to turn caching way up on a lot of your pages.

Javascript
Programming
Web

Where Next For Plone Development?

I attended PloneConf 2008 recently, to talk about Deliverance. But I’ll talk here more about my observations as a relative outsider to that community. (This post is really written for the Plone community — if you aren’t familiar with Plone this post probably won’t make much sense or be very useful.)

One of the ongoing concerns in the Plone community has been the difficulty of attracting and maintaining developer interest in the community, and generally making Plone easier to work with as a developer. It’s been a very successful community for consulting and integration work, but it has not been as rewarding for developers. There’s the idea of a "Plone Tax", which means different things to different people but just generally speaks to the sense that Plone takes a little off the top — time to restart, time to run the tests, time to dig into code that just goes too deep. There are some distinct problems with Plone, but probably the biggest problem is the quantity of small challenges and the general size and complexity of the system.

At the moment there is no clear path forward to resolve this. A previous effort to fix things is a project called "Five", which referred to Zope 2+3: backporting libraries and techniques from Zope 3 into Zope 2. Plone is intimately tied to Zope 2, and Five let them use new code without having to abandon the old environment. But the result wasn’t terribly satisfying: Five didn’t remove any complexity, it only added to it. Even if the new components were superior that doesn’t make them simple. Plone is aching for more simplicity, not more power.

It’s unclear how Plone can actually reform itself as a codebase. Zope 2 is a behemoth, and its metaphors are deeply intertwined with existing code. Acquisition in particular is ubiquitous, essential to lots of the machinery, and deeply confusing. I saw a general interest in two directions: one was to encourage non-content-management tasks to be implemented in a complementary (but separate) technology, another direction is continuing to refactor the existing codebase while somehow trying to maintain backward compatibility.

It’s unclear how to refactor the existing codebase in such a way that it is any simpler, but I suppose these two directions are not exclusive. I want to focus on the idea of a complementary environment. There’s two products in particular that have been attracting interest: Grok and Repoze.

These two products usually go under the umbrella of "exciting new ways to improve the developer experience in Plone" which is a kind of generic positive sentiment. But to my mind they represent two significantly different paths forward, and the differences deserve some more critical thought.

Grok is a layer on top of Zope 3 that attempts to make development in that environment more pleasing. It eliminates most ZCML (ZCML is Zope 3’s XML-based language for declaring the relation of various components in a system). Grok uses conventions and introspection to make Zope 3 look more like a traditional web framework, with simple views and models and templates, and less of the wiring you have to set up in typical Zope 3 architectures. At the same time, you can add all the Zope 3 declarations to break out of the automatic conventions. Grok is a more pleasant layer on top of Zope 3, but it’s entirely focused on Zope 3, and it is led by Martijn Faassen and Philipp von Weitershausen who are both very involved with the Zope 3 community.

Repoze is a more recent project, led by Chris McDonough, Tres Seaver, and Paul Everitt (well, Paul might call himself more of a cheerleader). They all work together at Agendaless Consulting. They’ve been highly involved in Zope 2 for a long time, and are all former employees of Zope Corp and major contributors to Zope 2. I don’t think they’ve ever quite made the jump to Zope 3, and their consulting and experience kept them involved with Zope 2. A while back they got some WSGI religion and started splitting out some pieces of Zope into WSGI middleware and other independent libraries. This included things like pulling out the transaction handler from Zope, reimplementing the Zope 2 publisher so it was more WSGIish, and a variety of other libraries. These libraries are essentially extractions from Zope, or in the case of the Zope publisher and repoze.plone, a way to wrap what would be considered a "legacy" application in the same interface as other newer pieces. Having extracted nearly everything they wanted, they’ve started work on a framework intended to be familiar to the Zope community, repoze.bfg. This framework uses some Zope 3 concepts, but it’s more built from scratch than it is built from Zope 3, and it is attached to Zope 2 only insofar as it uses the pieces they’ve extracted and the ideas they’ve become comfortable with. They’ve described it as the framework they want to use when someone asks them to build something in Zope 2.

Grok and Repoze have a significantly different development methodology. One is a layer on Zope 3, the other is an extraction of ideas from Zope 2 (and a few from Zope 3). In part I think the distinction hasn’t been presented very clearly because the Repoze and Grok communities overlap a great deal, and everyone is quite congenial with each other, and they are reluctant to enter debates about the designs. While I also consider them all colleagues, I also feel pretty strongly about the design differences and I feel a discussion contrasting them is important.

Martijn recently wrote a post on why Plone should consider Grok and a follow-up post. These posts speak to a variety of advantages Grok has over plain Zope 3, which Grok can offer to Plone to manage its existing use of Zope 3 technologies.

I think Plone shouldn’t be so focused on managing the complexity of its stack, but focus on reducing that complexity. And it should reduce that complexity by focusing on content management and moving all the other pieces people have built on Plone out of Plone. It can’t just leave people hanging, which is why developing a clear story for how those other pieces should be developed is essential. Plone the community doesn’t have to map one-to-one to Plone the software. Plone the software should become smaller and more focused. Plone the community doesn’t have to become more focused — the community does what it needs to do, what customers ask for, what is necessary to make a site compelling. To be clear on what I’m proposing here: Plone should have a community-recommended way to build non-Plone applications. This doesn’t mean you have to use those techniques, but it should be much more concrete than just "there’s a bunch of cool things out there, and maybe you should look around and use one of those." By having a community-recommended pattern of development you can maintain and build on the Plone community, which is at least as important an asset as the Plone software.

With this in mind I believe an extraction of Zope 2 and Plone ideas is the right path forward. Extraction is the process of isolating code and ideas, and localizing the effect of that code. These are some of the most important ways to actually increase the simplicity of a codebase. In the Zope 2 (and even 3) codebase the thing that brings the most complexity is the non-locality of code, that parts of the system can affect each other in complex and unexpected ways. Zope 3 formalizes these patterns of non-locality: the Component Architecture is largely about introducing non-localized relationships between code.

Arguably this non-locality of effect is exactly what people want, it’s what enables pervasive customizations. When a client asks you to change some little piece deep in the system, you don’t really want to modify the system’s code — you want to add a little code to the outside of the system that effects the change you desire. The Component Architecture is a formalized way of making these kinds of changes, where Acquisition was a lower-level mechanism to do the same sort of thing. I remain a Component Architecture skeptic, mostly because I think the mechanism is overused. Still I think clearly Plone needs flexibility that a purely bespoke application would not require. But there’s no way out of it: that flexibility has a high cost. In this there is no clear solution. In addition to Acquisition, the complexity of Zope 2 security, the many layers of skinning… is the Component Architecture what Plone needs? From where I stand it seems like a step further in the wrong direction… and yes, it is a better placed step than any of the ones before, but if it’s still a step in the wrong direction does that matter?

I think Plone (at least the community) needs to be conservative in its enabling of these customizations. Sure, the customizability is "powerful," but I don’t hear people clamoring for power. They want simple, predictable, fast, maintainable. That’s not necessarily the same as "easy" — I think sometimes it’s worth making things a little longer and less automatic to make a system more explicit and make code more localized. I think Repoze is a step in this direction, a big (maybe even an intimidatingly giant) step towards simplicity. That’s the step I think Plone should make.

So, I offer this as my suggestion to Plone. I think Plone-the-community has the opportunity to be more than Plone-the-software; I think it must do this to remain viable in the long term. But to get there the community must make some choices: you can’t add simplicity.

Python
Web
Zope/Plone

Hypertext-driven URLs

Roy T. Fielding, author of the REST thesis, wrote an article recently: REST APIs must be hypertext-driven. I liked this article; it fit with an intuition I’ve had. Then he wrote an article explaining that he wouldn’t really explain the other articles because, I guess, he wanted a conversation with the specialists, and it seems like a kind of invitation to reinterpret his writing. So since others are doing it I figured I’d do it too.

I’d summarize his argument thus:

  • Focus on media types, i.e., resource formats, i.e., document formats. The protocol will flow from these if they are well specified.
  • URL structures are not a media type. They are some kind of server layout. You can’t hold them, you can’t pass them around, there is no notion of CRUD. Media types have all sorts of advantages that URL structures do not.

An example of a protocol based on a URL structure would be something like:

  • Do GET /articles/ to get a JSON list of all the article ids, with a response like [1, 2, 3]
  • Do a GET /articles/{id} to get the representation of a specific article.

JSON is a reasonable structure for a media type. It is not itself a fully explained type, because it’s just a container for data, just like XML. In this example you have a document, [1, 2, 3] which isn’t self-describing and just isn’t very useful. A more appropriate protocol would be:

  • You start with a container, in our example /articles/. Do GET /articles/ to get a JSON document listing the URLs of all the articles. These URLs are relative to the container URL. You’ll get a response like ['./1', './2', './3'] (actually ['1', '2', '3'] would be fine too).
  • Do GET {article-url} to get the article representation.

It’s a small difference. Heck, the communication could look identical in practice, but by putting URLs in the JSON document instead of this abstract "id" notion you’ve created a more flexible and self-describing system. You could probably give a name to that list of URLs, and then just talk about that name.
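To make the difference concrete, here is a minimal Python sketch of a client for the second protocol. The host example.com and the function name are my own assumptions, and the JSON body is hard-coded instead of fetched, so read it as an illustration and not as anyone's real API:

```python
import json
from urllib.parse import urljoin

# Hypothetical container URL; only the /articles/ path comes from the example.
CONTAINER_URL = "http://example.com/articles/"


def article_urls(container_url, body):
    """Resolve each link in the container document against the container URL."""
    return [urljoin(container_url, link) for link in json.loads(body)]


# The body a server might return for GET /articles/ (hard-coded here).
urls = article_urls(CONTAINER_URL, '["./1", "./2", "./3"]')
```

The client never builds a URL from an id. If the server started returning absolute URLs on some other host, this code would keep working unchanged.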

An example in Atompub is rel="edit". An Atom entry can look like:

<entry>...
  <link rel="edit" href="/post/15" />
</entry>

Instead of the client just somehow knowing where to go to edit an entry, it’s made explicit. Thus you can move the entry around, while still pointing back to the canonical location to edit that entry.
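As a rough sketch of what following that link looks like on the client side, assuming Python's standard library and a made-up base URL: the entry below mirrors the un-namespaced fragment above (a real Atom entry lives in the Atom XML namespace, which this sketch ignores), and the title is invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Simplified entry modelled on the fragment above; the title is made up.
ENTRY = """<entry>
  <title>Hypothetical post</title>
  <link rel="edit" href="/post/15" />
</entry>"""


def edit_url(entry_xml, base_url):
    """Return the absolute URL of the entry's rel="edit" link, or None."""
    entry = ET.fromstring(entry_xml)
    for link in entry.iter("link"):
        if link.get("rel") == "edit":
            # The href may be relative, so resolve it against the base URL.
            return urljoin(base_url, link.get("href"))
    return None
```

The client asks the document where to edit instead of deriving the location from where it happened to find the entry.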

There’s nothing really that complicated about this, the rule is really quite simple: link to other things, don’t just expect the client to know or guess where those other things are.

For a more concrete example of where this linking works well, OpenID uses <link rel="openid.server" href="…"> and <link rel="openid.delegate" href="…">, which allows you to add a little information to any HTML homepage so that the login can happen at a third location. If OpenID used something like looking at {homepage}/openid for an OpenID server then you couldn’t select whatever OpenID service you liked, or change services, or apply OpenID to hosted locations where you couldn’t install an OpenID server.
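Here is a small sketch of that discovery step using Python's html.parser. The homepage markup and both URLs are invented for illustration, and a real consumer would also have to cope with rel attributes that carry several space-separated values, which this ignores:

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect the href of the first <link> seen for each rel value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        if "rel" in attrs and "href" in attrs:
            self.links.setdefault(attrs["rel"], attrs["href"])


def discover_openid(html):
    """Return (server, delegate) as advertised by the page, or None for each."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.links.get("openid.server"), parser.links.get("openid.delegate")


# Hypothetical homepage; both URLs are placeholders.
HOMEPAGE = """<html><head>
<link rel="openid.server" href="http://openid.example.net/server">
<link rel="openid.delegate" href="http://bob.openid.example.net/">
</head><body>Bob's homepage</body></html>"""

server, delegate = discover_openid(HOMEPAGE)
```

Switching providers means editing two lines of HTML on the homepage; nothing in the consumer changes.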

I’ll add my own little opinion in here: this is why the URL structure of applications doesn’t affect their RESTfulness, nor is URL structure all that important of a concern generally. Pretty URL structures are a nice thing to do, like indenting your code in a pleasant way, but it has nothing to do with your API, and if you can’t use a crappy URL structure with that same API then probably something is wrong with that API.

Programming
Web

The Philosophy of Deliverance

I’ll be attending PloneConf this year again, giving a talk about Deliverance. I’ve been working on Deliverance lately for work, but the hard part about it is that it’s not obviously useful. To help explain it I wrote the philosophy of Deliverance, which I will copy here, to give you an idea of what I’ve been doing:

Why is Deliverance? Why was it made, what purpose does it serve, why should you use it, how can it change the way you do web development?

On the Subject of Platforms

Right now we live in an age of platforms. Developers (or management or coincidence) decide on a platform, and that serves as the basis for all future development. Usually there are some old things from a previous platform (or a primordial pre-platform age: I’m looking at you formmail.pl!). The goal is always to eliminate all of these old pieces, rewriting them for the new platform. That goal is seldom attained in a timely manner, and even before it is accomplished you may be moving to the next platform.

Why do you have to port everything forward to the newest platform? Well, presumably it is better engineered. The newest platform is presumably what people are most familiar with. But if those were the only reasons it would be hard to justify a rewrite of working software. Often the real push comes because your systems don’t work together. It’s hard to keep templates in sync across all the platforms. Multiple logins may be required. Navigation is inconsistent and incomplete. Functionality that cross-cuts pages — comments, login status, shopping cart status, etc — isn’t universally available.

A similar conflict arises when you consider how to add new functionality to a site. For example, you may want to add a blog. Do you:

  1. Use the best blogging software available?
  2. Use something native to your platform?
  3. Write something yourself?

The answer is probably 2 or 3, because it would be too hard to integrate something foreign to your platform. This form of choice means that every platform has some kind of "blog", but the users of that blog are likely to only be a subset of the users of the parent platform. This makes it difficult for winners to emerge, or for a well-developed piece of software to really be successful. Platform-based software is limited by the adoption of the platform.

Not all software has a platform. These tend to be the most successful web applications, things like Trac, WordPress, etc.

"Aha!" you think "I’ll just use those best-of-breed applications!" But no! Those applications themselves turn into platforms. WordPress is practically a CMS. Trac too. Extensible applications, if successful, become their own platform. This is not to place blame, they aren’t necessarily any worse than any other platform, just an acknowledgment that this move to platform can happen anywhere.

Beyond Platforms, or A Better Platform

One of the major goals of Deliverance is to move beyond platforms. It is an integration tool, to allow applications from different frameworks or languages to be integrated gracefully.

There are only a few core reasons that people use platforms:

  1. A common look-and-feel across the site.
  2. Cohesive navigation.
  3. Indexing of the entire site.
  4. Shared authentication and user accounts.
  5. Cross-cutting functionality (e.g., commenting).

Deliverance specifically addresses 1, providing a common look-and-feel across a site. It can provide some help with 2, by allowing navigation to be more centrally managed, without relying purely on per-application navigation (though per-application navigation is still essential to navigating the individual applications). 3, 4, and 5 are not addressed by Deliverance (at least not yet).

Deliverance applies a common theme across all the applications in your site. Its basic unit of abstraction is HTML. It doesn’t use a particular templating language. It doesn’t know what an object is. HTML is something every web application produces. Deliverance’s means of communication is HTTP. It doesn’t call functions or create request objects [*]. Again, everything speaks HTTP.

Deliverance also allows you to include output from multiple locations. In all cases there’s the theme, a plain HTML page, and the content, whatever the underlying application returns. You can also include output from other parts of the site, most commonly navigation content that you can manage separately. All of these pieces can be dynamic — again, Deliverance only cares about HTML and HTTP, it doesn’t worry about what produces the response.
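To give a feel for the idea (and only the idea: this is not Deliverance's rule language or its implementation), here is a toy sketch that drops an application's body into a slot in a theme page. It assumes well-formed markup so that the standard library's ElementTree can parse it; a real tool needs a forgiving HTML parser. The theme, the slot id, and the function name are all made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical theme: a plain page with a slot for application content.
THEME = """<html><head><title>Site</title></head>
<body><div id="nav">Home</div><div id="content">placeholder</div></body></html>"""


def apply_theme(theme_html, content_html, slot_id="content"):
    """Replace the theme's slot contents with the content page's <body>."""
    theme = ET.fromstring(theme_html)
    content = ET.fromstring(content_html)
    slot = next(el for el in theme.iter() if el.get("id") == slot_id)
    body = content.find("body")
    # Empty the slot, then move the application's body into it.
    slot.text = body.text
    for child in list(slot):
        slot.remove(child)
    slot.extend(list(body))
    return ET.tostring(theme, encoding="unicode")


themed = apply_theme(THEME, "<html><body><h1>Blog</h1><p>Post</p></body></html>")
```

Nothing here knows what produced either page, which is the point: the same function works on a static file, a PHP page, or the output of a remote service.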

This is all very similar to systems built on XSLT transforms, except without the XSLT [†], and without XML. Strictly speaking you can apply XSLT to any parseable markup, even HTML, but the most common (or at least most talked about) way to apply XSLT is using "semantic" XML output that is transformed into HTML. Deliverance does not try to understand the semantics of applications, and instead expects them to provide appropriate presentation of whatever semantics the underlying application possesses. Presentation is more universal than semantics.

While Deliverance does its best to work with applications as-they-exist, without making particular demands on those applications, it is not perfect. Conflicting CSS can be a serious problem. Some applications don’t have very good structure to work with. You can’t generate any content in Deliverance, you can only manipulate existing content, and often that means finding new ways to generate content, or making sure you have a place to store your content (as in the case of navigation). This is why arguably Deliverance does not remove the need for a platform, but is just its own platform. In so far as this is true, Deliverance tries to be a better platform, where "better" is "more universal" rather than "more powerful". Most templating systems are more powerful than Deliverance transformations. It can be useful to have access to the underlying objects used to produce the markup. But Deliverance doesn’t give you these things, because it only implements things that can be applied to any source of content. Static files are entirely workable in Deliverance, just as any application written in Python, PHP, or even an application hosted on an entirely separate service is usable through Deliverance.

The Missing Parts

As mentioned before, three important benefits of a platform are missing from Deliverance. I’ll try to describe what I believe are the essential aspects. I hope at some time that Deliverance or some complementary application will be able to satisfy these needs. Also, I suggest some lines of development that might be easier than others.

Indexing The Entire Site

Typically each application has a notion of what all the interesting pages in that application are. Most applications have a set of uninteresting pages, or transient pages. A search result is transient, as an example. An application also knows when new pages appear, and when other pages disappear. A site-wide index of these pages would allow things like site maps, cross-application search, and cross-application reporting to be done.

An interesting exception to the knowledge an application has of itself: search results are generally boring. But a search result based on a category might still be interesting. The difference between a "search" and a "report" is largely in the eye of the beholder. An important feature is that the application shouldn’t be the sole entity allowed to mark interesting pages. Manually-managed lists of resources that may point to specific applications can allow people to usefully and easily tweak the site. Ideally even fully external resources could be included, such as a resource on an entirely different site.

To do indexing you need both events (to signal the creation, update, or deletion of an entity/page), and a list of entities (so the index can be completely regenerated). A simple way of giving a list of entities would be the Google Site Map XML resource. Signaling events is much more complex, so I won’t go into it in any greater depth here, but we’re working on a product called Cabochon to handle events.
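The "list of entities" half is easy to sketch. Assuming each application publishes a sitemap in the sitemaps.org XML format, an indexer could enumerate the pages like this; the URLs are invented and the document is inlined instead of fetched:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap an application might publish.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/blog/1</loc><lastmod>2008-10-01</lastmod></url>
  <url><loc>http://example.com/wiki/Home</loc></url>
</urlset>"""


def list_entities(sitemap_xml):
    """Return (url, lastmod) pairs; lastmod is None when it is not given."""
    root = ET.fromstring(sitemap_xml)
    entities = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entities.append((loc, lastmod))
    return entities


entities = list_entities(SITEMAP)
```

This covers full regeneration of the index only. The event half (created, updated, deleted) is the hard part and is not sketched here.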

One thing that indexing can provide is a way to use microformats. Right now microformats are interesting, but for most sites they are largely useless. You can mark up your content, but no one will do anything interesting with that markup. If you could easily code up an indexer that could keep up-to-date on all the content on your site, you could produce interesting results like cross-application mapping.

Shared Authentication And User Accounts

Authentication is one of the most common and annoying integration tasks when crossing platform boundaries. Systems like OpenID offer the ability to unify cross-site authentication, but they don’t actually solve the problem of a single site with multiple applications.

There is a basic protocol in HTTP for authentication, one that is workable for a system like Deliverance, and there are already several existing products (like repoze.who) that work this way. It works like this:

  • The logged-in username is sent in some header, e.g., X-Remote-User. Some kind of signing is necessary to really trust this header (Deliverance could filter out that header in incoming requests, but if you removed Deliverance from the stack you’d have a security hole).
  • If the user isn’t logged in, and the application wants them to log in, the application responds with a 401 Unauthorized response. It is supposed to set the WWW-Authenticate header, probably to some value indicating that the intermediary should determine the authentication type. In some cases a kind of HTTP authentication is required (typically Basic or Digest) because cookie-based logins are too stateful (e.g., in APIs, or for WebDAV access).
  • The intermediary catches the 401 and initiates the login process. This might mean a redirect to a login page, and setting a cookie on successful login. The login page and setting the cookie could potentially be done by an application outside of the intermediary; the intermediary only has to do the appropriate redirects and setting of headers.
  • In the case when a user is logged in but isn’t permitted, the application simply sends a 403 Forbidden response. The intermediary shouldn’t actually do anything in this case (though maybe it could usefully add a logout link to that message). I only mention this because some systems use 401 for Forbidden, which causes no end of problems.

While some applications allow for this kind of authentication scheme, many do not. However, the scheme is general enough that I think it is justifiable that applications could be patched to work like this.
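The intermediary's side of that protocol can be sketched as a piece of WSGI middleware. This is a simplified illustration, not how repoze.who or Deliverance do it: the login URL, the get_username callable, the demo application, and the WWW-Authenticate value are all placeholders of mine, and the header signing mentioned above is left out entirely.

```python
LOGIN_URL = "/login"  # hypothetical login page


class AuthIntermediary:
    """WSGI middleware sketch of the intermediary's side of the protocol."""

    def __init__(self, app, get_username):
        self.app = app
        # Hypothetical callable: environ -> username or None (e.g. from a cookie).
        self.get_username = get_username

    def __call__(self, environ, start_response):
        # Never trust a client-supplied X-Remote-User header.
        environ.pop("HTTP_X_REMOTE_USER", None)
        user = self.get_username(environ)
        if user is not None:
            environ["HTTP_X_REMOTE_USER"] = user

        captured = {}

        def capture(status, headers, exc_info=None):
            # Simplification: the legacy write() callable is not supported.
            captured["status"] = status
            captured["headers"] = headers

        app_iter = self.app(environ, capture)
        try:
            body = list(app_iter)
        finally:
            if hasattr(app_iter, "close"):
                app_iter.close()

        if captured["status"].startswith("401"):
            # The application asked for a login; start the login process.
            start_response("302 Found", [("Location", LOGIN_URL)])
            return [b""]
        # Everything else, including 403 Forbidden, passes through untouched.
        start_response(captured["status"], captured["headers"])
        return body


def demo_app(environ, start_response):
    """Hypothetical application: anonymous -> 401, anyone but bob -> 403."""
    user = environ.get("HTTP_X_REMOTE_USER")
    if user is None:
        start_response("401 Unauthorized", [("WWW-Authenticate", "Intermediary")])
        return [b"login required"]
    if user != "bob":
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello bob"]
```

Note that the application never sees a login form or a cookie. It only reads one header and chooses between 200, 401, and 403.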

This handles shared authentication, but the only information handed around is a username. Information about the user (the real name, email, homepage, permission roles, etc.) is not shared in this model.

You could add something like an internal location to the username. E.g.: X-Remote-User: bob; info_url=http://mysite.com/users/bob.xml. It would be the application’s responsibility to make a subrequest to fetch that information. This can be somewhat inefficient, though with appropriate caching perhaps it would be fine. But many applications want very much to have a complete record of all users. Changing this is likely to be much harder than changing the authentication scheme. A more feasible system might be something on the order of what is described in Indexing the Entire Site: provide a complete listing of the site as well as events when users are created, updated, or deleted, and allow applications to maintain their own private but synced databases of users.
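Reading such a header on the application side is not much work. A minimal sketch, assuming the semicolon-separated form shown above (the function name is mine, and a URL that itself contains a semicolon would need proper quoting, which this ignores):

```python
def parse_remote_user(value):
    """Split 'bob; key=value; ...' into a username and a dict of parameters."""
    username, _, rest = value.partition(";")
    params = {}
    for part in rest.split(";"):
        if "=" in part:
            # Split on the first "=" only, so "=" inside a URL survives.
            key, _, val = part.partition("=")
            params[key.strip()] = val.strip()
    return username.strip(), params


user, info = parse_remote_user("bob; info_url=http://mysite.com/users/bob.xml")
```

Fetching info_url is the subrequest mentioned above, and that is where the caching would have to go.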

A common permission system is another level of integration. One way of handling this would be if applications had a published set of actions that could be performed, and the person integrating the application could map actions to roles/groups on the system.

Cross-cutting Functionality

This item requires a bit of explanation. This is functionality that cuts across multiple parts of the site. An example might be comments, where you want a commenting system to be applicable to a variety of entities (though probably not all entities). Or you might want page-update notification, or to provide a feed of changes to the entity.

You might also want to include some request logger like Google Analytics to all pages, but this is already handled well by Deliverance theming. Deliverance’s aggregation handles universal content well, but it doesn’t handle content (or subrequests) that should only be present in a portion of pages.

One possible way to address this is transclusion, where a page can specifically request some other resource to be included in the page. A simple subrequest could accomplish this, but many applications make it relatively easy to include some extra markup (e.g., by editing their templates) but not so easy to do something like a subrequest. We’ve written a product Transcluder to use an HTML format to indicate transclusion.

It’s also possible using Deliverance that you could implement this functionality without any application modification, though it means added configuration — an application written to be inserted into a page via Deliverance, and a Deliverance rule that plugs everything together (but if written incorrectly would have to be debugged).

Other Conventions

In addition to this, other platform-like conventions would make the life of the integrator much easier.

Template Customization

While Deliverance handles the look-and-feel of a page, it leaves the inner chunk of content to the application. If you want to tweak something small you will still need to customize the template of the application.

It would be wonderful if applications could report on what files were used in the construction of a request, and used a common search path so you could easily override those files.

Backups and Other Maintenance

Process management can be handled by something like Supervisor, and maybe in the future Deliverance will even embed Supervisor.

But even then, regular backups of the system are important. Typically each application has its own way of producing a backup. Conventions for producing backups would be ideal. Additional conventions for restoring backups would be even better.

Many systems also require periodic maintenance — compacting databases, checking for any integrity problems, etc. Some unified cron-like system might be handy, though it’s also workable for applications to handle this internally in whatever ad hoc way seems appropriate.

Common Error Reporting

With a system where one of many components can fail, it’s important to keep track of these problems. If errors just end up in one of 10 log files, it’s unlikely anyone is closely tracking them.

One product we’re working on to help with this is ErrorEater, which works along with Supervisor. Applications have to be modified to emit errors in a specific format that Supervisor understands, but this is generally not too difficult.

Farming

Application farming is when one instance of an application can support many "sites". These might be sites with their own domains, or just distinct projects. Examples are Trac, which supports multiple projects in one instance, or WordPress MU which supports many WordPress instances running off a single database and code base.

It would be nice if you could add a simple header to a request, like X-Project-Name: foo and that would be used by all these products to select the site (or sub-site or project or any other organization unit). Then mapping domain names, paths, or other aspects of a request to the project could be handled once and the applications could all consistently consume it.
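A sketch of the "handled once" part, again as WSGI middleware. The host names, the mapping, and the echo application are invented; X-Project-Name is the header suggested above, which a WSGI application sees as HTTP_X_PROJECT_NAME:

```python
# Hypothetical mapping from domain names to project names.
HOST_TO_PROJECT = {"foo.example.org": "foo", "bar.example.org": "bar"}


class ProjectSelector:
    """Set X-Project-Name from the request's Host header."""

    def __init__(self, app, host_map):
        self.app = app
        self.host_map = host_map

    def __call__(self, environ, start_response):
        # Drop any value the client sent; only this layer may set it.
        environ.pop("HTTP_X_PROJECT_NAME", None)
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        project = self.host_map.get(host)
        if project is not None:
            environ["HTTP_X_PROJECT_NAME"] = project
        return self.app(environ, start_response)


def echo_project(environ, start_response):
    """Hypothetical application that just reports the selected project."""
    project = environ.get("HTTP_X_PROJECT_NAME", "(none)")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [project.encode("utf-8")]


site = ProjectSelector(echo_project, HOST_TO_PROJECT)
```

The mapping could just as well look at the path or anything else in the request. The applications behind it only ever consume the one header.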

(Internally for openplans.org we’re using X-OpenPlans-Project and custom patches to several projects to support this, but it’s all ad hoc.)

Footnotes

[*] This isn’t entirely true, Deliverance internally uses WSGI which is a Python-level abstraction of HTTP calls.
[†] At different times in the past, in an experimental branch right now, and potentially integrated in the future, Deliverance has been compiled down to XSLT rules. So Deliverance could be seen even as a simple transformation language that compiles down to XSLT.

HTML
Programming
Python
Web

My Experience Writing a Build System

Lately there’s been some interest in build processes among various people — Vellum was announced a while back, Ben has been looking for a tool and looking at Fabric, and Kevin announced Paver. At the same time zc.buildout is starting to gain some users outside of the Zope world, and I noticed Minitage as an abstraction on top of zc.buildout.

A while ago I started working on a build project for Open Plans called fassembler. I think the result has been fairly successful and maintainable, and I thought I’d share some of my own reflections on that tool.

Update: what we were trying to accomplish

I didn’t make it clear in the post just what we were trying to do, and what this build system would accomplish.

Our site (openplans.org) is made up of several separate servers with an HTML-rewriting proxy on the front end. We have a Zope server running a custom application, Apache running WordPress MU, and some servers running Pylons or other Python web applications for portions of our site. We needed a way to consistently reproduce this entire stack, all the pieces, plugged together so that the site would actually work. Two equally important places where we had to reproduce the stack are for developer rigs and the production site.

Our code is primarily Python and we use a lot of libraries, developed both internally and externally. Setting up the site is primarily a matter of installing the right libraries and configuration and setting up any databases (both a ZODB database and several MySQL databases). We use a few libraries written in C, but distutils handles the compilation of those pretty transparently.

For this case we really don’t care about build tools that focus on compilation. We don’t care about careful dependency tracking because we are compiling very little software.

make doesn’t make sense

Update 2: If you think the make model makes lots of sense, read the preceding section — it makes sense for a different problem set than what we’re doing.

We initially had a system based on BuildIt, which is kind of like make with Python as the control code. It wasn’t really a good basis for our build tool, and I think it added a lot of confusion, compounded by the fact that we weren’t quite sure what we wanted our build to do. Ultimately I think the make model of building doesn’t make sense.

The make model is based on the idea that you really want to save work. So you detect changes and remake things only as necessary. For compilation this might make sense, because you edit code and recompile a lot and it’s tedious to wait. But we are building a website, and installing software, and none of that style of efficiency matters. make-style detection of work to be done doesn’t even save any time. But it does make the build more fragile (e.g., if you define a dependency incorrectly) and much harder to understand, and you constantly find yourself wiping the build and starting from scratch because you don’t trust the system.

The metaphor for the new build system was much simpler: do a list of things, top to bottom. There’s no effort into detecting changes in the build, or changes in the settings, or anything else.
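In code the whole model fits in a few lines. This is not fassembler's actual API, just a sketch of the metaphor with made-up names:

```python
class Task:
    """A named step; `action` is any callable that does the work."""

    def __init__(self, name, action):
        self.name = name
        self.action = action


def run_build(tasks, log=print):
    """Run every task in order; the first exception stops the build."""
    for number, task in enumerate(tasks, 1):
        log(f"[{number}/{len(tasks)}] {task.name}")
        task.action()  # no change detection, no dependency graph


done = []
run_build(
    [
        Task("create environment", lambda: done.append("env")),
        Task("install libraries", lambda: done.append("libs")),
        Task("write configuration", lambda: done.append("config")),
    ],
    log=lambda message: None,
)
```

There is nothing to get wrong about ordering: the order you read is the order that runs.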

Do things carefully

In the build system almost all actions go through the filemaker module. This is kind of a file abstraction library. But the goals are entirely different than convenience: the goal is transparency and safety. In contrast Paver uses path.py for convenience, but I’m not sure what the win would be if we used a model like that.

filemaker itself is heavily tied to the framework that it’s written for, specifically user interaction and logging. Most tasks just do things, and rely on filemaker to detect problems and ask the user questions. For example, every time a file is written, it checks if the file exists, and if it has the same content. If it exists with other content, it asks the user about what to do. It doesn’t overwrite files without asking (at least by default). I think this makes the tool more humane as the default behavior for a build is to be careful and transparent. The build author has to go out of their way to make things difficult.

Many zc.buildout recipes will blithely overwrite all sorts of files which always made me very uncomfortable with the product. It’s the recipes in zc.buildout which do this, not the buildout framework itself, but because buildout made overwriting the easy thing to do, and didn’t start with humane conventions or tools, this behavior is the norm.

What I think filemaker most accomplished was the ability to do file operations while also asserting the expected state of the system, and so makes build bugs noticeable earlier instead of getting a build process that finishes successfully but creates a buggy build, or having an exception show up far from where the error was originally introduced.
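A stripped-down sketch of that write-carefully behavior follows. It is not filemaker's real interface (the function name, the prompt, and the return values are mine), but it shows the check-then-ask flow:

```python
import os


def ensure_file(path, content, ask=input):
    """Write `content` to `path`, but never silently clobber other content."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = f.read()
        if existing == content:
            return "unchanged"  # the expected state already holds
        answer = ask(f"{path} exists with different content. Overwrite? [y/N] ")
        if answer.strip().lower() != "y":
            return "kept"  # the user's work in progress survives
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return "written"
```

Passing `ask` as a parameter keeps the interaction in one place, so a non-interactive build can substitute a function that always refuses.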

Also, because it won’t overwrite your work in progress, the build has been spared the deep feelings of hatred it would engender if it did. It’s hard to detect this absence of hatred, but I know that I’ve felt it with other systems.

Update: a corollary: ignore no errors

One question you might wonder about: why not a shell script? We did prototype some things as shell scripts, but we’ve consistently moved to Python at some point, even things that seemed really trivial. The problem with shell scripts is they have horribly bad behavior with respect to errors. Ignoring errors is really really easy, noticing errors is really hard.

This is absolutely unacceptable for builds. Builds must not ignore errors. The build may mostly work despite an error. It might be totally broken, but the error message is lost in all sorts of useless output. The error message probably makes no sense. The context is lost. No suggestion is given to the user.

When builds work, that’s great. Builds do not always work. They always fail sometimes, and some poor sucker (usually in some hot potato-like arrangement) has to figure out what went wrong. You have to plan for these problems.

Everything in the build tries to be careful about errors. All places where it is not, it is a bug. The resolution isn’t to see something appear to work, but create a broken build, and say "oh, you forgot to set X". The resolution is to make sure when you forget to set X it gives you an error that tells you to set X.
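Two small helpers illustrate the principle. They are a sketch with names of my own choosing, not code from the build: one refuses to continue without a setting and says what to set, the other refuses to let a failed command go unnoticed and keeps the context.

```python
import subprocess


class BuildError(Exception):
    """An error the person running the build is expected to read and act on."""


def require_setting(settings, name, hint):
    """Return settings[name], or fail with a message saying what to set."""
    value = settings.get(name)
    if not value:
        raise BuildError(f"Setting {name!r} is not set. {hint}")
    return value


def run(command, doing):
    """Run a command; a non-zero exit is an error that carries its context."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise BuildError(
            f"While {doing}: {command!r} exited with status {result.returncode}\n"
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Compare this with a shell script, where the equivalent of `run` is the default only if you remember to ask for it, and the "while doing what?" part is simply gone.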

This is one of the more important and more often ignored principles of a good build/deployment system. Maybe it’s gotten better, but when I first used zc.buildout (very early in its development) the poor handling of errors was by far the biggest problem and it left me with a bad taste in my mouth. easy_install and setuptools in general are also very flawed in this respect.

Log interesting things

I tried to make a compromise between logging very verbosely, and being too quiet. As a user, I want to see everything interesting and leave out everything boring. Determining interesting and boring can be a bit difficult, but really just requires some attention and tweaking.

To make it possible to visually parse the output of the tool I found both indentation and color to be very useful. Indentation is used to represent subtasks, and color to make sections and warnings stand out.
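Something along these lines is enough to get both effects. The class and the particular ANSI color choices are illustrative assumptions, not the tool's actual logger:

```python
import sys
from contextlib import contextmanager


class BuildLog:
    """Indent messages by subtask depth; color sections and warnings."""

    COLORS = {"section": "\033[1;34m", "warning": "\033[1;33m"}
    RESET = "\033[0m"

    def __init__(self, stream=sys.stdout, color=True):
        self.stream = stream
        self.color = color
        self.depth = 0

    def _write(self, message, kind=None):
        text = "  " * self.depth + message
        if self.color and kind in self.COLORS:
            text = self.COLORS[kind] + text + self.RESET
        self.stream.write(text + "\n")

    def info(self, message):
        self._write(message)

    def warn(self, message):
        self._write("Warning: " + message, "warning")

    @contextmanager
    def section(self, title):
        """Everything logged inside the block is indented one level deeper."""
        self._write(title, "section")
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1
```

A task then wraps its work in `with log.section("Installing libraries"):` and anything it logs, including warnings from nested subtasks, lines up underneath.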

The default verbosity setting is not to be completely quiet. Silence is a Unix convention that just doesn’t work for build tools. Silence gets you interactions like this:

$ build-something target-directory/
(much time passes)
Error: cannot write /home/ianb/builds/20080426/target-directory/products/AuxInput/auxinput/config/configuration.xml

Why did it want to write that file? Why can’t it write that file? Is the build buggy? Did I misconfigure it? Does the directory exist?

The typical way of handling this is either to run the build again with logging set up or otherwise make it more verbose, or to get in the habit of always running it verbose.

Mixing code and configuration

BuildIt, which we were using before, had the ability to put variables in settings, and you could read an option from another section with something like ${section/option}. It was limited to simple (but recursive) variable substitution, and had some clever but very confusing rules that created a kind of inheritance.

I liked the ability to do substitution, but wasn’t happy with the compromise BuildIt made. I wasted a lot of time trying to figure out the context of substitutions. So, I saw two directions. One was to remove the cleverness and just do simple substitution. This is the choice zc.buildout made. The other was to go whole-hog. With a bit of trepidation I decided to go for it, and I made the choice to treat all configuration settings as Tempita templates. All configuration is generally accessed via config.setting_name, and that lazily interpolates the setting (it took me quite a while to figure out how to avoid infinite loops of substitution). Because evaluation is done lazily settings can depend on each other and be overridden and have lots of code in defaults (e.g., a default that is calculated based on the value of another setting), and it works out okay. Most settings just ended up having a smart default, and as a result very little tweaking of the configuration is necessary.
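To show the shape of the idea without depending on Tempita, here is a stand-in that evaluates {{expression}} markers with a regular expression and eval. It is a sketch of lazy interpolation with loop detection, not the real implementation, and eval is only acceptable because the configuration is trusted code anyway:

```python
import re

_EXPR = re.compile(r"\{\{(.*?)\}\}")


class LazyConfig:
    """Settings are template strings, interpolated only when accessed."""

    def __init__(self, **raw):
        self._raw = raw
        self._resolving = []  # settings currently being interpolated

    def __getattr__(self, name):
        # Only called for names that are not ordinary attributes.
        if name.startswith("_") or name not in self._raw:
            raise AttributeError(name)
        if name in self._resolving:
            chain = " -> ".join(self._resolving + [name])
            raise ValueError("circular setting reference: " + chain)
        self._resolving.append(name)
        try:
            # Each {{...}} is a Python expression that can see `config`.
            return _EXPR.sub(
                lambda m: str(eval(m.group(1), {"config": self})),
                self._raw[name],
            )
        finally:
            self._resolving.pop()


config = LazyConfig(
    base="/srv/site",
    port="8000",
    var="{{config.base}}/var",
    admin_url="http://localhost:{{int(config.port) + 1}}/",
)
```

Because nothing is evaluated until it is read, overriding `base` or `port` changes every setting derived from them, and the `_resolving` stack turns an infinite loop of substitution into an error that names the cycle.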

Somewhat ironically the result was a kind of atrophying of the settings, because no one actually set them, instead we just tweaked the defaults to get it right. Now I’m not entirely sure what exactly the "settings" are setting, or who they should really belong to. To the build? To the tasks? While this is conceptually confusing, in practice it isn’t so bad. This mixing of code and configuration has been distinctly useful, and not nearly as problematic to debug as I worried it would be. In some ways it was a way of building lambda into every string, and the lazy evaluation of those strings has been really important. But it’s not clear if they are really settings.

Would normal string interpolation have been enough (e.g., with string.Template)? I’m pretty sure it wouldn’t have been. The ability to do a little math or use functions that read things from the environment has been very important.

Managing Python libraries

fassembler uses virtualenv for building each piece of the stack. Generally it creates several environments and installs things into them — it doesn’t run inside the environments itself. This works fine.

zc.buildout in comparison does some fancy stuff to scripts where specific eggs are enabled when you run a script. Each script has a list of all the eggs to enable. You can’t install things or manage anything manually, even to test — you always have to go through buildout, and it will regenerate the scripts for you. zc.buildout was implemented at the same time as workingenv (the predecessor to virtualenv), and I actually finished virtualenv with fassembler in mind, so I can’t blame zc.buildout for not using virtualenv. That said, I don’t think the zc.buildout system makes any sense. And it’s really complicated and has to access all sorts of not-really-public parts of easy_install to work.

Isolation is only the start. easy_install makes sure each library’s claimed dependencies are satisfied. You might then think easy_install would do all the work to make the stack work. It is nowhere close to making the stack work. setup.py files can/should contain the bare minimum that is known to be necessary to make a package work. But they can’t predict future incompatibilities, and they can’t predict interactions. And you don’t want all your packages changing versions arbitrarily. If you work with a lot of libraries you need those libraries to be pinned, and only update them when you want to update them, not just because an update has been released.

So for each piece of the stack we have a set of "requirements". This is a flat file that indicates all the packages to install. They can have explicit versions, far more restrictive than anything you should put in setup.py. It also can check out from svn, including pinning to revisions. This installation plan can go in svn, you can do diffs on it, you can branch and copy and do whatever. Maybe at some point we could use it to keep cached copies of the libraries. For now it mostly uses easy_install (and python setup.py develop for checkouts).

In parallel we have a command-line program for just installing packages using files like this, called PoachEggs. I want to make this better, and have fassembler use it, but I mostly note it because it implements a feature that can "freeze" all your packages to a requirements file. You take a working build and freeze its requirements, giving explicit (==) versions for packages, and pin all the svn checkouts to a revision, so that the frozen requirements file will install exactly the packages you know work.
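The version-pinning half of a freeze is simple to sketch with today's standard library. This is not PoachEggs' code, it uses importlib.metadata (which did not exist when this was written), and it does not attempt the svn-revision pinning at all:

```python
from importlib.metadata import distributions


def freeze():
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and dist.version:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


requirements_text = "\n".join(freeze())
```

Written to a file and checked into version control, that text is the installation plan: you can diff it, branch it, and reinstall from it later.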

An alternative to this is what the Repoze guys are doing, which is to create a custom index that only includes the versions of libraries that you want. You then tell easy_install to use this instead of PyPI. It works with zc.buildout (and anything that uses easy_install), but I can’t get excited about it compared to a simple text file. I also want svn checkouts instead of creating tarballs of the checkout. I like an editable environment, because the build is just as much to support developers as to support deployment.

The structure

A big part of the development of fassembler was nailing down the structure of our site, and moving to use tools like supervisor to manage our processes. A lot of these expectations are built into the builds and fassembler itself. This is part of what makes the build Work — the pieces all conform to a common structure with some basic standards. But this isn’t the build tool itself, it’s just a set of conventions.

I don’t know quite what to make of this. Extracting the conventions from the builds leads to a situation where you can more easily misconfigure things, and the installation process ends up being more documentation-based instead of code-based. We do not want to rely on documentation, because documentation generally exists to explain a flaw in the build process. It’s faster for everyone if the code is just right. Maybe these conventions could be put into code, separate from the build. The abstraction worries me, though: is it too much to keep track of?

What we don’t get right

The biggest problem is that fassembler is our own system and no one else uses it. If someone wants to use just a piece of our stack they either have to build it manually or they have to use our system which is meant to build all our pieces together with our conventions. There’s some pressure to use zc.buildout to make pieces more accessible to other Zope users. We’ve also found things that build with zc.buildout that we’d like to use (e.g., setups for varnish).

We haven’t figured out how to separate the code for building our stuff from the build software itself. There’s a bootstrapping problem: you need to get the build code to build a project, and so it can’t be part of the project you are building. zc.buildout uses configuration files (that aren’t code, so they lack the bootstrap problem) and it uses recipes (a kind of plugin) and has gone to quite a bit of effort to bootstrap everything. virtualenv also supports a kind of bootstrap which we use to do the initial setup of the environment, but it doesn’t support code organization in the style of zc.buildout.

Builds are also fairly tedious to write. They aren’t horrible, but they feel much longer than they should be. Part of their length, though, is that over time we put in more code to guard against environment differences or build errors, and more code to detect the environment. But compared to zc.buildout’s configuration files, it doesn’t feel quite as nice, and if it’s not as nice sometimes people are lazy and do ad hoc setups.

The future

We haven’t really decided, but as you might have noticed zc.buildout gets a lot of attention here. There are quite a few things I don’t like about it, but a lot of these have to do with the recipes available. We don’t have to use the standard zc.buildout egg installation recipe. In fact that would be first on the chopping block, replaced with something much simpler that assumes you are running inside a virtualenv environment, and probably something that uses requirement files.

Also, we could extract filemaker into a library and recipes could use that. Possibly logging could be handled the same way (the logging module just isn’t designed for an interactive experience like a build tool). Then if we used other people’s recipes we might feel grumpy, since they’d use neither filemaker nor our logging, but it would still work. And our recipes would be full of awesome. The one thing I don’t think we could do is introduce the template-based configuration. Or, if we did, it would be hard.

That said, there is a very different direction we could go, one inspired more by App Engine. In that model we build files under a directory, and that directory is the build. Wherever you build, you get the same files, period. All paths would be relative. All environmental detection would happen in code at runtime. Things that aren’t "files" exactly would simply be standard scripts. E.g., database setup would not be done by the build, but would be a script put in a standard location.
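A rough sketch of what "all paths would be relative" could mean in code: the application finds the build directory at runtime from a file known to live inside it, and resolves everything else against that root. The function names and the directory layout are assumptions for illustration, not anything fassembler or App Engine provides.

```python
import os


def build_root(anchor_file):
    """Return the build directory, found from a file that lives in it."""
    return os.path.dirname(os.path.abspath(anchor_file))


def build_path(root, *parts):
    """Resolve a path stored relative to the build root.

    Paths that would land outside the build directory are refused,
    since the directory is supposed to be the whole build.
    """
    root = os.path.abspath(root)
    path = os.path.abspath(os.path.join(root, *parts))
    if os.path.commonpath([root, path]) != root:
        raise ValueError("%r escapes the build directory" % (parts,))
    return path
```

A script at the top of the build could call build_root(__file__) once and pass the result around, so moving the directory or copying it to another machine needs no rebuild.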

This second file-based model of building is very different from the principles behind zc.buildout. zc.buildout requires rebuilding when anything changes, and does so without apology. It requires rebuilding to move the directories, or to move to different machines. Using a file-based model requires a lot of push-back into the products themselves. Applications have to be patched to accept smart relative paths. They have to manage themselves a lot more, detect their environment, handle any conflicts or ambiguities, and be graceful about stuff like databases, because the files have to be universal. In an extreme case I could imagine going so far as to only keep a template for a configuration file, and write the real configuration file to a temporary location before starting a server (if the server cannot be patched to accept runtime location information).
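The extreme case of keeping only a configuration template could look roughly like this. It is a sketch under assumptions: the template uses string.Template placeholders such as $port, and the function names are invented for the example.

```python
import os
import string
import tempfile


def render_config(template_text, **values):
    """Fill $name placeholders; a missing value raises KeyError."""
    return string.Template(template_text).substitute(values)


def write_runtime_config(template_path, **values):
    """Render the template into a fresh temporary file.

    Returns the path of the real configuration file, which the caller
    hands to the server at startup and deletes afterwards.
    """
    with open(template_path) as f:
        text = render_config(f.read(), **values)
    fd, path = tempfile.mkstemp(suffix=".conf")
    with os.fdopen(fd, "w") as out:
        out.write(text)
    return path
```

Only the template lives in the build directory; values like the port or the absolute root are detected when the server is launched.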

So this is the choice ahead. I’m not sure when we’ll make this choice (if ever!) — build systems are dull and somewhat annoying, but they are no more dull and annoying than dealing with a poor build system. Actually, they are definitely less dull than working with a build system that isn’t good enough or powerful enough, or one that simply lacks the TLC necessary to keep builds working. So no choice is a choice too, and maybe a bad choice.

Programming
Python
Web


pdb in the browser

People have asked me a few times about evalexception and pdb — they’d like to be able to use something like pdb through the browser, stepping through code.

The technique I used for tracebacks wouldn’t really work for pdb. For a traceback I saved all the information from the frames — mostly just the local variables — and then let the user interact with that through the browser. But with pdb you pause the application part way through waiting for user input, and the routine only completes much later.
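For comparison, the traceback approach amounts to something like the sketch below: walk the traceback once, copy what you need out of each frame, and let later requests work against the saved copy. This is an illustration of the idea only, not the actual evalexception code, and the dictionary layout is made up.

```python
import sys


def snapshot_frames(tb):
    """Copy the interesting parts of each frame in a traceback."""
    frames = []
    while tb is not None:
        frame = tb.tb_frame
        frames.append({
            "function": frame.f_code.co_name,
            "filename": frame.f_code.co_filename,
            "lineno": tb.tb_lineno,
            # Shallow copy: later requests evaluate against this dict.
            "locals": dict(frame.f_locals),
        })
        tb = tb.tb_next
    return frames


def snapshot_current_exception():
    """Snapshot the exception being handled (call inside an except block)."""
    return snapshot_frames(sys.exc_info()[2])
```

Because everything is captured after the failure, the request has already finished by the time anyone looks at it. A later browser request can run something like eval(expression, {}, saved["locals"]) without the original code being paused anywhere.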

While writing WaitForIt I played around with techniques to deal with very slow WSGI applications. Not that hard, really — you launch every request in a new thread, and you manage those requests in an application of its own. So I started thinking about pdb again, and it started seeming feasible. Whenever the app reads from stdin it goes into an interactive mode, showing you what comes out on stdout and letting you add input to stdin. It’s nothing specific to pdb really.
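The core trick can be sketched without any web code at all. The app runs in its own thread and is handed a stdin object whose readline() blocks until someone else supplies a line. This is not the WebError implementation; QueueStdin, run_in_thread and toy_app are hypothetical names, and a real version would feed input from browser requests. (pdb.Pdb accepts stdin and stdout arguments, so a debugger could in principle be pointed at objects like these.)

```python
import queue
import threading


class QueueStdin:
    """File-like stdin whose readline() blocks until input is fed."""

    def __init__(self):
        self._lines = queue.Queue()
        self.waiting = threading.Event()

    def readline(self):
        self.waiting.set()        # tell the manager the app is paused
        line = self._lines.get()  # block until feed() is called
        self.waiting.clear()
        return line

    def feed(self, text):
        """Supply one line of input, e.g. from a browser form post."""
        self._lines.put(text if text.endswith("\n") else text + "\n")


def run_in_thread(app, stdin, stdout):
    """Start the app in its own thread so the manager stays responsive."""
    thread = threading.Thread(target=app, args=(stdin, stdout), daemon=True)
    thread.start()
    return thread


def toy_app(stdin, stdout):
    """Stand-in for a request that stops part way through to ask a question."""
    stdout.write("name? ")
    name = stdin.readline().strip()
    stdout.write("hello %s\n" % name)
```

The managing application checks stdin.waiting, shows whatever has accumulated on stdout, and calls stdin.feed() with the user's input. Nothing in it knows about pdb.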

So, with a bit of hacking, I added it into WebError (which is an extraction of the exception handling in Paste). To give the demo a try, do:

hg clone http://knowledgetap.com/hg/weberror/
cd weberror
python setup.py develop
# You need Paste trunk:
easy_install Paste==dev
python weberror/pdbcapture.py

What you’ll see is not polished, it’s just working, but since I mostly did it to see if I could do it, that’s good enough for me.

Python
Web
