WebOb decorator

Lately I’ve been writing a few applications (e.g., PickyWiki and a revisiting a request-tracking application VaingloriousEye), and I usually use no framework at all. Pylons would be a natural choice, but given that I am comfortable with all the components, I find myself inclined to assemble the pieces myself.

In the process I keep writing bits of code to make WSGI applications from simple WebOb -based request/response cycles. The simplest form looks like this:

from webob import Request, Response, exc

def wsgiwrap(func):
    def wsgi_app(environ, start_response):
        req = Request(environ)
        try:
            resp = func(req)
        except exc.HTTPException, e:
            resp = e
        return resp(environ, start_response)
    return wsgi_app

@wsgiwrap
def hello_world(req):
    return Response('Hi %s!' % (req.POST.get('name', 'You')))

But each time I’d write it, I change things slightly, implementing more or less features. For instance, handling methods, or coercing other responses, or handling middleware.

Having implemented several of these (and reading other people’s implementations) I decided I wanted WebOb to include a kind of reference implementation. But I don’t like to include anything in WebOb unless I’m sure I can get it right, so I’d really like feedback. (There’s been some less than positive feedback, but I trudge on.)

My implementation is in a WebOb branch, primarily in webob.dec (along with some doctests).

The most prominent way this is different from the example I gave is that it doesn’t change the function signature, instead it adds an attribute .wsgi_app which is WSGI application associated with the function. My goal with this is that the decorator isn’t intrusive. Here’s the case where I’ve been bothered:

class MyClass(object):
    @wsgiwrap
    def form(self, req):
        return Response(form_html...)

    @wsgiwrap
    def form_post(self, req):
        handle submission

OK, that’s fine, then I add validation:

@wsgiwrap
def form_post(self, req):
    if req not valid:
        return self.form
    handle submission

This still works, because the decorator allows you to return any WSGI application, not just a WebOb Response object. But that’s not helpful, because I need errors…

@wsgiwrap
def form_post(self, req):
    if req not valid:
        return self.form(req, errors)
    handle submission

That is, I want to have an option argument to the form method that passes in errors. But I can’t do this with the traditional wsgiwrap decorator, instead I have to refactor the code to have a third method that both form and form_post use. Of course, there’s more than one way to address this issue, but this is the technique I like.

The one other notable feature is that you can also make middleware:

@wsgify.middleware
def cap_middleware(req, app):
    resp = app(req)
    resp.body = resp.body.upper()
    return resp

capped_app = cap_middleware(some_wsgi_app)

Otherwise, for some reason I’ve found myself putting an inordinate amount of time into __repr__. Why I’ve done this I cannot say.

Programming
Python
Web

Comments (11)

Permalink

Treating configuration values as templates

A while back I described fassembler and one of the things I liked in it is how the configuration works. It uses a conventional declarative INI-style but also allows arbitrary code, so that defaults can be based on each other.

Here’s a basic example of a default configuration:

[some_app]
port_offset = 10
port = {{int(section.DEFAULT['base_port'])+int(port_offset)}}

Then if another configuration file defines base_port then this will all resolve. You can do this in Python, but you don’t get sections, and you have to define everything in just the right order. So while base_port will probably be defined in a deployment-specific configuration, it has to be defined before these other derivative settings are defined. On the other hand, you want deployment-specific configuration to take precedence… so there’s really no good ordering.

Anyway, the implementation really isn’t that hard. I use Tempita as the templating language because, well, I wrote it, and because it’s simple and appropriate for small strings. For the configuration parsing, ConfigParser will do.

Here’s what the basic code looks like in ConfigParser:

from ConfigParser import ConfigParser
from tempita import Template

class TempitaConfigParser(ConfigParser):

    def _interpolate(self, section, option, rawval, vars):
        ns = _Namespace(self, section, vars)
        tmpl = Template(rawval, name='%s.%s' % (section, option))
        value = tmpl.substitute(ns)
        return value

Actually instead of using tempita.Template, we could just do eval(rawval, {}, ns), it would just require a lot more quoting (every value would have to be a valid Python expression). Either with that or Tempita the implementation of _Namespace will look the same.

Here’s a simple implementation:

from UserDict import DictMixin

class _Namespace(DictMixin):
    def __init__(self, config, section, vars):
        self.config = config
        self.section = section
        self.vars = vars

    def __getitem__(self, key):
        if key == 'section':
            return _Section(self)
        if self.config.has_option(self.section, key):
            return self.config.get(self.section, key)
        if vars and key in self.vars:
            return self.vars[key]
        raise KeyError(key)

   def __setitem__(self, key, value):
       if self.vars is None:
           self.vars = {key: value}
       else:
           self.vars[key] = value

We’ve introduced a magic variable section, which is used to refer to other sections. It looks like this:

class _Section(object):
    def __init__(self, namespace):
        self._namespace = namespace

    def __getattr__(self, attr):
        if attr.startswith('_'):
            raise AttributeError(attr)
        return _Namespace(self._namespace.config, attr,     self._namespace.vars)

With these I think you get many of the benefits of using Python code as your configuration format, while still having the benefits of a more declarative approach to configuration, one that allows for forward and backward references.

A full implementation has several more things than I show here, but you can see the full example in my recipes. It also has an example of using INITools instead of ConfigParser to give more accurate filenames and line numbers when there is an exception, while otherwise using the same interface.

Programming

Comments (1)

Permalink

Woonerf and Python

At TOPP there’s a lot of traffic discussion, since a substantial portion of the organization is dedicated to Livable Streets initiatives. One of the traffic ideas people have gotten excited about is Woonerf. This is a Dutch traffic planning idea. In areas where there’s the intersection of lots of kinds of traffic (car, pedestrian, bike, destinations and through traffic) you have to deal with the contention for the streets. Traditionally this is approached as a complicated system of rules and right-of-ways. There’s spaces for each mode of transportation, lights to say which is allowed to go when (with lots of red and green arrows), crosswalk islands, concrete barriers, and so on.

A problem with this is that a person can only pay attention to so many things at a time. As the number of traffic controls increases, the controls themselves dominate your attention. It’s based on the ideal that so long as everyone pays attention tothe controls, they don’t have to pay attention to each other. Of course, if there’s a circumstance the controls don’t take into account then people will deviate (for instance, crossing somewhere other than the crosswalk, or getting in the wrong lane for a turn, or the simple existance of a bike is usually unaccounted for). If all attention is on the controls, and everyone trusts that the controls are being obeyed, these deviations can lead to accidents. This can create a negative feedback cycle where the controls become increasingly complex to try to take into account every possibility, with the addition of things like Jersey barriers to exclude deviant traffic. At least in the U.S., and especially in the suburbs or in complex intersections, this feeling of an overcontrolled and restricted traffic plan is common.

Copenhagen retail street

So: Woonerf. This is an extreme reaction to traffic controls. An intersection designed with the principles of Woonerf eschews all controls. This includes even things like curbs and signage. It removes most cues about behavior, and specifically of the concept of "right of way". Every person entering the intersection must view it as a negotiation. The use of eye contact, body language, and hand signals determines who takes the right of way. In this way all kinds of traffic are peers, regardless of destination or mode of transport. Also each person must focus on where they are right now, and not where they will be a minute from now; they must stay engaged.


Code as Jersey Barrier

So, I was reading a critique of Python where someone was saying how they missed public/private/protected distinctions on attributes and methods. And it occurred to me: Python’s object model is like Woonerf.

Python does not enforce rules about what you must and must not do. There are cues, like leading underscores, the __magic_method__ naming pattern, or at the module level there’s __all__. But there are no curbs, you won’t even feel the slightest bump when you access a "private" attribute on an instance.

This can lead to conflicts. For example, during discussions on installation, some people will argue for creating requirements like "SomeLibrary>=1.0,<2.0", with the expectation that while version 2.0 doesn’t exist, so long as you install something in the 1.x line it will maintain compatibility with your application. This is an unrealistic expectation. Do you and the library maintainer have the same idea about what compatibility means? What if you depend on something the maintainer considers a bug?

Practically, you can’t be sure that future versions of a library will work. You also can’t be sure they won’t work; there’s nothing that requires the maintainer of the library to break your application with version 2.0. This is where it becomes a negotiation. If you decide to cross without a crosswalk (use a non-public API) then okay. You just have to keep an eye out. And library authors, whether they like it or not, need to consider the API-as-it-is-used as much as the API-they-have-defined. In open source in particular, there are a lot of ways to achieve this communication. We don’t use some third party (e.g., a QA team or language features) to enforce rules on both sides (there are no traffic controls), instead the communication is more flat, and speaks as much to intentions as mechanisms. When someone asks "how do I do X?" a common response is: "what are you trying to accomplish?" Often an answer to the second question makes the first question irrelevant.

Woonerf is great for small towns, for creating a humane space. Is it right for big cities and streets, for busy people who want to get places fast, for trucking and industry? I’m not sure, but probably not. This is where a multi-paradigm approach is necessary. Over time libraries have to harden, become more static, innovation should happen on top of them and not in the library. Some times we create third party controls through interfaces (of one kind or another). I suppose in this case there is a kind of negotiation about how we negotiate — there’s no one process for how to build negotiation-free foundations in Python. But it’s best not to harden things you aren’t sure are right, and I’m pretty sure there’s no "right" at this very-human level of abstraction.

Programming

Comments (9)

Permalink

Cultural Imperialism, Technology, and OLPC

A couple posts have got me thinking about cultural imperialism lately: a post by Guido van Rossum about "missionaries" and OLPC not about OLPC at all, a post by Chris Hardie and a speech by Wade Davis.

Some of the questions raised: are we destroying cultures? If so, what can we do about it? Must we be hands off? I will add these questions: is it patronizing to make these choices for other people, no matter how enlightened we try to be? How much change is inevitable? Can we help make the change positive instead of resisting change?

More specifically: what is the effect of OLPC on cultures where it is introduced? Especially small cultures, cultures that have been relatively isolated, cultures that are vulnerable. The internet Quechua community is pretty slim, for example. Introducing the internet into a community will lead the children to favor Spanish more strongly, and identify with that more dominant culture over their family and community culture.

Criticisms like Guido’s are common:

I’m not surprised that the pope is pleased by the OLPC program. The mentality from which it springs is the same mentality which in past centuries created the missionary programs. The idea is that we, the west, know what’s good for the rest of the world, and that we therefore must push our ideas onto the "third world" by means of the most advanced technology available. In past centuries, that was arguably the printing press, so we sent missionaries armed with stacks of bibles. These days, we have computers, so we send modern missionaries (of our western lifestyle, including consumerism, global warming, and credit default swaps) armed with computers.

This kind of criticism is easy, because it doesn’t have any counterproposal. It’s not saying much more than "you all suck" to the people involved.

Cultural imperialism is a genuine phenomena. In an attempt to subjugate or assimilate, the dominant culture may explicitly and cynically enforce its cultural norms, through its religion, requiring all schools to operate in the dominant language, even going as far as suggesting how we arrange ourselves during sex.

But it’s not clear to me that what’s happening now is cultural imperialism. It’s more market-oriented homogenization. Food manufacturers don’t use high-fructose corn syrup because they want to make us fat — they just give us what we want, and they are enabling our latent tendency to become obese. Similarly I think the way culture is spread currently encourages homogeneity, without explicit attempting to destroy culture.

This is where I think a protectionist stance — the idea we should just be hands-off — is patronizing. People aren’t abandoning their cultures because they are stupid and they are being manipulated. People make decisions, what they think is the best decision for themself and their families. These decisions lead them to leave rural areas, learn the dominant language, try to conform through education, and even just lead them to enjoy a dominant culture which is often far more entertaining than a smaller and more traditional culture.

The irony is that once they’ve done this they’ve traded their position for a place in the bottom rung of the dominant society. And it’s true that in many cases they’ve made these decisions because they’ve been forced out of their traditional life by political and legal systems they don’t understand. But to blame it all on oppression is to be blind to the many concrete benefits of our modern world. Corrugated metal roofs are simply superior to thatched roofs, and we can get all romantic about traditional building processes and material independence, but we do so from homes with roofs that don’t leak. Leaking roofs are just objectively unpleasant. And frankly people like TV, you don’t have to tell people to like TV, it just happens.

So I believe that assimilation pressure is natural and inevitable in our times.

What then of technology, of the internet and laptops?

I believe OLPC takes an important stance when it selects open source and open licensing for its content. It is valuing freedom, but more importantly encouraging self-determination, trying to build up a user base that can act as peers in this project, not as simply receivers of first-world largess. But it will be culturally disruptive. And I’m okay with that. In a patriarchal culture, giving girls access to this technology will be destructive to that power structure. Yay! I believe in the moral rightness of that one girl making her own choices, finding her own truths, more than I believe in the validity of the culture she was born into. If you believe people should be able to make their own choices (so long as they are aware of the real consequence of their choices), then you must allow for them to choose to abandon their own cultures for something they find more appealing. They might know better than you if that’s a good choice. I think we all hope that instead they transform their own cultures, but that’s not our choice to make.

What I find unpleasant is if they leave a true identity to find themselves in a place of cultural subservience. If they feel they can’t preserve the part of their culture they most value. Perhaps because of discrimination they feel they must hide their past, or they build up a sense of self-loathing. Perhaps they become isolated, unable to find peers that understand where they come from. And perhaps there is no higher culture at all that they can use to exalt their understanding of the world — do they have a literature? Do they have non-traditional music forms of their own? Do they have a forum where people who share their perspective can have serious discussions? Cultures aren’t destroyed so much as they are starved out of existence.


I think assimilation is inevitable, and can be positive. If we were all able to speak to each other, with some shared second or third language, I think the world would be a better place. I’m not a Christian, but I’m not afraid of anyone knowing The Bible. There’s no piece of culture that I would want to deny from anyone. Each new song, each new book, each new idea… I believe they will all make you a better person, if only in a small way.

And on the internet our culture is cumulative. There’s only so many hours of programming on TV or the radio, only so many pages in a newspaper. On the internet the presence of one kind of culture does not exclude any other. There’s room for a Quechua community as much of any other. But the online Quechua community won’t have exclusive rights to its members like a traditional culture claims — children will live between cultures.

Cumulative culture is not a promise that anyone will care. Languages can still die, cultures can still die, identities become forgotten. If these smaller cultures are going to be preserved, they must adapt to the partially-assimilated status of their members. There must be new art and new ideas and new identities. This is why I believe in the laptop project, because it can enable the creation and sharing of these new ideas. I think it will give smaller cultures a chance to survive — there’s no promises, literature doesn’t write itself, but maybe there is at least a chance.

This is also why I am more skeptical of mobile phones, audio devices, and any device that doesn’t actively enable content creation. Mobile phones are not how culture is made. It let’s people chat, consume information, communicate in a 12-key pidgin. But the mobile phone user is not a peer in a world wide web of information. The mobile phone user lives on a proprietary network, with a proprietary device, and while it perhaps it breaks down some hierarchies through disintermediation, it does so in a transient way. The uptake is certainly faster, but the potential seems so much lower.

I don’t know if OLPC will be successful. That’s as unclear now as ever. But it’s trying to do the right thing, and I think it’s a better chance than most for maintaining or improving the richness of the worlds’ culture.

Non-technical
OLPC
Politics

Comments (22)

Permalink

Modern Web Design, I Renounce Thee!

I’m not a designer, but I spend as much time looking at web pages as the next guy. So I took interest when I came upon this post on font size by Wilson Miner, which in turn is inspired by the 100e2r (100% easy to read) standard by Oliver Reichenstein.

The basic idea is simple: we should have fonts at the "default" size, about 16px, no smaller. This is about the size of text in print, read at a reasonable distance (typically closer up than a screen):

http://blog.ianbicking.org/wp-content/uploads/images/typesize_comparison2.jpg

Also it calls out low-contrast color schemes, which I think are mostly passe, and I will not insult you, my reader, by suggesting you don’t entirely agree. Because if you don’t agree, well, I’m afraid I’d have to use some strong words.

I think small fonts, low contrast, huge amounts of whitespace, are a side effect of the audience designers create for.

This makes me think of Modern Architecture:

http://blog.ianbicking.org/wp-content/uploads/images/300px-seagram.jpg

This is a form of architecture popular for skyscapers and other dramatic structures, with their soaring heights and other such dramatic adjectives. These are buildings designed for someone looking at the building from five hundred feet away. They are not designed for occupants. But that’s okay, because the design isn’t sold to occupants, it is sold to people who look at the sketches and want to feel very dramatic.

Similarly, I think the design pattern of small fonts is something meant to appeal to shallow observation. By deemphasizing the text itself, the design is accentuated. Low-contrast text is even more obviously the domination of design over content. And it may very well look more professional and visually pleasing. But web design isn’t for making sites visually pleasing, it is for making the experience of the content more pleasing. Sites exist for their content, not their design.

In 100e2r he also says let your text breathe. You need whitespace. If you view my site directly, you’ll notice I don’t have big white margins around my text. When you come to my site, it’s to see my words, and that’s what I’m going to give you! When I want to let my text breathe with lots of whitespace this is what I do:

http://blog.ianbicking.org/wp-content/uploads/images/500px-my-white-desktop.jpg

Is a huge block of text hard to read? It is. And yeah, I’ve written articles like that. But the solution?

WRITE BETTER

Similarly, it’s hard to read text if you don’t use paragraphs, but the solution isn’t to increase your line height until every line is like a paragraph of its own.

The solution to the drudgery of large swathes of text is:

  1. Make your blocks of text smaller.
  2. Use something other than paragraphs of text.

Throw in a list. Do some indentation. Toss in even a stupid picture. Personally I try to throw in code examples, because that’s how we roll on this blog.

That’s good writing, that’s content that is easy to read. It’s not easy to write, and I’m sure I miss the mark more often than not. But you can’t design your way to good content. If you want to write like this, if you want to let the flow of your text reflect the flow of your ideas, you need room. Huge margins don’t give you room. They are a crutch for poor writing, and not even a good crutch.

So in conclusion: modern design be damned!

HTML
Non-technical
Web

Comments (18)

Permalink

Atompub as an alternative to WebDAV

I’ve been thinking about an import/export API for PickyWiki; I want something that’s sensible, and works well enough that it can be the basic for things like creating restorable snapshots, integration with version control systems, and being good at self-hosting documentation.

So far I’ve made a simple import/export system based on Atom. You can export the entire site as an Atom feed, and you can import Atom feeds. But whole-site import/export isn’t enough for the tools I’d like to write on top of the API.

WebDAV would seem like a logical choice, as it lets you get and put resources. But it’s not a great choice for a few reasons:

  • It’s really hard to implement on the server.
  • Even clients are hard to implement.
  • It uses GET to get resources. This is probably its most fatal flaw. There is no CMS that I know of (except maybe one) where the thing you view the browser is the thing that you’d actually edit. To work around this CMSes use User-Agent sniffing or an alternate URL space.
  • WebDAV is worried about "collections" (i.e., directories). The web basically doesn’t know what "collections" are, it only knows paths, and paths are strings.
  • (In summary) WebDAV uses HTTP, but it is not of the web.

I don’t want to invent something new though. So I started thinking of Atom some more, and Atompub.

The first thought is how to fix the GET problem in WebDAV. A web page isn’t an editable representation, but it’s pretty reasonable to put an editable representation into an Atom entry. Clients won’t necessarily understand extensions and properties you might add to those entries, but I don’t see any way around that. An entry might look like:

<entry>
  <content type="html">QUOTED HTML</content>
  ... other normal metadata (title etc) ...
  <privateprop:myproperty xmlns:privateprop="URL" name="foo" value="bar" />
</entry>

While there is special support for HTML, XHTML, and plain text in Atom, you can put any type of content in <content>, encoded in base64.

To find the editable representation, the browser page can point to it. I imagine something like this:

<link rel="alternate" type="application/atom+xml; type=entry"
 href="this-url?format=atom">

The actual URL (in this example this-url?format=atom) can be pretty much anything. My one worry is that this could be confused with feed detection, which looks like:

<link rel="alternate" type="application/atom+xml"
 href="/atom.xml">

The only difference is "; type=entry", which I’m betting a lot of clients don’t pay attention to.

The Atom entries then can have an element:

<link rel="edit" href="this-url" />

This is a location where you can PUT a new entry to update the resource. You could allow the client to PUT directly over the old page, or use this-url?format=atom or whatever is convenient on the server-side. Additionally, DELETE to the same URL would delete.

This handles updates and deletes, and single-page reads. The next issue is creating pages.

Atompub makes creation fairly simple. First you have to get the Atompub service document. This is a document with the type application/atomsvc+xml and it gives the collection URL. It’s suggested you make this document discoverable like:

<link rel="service" type="application/atomsvc+xml"
 href="/atomsvc.xml">

This document then points to the "collection" URL, which for our purposes is where you create documents. The service document would look like:

<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>SITE TITLE</atom:title>
    <collection href="/atomapi">
      <atom:title>SITE TITLE</atom:title>
      <accept>*/*</accept>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>
</service>

Basically this indicates that you can POST any media to /atomapi (both Atom entries, and things like images).

To create a page, a client then does a POST like:

POST /atomapi
Content-Type: application/atom+xml; type=entry
Slug: /page/path

<entry xmlns="...">...</entry>

There’s an awkwardness here, that you can suggest (via the Slug header) what the URL for the new page is. The client can find the actual URL of the new page from the Location header in the response. But the client can’t demand that the slug be respected (getting an error back if it is not), and there’s lots of use cases where the client doesn’t just want to suggest a path (for instance, other documents that are being created might rely on that path for links).

Also, "slug" implies… well, a slug. That is, some path segment probably derived from the title. There’s nothing stopping the client from putting a complete path in there, but it’s very likely to be misinterpreted (e.g. translating /page/path to /2009/01/pagepath).

Bug I digress. Anyway, you can post every resource as an entry, base64-encoding the resource body, but Atompub also allows POSTing media directly. When you do that, the server puts the media somewhere and creates a simple Atom entry for the media. If you wanted to add properties to that entry, you’d edit the entry after creating it.

The last missing piece is how to get a list of all the pages on a site. Atompub does have an answer for this: just GET /atomapi will give you an Atom feed, and for our purposes we can demand that the feed is complete (using paging so that any one page of the feed doesn’t get too big). But this doesn’t seem like a good solution to me. GData specifies a useful set of queries to for feeds, but I’m not sure that this is very useful here; the kind of queries a client needs to do for this use case aren’t things GData was designed for.

The queries that seem most important to me are queries by page path (which allows some sense of "collections" without being formal) and by content type. Also to allow incremental updates on the client side, filtering these queries by last-modified time (i.e., all pages created since I last looked). Reporting queries (date of creation, update, author, last editor, and custom properties) of course could be useful, but don’t seem as directly applicable.

Also, often the client won’t want the complete Atom entry for the pages, but only a list of pages (maybe with minimal metadata). I’m unsure about the validity of abbreviated Atom entries, but it seems like one solution. Any Atom entry can have something like:

<link rel="self" type="application/atom+xml; type=entry"
 href="url?format=atom" />

This indicates where the entry exists, though it doesn’t suggest very forcefully that the actual entry is abbreviated. Anyway, I could then imagine a feed like:

<feed>
  <entry>

    <content type="some/content-type" />
    <link rel="self" href="..." />
    <updated>YYYYMMDDTHH:MM:SSZ</updated>
  <entry>
  ...
</feed>

This isn’t entirely valid, however — you can’t just have an empty <content> tag. You can use a src attribute to use indirection for the content, and then add Yet Another URL for each page that points to its raw content. But that’s just jumping through hoops. This also seems like an opportunity to suggest that the entry is incomplete.

To actually construct these feeds, you need some way of getting the feed. I suggest that another entry be added to the Atompub service document, something like:

<cmsapi:feed href="URI-TEMPLATE" />

That would be a URI Template that accepted several known variables (though frustratingly, URI Templates aren’t properly standardized yet). Things like:

  • content-type: the content type of the resource (allowing wildcards like image/*)
  • container: a path to a container, i.e., /2007 would match all pages in /2007/…
  • path-regex: some regular expression to match the paths
  • last-modified: return all pages modified at the given date or later

All parameters would be ANDed together.

So, open issues:

  • How to strongly suggest a path when creating a resource (better than Slug)
  • How to rename (move) or copy a page (it’s easy enough to punt on copy, but I’d rather move by a little more formal than just recreating a resource in a new location and deleting the original)
  • How to represent abbreviated Atom entries

With these resolved I think it’d be possible to create a much simpler API than WebDAV, and one that can be applied to existing applications much more easily. (If you think there’s more missing, please comment.)

HTML
Programming
Web

Comments (26)

Permalink

Avoiding Silos: “link” as a first-class object

One of the constant annoyances to me in web applications is the self-proclaimed need for those applications to know about everything and do everything, and only spotty ad hoc techniques for including things from other applications.

An example might be blog navigation or search, where you can only include data from the application itself. Or "Recent Posts" which can only show locally-produce posts. What if I post something elsewhere? I have to create some shoddy placeholder post to refer to it. Bah! Underlying this the data is usually structured in a specific way, with the HTML being a sort of artifact of the database, the markup transient and a slave to the database’s structure.

An example of this might be a recent post listing like:

<ul>
  for post in recent_posts:
    <li>
      <a href="/post/{{post.year}}/{{post.month}}/{{post.slug}}">
        {{post.title}}</a>
    </li>
</ul>

There’s clearly no room for exceptions in this code. I am thus proposing that any system like this should have the notion of a "link" as a first-class object. The code should look like this:

<ul>
  for post in recent_posts:
    <li>
      {{post.link()}}
    </li>
</ul>

Just like with changing IDs to links in service documents, the template doesn’t actually look any more complicated than it did before (simpler, even). But now we can use simple object-oriented techniques to create first-class links. The code might look like:

class Post(SomeORM):
    def url(self):
        if self.type == 'link':
            return self.body
        else:
            base = get_request().application_url
            return '%s/%s/%s/%s' % (
                base, self.year, self.month, self.slug)

    def link(self):
        return html('<a href="%s">%s</a>') % (
            self.url(), self.title)

The addition of the .url() method has the obvious effect of making these offsite links work. Using a .link() method has the added advantage of allowing things like HTML snippets to be inserted into the system (even though that is not implemented here). By allowing arbitrary HTML in certain places you make it possible for people to extend the site in little ways — possibly adding markup to a title, or allowing an item in the list that actually contains two URLs (e.g., <a href="url1">Some Item</a> (<a href="url2">via</a>)).

In the context of Python I recommend making these into methods, not properties, because it allows you to later add keyword arguments to specialize the markup (like post.link(abbreviated=True)).

One negative aspect of this is that you cannot affect all the markup through the template alone, you may have to go into the Python code to change things. Anyone have ideas for handling this problem?

HTML
Programming
Python
Web

Comments (13)

Permalink

Javascript Status Message Display

In a little wiki I’ve been playing with I’ve been trying out little ideas that I’ve had but haven’t had a place to actually implement them. One is how notification messages work. I’m sure other people have done the same thing, but I thought I’d describe it anyway.

A common pattern is to accept a POST request and then redirect the user to some page, setting a status message. Typically the status message is either set in a cookie or in the session, then the standard template for the application has some code to check for a message and display it.

The problem with this is that this breaks all caching — at any time any page can have some message injected into it, basically for no reason at all. So I thought: why not do the whole thing in Javascript? The server will set a cookie, but only Javascript will read it.

The code goes like this; on the server (easily translated into any framework):

resp.set_cookie('flash_message', urllib.quote(msg))

I quote the message because it can contain characters unsafe for cookies, and URL quoting is a particularly easy quoting to apply.

Then I have this Javascript (using jQuery):

$(function () {
    // Anything in $(function...) is run on page load
    var flashMsg = readCookie('flash_message');
    if (flashMsg) {
        flashMsg = unescape(flashMsg);
        var el = $('<div id="flash-message">'+
          '<div id="flash-message-close">'+
          '<a title="dismiss this message" '+
          'id="flash-message-button" href="#">X</a></div>'+
          flashMsg + '</div>');
        $('a#flash-message-button', el).bind(
          'click', function () {
            $(this.parentNode.parentNode).remove();
        });
        $('#body').prepend(el);
        eraseCookie('flash_message');
    }
});

Note that I’ve decided to treat the flash message as HTML. I don’t see a strong risk of injection attack in this case, though I must admit I’m a little unclear about what the normal policies are for cross-domain cookie setting.

I use these cookie functions because oddly I can’t find cookie handling functions in jQuery. It’s always weird to me how primitive document.cookie is. Anyway, CSS looks like this:

#flash-message {
  margin: 0.5em;
  border: 2px solid #000;
  background-color: #9f9;
  -moz-border-radius: 4px;
  text-align: center;
}

#flash-message-close {
  float: right;
  font-size: 70%;
  margin: 2px;
}

a#flash-message-button {
  text-decoration: none;
  color: #000;
  border: 1px solid #9f9;
}

a#flash-message-button:hover {
  border: 1px solid #000;
  background-color: #009;
  color: #fff;
}

This doesn’t have non-Javascript fallback, but I think that’s okay. This isn’t something that a spider would ever see (since spiders shouldn’t be submitting forms that result in update messages). Accessible browsers generally implement Javascript so that’s also not particularly a problem, though there may be additional hints I could give in CSS or Javascript to help make this more readable (if there’s a message, it should probably be the first thing read on the page).

Another common component of pages that varies separate from the page itself is logged-in status, but that’s more heavily connected to your application. Get both into Javascript and you might be able to turn caching way up on a lot of your pages.

Javascript
Programming
Web

Comments (13)

Permalink

Using pip Requirements

Following onto a set of recent posts (from James, me, then James again), Martijn Faassen wrote a description of Grok’s version management. Our ideas are pretty close, but he’s using buildout, and I’ll describe out to do the same things with pip.

Here’s a kind of development workflow that I think works well:

  • A framework release is prepared. Ideally there’s a buildbot that has been running (as Pylons has, for example), so the integration has been running for a while.
  • People make sure there are released versions of all the important components. If there are known conflicts between pieces, libraries and the framework update their install_requires in their setup.py files to make sure people don’t use conflicting pieces together.
  • Once everything has been released, there is a known set of packages that work together. Using a buildbot maybe future versions will also work together, but they won’t necessarily work together with applications built on the framework. And breakage can also occur regardless of a buildbot.
  • Also, people may have versions of libraries already installed, but just because they’ve installed something doesn’t mean they really mean to stay with an old version. While known conflicts have been noted, there’s going to be lots of unknown conflicts and future conflicts.
  • When starting development with a framework, the developer would like to start with some known-good set, which is a set that can be developed by the framework developers, or potentially by any person. For instance, if you extend a public framework with an internal framework (or even a public sub-framework like Pinax) then the known-good set will be developed by a different set of people.
  • As an application is developed, the developer will add on other libraries, or use some of their own libraries. Development will probably occur at the trunk/tip of several libraries as they are developed together.
  • A developer might upgrade the entire framework, or just upgrade one piece (for instance, to get a bug fix they are interested in, or follow a branch that has functionality they care about). The developer doesn’t necessarily have the same notion of "stable" and "released" as the core framework developers have.
  • At the time of deployment the developer wants to make sure all the pieces are deployed together as they’ve tested them, and how they know them to work. At any time, another developer may want to clone the same set of libraries.
  • After initial deployment, the developer may want to upgrade a single component, if only to test that an upgrade works, or if it resolves a bug. They may test out combinations only to throw them away, and they don’t want to bump versions of libraries in order to deploy new combinations.

This is the kind of development pattern that requirement files are meant to assist with. They can provide a known-good set of packages. Or they can provide a starting point for an active line of development. Or they can provide a historical record of how something was put together.

The easy way to start a requirement file for pip is just to put the packages you know you want to work with. For instance, we’ll call this project-start.txt:

Pylons
-e svn+http://mycompany/svn/MyApp/trunk#egg=MyApp
-e svn+http://mycompany/svn/MyLibrary/trunk#egg=MyLibrary

You can plug away for a while, and maybe you decide you want to freeze the file. So you do:

$ pip freeze -r project-start.txt project-frozen.txt

By using -r project-start.txt you give pip freeze a template for it to start with. From that, you’ll get project-frozen.txt that will look like:

Pylons==0.9.7
-e svn+http://mycompany/svn/MyApp/trunk@1045#egg=MyApp
-e svn+http://mycompany/svn/MyLibrary/trunk@1058#egg=MyLibrary

## The following requirements were added by pip --freeze:
Beaker==0.2.1
WebHelpers==0.9.1
nose==1.4
# Installing as editable to satisfy requirement INITools==0.2.1dev-r3488:
-e svn+http://svn.colorstudy.com/INITools/trunk@3488#egg=INITools-0.2.1dev_r3488

At that point you might decide that you don’t care about the nose version, or you might have installed something from trunk when you could have used the last release. So you go and adjust some things.

Martijn also asks: how do you have framework developers maintain one file, and then also have developers maintain their own lists for their projects?

You could start with a file like this for the framework itself. Pylons for instance could ship with something like this. To install Pylons you could then do:

$ pip -E MyProject install \\
>    -r http://pylonshq.com/0.9.7-requirements.txt

You can also download that file yourself, add some comments, rename the file and add your project to it, and use that. When you freeze the order of the packages and any comments will be preserved, so you can keep track of what changed. Also it should be ameniable to source control, and diffs would be sensible.

You could also use indirection, creating a file like this for your project:

-r http://pylonshq.com/0.9.7-requirements.txt
-e svn+http://mycompany/svn/MyApp/trunk#egg=MyApp
-e svn+http://mycompany/svn/MyLibrary/trunk#egg=MyLibrary

That is, requirements files can refer to each other. So if you want to maintain your own requirements file alongside the development of an upstream requirements file, you could do that.

Packaging
Python

Comments (2)

Permalink

A Few Corrections To “On Packaging”

James Bennett recently wrote an article on Python packaging and installation, and Setuptools. There’s a lot of issues, and writing up my thoughts could take a long time, but I thought at least I should correct some errors, specifically category errors. Figuring out where all the pieces in Setuptools (and pip and virtualenv) fit is difficult, so I don’t blame James for making some mistakes, but in the interest of clarifying the discussion…

I will start with a kind of glossary:

Distribution:
This is something-with-a-setup.py. A tarball, zip, a checkout, etc. Distributions have names; this is the name in setup(name="…") in the setup.py file. They have some other metadata too (description, version, etc), and Setuptools adds to that metadata some. Distutils doesn’t make it very easy to add to the metadata — it’ll whine a little about things it doesn’t know, but won’t do anything with that extra data. Fixing this problem in Distutils is an important aspect of Setuptools, and part of what Distutils itself unsuitable as a basis for good library management.
package/module:
This is something you import. It is not the same as a distribution, though usually a distribution will have the same name as a package. In my own libraries I try to name the distribution with mixed case (like Paste) and the package with lower case (like paste). Keeping the terminology straight here is very difficult; and usually it doesn’t matter, but sometimes it does.
Setuptools The Distribution:
This is what you install when you install Setuptools. It includes several pieces that Phillip Eby wrote, that work together but are not strictly a single thing.
setuptools The Package:
This is what you get when you do import setuptools. Setuptools largely works by monkeypatching distutils, so simply importing setuptools activates its functionality from then on. This package is entirely focused on installation and package management, it is not something you should use at runtime (unless you are installing packages as your runtime, of course).
pkg_resources The Module:
This is also included in Setuptools The Distribution, and is for use at runtime. This is a single module that provides the ability to query what distributions are installed, metadata about those distributions, information about the location where they are installed. It also allows distributions to be "activated". A distribution can be available but not activated. Activating a distribution means adding its location to sys.path, and probably you’ve noticed how long sys.path is when you use easy_install. Almost everything that allows different libraries to be installed, or allows different versions of libraries, does it through some management of sys.path. pkg_resources also allows for generic access to "resources" (i.e., non-code files), and let’s those resources be in zip files. pkg_resources is safe to use, it doesn’t do any of the funny stuff that people get annoyed with.
easy_install:
This is also in Setuptools The Distribution. The basic functionality it provides is that given a name, it can search for package with that distribution name, and also satisfying a version requirement. It then downloads the package, installs it (using setup.py install, but with the setuptools monkeypatches in place). After that, it checks the newly installed distribution to see if it requires any other libraries that aren’t yet installed, and if so it installs them.
Eggs the Distribution Format:
These are zip files that Setuptools creates when you run python setup.py bdist_egg. Unlike a tarball, these can be binary packages, containing compiled modules, and generally contain .pyc files (which are portable across platforms, but not Python versions). This format only includes files that will actually be installed; as a result it does not include doc files or setup.py itself. All the metadata from setup.py that is needed for installation is put in files in a directory EGG-INFO.
Eggs the Installation Format:
Eggs the Distribution Format are a subset of the Installation Format. That is, if you put an Egg zip file on the path, it is installed, no other process is necessary. But the Installation Format is more general. To have an egg installed, you either need something like DistroName-X.Y.egg/ on the path, and then an EGG-INFO/ directory under that with the metadata, or a path like DistroName.egg-info/ with the metadata directly in that directory. This metadata can exist anywhere, and doesn’t have to be directly alongside the actual Python code. Egg directories are required for pkg_resources to activate and deactivate distributions, but otherwise they aren’t necessary.
pip:
This is an alternative to easy_install. It works somewhat differently than easy_install, but not much. Mostly it is better than easy_install, in that it has some extra features and is easier to use. Unlike easy_install, it downloads all distributions up-front, and generates the metadata to read distribution and version requirements. It uses Setuptools to generate this metadata from a setup.py file, and uses pkg_resources to parse this metadata. It then installs packages with the setuptools monkeypatches applied. It just happens to use an option python setup.py –single-version-externally-managed, which gets Setuptools to install packages in a more flat manner, with Distro.egg-info/ directories alongside the package. Pip installs eggs! I’ve heard the many complaints about easy_install (and I’ve had many myself), but ultimately I think pip does well by just fixing a few small issues. Pip is not a repudiation of Setuptools or the basic mechanisms that easy_install uses.
PoachEggs:
This is a defunct package that had some of the features of pip (particularly requirement files) but used easy_install for installation. Don’t bother with this, it was just a bridge to get to pip.
virtualenv:
This is a little hack that creates isolated Python environments. It’s based on virtual-python.py, which is something I wrote based on some documentation notes PJE wrote for Setuptools. Basically virtualenv just creates a bin/python interpreter that has its own value of sys.prefix, but uses the system Python and standard library. It also installs Setuptools to make it easier to bootstrap the environment (because bootstrapping Setuptools is itself a bit tedious). I’ll add pip to it too sometime. Using virtualenv you don’t have to worry about different library versions, because for any one environment you will probably only need one version of a library. On any one machine you probably need different versions, which is why installing packages system-wide is problematic for most libraries. (I’ve been meaning to write a post on why I think using system packaging for libraries is counter-productive, but that’ll wait for another time.)

So… there’s the pieces involved, at least the ones I can remember now. And I haven’t really discussed .pth files, entry points, sys.path trickery, site.py, distutils.cfg… sadly this is a complex state of affairs, but it was also complex before Setuptools.

There are a few things that I think people really dislike about Setuptools.

First, zip files. Setuptools prefers zip files, for reasons that won’t mean much to you, and maybe are more historical than anything. When a distribution doesn’t indicate if it is zip-safe, Setuptools looks at the code and sees if it uses __file__, an if not it presumes that the code is probably zip-safe. The specific problem James cites is what appears to be a bug in Django, that Django looks for code and can’t traverse into zip files in the same way that Python itself can. Setuptools didn’t itself add anything to Python to make it import zip files, that functionality was added to Python some time before. The zipped eggs that Setuptools installs are using existing (standard!) Python functionality.

That said, I don’t think zipping libraries up is all that useful, and while it should work, it doesn’t always, and it makes code harder to inspect and understand. So since it’s not that useful, I’ve disabled it when pip installs packages. I also have had it disabled on my own system for years now, by creating a distutils.cfg file with [easy_install] zip_ok = False in it. Sadly App Engine is forcing me to use zip files again, because of its absurdly small file limits… but that’s a different topic. (There is an experimental pip zip command mostly intended for App Engine.)

Another pain point is version management with setup.py and Setuptools. Indeed it is easy to get things messed up, and it is easy to piss people off by overspecifying, and sometimes things can get in a weird state for no good reason (often because of easy_install’s rather naive leap-before-you-look installation order). Pip fixes that last point, but it also tries to suggest more constructive and less painful ways to manage other pieces.

Pip requirement files are an assertion of versions that work together. setup.py requirements (the Setuptools requirements) should contain two things: 1: all the libraries used by the distribution (without which there’s no way it’ll work) and 2: exclusions of the versions of those libraries that are known not to work. setup.py requirements should not be viewed as an assertion that by satisfying those requirements everything will work, just that it might work. Only the end developer, testing the system together, can figure out if it really works. Then pip gives you a way to record that working set (using pip freeze), separate from any single distribution or library.

There’s also a lot of conflicts between Setuptools and package maintainers. This is kind of a proxy war between developers and sysadmins, who have very different motivations. It deserves a post of its own, but the conflicts are about more than just how Setuptools is implemented.

I’d love if there was a language-neutral library installation and management tool that really worked. Linux system package managers are absolutely not that tool; frankly it is absurd to even consider them as an alternative. So for now we do our best in our respective language communities. If we’re going to move forward, we’ll have to acknowledge what’s come before, and the reasoning for it.

Packaging
Python

Comments (38)

Permalink