What Does A WebOb App Look Like?

Lately I’ve been writing code using WebOb and just a few other small libraries. It’s not entirely obvious what this looks like, so I thought I’d give a simple example.

I make each application a class. Instances of the class are "configured applications". So it looks a little like this (for an application that takes one configuration parameter, file_path):

class Application(object):
    def __init__(self, file_path):
        self.file_path = file_path

Then the app needs to be a WSGI app, because that’s how I roll. I use webob.dec:

from webob.dec import wsgify
from webob import exc
from webob import Response

class Application(object):
    def __init__(self, file_path):
        self.file_path = file_path
    @wsgify
    def __call__(self, req):
        return Response('Hi!')

Somewhere separate from the application you actually instantiate Application. You can use Paste Deploy for that, configure it yourself, or just do something ad hoc (a lot of mod_wsgi .wsgi files are like this, basically).

I use webob.exc for things like exc.HTTPNotFound(). You can raise that as an exception, but I mostly just return the object (to the same effect).

Now you have Hello World. I then sometimes do something terrible, I start handling URLs like this:

@wsgify
def __call__(self, req):
    if req.path_info == '/':
        return self.index(req)
    elif req.path_info.startswith('/view/'):
        return self.view(req)
    return exc.HTTPNotFound()

This is lazy and a very bad idea. So you want a dispatcher. There are several (e.g., selector). I’ll use Routes here… the latest release makes it a bit easier (though it could still be streamlined a bit). Here’s a pattern I think makes sense:

from routes import Mapper

class Application(object):
    map = Mapper()
    map.connect('index', '/', method='index')
    map.connect('view', '/view/{item}', method='view')

    def __init__(self, file_path):
        self.file_path = file_path

    @wsgify
    def __call__(self, req):
        results = self.map.routematch(environ=req.environ)
        if not results:
            return exc.HTTPNotFound()
        match, route = results
        link = URLGenerator(self.map, req.environ)
        req.urlvars = ((), match)
        kwargs = match.copy()
        method = kwargs.pop('method')
        req.link = link
        return getattr(self, method)(req, **kwargs)

    def index(self, req):
        ...
    def view(self, req, item):
        ...

Another way you might do it is to skip the class, which means skipping a clear place for configuration. I don’t like that, but if you don’t care about that, then it looks like this:

def index(self, req):
    ...
def view(self, req, item):
    ...

map = Mapper()
map.connect('index', '/', view=index)
map.connect('view', '/view/{item}', view=view)

@wsgify
def application(req):
    results = map.routematch(environ=req.environ)
    if not results:
        return exc.HTTPNotFound()
    match, route = results
    link = URLGenerator(map, req.environ)
    req.urlvars = ((), match)
    kwargs = match.copy()
    view = kwargs.pop('view')
    req.link = link
    return view(req, **kwargs)

Then application is pretty much boilerplate. You could put configuration in the request if you wanted, or use some other technique (like Contextual).

I talked some with Ben Bangert about what he’s trying with these patterns, and he’s doing something reminiscent of Pylons controllers (but without the rest of Pylons) and it looks more like this (with my own adaptations):

class BaseController(object):
    special_vars = ['controller', 'action']

    def __init__(self, request, link, **config):
        self.request = request
        self.link = link
        for name, value in config.items():
            setattr(self, name, value)

    def __call__(self):
        action = self.request.urlvars.get('action', 'index')
        if hasattr(self, '__before__'):
            self.__before__()
        kwargs = req.urlsvars.copy()
        for attr in self.special_vars
            if attr in kwargs:
                del kwargs[attr]
        return getattr(self, action)(**kwargs)

class Index(BaseController):
    def index(self):
        ...
    def view(self, item):
        ...

class Application(object):
    map = Mapper()
    map.connect('index', '/', controller=Index)
    map.connect('view', '/view/{item}', controller=Index,     action='view')

    def __init__(self, **config):
        self.config = config

    @wsgify
    def __call__(self, req):
        results = self.map.routematch(environ=req.environ)
        if not results:
            return exc.HTTPNotFound()
        match, route = results
        link = URLGenerator(self.map, req.environ)
        req.urlvars = ((), match)
        controller = match['controller'](req, link, **self.config)
        return controller()

That’s a lot of code blocks, but they all really say the same thing ;) I think writing apps with almost-no-framework like this is pretty doable, so if you have something small you should give it a go. I think it’s especially appropriate for applications that are an API (not a "web site").

Programming
Python
Web

Comments (5)

Permalink

Configuration management: push vs. pull

Since I’ve been thinking about deployment I’ve been thinking a lot more about what "configuration management" means, how it should work, what it should do.

I guess my quick summary of configuration management is that it is setting up a server correctly. "Correct" is an ambiguous term, but given that there are so many to configuration management the solutions are also ambiguous.

Silver Lining includes configuration management of a sort. It is very simple. Right now it is simply a bunch of files to rsync over, and one shell script (you can see the files here and the script here — at least until I move them and those links start 404ing). Also each "service" (e.g., a database) has a simple setup script. I’m sure this system will become more complicated over time, but it’s really simple right now, and I like that.

The other system I’ve been asked about the most about lately is Puppet. Puppet is a real configuration management system. The driving forces are very different: I’m just trying to get a system set up that is in all ways acceptable for web application deployment. I want one system set up for one kind of task; I am completely focused on that end, and I care about means only insofar as I don’t want to get distracted by those means. Puppet is for people who care about the means, not just the ends. People who want things to work in a particular way; I only care that they work.

That’s the big difference between Puppet and Silver Lining. The smaller difference (that I want to talk about) is "push" vs. "pull". Grig wrote up some notes on two approaches. Silver Lining uses a "push" system (though calling it a "system" is kind of overselling what it does) while Puppet is "pull". Basically Silver Lining runs these commands (from your personal computer):

$ rsync -r <silverlining>/server-files/serverroot/ root@server:/
$ ssh root@server "$(cat <silverlining>/server-files/update-server-script.sh)"

This is what happens when you run silver setup-node server: it pushes a bunch of files over to the server, and runs a shell script. If you update either of the files or the shell script you run silver setup-node again to update the server. This is "push" because everything is initiated by the "master" (in this case, the developer’s personal computer).

Puppet uses a pull model. In this model there is a daemon running on every machine, and these machines call in to the master to see if there’s any new instructions for them. If there are, the daemon applies those instructions to the machine it is running on.

Grig identifies two big advantages to this pull model:

  1. When a new server comes up it can get instructions from the master and start doing things. You can’t push instructions to a server that isn’t there, and the server itself is most aware of when it is ready to do stuff.
  2. If a lot of servers come up, they can all do the setup work on their own, they only have to ask the master what to do.

But… I don’t buy either justification.

First: servers don’t just do things when they start up. To get this to work you have to create custom images with Puppet installed, and configured to know where the master is, and either the image or the master needs some indication of what kind of server you intended to create. All this is to avoid polling a server to see when it comes online. Polling a server is lame (and is the best Silver Lining can do right now), but avoiding polling can be done with something a lot simpler than a complete change from push to pull.

Second: there’s nothing unscalable about push. Look at those commands: one rsync and one ssh. The first is pretty darn cheap, and the second is cheap on the master and expensive on the remote machine (since it is doing things like installing stuff). You need to do it on lots of machines? Then fork a bunch of processes to run those two commands. This is not complicated stuff.

It is possible to write a push system that is hard to scale, if the master is doing lots of work. But just don’t do that. Upload your setup code to the remote server/slave and run it there. Problem fixed!

What are the advantages of push?

  1. Easy to bootstrap. A bare server can be setup with push, no customization needed. Any customization is another kind of configuration, and configuration should be automated, and… well, this is why it’s a bootstrap problem.
  2. Errors are synchronous: if your setup code doesn’t work, your push system will get the error back, you don’t need some fancy monitor and you don’t need to check any logs. Weird behavior is also synchronous; can’t tell why servers are doing something? Run the commands and watch the output.
  3. Development is sensible: if you have a change to your setup scripts, you can try it out from your machine. You don’t need to do anything exceptional, your machine doesn’t have to accept connections from the slave, you don’t need special instructions to keep the slave from setting itself up as a production machine, there’s no daemon that might need modifications… none of that. You change the code, you run it, it works.
  4. It’s just so damn simple. If you don’t start thinking about push and pull and other design choices, it simply becomes: do the obvious and easy thing.

In conclusion: push is sensible, pull is needless complexity.

Programming
Silver Lining

Comments (18)

Permalink

Joining Mozilla

As of last week, I am now an employee of Mozilla! Thanks to everyone who helped me out during my job search.

I’ll be working both with the Mozilla Web Development (webdev) team, and Mozilla Labs.

The first thing I’ll be working on is deployment. In part because I’ve been thinking about deployment lately, in part because streamlining deployment is just generally enabling of other work (and a personal itch to be scratched), and because I think there is the possibility to fit this work into Mozilla’s general mission, specifically Empowering people to do new and unanticipated things on the web. I think the way I’m approaching deployment has real potential to combine the discipline and benefits of good development practices with an accessible process that is more democratic and less professionalized. This is some of what PHP has provided over the years (and I think it’s been a genuinely positive influence on the web as a result); I’d like to see the same kind of easy entry using other platforms. I’m hoping Silver Lining will fit both Mozilla’s application deployment needs, as well as serving a general purpose.

Once I finish deployment and can move on (oh fuck what am I getting myself into) I’ll also be working with the web development group who has adopted Python for many of their new projects (e.g., Zamboni, a rewrite of the addons.mozilla.org site), and with Mozilla Labs on Weave or some of their other projects.

In addition my own Python open source work is in line with Mozilla’s mission and I will be able to continue spending time on those projects, as well as entirely new projects.

I’m pretty excited about this — it feels like there’s a really good match with Mozilla and what I’m good at, and what I care about, and how I care about it.

Mozilla
Non-technical
Programming
Python

Comments (30)

Permalink

toppcloud renamed to Silver Lining

After some pondering at PyCon, I decided on a new name for toppcloud: Silver Lining. I’ll credit a mysterious commenter "david" with the name idea. The command line is simply silversilver update has a nice ring to it.

There’s a new site: cloudsilverlining.org; not notably different than the old site, just a new name. The product is self-hosting now, using a simple app that runs after every commit to regenerate the docs, and with a small extension to Silver Lining itself (to make it easier to host static files). Now that it has a real name I also gave it a real mailing list.

Silver Lining also has its first test. Not an impressive test, but a test. I’m hoping with a VM-based libcloud backend that a full integration test can run in a reasonable amount of time. Some unit tests would be possible, but so far most of the bugs have been interaction bugs so I think integration tests will have to pull most of the weight. (A continuous integration rig will be very useful; I am not sure if Silver Lining can self-host that, though it’d be nice/clever if it could.)

Packaging
Programming
Python
Silver Lining
Web

Comments (5)

Permalink

Throw out your frameworks! (forms included)

No, I should say forms particularly.

I have lots of things to blog about, but nothing makes me want to blog like code. Ideas are hard, code is easy. So when I saw Jacob’s writeup about dynamic Django form generation I felt a desire to respond. I didn’t see the form panel at PyCon (I intended to but I hardly saw any talks at PyCon, and yet still didn’t even see a good number of the people I wanted to see), but as the author of an ungenerator and as a general form library skeptic I have a somewhat different perspective on the topic.

The example created for the panel might display that perspective. You should go read Jacob’s description; but basically it’s a simple registration form with a dynamic set of questions to ask.

I have created a complete example, because I wanted to be sure I wasn’t skipping anything, but I’ll present a trimmed-down version.

First, the basic control logic:

from webob.dec import wsgify
from webob import exc
from formencode import htmlfill

@wsgify
def questioner(req):
    questions = get_questions(req) # This is provided as part of the example
    if req.method == 'POST':
        errors = validate(req, questions)
        if not errors:
            ... save response ...
            return exc.HTTPFound(location='/thanks')
    else:
        errors = {}
    ## Here's the "form generation":
    page = page_template.substitute(
        action=req.url,
        questions=questions)
    page = htmlfill.render(
        page,
        defaults=req.POST,
        errors=errors)
    return Response(page)

def validate(req, questions):
    # All manual, but do it however you want:
    errors = {}
    form = req.POST
    if (form.get('password')
        and form['password'] != form.get('password_confirm')):
        errors['password_confirm'] = 'Passwords do not match'
    fields = questions + ['username', 'password']
    for field in fields:
        if not form.get(field):
            errors[field] = 'Please enter a value'
    return errors

I’ve just manually handled validation here. I don’t feel like doing it with FormEncode. Manual validation isn’t that big a deal; FormEncode would just produce the same errors dictionary anyway. In this case (as in many form validation cases) you can’t do better than hand-written validation code: it’s shorter, more self-contained, and easier to tweak.

After validation the template is rendered:

page = page_template.substitute(
    action=req.url,
    questions=questions)

I’m using Tempita, but it really doesn’t matter. The template looks like this:

<form action="{{action}}" method="POST">
New Username: <input type="text" name="username"><br />
Password: <input type="password" name="password"><br />
Repeat Password:
  <input type="password" name="password_confirm"><br />
{{for question in questions}}
  {{question}}: <input type="text" name="{{question}}"><br />
{{endfor}}
<input type="submit">
</form>

Note that the only "logic" here is to render the form to include fields for all the questions. Obviously this produces an ugly form, but it’s very obvious how you make this form pretty, and how to tweak it in any way you might want. Also if you have deeper dynamicism (e.g., get_questions start returning the type of response required, or weird validation, or whatever) it’s very obvious where that change would go: display logic goes in the form, validation logic goes in that validate function.

This just gives you the raw form. You wouldn’t need a template at all if it wasn’t for the dynamicism. Everything else is added when the form is "filled":

page = htmlfill.render(
    page,
    defaults=req.POST,
    errors=errors)

How exactly you want to calculate defaults is up to the application; you might want query string variables to be able to pre-fill the form (use req.params), you might want the form bare to start (like here with req.POST), you can easily implement wizards by stuffing req.POST into the session to repeat a form, you might read the defaults out of a user object to make this an edit form. And errors are just handled automatically, inserted into the HTML with appropriate CSS classes.

A great aspect of this pattern if you use it (I’m not even sure it deserves the moniker library): when HTML 5 Forms finally come around and we can all stop doing this stupid server-side overthought nonsense, you won’t have overthought your forms. Your mind will be free and ready to accept that the world has actually become simpler, not more complicated, and that there is knowledge worth forgetting (forms are so freakin’ stupid!) If at all possible, dodging complexity is far better than cleverly responding to complexity.

HTML
Programming
Python
Web

Comments (25)

Permalink

Why toppcloud (Silver Lining) will not be agnostic

I haven’t received a great deal of specific feedback on toppcloud (update: renamed Silver Lining), only a few people (Ben Bangert, Jorge Vargas) seem to have really dived deeply into it. But — and this is not unexpected — I have already gotten several requests about making it more agnostic with respect to… stuff. Maybe that it not (at least forever) require Ubuntu. Or maybe that it should support different process models (e.g., threaded and multiple processes). Or other versions of Python.

The more I think about it, and the more I work with the tool, the more confident I am that toppcloud should not be agnostic on these issues. This is not so much about an "opinionated" tool; toppcloud is not actually very opinionated. It’s about a well-understood system.

For instance, Ben noticed a problem recently with weird import errors. I don’t know quite why mod_wsgi has this particular problem (when other WSGI servers that I’ve used haven’t), but the fix isn’t that hard. So Ben committed a fix and the problem went away.

Personally I think this is a bug with mod_wsgi. Maybe it’s also a Python bug. But it doesn’t really matter. When a bug exists it "belongs" to everyone who encounters it.

toppcloud is not intended to be a transparent system. When it’s working correctly, you should be able to ignore most of the system and concentrate on the relatively simple abstractions given to your application. So if the configuration reveals this particular bug in Python/mod_wsgi, then the bug is essentially a toppcloud bug, and toppcloud should (and can) fix it.

A more flexible system can ignore such problems as being "somewhere else" in the system. Or, if you don’t define these problems as someone else’s problem, then a more flexible system is essentially always broken somewhere; there is always some untested combination, some new component, or some old component that might get pushed into the mix. Fixes for one person’s problem may introduce a new problem in someone else’s stack. Some fixes aren’t even clear. toppcloud has Varnish in place, so it’s quite clear where a fix related to Django and Varnish configuration goes. If these were each components developed by different people at different times (like with buildout recipes) then fixing something like this could get complicated.

So I feel very resolved: toppcloud will hardcode everything it possibly can. Python 2.6 and only 2.6! (Until 2.7, but then only 2.7!). Only Varnish/Apache/mod_wsgi. I haven’t figured out threads/processes exactly, but once I do, there will be only one way! And if I get it wrong, then everyone (everyone) will have to switch when it is corrected! Because I’d much rather have a system that is inflexible than one that doesn’t work. With a clear and solid design I think it is feasible to get this to work, and that is no small feat.

Relatedly, I think I’m going to change the name of toppcloud, so ideas are welcome!

Programming
Python
Silver Lining
Web

Comments (23)

Permalink

Leaving TOPP

After three and a half years at The Open Planning Project, my time there is done.

For a while TOPP has been trying to find itself, to determine what it is that it can do best, and how to do that. I think TOPP has finally started really figuring that out, focusing on civic participation, revisiting "planning", and though it’s not fully developed there seems to be a strong potential for TOPP to serve as a point of collaboration for other more ad hoc open source government efforts: government workers and volunteers can provide substantial and well-informed development efforts, but the long-term shepherding of a project is difficult, and TOPP has the potential to provide that kind of long-term neutral guidance. At the same time, communities of people have been developing around these issues; people have gotten past simply calling for government to be inclusive or transparent and have started to do the real work of making that happen.

But… unfortunately I won’t be able to figure out their next steps with them. Mark Gorton has been very generous in his support of TOPP, and helped us get started. But while he has been patient, and even at times seemingly immune from the economic trends… well, he’s not immune, and he has had to cut back his support for TOPP before we were able to become self-sufficient. And so there have been layoffs, myself among them.

I suspect what I’ll do next will have a very different focus. This feels a bit weird, a kind of lost identity. Overall I’m feeling pretty optimistic about finding something new and interesting to do, but I’m still a bit melancholy about leaving things behind. That TOPP and the people I’ve worked with are far away in New York makes it feel like more of a loss because I don’t know when I’ll be back next. But onward and upward! Now to find out what is next…

Non-technical

Comments (21)

Permalink

Weave: valuable client-side data

I’ve been looking at Weave some lately. The large-print summary on the page is Synchronize Your Firefox Experience Across Desktop and Mobile. Straight forward enough.

Years and years ago I stopped really using bookmarks. You lose them moving from machine to machine (which Weave could help), but mostly I stopped using them because it was too hard to (a) identify interesting content and place it into a taxonomy and (b) predict what I would later be interested in. If I wanted to refer to something I’d seen before there’s a good chance I wouldn’t have saved it, while my bookmarks would be flooded with things that time would show were of transient interest.

So… synchronizing bookmarks, eh. Saved form data and logins? Sure, that’s handy. It would make browsing on multiple machines nicer. But it feels more like a handy tweak.

All my really useful data is kept on servers, categorized and protected by a user account. Why is that? Well, of course, where else would you keep it? In cookies? Ha!

Why not in cookies? So many reasons… because cookies are opaque and can’t hold much data, can’t be exchanged, and probably worst of all they just disappear randomly.

What if cookies weren’t so impossibly lame for holding important data? Suddenly sync seems much more interesting. Instead of storing documents and data on a website, the website could put all that data right into your browser. And conveniently HTML 5 has an API for that. Everyone thinks about that API as a way of handling off-line caching, because while it handles many problems with cookies it doesn’t handle the problem of data disappearing as you move between computers and browser. That’s where Weave synchronization could change things. I don’t think this technique is something appropriate for every app (maybe not most apps), but it could allow a new class of applications.

Advantages: web development and scaling becomes easy. If you store data in the browser scaling is almost free; serving static pages is a Solved Problem. Development is easier because development and deployment of HTML and Javascript is pretty easy. Forking is easy — just copy all the resources. So long as you don’t hardcode absolute links into your Javascript, you can even just save from the browser and get a working local copy of the application.

Disadvantages: another silo. You might complain about Facebook keeping everyone’s data, but the data in Facebook is still more transparent than data held in files or locally with a browser. Let’s say you create a word processor that uses local storage for all its documents. If you stored that document online sharing and collaboration would be really easy; but with it stored locally the act of sharing is not as automatic, and collaboration is very hard. Sure, the "user" is in "control" of their data, but that would be more true on paper than in practice. Building collaboration on top of local storage is hard, and without that… maybe it’s not that interesting?

Anyway, there is an interesting (but maybe Much Too Hard) problem in there. (DVCS in the browser?)

Update: in this video Aza talks about just what I talk about here. A few months ago. The Weave APIs also allude to things like this, including collaboration. So… they are on it!

HTML
Web

Comments (1)

Permalink

toppcloud (Silver Lining) and Django

I wrote up instructions on using toppcloud (update: renamed Silver Lining) with Django. They are up on the site (where they will be updated in the future), but I’ll drop them here too…

Creating a Layout

First thing you have to do (after installing toppcloud of course) is create an environment for your new application. Do that like:

$ toppcloud init sampleapp

This creates a directory sampleapp/ with a basic layout. The first thing we’ll do is set up version control for our project. For the sake of documentation, imagine you go to bitbucket and create two new repositories, one called sampleapp and another called sampleapp-lib (and for the examples we’ll use the username USER).

We’ll go into our new environment and use these:

$ cd sampleapp
$ hg clone http://bitbucket.org/USER/sampleapp src/sampleapp
$ rm -r lib/python/
$ hg clone http://bitbucket.org/USER/sampleapp-lib lib/python
$ mkdir lib/python/bin/
$ echo "syntax: glob
bin/python*
bin/activate
bin/activate_this.py
bin/pip
bin/easy_install*
" > lib/python/.hgignore
$ mv bin/* lib/python/bin/
$ rmdir bin/
$ ln -s lib/python/bin bin

Now there is a basic layout setup, with all your libraries going into the sampleapp-lib repository, and your main application in the sampleapp repository.

Next we’ll install Django:

$ source bin/activate
$ pip install Django

Then we’ll set up a standard Django site:

$ cd src/sampleapp
$ django-admin.py sampleapp

Also we’d like to be able to import this file. It’d be nice if there was a setup.py file, and we could run pip -e src/sampleapp, but django-admin.py doesn’t create that itself. Instead we’ll get that on the import path more manually with a .pth file:

$ echo "../../src/sampleapp" > lib/python/sampleapp.pth

Also there’s the tricky $DJANGO_SETTINGS_MODULE that you might have had problems with before. We’ll use the file lib/python/toppcustomize.py (which is imported everytime Python is started) to make sure that is always set:

$ echo "import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'sampleapp.settings'
" > lib/python/toppcustomize.py

Also we have a file src/sampleapp/sampleapp/manage.py, and that file doesn’t work quite how we’d like. Instead we’ll put a file into bin/manage.py that does the same thing:

$ rm sampleapp/manage.py
$ cd ../..
$ echo '#!/usr/bin/env python
from django.core.management import execute_manager
from sampleapp import settings
if __name__ == "__main__":
    execute_manager(settings)
' > bin/manage.py
$ chmod +x bin/manage.py

Now, if you were just using plain Django you’d do something like run python manage.py runserver. But we’ll be using toppcloud serve instead, which means we have to set up the two other files toppcloud needs: app.ini and the runner. Here’s a simple app.ini:

$ echo '[production]
app_name = sampleapp
version = 1
runner = src/sampleapp/toppcloud-runner.py
' > src/sampleapp/toppcloud-app.ini
$ rm app.ini
$ ln -s src/sampleapp/toppcloud-app.ini app.ini

The file must be in the "root" of your application, and named app.ini, but it’s good to keep it in version control, so we set it up with a symlink.

It also refers to a "runner", which is the Python file that loads up the WSGI application. This looks about the same for any Django application, and we’ll put it in src/sampleapp/toppcloud-runner.py:

$ echo 'import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
' > src/sampleapp/toppcloud-runner.py

Now if you want to run the application, you can:

$ toppcloud serve .

This will load it up on http://localhost:8080, and serve up a boring page. To do something interesting we’ll want to use a database.

Setting Up A Database

At the moment the only good database to use is PostgreSQL with the PostGIS extensions. Add this line to app.ini:

service.postgis =

This makes the database "available" to the application. For development you still have to set it up yourself. You should create a database sampleapp on your computer.

Next, we’ll need to change settings.py to use the new database configuration. Here’s the lines that you’ll see:

DATABASE_ENGINE = ''           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
DATABASE_NAME = ''             # Or path to database file if using sqlite3.
DATABASE_USER = ''             # Not used with sqlite3.
DATABASE_PASSWORD = ''         # Not used with sqlite3.
DATABASE_HOST = ''             # Set to empty string for localhost. Not used with sqlite3.
DATABASE_PORT = ''             # Set to empty string for default. Not used with sqlite3.

First add this to the top of the file:

import os

Then you’ll change those lines to:

DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = os.environ['CONFIG_PG_DBNAME']
DATABASE_USER = os.environ['CONFIG_PG_USER']
DATABASE_PASSWORD = os.environ['CONFIG_PG_PASSWORD']
DATABASE_HOST = os.environ['CONFIG_PG_HOST']
DATABASE_PORT = ''

Now we can create all the default tables:

$ manage.py syncdb
Creating table auth_permission
Creating table auth_group
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
...

Now we have an empty project that doesn’t do anything. Let’s make it do a little something (this is all really based on the Django tutorial).

$ manage.py startapp polls

Django magically knows to put the code in src/sampleapp/sampleapp/polls/ — we’ll setup the model in src/sampleapp/sampleapp/polls/models.py:

from django.db import models

class Poll(models.Model):
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    choice = models.CharField(max_length=200)
    votes = models.IntegerField()

And activate the application by adding ’sampleapp.polls’ to INSTALLED_APPS in src/sampleapp/sampleapp/settings.py. Also add ‘django.contrib.admin’ to get the admin app in place. Run manage.py syncdb to get the tables in place.

You can try toppcloud serve . and go to /admin/ to login and see your tables. You might notice all the CSS is broken.

toppcloud serves static files out of the static/ directory. You don’t actually put static in the URLs, these files are available at the top-level (unless you create a static/static/ directory). The best way to put files in there is generally symbolic links.

For Django admin, do this:

$ cd static
$ ln -s ../lib/python/django/contrib/admin/media admin-media

Now edit src/sampleapp/sampleapp/settings.py and change ADMIN_MEDIA_PREFIX to ‘/admin-media’.

(Probably some other links should be added.)

One last little thing you might want to do; replace this line in
settings:

SECRET_KEY = 'ASF#@$@#JFAS#@'

With this:

from tcsupport.secret import get_secret
SECRET_KEY = get_secret()

Then you don’t have to worry about checking a secret into version control.

You still don’t really have an application, but the rest is mere "programming" so have at it!

Programming
Python
Silver Lining
Web

Comments (1)

Permalink

A new way to deploy web applications

Deployment is one of the things I like least about development, and yet without deployment the development doesn’t really matter.

I’ve tried a few things (e.g. fassembler), built a few things (virtualenv, pip), but deployment just sucked less as a result. Then I got excited about App Engine; everyone else was getting excited about "scaling", but really I was excited about an accessible deployment process. When it comes to deployment App Engine is the first thing that has really felt good to me.

But I can’t actually use App Engine. I was able to come to terms with the idea of writing an application to the platform, but there are limits… and with App Engine there were simply too many limits. Geo stuff on App Engine is at best a crippled hack, I miss lxml terribly, I never hated relational databases, and almost nothing large works without some degree of rewriting. Sometimes you can work around it, but you can never be sure you won’t hit some wall later. And frankly working around the platform is tiring and not very rewarding.


So… App Engine seemed neat, but I couldn’t use it, and deployment was still a problem.

What I like about App Engine: an application is just files. There’s no build process, no fancy copying of things in weird locations, nothing like that; you upload files, and uploading files just works. Also, you can check everything into version control. Not just your application code, but every library you use, the exact files that you installed. I really wanted a system like that.

At the same time, I started looking into "the cloud". It took me a while to get a handle on what "cloud computing" really means. What I learned: don’t overthink it. It’s not magic. It’s just virtual private servers that can be requisitioned automatically via an API, and are billed on a short time cycle. You can expand or change the definition a bit, but this definition is the one that matters to me. (I’ve also realized that I cannot get excited about complicated solutions; only once I realized how simple cloud computing is could I really get excited about the idea.)

Given the modest functionality of cloud computing, why does it matter? Because with a cloud computing system you can actually test the full deployment stack. You can create a brand-new server, identical to all servers you will create in the future; you can set this server up; you can deploy to it. You get it wrong, you throw away that virtual server and start over from the beginning, fixing things until you get it right. Billing is important here too; with hourly billing you pay cents for these tests, and you don’t need a pool of ready servers because the cloud service basically manages that pool of ready servers for you.

Without "cloud computing" we each too easily find ourselves in a situation where deployments are ad hoc, server installations develop over time, and servers and applications are inconsistent in their configuration. Cloud computing makes servers disposable, which means we can treat them in consistent ways, testing our work as we go. It makes it easy to treat operations with the same discipline as software.

Given the idea from App Engine, and the easy-to-use infrastructure of a cloud service, I started to script together something to manage the servers and start up applications. I didn’t know what exactly I wanted to do to start, and I’m not completely sure where I’m going with this. But on the whole this feels pretty right. So I present the provisionally-named: toppcloud (Update: this has been renamed Silver Cloud).


How it works: first you have a directory of files that defines your application. This probably includes a checkout of your "application" (let’s say in src/mynewapp/), and I find it also useful to use source control on the libraries (which are put in lib/python/). There’s a file in app.ini that defines some details of the application (very similar to app.yaml).

While app.ini is a (very minimal) description of the application, there is no description of the environment. You do not specify database connection details, for instance. Instead your application requests access to a database service. For instance, one of these services is a PostgreSQL/PostGIS database (which you get if you put service.postgis in your app.ini file). If you ask for that then there will be evironmental variables, CONFIG_PG_DBNAME etc., that will tell your application how to connect to the database. (For local development you can provide your own configuration, based on how you have PostgreSQL or some other service installed.)

The standard setup is also a virtualenv environment. It is setup so every time you start that virtualenv environment you’ll get those configuration-related environmental variables. This means your application configuration is always present, your services always available. It’s available in tests just like it is during a request. Django accomplishes something similar with the (much maligned) $DJANGO_SETTINGS_MODULE but toppcloud builds it into the virtualenv environment instead of the shell environment.

And how is the server setup? Much like with App Engine that is merely an implementation detail. Unlike App Engine that’s an implementation detail you can actually look at and change (by changing toppcloud), but it’s not something you are supposed to concern yourself with during regular application development.

The basic lifecycle using toppcloud looks like:

toppcloud create-node
Create a new virtual server; you can create any kind of supported server, but only Ubuntu Jaunty or Karmic are supported (and Jaunty should probably be dropped). This step is where the "cloud" part actually ends. If you want to install a bare Ubuntu onto an existing physical machine that’s fine too — after toppcloud create-node the "cloud" part of the process is pretty much done. Just don’t go using some old Ubuntu install; this tool is for clean systems that are used only for toppcloud.
toppcloud setup-node
Take that bare Ubuntu server and set it up (or update it) for use with toppcloud. This installs all the basic standard stuff (things like Apache, mod_wsgi, Varnish) and some management script that toppcloud runs. This is written to be safe to run over and over, so upgrading and setting up a machine are the same. It needs to be a bare server, but
toppcloud init path/to/app/
Setup a basic virtualenv environment with some toppcloud customizations.
toppcloud serve path/to/app
Serve up the application locally.
toppcloud update –host=test.example.com path/to/app/
This creates or updates an application at the given host. It edits /etc/hosts so that the domain is locally viewable.
toppcloud run test.example.com script.py
Run a script (from bin/) on a remote server. This allows you to run things like django-admin.py syncdb.

There’s a few other things — stuff to manage the servers and change around hostnames or the active version of applications. It’s growing to fit a variety of workflows, but I don’t think its growth is unbounded.


So… this is what toppcloud. From the outside it doen’t do a lot. From the inside it’s not actually that complicated either. I’ve included a lot of constraints in the tool but I think it offers an excellent balance. The constraints are workable for applications (insignificant for many applications), while still exposing a simple and consistent system that’s easier to reason about than a big-ball-of-server.

Some of the constraints:

  1. Binary packages are supported via Ubuntu packages; you only upload portable files. If you need a library like lxml, you need to request that package (python-lxml) to be installed in your app.ini. If you need a version of a binary library that is not yet packaged, I think creating a new deb is reasonable.
  2. There is no Linux distribution abstraction, but I don’t care.
  3. There is no option for the way your application is run — there’s one way applications are run, because I believe there is a best practice. I might have gotten the best practice wrong, but that should be resolved inside toppcloud, not inside applications. Is Varnish a terrible cache? Probably not, but if it is we should all be able to agree on that and replace it. If there are genuinely different needs then maybe additional application or deployment configuration will be called for — but we shouldn’t add configuration just because someone says there is a better practice (and a better practice that is not universally better); there must be justifications.
  4. By abstracting out services and persistence some additional code is required for each such service, and that code is centralized in toppcloud, but it means we can also start to add consistent tools usable across a wide set of applications and backends.
  5. All file paths have to be relative, because files get moved around. I know of some particularly problematic files (e.g., .pth files), and toppcloud fixes these automatically. Mostly this isn’t so hard to do.

These particular compromises are ones I have not seen in many systems (and I’ve started to look more). App Engine I think goes too far with its constraints. Heroku is close, but closed source.

This is different than a strict everything-must-be-a-package strategy. This deployment system is light and simple and takes into account reasonable web development workflows. The pieces of an application that move around a lot are all well-greased and agile. The parts of an application that are better to Do Right And Then Leave Alone (like Apache configuration) are static.

Unlike generalized systems like buildout this system avoids "building" entirely, making deployment a simpler and lower risk action, leaning on system packages for the things they do best. Other open source tools emphasize a greater degree of flexibility than I think is necessary, allowing people to encode exploratory service integration into what appears to be an encapsulated build (I’m looking at you buildout).

Unlike requirement sets and packaging and versioning libraries, this makes all the Python libraries (typically the most volatile libraries) explicit and controlled, and can better ensure that small updates really are small. It doesn’t invalidate installers and versioning, but it makes that process even more explicit and encourages greater thoughtfulness.

Unlike many production-oriented systems (what I’ve seen in a lot of "cloud" tools) this encorporates both the development environment and production environment; but unlike some developer-oriented systems this does not try to normalize everyone’s environment and instead relies on developers to set up their systems however is appropriate. And unlike platform-neutral systems this can ensure an amount of reliability and predictability through extremely hard requirements (it is deployed on Ubuntu Jaunty/Karmic only).

But it’s not all constraints. Toppcloud is solidly web framework neutral. It’s even slightly language neutral. Though it does require support code for each persistence technique, it is fairly easy to do, and there are no requirements for "scalable systems"; I think unscalable systems are a perfectly reasonable implementation choice for many problems. I believe a more scalable system could be built on this, but as a deployment and development option, not a starting requirement.

So far I’ve done some deployments using toppcloud; not a lot, but some. And I can say that it feels really good; lots of rough edges still, but the core concept feels really right. I’ve made a lot of sideways attacks on deployment, and a few direct attacks… sometimes I write things that I think are useful, and sometimes I write things that I think are right. Toppcloud is at the moment maybe more right than useful. But I genuinely believe this is (in theory) a universally appropriate deployment tool.


Alright, so now you think maybe you should look more at toppcloud…

Well, I can offer you a fair amount of documentation. A lot of that documentation refers to design, and a bit of it to examples. There’s also a couple projects you can look at; they are all small, but :

  • Frank (will be interactivesomerville.org) which is another similar Django/Pinax project (Pinax was a bit tricky). This is probably the largest project. It’s a Django/Pinax volunteer-written application for collecting community feedback the Boston Greenline project, if that sounds interesting to you might want to chip in on the development (if so check out the wiki).
  • Neighborly, with minimal functionality (we need to run more sprints) but an installation story.
  • bbdocs which is a very simple bitbucket document generator, that makes the toppcloud site.
  • geodns which is another simple no-framework PostGIS project.

Now, the letdown. One thing I cannot offer you is support. THERE IS NO SUPPORT. I cannot now, and I might never really be able to support this tool. This tool is appropriate for collaborators, for people who like the idea and are ready to build on it. If it grows well I hope that it can grow a community, I hope people can support each other. I’d like to help that happen. But I can’t do that by bootstrapping it through unending support, because I’m not good at it and I’m not consistent and it’s unrealistic and unsustainable. This is not a open source dead drop. But it’s also not My Future; I’m not going to build a company around it, and I’m not going to use all my free time supporting it. It’s a tool I want to exist. I very much want it to exist. But even very much wanting something is not the same as being an undying champion, and I am not an undying champion. If you want to tell me what my process should be, please do!


If you want to see me get philosophical about packaging and deployment and other stuff like that, see my upcoming talk at PyCon.

Packaging
Programming
Python
Silver Lining
Web

Comments (11)

Permalink