There are a bunch of techniques for deploying long-running processes (Zope, Python server, Rails, etc.). A pretty good technique is HTTP proxying. There are some details and conventions I’d like to see for HTTP, but that’s not my concern here.
HTTP proxying isn’t great for commodity hosting. Mostly you need to set up a new long-running process, and commodity hosts don’t make that easy or reliable. FastCGI offers one solution to that, essentially putting the process management into Apache or whatever web server you are using.
The problem with FastCGI is that it is finicky. There are lots of configuration parameters, lots of parts don’t work right, and there seems to be a golden path where things actually work, but it’s hard to know exactly what that is.
Another technique that has been used in the past instead of FastCGI is a very small CGI script. One example in SCGI is called cgi2scgi. This small script is fast to run (it compiles to 12kb), and all it does is take the CGI request and turn it into an SCGI request to a long-running server.
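To make the translation step concrete, here is a minimal sketch of how CGI variables and a request body can be framed as an SCGI request. It is written in Python for readability (the real cgi2scgi is a small C program), and the function name `encode_scgi_request` is hypothetical. It assumes the usual SCGI framing: NUL-terminated header names and values sent as a netstring, with CONTENT_LENGTH first.

```python
def encode_scgi_request(environ, body=b""):
    """Encode CGI-style variables and a body as one SCGI request (sketch)."""
    # SCGI expects CONTENT_LENGTH first, followed by the SCGI version marker.
    headers = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    headers += [(k, v) for k, v in environ.items()
                if k not in ("CONTENT_LENGTH", "SCGI")]
    # Every header name and every value is terminated by a NUL byte.
    block = b"".join(
        k.encode("latin-1") + b"\0" + v.encode("latin-1") + b"\0"
        for k, v in headers
    )
    # The header block is sent as a netstring: "<length>:<bytes>,"
    # and the request body follows immediately after the comma.
    return str(len(block)).encode("ascii") + b":" + block + b"," + body
```

The CGI script would write these bytes to the backend socket and copy whatever comes back to its own standard output.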
This is a nice start, and easy to deploy, except it doesn’t manage the long-running process itself. A great feature to add to something like this would be simple process management. I imagine something where if the socket (named or a port) that the cgi2scgi script connects to isn’t up or working, it runs a script that will start the server. If another request comes in while the server is starting up, it shouldn’t try to start the server twice. If the server is randomly killed (as is common on commodity hosts) then the next request will try to bring the server up.
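The start-it-once behaviour could look roughly like the following. This is only a sketch under several assumptions: it is in Python rather than C, it uses a named Unix socket, and it uses an advisory file lock so that concurrent requests do not launch the server twice. The names `server_is_up`, `ensure_server`, `lock_path` and `start_cmd` are all made up for illustration.

```python
import fcntl
import os
import socket
import subprocess
import time

def server_is_up(sock_path):
    """Return True if something accepts connections on the Unix socket."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(sock_path)
        return True
    except OSError:
        return False
    finally:
        s.close()

def ensure_server(sock_path, lock_path, start_cmd, timeout=10.0):
    """Start the backend at most once, even with concurrent callers."""
    if server_is_up(sock_path):
        return True
    with open(lock_path, "w") as lock:
        # Blocks while another request holds the lock and starts the server.
        fcntl.flock(lock, fcntl.LOCK_EX)
        # Re-check: another request may have started it while we waited.
        if not server_is_up(sock_path):
            subprocess.Popen(start_cmd, start_new_session=True)
        # Poll until the socket answers or we give up.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if server_is_up(sock_path):
                return True
            time.sleep(0.1)
    # The lock is released when the file is closed by the with-block.
    return False
```

Because the check is repeated after the lock is acquired, a request that arrives during startup simply waits and then finds the server running. If the server is killed later, the next request fails the first check and goes through the same path again.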
Unlike FastCGI, this won’t try to handle different process models or anything fancy. It’s up to the startup script to set everything up properly, start multiple worker processes if necessary, etc. There are probably some tricky details I haven’t thought of, and it’s slightly annoying to write all this in C (but necessary, since it’s part of the CGI script, which must be small). But I think it can be done better than existing in-the-wild FastCGI implementations.
And when we’re done, I think we could have something that would be a really good basis for commodity hosting of a whole bunch of non-PHP frameworks. You can distribute the Linux binaries, as all the Commodity Hosts That Matter can run those (even the BSD ones should be fine). Easy application installation practically falls right out of that.
You really need to have a look at where I am going with mod_wsgi (www.modwsgi.org). Although it has an ‘embedded’ mode, which works like mod_python, it also has a ‘daemon’ mode which in some ways, but not all, works like FASTCGI/SCGI solutions. I would suggest it is easier to configure and manage, especially since you do not need any separate framework like flup. The one script file can work for either ‘embedded’ or ‘daemon’ mode, and can generally be plugged in pretty much as is into other WSGI hosting solutions.
At the moment the ‘daemon’ mode of mod_wsgi is biased specifically towards long running processes. This is okay for small to medium commodity web hosting environments where setting up sites is a bit more hands on and they have low site density, but for large scale commodity web hosting a bit more work is probably still required to make mod_wsgi acceptable.
The problem with large scale commodity web hosting sites is that they usually drive the configuration from something like LDAP so as to avoid restarting Apache when adding new sites. Further, they want a homogeneous environment which can support the various web languages such as PHP, Python etc. Their setup is generally tailored to PHP, but they don’t want to have to run a separate set of boxes just to meet the needs of Python users. Finally, they more often than not don’t run everything on one machine but load balance a lot of sites across many actual machines.
To them, a long running process is evil, as even if a small percentage of the possibly thousands of sites they host want such a long running process, it means such a process for each user on every machine in their cluster. Since Python applications are generally pretty fat they just see it as wasted memory. End result is that it reduces their site density and thus increases their costs.
For these sites, their ideal scenario is still a system whereby the Python application, although running in a separate process, is shut down after a relatively short period of inactivity (measured by requests arriving). That way, since a reasonable percentage of sites wouldn’t see that much traffic, they reclaim the memory, thereby avoiding running with high overall memory usage.
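As a rough illustration of the inactivity rule described here, the following sketch tracks the time of the last request and reports when the idle limit has passed. The class name `IdleTracker` and the `idle_timeout` parameter are hypothetical and do not correspond to any existing mod_wsgi option. The clock is injectable only so the logic can be exercised without waiting.

```python
import time

class IdleTracker:
    """Track request activity and report when the idle limit is exceeded."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds without a request
        self.clock = clock
        self.last_request = clock()

    def note_request(self):
        # Call this at the start of every request to reset the idle period.
        self.last_request = self.clock()

    def should_shut_down(self):
        # True once no request has arrived for idle_timeout seconds.
        return self.clock() - self.last_request >= self.idle_timeout
```

A daemon process would call `note_request()` for each request and check `should_shut_down()` periodically, exiting when it returns true so that the memory is handed back until the next request arrives.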
Although some FASTCGI/SCGI solutions support transient processes, mod_wsgi doesn’t support them in the first version. The intention though is to add this in a subsequent version. As with what has been done with mod_wsgi so far, the intention is to make configuration easy yet flexible enough to accommodate such large scale commodity web hosting companies. Also, it must all be manageable via Apache and its configuration. They generally don’t want to have some separate supervisor-like system for processes which is distinct from Apache, as it just means one more thing to configure and manage, increasing the complexity of their systems.
If allowance isn’t made for such automated systems for configuration of large numbers of sites then they will still view it as not adequate. Because there are going to be varied ways that these sorts of companies do their configuration, it may be quite tricky to get it right and be flexible enough to meet all requirements.
Quickly looking at things from the Python user’s perspective, one of the most important things is going to be providing an easy way for the user to restart just their application processes so that code changes can be picked up. This can’t involve restarting Apache as a whole. FASTCGI/SCGI solutions use methods whereby changing the main script file causes a restart, but based on complaints one sees, this isn’t always reliable for some implementations.
mod_wsgi currently relies primarily on signals for a complete application restart when using ‘daemon’ mode. Such a signal could be sent by the application itself from a protected page added to the application by the user. Alternatively, it could be sent from a user management web page provided by the web hosting company. Using signals like this is fine if everything is on the one box, but not adequate if a web hosting company uses a cluster of boxes.
Although basing a restart on changes to the script file may seem better, it also has its own problems because of the multi process nature of Apache and because the initial Apache process receiving the request would generally be running as a different user to the application. This process therefore may not have the privileges necessary to send a signal to an application process to get it to shut down and restart before a request is sent to it. Thus one has to build detection of changes to the script into the application process itself so it can request its own shutdown and restart. This checking possibly has to somehow run independent of requests arriving and would have to be aware of things such as idle process shutdown requirements imposed by the configuration.
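A sketch of what in-process change detection might look like follows. It is purely illustrative and is not how mod_wsgi implements anything. The names `ScriptMonitor` and `restart_if_changed` are hypothetical, and the use of SIGINT as the "please restart me" signal is an assumption made for the example. Scheduling the check independently of requests (for instance from a background thread) is left to the caller.

```python
import os
import signal

class ScriptMonitor:
    """Remember a script file's modification time and detect later changes."""

    def __init__(self, path):
        self.path = path
        self.mtime = os.stat(path).st_mtime_ns

    def changed(self):
        try:
            return os.stat(self.path).st_mtime_ns != self.mtime
        except FileNotFoundError:
            # Treat a removed script as a change as well.
            return True

def restart_if_changed(monitor, send_signal=os.kill):
    """Signal our own process if the script changed; return whether we did."""
    if monitor.changed():
        # Assumption: the process manager restarts us after SIGINT.
        # A process may always signal itself, so no extra privileges
        # are needed, unlike signalling from the Apache child process.
        send_signal(os.getpid(), signal.SIGINT)
        return True
    return False
```

Because the process signals itself, the privilege problem described above does not arise. What remains open is how often to check and how this interacts with any idle shutdown policy.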
So, various problems still to be solved. Glad though to see you are thinking about it. I was getting a bit tired of everyone simply grumbling about the lack of solutions and not seeing anyone else seemingly actively doing anything to improve the situation. :-)
When you posted this blog item, I was actually half way through writing my own blog item about whether mod_wsgi is suitable yet for large scale commodity web hosting. That is why I was so readily able to rant off about the subject. Anyway, I have now finished this blog item and you can find it here.
You remind me of a product for Perl, called ‘speedycgi’. Somewhat of a misnomer, as its utility is broader than just cgi, but it resembles the cgi2scgi exe you describe above, plus management of the backend processes.
You can read more at:
http://daemoninc.com/SpeedyCGI/
No, I am not any part of its development, but I was a happy user for a couple of years. It had the merit that you could enable it or disable it by just changing the script’s #! line from ‘perl …’ to ‘speedycgi ..’ and back. It was/is open-source, in C, and the code could be borrowed.
David
BTW: Thank you for all the WSGI work you have done for the Python community.
I don’t get all this FUD about FastCGI. I haven’t tried mod_fastcgi, but mod_fcgid works fine and out of the box - simple to configure and pretty fast, too. Even if it didn’t work, it would surely be much better to spend time on improving implementations of a generic protocol such as FastCGI than on creating yet another Python-specific Apache module (mod_python, mod_scgi, mod_webkit, mod_wsgi…) - isn’t unification the holy grail of the church of WSGI? And how come FastCGI is ‘difficult’ yet WSGI, which is just a glorified pipe but apparently needs a bazillion tutorials, wrappers and facelifts to render it fit for human consumption, is ‘easy’?
Michael, unfortunately you are just adding to the FUD that is out there. Specifically, like a lot of people out there you seem not to really understand the difference between FASTCGI and WSGI.
FASTCGI/SCGI/AJP are all wire protocols for interacting between a web server and a separate web application process. WSGI on the other hand is a Python-specific programmatic interface (or API). Thus they are quite different things.
If anything, it is FASTCGI/SCGI/AJP which are glorified pipes. It becomes quite clear that WSGI isn’t when one realises that a WSGI application can be hosted within the same process as the web server, unlike FASTCGI/SCGI/AJP hosted applications which MUST be in a separate process.
Thus, WSGI is a higher level programming API. In fact, a WSGI application can be hosted on top of FASTCGI/SCGI/AJP implementations and this actually reflects the point of WSGI, that it is a common web server interface that can be hosted on top of multiple hosting technologies. It is because it is a higher level API and not just a constrained wire protocol that there is more documentation and more tutorials describing it.
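To illustrate the API-versus-wire-protocol distinction, here is a minimal WSGI application together with a small helper that calls it directly in the same process, with no socket or protocol involved. The helper `call_in_process` is hypothetical and exists only for this example; `setup_testing_defaults` comes from the standard library’s wsgiref package.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """A minimal WSGI application: just a Python callable."""
    body = b"Hello from a WSGI application\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_in_process(app, path="/"):
    """Invoke a WSGI app directly, as an in-process host would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

The same `application` callable could equally be handed to an adapter that speaks FASTCGI or SCGI on its behalf. The application code does not change, only the hosting layer underneath it.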
Actually, one of the problems with FASTCGI/SCGI/AJP is the lack of documentation, and it is for that reason that so many people have trouble with it. Most of the time when you see people have problems, the answer is nearly always ‘here is the configuration I used’, or ‘see my blog where I explain how I eventually got it to work’. What you don’t see is people pointing to a central, really comprehensive and definitive set of documentation for FASTCGI on a web site, as there isn’t really such a thing. What is out there generally amounts to a few pages, if that. If you truly want to understand the FASTCGI ‘wire protocol’ properly, as well as reading the minimal documentation that exists, you generally still have to go and work it out from the source code.
Anyway, for WSGI the hosting mechanism ultimately doesn’t matter. If you are happy with FASTCGI then use it. Other people will still use other hosting solutions because often each has its own advantages and disadvantages, with some also offering more comprehensive features. The ease of configuration of solutions can also be a point, with some more complicated than others and FASTCGI solutions by no means necessarily always being the easiest to set up and configure, especially when you are subject to how an ISP has set it up to be used.
So, choice of hosting solution is good, and what you need to realise is in saying ‘isn’t unification the holy grail of the church of WSGI’, the answer is ‘yes’, but at a layer higher than the wire protocol level you are looking at.
OK.
I wound up in this place looking for some useful information, which I didn’t find in the verbiage, and therefore relieved my frustration by ranting a little. I certainly didn’t mean to confuse FCGI and WSGI - I’m aware of the differences and took it for granted that the reader would be, too. I realize that WSGI does not connect OS processes - what I was referring to is that it provides a mechanism for ‘piping’ together stuff within a Python app. WSGI certainly is less complicated than FastCGI, but considering the trivial problem it solves it DOES have a very cumbersome interface.