Book Idea: Python Optimization
Every so often it seems like some Python figure thinks about writing a
book, but they aren't sure what to write about. So, for anyone like
that, here's an idea for the taking: Python optimization. It seems
like it has a fairly narrow scope, but I think there's actually a lot
of material there. Here are some things I can think of:
- Profiling, where all optimization should start.
- In-Python optimizations:
- Caching (this is where I always start). Also pooling.
- Factoring applications into long-running processes, when
startup times are a problem. Also tips like lazy loading of
code.
- Other tricks to speed startup time. I understand things like
the search path can significantly affect some of this.
- Streaming... I feel like there's a term I am missing here.
Like the difference between SAX XML parsing and DOM, where SAX
encourages the programmer to do all the work possible as the
document is being parsed, while DOM typically does all the parsing
up front. That's one example of a larger concept, and one that
dramatically affects performance.
- Note important modules that may be forgotten. E.g., StringIO
- Lots of little tips, e.g., lst.sort(); lst.reverse() instead of
lst.sort(lambda a, b: cmp(b, a)).
- Numeric and related modules. There are a lot of novel ways of
using these tools outside of their core domain.
- Pyrex.
- Extensions in C, thinking particularly about those inner
loops. I think this subject could remain fairly tight as long as
you are focused on optimization, rather than the entire subject of
programming C Python extensions.
- Psyco. It's kind of magic, so it might be hard to talk about,
but there is some important information about how it works.
- Microthreads. There are a lot of flavors and ideas out there,
some built on generators, some not. Perhaps some talk of Stackless,
or Greenlets (the mysterious new kid on the block).
- Asynchronous programming. This is a big topic, but at least
there should be a discussion of the performance characteristics, and
enough information to get a feel.
- XML parsing. It seems very specific, but there's a bunch of
alternatives, and performance can be a significant issue.
- GUI programming, particularly techniques to make a GUI
responsive. This might be difficult to address without covering GUI
programming as a whole, so maybe this wouldn't work.
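To make the first two items concrete, here is a minimal sketch of the "profile first, then cache" workflow using the standard library's cProfile and functools.lru_cache. The function names (slow_square, cached_square, workload, profile) and the workload itself are made up for illustration; a real application would profile its own hot paths.

```python
import cProfile
import functools
import io
import pstats


def slow_square(n):
    # Hypothetical stand-in for an expensive computation:
    # computes n * n by repeated addition.
    total = 0
    for _ in range(n):
        total += n
    return total


@functools.lru_cache(maxsize=None)
def cached_square(n):
    # Same result as slow_square, but each distinct n is computed only once.
    return slow_square(n)


def workload(func):
    # Ask for the same ten values over and over, which is where caching pays off.
    return sum(func(n % 10 + 1000) for n in range(2000))


def profile(func):
    # Run the workload under the profiler and return (result, text report).
    profiler = cProfile.Profile()
    profiler.enable()
    result = workload(func)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()
```

Comparing the report from profile(slow_square) with the one from profile(cached_square) should show the call count for slow_square dropping from 2000 to 10, which is the kind of evidence worth having before and after any optimization.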
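The SAX versus DOM distinction from the streaming item can be sketched with the standard library's xml.sax and xml.dom.minidom. The document, the PriceTotal handler, and the two total_* functions are invented for this example; the point is only that the SAX version does its work while parsing and never holds the whole tree in memory, while the DOM version builds everything first.

```python
import xml.sax
from xml.dom import minidom

# A tiny made-up document; a real one might be far too large to hold in memory.
DOC = "<items>" + "".join(f"<item price='{i}'/>" for i in range(1, 6)) + "</items>"


class PriceTotal(xml.sax.ContentHandler):
    """Accumulates the total as elements stream past."""

    def __init__(self):
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        # Called once per opening tag, while the document is still being parsed.
        if name == "item":
            self.total += int(attrs["price"])


def total_streaming(text):
    # SAX style: work happens inside the handler callbacks during the parse.
    handler = PriceTotal()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.total


def total_dom(text):
    # DOM style: parse the whole document into a tree, then walk it.
    tree = minidom.parseString(text)
    return sum(int(node.getAttribute("price"))
               for node in tree.getElementsByTagName("item"))
```

Both functions return the same total for the same input; the difference is in when the work happens and how much memory is held while it does.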
I'm sure there are things I'm not thinking of, and there's a lot of
research that would go into it. I think the result could be a really
good book, though, with something for every level. Like a Python
Cookbook, only more specific. There's a fair amount of competition
among generic Python books, so specific topics seem to have more
potential. It could also be quite popular; it's something that would
catch the eye of even non-Python programmers, as there are many people
who want to use Python, but are concerned about the performance. I
think that concern is often misplaced, but it's there nonetheless.
Performance is also exciting to people, in the way that gets people to
buy books.
Anyway, there's my idea; maybe it'll be helpful to someone.
Created 17 Sep '04
Modified 14 Dec '04
Here's another idea: "Design and Interpretation of Runtime Systems". Every computer science department on the planet offers a course on compilers, but how many offer a course on the theory and practice of what you have to do at runtime? Linking and loading, byte-code interpreters, just-in-time compilation---there's a lot there, but no-one has ever written it all down in one place.
I think Python would make a good running example for such a book, if one included Psyco, IronPython, etc.
An optimization book would be great. Many people who use Python have rich programming backgrounds and can transfer universal optimization concepts like caching and pooling to Python. However, I believe that for a significant number of Python users, Python is a first or at most second language. This group doesn't have the benefit of lessons learned after years of programming in C, Java, or Perl to apply to optimizing Python programs.
Rather than a dead trees book, how about a wikipedia-like public repository, with many contributors?