<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: lxml: an underappreciated web scraping library</title>
	<atom:link href="http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/</link>
	<description></description>
	<pubDate>Thu, 18 Mar 2010 01:10:15 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: sikis</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-151030</link>
		<dc:creator>sikis</dc:creator>
		<pubDate>Mon, 08 Feb 2010 09:05:41 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-151030</guid>
		<description>Nice summarization of lxml features, thanks.</description>
		<content:encoded><![CDATA[<p>Nice summarization of lxml features, thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Benjamin Sergeant</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-150037</link>
		<dc:creator>Benjamin Sergeant</dc:creator>
		<pubDate>Mon, 01 Feb 2010 18:17:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-150037</guid>
		<description>I don't know how much of the code is CPython-specific, but if the C calls into the libxml libraries are cleanly separated, one could write wrappers that simulate those calls using ElementTree, which is one of Python's included batteries (not sure about its XPath support). Not a small project, but it could be useful for, say, Google App Engine users or Jython/IronPython/PyPy users.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t know how much of the code is CPython-specific, but if the C calls into the libxml libraries are cleanly separated, one could write wrappers that simulate those calls using ElementTree, which is one of Python&#8217;s included batteries (not sure about its XPath support). Not a small project, but it could be useful for, say, Google App Engine users or Jython/IronPython/PyPy users.</p>
]]></content:encoded>
	</item>
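	<!-- Editor's sketch on the XPath question in the comment above: the standard library's ElementTree supports only a limited XPath subset through find() and findall(), and it needs well-formed markup. The SAMPLE document and the links_in_pad name are hypothetical, chosen to mirror the 'div.pad a' selector used elsewhere in this thread; this is an illustration of that subset, not a replacement for lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML snippet. ElementTree raises ParseError
# on the kind of broken HTML that lxml.html tolerates.
SAMPLE = """<html><body>
<div class="pad"><a href="/one">One</a> <a href="/two">Two</a></div>
<div class="other"><a href="/skip">Skip</a></div>
</body></html>"""


def links_in_pad(markup):
    """Return (text, href) pairs for <a> children of <div class="pad">."""
    root = ET.fromstring(markup)
    # The supported subset includes the descendant step (.//), tag names,
    # and simple [@attr='value'] predicates.
    anchors = root.findall(".//div[@class='pad']/a")
    return [(a.text, a.get("href")) for a in anchors]


print(links_in_pad(SAMPLE))
```

General XPath functions and CSS selectors are outside this subset, so a wrapper layer of the kind suggested above would have to cover that gap itself.
	-->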
	<item>
		<title>By: pornoizle</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-141395</link>
		<dc:creator>pornoizle</dc:creator>
		<pubDate>Fri, 04 Dec 2009 18:26:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-141395</guid>
		<description>I second that installing lxml can be a hassle; I only managed to make it run on Windows by manually downloading and installing a binary package. Definitely one great advantage of BeautifulSoup: a single, fairly short *.py file, and you’re good to go.</description>
		<content:encoded><![CDATA[<p>I second that installing lxml can be a hassle; I only managed to make it run on Windows by manually downloading and installing a binary package. Definitely one great advantage of BeautifulSoup: a single, fairly short *.py file, and you’re good to go.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neutrino</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-133623</link>
		<dc:creator>Neutrino</dc:creator>
		<pubDate>Fri, 16 Oct 2009 07:57:03 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-133623</guid>
		<description>Lxml is one of those Python libraries that should be really high profile given its usefulness. Sadly it is also one of those odd projects peculiar to open source that doesn't have a forum and as a consequence receives little attention from anyone not prepared to maintain a database of mailing lists on every workstation they use for every single piece of software or technology subject they are interested in.</description>
		<content:encoded><![CDATA[<p>Lxml is one of those Python libraries that should be really high profile given its usefulness. Sadly it is also one of those odd projects peculiar to open source that doesn&#8217;t have a forum and as a consequence receives little attention from anyone not prepared to maintain a database of mailing lists on every workstation they use for every single piece of software or technology subject they are interested in.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jesse</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-125779</link>
		<dc:creator>jesse</dc:creator>
		<pubDate>Tue, 25 Aug 2009 20:58:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-125779</guid>
		<description>Has anybody looked at Scrapy (http://scrapy.org/)?</description>
		<content:encoded><![CDATA[<p>Has anybody looked at Scrapy (<a href="http://scrapy.org/" rel="nofollow">http://scrapy.org/</a>)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mauro.degiorgi@gmail</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-115072</link>
		<dc:creator>mauro.degiorgi@gmail</dc:creator>
		<pubDate>Fri, 03 Jul 2009 17:47:02 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-115072</guid>
		<description>You must include the trailing slash:

doc = parse('http://java.sun.com/').getroot()

works :) Don't ask me why...</description>
		<content:encoded><![CDATA[<p>You must include the trailing slash:</p>

<p>doc = parse('http://java.sun.com/').getroot()</p>

<p>works :) Don&#8217;t ask me why&#8230;</p>
]]></content:encoded>
	</item>
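	<!-- Editor's sketch for the trailing-slash tip above. The thread does not establish why the URL without a slash failed (the post author's reply in this feed says it worked for him), so this only illustrates the tip itself: normalizing an empty URL path to '/' before handing the URL to a fetcher or parser. The ensure_root_path helper is a hypothetical name and is not part of lxml.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_root_path(url):
    """Return url unchanged unless its path is empty, then use '/'."""
    parts = urlsplit(url)
    if parts.path == "":
        # SplitResult is a named tuple, so _replace returns a modified copy.
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(ensure_root_path("http://java.sun.com"))
```

URLs that already carry a path, including a bare '/', pass through unchanged, so the helper is safe to apply to every URL before fetching.
	-->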
	<item>
		<title>By: Ian Bicking</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-113673</link>
		<dc:creator>Ian Bicking</dc:creator>
		<pubDate>Fri, 26 Jun 2009 00:32:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-113673</guid>
		<description>No, not at all -- I just tried it and it worked fine.</description>
		<content:encoded><![CDATA[<p>No, not at all &#8212; I just tried it and it worked fine.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: JohnMc</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-113672</link>
		<dc:creator>JohnMc</dc:creator>
		<pubDate>Thu, 25 Jun 2009 23:26:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-113672</guid>
		<description>Ian,

First, thanks: lxml is a hoot. I'll grant that getting lxml installed is a bit of a pain, but not that bad. Using the STATIC_DEPS suggestion was a big help as well.

However, I have noticed something while working through your web scraping examples.

Working through this:

from lxml.html import parse
doc = parse('http://java.sun.com').getroot()
for link in doc.cssselect('div.pad a'):
    print '%s: %s' % (link.text_content(), link.get('href'))

I receive a failure on the parse().getroot() statement. However, if I do the following:

import urllib
from lxml.html import fromstring
content = urllib.urlopen('http://java.sun.com').read()
doc = fromstring(content)
for link in doc.cssselect('div.pad a'):
    print '%s: %s' % (link.text_content(), link.get('href'))

it works.
Have you seen this behavior before?</description>
		<content:encoded><![CDATA[<p>Ian,</p>

<p>First, thanks: lxml is a hoot. I&#8217;ll grant that getting lxml installed is a bit of a pain, but not that bad. Using the STATIC_DEPS suggestion was a big help as well.</p>

<p>However, I have noticed something while working through your web scraping examples.</p>

<p>Working through this:</p>

<p>from lxml.html import parse
doc = parse('http://java.sun.com').getroot()
for link in doc.cssselect('div.pad a'):
    print '%s: %s' % (link.text_content(), link.get('href'))</p>

<p>I receive a failure on the parse().getroot() statement. However, if I do the following:</p>

<p>import urllib
from lxml.html import fromstring
content = urllib.urlopen('http://java.sun.com').read()
doc = fromstring(content)
for link in doc.cssselect('div.pad a'):
    print '%s: %s' % (link.text_content(), link.get('href'))</p>

<p>it works.
Have you seen this behavior before?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pamela Scoot</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-95745</link>
		<dc:creator>Pamela Scoot</dc:creator>
		<pubDate>Thu, 09 Apr 2009 11:52:50 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-95745</guid>
		<description>Thank you for clarifying the reasons to use lxml. Many have doubts about lxml, and there is a rumor going around that it is less comfortable to use than BeautifulSoup. But your post has given me an urge to rethink the matter.</description>
		<content:encoded><![CDATA[<p>Thank you for clarifying the reasons to use lxml. Many have doubts about lxml, and there is a rumor going around that it is less comfortable to use than BeautifulSoup. But your post has given me an urge to rethink the matter.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Web Developer - Shom</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-93053</link>
		<dc:creator>Web Developer - Shom</dc:creator>
		<pubDate>Thu, 02 Apr 2009 11:28:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-93053</guid>
		<description>Thanks for this beautiful post. I haven't tried lxml until now because I was not sure whether it would be as useful as BeautifulSoup. But after reading this post I changed my mind, and I will surely try lxml.</description>
		<content:encoded><![CDATA[<p>Thanks for this beautiful post. I haven&#8217;t tried lxml until now because I was not sure whether it would be as useful as BeautifulSoup. But after reading this post I changed my mind, and I will surely try lxml.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

