<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: lxml: an underappreciated web scraping library</title>
	<atom:link href="http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/</link>
	<description></description>
	<lastBuildDate>Fri, 06 May 2011 07:16:39 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: ddavout</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-176786</link>
		<dc:creator>ddavout</dc:creator>
		<pubDate>Sun, 10 Oct 2010 11:49:05 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-176786</guid>
		<description>Thanks a lot for helping me overcome my &quot;fear&quot; of lxml. As a beginner, I am quite happy to have managed to extract all the data I needed, which was scattered across more than 400 web pages (with the help of xmlstarlet to clean up some pages manually).
No problems whatsoever installing from the Debian packages.
Thanks a lot</description>
		<content:encoded><![CDATA[<p>Thanks a lot for helping me overcome my &#8220;fear&#8221; of lxml. As a beginner, I am quite happy to have managed to extract all the data I needed, which was scattered across more than 400 web pages (with the help of xmlstarlet to clean up some pages manually).
No problems whatsoever installing from the Debian packages.
Thanks a lot</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: manju</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-165889</link>
		<dc:creator>manju</dc:creator>
		<pubDate>Fri, 25 Jun 2010 10:31:15 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-165889</guid>
		<description>Hi Ian,
In the first lxml example you gave, I think that instead of
doc = parse(&#039;http://java.sun.com&#039;).getroot()
it should be

from urllib2 import urlopen
doc = parse(urlopen(&#039;http://java.sun.com&#039;)).getroot()

as parse does not fetch the website.
As I said, the first version gives an error, but the second one works fine for me.</description>
		<content:encoded><![CDATA[<p>Hi Ian,
In the first lxml example you gave, I think that instead of
doc = parse('<a href="http://java.sun.com" rel="nofollow">http://java.sun.com</a>').getroot()
it should be</p>

<p>from urllib2 import urlopen
doc = parse(urlopen('<a href="http://java.sun.com" rel="nofollow">http://java.sun.com</a>')).getroot()</p>

<p>as parse does not fetch the website.
As I said, the first version gives an error, but the second one works fine for me.</p>
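
<p>A minimal sketch of the same idea for Python 3, where urllib2's urlopen lives in urllib.request. It assumes lxml is installed; the helper names parse_from_stream and parse_url are hypothetical, and an in-memory stream stands in for the HTTP response so the snippet runs offline:</p>

```python
from io import BytesIO
from urllib.request import urlopen

from lxml.html import parse


def parse_from_stream(stream):
    """Parse HTML from any file-like object and return the root element."""
    return parse(stream).getroot()


def parse_url(url):
    """Fetch the page with urllib, then hand the open response to lxml."""
    with urlopen(url) as response:
        return parse_from_stream(response)


# Offline demonstration: BytesIO plays the role of the HTTP response.
sample = BytesIO(b"<html><body><h1>Hello</h1></body></html>")
root = parse_from_stream(sample)
heading = root.findtext(".//h1")
```

<p>With a network connection, parse_url('http://java.sun.com') would follow the same path; doing the fetch separately also keeps download errors apart from parsing errors.</p>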
]]></content:encoded>
	</item>
	<item>
		<title>By: manju</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-165829</link>
		<dc:creator>manju</dc:creator>
		<pubDate>Thu, 24 Jun 2010 12:11:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-165829</guid>
		<description>Hi all!
I have installed lxml 2.2.2 on Windows (I am using Python 2.6.5). I tried the code you mentioned:
&quot;from lxml.html import parse
p = parse(&#039;http://www.google.com&#039;).getroot()&quot;

But I am getting the following error:

Traceback (most recent call last):
  File &quot;&quot;, line 1, in 
    p=parse(&#039;http://www.google.com&#039;).getroot()
  File &quot;C:\Python26\lib\site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\html\__init__.py&quot;, line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File &quot;lxml.etree.pyx&quot;, line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590)
  File &quot;parser.pxi&quot;, line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205)
  File &quot;parser.pxi&quot;, line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488)
  File &quot;parser.pxi&quot;, line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583)
  File &quot;parser.pxi&quot;, line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736)
  File &quot;parser.pxi&quot;, line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File &quot;parser.pxi&quot;, line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File &quot;parser.pxi&quot;, line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056)
IOError: Error reading file &#039;http://www.google.com&#039;: failed to load external entity &quot;http://www.google.com&quot;

I am clueless as to what to do next, as I am new to Python.
Please guide me in solving this error.
Thanks in advance!! :)</description>
		<content:encoded><![CDATA[<p>Hi all!
I have installed lxml 2.2.2 on Windows (I am using Python 2.6.5). I tried the code you mentioned:
&#8220;from lxml.html import parse
p = parse('<a href="http://www.google.com" rel="nofollow">http://www.google.com</a>').getroot()&#8221;</p>

<p>But I am getting the following error:</p>

<pre>Traceback (most recent call last):
  File "", line 1, in 
    p=parse('<a href="http://www.google.com" rel="nofollow">http://www.google.com</a>').getroot()
  File "C:\Python26\lib\site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\html\__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590)
  File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205)
  File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488)
  File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056)
IOError: Error reading file '<a href="http://www.google.com" rel="nofollow">http://www.google.com</a>': failed to load external entity "http://www.google.com"</pre>

<p>I am clueless as to what to do next, as I am new to Python.
Please guide me in solving this error.
Thanks in advance!! :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Herbert Roitblat</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-163750</link>
		<dc:creator>Herbert Roitblat</dc:creator>
		<pubDate>Sat, 29 May 2010 18:24:11 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-163750</guid>
		<description>I am trying to install lxml in an active virtual environment on Ubuntu 10.04 64-bit on a Dell xps, Python 2.6.5.

I used this command from the top of the virtual environment:

STATIC_DEPS=true bin/easy_install lxml

This is the result I got:

make[1]: Leaving directory `/tmp/easy_install-N4_xs4/lxml-2.2.6/build/tmp/libxslt-1.1.26&#039;
NOTE: Trying to build without Cython, pre-generated &#039;src/lxml/lxml.etree.c&#039; needs to be available.
Using build configuration of libxml2 2.7.7 and libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /tmp/easy_install-N4_xs4/lxml-2.2.6/build/tmp/libxml2/lib
/usr/bin/ld: /tmp/easy_install-N4_xs4/lxml-2.2.6/build/tmp/libxml2/lib/libxslt.a(xslt.o): relocation R_X86_64_32 against `.rodata.str1.8&#039; can not be used when making a shared object; recompile with -fPIC
/tmp/easy_install-N4_xs4/lxml-2.2.6/build/tmp/libxml2/lib/libxslt.a: could not read symbols: Bad value


I guess that something is missing.  I installed Python-dev on my main system, created the virtual environment with --no-site-packages.  If I need to install Python-dev in my virtual environment, I don&#039;t know how to do it (apt-get tries to put it in the main installation, complains that I am not root). If I need something else, I would appreciate instructions on how to put it into my virtualenv.

By the way, what I am looking to do right now is to translate a dictionary object into XML.  I currently translate it into JSON without a hitch, but I also need to support translating it into XML.  The structure is a key:value mapping in which each value is a list of lists of key-value pairs.  Any thoughts on converting a dictionary to XML would also be welcome.  That&#039;s how I came to lxml.

Thanks so much.

Herb</description>
		<content:encoded><![CDATA[<p>I am trying to install lxml in an active virtual environment on Ubuntu 10.04 64-bit on a Dell xps, Python 2.6.5.</p>

<p>I used this command from the top of the virtual environment:</p>

<p>STATIC_DEPS=true bin/easy_install lxml</p>

<p>This is the result I got:</p>

<pre>make[1]: Leaving directory `/tmp/easy_install-N4_xs4/lxml-2.2.6/build/tmp/libxslt-1.1.26'
NOTE: Trying to build without Cython, pre-generated 'src/lxml/lxml.etree.c' needs to be available.
Using build configuration of libxml2 2.7.7 and libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /tmp/easy_install-N4_xs4/lxml-2.2.6/build/tmp/libxml2/lib
/usr/bin/ld: /tmp/easy_install-N4_xs4/lxml-2.2.6/build/tmp/libxml2/lib/libxslt.a(xslt.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a shared object; recompile with -fPIC
/tmp/easy_install-N4_xs4/lxml-2.2.6/build/tmp/libxml2/lib/libxslt.a: could not read symbols: Bad value</pre>

<p>I guess that something is missing.  I installed Python-dev on my main system, created the virtual environment with --no-site-packages.  If I need to install Python-dev in my virtual environment, I don&#8217;t know how to do it (apt-get tries to put it in the main installation, complains that I am not root). If I need something else, I would appreciate instructions on how to put it into my virtualenv.</p>

<p>By the way, what I am looking to do right now is to translate a dictionary object into XML.  I currently translate it into JSON without a hitch, but I also need to support translating it into XML.  The structure is a key:value mapping in which each value is a list of lists of key-value pairs.  Any thoughts on converting a dictionary to XML would also be welcome.  That&#8217;s how I came to lxml.</p>

<p>Thanks so much.</p>

<p>Herb</p>
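
<p>One possible sketch of the dictionary-to-XML conversion asked about above, assuming every key and pair name is a valid XML tag name. It uses the standard library's xml.etree.ElementTree; lxml.etree offers the same Element, SubElement and tostring calls. The function name dict_to_xml and the tag names data and row are hypothetical choices for illustration:</p>

```python
import xml.etree.ElementTree as ET


def dict_to_xml(data, root_tag="data", row_tag="row"):
    """Convert {key: [[(name, value), ...], ...]} into an element tree."""
    root = ET.Element(root_tag)
    for key, rows in data.items():
        group = ET.SubElement(root, key)          # one element per dict key
        for row in rows:
            row_el = ET.SubElement(group, row_tag)  # one element per inner list
            for name, value in row:
                # Each key-value pair becomes a child element with text content.
                ET.SubElement(row_el, name).text = str(value)
    return root


example = {
    "people": [
        [("name", "Ann"), ("age", 30)],
        [("name", "Bob"), ("age", 25)],
    ],
}
xml_text = ET.tostring(dict_to_xml(example), encoding="unicode")
```

<p>Values are converted with str(), so numbers come back as text when the XML is read again; keys that are not valid tag names would need to be stored as attributes or text instead.</p>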
]]></content:encoded>
	</item>
	<item>
		<title>By: Graeme Pietersz</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-160873</link>
		<dc:creator>Graeme Pietersz</dc:creator>
		<pubDate>Thu, 29 Apr 2010 12:25:20 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-160873</guid>
		<description>Thanks for this: it encouraged me to use lxml. I used lxml to import data from HTML to a database (I should have been generating those pages from a database in the first place, but that&#039;s another story).

I used xml.sax for something similar (except it was an XML file that time) a few months ago, and lxml was much easier to work with.

No installation problems on Mandriva Linux -- lxml was in the repos so a tick and a click was all that was needed. As Grease suggests, if you use an OS with a package manager, it&#039;s the easiest way to install almost anything.</description>
		<content:encoded><![CDATA[<p>Thanks for this: it encouraged me to use lxml. I used lxml to import data from HTML to a database (I should have been generating those pages from a database in the first place, but that&#8217;s another story).</p>

<p>I used xml.sax for something similar (except it was an XML file that time) a few months ago, and lxml was much easier to work with.</p>

<p>No installation problems on Mandriva Linux: lxml was in the repos so a tick and a click was all that was needed. As Grease suggests, if you use an OS with a package manager, it&#8217;s the easiest way to install almost anything.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sikis</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-151030</link>
		<dc:creator>sikis</dc:creator>
		<pubDate>Mon, 08 Feb 2010 09:05:41 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-151030</guid>
		<description>Nice summarization of lxml features, thanks.</description>
		<content:encoded><![CDATA[<p>Nice summarization of lxml features, thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Benjamin Sergeant</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-150037</link>
		<dc:creator>Benjamin Sergeant</dc:creator>
		<pubDate>Mon, 01 Feb 2010 18:17:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-150037</guid>
		<description>I don&#039;t know how much code lives in the CPython extension, but if the boundary around the C calls into the libxml libraries is well defined, one could write wrappers that simulate those calls using ElementTree, which is one of Python&#039;s included batteries (I am not sure about its XPath support). Not a small project, but it could be useful for, say, Google App Engine users or Jython/IronPython/PyPy users.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t know how much code lives in the CPython extension, but if the boundary around the C calls into the libxml libraries is well defined, one could write wrappers that simulate those calls using ElementTree, which is one of Python&#8217;s included batteries (I am not sure about its XPath support). Not a small project, but it could be useful for, say, Google App Engine users or Jython/IronPython/PyPy users.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pornoizle</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-141395</link>
		<dc:creator>pornoizle</dc:creator>
		<pubDate>Fri, 04 Dec 2009 18:26:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-141395</guid>
		<description>i second that installing lxml can be a hassle; i only managed to make it run on windows by manually downloading and installing a binary package. definitely one great advantage of BeautifulSoup: a single fairly short *.py file, and you’re good to go.</description>
		<content:encoded><![CDATA[<p>i second that installing lxml can be a hassle; i only managed to make it run on windows by manually downloading and installing a binary package. definitely one great advantage of BeautifulSoup: a single fairly short *.py file, and you’re good to go.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neutrino</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-133623</link>
		<dc:creator>Neutrino</dc:creator>
		<pubDate>Fri, 16 Oct 2009 07:57:03 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-133623</guid>
		<description>Lxml is one of those Python libraries that should be really high profile given its usefulness. Sadly it is also one of those odd projects peculiar to open source that doesn&#039;t have a forum and as a consequence receives little attention from anyone not prepared to maintain a database of mailing lists on every workstation they use for every single piece of software or technology subject they are interested in.</description>
		<content:encoded><![CDATA[<p>Lxml is one of those Python libraries that should be really high profile given its usefulness. Sadly it is also one of those odd projects peculiar to open source that doesn&#8217;t have a forum and as a consequence receives little attention from anyone not prepared to maintain a database of mailing lists on every workstation they use for every single piece of software or technology subject they are interested in.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jesse</title>
		<link>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/comment-page-1/#comment-125779</link>
		<dc:creator>jesse</dc:creator>
		<pubDate>Tue, 25 Aug 2009 20:58:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/#comment-125779</guid>
		<description>Has anybody looked at http://scrapy.org/?</description>
		<content:encoded><![CDATA[<p>Has anybody looked at <a href="http://scrapy.org/" rel="nofollow">http://scrapy.org/</a>?</p>
]]></content:encoded>
	</item>
</channel>
</rss>


