<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: lxml.html</title>
	<link>http://blog.ianbicking.org/2007/09/24/lxmlhtml/</link>
	<description></description>
	<pubDate>Mon, 01 Dec 2008 21:34:22 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>

	<item>
		<title>By: Ian Bicking</title>
		<link>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1126</link>
		<dc:creator>Ian Bicking</dc:creator>
		<pubDate>Wed, 26 Sep 2007 00:18:17 +0000</pubDate>
		<guid>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1126</guid>
		<description>Re: testing -- there's a property to access the lxml document in WebTest (an extraction of paste.fixture): http://pythonpaste.org/webtest/#parsing-the-body</description>
		<content:encoded><![CDATA[<p>Re: testing &#8212; there&#8217;s a property to access the lxml document in WebTest (an extraction of paste.fixture): <a href="http://pythonpaste.org/webtest/#parsing-the-body" rel="nofollow">http://pythonpaste.org/webtest/#parsing-the-body</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kumar McMillan</title>
		<link>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1122</link>
		<dc:creator>Kumar McMillan</dc:creator>
		<pubDate>Tue, 25 Sep 2007 18:58:05 +0000</pubDate>
		<guid>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1122</guid>
		<description>oh, cool, I see it's already in lxml's trunk :)</description>
		<content:encoded><![CDATA[<p>oh, cool, I see it&#8217;s already in lxml&#8217;s trunk :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kumar McMillan</title>
		<link>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1121</link>
		<dc:creator>Kumar McMillan</dc:creator>
		<pubDate>Tue, 25 Sep 2007 18:52:47 +0000</pubDate>
		<guid>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1121</guid>
		<description>Hi Ian, I just wanted to say thanks for hacking on this and I look forward to seeing it merged into lxml's trunk.  I've deployed it from the branch and have been using it for a side project quite successfully.  Ironically, it's a screen scraper.  I tried using templatemaker on the scrapee but its template is obfuscated (my theory anyway) so it [breaks templatemaker pretty bad](http://code.google.com/p/templatemaker/issues/detail?id=4).  Anyway, xpath seems cleaner and more maintainable.  Plus, it is very useful to have HTMLElement.text_content() so that the template can be analyzed in a more contextual way.

I've also been starting to work with paste (via pylons) a little more than just experimentation and it might be nice to use xpath for assertions on paste.fixture's response.  However, I haven't needed more than the provided mustcontain() method yet so maybe it would be overkill, just a thought.  

XPath would certainly be a clean, maintainable way to "validate" a template.  That is, to answer: do all my pages implement the common header/footer/nav-bar layout for the site?  I'm not sure it is reasonable to go to such lengths in one's tests; this again is just a thought.</description>
		<content:encoded><![CDATA[<p>Hi Ian, I just wanted to say thanks for hacking on this and I look forward to seeing it merged into lxml&#8217;s trunk.  I&#8217;ve deployed it from the branch and have been using it for a side project quite successfully.  Ironically, it&#8217;s a screen scraper.  I tried using templatemaker on the scrapee but its template is obfuscated (my theory anyway) so it <a href="http://code.google.com/p/templatemaker/issues/detail?id=4">breaks templatemaker pretty bad</a>.  Anyway, xpath seems cleaner and more maintainable.  Plus, it is very useful to have HTMLElement.text_content() so that the template can be analyzed in a more contextual way.</p>

<p>I&#8217;ve also been starting to work with paste (via pylons) a little more than just experimentation and it might be nice to use xpath for assertions on paste.fixture&#8217;s response.  However, I haven&#8217;t needed more than the provided mustcontain() method yet so maybe it would be overkill, just a thought.  </p>

<p>XPath would certainly be a clean, maintainable way to &#8220;validate&#8221; a template.  That is, to answer: do all my pages implement the common header/footer/nav-bar layout for the site?  I&#8217;m not sure it is reasonable to go to such lengths in one&#8217;s tests; this again is just a thought.</p>
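
<p>A minimal sketch of the layout check described above. It uses the standard library's ElementTree and its XPath subset on well-formed markup so that it runs without extra installs; with lxml.html the same paths could be passed to an element's xpath() method instead. The page strings, element ids, and the helper name missing_layout_parts are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical layout contract: every page must contain these elements.
REQUIRED_LAYOUT = {
    "header": ".//div[@id='header']",
    "nav-bar": ".//ul[@id='nav']",
    "footer": ".//div[@id='footer']",
}


def missing_layout_parts(page_source):
    """Return the names of required layout parts absent from the page."""
    root = ET.fromstring(page_source)
    return [name for name, path in REQUIRED_LAYOUT.items()
            if root.find(path) is None]


# Two made-up pages: one follows the layout, one does not.
GOOD_PAGE = """<html><body>
<div id="header">Site</div>
<ul id="nav"><li>Home</li></ul>
<div id="footer">Contact</div>
</body></html>"""

BAD_PAGE = """<html><body>
<div id="header">Site</div>
<p>No navigation or footer here.</p>
</body></html>"""

print(missing_layout_parts(GOOD_PAGE))
print(missing_layout_parts(BAD_PAGE))
```

<p>A test could then assert that the returned list is empty for every page it fetches.</p>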
]]></content:encoded>
	</item>
	<item>
		<title>By: jgraham</title>
		<link>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1101</link>
		<dc:creator>jgraham</dc:creator>
		<pubDate>Mon, 24 Sep 2007 22:18:32 +0000</pubDate>
		<guid>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1101</guid>
		<description>&#62; It would be nice to have something similar with html5lib.

It appears lxml has grown an HTML 5-incompatible desire to enforce "valid" tag names. In particular, it seems that calls like lxml.etree.Element("foo?bar") now raise a ValueError, even though HTML 5 parsers are expected to be able to create elements called "foo?bar". Not only does this mean that not all HTML 5 trees can be represented in lxml.etree, but it also means that our hack of using an HTML 5-illegal element name to represent the notional document root doesn't work any more. If this problem can be overcome, html5lib already supports generating lxml trees, so it would be easy to wrap it in syntax like lxml.html.html5lib.parse().</description>
		<content:encoded><![CDATA[<p>&gt; It would be nice to have something similar with html5lib.</p>

<p>It appears lxml has grown an HTML 5-incompatible desire to enforce &#8220;valid&#8221; tag names. In particular, it seems that calls like lxml.etree.Element("foo?bar") now raise a ValueError, even though HTML 5 parsers are expected to be able to create elements called &#8220;foo?bar&#8221;. Not only does this mean that not all HTML 5 trees can be represented in lxml.etree, but it also means that our hack of using an HTML 5-illegal element name to represent the notional document root doesn&#8217;t work any more. If this problem can be overcome, html5lib already supports generating lxml trees, so it would be easy to wrap it in syntax like lxml.html.html5lib.parse().</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Martijn Faassen</title>
		<link>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1099</link>
		<dc:creator>Martijn Faassen</dc:creator>
		<pubDate>Mon, 24 Sep 2007 19:44:23 +0000</pubDate>
		<guid>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1099</guid>
		<description>I'm happy to see all this great work appear in lxml! lxml is still my baby too and I'm very proud. :) Thanks Ian for these great contributions and thanks Stefan for being such an excellent lead developer for lxml!</description>
		<content:encoded><![CDATA[<p>I&#8217;m happy to see all this great work appear in lxml! lxml is still my baby too and I&#8217;m very proud. :) Thanks Ian for these great contributions and thanks Stefan for being such an excellent lead developer for lxml!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian Bicking</title>
		<link>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1094</link>
		<dc:creator>Ian Bicking</dc:creator>
		<pubDate>Mon, 24 Sep 2007 17:26:54 +0000</pubDate>
		<guid>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1094</guid>
		<description>Fredrik: I'm not sure.  Obviously the first part is parsing HTML, but there are BeautifulSoup parsers and native serialization in ET 1.3.  I use `el.getparent()` sometimes, and though not a tremendous amount I've found it's hard to refactor when you don't have that pointer to the parent, as in ElementTree.  Also, many of these use XPath very heavily.  You could probably rewrite several of them to be simpler (ET-compatible) expressions plus a simple list comprehension.

A couple might be more feasible than the others: the differ, maybe formfill, and definitely the doctest comparison stuff would be okay (since it is based on code that was written for ElementTree originally).</description>
		<content:encoded><![CDATA[<p>Fredrik: I&#8217;m not sure.  Obviously the first part is parsing HTML, but there are BeautifulSoup parsers and native serialization in ET 1.3.  I use <code>el.getparent()</code> sometimes, though not a tremendous amount; I&#8217;ve found it&#8217;s hard to refactor when you don&#8217;t have that pointer to the parent, as is the case in ElementTree.  Also, many of these use XPath very heavily.  You could probably rewrite several of them as simpler (ET-compatible) expressions plus a simple list comprehension.</p>

<p>A couple might be more feasible than the others: the differ, maybe formfill, and definitely the doctest comparison stuff would be okay (since it is based on code that was written for ElementTree originally).</p>
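
<p>A rough sketch of what such a rewrite might look like, using only the standard library's ElementTree. The sample markup, the XPath expression in the comment, and the names links and parent_of are made up for illustration and are not taken from lxml.html itself.</p>

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<div>"
    "<p class='note'>first <a href='/a'>link</a></p>"
    "<p>second <a href='/b'>link</a></p>"
    "<span><a>anchor without href</a></span>"
    "</div>"
)

# With lxml one might write: doc.xpath("//p/a[starts-with(@href, '/')]")
# ElementTree has no starts-with(), so use a simple path and filter in Python.
links = [a for a in DOC.findall(".//p/a")
         if a.get("href", "").startswith("/")]

# ElementTree elements have no getparent(); a child-to-parent map is a
# common workaround (it must be rebuilt whenever the tree changes).
parent_of = {child: parent for parent in DOC.iter() for child in parent}

print([a.get("href") for a in links])
print([parent_of[a].get("class") for a in links])
```

<p>The parent map is the awkward part: it is a separate structure that has to be kept in sync with the tree, which is where the refactoring difficulty comes from.</p>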
]]></content:encoded>
	</item>
	<item>
		<title>By: Fredrik</title>
		<link>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1093</link>
		<dc:creator>Fredrik</dc:creator>
		<pubDate>Mon, 24 Sep 2007 16:13:16 +0000</pubDate>
		<guid>http://blog.ianbicking.org/2007/09/24/lxmlhtml/#comment-1093</guid>
		<description>So how hard would it be to make the relevant portions of this work for xml.etree as well?</description>
		<content:encoded><![CDATA[<p>So how hard would it be to make the relevant portions of this work for xml.etree as well?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
