Sunday, January 16, 2011

XHTMLr

Project Description
Normalizes HTML into XML that can be parsed and manipulated.

HTML can be really ugly. Even valid HTML can be (and most often is) invalid XML. This small, fast library parses the HTML tree and creates XML that can be read into System.Xml.XmlDocument or System.Xml.Linq.XDocument.
Consider the following ugly HTML:
    <p>First paragraph
    <p style=color:red>Second paragraph

Running this call:
XHTML.ToXml(html, XHTML.Options.Default | XHTML.Options.Pretty);
produces the following XML:
<html>
 <body>
     <p>First paragraph
                   </p>
     <p style="color:red">Second paragraph</p>
 </body>
</html>
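
To illustrate why the normalization matters, here is a minimal sketch that feeds both the original HTML and the normalized result to System.Xml.Linq.XDocument. It does not call XHTMLr itself: the normalized XML is pasted in as a string literal (with whitespace simplified) so the example runs without the library. The class name XhtmlrSketch and its helper methods are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class for illustration; not part of XHTMLr.
public static class XhtmlrSketch
{
    // The ugly HTML from the example above.
    public const string RawHtml =
        "<p>First paragraph\n<p style=color:red>Second paragraph\n";

    // The normalized XML from the example above, pasted as a literal
    // (whitespace simplified) instead of calling XHTML.ToXml.
    public const string NormalizedXml =
        "<html><body>" +
        "<p>First paragraph</p>" +
        "<p style=\"color:red\">Second paragraph</p>" +
        "</body></html>";

    // Returns true when the text parses as well-formed XML.
    public static bool IsWellFormedXml(string text)
    {
        try
        {
            XDocument.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Returns the style attribute of every <p> element that has one.
    public static string[] ParagraphStyles(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("p")
            .Select(p => (string)p.Attribute("style"))
            .Where(s => s != null)
            .ToArray();
    }

    public static void Main()
    {
        // The raw HTML has unclosed tags and an unquoted attribute value.
        Console.WriteLine(IsWellFormedXml(RawHtml));
        // The normalized markup loads cleanly and can be queried with LINQ.
        Console.WriteLine(IsWellFormedXml(NormalizedXml));
        foreach (var style in ParagraphStyles(NormalizedXml))
            Console.WriteLine(style);
    }
}
```

In a real project the NormalizedXml literal would presumably be replaced by the value returned from XHTML.ToXml, assuming that call returns the XML as a string.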

Read more: Codeplex