.NET Information center: AngleSharp

Thursday, June 20, 2013

AngleSharp

Background

The idea for AngleSharp has been born about a year ago (and I will explain in the next paragraphs why AngleSharp goes beyond HtmlAgilityPack or similar solutions). The main reason to use AngleSharp is actually to have access to the DOM as you would have in the browser. The only difference is that in this case you will use C# (or any other .NET language). There is another difference (by design), which changed the names of the properties and methods from camel case to pascal case (i.e. the first letter is capitalized).

Therefore before we go into details of the implementation we need to have a look what's the long-term goal of AngleSharp. Actually there are several goals:

Parsers for HTML, XML, SVG, MathML and CSS
Create CSS stylesheets / style rules
Return the DOM of a document
Run modifications on the DOM
(*) Provide the basis for a possible renderer

The core parser is certainly given by the HTML5 parser. A CSS4 parser is a natural addition, since HTML documents contain stylesheet references and style attributes. Having another XML parser that complies with the current W3C specification is a good addition, since SVG and MathML (which can also occur in HTML documents) can be parsed as XML documents. The only difference lies in the generated document, which has different semantics and a uses a different DOM.

The * point is quite interesting. The basic idea behind this is not that a browser that is written entirely in C# will be build upon AngleSharp (however, that could happen). The motivation here lies in creating a new, cross-platform UI framework, which uses HTML as a description language with CSS for styling. Of course this is a quite ambitious goal, and it will certainly not be solved by this library, however, this library would play an important in the creation of this framework.

In the next couple of sections we will walk through some of the important steps in creating an HTML5 and a CSS4 parser.