Thursday, March 03, 2011

Fantastic tool for HTML parsing

Last night I made one of those HTML screen-scraper-style apps: my downstairs neighbor is an artist and has a homepage with his work on it. HTML, often being extremely well deformed, can be a nasty cat to swing, especially if you're locked inside a Windows Phone 7 room. HTML Agility Pack saved me from the embarrassment of trying to write the exact and correct regular expression that copes with attributes with and without quotes, stray spaces and so on, which is enough for me to be a fan. Getting the list of images with titles was something along the lines of:

public class GalleryItemViewModel
{
   public string Path { get; set; }
   public string Title { get; set; }
}

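// Download and parse the gallery page; the callback runs once the document is available.
HtmlWeb.LoadAsync(baseAddress + "/kilp.html", (s, args) =>
{
   // Project every <img> element into a view model; a missing attribute becomes an empty string.
   var result = args.Document.DocumentNode.Descendants("img").Select(image =>
     new GalleryItemViewModel
       {
         Path = baseAddress + image.GetAttributeValue("src", string.Empty),
         Title = image.GetAttributeValue("title", string.Empty)
       }).ToList();
   // Switch to the UI thread before touching the bound collection.
   this.Dispatcher.BeginInvoke(() =>
   {
     model.Items.Clear();
     result.ForEach(givm => model.Items.Add(givm));
   });
});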
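
To try the quote handling without a phone or a network connection, here is a minimal sketch that runs the same Descendants/GetAttributeValue query against an inline string. It assumes the desktop HtmlAgilityPack NuGet package is referenced; the GalleryItem class, the ParseImages helper and the sample markup are made up for illustration and are not part of the original app.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical stand-in for the view model used in the post.
public class GalleryItem
{
    public string Path { get; set; }
    public string Title { get; set; }
}

public static class GallerySketch
{
    // Hypothetical helper: returns one GalleryItem per <img> element in the markup.
    public static List<GalleryItem> ParseImages(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parses the string in memory, no download involved

        return document.DocumentNode
            .Descendants("img")
            .Select(image => new GalleryItem
            {
                // The second argument is the fallback used when the attribute is absent.
                Path = image.GetAttributeValue("src", string.Empty),
                Title = image.GetAttributeValue("title", string.Empty)
            })
            .ToList();
    }

    public static void Main()
    {
        // Made-up markup mixing double-quoted, single-quoted and unquoted attribute values.
        const string html =
            "<html><body>" +
            "<img src=\"one.jpg\" title=\"First\">" +
            "<img src='two.jpg' title='Second'>" +
            "<img src=three.jpg>" +
            "</body></html>";

        foreach (var item in ParseImages(html))
        {
            Console.WriteLine(item.Path + " | " + item.Title);
        }
    }
}
```

Unlike the phone snippet, this sketch leaves the src value as it appears in the page instead of prefixing a base address, so the paths it prints are whatever the markup contains.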

Read more: Michael W. Olesen's blog
Read more: HTML Agility Pack