Tuesday, July 27, 2010

Make the WebClient Class Follow Redirects and Get the Target URL

How to make the .NET WebClient class follow redirects and get the target URL

Like its brother HttpWebRequest, on which it is built, the WebClient class automatically follows redirects by default. Unlike HttpWebRequest, though, it doesn't give you the "final" URL it ended up at. If you need that URL, you'll need to derive your own class from System.Net.WebClient and capture it yourself. Here's an example:

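using System;
using System.Net;

public class MyWebClient : WebClient
{
    Uri _responseUri;

    // Final URL of the most recent response, after any redirects.
    public Uri ResponseUri
    {
        get { return _responseUri; }
    }

    // WebClient calls this for each synchronous request it makes.
    protected override WebResponse GetWebResponse(WebRequest request)
    {
        // Let exceptions propagate: swallowing them and returning null
        // would only surface as a less helpful error inside WebClient.
        WebResponse response = base.GetWebResponse(request);
        _responseUri = response.ResponseUri;
        return response;
    }
}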


By overriding the GetWebResponse method, we can populate a ResponseUri property with the final target of any redirects (typically HTTP 301 or 302 responses). Redirects are very common on all kinds of websites, as they allow the owner to count hits and log information before sending you on your merry way to the target.
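The override above covers the synchronous methods such as DownloadString. WebClient also has a second, two-argument GetWebResponse overload that it uses for asynchronous requests. Here is a minimal sketch of a variant that records the final URL in both cases. The class name RedirectTrackingWebClient is made up for illustration, and the auto-property is just a shorter way of writing the same ResponseUri property:

```csharp
using System;
using System.Net;

// Hypothetical variant of MyWebClient that records the final URL
// for both synchronous and asynchronous downloads.
public class RedirectTrackingWebClient : WebClient
{
    // Final URL of the most recent response; null until a request completes.
    public Uri ResponseUri { get; private set; }

    // Called for synchronous methods such as DownloadString.
    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        ResponseUri = response.ResponseUri;
        return response;
    }

    // Called for asynchronous methods such as DownloadStringAsync.
    protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
    {
        WebResponse response = base.GetWebResponse(request, result);
        ResponseUri = response.ResponseUri;
        return response;
    }
}
```

With asynchronous calls, read ResponseUri only after the download has completed (for example, inside the DownloadStringCompleted handler), since it is still null, or holds the previous request's URL, before then.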

Here's some sample code that goes through a whole list of integer "Redirect Ids", assembles the page title and final url, and saves these to a delimited text file that can be read later:

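// Requires "using System.Net;" and "using System.Text.RegularExpressions;"
// at the top of the file.
static string urlbase = "http://sitewithredirect.com/Redirect.aspx?id=";

static void ProcessUrls()
{
    // Matches the text between <title ...> and the first </title> after it.
    string regex = @"(?<=<title[^>]*>)([\s\S]*?)(?=</title>)";
    Regex rex = new Regex(regex, RegexOptions.IgnoreCase);

    for (int i = 1; i < 4000; i++)
    {
        string item = i.ToString();
        string url = urlbase + item;
        string content = null;
        string targetUrl = null;
        string title = null;

        MyWebClient w = new MyWebClient();

        try
        {
            content = w.DownloadString(url);
            targetUrl = w.ResponseUri.ToString();
            title = rex.Match(content).Value.Trim();
            System.Diagnostics.Debug.WriteLine(targetUrl);
        }
        catch (WebException ex)
        {
            // Assumed handling: log the failure and move on to the next id.
            System.Diagnostics.Debug.WriteLine(ex.Message);
        }
        finally
        {
            w.Dispose();
        }

        // The step that saves title and targetUrl to the delimited text
        // file is not part of this excerpt.
    }
}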
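The excerpt stops before the part that writes the delimited text file. As a rough sketch of what that step could look like, the helper below extracts a page title and builds one tab-delimited record per redirect id. Everything here is an assumption for illustration rather than the original author's code: the RedirectLog class, its method names, the choice of tab as the delimiter, and the output path passed to Append.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts page titles and writes tab-delimited records.
public static class RedirectLog
{
    // Lazy match so only the first <title> element is captured.
    static readonly Regex TitleRegex = new Regex(
        @"<title[^>]*>([\s\S]*?)</title>", RegexOptions.IgnoreCase);

    // Returns the trimmed page title, or an empty string when none is found.
    public static string ExtractTitle(string html)
    {
        Match m = TitleRegex.Match(html ?? string.Empty);
        return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
    }

    // Builds one tab-delimited record. Tabs and line breaks inside a field
    // are replaced by a space so each record stays on a single line.
    public static string FormatLine(string id, string title, string targetUrl)
    {
        return string.Join("\t", new[] { Clean(id), Clean(title), Clean(targetUrl) });
    }

    static string Clean(string field)
    {
        return Regex.Replace(field ?? string.Empty, @"[\t\r\n]+", " ").Trim();
    }

    // Appends one record to the output file (the path is up to the caller).
    public static void Append(string path, string id, string title, string targetUrl)
    {
        File.AppendAllText(path, FormatLine(id, title, targetUrl) + Environment.NewLine);
    }
}
```

Inside the loop, after a successful download, a call such as RedirectLog.Append("redirects.txt", item, RedirectLog.ExtractTitle(content), targetUrl) would add one line per id. The file can later be read back by splitting each line on the tab character.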

Read more: eggcafe
