Thursday, August 29, 2013

Hadoop for .NET Developers: Programmatically Loading Data to AVS

As mentioned in an earlier post, the WebHDFS client assumes a Hadoop cluster employs HDFS but can be configured to work with a cluster leveraging AVS. If you are working with a persistent HDInsight cluster in Azure (based on AVS), the WebHDFS client is likely a good option to explore.

That said, Azure Blob Storage, the storage service underlying AVS, provides additional options for working with your data. In this blog post, I want to show you how to load data directly to Azure Blob Storage so that it is accessible to an HDInsight cluster, whether or not that cluster exists at the time the data is loaded.
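
Once the file sits in Blob Storage, an HDInsight cluster can address it through an AVS-style URI rather than an hdfs:// path. As a rough illustration (the container and account names below are placeholders), such a reference looks something like this:

asv://mycontainer@mystorageaccount.blob.core.windows.net/demo/ufo/ufo_awesome.tsv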

In this exercise, we'll load the ufo_awesome.tsv file to the Azure Blob Storage service provisioned with your HDInsight cluster in Azure. You could run these same steps against a storage service provisioned through other means, onto which you later deploy an HDInsight cluster, to achieve the same ends demonstrated here:

1. Navigate to the Azure portal and locate the Storage icon on the left-hand side of the page.

2. Click on the Storage icon to access the storage provisioned for your account.  If you only have an HDInsight cluster associated with your account, you should see just one item in the list to the right of the Storage icon.  If you have more than one HDInsight cluster or other services deployed, you may see other storage items in the list.

3. Record the name of the storage item associated with your HDInsight cluster.  In the code that follows, this name will serve as the value of the name variable.

4. Click on the name of the storage item associated with your HDInsight cluster.
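
To give a sense of where these steps lead, here is a minimal sketch of uploading ufo_awesome.tsv with the .NET storage client library (the WindowsAzure.Storage NuGet package). The name variable holds the storage account name recorded in step 3; the key value, container name, blob path, and local file path are assumptions for illustration only.

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class LoadToAvs
{
    static void Main()
    {
        string name = "mystorageaccount";      // storage account name recorded in step 3
        string key = "<storage access key>";   // account access key (placeholder)

        // connect to the storage account behind the HDInsight cluster
        CloudStorageAccount account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=" + name + ";AccountKey=" + key);
        CloudBlobClient client = account.CreateCloudBlobClient();

        // get (or create) the container the cluster will read from (name is a placeholder)
        CloudBlobContainer container = client.GetContainerReference("mycontainer");
        container.CreateIfNotExists();

        // upload the local file as a block blob under demo/ufo/
        CloudBlockBlob blob = container.GetBlockBlobReference("demo/ufo/ufo_awesome.tsv");
        using (FileStream stream = File.OpenRead(@"c:\temp\ufo_awesome.tsv"))
        {
            blob.UploadFromStream(stream);
        }
    }
}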


Read more: Data Otaku