Assignment 1: Cloud Storage Speed Test

 

Due Date: Monday, April 9

 


Overview:

You will run a series of Python programs to upload a medium-size file to AWS and Azure, recording the performance.  You will create a plot illustrating your results. 

 

What to turn in:

  1. A plot of the times for all experiments you ran.  (Suggestion: a simple bar chart, with one bar for each experiment, will suffice.  Other visualizations are possible.)
  2. The data you collected, as a text file with records of the form (cloud_vendor, file_size, number_of_chunks, time_in_seconds), where cloud_vendor is either “AWS” or “Azure”
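For the plot, a minimal matplotlib sketch along these lines would suffice. The file name results.txt, the placeholder records, and the output name upload_times.png are assumptions; adapt them to however you saved your data.

```python
import os
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# If you have no results file yet, write placeholder records in the
# required order (replace these with your own measurements):
# cloud_vendor,file_size,number_of_chunks,time_in_seconds
if not os.path.exists("results.txt"):
    with open("results.txt", "w") as f:
        f.write("AWS,209715200,1,100.0\n")
        f.write("AWS,209715200,4,40.0\n")

labels, times = [], []
with open("results.txt") as f:
    for line in f:
        vendor, size, chunks, secs = line.strip().split(",")
        labels.append("%s, %s chunk(s)" % (vendor, chunks))
        times.append(float(secs))

# One bar per experiment, as suggested above.
plt.bar(range(len(times)), times)
plt.xticks(range(len(times)), labels, rotation=45, ha="right")
plt.ylabel("upload time (s)")
plt.title("Upload time per experiment")
plt.tight_layout()
plt.savefig("upload_times.png")
```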

 

  1. Get Access to Amazon Web Services

 

Using the resource below as a guide, sign up for Amazon Web Services and make sure you are familiar with the EC2 and S3 services.  Make sure you know how to use the AWS Console to launch an instance and work with S3 buckets.

 

http://escience.washington.edu/get-help-now/get-started-amazon-web-services

 

  2. Get Access to Windows Azure

 

Sign up for a free trial for Windows Azure.  Make sure you know how to access the Windows Azure console to review storage resources.

 

http://www.windowsazure.com/en-us/pricing/free-trial/

 

Create a storage account

 

http://msdn.microsoft.com/en-us/library/windowsazure/gg433066.aspx

 

Record the storage account name; you’ll need it later.

 

  3. Run upload performance experiment

 

    a. Install Python

Windows: http://www.python.org/getit/windows/

Mac/Linux: Python is already installed, but make sure you have version 2.7 (check with python --version)

 

    b. Install boto: https://github.com/boto/boto

Click the “ZIP” button to download a zip archive of the library, then extract it and install it (typically by running python setup.py install in the extracted directory).

 

 

    c. Download the sample code from http://www.cs.washington.edu/homes/billhowe/bigdatacloud/lecture1/code

 

You will get five files:

 

gendata.py: generates a data file to use to test uploads

loaddata_s3.py: uploads a file to S3, optionally in parallel

loaddata_azure.py: uploads a file to Azure, optionally in parallel

winazurestorage.py: a library for working with Windows Azure from Python (see [1])

azure_unit_test.py: a set of simple test cases for debugging Azure access

 


 

    d. Run gendata.py

 

Linux/Mac: Open a terminal, navigate to the directory where these files are located, type

$ python gendata.py

 

This program will create a 200 MB file, “200mb.dat”, to use for testing.
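For reference, the generator amounts to something like the following sketch (the function name and the chunked writing are illustrative, not the actual contents of gendata.py):

```python
import os

def gendata(filename, size_bytes, chunk=1024 * 1024):
    # Write size_bytes of pseudo-random data one chunk at a time,
    # so the whole file is never held in memory.
    with open(filename, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n

# The assignment uses 200 MB; a small file keeps this demo fast.
gendata("demo.dat", 4 * 1024 * 1024)
print(os.path.getsize("demo.dat"))  # prints 4194304
```

Modifying the size argument is also how you would produce the different file sizes for the optional experiments.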

 

    e. Run loaddata_s3.py

 

Linux/Mac: Open a terminal, navigate to the directory where these files are located, type

$ python loaddata_s3.py

 

Answer the questions when you are prompted:

 

Enter AWS access key: <Your AWS access key>

Enter AWS secret key: <Your AWS secret key>

Enter bucket name: escience.washington.edu.cloudcourse.lecture1

Enter file name to upload: 200mb.dat

Enter number of chunks to upload in parallel: 1

 

You can also provide these on the command line in the same order.  Make sure to wrap your secret key in double quotes, as it may contain characters that will be misinterpreted by the shell.
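The “number of chunks” setting controls how many pieces the file is split into and uploaded concurrently. A minimal sketch of the idea, with a stand-in for the actual S3 transfer (upload_chunk here just sleeps; the real script uses boto):

```python
import threading
import time

def upload_chunk(part):
    time.sleep(0.01)  # stand-in for the network transfer of one piece

def parallel_upload(data, n_chunks):
    # Split data into n_chunks pieces and upload them on separate threads,
    # returning the elapsed wall-clock time.
    chunk_size = (len(data) + n_chunks - 1) // n_chunks
    parts = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    threads = [threading.Thread(target=upload_chunk, args=(p,)) for p in parts]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

elapsed = parallel_upload(b"x" * 1024, 4)
print("uploaded in %.3f s" % elapsed)
```

With real network transfers, more chunks can overlap transfer latency, which is exactly the effect the experiment measures.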

 

    f. Record the time.
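One simple way to keep the data in the required format is to append a line to a results file after each run. The file name results.txt and the helper below are assumptions, not part of the provided scripts:

```python
def record_result(path, cloud_vendor, file_size, number_of_chunks, time_in_seconds):
    # Append one record: (cloud_vendor, file_size, number_of_chunks, time_in_seconds)
    with open(path, "a") as f:
        f.write("%s,%d,%d,%.2f\n" % (cloud_vendor, file_size,
                                     number_of_chunks, time_in_seconds))

# Example: a 200 MB upload to S3 in 1 chunk that took 73.4 seconds
# (placeholder value -- substitute the time your run actually reported).
record_result("results.txt", "AWS", 200 * 1024 * 1024, 1, 73.4)
```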

 

 

 

    g. Repeat the experiment, but set the number of chunks to 4

 

    h. (Optional) Run additional experiments with different numbers of chunks

 

    i. (Optional) Run additional experiments with different file sizes by modifying gendata.py

 

 

NOTE: The Azure part of the assignment is not working currently. It may or may not be fixed in time for you to work on it before Monday. If it doesn't work for you, that's ok.

 

  4. Now test the Windows Azure connection and try the Azure experiment

 

$ python azure_unit_test.py

 

Enter your credentials from the right-hand panel of the Azure console:

 

Enter Azure storage account: <Your Azure storage account name>

Enter Azure Primary (secret) key: <Your Azure primary secret key>

 

Now try the Windows Azure experiment

 

$ python loaddata_azure.py

 

Warning: there may be a problem with this script.  It was generating spurious errors at the time of class.

 

References

[1] http://sriramk.com/blog/2008/11/python-wrapper-for-windows-azure.html