I recently found a very affordable cloud storage service for my research data: Amazon Glacier. It costs only 1 cent (yes, $0.01) per GB per month, so my 2 TB of data comes to about $20/month. Besides the cost, I don't see other competitors, including Rackspace Cloud Files or Google Cloud Storage, offering a storage-only service. One exception is DreamHost's DreamObjects, at 7 cents (3 or 4 cents during promotions) per GB per month. I really don't want to rent a cloud machine just to store my files, because I would also be charged for the CPU time.
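As a quick sanity check on the arithmetic above (my own illustration, using the advertised $0.01/GB/month rate and 1 TB = 1024 GB; transfer and request fees not included):

```python
# Back-of-the-envelope Glacier storage cost, assuming the advertised
# $0.01 per GB per month rate. Request and transfer fees are extra.
def monthly_storage_cost(total_gb, usd_per_gb_month=0.01):
    return total_gb * usd_per_gb_month

# 2 TB = 2048 GB -> roughly the $20/month quoted above.
print(monthly_storage_cost(2 * 1024))
```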
Getting started with Amazon Glacier was not easy for me, a person without much cloud-computing experience; it was the first time I needed to use an API to write programs to upload and download files. Luckily, after Googling around for a while, I am having happy hours with Boto, the Python API for Amazon cloud services, including Glacier. Boto's documentation for Glacier is at http://docs.pythonboto.org/en/latest/ref/glacier.html
Creating vaults
I didn't create vaults using Boto but through the web interface of the AWS console.

Uploading archives
Solution 1: Using APIs
(modified from [1])

```python
from boto.glacier.layer1 import Layer1
from boto.glacier.concurrent import ConcurrentUploader
import sys
import os.path
from time import gmtime, strftime

access_key_id = "...your_aws_access_key_id..."
secret_key = "...your_aws_secret_key..."
target_vault_name = "...your_vault_name..."

fname = sys.argv[1]  # the file to be uploaded into the vault as an archive
fdes = sys.argv[2]   # a description you give to the file

if not os.path.isfile(fname):
    print("Can't find the file to upload!")
    sys.exit(-1)

glacier_layer1 = Layer1(aws_access_key_id=access_key_id,
                        aws_secret_access_key=secret_key)
uploader = ConcurrentUploader(glacier_layer1, target_vault_name,
                              part_size=128 * 1024 * 1024, num_threads=4)

print("Begin at " + strftime("%Y-%m-%d %H:%M:%S", gmtime()))  # Greenwich time
print("uploading... " + fname + ", " + fdes)
archive_id = uploader.upload(fname, fdes)
print("Success! archive id: '%s'" % (archive_id))
print("Finish at " + strftime("%Y-%m-%d %H:%M:%S", gmtime()))  # Greenwich time
```
Please note that ConcurrentUploader allows uploading in multiple parts with multiple threads. Here I need to explain threads and parts.
- Threads and parts are independent settings.
- Threads: how many parallel threads upload parts of the file. The default is 10. More threads only shorten the upload time; do size the number according to your bandwidth.
- Parts: the file is chopped into pieces of part_size bytes (which must be a power of 2) for uploading; check here for details. Each piece is sent to the Amazon Glacier server individually, and the pieces are assembled into one archive after all are received. This doesn't speed up the upload by itself, but smaller pieces mean less data to retransmit if the transfer is interrupted. In other words, you may want to use smaller pieces if your network is not stable.
- Also note that Amazon charges you per request made to their servers, and each uploaded part is a separate request, so you don't want to chop the file into too many small pieces (or use too many threads - not sure about that part).
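To make the tradeoff concrete, here is a small helper of my own (not part of Boto) that, given a file size and a part size, reports how many parts - and thus upload requests - a transfer would take, and checks the power-of-2 constraint mentioned above:

```python
# Illustration only: estimate how a file would be split for a multipart upload.
def plan_upload(file_size_bytes, part_size_bytes):
    # The part size must be a power of 2 (bitwise check).
    if part_size_bytes <= 0 or part_size_bytes & (part_size_bytes - 1) != 0:
        raise ValueError("part size must be a power of 2")
    # Ceiling division: the last part may be smaller than the others.
    return -(-file_size_bytes // part_size_bytes)

MB = 1024 * 1024
# A 1 GB file with the 128 MB parts used above -> 8 parts (8 upload requests);
# with 8 MB parts -> 128 parts: less to resend after a failure, more requests.
print(plan_upload(1024 * MB, 128 * MB), plan_upload(1024 * MB, 8 * MB))
```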
Solution 2: Using the glacier script
Boto itself comes with a glacier script under bin that allows uploading files. The usage is very simple:

```
glacier upload <vault> <files> --access_key <key> --secret_key <key> --region <region>
```

You can set environment variables to avoid entering the access key, secret key and region every time. For details, simply run glacier without any parameters and the help info will be printed.

Deleting an archive
There are two ways to do it: one uses Boto's layer1 API and the other layer2's.

Using Layer1:
```python
from boto.glacier.layer1 import Layer1

access_key_id = "...your_aws_access_key_id..."
secret_key = "...your_aws_secret_key..."
vault_name = "...your_vault_name..."
archive_id = "...the_archive_id_you_wrote_down..."

glacier_layer1 = Layer1(aws_access_key_id=access_key_id,
                        aws_secret_access_key=secret_key)
glacier_layer1.delete_archive(vault_name, archive_id)
```
Using Layer2 (from [2])
```python
from boto.glacier.layer2 import Layer2

access_key_id = "...your_aws_access_key_id..."
secret_key = "...your_aws_secret_key..."
vault_name = "...your_vault_name..."
archive_id = "...the_archive_id_you_wrote_down..."

l = Layer2(aws_access_key_id=access_key_id,
           aws_secret_access_key=secret_key)
v = l.get_vault(vault_name)
v.delete_archive(archive_id)
```
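Downloading an archive

I mentioned downloading at the start but haven't shown it. In layer2, downloading goes through a retrieval job: initiate the job, wait for it to complete (Glacier retrievals typically take a few hours), then download the result. Below is only a sketch under my reading of Boto's docs; the credential placeholders are as above, and the polling interval and timeout are my own choices:

```python
import time

def poll_schedule(total_seconds=5 * 3600, step_seconds=600):
    """Polling times for a retrieval job: check every 10 minutes for up to
    5 hours (my own numbers; Glacier jobs typically take ~4 hours)."""
    return list(range(step_seconds, total_seconds + 1, step_seconds))

def download_archive(vault_name, archive_id, out_file,
                     access_key_id, secret_key):
    # boto is imported here so the helper above works without it installed.
    from boto.glacier.layer2 import Layer2
    l = Layer2(aws_access_key_id=access_key_id,
               aws_secret_access_key=secret_key)
    vault = l.get_vault(vault_name)
    job = vault.retrieve_archive(archive_id)  # initiate the retrieval job
    for _ in poll_schedule():
        time.sleep(600)
        job = vault.get_job(job.id)           # refresh the job status
        if job.completed:
            job.download_to_file(out_file)
            return True
    return False  # the job didn't finish within the polling window
```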
References:
- [1] For uploading archives and creating an inventory: http://www.withoutthesarcasm.com/using-amazon-glacier-for-personal-backups/
- [2] For deleting archives: https://gist.github.com/srs81/3896192
2 comments:
You forgot the region-parameter to layer1(). It will default to us-east-1.
You are correct. Since I connect to the default data center, I don't set it.