Windows Azure: Building a Secure Backup System (part 1)

10/22/2010 9:16:17 AM
When using cloud services, you often have to worry about security, and resort to cryptographic techniques such as digital signatures and encryption. There are many reasons for doing this. You may be storing sensitive data (for example, people’s health reports) and you are mandated by law to provide extra protection. You may be protecting sensitive data regarding your business that you don’t want in the wrong hands. Or you could simply be paranoid and not trust Microsoft or any other cloud provider. Whatever your reason, you can take several steps to add further levels of protection to your data.
Note: Adding these protections need not reflect a lack of trust in Microsoft or any other cloud provider. All cloud providers (and Microsoft is no different) have multiple levels of security and several checks to ensure that unauthorized personnel cannot access customer applications or data. However, whether to rely on the provider alone is often not your decision to make. You might be mandated to add further security levels by anything from internal IT policy to financial regulations and compliance laws.

This chapter is slightly different from all the others in that a majority of the discussion here is devoted to looking at security and cryptography. Frankly, the code and techniques used in this chapter could just as easily be used for data on a file server as for data in the cloud. Why does it show up in a book on cloud computing, then?

The decision to include an examination of security and cryptography resulted from two key motivations. First, this is useful to a lot of people when they have to build applications with highly sensitive data. More importantly, it is so difficult to get this stuff right that any time invested in examining good security and cryptographic techniques is well worth it.

The danger of having an insecure system is known to everyone.

This chapter shows you how to build a secure backup system for your files. It covers how to combine sound cryptographic practices with blob storage features to provide a specific set of security properties: secrecy, integrity, and the ability to verify your tools.

Note: This chapter is not meant as a comprehensive introduction to cryptography. If you’re interested in that, Practical Cryptography by Niels Ferguson and Bruce Schneier (Wiley) is a good place to start. A quick web search brings up a lot of good references as well.

1. Developing a Secure Backup System

In this chapter, you will learn how to build a secure system that should satisfy even the most “paranoid” conspiracy theorist. You will discover how a real-world application (hopefully, a useful one) uses the blob service, as well as the challenges and trade-offs involved. Finally, you will learn how to code a nontrivial application.

Note: Don’t infer that the word paranoid suggests that these techniques aren’t relevant to normal users. Mentally insert “highly conservative from a security perspective” whenever you see the word paranoid throughout the ensuing discussions.

The application has a highly creative name, Azure Backup (azbackup), and it is quite simple to use. It mimics the tar utility that ships with most modern Unix systems. Instead of writing a single compressed backup of multiple files and directories to disk, azbackup writes that backup to Windows Azure blob storage. The tool tars multiple files and directories together into one big file (in exactly the same manner as the Unix tar command). This tar file is then compressed using the popular gzip algorithm.
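The tar-then-gzip step described above can be sketched with Python's standard tarfile module. This is a minimal illustration, not azbackup's actual code: the function names make_archive and list_archive are hypothetical, and the sketch writes to a local file rather than to blob storage.

```python
import os
import tarfile


def make_archive(paths, archive_path):
    """Bundle files and directories into one gzip-compressed tar file.

    Hypothetical helper for illustration; not taken from azbackup.
    """
    # Mode "w:gz" writes the tar stream and gzip-compresses it as it goes.
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            # Store each entry under its base name rather than its full path.
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


def list_archive(archive_path):
    """Return the sorted member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Calling make_archive(["notes.txt", "photos"], "backup.tar.gz") would produce a single compressed file containing both entries, which is the kind of object a backup tool could then upload as one blob.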

Why tar and then compress? Why not compress each file individually? By combining multiple files into one large file, you gain higher compression rates. Compression algorithms compress data by finding redundancy, and they have a better chance of finding redundancy in one large file than in several small files individually. One large file is also easier to move, copy, and otherwise manage.
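A small experiment makes the redundancy argument concrete. The sketch below uses zlib, which implements the same DEFLATE algorithm that gzip uses (only the framing differs), and compares compressing many small, similar records one by one against compressing them as a single stream. The sample records and function names are invented for illustration.

```python
import zlib


def size_compressed_separately(chunks):
    """Total size when every chunk is compressed on its own."""
    return sum(len(zlib.compress(chunk)) for chunk in chunks)


def size_compressed_together(chunks):
    """Size when all chunks are joined first and compressed once."""
    return len(zlib.compress(b"".join(chunks)))


# Fifty small "files" that share most of their content (made-up sample data).
records = [
    ("user=%d;role=admin;status=active\n" % i).encode("ascii")
    for i in range(50)
]

separate = size_compressed_separately(records)
together = size_compressed_together(records)
print("separately: %d bytes, together: %d bytes" % (separate, together))
```

Each small record pays its own header overhead and offers the compressor almost nothing to reuse, while the joined stream lets it replace the repeated text with short back-references.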

The entire code for this sample is available online. You’ll be seeing snippets of code as this chapter progresses, but you can always look at the entire source code. It is also very easy to set up and run, and should work on Windows as well as any modern Unix system that has Python support.

Note: The sample takes inspiration from the excellent tarsnap service. If you’re really paranoid and want to delve into a real, production backup service, tarsnap’s design makes for great reading.

2. Understanding Security

A primary challenge developers face is deciding how secure to make an application. When you ask most people how secure they want their data or their application, you’re going to hear superlative terms such as impenetrable, totally secure, and so on. When you hear someone say that, you should run as quickly as you can in the opposite direction.

Unfortunately (or fortunately, for security consultants), there is no completely secure system. In theory, you can imagine an application running on an isolated computer with no network connections in an underground bunker surrounded by thick concrete, protected by a small army. And even that can’t be completely secure.

Security, like other things, is a spectrum in which you get to pick where you want to be. If you’re building a small social bookmarking service, your security needs are different than if you’re working for the government building a system for the National Security Agency (NSA).

For the sample backup application you’ll see in this chapter, you will be as paranoid as possible. The goal is that your data should be secure even if three-letter government agencies wanted to get to it. This is overkill for most applications you’ll be building, so you can look at the security techniques used in this chapter and pick which ones you want to keep, and which ones you don’t care about.

Before you figure out how to secure something, you must know what you are securing it from, and what secure even means. For the application in this chapter, let’s define a few security properties that should hold at all times.

The first property is secrecy. The data that you back up using this application should not be in the clear either in motion or at rest. No one should be able to get data in plain form apart from you. Importantly, this data should not be in the clear with your cloud provider of choice here, Microsoft.

The second property is integrity. You must be able to verify quickly whether the data you backed up has been tampered with in any way. As a bonus, it would be nice if this verification were done with something bound to your identity (also known as a digital signature).
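To make the integrity property concrete, here is a minimal sketch of tamper detection using only Python's standard library. Note the substitution: it uses an HMAC (a keyed hash over the data) rather than a digital signature. An HMAC detects tampering but depends on a shared secret key, so it is not bound to your identity the way a signature is. The function names and the sample key are hypothetical.

```python
import hashlib
import hmac


def make_tag(key, data):
    """Compute an HMAC-SHA256 tag over the backup bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_untampered(key, data, expected_tag):
    """Return True only if the data still matches the tag stored at backup time."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(make_tag(key, data), expected_tag)


# Illustrative values only; a real key must be random and kept secret.
key = b"k" * 32
archive = b"contents of backup.tar.gz"
tag = make_tag(key, archive)
```

Storing the tag alongside the backup lets you check it on restore: is_untampered(key, archive, tag) holds for the original bytes and fails if even one byte has changed.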

The third property is the ability to verify your tools. Essentially, you will be so paranoid that you don’t trust code you can’t see in any layer charged with enforcing the previous two properties. This means you will force yourself to stick to open source tools only. Note that the fact that Microsoft is running some code in the data center is irrelevant here, because the data is protected “outside” the data center, and the blob service is used only as a very efficient byte storage-and-transfer mechanism.

We will discuss the first two properties throughout the course of this chapter. For the third property, you will stick with open source tools that will work just as well on a non-Windows open source platform.

To run the code in this chapter, you need two pieces of software. The first is Python 2.5 or later, which you can download if you don’t already have it.

Note: Almost all modern *nix operating systems ship with some version of Python. As of this writing, most did not ship with Python 2.5 or later. To check the version of Python you have, run python --version at a command line.
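If you prefer to check from inside a script rather than at the command line, something like the following works. It is a small sketch: the helper name meets_minimum is hypothetical, and the (2, 5) floor is simply the requirement stated above.

```python
import sys

# Minimum version named in the text.
MINIMUM = (2, 5)


def meets_minimum(minimum=MINIMUM, current=None):
    """Return True if the given (major, minor) version is at least the minimum."""
    if current is None:
        # Default to the interpreter that is running this script.
        current = tuple(sys.version_info[:2])
    return tuple(current) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python %d.%d" % tuple(sys.version_info[:2]))
    if not meets_minimum():
        sys.exit("Python %d.%d or later is required" % MINIMUM)
```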

Unfortunately, Python lacks some of the core cryptographic functionality that is used in this chapter, so the second piece of required software is an additional Python package called M2Crypto. Prebuilt versions are available for your operating system of choice. This is a popular Python library maintained by Heikki Toivonen that wraps around the OpenSSL tool set to provide several cryptographic and security features.

Note: M2Crypto doesn’t have the greatest documentation in the world, but since it is a thin wrapper around OpenSSL, you can often look up documentation for the OpenSSL function of the same name. 