Sunday, October 20, 2013

First of Five Enterprise Lessons Learned in the AWS Virtual Private Cloud

Public Cloud is cool, and it's the only kind of Cloud my company works with. Yet enterprise financial services firms and others require controlled access. In 2013 AWS not only won the contract to build an actual Private Cloud for the CIA (any further legal maneuvers from IBM notwithstanding), but also seems to have conceded that Virtual Private Cloud is a first-class AWS Cloud offering. It's not that VPC was ignored; it's just that in 2011 and most of 2012, new capabilities of the Oracle Relational Database Service (RDS) were available first in EC2 and only later in VPC. The customers we work with require AWS VPC. In this post and the four that follow, I'm taking note of some of the lessons learned working with VPC, in the hope that I may entertain if not educate.


Some Opening Remarks on Enterprise and VPC
Networking is hard, and to define a VPC you need more than a passing knowledge of networks. I'm not an expert, but I've spent a lot of time reading about and configuring VPCs, and I'm still learning. Rather than rehash the solid documentation on VPC, I'm going to move quickly to preserve the "in the trenches" perspective that I think adds value to this subject.

Defining the VPC
When you define a VPC, you get one chance to define the CIDR Block, or IP address space. VPCs are free, and it's very likely you will provision, or at least define, many of them if you work in the Enterprise Cloud space. But although a VPC costs nothing, the only way to change its CIDR Block is to delete it and recreate it, or to create a new VPC with the range you need and then migrate your instances and other resources to it. It's a lot of work, and if you're working in a corporate environment leery of automation, people will really scream and moan when they find out things need to change. Then they get angry and show up with torches and pitchforks like the angry villagers in a Frankenstein movie.

On the one hand you can say: "Welcome to the Cloud, my friends; wait until we have a real problem before you set this place on fire or poke each other with those sharp sticks." But the enterprise journey to the Cloud offers plenty of opportunities to tease people, and for now it's easier to save this early, delectable brush with enterprise IT culture for later. At this stage you're the one waiting, because you have no VPC and you can't build much until you get one up and running.

First Lesson(s) Learned 
(They were more or less clustered together, but I'll count them as One)
Choose a CIDR Block that is significantly larger than the one your network team will offer you. "Choose" is a loaded word in the enterprise. Likely somebody else will choose for you, or will see your request and decide there's a better way. Your request for a /21 network (2,048 IP addresses) will likely be met with disbelief, and in the name of saving string, somebody will quickly alter the request to a mostly useful /24. As you will see once you start working with a subnet calculator, a /24 is small: 256 IP addresses in total, before you carve it into subnets.
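If you don't have a subnet calculator handy, Python's standard ipaddress module will do the arithmetic. This is a minimal sketch under stated assumptions: the 10.20.x.x ranges are hypothetical, the /26 subnet size is an arbitrary choice for illustration, and the five reserved addresses per subnet come from current AWS documentation rather than from anything above.

```python
import ipaddress

# Hypothetical private ranges; substitute the block your network team actually assigns.
REQUESTED = ipaddress.ip_network("10.20.0.0/21")
OFFERED = ipaddress.ip_network("10.20.0.0/24")

# Current AWS docs: the first four and the last address of every VPC subnet are reserved.
AWS_RESERVED_PER_SUBNET = 5


def describe(vpc_cidr: ipaddress.IPv4Network, subnet_prefix: int) -> tuple[int, int, int]:
    """Return (total addresses, number of subnets, usable addresses per subnet)."""
    subnets = list(vpc_cidr.subnets(new_prefix=subnet_prefix))
    usable = subnets[0].num_addresses - AWS_RESERVED_PER_SUBNET
    return vpc_cidr.num_addresses, len(subnets), usable


if __name__ == "__main__":
    for cidr in (REQUESTED, OFFERED):
        total, count, usable = describe(cidr, 26)
        print(f"{cidr}: {total} addresses -> {count} x /26 subnets, {usable} usable each")
```

Carved into /26 subnets, the /21 yields 32 of them and the /24 only four, each with 59 usable addresses.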

Lots of People in the Enterprise Are Eager to Selectively Review and Even Alter Your Solution Design: Communicate Not Just What You Need, but Why It's Necessary
At first glance, people not familiar with AWS will simply try to determine how many servers a given business segment or application will deploy. Even though private IP addresses are free and there are a lot of them, data center network teams try to avoid problems such as address ranges that overlap with those of business partners, and they worry about a dark day when internal IP addresses become scarce. Explain to them that you need a lot more IP space than a first glance suggests. Don't get into a networking debate you will lose; be vague and state that this is what AWS requires. Ask them to do the math, and when they return, tell them where they are wrong based on the following:

1. In a VPC you must map a separate subnet to each availability zone where your application will run instances.
2. RDS requires a DB subnet group of at least two subnets; they should be in different availability zones, and must be if you want to run Multi-AZ RDS.
3. AWS may develop new services that require subnets. For example, Redshift requires a subnet if you want to run it in a VPC.
4. Your security team may specify that PII data cannot be placed in the same subnet as non-PII data. If that's a requirement and you don't have enough subnets, it's a problem for you.
5. Internal ELBs (ELBs that run in a private VPC subnet and have no public interface) require IP addresses. I consider it good practice to place AWS services that consume IP addresses in subnets separate from other services and instances.
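To see how quickly the list above eats address space, here is a rough subnet planner that hands out one subnet per tier per availability zone. It's a sketch under stated assumptions: the tier names and us-east-1 zone names are hypothetical stand-ins for your own design, and it allocates equal-sized subnets in order rather than sizing each tier individually.

```python
import ipaddress
from itertools import product

# Hypothetical tiers mirroring the list above; rename or drop them to match your design.
TIERS = ["public", "app", "internal-elb", "rds", "redshift", "pii"]
# Hypothetical zone names; use the zones actually available to your account.
AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]


def plan_subnets(vpc_cidr: str, subnet_prefix: int) -> dict[tuple[str, str], ipaddress.IPv4Network]:
    """Assign one equal-sized subnet per (tier, AZ) pair, in order, from the VPC block.

    Raises ValueError if the block cannot hold a subnet for every pair.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    pairs = list(product(TIERS, AZS))
    candidates = vpc.subnets(new_prefix=subnet_prefix)  # lazy generator of subnets
    plan: dict[tuple[str, str], ipaddress.IPv4Network] = {}
    for pair in pairs:
        subnet = next(candidates, None)
        if subnet is None:
            raise ValueError(
                f"{vpc} holds only {len(plan)} /{subnet_prefix} subnets; {len(pairs)} are needed"
            )
        plan[pair] = subnet
    return plan


if __name__ == "__main__":
    for cidr in ("10.20.0.0/21", "10.20.0.0/24"):
        try:
            plan = plan_subnets(cidr, 26)
            print(f"{cidr}: {len(plan)} subnets allocated")
        except ValueError as err:
            print(f"{cidr}: {err}")
```

Six tiers across three zones means 18 subnets. A /21 carved into /26 subnets has room for 32; a /24 carved the same way has room for four, so the /24 request fails before you've launched a single instance.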

ELB in a VPC Requires Public IP Subnets
For most of 2012, the AWS Elastic Load Balancer (ELB) required a minimum-sized subnet in each availability zone where it would "run." The reason is that ELB, even though it's a service, actually runs instances, and those instances require IP addresses in the public subnet of each availability zone in a VPC. In 2012 the minimum for a public-facing ELB was something like 104 IP addresses per public subnet per availability zone. For me that meant three availability zones at a minimum of 104 IP addresses each, or 312 addresses for the public subnets alone. A kind soul who had looked over my solution wisely concluded, based on the number of expected instances, that I needed only around 250 IP addresses in the CIDR range. Yet even after defining the public subnets, I still needed subnets for everything else. In any event, we had to ask for a new VPC and this and that, and it was a really big headache.

So now you know. Today a public ELB doesn't require as many IP addresses. However, keep in mind that the word Elastic in ELB implies expanding and contracting, so its address consumption can grow. Also, if you're using ELB, the only other kind of instance I would conceivably place in a public subnet is a NAT instance. Everything else can be placed in private subnets, and nothing other than the NAT instances should be competing with the ELB for IP addresses in any of the public subnets.
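The arithmetic that sank the original design can be sketched in a few lines. This assumes the roughly 104 free addresses per public subnet recalled above (an approximate 2012 figure, not a current requirement) and borrows the five-per-subnet reservation and the /28 minimum subnet size from current AWS documentation; the function names are hypothetical.

```python
# Assumption from current AWS docs: 5 addresses reserved in every subnet.
AWS_RESERVED_PER_SUBNET = 5
# Assumption from current AWS docs: a VPC subnet can be no smaller than a /28.
AWS_SMALLEST_SUBNET_PREFIX = 28


def smallest_prefix_for(free_needed: int, reserved: int = AWS_RESERVED_PER_SUBNET) -> int:
    """Longest IPv4 prefix (smallest subnet) that still leaves free_needed usable addresses."""
    total = free_needed + reserved
    host_bits = (total - 1).bit_length()  # 2**host_bits is the smallest power of two >= total
    # Clamp so we never suggest a subnet smaller than AWS allows.
    return min(32 - host_bits, AWS_SMALLEST_SUBNET_PREFIX)


def public_subnet_budget(free_per_az: int, az_count: int) -> tuple[int, int]:
    """Return (prefix length per public subnet, total addresses consumed across all AZs)."""
    prefix = smallest_prefix_for(free_per_az)
    return prefix, az_count * 2 ** (32 - prefix)


if __name__ == "__main__":
    prefix, total = public_subnet_budget(104, 3)
    print(f"3 public /{prefix} subnets consume {total} addresses")
```

Because subnets come in powers of two, 104 free addresses plus five reserved rounds up to a /25. Across three availability zones that's 384 addresses, more than an entire /24 and well past the 250-address estimate, before a single private subnet has been defined.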

[Stay tuned for the Second Lesson Learned, to be published later this week...]