IaC and Security Best Practices for Large Scale Cloud Migrations
About
A large multinational organization in the manufacturing sector worked with WWT to design and implement a secure, multicloud network architecture as part of an upcoming large-scale workload migration into multiple AWS environments. Their goal was to migrate thousands of virtualized server workloads running in their on-premises data centers and colocation facilities around the globe to VMware Cloud on AWS (VMC).
Challenge
The customer had a presence in multiple public clouds, including Azure, Oracle, AWS and GCP, and was in search of a secure, scalable, resilient multicloud networking architecture that could support the migration of thousands of workloads from their data centers into VMC on AWS. The multicloud attribute was important because they were aiming for an operational model that would let them build on their existing investments in IT Operations (Ops) training and their experience with VMware vCenter, Palo Alto Next-Generation Firewalls (NGFWs) and F5 BIG-IP application delivery controllers (ADCs). The architecture was intended to be operated by the same IT Ops teams across AWS and Azure to start with, so a similar solution pattern built from repeatable components was of great interest to them. A second challenge was how to scale the operational components of the architecture effectively and ensure that the pattern was repeatable between environments. The customer had been experimenting with various IaC providers and decided to explore HashiCorp's Terraform as the deployment tool for this new architecture.
Solution overview
The customer learned that WWT's Cloud Architects had helped other customers set up this type of environment and that WWT has a long history with the ISVs they were looking to leverage. WWT holds partnerships with AWS, VMware, Palo Alto and F5, so we were able to bring a great deal of expertise to the table in support of their efforts. As part of an initial engagement, we typically host a workshop (or discovery session) that brings our architects up to speed as quickly as possible. This accelerates time to value by exploring the customer's most critical asks, along with the challenges and frustrations they are experiencing at a technical level.
For this customer, WWT organized a two-day workshop in which SMEs from both the customer and WWT engaged in technical discovery, investigation and theoretical whiteboard sessions. These sessions involved key members of both teams, including experts in cloud networking and security. The typical outcome of these initial sessions is a high-level design (HLD) intended to meet the customer's business requirements, technology objectives and timelines. WWT, leveraging its partnerships with Palo Alto and F5 (and past experience deploying scalable multi-account architectures in AWS with Ingress and Egress built around these technologies), provided an enhanced adaptation of Palo Alto's published guide, Securing Applications in AWS Transit Gateway. The overall design consisted of the following components:
- AWS Transit Gateway (TGW) as the core IPv4 Layer 3 router.
- Security VPCs for Egress traffic, East-West traffic and Ingress traffic.
- Palo Alto NGFWs for traffic inspection and filtering.
- F5 BIG-IP appliances to act as a Web Application Firewall (WAF) layer.
To expand upon this, we begin with the architecture of the constructs connecting to the AWS TGW. The TGW supports several attachment types, which connect it to network constructs in AWS such as VPCs, VPNs and Direct Connect Gateways. The TGW aims to reduce network complexity within AWS and is highly scalable from both an operational and an implementation perspective. As part of the implementation, WWT deployed separate VPCs for Egress, East-West and Ingress traffic that acted as individual Security Domain VPCs, and each of these Security VPCs was attached to the TGW.
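To make the role of the TGW as a core router more concrete, the following is a minimal sketch of a longest-prefix-match lookup that maps a destination address to an attachment. The prefixes, the attachment names and the `lookup` helper are all hypothetical and chosen only for illustration; they are not the customer's actual route tables.

```python
import ipaddress

# Hypothetical route table: prefix -> attachment name (illustrative only).
ROUTES = {
    "0.0.0.0/0": "egress-security-vpc",
    "10.0.0.0/8": "east-west-security-vpc",
    "10.20.0.0/16": "vmc-vpn",
}


def lookup(route_table, destination):
    """Return the attachment whose prefix is the longest match for destination."""
    dst = ipaddress.ip_address(destination)
    matches = []
    for prefix, attachment in route_table.items():
        network = ipaddress.ip_network(prefix)
        if dst in network:
            matches.append((network.prefixlen, attachment))
    if not matches:
        return None  # no route: the packet would be dropped
    # The most specific (longest) prefix wins.
    return max(matches)[1]


for address in ("10.20.1.5", "10.9.9.9", "198.51.100.7"):
    print(address, "->", lookup(ROUTES, address))
```

In this toy table, the more specific /16 overrides the /8, which in turn overrides the default route, so a single table can steer different classes of traffic to different attachments.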
The Egress Security VPC functions as the Egress Internet edge security suite responsible for the security inspection and load balancing of all application connections initiated by AWS-hosted clients (VMC and VPCs). Return traffic flows through the same inspection device in the reverse direction thanks to source NAT (SNAT) policies. The NGFWs in this VPC are deployed independently of one another in an Active/Active configuration, share no state and operate in independent life cycles.
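The reason SNAT keeps return traffic symmetric can be sketched with a toy model: once a firewall rewrites the source address to its own NAT address, the reply is addressed to that firewall and cannot land on its peer, so no state sharing is needed. The class names, addresses and the simplified session table (keyed only by remote address) are assumptions made for illustration and do not reflect how a Palo Alto NGFW is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class Firewall:
    """Toy NGFW that source-NATs outbound traffic to its own address."""

    def __init__(self, name, nat_ip):
        self.name = name
        self.nat_ip = nat_ip
        # Simplified session table: remote address -> original client address.
        self.sessions = {}

    def outbound(self, pkt):
        """Record the session and rewrite the source to this firewall's NAT IP."""
        self.sessions[pkt.dst] = pkt.src
        return Packet(src=self.nat_ip, dst=pkt.dst)

    def inbound(self, reply):
        """Restore the original client address on the returning packet."""
        return Packet(src=reply.src, dst=self.sessions[reply.src])


def firewall_for_reply(reply, firewalls):
    """The reply is addressed to a NAT IP, so it can only reach that firewall."""
    return next(fw for fw in firewalls if fw.nat_ip == reply.dst)


fw_a = Firewall("ngfw-a", "203.0.113.10")
fw_b = Firewall("ngfw-b", "203.0.113.20")

# A client flow happens to be sent to ngfw-b on the way out.
translated = fw_b.outbound(Packet(src="10.1.0.5", dst="198.51.100.7"))
reply = Packet(src=translated.dst, dst=translated.src)
print(firewall_for_reply(reply, [fw_a, fw_b]).name)  # prints "ngfw-b"
```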
The East-West Security VPC functions as the Intranet-only security suite responsible for the security inspection of all traffic between Intranet hosts. As in the Egress Security VPC, return traffic flows through the same inspection device as the original flow; here, that is because only one NGFW receives traffic at any time. Also like the Egress VPC, the NGFWs share no state and operate in independent life cycles. Unlike the Egress VPC, however, the NGFWs are deployed in an Active/Passive configuration. This is achieved by leveraging BGP AS path prepending, so that the TGW's BGP best path selection algorithm only ever selects one NGFW to receive traffic at any given time. Because of this design, no NAT policies are required on the NGFWs.
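The Active/Passive behavior produced by AS path prepending can be illustrated with a small sketch. It models only one step of BGP best path selection (shortest AS path wins); real BGP evaluates several other attributes as well. The ASN, the next-hop names and both helper functions are hypothetical.

```python
def prepend(as_path, asn, times):
    """Return a new AS path with `asn` prepended `times` times."""
    return [asn] * times + list(as_path)


def best_path(advertisements):
    """Pick the advertisement with the shortest AS path.

    Simplification: only AS path length is compared; ties are broken by
    next-hop name so the result is deterministic.
    """
    return min(advertisements, key=lambda ad: (len(ad["as_path"]), ad["next_hop"]))


# Both firewalls advertise the same prefix; the passive one prepends its ASN.
active = {"next_hop": "ngfw-a", "as_path": [65001]}
passive = {"next_hop": "ngfw-b", "as_path": prepend([65001], 65001, 2)}

print(best_path([active, passive])["next_hop"])  # prints "ngfw-a"

# If the active firewall's session drops, its route is withdrawn and the
# passive firewall becomes the only (and therefore best) path.
print(best_path([passive])["next_hop"])  # prints "ngfw-b"
```

Because only one path is ever selected, both directions of a flow traverse the same firewall, which is consistent with the design needing no NAT policies.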
Traffic destined for the internet passes through one of the sets of Palo Alto NGFWs, and in both the Egress and East-West VPCs these NGFWs provide L3 and L4 stateful packet inspection, deep packet inspection, threat prevention and anti-malware services. In addition, as mentioned previously, BGP sessions are established between the TGW and the Palo Alto NGFWs to provide both VPN Equal-Cost Multipath (ECMP) routing and link resiliency (redundant VPN endpoints). ECMP allows links that have equal cost within the routing protocol to be used together, so that traffic between sender and receiver is spread across them. More specifically, ECMP has intrinsic load-balancing qualities: internet-destined traffic arriving at the TGW from an internal source is distributed across the NGFWs by a simple hash-based algorithm.
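A hash-based ECMP decision can be sketched as follows. This is an illustration of the general technique, not the TGW's actual algorithm: the choice of SHA-256, the flow tuple layout and the `pick_next_hop` helper are all assumptions.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Choose a next hop by hashing the flow tuple.

    `flow` is assumed to be (src_ip, dst_ip, protocol, src_port, dst_port).
    The same flow always hashes to the same next hop, while different
    flows spread across all equal-cost next hops.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


ngfws = ["ngfw-a", "ngfw-b"]
flow = ("10.1.0.5", "198.51.100.7", "tcp", 49152, 443)

# Every packet of this flow is sent to the same firewall.
print(pick_next_hop(flow, ngfws) == pick_next_hop(flow, ngfws))  # prints "True"
```

Hashing per flow rather than alternating per packet keeps each connection pinned to one stateful firewall, which matters when the firewalls share no state.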
Security was a key concern for this customer, so the Ingress VPC included additional components intended to provide greater visibility and security. Given the nature of the deployment, WWT worked closely with partner solution vendors (F5 and VMware) to complete this design.
The Ingress Security VPC for this customer acted as the Ingress Internet edge security suite responsible for the security inspection and load balancing of inbound application connections initiated by internet clients to backend services (VMC and elsewhere). Traffic from the Internet first passes through an AWS Network Load Balancer (NLB), where it is load balanced at L4 across a pool of F5 BIG-IP appliances that function as Web Application Firewalls (WAFs) and as the termination point for HTTP and HTTPS traffic. The F5s use LTM services to distribute traffic evenly across the Palo Alto NGFWs and adjust the distribution as the NGFWs scale horizontally. An updated reference architecture based around the AWS Gateway Load Balancer (GWLB) was recently released by Palo Alto and will appear in a future article.
As part of this implementation, the new network architecture had to be connected to a pre-existing VMC on AWS environment. Working with the customer, we established VPNs between the TGW and the VMware NSX-T Tier 0 (T0) edge routers, and established an AWS Direct Connect (DX) link that terminated at the T0 router to handle VM migration traffic. The DX in the customer's Equinix data centers also connected to a Direct Connect Gateway (DXG) in the nearest AWS region. The DXG in turn connects to the TGW, resulting in flexible routing from any VPC or VMC instance that routes through this TGW to an on-prem network.
Conclusion
This work resulted in a successful deployment and a repeatable architecture pattern, as well as Infrastructure as Code that could be applied across accounts. It also gave the customer confidence in their ability to use this architecture for their future VMware workload migration efforts. Because the customer wanted deployments in additional regions, a further engagement followed in which the architecture was made completely deployable using Terraform. In addition to deploying this architecture to other regions around the globe, the customer was able to apply it to Microsoft Azure, with accompanying Terraform source code, to duplicate the successes of this engagement.
As AWS and our ISV partners continue to release new features and capabilities, WWT will continue to identify opportunities to ensure that each of our customers' digital transformation journeys delivers more value and helps them meet and exceed their own organization's transformation goals.