Migrating AWS Infrastructure From Terraform to AWS CDK
Sharing my hands-on experience
Introduction
Managing cloud infrastructure has evolved with tools like Terraform and the AWS Cloud Development Kit (AWS CDK) leading the way. Terraform has been a trusted ally for many, helping to set up and manage infrastructure smoothly. But now, AWS CDK is stepping up, promising more features and easier ways to handle AWS services.
If you are a cloud architect or an IT professional familiar with Terraform and curious about what AWS CDK offers, you’re in the right place. In this article, I will share my personal experience based on a real project and walk you through a shift from Terraform to AWS CDK, a move many are considering due to AWS CDK’s flexibility and straightforward approach to managing AWS services.
Although no deep knowledge of Terraform or AWS CDK is required to follow this article, some fundamental familiarity with both tools will make it much easier to understand.
Although this article covers migration from Terraform specifically, the same principles apply to migrating to CDK from any other IaC tool or from manually created resources.
You’ll see how you can plan and execute the migration without issues. I’ve designed this article to be practical, with tips from real experiences to help you make informed decisions.
For those of you who want to skip the introduction and migration planning, feel free to go directly to the Import Existing Resources Into AWS CDK section and its TL;DR part.
Background
Terraform and AWS CDK
In the world of cloud computing, Terraform and AWS Cloud Development Kit (AWS CDK) are like powerful engines behind the scenes, both being popular IaC tools allowing you to have repeatable deploys of the infrastructure and automate routine operations.
Terraform has been around for a while. It helps you build and change your infrastructure easily using a simple configuration language. That language used to be called HCL (HashiCorp Configuration Language), but all mentions of HCL seem to have been wiped from the documentation, and now it is simply the Terraform language.
On the other side, AWS CDK by Amazon takes IaC a bit higher, not only allowing you to work with familiar programming languages like Python, JavaScript, and others but also providing higher-level constructs and pattern implementations. Together, these allow you to focus more on the business logic rather than on the technical details of implementation.
The goal of migrating from Terraform to AWS CDK
When we talk about moving from Terraform to AWS CDK, we aim to step up our game in managing cloud services. It’s all about moving to a more integrated platform with AWS, where you can use programming languages you are already familiar with, which means you can do more with less effort.
Migrating isn’t just about switching tools; it’s about adopting a system that can save time, reduce errors, and allow for a more intuitive cloud infrastructure setup. You retain all the essential features but gain much more in functionality and ease of use.
Levels of abstraction in AWS CDK
Before diving deeper, it is essential to remind you about the levels of constructs in the AWS CDK library.
A remarkable feature of AWS CDK is its library of constructs, essentially breaking down into three levels known as L1, L2, and L3 constructs, each offering a different degree of abstraction and control.
- Starting with L1 constructs, these are the basic building blocks representing AWS cloud resources at the fundamental level, offering granular control but requiring more coding effort. L1 constructs map one-to-one to AWS CloudFormation resources and are effectively autogenerated CDK wrappers around low-level CloudFormation resources. They are named CfnXyz, where Xyz is the name of the resource. For example, CfnBucket represents the AWS::S3::Bucket CloudFormation resource. Picture these L1 constructs as the individual bricks in a Lego set, allowing for detailed customization but demanding more time and precision.
- Next up, L2 constructs take a step further by bundling L1 constructs into higher-level components with a higher-level, intent-based API, giving you preconfigured defaults, boilerplate, and glue logic. This is like having a Lego kit with sections of the castle pre-assembled, so you have fewer pieces to work with, saving you time while offering a good amount of customization.
- Lastly, we have L3 constructs, which are even more abstracted, providing ready-made patterns that encapsulate broader functionalities designed to help you complete common tasks in AWS, often involving multiple kinds of resources. It is like buying a fully assembled Lego castle that you can use straight away, though with limited customizability.
Reusability comes into play significantly here, especially with L2 and L3 constructs, allowing you to reuse patterns and architectures across different environments, ensuring consistency while saving time and effort. This hierarchical setup means you can choose the right level of abstraction for your project, balancing between control and convenience, thereby working more efficiently and with less room for error. It’s all about working smart and leveraging AWS CDK’s flexibility to seamlessly build and scale your cloud infrastructure.
When writing AWS CDK code, you should usually aim for L2 or L3 constructs unless you have a very good reason to use L1 constructs, which might give you a bit more flexibility but usually require much more effort. But as we will see below, when you are migrating existing resources, you have no option but to go with L1 constructs for exactly that reason of flexibility. As L1 constructs map one-to-one to CloudFormation resources, you can describe your already deployed resources only using L1 constructs.
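To make the difference concrete, here is a minimal sketch of the same S3 bucket defined twice: once with the L2 Bucket construct and once with the L1 CfnBucket wrapper. The construct IDs are illustrative:
from aws_cdk import Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct


class BucketExampleStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # L2: intent-based API, sensible defaults for everything else
        s3.Bucket(self, "L2Bucket", versioned=True)

        # L1: one-to-one mapping to AWS::S3::Bucket, every detail
        # expressed in raw CloudFormation terms
        s3.CfnBucket(
            self,
            "L1Bucket",
            versioning_configuration=s3.CfnBucket.VersioningConfigurationProperty(
                status="Enabled"
            ),
        )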
Planning the Migration
Identifying and cataloging current resources managed by Terraform
Before you start thinking about the big move, it’s essential to know exactly what you’ve got in your current setup: you need to list down all the elements currently managed by Terraform. It could be various AWS services, databases or their configurations, networks — everything needs to be on this list.
But don’t just stop at listing them. It’s a great practice to categorize them into different groups, maybe based on their functions or how critical they are to your operations. This way, you have a clear picture of your existing setup and can identify potential challenges in the migration process later.
Also, it is essential to identify all the dependencies between resources. Knowing what depends on what is crucial for planning the migration sequence: what should be migrated first and what may be processed later.
Dependencies are usually clearly visible in Terraform files either by referencing one resource from another or by explicitly defining them in the depends_on resource attribute.
It is very important at this stage not only to examine your Terraform files but also to inspect the actual state with the terraform state list command. Even though your Terraform files describe low-level resources, something might still be created under the hood, and identifying these resources early will help you avoid issues when writing CDK code.
Also, don’t forget to investigate the AWS Console. Are there any resources that are not known to Terraform?
Define your stacks
If your workload is at least a little complex, it will make sense to separate resources into different stacks, migrate stack by stack, and propagate stacks over environments instead of one big migration of everything together. For example, you may want to have all your networking resources in one stack, database and storage in another, business logic in the third, and the pipeline in the fourth. Of course, you may choose another principle to define your stacks.
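As an illustration, here is a minimal sketch of how such a split might look at the CDK app level. The stack classes and module paths are hypothetical placeholders, not from the original project:
import aws_cdk as cdk

# Hypothetical modules, one stack class per functional group
from stacks.network import NetworkStack
from stacks.storage import StorageStack
from stacks.application import ApplicationStack
from stacks.pipeline import PipelineStack

app = cdk.App()

NetworkStack(app, "NetworkStack")
StorageStack(app, "StorageStack")
ApplicationStack(app, "ApplicationStack")
PipelineStack(app, "PipelineStack")

app.synth()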
Two Types of Migration
At this point, you should have a complete list of resources for migration, understand their dependencies, and in which stacks they will be grouped. Now it’s time to select a migration type for each of them based on whether you can afford to destroy and recreate this particular resource or not.
Destroy and recreate
All the resources you can afford to temporarily delete should be migrated this way; it will decrease the scope of the import migration (we will look at import below). A destroy-and-recreate migration means you destroy the selected resources with Terraform and recreate them with CDK.
This means that ARNs (AWS IDs) of these resources will change, so all references to these resources must be updated. This type of migration has the advantage of possibly using any level of constructs in the CDK code — L1, L2, or even L3. Also, you may use this opportunity to fix some technical debt, update to newer software versions, etc. However, destroying a resource means you will lose all the data in this resource unless you make a backup. You can restore this backup on a newly created resource later. Going with this type of migration is usually possible for the following resources only:
- The resource has no data (e.g., a WAF), or its data can be restored from a backup after recreation
- No other resources depend on this resource, or they may also be destroyed and recreated
- You can tolerate the temporary disappearance of this resource, as destroying and recreating it will take some time; how long depends on the resource type and many other factors
If these conditions cannot all be met, you have no choice but to import the existing resources into CDK.
Import Existing Resources Into AWS CDK
This is probably the most important part of the article because, let’s be honest — if you could delete all your resources and recreate them with CDK, you probably wouldn’t read this.
The usual way to work with CDK is to write CDK code, which will create your resources from scratch. But for those who already have some resources created, there is a special command cdk import which will import your existing resources into the stack. This is how you can put existing resources under CDK control.
Note: As of writing this, every time I run cdk import, it warns me that import is still an experimental feature. Therefore, it must be used with caution. However, I never experienced any problems using it.
The main limitation of the cdk import command is that it can only import resources into an existing stack.
This means you must start by deploying a new stack, but as CDK refuses to deploy empty stacks, I usually define some dummy resource in it, e.g., a DynamoDB table. After deploying the stack, I import the resources and then delete that temporary DynamoDB table.
But if you previously identified resources that do not need to be imported and may simply be destroyed with Terraform and recreated with CDK, this could be the right place for them. In that case, you do not need a temporary resource and may define your real resources here.
Let’s start by defining a stack with a temporary resource.
Note: I will be using Python here, but as usual with CDK, you are free to choose from .NET, Go, Java, Python or TypeScript.
from aws_cdk import RemovalPolicy, Stack
from aws_cdk import aws_dynamodb as dynamodb
from constructs import Construct


class NetworkStack(Stack):
    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        *,
        parameters,
        **kwargs,
    ) -> None:
        super().__init__(scope, construct_id, stack_name="NetworkStack", **kwargs)

        dynamodb.Table(
            self,
            "TempResource",
            table_name="temp-resource",
            partition_key=dynamodb.Attribute(
                name="id", type=dynamodb.AttributeType.STRING
            ),
            # Need to specify removal policy, so it will be actually
            # destroyed later
            removal_policy=RemovalPolicy.DESTROY,
        )
I am specifying removal_policy=RemovalPolicy.DESTROY because, by default, CDK will keep the DynamoDB table to prevent data loss when you run cdk destroy later. As this is a temporary resource, it will not hold any data, so it may safely be deleted.
RemovalPolicy is an abstraction in CDK that controls what happens to a resource when it stops being managed by CDK and CloudFormation; CDK translates it into the CloudFormation DeletionPolicy attribute. This clash in naming conventions (deletion policy vs. removal policy) sometimes adds a bit of confusion, but it is basically the same thing named differently.
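On L2 constructs you set removal_policy as shown above (or call apply_removal_policy later), while on L1 constructs you set the underlying CloudFormation attribute directly. A minimal sketch, assuming table and vpc are constructs already defined in the stack:
from aws_cdk import CfnDeletionPolicy, RemovalPolicy

# L2 construct: the CDK-level removal policy abstraction
table.apply_removal_policy(RemovalPolicy.DESTROY)

# L1 construct: the raw CloudFormation DeletionPolicy attribute
vpc.cfn_options.deletion_policy = CfnDeletionPolicy.RETAIN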
Now run cdk deploy to deploy the stack.
Congratulations! Now you have a stack with your temporary resource deployed and can fill it with definitions of resources you want to import.
In this step, it is very important to model the state your resources currently have. You must use L1 constructs because only L1 constructs give you the level of detail you need to describe your existing resources.
Remember, L1 constructs are nothing but automatically generated wrappers around CloudFormation resources, so everything you have deployed can be described with L1 constructs.
L2 and L3 constructs are on a higher level. While they’ll save you time when defining your infrastructure from scratch, they won’t help in this situation, as your deployed resources don’t follow their internal logic.
Note: When adding resources for import, do not include any other changes in your stack! Currently, CDK cannot process changes to existing resources and import new resources at the same time.
For example, let’s say we want to import a VPC. To do that, we have to add the following CfnVPC construct to our code:
from aws_cdk import RemovalPolicy, Stack
from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk import aws_ec2 as ec2
from constructs import Construct


class NetworkStack(Stack):
    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        *,
        parameters,
        **kwargs,
    ) -> None:
        super().__init__(scope, construct_id, stack_name="NetworkStack", **kwargs)

        dynamodb.Table(
            self,
            "TempResource",
            table_name="temp-resource",
            partition_key=dynamodb.Attribute(
                name="id", type=dynamodb.AttributeType.STRING
            ),
            # Need to specify removal policy, so it will be actually
            # destroyed later
            removal_policy=RemovalPolicy.DESTROY,
        )

        # Importing the existing VPC: an L1 construct describing
        # the resource exactly as it is deployed
        ec2.CfnVPC(
            self, "VPC", cidr_block=parameters["vpc_cidr"]
        )
Now, run cdk import. It will scan your code and AWS environment, understand that the DynamoDB table is already in the stack, and import the VPC.
At this stage, depending on the resources you are importing and the details you have specified in your code, it may ask you for more information, such as resource IDs, if something is missing and needs clarification.
Congratulations! Your resources are now imported into CDK, and any changes to the code will be reflected in the resources in the cloud. At this point, it is a good idea to check for stack drift. Ideally, there should be none, but sometimes you'll find that you forgot some meta-information in your stack, such as tags or a deletion policy, and it is probably not too late to fix it.
That’s it! The only thing left to do is remove that temporary resource (DynamoDB table) from your stack code. After that, you can deploy the stack again to apply this change and remove this resource from the cloud.
Sounds easy, right? But sometimes, certain resources may not be supported for import due to their nature. For example, route tables and routes must be deployed from scratch. So if you are importing network resources (e.g., subnets, ACLs, VPC Flow Logs) after importing the VPC, you'll have to add route tables and routes to your code and execute one more deploy step to get them into the cloud. This step may be combined with the deletion of the temporary resource.
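For illustration, a default route added after the import might look like the snippet below; it assumes a route table construct named public_route_table (defined as in the staged example later in this article), and the internet gateway ID is a hypothetical placeholder:
# inside the stack, after the imported VPC and a newly defined route table
ec2.CfnRoute(
    self,
    "PublicDefaultRoute",
    route_table_id=public_route_table.ref,
    destination_cidr_block="0.0.0.0/0",
    gateway_id="igw-0123456789abcdef0",  # hypothetical ID
)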
As you can see, there are a few steps where cdk deploy and cdk import must run sequentially. To make the code clear and the execution repeatable, I usually rehearse it all in another environment with temporary resources: I deploy part of the Terraform infrastructure to another account and iterate on my CDK code until I have a nice, repeatable process.
To make the same code run at different stages, I usually introduce an enum variable that enables certain parts of the code. I then change the value of this variable in the configuration, an environment variable, or the CDK context to indicate which part of the resources should be enabled for the next stage of import or deployment.
The same principle lets one stack definition serve both for importing existing resources and for deploying to a new environment from scratch. This can help you create multiple identical environments from resources you already have deployed.
from enum import Enum

from aws_cdk import RemovalPolicy, Stack
from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk import aws_ec2 as ec2
from constructs import Construct


class NetworkImportStages(Enum):
    """Defines stages of import, ignored for deployment from scratch"""

    DEPLOY_TEMP_RESOURCE = 10
    IMPORT_CORE_RESOURCES = 20
    DEPLOY_ROUTES = 30
    NO_IMPORT = 1000


class NetworkStack(Stack):
    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        *,
        parameters,
        **kwargs,
    ) -> None:
        super().__init__(scope, construct_id, stack_name="NetworkStack", **kwargs)

        # Stage 1: the temporary resource exists only until routes are deployed
        if parameters["import_stage"].value < NetworkImportStages.DEPLOY_ROUTES.value:
            dynamodb.Table(
                self,
                "TempResource",
                table_name="temp-resource",
                partition_key=dynamodb.Attribute(
                    name="id", type=dynamodb.AttributeType.STRING
                ),
                removal_policy=RemovalPolicy.DESTROY,
            )

        # Stage 2: resources to be imported with cdk import
        if parameters["import_stage"].value >= NetworkImportStages.IMPORT_CORE_RESOURCES.value:
            vpc = ec2.CfnVPC(
                self, "VPC", cidr_block=parameters["vpc_cidr"]
            )

        # Stage 3: resources that cannot be imported and are deployed from scratch
        if parameters["import_stage"].value >= NetworkImportStages.DEPLOY_ROUTES.value:
            public_route_table = ec2.CfnRouteTable(
                self,
                "PublicRouteTable",
                vpc_id=vpc.ref,
            )
            # other route tables and routes here
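How the import_stage value reaches the stack is up to you. One option, sketched below under the assumption that the classes above are importable, is to read it from the CDK context so you can advance stages from the command line without touching the code (the vpc_cidr value is illustrative):
import aws_cdk as cdk

app = cdk.App()

# e.g., cdk deploy -c import_stage=IMPORT_CORE_RESOURCES
stage_name = app.node.try_get_context("import_stage") or "NO_IMPORT"

NetworkStack(
    app,
    "NetworkStack",
    parameters={
        "import_stage": NetworkImportStages[stage_name],
        "vpc_cidr": "10.0.0.0/16",  # illustrative value
    },
)

app.synth()
You would then run cdk deploy or cdk import with the appropriate -c value at each stage.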
TL;DR
You can use the following summary of everything written above as a cheat sheet:
- Identify all resources, their dependencies, and relations for your migration. Group, prioritize, and plan what you will migrate first and what you'll do next. When identifying resources, look not only at your Terraform files but also at the actual Terraform state and the AWS Console
- Select a migration type for each resource: decide which resources you can destroy with Terraform and recreate from scratch with CDK, and which you will need to import with the cdk import command
- When identifying the scope of work for cdk import, it often makes sense to group resources into different stacks rather than have everything defined in one stack.
Then, for each stack:
- If there are any resources you identified that can be destroyed with Terraform and recreated with CDK, it is time to destroy them.
- Deploy a new stack with the resource(s) you destroyed in the previous step. If there were no such resources, define some temporary resource in the stack, e.g., a DynamoDB table. We do not need this resource and will delete it later; it is there only because CDK refuses to deploy empty stacks.
- Add L1 constructs of resources you want to import into your stack code. Run cdk import. It may ask you for some additional information, such as IDs, etc.
- Check for stack drift (see below)
- Fix your stack if there are any drifts. Often, some tags or deletion policies are forgotten (see the sketch after this list)
- If you defined a temporary resource in the stack, it is time to remove it from the code and deploy the stack changes to remove the resource from the cloud
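For reference, here is a minimal sketch of fixing those two common drifts on an imported L1 resource. It lives inside the stack's __init__, and the tag and CIDR values are illustrative:
from aws_cdk import CfnDeletionPolicy, CfnTag
from aws_cdk import aws_ec2 as ec2

vpc = ec2.CfnVPC(
    self,
    "VPC",
    cidr_block="10.0.0.0/16",  # illustrative value
    # Tags that exist on the deployed VPC but were missing in code
    tags=[CfnTag(key="environment", value="production")],
)
# Match the deletion policy the resource had before the import
vpc.cfn_options.deletion_policy = CfnDeletionPolicy.RETAIN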
How to check for a stack drift
1. Run aws cloudformation detect-stack-drift --stack-name <your_stack_name>. It will return a stack drift detection ID
2. Run aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id <id> with the ID from the previous step and wait until it shows that drift detection is complete
3. Run aws cloudformation describe-stack-resource-drifts --stack-name <your_stack_name> to see if any drift was found