In this article, we’ll look at a process you can use to begin containerizing “legacy” software. While no two products will be the same, and the term “legacy” is subjective, we’ll focus on broadly applicable steps for packaging tightly-coupled systems currently tied to individual environments.
1. Identify Candidate Systems
It’s worthwhile to first prepare a systems inventory that lets you identify good candidates for containerization. In some cases, you might conclude a particular application simply can’t be containerized, usually because it has deeply ingrained hardware requirements or relies on obsolete kernel features or programming languages.
The best candidates are frequently used systems which will immediately benefit from accelerated future development. Look for applications which are already fairly self-contained if you’re completely new to containerization. Selecting a system which is well-used but not mission-critical will give you leeway if things go wrong while allowing you to recognize the benefits of a successful migration.
2. Componentize The System
You could containerize your candidate system by writing a Dockerfile, including all the application’s dependencies, and calling it a day. While this is a valid way to quickly get a system into a container, it shouldn’t be the final goal of your efforts. A monolithic container will result in long builds, huge image sizes, and poor scalability.
Instead, you should look for opportunities to split each of your systems into individual components. Those components should end up in their own containers, preventing any single piece from becoming too large. You’ll also be able to scale components individually by creating extra replicas of whichever containers come under resource pressure.
This step’s also important in establishing overall modularity and encouraging further container adoption. As you separate out more systems into their components, you’ll begin to find overlaps that let you reuse container images you’ve already created. You’ll notice it becomes gradually easier to continue containerizing.
Deciding where to split up components shouldn’t feel too taxing. Begin by identifying where the system relies on services that are already external to its source code. Database connections, message queues, email servers, proxies, and gateways should all be independent of the component they augment. You’ll separate these into their own containers that sit alongside the instance running your code.
It’s also worthwhile looking for opportunities to refactor what’s left. Does your service have too many responsibilities that could be parceled off as separate functional units? You might have a user profile API that accepts photo uploads; the service which resizes those photos could be a good candidate to run autonomously in its own container.
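For the hypothetical profile API above, the end state might look something like the following sketch, expressed in Docker Compose syntax (orchestration is covered in step 5). The service and image names are purely illustrative:

```yaml
services:
  profile-api:          # the existing application code in its own container
    image: example/profile-api:1.0
    depends_on:
      - database
      - photo-resizer
  database:             # previously co-located with the app, now a separate container
    image: postgres:14  # configuration omitted for brevity
  photo-resizer:        # the upload-processing logic split out as its own service
    image: example/photo-resizer:1.0
```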
3. Prepare Your Components
After separating out components, you need to prepare them to operate in a containerized environment. Containers have several key differences compared with traditional VMs. Persistent storage, configuration, and links between components are the three most important to consider upfront.
Persistent Storage
Containers are ephemeral environments. Modifications to a container’s filesystem are lost when the container is removed or recreated. You’re responsible for managing your application’s persistent data using the mechanisms your container runtime provides.
In the case of Docker, volumes are used to persist data outside your container instances. Volumes are mounted to specific paths within containers. To avoid having to mount dozens of volumes, it’s best to concentrate your application’s data within a few top-level directories. Mounting volumes to those locations will guarantee persistence for the files your application stores.
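For instance, assuming a hypothetical application that keeps everything it needs to persist under /var/lib/app, you could mount a named volume at that path:

```sh
# Create a named volume, then mount it at the application's data directory
# (the image name and path are illustrative)
docker volume create app-data
docker run -d --name legacy-app -v app-data:/var/lib/app example/legacy-app:1.0
```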
It’s important to audit your application’s filesystem interactions to understand which volumes you need and any problems you’ll encounter. Not paying attention to this step could be costly if data you assume to be persisted is lost each time a container restarts.
Managing Configuration
Many legacy applications are configured using static config files. These might be in a dedicated format, such as XML, JSON, or INI, or coded using the system’s programming language.
Containers are normally configured by external environment variables. Variables are defined when containers are created, using mechanisms such as Docker’s -e flag with docker run. They’re injected into the running container’s environment.
Using this system ensures you can rely on your container toolchain to set and change config parameters. You might have to refactor your application first to support reading settings from environment variables. One common way of easing the transition is to place a small script inside the container’s entrypoint. This can enumerate environment variables upon container creation and write them into a config file for your application.
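A minimal sketch of such an entrypoint script, assuming a hypothetical app that reads an INI file from /etc/app/config.ini, might look like this:

```sh
#!/bin/sh
# entrypoint.sh - write selected environment variables into the legacy config file,
# then hand control over to the original application process
cat > /etc/app/config.ini <<EOF
[database]
host = ${DB_HOST:-localhost}
port = ${DB_PORT:-5432}
EOF

exec /usr/bin/legacy-app "$@"
```

The variables themselves are then supplied when the container starts, for example with docker run -e DB_HOST=database -e DB_PORT=5432.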
Links Between Services
Containerization also makes you think about inter-service networking. Services aren’t generally exposed to each other except by explicit configuration. In Docker, you can achieve simple service discovery by joining multiple containers to the same user-defined Docker network. Containers on a shared network can reach each other by name.
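As a brief example, two containers joined to the same user-defined network can address each other by container name (the image, container, and credential values here are illustrative):

```sh
# Create a user-defined network, then attach both containers to it
docker network create app-network
docker run -d --name database --network app-network -e POSTGRES_PASSWORD=example postgres:14
docker run -d --name legacy-app --network app-network example/legacy-app:1.0
# legacy-app can now reach the database at the hostname "database"
```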
Other containerization technologies use different approaches to networking and service discovery. Having separated your systems into individual components, you need to tie them back together using the facilities offered by your runtime. Containerized deployments often involve more complexity than networking between VMs or physical hosts: traffic needs to be routed and load balanced across all your container replicas and their dependencies, so plan for these requirements early on.
4. Write Your Dockerfiles
Once you’ve planned out your architecture, you can start the physical work associated with containerization. The first step is to write Dockerfiles for your application’s components. These define the sequence of commands and actions that create a filesystem containing everything the component needs to run.
Dockerfiles start with an appropriate base image referenced by a FROM statement. This is commonly an operating system (ubuntu:20.04, alpine:3) or a pre-built programming language environment (php:8, node:16). You can choose the image that best matches your application’s existing environment. Starting from an empty filesystem is possible but not usually necessary unless you need extremely granular control.
Additional content is layered onto the base image by instructions like COPY and RUN. These let you copy in files from your host and run commands against the build’s temporary filesystem. Once you’ve written your Dockerfile, you can build it with the docker build -t my-image:latest . command.
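Putting these pieces together, a Dockerfile for a hypothetical PHP component might look like this sketch (the base image tag, packages, and paths are illustrative, not a prescribed layout):

```dockerfile
# Hypothetical Dockerfile for a PHP component of the legacy system
FROM php:8-apache

# Install any extra OS packages the application depends on
RUN apt-get update \
    && apt-get install -y --no-install-recommends git unzip \
    && rm -rf /var/lib/apt/lists/*

# Copy the application source into the image's web root
COPY . /var/www/html
```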
5. Set Up Orchestration
Assuming you’ve componentized your system, you’ll end up with one container image for each piece. Now you need a way of bringing up all the containers simultaneously so you can conveniently start a functioning application instance.
Larger production installations commonly use Kubernetes for this purpose. It’s a dedicated orchestration system that adds its own higher-level concepts for creating replicated containerized deployments. Smaller systems and development environments are often well-served by Docker Compose, a tool which relies on simpler YAML files to start a “stack” of several containers.
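As a hedged sketch, a two-service stack for the hypothetical application might be declared like this (image names, ports, credentials, and volumes are assumptions, not a prescribed layout):

```yaml
# docker-compose.yml
version: "3"
services:
  app:
    image: example/legacy-app:1.0
    ports:
      - "8080:80"            # expose the app on the host's port 8080
    environment:
      DB_HOST: database      # matches the service name below
    depends_on:
      - database
  database:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: example   # hypothetical credential
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
```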
A docker-compose.yml file lets you start all of its services with a single invocation of the docker-compose binary.
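Assuming a file like the sketch above is saved as docker-compose.yml in your working directory, bringing the whole stack up takes one command:

```sh
# Create and start every service defined in docker-compose.yml, in the background
docker-compose up -d
```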
Setting up some form of orchestration makes your container fleet more manageable and facilitates scaling via replication. Both Kubernetes and Docker Compose can start multiple instances of your services, something that’s much harder to achieve with legacy applications formed from tightly coupled components.
6. After The Move: Monitoring and Expanding Your Container Fleet
Containerization doesn’t end with starting an instance of your application. To get the most from the technology, you need to properly monitor your containers to stay informed of errors and resource utilization.
Larger systems are best served by a dedicated observability platform that can aggregate logs and metrics from across your fleet. You might already be using a similar solution with your legacy app deployments, but it’s even more important for containers. Good observability will let you trace problems back to the container instance they originated from, surfacing the insights that matter when you’ve got hundreds or thousands of replicas.
To keep expanding your fleet, double down on documentation and standardization. We’ve already seen how splitting systems into components aids future reuse. However, this only works effectively if you’ve documented what you’ve got and how each piece fits together. Taking the time to write about your system and the process you’ve been through will streamline future work. It’ll also help new team members understand the decisions you’ve made.
Is It Worth It?
Containerization is worthwhile when you feel a system’s development is being held back by its current processes. Being able to deploy it as a set of containers simplifies the development experience and gives you more versatility in deployment. Now you can launch the service anywhere a container runtime is available, whether it’s one instance on your laptop or 1,000 on a public cloud provider.
Going all-in on containers makes it easier to harness the power of the cloud, consolidate your deployments, and reduce on-premises infrastructure costs. However, these apparent wins can be counterbalanced by the need to retrain engineers, hire new specialized talent, and maintain your containers over time.
The decision to containerize a legacy system needs to consider the value of that system to your business, the current time spent maintaining it, and the likely reduction as a result of using containers. It might be that low-priority services are best left alone if the processes associated with them aren’t causing immediate issues.
It should be acknowledged that not all legacy apps will need or be capable of using every touted benefit of containerization. Adoption is a spectrum, from running the system in a single monolithic container, through to full componentization, orchestration, and integration with observability suites. The latter model is the ideal target for business-critical applications which engineers evolve every day; conversely, the former may be adequate for rarely touched services where the prime hindrance is time spent provisioning new VM-based development environments.
Conclusion
Migrating legacy applications to containerized workflows can seem challenging on the surface. Breaking the process into distinct steps usually helps define where you are and where you want to be. In this article, we’ve looked at six granular stages you can use to approach the containerization of existing systems. We’ve also discussed some of the considerations you need to make when deciding whether to proceed.
From a conceptual standpoint, containerizing a legacy application is little different to working with a new one. You’re applying the same principles of componentization, linked services, and configuration that’s injected from the outside environment. Most systems are relatively straightforward to containerize when viewed from this perspective. Focusing on these aspects will help you decouple your applications, create scalable components, and devise an effective containerization methodology.