ECS on Fargate is the natural step up from AWS Elastic Beanstalk for AI engineering teams that have outgrown Beanstalk's deploy story but do not want to take on a full Kubernetes platform. Containerised LLM proxies, retrieval workers, and agent runtimes need predictable rollouts, real CI/CD, and observability that Beanstalk struggles to deliver. The migration is mechanical once the underlying primitives are in place. This is the recipe we use.
What we are aiming for
- A standard AWS infrastructure footprint: VPC, public subnets, security groups, an Application Load Balancer, and an ECS cluster
- A deploy pipeline that ships from a GitHub push to an ECS service update, without manual steps
- A publicly accessible HTTP endpoint backed by a containerised service, an AI inference worker, retrieval API, or agent runtime, running on Fargate
- Per-service auto-scaling and cost guardrails sized for AI workloads, where token spend and inference latency are first-class operational concerns
What gets created
A single bash script provisions the foundation: the VPC and its subnets, the security groups, the ECR repository for the container image, the CloudWatch log group, the ECS cluster, the task definition, the service, the target group, and the Application Load Balancer with its listener. Treat the script as code, version it, review it, and run it through the same review process as the application itself.
Two things to watch
- The container port the application listens on must match the port the load balancer routes to. Mismatches here are the most common cause of healthy infrastructure with a 502 endpoint
- Every ARN, role, account ID, and environment value must be updated for the target environment. Leaving placeholders in production is how outages happen
Why the task definition is the most important file
The ECS task definition declares the container image, the resource allocation (memory and CPU units), the network mode, the FARGATE launch type, the environment variables, the port mappings, and the CloudWatch log configuration. For AI services, also specify the secrets ARN for API keys, per-task token-budget environment variables, and any sidecar for observability. Version the file alongside the application code so every deploy ships infrastructure intent and application together.
The migration steps, in order
- Add a Dockerfile to the application repository if one does not already exist
- Provision the VPC, subnets, and security groups using the bash script (or the equivalent IaC module)
- Author the task definition file and store it in the application repository
- Run the cluster creation script to wire up the cluster, service, target groups, and load balancer
- Cut traffic over with a DNS swap once the new endpoint is verified, keep Beanstalk online for 48 hours as a rollback safety net
GitHub Actions: the CI/CD pipeline for AI services
A GitHub Actions workflow handles the deploy: build the image, push to ECR, register a new task definition revision, and update the ECS service. Enable the deployment circuit breaker so a failed deploy automatically rolls back to the last healthy task definition. For AI services, add an eval-pass-rate gate before the deploy promotes, a regression on the golden dataset should fail the build the same way a unit test would.
Use the deployment circuit breaker. The two minutes it costs to enable it pay for themselves the first time a bad deploy starts auto-rolling back instead of taking your AI endpoint offline.
When to choose ECS on Fargate
ECS on Fargate is the right migration target when the team has containerised the application, wants real CI/CD without Beanstalk's restrictions, and does not need the operational depth of Kubernetes. For AI engineering teams running LLM proxies, retrieval workers, agent runtimes, or async eval pipelines, it is the cleanest middle ground. The setup takes a day, the deploy pipeline takes another, and the resulting platform scales with the product without the platform team growing alongside it.
