Engineering · Oct 25, 2023 · 8 min read

AWS Beanstalk to ECS Fargate: AI-engineering recipe

AI services need a deploy story Beanstalk cannot give them. A practical recipe for moving a containerised AI workload to ECS on Fargate, with VPC, load balancer, and a GitHub Actions deploy pipeline.

Alessandro Merola
CTO

ECS on Fargate is the natural step up from AWS Elastic Beanstalk for AI engineering teams that have outgrown Beanstalk's deploy story but do not want to take on a full Kubernetes platform. Containerised LLM proxies, retrieval workers, and agent runtimes need predictable rollouts, real CI/CD, and observability that Beanstalk struggles to deliver. The migration is mechanical once the underlying primitives are in place. This is the recipe we use.

What we are aiming for

  • A standard AWS infrastructure footprint: VPC, public subnets, security groups, an Application Load Balancer, and an ECS cluster
  • A deploy pipeline that ships from a GitHub push to an ECS service update, without manual steps
  • A publicly accessible HTTP endpoint backed by a containerised service (an AI inference worker, retrieval API, or agent runtime) running on Fargate
  • Per-service auto-scaling and cost guardrails sized for AI workloads, where token spend and inference latency are first-class operational concerns
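
As a rough illustration of the last goal, the sketch below builds the two request payloads that Application Auto Scaling takes for an ECS service: a scalable target (where the maximum task count acts as a simple cost ceiling) and a CPU target-tracking policy. The cluster and service names, the task limits, the 60% CPU target, and the cooldowns are all assumptions chosen for the example, not values from our setup.

```python
import json


def scaling_config(cluster, service, min_tasks=1, max_tasks=4, cpu_target=60.0):
    """Return (scalable_target, scaling_policy) payloads for an ECS service."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    # MaxCapacity doubles as a cost guardrail: the service never scales past it.
    scalable_target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    scaling_policy = {
        **common,
        "PolicyName": f"{service}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": cpu_target,  # assumed target, tune per workload
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds, assumed
            "ScaleInCooldown": 300,   # seconds, assumed
        },
    }
    return scalable_target, scaling_policy


# Hypothetical cluster and service names.
target, policy = scaling_config("ai-cluster", "llm-proxy")
print(json.dumps(policy, indent=2))
```

CPU is only one possible signal. For latency-sensitive inference a request-count or custom metric may fit better, which would change the metric specification but not the overall shape.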

What gets created

A single bash script provisions the foundation: the VPC and its subnets, the security groups, the ECR repository for the container image, the CloudWatch log group, the ECS cluster, the task definition, the service, the target group, and the Application Load Balancer with its listener. Treat the script as code: version it and put it through the same review process as the application itself.
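
The order in which those resources are created matters, because some depend on others. The sketch below is one way to reason about it: it encodes the dependencies as we understand them (an assumption, simplified for illustration) and uses Python's standard `graphlib` to derive a valid creation order. The resource names are labels for this example, not identifiers from the script.

```python
from graphlib import TopologicalSorter

# Each resource maps to the resources that should exist before it.
# Simplified: the ECR repository and log group are listed under the task
# definition because a task cannot start without its image and log destination.
DEPENDS_ON = {
    "vpc": set(),
    "subnets": {"vpc"},
    "security_groups": {"vpc"},
    "ecr_repository": set(),
    "log_group": set(),
    "ecs_cluster": set(),
    "target_group": {"vpc"},
    "load_balancer": {"subnets", "security_groups"},
    "listener": {"load_balancer", "target_group"},
    "task_definition": {"ecr_repository", "log_group"},
    "service": {
        "ecs_cluster", "task_definition", "listener",
        "subnets", "security_groups",
    },
}


def provisioning_order(depends_on=DEPENDS_ON):
    """Return resource names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


for step, resource in enumerate(provisioning_order(), start=1):
    print(f"{step:2d}. {resource}")
```

The practical takeaway is that the service comes last: it needs the cluster, the task definition, and a target group that is already attached to a listener.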

Two things to watch

  • The container port the application listens on must match the port the load balancer routes to. A mismatch here is the most common reason infrastructure that looks healthy still returns a 502 from the endpoint.
  • Every ARN, role, account ID, and environment value must be updated for the target environment. Leaving placeholders in production is how outages happen.
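
Both checks are easy to automate before anything is applied. The following is a minimal sketch, assuming the task definition is available as a parsed JSON document and that placeholders follow conventions such as `<ACCOUNT_ID>`, `REPLACE_ME`, or the documentation account ID `123456789012`. Those conventions, and the `routed_port` parameter (the container port the load balancer is configured to forward to), are assumptions for illustration.

```python
import re

# Hypothetical placeholder conventions; adjust to match your own templates.
PLACEHOLDER = re.compile(r"<[A-Z_]+>|REPLACE_ME|123456789012")


def iter_strings(node, path=""):
    """Yield (path, value) for every string nested in a dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def preflight(task_def, routed_port):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for path, value in iter_strings(task_def):
        if PLACEHOLDER.search(value):
            problems.append(f"placeholder left in {path}: {value}")
    container_ports = {
        mapping["containerPort"]
        for container in task_def.get("containerDefinitions", [])
        for mapping in container.get("portMappings", [])
    }
    if routed_port not in container_ports:
        problems.append(
            f"load balancer routes to {routed_port}, "
            f"containers expose {sorted(container_ports)}"
        )
    return problems


example = {
    "executionRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/ecsTaskExecutionRole",
    "containerDefinitions": [
        {"name": "app", "portMappings": [{"containerPort": 8080}]}
    ],
}
for problem in preflight(example, routed_port=80):
    print(problem)
```

Note that this only compares configuration with configuration. It cannot tell whether the process inside the container really listens on the declared port, so a smoke test against the running task is still worthwhile.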

Why the task definition is the most important file

The ECS task definition declares the container image, the resource allocation (CPU units and memory), the network mode (awsvpc, which Fargate requires), the Fargate compatibility requirement, the environment variables, the port mappings, and the CloudWatch log configuration. For AI services, also reference the Secrets Manager or Parameter Store ARNs that hold API keys, set per-task token-budget environment variables, and add any sidecar container needed for observability. Version the file alongside the application code so every deploy ships infrastructure intent and application together.
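
To make the shape concrete, here is a minimal sketch that assembles a Fargate task definition as a Python dict and prints it as JSON. The family name, CPU and memory sizes, the `TOKEN_BUDGET_PER_REQUEST` and `LLM_API_KEY` variable names, and every ARN-like value are placeholders invented for the example, so replace them before use.

```python
import json


def build_task_definition(image, container_port, log_group, region,
                          execution_role_arn, api_key_secret_arn):
    """Return a Fargate-compatible ECS task definition as a dict."""
    return {
        "family": "ai-inference",              # hypothetical family name
        "networkMode": "awsvpc",               # required for Fargate tasks
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "512",                          # 0.5 vCPU, assumed sizing
        "memory": "1024",                      # MiB, assumed sizing
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [{
            "name": "app",
            "image": image,
            "essential": True,
            "portMappings": [
                {"containerPort": container_port, "protocol": "tcp"}
            ],
            # Hypothetical token-budget setting read by the application.
            "environment": [
                {"name": "TOKEN_BUDGET_PER_REQUEST", "value": "4096"}
            ],
            # The secret is injected at start-up and never stored in the file.
            "secrets": [
                {"name": "LLM_API_KEY", "valueFrom": api_key_secret_arn}
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": log_group,
                    "awslogs-region": region,
                    "awslogs-stream-prefix": "app",
                },
            },
        }],
    }


task_def = build_task_definition(
    image="<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/ai-inference:latest",
    container_port=8080,
    log_group="/ecs/ai-inference",
    region="<REGION>",
    execution_role_arn="arn:aws:iam::<ACCOUNT_ID>:role/ecsTaskExecutionRole",
    api_key_secret_arn="arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:llm-api-key",
)
print(json.dumps(task_def, indent=2))
```

In practice the file is usually kept as plain JSON in the repository. Generating it from code, as above, is only one option and mainly helps when several services share the same skeleton.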

The migration steps, in order

  • Add a Dockerfile to the application repository if one does not already exist
  • Provision the VPC, subnets, and security groups using the bash script (or the equivalent IaC module)
  • Author the task definition file and store it in the application repository
  • Run the cluster creation script to wire up the cluster, service, target groups, and load balancer
  • Cut traffic over with a DNS swap once the new endpoint is verified, and keep Beanstalk online for 48 hours as a rollback safety net
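
What "verified" means in the last step is a judgment call. One simple interpretation, sketched below, is to require several consecutive successful health probes against the new endpoint before touching DNS. The `/health` path, the thresholds, and the function names are assumptions made for this example.

```python
import http.client
import time
import urllib.request


def http_probe(url, timeout_seconds=5):
    """Return True when the URL answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and non-2xx HTTP responses.
        return False


def ready_for_cutover(probe, required_successes=3, max_attempts=10,
                      delay_seconds=2.0):
    """Return True once `probe()` succeeds `required_successes` times in a row."""
    streak = 0
    for attempt in range(max_attempts):
        if probe():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0  # a single failure resets the streak
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return False


# Example usage (hypothetical load balancer hostname and path):
# ok = ready_for_cutover(lambda: http_probe("http://<ALB_DNS_NAME>/health"))
```

Passing the probe in as a callable keeps the retry logic testable without a network. A stricter check for an AI service might also send one real inference request and validate the response shape.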

GitHub Actions: the CI/CD pipeline for AI services

A GitHub Actions workflow handles the deploy: build the image, push to ECR, register a new task definition revision, and update the ECS service. Enable the deployment circuit breaker so a failed deploy automatically rolls back to the last healthy task definition. For AI services, add an eval-pass-rate gate before the deploy is promoted: a regression on the golden dataset should fail the build the same way a failing unit test would.
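
The eval gate can be as small as a script whose exit code the workflow respects. The sketch below assumes the eval run produces a list of per-case results with a boolean `passed` field and that 95% is the minimum acceptable pass rate. Both the result format and the threshold are assumptions for illustration.

```python
def pass_rate(results):
    """Fraction of golden-dataset cases that passed; an empty run counts as 0.0."""
    if not results:
        return 0.0
    return sum(1 for result in results if result["passed"]) / len(results)


def gate_exit_code(results, min_pass_rate=0.95):
    """Return 0 to let the workflow continue, 1 to fail the job."""
    return 0 if pass_rate(results) >= min_pass_rate else 1


# Hypothetical eval results for two golden-dataset cases.
sample = [
    {"case": "refund-policy", "passed": True},
    {"case": "pii-redaction", "passed": False},
]
print(f"pass rate {pass_rate(sample):.0%}, exit code {gate_exit_code(sample)}")
```

In a workflow step, the script would end with `sys.exit(gate_exit_code(results))` so that a non-zero code stops the job before the ECS service is updated. Treating an empty result set as a failure is deliberate: a gate that passes when no evals ran is not a gate.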

Use the deployment circuit breaker. The two minutes it costs to enable it pay for themselves the first time a bad deploy starts auto-rolling back instead of taking your AI endpoint offline.
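
For reference, the circuit breaker is part of the service's deployment configuration. The sketch below builds an update request with it enabled and rollback switched on. The cluster, service, and task definition values are placeholders, and the healthy-percent settings are assumptions rather than recommendations.

```python
import json


def update_service_request(cluster, service, task_definition_arn):
    """Return an ECS UpdateService payload with the circuit breaker enabled."""
    return {
        "cluster": cluster,
        "service": service,
        "taskDefinition": task_definition_arn,
        "deploymentConfiguration": {
            # Stop a failing deployment and roll back to the last steady state.
            "deploymentCircuitBreaker": {"enable": True, "rollback": True},
            "minimumHealthyPercent": 100,  # assumed: never drop below current capacity
            "maximumPercent": 200,         # assumed: allow a full parallel set of tasks
        },
    }


# Hypothetical names; the task definition ARN comes from the register step.
request = update_service_request(
    "ai-cluster", "llm-proxy",
    "arn:aws:ecs:<REGION>:<ACCOUNT_ID>:task-definition/ai-inference:42",
)
print(json.dumps(request, indent=2))
```

The resulting JSON can be saved to a file and passed to `aws ecs update-service --cli-input-json`, or the dict can be unpacked into boto3's `update_service` call.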

When to choose ECS on Fargate

ECS on Fargate is the right migration target when the team has containerised the application, wants real CI/CD without Beanstalk's restrictions, and does not need the operational depth of Kubernetes. For AI engineering teams running LLM proxies, retrieval workers, agent runtimes, or async eval pipelines, it is the cleanest middle ground. The setup takes a day, the deploy pipeline takes another, and the resulting platform scales with the product without the platform team growing alongside it.
