Replacing AWS with Docker Swarm on a VPS

Developers like @levelsio have been running entire businesses on single VPS instances for years. 37signals moved off the cloud and saved $2M annually. The pattern is the same every time: teams realise the AWS bill is too high for what they’re running at modest scale - a few services, a database, a queue. The kind of setup that should cost almost nothing but somehow runs to hundreds of dollars a month on AWS.

I wanted to test this properly. Not just move a static site or self-host a Next.js app - I’ve written about both of those - but run a real multi-service stack: API, background workers, database, queue, web frontend, SSL, load balancing. The full stack.

So I built an uptime monitor to test a realistic setup. The code is on GitHub.

The stack

The uptime monitor does what you’d expect. You add URLs to watch, it pings them on a schedule, records the results, and shows a status page. Simple enough to build in a day, complex enough to need every common infrastructure component.

  • API - Hono running on Node
  • Background worker - BullMQ / Redis
  • Database - Postgres
  • Web - Astro SSR status page
  • Proxy/SSL - Traefik with automatic Let’s Encrypt
  • DNS - Cloudflare

The AWS version

There are different ways to architect this on AWS. Two common ones:

Container-based (ECS Fargate):

  • ALB for routing and SSL
  • ECS Fargate for the API and worker
  • NAT Gateway for outbound traffic from private subnets
  • RDS Postgres
  • ElastiCache Redis
  • CloudFront, Route 53, CloudWatch

Serverless (Lambda):

  • API Gateway
  • Lambda for the API and check workers
  • EventBridge for scheduling checks
  • RDS Postgres
  • ElastiCache Redis
  • CloudFront, Route 53, CloudWatch

The serverless route looks cheaper on paper but still needs RDS and ElastiCache running 24/7. Lambda cold starts add latency. And the NAT Gateway problem doesn’t go away - Lambdas in a VPC still need one for outbound access.

Either way, a conservative total is $100-150/month - and that’s before any real traffic. The NAT Gateway alone is $32/month, just so containers in private subnets can make outbound HTTP requests.

The Swarm version

A single Hetzner CX32 (4 vCPU, 8GB RAM): $7/month.

For a production setup with redundancy, three nodes: $21/month.

Everything runs on Docker Swarm. Traefik replaces the ALB. Postgres and Redis run as containers with mounted volumes. The API, worker, and web frontend each get two replicas spread across nodes.

The docker-compose.swarm.yml file is the entire infrastructure definition:

services:
  traefik:
    image: traefik:v3.3
    command:
      - --providers.swarm=true
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.httpchallenge=true
      - --certificatesresolvers.le.acme.httpchallenge.entrypoint=web
      - --certificatesresolvers.le.acme.email=you@example.com
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro # the swarm provider reads service labels
      - letsencrypt:/letsencrypt # persist issued certificates across restarts
    deploy:
      placement:
        constraints: [node.role == manager] # needs a manager's Docker socket

  api:
    image: ghcr.io/you/swarm-uptime-api:latest
    deploy:
      replicas: 2
      labels:
        - traefik.enable=true
        - traefik.http.routers.api.rule=Host(`api.status.example.com`)
        - traefik.http.routers.api.entrypoints=websecure
        - traefik.http.routers.api.tls.certresolver=le
        # the swarm provider can't infer ports - tell Traefik where the app listens
        - traefik.http.services.api.loadbalancer.server.port=3000

  worker:
    image: ghcr.io/you/swarm-uptime-worker:latest
    deploy:
      replicas: 2

  web:
    image: ghcr.io/you/swarm-uptime-web:latest
    deploy:
      replicas: 2
      labels:
        - traefik.enable=true
        - traefik.http.routers.web.rule=Host(`status.example.com`)
        - traefik.http.routers.web.entrypoints=websecure
        - traefik.http.routers.web.tls.certresolver=le
        - traefik.http.services.web.loadbalancer.server.port=4321 # Astro's default

  postgres:
    image: postgres:17-alpine
    environment:
      POSTGRES_PASSWORD: change-me # required by the image; use a Swarm secret in production
    volumes:
      - pgdata:/var/lib/postgresql/data # named volumes are node-local - see the caveats below

  redis:
    image: redis:8-alpine

volumes:
  pgdata:
  letsencrypt:
Deploying it is a single command: docker stack deploy -c docker-compose.swarm.yml uptime.
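Getting three nodes to the point where that command works is a one-time bootstrap - a minimal sketch, assuming fresh hosts with Docker installed (the IP and join token are placeholders):

# on the first node, create the swarm and make this node a manager
docker swarm init --advertise-addr 10.0.0.1
# init prints a join command with a token; run it on the other two nodes
docker swarm join --token <token> 10.0.0.1:2377
# back on the manager, confirm all three nodes are in
docker node ls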

What maps and what doesn’t

AWS              Swarm equivalent
ALB              Traefik - auto SSL, path/host routing
ECS/Fargate      Swarm services with replicas
RDS              Postgres container with a volume
SQS + Lambda     BullMQ + worker container
NAT Gateway      Not needed - containers have direct outbound
CloudFront       Cloudflare free tier in front
VPC              Docker overlay network
CloudWatch       Docker logs, optionally Grafana

Most of the mapping is straightforward. The one gap is Lambda. There’s no clean equivalent of “run this code in response to an event, scale to zero when idle.” But for a background worker that processes a queue, a container that stays running and polls is simpler and cheaper. You’re paying flat rate for the VPS anyway.
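For illustration, here’s roughly what that worker looks like - a minimal sketch, assuming a queue named checks and a simple fetch-based check (the real code in the repo differs):

import { Worker } from 'bullmq';

// Minimal sketch - queue name, check logic, and connection details are assumptions.
// 'redis' resolves via Swarm's overlay network DNS, so this is the whole broker config.
const worker = new Worker(
  'checks',
  async (job) => {
    const started = Date.now();
    const res = await fetch(job.data.url); // Node 18+ global fetch
    return { status: res.status, responseMs: Date.now() - started };
  },
  { connection: { host: 'redis', port: 6379 } },
);

worker.on('failed', (job, err) => {
  console.error(`check ${job?.id} failed: ${err.message}`);
});

Scheduling lives in BullMQ too: repeatable jobs (repeat: { every: 60_000 } on queue.add) fill the role EventBridge plays in the serverless version, so there’s no separate scheduler service.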

Where this falls apart

Postgres in a container works fine until the node dies. RDS gives you automated failover and point-in-time recovery without thinking about it. On a VPS you’re managing backups yourself.
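It’s not hard, but it’s on you. A minimal sketch of the cron job involved, assuming the stack name uptime and a database of the same name - anything that gets the dump off the node (here rclone to object storage) works:

# nightly dump through the running postgres task, then ship it off the node
# the rclone remote name is a placeholder
docker exec "$(docker ps -q -f name=uptime_postgres)" \
  pg_dump -U postgres uptime | gzip > "/backups/uptime-$(date +%F).sql.gz"
rclone copy /backups remote:uptime-backups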

If you need multi-region, a VPS in Helsinki isn’t serving users in Tokyo well. AWS regions exist for that.

Lambda costs nothing when idle. If you have sporadic workloads - a function that runs once an hour for 200ms - the per-invocation model wins. A container sitting idle on a VPS is cheap but not free.

DDoS protection means putting Cloudflare in front. On AWS, Shield Standard is built in.

Where AWS falls apart

$21/month for a 3-node Swarm cluster vs $100-150/month on AWS for the same workload. For small teams with predictable traffic, that’s a 5-7x premium without a matching benefit.

A CDK deploy through CloudFormation takes 10-20 minutes. docker stack deploy takes seconds.

Debugging on AWS means CloudWatch log groups, log stream pagination, and 15-minute metric delays. On a VPS it’s docker service logs uptime_api.
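Two commands cover most debugging sessions:

# stack deploy prefixes service names with the stack name
docker service logs -f --tail 100 uptime_api   # follow logs across both API replicas
docker service ps --no-trunc uptime_worker     # where each replica runs, restart history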

When to use which

For production systems supporting a scaled business, AWS or another cloud provider is safer, more powerful, and more flexible. But most personal projects and MVPs I’ve seen on AWS don’t need it. They need a server, a database, and a queue. The Swarm setup covers that for $21/month.

The full code is at github.com/danieljohnmorris/swarm-uptime.