Orchestration vs choreography: which to choose

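Third post in the series. The previous two built the same order saga two ways: once as choreography (EventBridge rules) and once as orchestration (a Step Functions state machine).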

Both live in the orchestration-vs-choreography repo. Both deployed to LocalStack Ultimate and ran the same three input shapes (happy, out-of-stock, card-declined) through to the same three terminal DynamoDB states. What differs is the control plane between the API call and that final write: EventBridge rules in one, a state machine in the other.

What choreography is

A saga implemented through events on a shared bus. Key characteristics:

  • No central controller. Each service knows only which events to listen for and which to emit.
  • Coordination happens through event subscriptions, not direct calls.
  • Compensations are additional handlers subscribed to failure events.
  • The whole flow is implicit. There is no single file that describes it; you read it by following rules and subscriptions.
  • Services can be added, removed, or evolved independently as long as the event contracts hold.
  • Runs on event-routing infrastructure (EventBridge in AWS; Kafka, NATS, or RabbitMQ elsewhere).
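
The characteristics above can be sketched without any AWS dependency. This is a minimal in-memory illustration, not the repo's code: InMemoryBus is a hypothetical stand-in for an event bus, and every event name except PaymentCharged is an assumption made for the example. Each handler knows only the event it listens for and the event it emits, and the compensation is just one more subscriber.

```typescript
// Hypothetical in-memory stand-in for an event bus; this is not EventBridge's API.
type SagaEvent = { type: string; orderId: string };
type SagaHandler = (event: SagaEvent, emit: (next: SagaEvent) => void) => void;

class InMemoryBus {
  private subscriptions = new Map<string, SagaHandler[]>();
  readonly log: string[] = [];

  // Plays the role of a rule: route events of one type to one handler.
  subscribe(type: string, handler: SagaHandler): void {
    const existing = this.subscriptions.get(type) ?? [];
    this.subscriptions.set(type, [...existing, handler]);
  }

  publish(event: SagaEvent): void {
    this.log.push(event.type);
    for (const handler of this.subscriptions.get(event.type) ?? []) {
      handler(event, (next) => this.publish(next));
    }
  }
}

function buildSaga(inStock: boolean): InMemoryBus {
  const bus = new InMemoryBus();
  bus.subscribe("OrderCreated", (e, emit) =>
    emit({ type: inStock ? "StockReserved" : "StockUnavailable", orderId: e.orderId }));
  bus.subscribe("StockReserved", (e, emit) =>
    emit({ type: "PaymentCharged", orderId: e.orderId }));
  bus.subscribe("PaymentCharged", (e, emit) =>
    emit({ type: "OrderShipped", orderId: e.orderId }));
  // Compensation is an additional subscriber attached to a failure event.
  bus.subscribe("StockUnavailable", (e, emit) =>
    emit({ type: "OrderCancelled", orderId: e.orderId }));
  return bus;
}

const happyBus = buildSaga(true);
happyBus.publish({ type: "OrderCreated", orderId: "o-1" });
console.log(happyBus.log.join(" -> "));
```

Note that nothing in buildSaga states the order of the flow; it only emerges from which handler emits which event, which is the "implicit flow" property described above.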

What orchestration is

A saga implemented through a central coordinator that invokes each service in sequence. Key characteristics:

  • A single controller (a state machine, a workflow definition) defines the flow.
  • Services are invoked by the controller and return values to it. They do not know about each other.
  • Compensation is declared in the controller, typically via Catch blocks and rollback branches.
  • The whole flow is explicit and lives in one place.
  • The controller knows about every service it invokes; adding a step means editing the controller.
  • Runs on workflow infrastructure (Step Functions in AWS; Temporal, Airflow, Camunda elsewhere).
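
For contrast, here is a minimal sketch of the same idea with a central controller. It is plain TypeScript rather than a Step Functions definition, and the SagaStep shape, runSaga, and all step names except notifyShipping are assumptions for illustration. The controller owns the step order and the rollback logic; the steps know nothing about each other.

```typescript
// Hypothetical step shape: an action plus an optional compensation.
interface SagaStep {
  name: string;
  run: () => void;          // throws on failure
  compensate?: () => void;  // undoes this step's effect
}

interface SagaResult {
  status: "COMPLETED" | "ROLLED_BACK";
  trace: string[];
}

function runSaga(steps: SagaStep[]): SagaResult {
  const trace: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      trace.push(step.name);
      completed.push(step);
    } catch {
      trace.push(`${step.name}:failed`);
      // Roll back finished steps in reverse order, loosely like a Catch
      // branch that routes to compensation states.
      for (const done of completed.reverse()) {
        if (done.compensate) {
          done.compensate();
          trace.push(`${done.name}:compensated`);
        }
      }
      return { status: "ROLLED_BACK", trace };
    }
  }
  return { status: "COMPLETED", trace };
}

const noop = (): void => {};
const cardDeclinedSteps: SagaStep[] = [
  { name: "createOrder", run: noop, compensate: noop },
  { name: "reserveStock", run: noop, compensate: noop },
  { name: "chargeCard", run: () => { throw new Error("card declined"); } },
  { name: "notifyShipping", run: noop },
];
const declinedResult = runSaga(cardDeclinedSteps);
console.log(declinedResult.status, declinedResult.trace.join(" -> "));
```

The trace array doubles as a crude execution history: because one function sees every step, the whole run can be reported from one place.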

Code size

From wc -l on the two trees:

Section            Choreography                      Orchestration
lib/stack.ts       87                                132
handlers/ (total)  191                               158
Lambdas            5                                 5
CDK resources      1 bus, 4 rules, 5 fns, 1 table    1 state machine, 5 fns, 1 table, 1 log group

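The totals are similar (278 lines for choreography, 290 for orchestration across lib/stack.ts and handlers/); what differs is where the code sits. Orchestration's state machine definition pulls lines into lib/stack.ts; choreography's event publishing pushes them into the handlers.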

Practical differences

  • Tracing. Step Functions returns an executionArn for the whole run; DescribeExecution gives input, output, and every transition. Choreography needs a correlation ID stitched through every event payload to get the same view.
  • Retrying. Step Functions has redrive on a failed execution. Choreography retries mean re-emitting the original event, which only works if every handler is idempotent.
  • Reading the flow. Orchestration’s flow is one block of CDK: createTask.next(reserveTask).next(chargeTask).next(shipTask) plus two addCatch calls. Choreography’s flow is the rule list, read by following each rule to its handler.
  • Adding a step. A new step is roughly equal effort either way. A new failure mode is addCatch in orchestration; a new rule plus compensating handler in choreography, with existing handlers untouched.
  • Service knowledge. Choreography requires knowing EventBridge rule patterns and being comfortable reading across several CloudWatch log groups. Orchestration requires knowing Step Functions resultPath, Standard vs Express workflows, and state machine IAM.
  • Cost. Step Functions Standard charges per state transition. EventBridge charges per million events published. At the volume of this repo, both are cents. At high volume the maths differs; this repo does not test it.
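
The tracing and retrying points for choreography can be made concrete with a small sketch. It assumes a hypothetical event envelope (the correlationId and eventId field names are illustrative, not taken from the repo) and keeps the "already processed" set in memory; a deployed handler would need durable storage for it.

```typescript
// Hypothetical envelope; field names are assumptions for this example.
interface Envelope {
  correlationId: string; // same value on every event in one saga run
  eventId: string;       // unique per event; a re-emitted event reuses it
  type: string;
}

class IdempotentChargeHandler {
  private seen = new Set<string>(); // in-memory only; a real handler needs a durable store
  sideEffects = 0;

  // Returns the follow-up event, or null when this event was already processed.
  handle(event: Envelope): Envelope | null {
    if (this.seen.has(event.eventId)) return null;
    this.seen.add(event.eventId);
    this.sideEffects += 1; // stand-in for charging the card
    return {
      correlationId: event.correlationId, // carried forward so logs can be joined
      eventId: `${event.eventId}/charged`,
      type: "PaymentCharged",
    };
  }
}

const chargeHandler = new IdempotentChargeHandler();
const reserved: Envelope = { correlationId: "saga-42", eventId: "evt-1", type: "StockReserved" };
const firstDelivery = chargeHandler.handle(reserved);
const replayedDelivery = chargeHandler.handle(reserved); // same event re-emitted as a retry
console.log(firstDelivery?.correlationId, replayedDelivery, chargeHandler.sideEffects);
```

Re-emitting the original event is only safe because the second delivery is a no-op; without the eventId check the card would be charged twice.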

When I would pick each one

Caveat: I have run this saga twice on LocalStack, not at production scale.

Orchestration when the flow has more than three or four steps, when compensation is non-trivial, when the team relies on execution history for debugging, or when one team owns all the handlers.

Choreography when multiple teams own different parts of the flow, when the events feed other consumers beyond this saga (analytics, fraud detection), or when event volume is high enough that per-transition pricing matters.
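
To see when per-transition pricing starts to matter, the arithmetic can be sketched as below. Every constant is an assumption for illustration: the transitions and events per saga are guesses for a four-step flow, and the unit prices are placeholders that should be replaced with current figures from the AWS pricing pages. Lambda invocations are left out because both variants run five functions.

```typescript
// Cost of one billing dimension: units consumed times price per million units.
function monthlyUsd(sagasPerMonth: number, unitsPerSaga: number, usdPerMillionUnits: number): number {
  return (sagasPerMonth * unitsPerSaga * usdPerMillionUnits) / 1_000_000;
}

// All constants below are assumptions for illustration, not measurements from the repo.
const TRANSITIONS_PER_SAGA = 5;         // assumed state transitions per execution
const EVENTS_PER_SAGA = 4;              // assumed events published per saga
const USD_PER_MILLION_TRANSITIONS = 25; // placeholder price; check current pricing
const USD_PER_MILLION_EVENTS = 1;       // placeholder price; check current pricing

const sagasPerMonth = 10_000_000;
const orchestrationUsd = monthlyUsd(sagasPerMonth, TRANSITIONS_PER_SAGA, USD_PER_MILLION_TRANSITIONS);
const choreographyUsd = monthlyUsd(sagasPerMonth, EVENTS_PER_SAGA, USD_PER_MILLION_EVENTS);
console.log({ orchestrationUsd, choreographyUsd });
```

Under these assumed inputs the gap is roughly 30x, but both figures scale linearly with volume, so the ratio depends only on the per-saga counts and unit prices plugged in.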

For this four-step, single-team, no-other-consumers pipeline, orchestration is the lighter choice. If a second team consumed PaymentCharged for their own pipeline, the calculus flips toward choreography.

What both share

Typed events make either pattern survivable. The zod schemas in both trees chain forward: each step’s output schema extends the previous one. By the time notifyShipping runs, the input carries every field the upstream steps added, and the schema enforces it.
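
The forward-chaining idea can be sketched without any dependency. This is not the repo's zod code: the interfaces, field names, and the missingKeys helper are all assumptions for illustration. With zod, the same chaining would be expressed by calling .extend() on the previous step's object schema.

```typescript
// Compile-time view: each step's output type extends the previous step's.
interface OrderCreated { orderId: string; sku: string }
interface StockReserved extends OrderCreated { reservationId: string }
interface PaymentCharged extends StockReserved { chargeId: string }

// Runtime view: each step's required keys build on the previous step's list.
const orderCreatedKeys = ["orderId", "sku"] as const;
const stockReservedKeys = [...orderCreatedKeys, "reservationId"] as const;
const paymentChargedKeys = [...stockReservedKeys, "chargeId"] as const;

// Hypothetical helper: lists required keys that are absent or not strings.
function missingKeys(input: Record<string, unknown>, required: readonly string[]): string[] {
  return required.filter((key) => typeof input[key] !== "string");
}

// Validates the input a shipping step would receive after payment.
function parsePaymentCharged(input: Record<string, unknown>): PaymentCharged {
  const missing = missingKeys(input, paymentChargedKeys);
  if (missing.length > 0) {
    throw new Error(`missing fields: ${missing.join(", ")}`);
  }
  return input as unknown as PaymentCharged;
}

const shippingInput = parsePaymentCharged({
  orderId: "o-1",
  sku: "sku-9",
  reservationId: "r-1",
  chargeId: "c-1",
});
console.log(shippingInput.chargeId);
```

Because the last schema includes every earlier field, a handler that skipped an upstream step's output fails validation at the boundary instead of somewhere inside the shipping logic.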