Structuring Terraform for Scale: Monorepo vs. Polyrepo
Key takeaways
- Monorepo centralizes all Terraform in one repository with shared modules, enabling atomic changes and consistent tooling, but requires discipline around blast radius and CI/CD complexity
- Polyrepo splits infrastructure by service or team with isolated state, providing clear ownership and independent deployment velocity, but creates module duplication and version drift
- Team size is a primary decision factor: as a rough guide, a monorepo works well for 2-15 engineers with strong collaboration, while polyrepo scales to 50+ engineers organized into autonomous teams
- Hybrid approaches (a shared-modules repository consumed by team repositories, or a monorepo with per-team directories and pipelines) can combine the strengths of both for mid-sized organizations
- Migration between strategies is possible but requires careful planning around state files, CI/CD pipelines, and team coordination
The Terraform Organization Problem
You started with one main.tf file. It worked great. Then you added staging. Then production. Then VPC, RDS, ECS, Lambda, CloudFront... Now you have:
- 47 .tf files in one directory
- 12 developers committing to the same repo
- CI/CD taking 23 minutes to plan all resources
- A production incident caused by a staging change
- No clear ownership of infrastructure components
The question: Should you keep everything in one repository (monorepo) or split into multiple repositories (polyrepo)?
The real answer: It depends on your team size, deployment model, and organizational structure.
Monorepo: Everything in One Place
Definition: All Terraform code lives in a single repository with shared modules and centralized tooling.
Typical Structure
terraform-infrastructure/
├── .github/
│ └── workflows/
│ ├── plan.yml
│ └── apply.yml
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── backend.tf
│ │ ├── variables.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ ├── backend.tf
│ │ ├── variables.tf
│ │ └── terraform.tfvars
│ └── production/
│ ├── main.tf
│ ├── backend.tf
│ ├── variables.tf
│ └── terraform.tfvars
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── ecs-service/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── README.md
│ ├── rds/
│ └── lambda/
├── scripts/
│ ├── validate.sh
│ ├── plan-all.sh
│ └── apply.sh
└── README.md
Advantages
1. Atomic Changes Across Environments
# modules/ecs-service/main.tf
resource "aws_ecs_task_definition" "app" {
family = var.service_name
container_definitions = jsonencode([{
name = var.service_name
image = var.docker_image
cpu = var.cpu
memory = var.memory
# Bug fix: Add health check (affects all environments simultaneously)
healthCheck = {
command = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 60
}
}])
}
When you fix a bug in the module, every environment that uses it gets the fix from one commit, the next time that environment is planned and applied. There is no need to propagate the change across 5 repositories.
2. Shared Modules with Single Source of Truth
# environments/production/main.tf
module "api_service" {
source = "../../modules/ecs-service"
service_name = "api"
docker_image = "api:v1.2.3"
cpu = 1024
memory = 2048
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
}
module "worker_service" {
source = "../../modules/ecs-service" # Same module, guaranteed consistent
service_name = "worker"
docker_image = "worker:v1.0.5"
cpu = 512
memory = 1024
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
}
3. Centralized Tooling and Standards
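# .github/workflows/plan.yml
name: Terraform Plan
on:
  pull_request:
    paths:
      - 'environments/**'
      - 'modules/**'
jobs:
  changed-environments:
    runs-on: ubuntu-latest
    outputs:
      environments: ${{ steps.detect.outputs.environments }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Detect changed environments
        id: detect
        run: |
          CHANGED_FILES=$(git diff --name-only "origin/${{ github.base_ref }}...HEAD")
          if echo "$CHANGED_FILES" | grep -q '^modules/'; then
            # A shared module changed: plan every environment
            ENVS=$(ls -d environments/*/ | cut -d'/' -f2)
          else
            # Otherwise plan only the environments whose files changed
            ENVS=$(echo "$CHANGED_FILES" | grep '^environments/' | cut -d'/' -f2 || true)
          fi
          echo "environments=$(echo "$ENVS" | sort -u | jq -R -s -c 'split("\n") | map(select(length > 0))')" >> "$GITHUB_OUTPUT"
  plan:
    needs: changed-environments
    if: needs.changed-environments.outputs.environments != '[]'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write # Needed to comment on the PR
    strategy:
      matrix:
        environment: ${{ fromJson(needs.changed-environments.outputs.environments) }}
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
          terraform_wrapper: false # Keep redirected output clean
      # Cloud credentials (e.g. via OIDC) are omitted here
      - name: Terraform Init
        working-directory: environments/${{ matrix.environment }}
        run: terraform init
      - name: Terraform Plan
        working-directory: environments/${{ matrix.environment }}
        run: |
          terraform plan -no-color -out=tfplan
          terraform show -no-color tfplan > plan.txt
      - name: Post plan to PR
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('environments/${{ matrix.environment }}/plan.txt', 'utf8');
            // GitHub caps comments at 65,536 characters, so truncate long plans
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '#### Plan: ${{ matrix.environment }}\n~~~\n' + plan.slice(0, 60000) + '\n~~~',
            });
4. Easy Refactoring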
Renaming a module? One commit affects all usages:
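git mv modules/ecs-service modules/ecs-fargate-service
find environments -type f -name "*.tf" -exec sed -i 's|modules/ecs-service"|modules/ecs-fargate-service"|g' {} +  # GNU sed; macOS needs sed -i ''
git add environments
git commit -m "Rename ecs-service module to ecs-fargate-service"
Module call labels such as module "api_service" are unchanged, so resource addresses in state stay the same; run terraform init in each environment to pick up the new source path.
Disadvantages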
1. Blast Radius Risk
One bad merge to main can reach any environment, production included:
# Someone accidentally commits this to environments/production/main.tf
resource "aws_security_group_rule" "allow_all" {
  type              = "ingress"
  from_port         = 0
  to_port           = 65535
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # Oops: every TCP port open to the internet
  security_group_id = module.vpc.default_security_group_id
}
Mitigation:
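# .github/workflows/apply.yml (step excerpt)
# GitHub Environments with required reviewers are a built-in alternative
- name: Require manual approval for production
  if: matrix.environment == 'production'
  uses: trstringer/manual-approval@v1
  with:
    secret: ${{ github.token }}
    # Team approvers need a token that can read organization team membership
    approvers: platform-team
    minimum-approvals: 2
2. Long CI/CD Times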
As the monorepo grows, planning every environment takes longer:
Initial:          ~2 minutes to plan dev, staging, production
6 months later:   ~15 minutes to plan dev, staging, prod, dr, sandbox-1, sandbox-2, ...
12 months later:  ~45 minutes to plan 10 environments × 20 modules each
Mitigation: Selective planning based on changed files (shown in the workflow above)
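Changed-file detection has one subtlety: an edit under modules/ changes every environment that calls that module, even though no file under environments/ changed. The sketch below (Python, standard library only) shows one way to handle it, assuming the layout above and local module sources written as "../../modules/<name>". The function names are hypothetical, and modules that call other modules are not followed.

```python
import re
from pathlib import PurePosixPath

# Matches local module calls such as: source = "../../modules/ecs-service"
LOCAL_MODULE_SOURCE = re.compile(r'source\s*=\s*"(?:\.\./)+modules/([^"/]+)"')


def modules_used(tf_files: dict[str, str]) -> dict[str, set[str]]:
    """Map each environment directory to the local modules its .tf files call."""
    usage: dict[str, set[str]] = {}
    for path, text in tf_files.items():
        parts = PurePosixPath(path).parts
        if len(parts) >= 3 and parts[0] == "environments" and path.endswith(".tf"):
            usage.setdefault(parts[1], set()).update(LOCAL_MODULE_SOURCE.findall(text))
    return usage


def environments_to_plan(changed: list[str], usage: dict[str, set[str]]) -> list[str]:
    """Return the sorted environments affected by a list of changed file paths."""
    affected: set[str] = set()
    for path in changed:
        parts = PurePosixPath(path).parts
        if len(parts) < 3:
            continue  # top-level files such as README.md affect no environment
        if parts[0] == "environments":
            affected.add(parts[1])  # the environment's own files changed
        elif parts[0] == "modules":
            # A shared module changed: plan every environment that calls it
            affected.update(env for env, mods in usage.items() if parts[1] in mods)
    return sorted(affected)


# Example inputs; in CI these would come from reading environments/*/*.tf
# and from git diff --name-only
tf_files = {
    "environments/dev/main.tf": 'module "api" { source = "../../modules/ecs-service" }',
    "environments/production/main.tf": 'module "vpc" { source = "../../modules/vpc" }',
}
usage = modules_used(tf_files)
print(environments_to_plan(["modules/ecs-service/main.tf"], usage))  # ['dev']
```

The result could be serialized with json.dumps and written to $GITHUB_OUTPUT to drive the plan matrix. The trade-off versus "any module change plans everything" is a slightly more complex script in exchange for fewer, faster plans.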
3. Merge Conflicts
10 engineers changing infrastructure simultaneously:
Auto-merging environments/production/main.tf
CONFLICT (content): Merge conflict in environments/production/main.tf
Automatic merge failed; fix conflicts and then commit the result.
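Mitigation: Smaller, focused PRs, plus feature flags (for example, a boolean variable that gates a new resource through count) so work in progress can merge early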
4. Difficulty Enforcing Team Boundaries
Team A owns VPC, Team B owns ECS. But both can edit each other's code:
modules/
├── vpc/ # Team A
└── ecs/ # Team B (but Team A can still modify this)
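Mitigation: A CODEOWNERS file, enforced by a branch protection rule that requires review from Code Owners
# .github/CODEOWNERS
# Teams are written as @org/team-name. When several owners are listed,
# an approval from any one of them satisfies the requirement.
/modules/vpc/** @company/team-networking
/modules/ecs/** @company/team-platform
/environments/production/** @company/team-platform @company/team-security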
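CODEOWNERS is read top to bottom and the last matching pattern wins, so order matters when patterns overlap. The following is a simplified Python model of that lookup, handy for sanity-checking ownership before relying on it. It is a sketch under stated assumptions: it only handles root-anchored patterns with * and ** (GitHub follows most gitignore rules, which this does not fully implement), and the file content and @company/... team names are hypothetical.

```python
import re

# Hypothetical CODEOWNERS content; /modules/** comes first so the more
# specific /modules/vpc/** line overrides it for VPC files
CODEOWNERS = """\
# Shared modules
/modules/** @company/team-platform
/modules/vpc/** @company/team-networking
/environments/production/** @company/team-platform @company/team-security
"""


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a root-anchored pattern: ** spans directories, * stays within one."""
    body = pattern.lstrip("/")
    out, i = [], 0
    while i < len(body):
        if body.startswith("**", i):
            out.append(".*")
            i += 2
        elif body[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(body[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def owners_for(path: str, codeowners: str = CODEOWNERS) -> list[str]:
    """Return the owners of a repo-relative path; the last matching line wins."""
    owners: list[str] = []
    for line in codeowners.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *line_owners = line.split()
        if pattern_to_regex(pattern).match(path):
            owners = line_owners  # later matches replace earlier ones
    return owners


print(owners_for("modules/vpc/main.tf"))  # ['@company/team-networking']
print(owners_for("modules/rds/main.tf"))  # ['@company/team-platform']
```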
Polyrepo: Separated by Ownership
Definition: Infrastructure split across multiple repositories, typically by service, team, or functional area.
Typical Structure
# Repository: terraform-networking
terraform-networking/
├── modules/
│ ├── vpc/
│ └── transit-gateway/
├── dev/
│ ├── main.tf
│ └── backend.tf
├── staging/
└── production/
# Repository: terraform-ecs-api
terraform-ecs-api/
├── modules/
│ └── ecs-service/
├── dev/
├── staging/
└── production/
# Repository: terraform-ecs-worker
terraform-ecs-worker/
├── modules/
│ └── ecs-service/ # Duplicate of terraform-ecs-api module!
├── dev/
├── staging/
└── production/
# Repository: terraform-rds
terraform-rds/
├── modules/
│ └── postgres/
├── dev/
├── staging/
└── production/
Advantages
1. Clear Ownership and Autonomy
Each team owns their repository completely:
# terraform-ecs-api/.github/workflows/deploy.yml
name: Deploy API Service
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      # AWS credentials (e.g. via OIDC) are omitted here
      - name: Deploy to production
        working-directory: production
        run: |
          terraform init
          terraform apply -auto-approve
# No cross-team dependencies, no waiting for other teams' reviews
2. Independent Deployment Velocity
Team A can deploy 20 times/day without affecting Team B:
Team A (API): 20 deployments/day
Team B (Worker): 3 deployments/day
Team C (Networking): 1 deployment/week
# No coordination required, no merge conflicts
3. Blast Radius Isolation
A breaking change in one repo doesn't affect the others:
# terraform-ecs-worker/production/main.tf
# This mistake only affects the worker, not the API or RDS
resource "aws_ecs_service" "worker" {
  # ... name, cluster, and task_definition omitted ...
  desired_count = 0 # Accidentally scaled to zero
}
# The API service, managed in terraform-ecs-api, keeps running normally
4. Easier Access Control
GitHub repository permissions per team:
terraform-networking: @team-networking (admin)
terraform-ecs-api: @team-backend (admin)
terraform-ecs-worker: @team-backend (admin)
terraform-rds: @team-database (admin), @team-backend (read)
Disadvantages
1. Module Duplication
Same ECS module copied across multiple repositories, each drifting on its own:
terraform-ecs-api/modules/ecs-service/ (version 1.2.3)
terraform-ecs-worker/modules/ecs-service/ (version 1.2.3)
terraform-ecs-cron/modules/ecs-service/ (version 1.1.0) # Drift!
terraform-ecs-admin/modules/ecs-service/ (version 1.2.5) # Different version
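Before consolidating the copies, it helps to see how far they have drifted. A minimal Python sketch, assuming each copy's version has already been collected into a dict (for example from a hypothetical VERSION file in each module directory); the values mirror the listing above.

```python
from collections import defaultdict

# Version of each copied ecs-service module, as listed above
copies = {
    "terraform-ecs-api": "1.2.3",
    "terraform-ecs-worker": "1.2.3",
    "terraform-ecs-cron": "1.1.0",
    "terraform-ecs-admin": "1.2.5",
}


def parse(version: str) -> tuple[int, ...]:
    """'1.2.3' -> (1, 2, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def drift_report(copies: dict[str, str]) -> dict[str, list[str]]:
    """Group repositories by module version, newest version first."""
    by_version: dict[str, list[str]] = defaultdict(list)
    for repo, version in copies.items():
        by_version[version].append(repo)
    return {v: sorted(by_version[v]) for v in sorted(by_version, key=parse, reverse=True)}


def behind_latest(copies: dict[str, str]) -> list[str]:
    """Repositories whose copy is older than the newest copy anywhere."""
    latest = max(copies.values(), key=parse)
    return sorted(repo for repo, v in copies.items() if parse(v) < parse(latest))


for version, repos in drift_report(copies).items():
    print(version, repos)
```

Comparing parsed tuples rather than raw strings matters once a component reaches two digits ("1.10.0" sorts before "1.9.0" as a string).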
Solution: Publish modules to a private Terraform registry
# terraform-ecs-api/production/main.tf
module "api_service" {
  source       = "app.terraform.io/company/ecs-service/aws"
  version      = "1.2.3" # Centralized versioning
  service_name = "api"
  docker_image = "api:v1.0.0"
}
2. Cross-Repository Dependencies
API service needs VPC ID from networking repo:
# terraform-ecs-api/production/main.tf
# How do we get the VPC ID from terraform-networking?
# Option 1: Hardcode (brittle)
variable "vpc_id" {
default = "vpc-abc123"
}
# Option 2: Data source (better)
data "aws_vpc" "main" {
tags = {
Name = "production-vpc"
}
}
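# Option 3: Remote state (most explicit, but see the problem below)
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "terraform-state"
    key    = "networking/production/terraform.tfstate"
    region = "us-east-1"
  }
}
module "api_service" {
  source = "..."
  vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
}
Problem: Tight coupling between repositories via remote state. The consumer needs read access to the networking repository's entire state snapshot, which can contain sensitive values, and every output name becomes a contract the networking team cannot rename without breaking consumers.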
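Remote state dependencies also fix an apply order: a repository can only read outputs that its producer has already applied. Given a hypothetical map of which repositories read whose state, Python's standard-library graphlib (3.9+) can group repositories into apply waves and reject cycles. The repository names and edges below are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: repository -> repositories whose remote state it reads
reads_state_from = {
    "terraform-networking": set(),
    "terraform-rds": {"terraform-networking"},
    "terraform-ecs-api": {"terraform-networking", "terraform-rds"},
    "terraform-ecs-worker": {"terraform-networking", "terraform-rds"},
}


def apply_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group repositories into waves; each wave reads state only from earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if repositories read each other's state
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(apply_waves(reads_state_from))
# [['terraform-networking'], ['terraform-rds'], ['terraform-ecs-api', 'terraform-ecs-worker']]

try:
    apply_waves({"terraform-a": {"terraform-b"}, "terraform-b": {"terraform-a"}})
except CycleError:
    print("cycle detected: these repositories cannot be applied in a safe order")
```

Repositories within one wave can be applied in parallel. A CycleError usually signals a service boundary worth redrawing.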
3. Inconsistent Tooling
Each repo can drift in standards:
terraform-networking: Terraform 1.6.0, using tflint
terraform-ecs-api: Terraform 1.5.5, using tfsec
terraform-rds: Terraform 1.4.0, no linting
Solution: Shared CI/CD templates, such as reusable GitHub Actions workflows (triggered with workflow_call) kept in one central repository
4. Difficult Cross-Cutting Changes
Upgrading the AWS provider across 15 repositories:
# Must be changed in 15 separate PRs
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0" # Allows 4.x only; moving to 5.0 means editing this line
    }
  }
}
Decision Framework
Choose Monorepo If:
- Team size: 2-15 engineers
- Strong collaboration culture (daily standups, shared ownership)
- Infrastructure changes affect multiple services simultaneously
- You value consistency over autonomy
- Deployment frequency: 5-20 deploys/week across all services
- Your infrastructure is tightly coupled (shared VPC, shared RDS)
Example: Startup with 5 engineers managing API + worker + database
Choose Polyrepo If:
- Team size: 15+ engineers across multiple teams
- Teams operate autonomously (microservices, separate on-call)
- Each service has independent infrastructure lifecycle
- You value autonomy over consistency
- Deployment frequency: 50+ deploys/day across all teams
- Your infrastructure is loosely coupled (service mesh, separate VPCs)
Example: Scale-up with 40 engineers, 8 teams, 30 microservices
Hybrid Approach: The Middle Ground
Pattern 1: Shared Modules Monorepo + Workspaces Polyrepo
# Repository: terraform-modules (monorepo)
terraform-modules/
├── vpc/
├── ecs-service/
├── rds/
└── lambda/
# Repository: team-backend-infrastructure (polyrepo)
team-backend-infrastructure/
├── workspaces/
│ ├── api/
│ │ ├── dev/
│ │ ├── staging/
│ │ └── production/
│ └── worker/
│ ├── dev/
│ ├── staging/
│ └── production/
└── README.md
# workspaces/api/production/main.tf
# "Workspaces" here are plain directories, each a separate root module with its own
# state, not Terraform CLI workspaces. A module source must be a directory, Git URL,
# or registry address (never a .tf file), so each root module pins the shared
# modules repository directly.
module "api" {
  source       = "git::https://github.com/company/terraform-modules.git//ecs-service?ref=v1.2.3"
  service_name = "api"
  environment  = "production"
}
Pattern 2: Monorepo with Workspace Isolation
terraform-infrastructure/
├── shared-modules/
│ ├── vpc/
│ └── ecs/
├── teams/
│ ├── backend/
│ │ ├── api/
│ │ │ ├── dev/
│ │ │ ├── staging/
│ │ │ └── production/
│ │ └── worker/
│ ├── frontend/
│ │ └── cdn/
│ └── data/
│ └── pipeline/
└── .github/
└── workflows/
├── team-backend.yml
├── team-frontend.yml
└── team-data.yml
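# .github/workflows/team-backend.yml
name: Backend Team Infrastructure
on:
  push:
    branches: [main]
    paths:
      - 'teams/backend/**'
      - 'shared-modules/**'
  pull_request:
    paths:
      - 'teams/backend/**'
      - 'shared-modules/**'
# jobs: plan/apply steps scoped to teams/backend/** (omitted)
# Backend changes trigger only this pipeline; a shared-modules change triggers
# every team pipeline that lists shared-modules/** in its paths
Real-World Case Study: Migration from Monorepo to Polyrepo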
Company: B2B SaaS, $10M ARR
Team growth: 8 → 35 engineers over 18 months
Problem: Monorepo CI/CD taking 40 minutes, 10+ merge conflicts/day
Migration Strategy
Phase 1: Identify Service Boundaries (Week 1-2)
# Current monorepo
terraform-infrastructure/
├── networking/ → Extract to terraform-networking
├── ecs-api/ → Extract to team-backend/terraform-api
├── ecs-worker/ → Extract to team-backend/terraform-worker
├── rds/ → Extract to team-database/terraform-rds
└── cloudfront/ → Extract to team-frontend/terraform-cdn
Phase 2: Extract Shared Modules (Week 3-4)
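# Create the terraform-modules repository from a fresh clone of the monorepo
git clone terraform-infrastructure terraform-modules
cd terraform-modules
# Keep only modules/ (and its history), promoted to the repo root.
# git filter-branch's own documentation recommends git filter-repo
# (git filter-repo --subdirectory-filter modules) as a faster, safer alternative.
git filter-branch --subdirectory-filter modules -- --all
# The clone still points at the monorepo, so retarget origin before pushing
git remote set-url origin https://github.com/company/terraform-modules.git
git push origin main
# Tag the first release
git tag v1.0.0
git push origin v1.0.0
Phase 3: Create Service Repositories (Week 5-8)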
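# For each service (networking shown)
git clone terraform-infrastructure terraform-networking
cd terraform-networking
# Keep only the networking/ directory and its history, promoted to the repo root
git filter-branch --subdirectory-filter networking -- --all
# Point local module sources at the modules repo, pinned to the first release
find . -type f -name "*.tf" -exec sed -i -E 's|"\.\./\.\./modules/([^"]+)"|"git::https://github.com/company/terraform-modules.git//\1?ref=v1.0.0"|g' {} +
# Expose the outputs that other repositories will read via remote state
cat >> production/outputs.tf <<'EOF'
output "vpc_id" {
  value = aws_vpc.main.id
}
output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
EOF
# Leave backend.tf unchanged so the repo keeps its existing state; before the
# first apply, run terraform init and confirm terraform plan reports no changes
git add .
git commit -m "Extract networking to separate repository"
git remote set-url origin https://github.com/company/terraform-networking.git
git push origin main
Phase 4: Update Remote State References (Week 9-10)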
# terraform-ecs-api/production/main.tf
data "terraform_remote_state" "networking" {
backend = "s3"
config = {
bucket = "company-terraform-state"
key = "networking/production/terraform.tfstate"
region = "us-east-1"
}
}
module "api_service" {
source = "git::https://github.com/company/terraform-modules.git//ecs-service?ref=v1.0.0"
vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
Phase 5: Update CI/CD (Week 11-12)
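# terraform-networking/.github/workflows/deploy.yml
name: Deploy Networking
on:
  push:
    branches: [main]
    paths:
      - 'dev/**'
      - 'staging/**'
      - 'production/**'
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Binds each run to a GitHub environment so protection rules
    # (e.g. required reviewers) can gate production
    environment: ${{ matrix.environment }}
    strategy:
      # One environment at a time; with fail-fast (the default), a failure
      # cancels the environments still queued
      max-parallel: 1
      matrix:
        environment: [dev, staging, production]
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
      # AWS credentials (e.g. via OIDC) are omitted here
      - name: Terraform Apply
        working-directory: ${{ matrix.environment }}
        run: |
          terraform init
          terraform apply -auto-approve
Results After Migration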
Before (Monorepo):
- CI/CD time: 40 minutes
- Merge conflicts: 10+ per day
- Deploy frequency: 15 deploys/week
- Team autonomy: Low (cross-team reviews required)
After (Polyrepo):
- CI/CD time: 8 minutes per service
- Merge conflicts: 1-2 per week
- Deploy frequency: 80 deploys/week
- Team autonomy: High (teams self-service)
Trade-offs:
- Module management: Now requires versioning discipline
- Cross-service changes: Require coordination (but rare)
- Tooling consistency: Requires shared CI/CD templates
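Much of that versioning discipline comes down to the pessimistic constraint operator ~>, which lets only the right-most listed version component increase: ~> 4.0 accepts any 4.x but never 5.0, ~> 1.2 accepts any 1.x at or above 1.2, and ~> 1.2.3 accepts 1.2.x at or above 1.2.3 but not 1.3.0. A small Python model of that rule, as a sketch that ignores pre-release versions and Terraform's other operators:

```python
def parse(version: str) -> tuple[int, ...]:
    """'1.2.3' -> (1, 2, 3); pre-release suffixes are not handled in this sketch."""
    return tuple(int(part) for part in version.split("."))


def pad(parts: tuple[int, ...], width: int = 3) -> tuple[int, ...]:
    """Right-pad with zeros so (1, 2) compares like (1, 2, 0)."""
    return parts + (0,) * (width - len(parts))


def satisfies(version: str, constraint: str) -> bool:
    """Check a version against a '~> X.Y' or '~> X.Y.Z' constraint."""
    base = parse(constraint.removeprefix("~>").strip())
    if len(base) < 2:
        raise ValueError("this sketch expects at least MAJOR.MINOR after ~>")
    # Only the right-most component may grow, so the exclusive upper bound
    # bumps the component just before it: 1.2 -> 2, 1.2.3 -> 1.3
    upper = base[:-2] + (base[-2] + 1,)
    return pad(base) <= pad(parse(version)) < pad(upper)


for constraint, version in [("~> 4.0", "5.0.0"), ("~> 1.2", "1.9.0"), ("~> 1.2.3", "1.3.0")]:
    print(constraint, version, satisfies(version, constraint))
```

The practical consequence: a two-component constraint like ~> 1.2 picks up new minor releases automatically, while a three-component one like ~> 1.2.3 only picks up patches.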
Best Practices for Both Approaches
Monorepo Best Practices
1. Use Terragrunt for DRY Configuration
# terragrunt.hcl (root)
remote_state {
  backend = "s3"
  # Writes a backend.tf into each module, so modules need no backend block of their own
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket = "terraform-state-${get_aws_account_id()}"
    key    = "${path_relative_to_include()}/terraform.tfstate" # e.g. environments/production/terraform.tfstate
    region = "us-east-1"
  }
}
# environments/production/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "../../modules//ecs-service"
}
inputs = {
  service_name = "api"
  environment  = "production"
  cpu          = 1024
  memory       = 2048
}
2. Environment-Specific Workspaces
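# Use CLI workspaces for environment separation within the monorepo.
# All workspaces share one backend configuration and one set of credentials,
# so isolation is weaker than with a separate directory per environment.
cd environments/shared
terraform init
terraform workspace new dev
terraform workspace new staging
terraform workspace new production
terraform workspace select production
terraform apply -var-file="production.tfvars"
3. Pre-commit Hooks for Validation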
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.83.5
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_tfsec
Polyrepo Best Practices
1. Private Terraform Registry
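# Publishing happens in HCP Terraform (formerly Terraform Cloud) or Terraform
# Enterprise, not in a terraform {} block: add a VCS repository named
# terraform-<PROVIDER>-<NAME> (here, terraform-aws-ecs-service) to the private
# registry, then push semantic version tags (v1.2.0, v1.2.1, ...) to release.
# A terraform { cloud { ... } } block only connects a root module to a workspace.

# Consume in service repos
module "ecs_service" {
  source       = "app.terraform.io/company/ecs-service/aws"
  version      = "~> 1.2" # Any 1.x release at or above 1.2
  service_name = "api"
}
2. Standardized Repository Template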
# Create template repository: terraform-service-template
terraform-service-template/
├── .github/
│ └── workflows/
│ ├── plan.yml
│ └── apply.yml
├── modules/
├── dev/
├── staging/
├── production/
├── .pre-commit-config.yaml
├── .tflint.hcl
└── README.md
# Use as a template for new services (mark the repository as a template in its settings first)
gh repo create company/terraform-new-service --private --template company/terraform-service-template
3. Automated Dependency Updates
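# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "terraform"
    # Dependabot only scans the listed directories, so include each environment root
    directories:
      - "/dev"
      - "/staging"
      - "/production"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 5
    reviewers:
      - "company/platform-team"
Conclusion: No One-Size-Fits-All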
The monorepo vs. polyrepo decision isn't binary—it's a spectrum:
Small team (2-10 engineers): Start with monorepo
- Simple to manage
- Easy atomic changes
- Low coordination overhead
Growing team (10-25 engineers): Consider hybrid
- Shared modules in separate repo
- Services in monorepo with workspaces
- Team-specific CI/CD paths
Large organization (25+ engineers): Move to polyrepo
- Service ownership per team
- Independent deployment velocity
- Scale autonomy
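The key: Match your repository structure to your team structure. Conway's Law applies to infrastructure code too. In Melvin Conway's original formulation:
"Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure."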
If your teams are tightly coupled, monorepo works. If your teams are autonomous, polyrepo scales better.
Action Items
- Assess your current pain points: Long CI/CD? Merge conflicts? Unclear ownership?
- Map your team structure: How many teams? How do they collaborate?
- Measure deployment frequency: Deploys per week per team
- Identify service boundaries: Which infrastructure components are independent?
- Start small: Migrate one service to polyrepo (or consolidate two repos into monorepo)
- Iterate based on team feedback: Survey developers on autonomy vs. consistency
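Deployment frequency is easiest to compare before and after a change when it is counted per team per ISO week. A minimal Python sketch, assuming a hypothetical deploy log of (team, date) pairs exported from CI:

```python
from collections import Counter
from datetime import date

# Hypothetical deploy log exported from CI: (team, date of a successful apply)
deploys = [
    ("backend", date(2024, 3, 4)),
    ("backend", date(2024, 3, 5)),
    ("backend", date(2024, 3, 12)),
    ("networking", date(2024, 3, 6)),
]


def deploys_per_week(log: list[tuple[str, date]]) -> dict[tuple[str, int, int], int]:
    """Count deploys per (team, ISO year, ISO week number)."""
    counts: Counter = Counter()
    for team, day in log:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(team, iso_year, iso_week)] += 1
    return dict(counts)


print(deploys_per_week(deploys))
```

Keying on the ISO year rather than the calendar year keeps late-December and early-January deploys in the correct week.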