As your SaaS grows, your database schema must evolve. Adding columns, renaming fields, and dropping old indexes are common tasks.
However, doing this in production is risky. A poorly planned migration can lock tables, causing database timeouts that take down your entire application.
Here are the best practices for executing database migrations with zero downtime.
1. Avoid Destructive Operations in One Go
Never rename or delete a column directly in a single deployment. If your app is running on multiple servers, the old code will expect the old column and crash immediately during the rollout.
Instead, use the Expand and Contract Pattern:
- Expand (Deploy Phase 1): Add the new column. Update the code to read from the old column, but write updates to both the old and new columns.
- Backfill (Phase 2): Run a background script to copy existing data from the old column to the new column.
- Transition (Deploy Phase 3): Update the code to read only from the new column.
- Contract (Deploy Phase 4): Drop the old column from the database.
2. Always Set Timeout Limits
By default, database systems like PostgreSQL or MySQL will wait indefinitely to acquire a lock for structural changes (like ALTER TABLE). If a long-running analytical query holds a lock on that table, the migration will block, and all subsequent web requests will pile up, leading to a system-wide outage.
Always set a statement timeout before running migrations:
SET statement_timeout = 3000; -- Timeout after 3 seconds
ALTER TABLE users ADD COLUMN phone_number VARCHAR(20);
3. Avoid Locks on Add Index
Adding an index on a large table can block writes for minutes. Always use the concurrent index feature supported by your database:
- In PostgreSQL:
CREATE INDEX CONCURRENTLY - In MySQL:
ALGORITHM=INPLACE, LOCK=NONE
By building indexes in the background, you keep your write traffic flowing smoothly.
