Q20 of 26 · SQL
What strategies do you use for seeding test data with SQL, and how do you avoid test pollution?
SQLSeniorsqltest-dataseedingtest-isolationfixtures
Short answer
Short answer: Use deterministic IDs or known-prefix naming, run seeds inside a transaction you can roll back, and scope test data to each test with a unique identifier so parallel runs don't collide.
Detail
Core strategies:
1. Unique-prefix IDs to isolate test data:
INSERT INTO users (email, name) VALUES
('test-run-' || :run_id || '@example.com', 'Test User');
Each run gets a unique email that can be cleaned up with a single DELETE WHERE email LIKE 'test-run-%'.
2. Transaction-scoped seeds (fastest teardown):
BEGIN;
INSERT INTO customers VALUES (99999, 'Test Customer');
INSERT INTO orders VALUES (88888, 99999, 'pending', 99.99);
-- ...test runs...
ROLLBACK; -- both rows vanish instantly
3. Fixture files + idempotent upserts:
INSERT INTO products (id, name, price)
VALUES (1001, 'Test Widget', 19.99)
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, price = EXCLUDED.price;
Idempotent seeds can run before every test suite without failing on re-run.
4. Schema isolation — separate test schema or DB: Each CI run gets a fresh database spun from a Docker container. The whole DB is the "transaction" — throw it away after the job.
Test pollution sources to avoid:
- Shared static IDs that tests mutate in different ways
- Not cleaning up data created by failed tests
- Seeds that assume a specific row count in a table
// WHAT INTERVIEWERS LOOK FOR
At least two strategies with trade-offs. Transaction rollback for speed. Unique-prefix or separate-DB for isolation. Idempotent seeds for stability.