What strategies do you use for seeding test data with SQL, and how do you avoid test pollution?

Accepted Answer

Use deterministic IDs or known-prefix naming, run seeds inside a transaction you can roll back, and scope test data to each test with a unique identifier so parallel runs don't collide.

Core strategies:

- Unique-prefix IDs to isolate test data: each run gets a unique email that can be cleaned up with a single DELETE ... WHERE email LIKE 'test-run-%'. When parallel runs share a database, include the run's own identifier in the pattern so one run's cleanup does not delete another run's rows.
- Transaction-scoped seeds (fastest teardown): insert the seed rows inside a transaction and roll it back when the test finishes, so nothing persists.
- Fixture files + idempotent upserts: idempotent seeds can run before every test suite without failing on re-run.
- Schema isolation (separate test schema or DB): each CI run gets a fresh database spun up from a Docker container. The whole DB is the "transaction": throw it away after the job.

Test pollution sources to avoid:

- Shared static IDs that tests mutate in different ways
- Data created by failed tests that is never cleaned up
- Seeds that assume a specific row count in a table
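As a rough illustration of the first three strategies, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The users table, the example.com addresses, and every helper name (make_db, seed_reference_users, rolled_back, and so on) are assumptions made up for this example, not part of any particular test framework. The upsert syntax shown needs SQLite 3.24 or newer; other databases spell it differently.

```python
import sqlite3
import uuid
from contextlib import contextmanager


def make_db() -> sqlite3.Connection:
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN / ROLLBACK below are the only transaction control in play.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT UNIQUE NOT NULL,"
        " name TEXT NOT NULL)"
    )
    return conn


def seed_reference_users(conn: sqlite3.Connection) -> None:
    # Idempotent upsert: safe to run before every suite, even on re-run.
    conn.execute(
        "INSERT INTO users (id, email, name)"
        " VALUES (1, 'fixture-admin@example.com', 'Admin')"
        " ON CONFLICT(id) DO UPDATE SET"
        " email = excluded.email, name = excluded.name"
    )


def new_run_prefix() -> str:
    # Hypothetical naming scheme: 'test-run-' plus a short random run ID.
    return f"test-run-{uuid.uuid4().hex[:8]}-"


def seed_run_user(conn: sqlite3.Connection, prefix: str, name: str) -> str:
    email = f"{prefix}{name}@example.com"
    conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))
    return email


def cleanup_run(conn: sqlite3.Connection, prefix: str) -> int:
    # Deletes only this run's rows; other runs use a different prefix.
    cur = conn.execute("DELETE FROM users WHERE email LIKE ?", (prefix + "%",))
    return cur.rowcount


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    # Transaction-scoped seed: everything inside the block is undone on exit,
    # whether the test body passes or raises.
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def demo() -> tuple:
    conn = make_db()
    seed_reference_users(conn)
    seed_reference_users(conn)  # second run must not fail or duplicate
    baseline = count_users(conn)

    with rolled_back(conn):
        seed_run_user(conn, new_run_prefix(), "alice")
        inside = count_users(conn)
    after_rollback = count_users(conn)

    prefix = new_run_prefix()
    seed_run_user(conn, prefix, "bob")
    seed_run_user(conn, prefix, "carol")
    deleted = cleanup_run(conn, prefix)
    return baseline, inside, after_rollback, deleted, count_users(conn)
```

Rollback is the cheaper teardown when the code under test can share the test's connection. Prefix-based cleanup is the fallback when the code under test commits on its own connection, because those rows survive a rollback issued elsewhere.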

// WHAT INTERVIEWERS LOOK FOR