Seedkit: realistic Postgres data without copying production

Why I built seedkit — a CLI that reads your Postgres schema fresh each run, generates realistic FK-correct data, and hands you a connection string. Without leaking prod, without breaking on schema drift.

April 28, 2026 · 3 min read · Product updates

I'm a tech lead. Most of my job is keeping the developers around me unblocked, and you'd be surprised how often the thing blocking them is test data.

Someone joins the team. Their local DB is empty. They lose half a day either cobbling together fixtures or asking around in Slack for a "good" pg_dump. Two teammates send them two different ones. Now there are two slightly different shapes of reality being debugged in parallel.

The fixes are usually one of two things. Both bad.

You can maintain a seed.sql (or a forest of factory functions in your test framework). It works, until somebody adds a column on a Friday afternoon, and then on Monday the whole team rebases and the seed script breaks for everyone. The dev who broke it spends an hour of their morning on it. The one who pulled main an hour later spends another hour. Multiply by every migration that lands.

Or — and this is the worse one — you sanitize a chunk of prod and pass it around. It feels smart. Real data! But you're one missed PII column away from a GDPR fine, the snapshot rots within a sprint anyway because the schema keeps moving, and now the most sensitive thing in your company is sitting on three different laptops.

What I actually wanted was a tool that read the current schema (whatever it looks like today, post-Friday-migration) and produced realistic, foreign-key-correct data anyone on the team could pull down with one command. Same data, same shape, no pg_dump, no YAML, no Friday-night seed-script bisect.

That's seedkit.

What it looks like

$ npx seedkit-cli new --prompt "saas crm"
✓ 8 tables, 14 relations · 14,820 rows · ready
postgres://user:****@ep-cool-leaf-7281.eu-central-1.host.tld/seedkit_a3fd

That command drafts a CRM schema with an LLM, fills it with internally consistent rows (emails match names, FKs resolve, enum values are valid), and gives you back a real ephemeral Postgres on Neon. You can psql straight in.
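What "internally consistent" means in practice fits in a few lines of Python. This is not seedkit's generator (the post doesn't describe its internals). It's a minimal sketch, assuming a hypothetical two-table CRM with `contacts` and `deals`, of the three properties listed above: emails derived from names, enum values drawn only from the allowed set, and child rows that reference only parent ids that already exist. The table names, columns, name pools and `DEAL_STAGES` values are all made up for the example.

```python
import random

# Hypothetical name pools and enum values, for illustration only.
FIRST_NAMES = ["Mira", "Jonas", "Aiko", "Tomas", "Lena"]
LAST_NAMES = ["Patel", "Berg", "Sato", "Novak", "Moreau"]
DEAL_STAGES = ("lead", "qualified", "won", "lost")


def make_contacts(rng: random.Random, n: int) -> list[dict]:
    """Generate n contacts whose email is derived from their name."""
    contacts = []
    for contact_id in range(1, n + 1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        # Deriving the email from the name means the two can never disagree;
        # the id suffix keeps emails unique even when names repeat.
        email = f"{first}.{last}{contact_id}@example.com".lower()
        contacts.append({"id": contact_id, "first_name": first,
                         "last_name": last, "email": email})
    return contacts


def make_deals(rng: random.Random, contacts: list[dict], n: int) -> list[dict]:
    """Generate n deals that only point at contacts that already exist."""
    contact_ids = [c["id"] for c in contacts]
    return [
        {
            "id": deal_id,
            "contact_id": rng.choice(contact_ids),  # FK always resolves
            "stage": rng.choice(DEAL_STAGES),       # enum value always valid
        }
        for deal_id in range(1, n + 1)
    ]


if __name__ == "__main__":
    rng = random.Random("saas crm")
    contacts = make_contacts(rng, 5)
    deals = make_deals(rng, contacts, 12)
    print(contacts[0]["email"], len(deals))
```

The design choice that does the work is ordering: parents are generated first, and children can only pick from ids that already exist, so no FK can dangle.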

If you already have a schema (and you probably do), point seedkit at your DB:

seedkit seed --url $DATABASE_URL --schema schema.sql

The orchestrator introspects your database, hands the DDL to the LLM, and applies the resulting INSERTs inside a transaction with SET CONSTRAINTS ALL DEFERRED, so circular FKs don't fight you. No migration, nothing copied, nothing committed.
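Two details make a transactional insert like this work, and both are easy to get wrong by hand. First, tables have to be inserted parents-first, which is a topological sort over the FK graph; tables caught in a cycle (or depending on one) can't be ordered that way. Second, Postgres only defers constraints that were declared `DEFERRABLE`, and `SET CONSTRAINTS` has no effect outside a transaction block, so the script has to run between `BEGIN` and `COMMIT`, and the circular FKs must be deferrable for the deferral to help. The sketch below, assuming a hypothetical FK map (`{table: referenced tables}`) and pre-rendered INSERT strings, shows both steps. It illustrates the technique; it is not seedkit's orchestrator, and `insert_order`/`build_script` are made-up names.

```python
from collections import deque


def insert_order(fks: dict[str, set[str]]) -> tuple[list[str], set[str]]:
    """Order tables parents-first using Kahn's algorithm.

    fks maps each table to the set of tables it references. Returns the
    acyclic insertion order plus the tables that could not be ordered
    because they sit in (or downstream of) an FK cycle.
    """
    tables = set(fks) | {t for refs in fks.values() for t in refs}
    # Unresolved parents per table; self-references are left to the
    # deferred constraint check instead of the ordering.
    pending = {t: {p for p in fks.get(t, set()) if p != t} for t in tables}
    ready = deque(sorted(t for t, parents in pending.items() if not parents))
    order: list[str] = []
    while ready:
        table = ready.popleft()
        order.append(table)
        for child in sorted(tables):
            if table in pending[child]:
                pending[child].discard(table)
                if not pending[child]:
                    ready.append(child)
    return order, tables - set(order)


def build_script(fks: dict[str, set[str]], inserts: dict[str, list[str]]) -> str:
    """Wrap all INSERTs in one transaction with constraint checks deferred."""
    order, blocked = insert_order(fks)
    lines = ["BEGIN;", "SET CONSTRAINTS ALL DEFERRED;"]
    # Acyclic tables go first in dependency order; blocked tables rely on
    # their FKs being DEFERRABLE so the check happens at COMMIT instead.
    for table in order + sorted(blocked):
        lines.extend(inserts.get(table, []))
    lines.append("COMMIT;")
    return "\n".join(lines)


if __name__ == "__main__":
    fks = {
        "companies": set(),
        "contacts": {"companies"},
        "deals": {"contacts", "companies"},
        "accounts": {"owners"},  # accounts <-> owners is a cycle
        "owners": {"accounts"},
    }
    print(insert_order(fks))
```

Ordering what can be ordered keeps most FK checks cheap and immediate; deferral is only needed for the cycle, which is exactly the case where no insertion order exists.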

Lock the data with --seed my-fixture and everyone on your team — laptops, CI, preview branches — gets byte-identical rows when they pass --from-cache. That's the part I actually wanted: same data, everywhere, cheap to reproduce.
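Byte-identical rows across machines come down to deriving every random choice from the fixture name and never from the clock or process state. The post doesn't say how `--seed` and `--from-cache` work internally, so this is a generic sketch, assuming a hypothetical table with `plan` and `seats` columns: the same seed string renders the same CSV bytes, and a SHA-256 digest of those bytes gives a cheap way to check that a laptop and CI really hold the same fixture. One caveat if you borrow the approach in Python: `random.Random` turns string seeds into integers without going through `hash()`, so `PYTHONHASHSEED` doesn't matter, but only `random()` is guaranteed stable across Python versions. Helpers like `choice` and `randint` are not, so a tool promising identical output everywhere would pin its own generator or its interpreter version.

```python
import csv
import hashlib
import io
import random


def render_fixture(seed: str, rows: int = 100) -> bytes:
    """Render a hypothetical table as CSV bytes, driven only by the seed."""
    # str seeds are converted deterministically (not via hash()), so the
    # output does not depend on PYTHONHASHSEED or the machine.
    rng = random.Random(seed)
    buf = io.StringIO()
    # Fixed line terminator so the bytes don't vary by platform.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "plan", "seats"])
    for row_id in range(1, rows + 1):
        writer.writerow([row_id, rng.choice(["free", "pro", "team"]), rng.randint(1, 50)])
    return buf.getvalue().encode("utf-8")


def fixture_digest(seed: str) -> str:
    """Fingerprint of the rendered fixture; equal digests mean equal bytes."""
    return hashlib.sha256(render_fixture(seed)).hexdigest()


if __name__ == "__main__":
    print(fixture_digest("my-fixture"))
```

Comparing digests rather than full dumps is what makes "same data, everywhere" cheap to verify in CI.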

What else is in the box

The ephemeral Postgres is on Neon: EU-hosted, scales to zero when idle, and can be revived from cache after its TTL expires.

If you don't have a schema yet — frontend prototype, design mock-up, "I just need realistic JSON of users" — there's also a Synthetic Data API (Pro+) that samples directly from a curated pool of about 540 identities, 580 companies, plus posts, products, and reviews, all cross-referenced. Different shape from the schema-driven path. Same determinism.
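Sampling from a curated, pre-linked pool is a different technique from generating rows against a schema: the cross-references already exist in the pool, and determinism only has to cover which entries get picked. A minimal sketch, assuming two tiny hypothetical pools standing in for the real ones (the post doesn't describe their contents or the API's request shape); `sample_people` and every name in the pools are invented for illustration.

```python
import random

# Tiny hypothetical pools; every identity already points at a company id.
COMPANIES = [
    {"id": "c1", "name": "Northwind Labs"},
    {"id": "c2", "name": "Bluefin Analytics"},
    {"id": "c3", "name": "Quarry & Co"},
]
IDENTITIES = [
    {"id": "p1", "name": "Mira Patel", "company_id": "c1"},
    {"id": "p2", "name": "Jonas Berg", "company_id": "c2"},
    {"id": "p3", "name": "Aiko Sato", "company_id": "c1"},
    {"id": "p4", "name": "Tomas Novak", "company_id": "c3"},
]
COMPANY_BY_ID = {c["id"]: c for c in COMPANIES}


def sample_people(seed: str, k: int) -> list[dict]:
    """Pick k distinct identities for a seed and attach their company record."""
    rng = random.Random(seed)
    picked = rng.sample(IDENTITIES, k)  # same seed, same picks, same order
    # The link already exists in the pool, so resolving it can't dangle.
    return [{**person, "company": COMPANY_BY_ID[person["company_id"]]} for person in picked]


if __name__ == "__main__":
    for person in sample_people("landing-page", 2):
        print(person["name"], "@", person["company"]["name"])
```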

And the Avatar API (app.seedkit.dev/api/avatars/<seed>) is just there. No auth, no quota, deterministic SVG. Use it for your own users if you want; the seedkit-generated identities use it under the hood.
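A deterministic avatar is a pure function of its seed: hash the seed, derive visual parameters from the digest, render SVG. The sketch below shows that idea locally. The hue-plus-initial design, `avatar_svg`, and `avatar_url` are hypothetical and say nothing about what the real endpoint draws. `avatar_url` only assembles the path given above, assuming an `https://` scheme and percent-encoding the seed so values like email addresses stay a single path segment.

```python
import hashlib
from urllib.parse import quote
from xml.sax.saxutils import escape

# Path from the post; the https scheme is an assumption.
AVATAR_BASE = "https://app.seedkit.dev/api/avatars/"


def avatar_url(seed: str) -> str:
    """Build the avatar URL, percent-encoding the seed (including '/' and '@')."""
    return AVATAR_BASE + quote(seed, safe="")


def avatar_svg(seed: str, size: int = 64) -> str:
    """Hypothetical local renderer: same seed, same SVG, no stored state."""
    digest = hashlib.sha256(seed.encode("utf-8")).digest()
    hue = int.from_bytes(digest[:2], "big") % 360  # background colour from the hash
    initial = escape((seed.strip()[:1] or "?").upper())  # XML-safe first character
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 {size} {size}">'
        f'<rect width="100%" height="100%" fill="hsl({hue}, 60%, 55%)"/>'
        f'<text x="50%" y="50%" dy=".35em" text-anchor="middle" '
        f'font-family="sans-serif" font-size="{size // 2}" fill="#fff">{initial}</text>'
        "</svg>"
    )


if __name__ == "__main__":
    print(avatar_url("ada@example.com"))
    print(avatar_svg("ada@example.com")[:60])
```

Because nothing is stored, any client can regenerate the same image from the same seed, which is what makes a no-auth, no-quota endpoint practical.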

What it doesn't do yet

It's Postgres only. MySQL is on the list but not in the box yet, and the same goes for Mongo. The LLM-orchestrated path needs a free account because somebody has to pay the AI bill, and that someone is me; the local-only seed --url path against a schema you supply works without one. And it's brand new this week, so there will be quirks.

Try it

npx seedkit-cli new --prompt "describe what you're building"

The CLI is open source (MIT) on GitHub. See the Quickstart for the proper walkthrough. Bug reports, ideas, war stories: ben@seedkit.dev.
