Agents need real databases

Coding agents iterate against the code they write — but the moment that code touches a database, their verification loop quietly falls apart. Here's the database-shaped hole in the agent loop, why we built seedkit to fill it on purpose, and what changes when agents have real-shaped data instead of mocks they wrote about themselves.

May 11, 2026 · 6 min read · Synthetic data

Coding agents write more code than they read. They write it in loops — propose, run, observe, fix — and the quality of every loop hinges on whether the "run" step actually exercises the code or just nods politely at it.

For a lot of code that's fine. Pure functions, type-checked refactors, frontend rendering: the agent runs the thing and watches what happens. The signal is real.

The database layer is where the loop quietly breaks. Most repos don't have a database an agent can fill with reasonable data on demand. So when an agent writes a query, or an API route, or a migration, the "run" step it gets is one of: a unit test against a mock the agent itself wrote, an integration test against a fixture someone three people ago hand-rolled and nobody fully trusts, or nothing — the agent reads the code back to itself, decides it looks right, and moves on.

None of those are the same as running the code against data with the shape of real production. This is the database-shaped hole in the agent loop. We built seedkit to fill it on purpose.

What agents lose to today

If you've watched a coding agent try to set up its own test data, you've watched it lose to one of these in sequence.

Foreign keys. The agent generates an INSERT into orders before any users exist. Postgres rejects it. The agent sees the error, regenerates the seed in a different order, hits a different FK violation, regenerates again. Three or four iterations later it has either disabled the FK constraints, wrapped everything in SET session_replication_role = replica, or quietly switched to mocks. None of those are answers — they're admissions.
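The failure mode above is easy to reproduce. Here's a minimal sketch using SQLite (with FK enforcement switched on) as a stand-in for Postgres — the table names and the specific error class differ, but the out-of-order insert fails the same way:

```python
import sqlite3

# SQLite with FK enforcement on behaves like Postgres for this
# failure mode. `users`/`orders` are illustrative names only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id))"
)

# Child row before its parent exists: rejected.
try:
    conn.execute("INSERT INTO orders (id, user_id) VALUES (1, 42)")
    child_first_worked = True
except sqlite3.IntegrityError:
    child_first_worked = False

# Parent first, then child: clean insert on the first try.
conn.execute("INSERT INTO users (id) VALUES (42)")
conn.execute("INSERT INTO orders (id, user_id) VALUES (1, 42)")

print(child_first_worked)  # False: the out-of-order insert failed
```

The fix isn't retrying in a different order until something sticks — it's computing the order from the schema before inserting anything.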

Unique constraints. Faker emits two of the same email. The agent regenerates. Faker emits two of the same email again. The agent adds a Date.now() suffix to every email address in the seed. Now nothing's realistic, but the inserts pass. We've optimized for the wrong thing.
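The math is against independent random draws: with a small value pool, collisions are near-certain well before the table is full. A tiny sketch (the name pool is made up) showing the difference between faker-style independent draws and index-aware generation, which is unique by construction and still readable:

```python
import random

rng = random.Random(0)  # seeded so the demo is reproducible

first_names = ["amy", "ben", "cara", "dan", "eve"]

# Faker-style: 20 independent draws from 5 names collide fast.
naive = [rng.choice(first_names) + "@example.com" for _ in range(20)]

# Index-aware: uniqueness by construction, no Date.now() suffix hack.
indexed = [f"{first_names[i % len(first_names)]}{i}@example.com"
           for i in range(20)]

print(len(set(naive)) < len(naive))       # True: duplicates appeared
print(len(set(indexed)) == len(indexed))  # True: all unique
```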

Referential realism. The agent gets the FKs right but every order in the database is owned by users[0].id, because that's the first thing it grabbed and it never thought about the distribution. The query under test happens to return early for that same user. The bug ships.
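What "getting the distribution right" means in practice: a FK picker should spread child rows across parents, ideally with a mild skew so a few heavy accounts exist, the way real traffic does. A hypothetical sketch (not seedkit's actual picker; the harmonic weights are one arbitrary choice of skew):

```python
import random

rng = random.Random("my-fixture")  # seeded from the seed name

user_ids = list(range(1, 11))  # ten parent rows

# Anti-pattern: every order owned by the first user the agent grabbed.
skewed = [user_ids[0] for _ in range(100)]

# Weighted picker: children spread across parents, with a mild
# skew so a few "heavy" users exist, like real traffic.
weights = [1 / (rank + 1) for rank in range(len(user_ids))]
spread = rng.choices(user_ids, weights=weights, k=100)

print(len(set(skewed)))      # 1: one parent owns everything
print(len(set(spread)) > 1)  # True: ownership is distributed
```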

Schema drift. The agent adds a column. The seed file still inserts the old shape. The agent now has to remember to update both. It doesn't. The fixture rots within hours and the next agent run is testing against the wrong-shaped database.

A human developer hitting these makes a face and writes a generator. An agent hitting these in a loop has a bad time — every retry costs tokens, every failed iteration looks like progress, and the final "tests pass" verdict is increasingly disconnected from whether the code is actually right.

What an agent needs from a seeder

When we stepped back and listed what an agent actually needs to fill the database-shaped hole, the list came out short and specific.

  1. One-command invocation. Agents are CLI-native. Anything that needs a UI, an auth flow, or a configuration ritual is unreachable from a tool call.
  2. Schema awareness, zero config. The agent shouldn't have to teach the tool about the schema. Point at a connection string, read the schema, go.
  3. Referential correctness by construction. FKs, unique constraints, NOT NULLs, CHECK expressions — all respected on the first try. No retries, no replica-mode tricks.
  4. Determinism. Same seed name produces the same bytes. So the agent can re-run a failing scenario and get the same data, and the human reviewing the agent's PR can reproduce locally.
  5. A preview mode. Agents need cheap-to-inspect feedback. Five sample rows of what's about to land beats "1,287 rows inserted, trust me."
  6. A predictable cost profile. Agents iterate a lot. A tool that calls an LLM per cell becomes unaffordable the moment you put it inside an agent's outer loop.
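Item 4 is cheap to get right if the generator is designed for it from the start. A minimal sketch of the idea (the function name and row shape are invented for illustration): hash the seed name into a PRNG seed, derive every value from that one PRNG, and re-runs are identical everywhere.

```python
import hashlib
import random

def fixture_rows(seed_name: str, n: int = 5) -> list[tuple[int, str]]:
    """Derive one PRNG from the seed name; every value flows from it."""
    digest = hashlib.sha256(seed_name.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [(i, f"user{rng.randrange(10_000)}@example.com")
            for i in range(n)]

# Same seed name, same rows -- on this machine, on CI, anywhere.
a = fixture_rows("my-fixture")
b = fixture_rows("my-fixture")
print(a == b)  # True: byte-identical across runs
```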

That list is most of seedkit's roadmap, by accident. We weren't building for agents originally — we were building for indie devs setting up a dev DB on Sunday night. It turns out the two audiences want roughly the same thing, because both are optimizing for "no friction between deciding to seed and having data."

How seedkit maps onto the list

  • One command. npx @seedkit-dev/cli new from inside the repo. An agent can put that exact string in a Bash tool call. No login required for the open-source path.
  • Schema-aware. We introspect Postgres directly: types, constraints, FKs, deferrable constraints, composite keys. The agent points at a URL and we figure the rest out.
  • Referentially correct. Topological sort with cycle handling, and a FK picker that doesn't pile every child onto the same parent. The first run inserts cleanly; there is no second run for retries.
  • Deterministic. --seed my-fixture pins the output. Re-runs are byte-identical. We wrote about why determinism is the boring half — short version: a single PRNG seeded from the seed name, plus a cache for the LLM-touched cells. Agents and CI runners get the same fixture without paying the LLM bill twice.
  • Preview before insert. --preview shows five sample rows per table before anything hits the database. The agent — or the human reviewing the agent's work — can read it and decide.
  • Predictable cost. Most cells never see an LLM in the first place; the strategy stack handles ~95% of columns deterministically and the rest get cached. An agent running seedkit in a loop isn't running a per-row inference call.
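For the referential-correctness bullet, the ordering half is classic topological sort. A simplified sketch of the idea (Kahn's algorithm; this is not seedkit's implementation — in particular it just raises on cycles, where seedkit falls back to deferrable-constraint handling):

```python
from collections import deque

def insertion_order(fks: dict[str, set[str]]) -> list[str]:
    """Order tables so every parent is inserted before its children.

    `fks` maps each table to the set of tables it references.
    """
    indegree = {t: len(parents) for t, parents in fks.items()}
    children: dict[str, list[str]] = {t: [] for t in fks}
    for table, parents in fks.items():
        for parent in parents:
            children[parent].append(table)

    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        table = ready.popleft()
        order.append(table)
        for child in children[table]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(fks):
        raise ValueError("FK cycle: needs deferrable-constraint handling")
    return order

# users has no parents; orders references users; order_items both.
print(insertion_order({
    "users": set(),
    "orders": {"users"},
    "order_items": {"orders", "users"},
}))  # ['users', 'orders', 'order_items']
```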

Why this matters

The reason this matters more in 2026 than it did in 2024 isn't that the agents got smarter. It's that the agent loop became a serious developer surface. People ship features written, tested, and reviewed primarily by an agent, with a human in the final approval seat. The cheapest place to catch a bug in that workflow is the runtime layer — the place where the code actually does its thing against actual data.

If the runtime layer is full of mocks the agent wrote about itself, the human reviewer becomes the only real check, and the speedup an agent was supposed to deliver evaporates into careful code review.

Realistic test data is graduating from a nice-to-have for human devs into primary scaffolding for agentic dev. The same way you wouldn't run a CI suite without a database, in a year you won't run an agent loop without seedable data either. The tools that fit the agent loop are the ones that are callable in one command, deterministic, and cheap enough to put inside a retry budget. Most of the existing data-generation tools were built for none of that.

What this doesn't solve

A few honest limits, since we owe them.

We don't make your agent better at SQL. If the query is wrong, real-shaped data won't necessarily expose it — sometimes the data has to be adversarial in a specific way. We're working on a constraint-based generation mode for exactly that ("give me a fixture where some user has more than 1,000 orders") but it isn't shipped.

We don't replace the human reviewer on the PR. An agent that ran the right query against the right-shaped data can still write code that does the wrong thing for reasons unrelated to data. Real data closes one specific gap; it doesn't replace human judgment about whether the code should exist at all.

And we're Postgres-only right now. If your agent is working in a MySQL or Mongo repo, the same loop has the same hole, and we don't have a stopgap yet. On the roadmap, not in your hands today.

Try it

npx @seedkit-dev/cli new --prompt "describe what you're building"

Drop that in your agent's first-run hook, in a make seed target, in the README — wherever the agent is going to look when it needs a database with stuff in it. If you've watched your agent fight with seed data and want to compare notes: ben@seedkit.dev.
