Maxa
Knowledge Base

A data pipeline that argues for its own correctness

João Vitor de Camargo

There is a quiet mismatch in how the data world is adopting AI. Most of the agentic tooling pointed at data engineers is built to write code faster: describe a model, get SQL back in seconds.

Speed is easy to measure, but it’s not enough: speed without trust doesn’t ship. The most interesting use of agentic AI in data engineering is self-verification: a data pipeline can test its own logic as it is built, and those same tests follow it into production, so the logic survives contact with real data.

The industry measured the wrong win

The broader software industry is already living through the consequences of optimizing for speed of code generation alone. Two recent reports suggest what that costs:

  • In Sonar's 2025 State of Code survey of more than 1,100 developers, AI tools were writing a large share of committed code. However, 96% of the surveyed developers reported distrusting the output because it tends to look correct while hiding subtle errors. The time saved at the keyboard reappeared downstream as review and validation.
  • A 2026 enterprise report found that 43% of AI-generated code changes still needed manual debugging in production after passing staging. Faster authoring created a verification tax, paid later.

Data engineering inherits a harder version of this problem. When application code is wrong, it usually fails loudly and at the point of the mistake. A wrong number in a data pipeline, by contrast, fails silently. For example, revenue overstated by a duplicated join renders as a clean figure on a dashboard, indistinguishable from a correct one, until someone downstream acts on it.

The cost of AI-generating transformation code faster, with no matching gain in how that code earns trust, stays hidden until a business stakeholder makes a decision on a number that was wrong the whole time.

What is a self-verifying data pipeline?

In a traditional data pipeline, an engineer writes the transformations and then writes tests against them as a separate step. A self-verifying pipeline is generated with its own checks built in, and those checks keep running on production data after deployment.

Verification has always been part of data engineering: profiling columns, testing keys with dbt, checking how tables relate. What’s new in a self-verifying system is where the checks come from: they are derived directly from the business logic behind a transformation and generated alongside the pipeline, instead of being added by a human afterward.

An agentic system can split the verification work across multiple agents so no AI agent grades its own work. One agent generates a transformation or a mapping, a separate verifier agent reviews it against business requirements and can approve or send it back, and at the end independent agents confirm the model is able to answer the business questions it's meant to answer.
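The generate-then-review handoff described above can be sketched as a simple loop. This is a minimal illustration, not any product's implementation: `build_with_review`, `Review`, and the two stub functions are hypothetical names, and the stubs stand in for agents that would really call a model.

```python
from dataclasses import dataclass


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def build_with_review(generate, verify, max_rounds=3):
    """Alternate between a generator and a separate verifier until approval."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)   # generator never approves its own work
        review = verify(candidate)       # verifier checks against requirements
        if review.approved:
            return candidate, round_no
        feedback = review.feedback       # send it back with the reason
    raise RuntimeError(f"no approved candidate after {max_rounds} rounds")


# Hypothetical stand-ins for the two agents.
def stub_generator(feedback):
    if "returns" in feedback:
        return "margin = revenue - cost, returns netted out"
    return "margin = revenue - cost"


def stub_verifier(candidate):
    if "returns" not in candidate:
        return Review(False, "requirement not covered: how are returns handled?")
    return Review(True)


result, rounds = build_with_review(stub_generator, stub_verifier)
print(rounds, result)
```

The point of the structure is the separation of roles: the function that produces a candidate has no way to mark it approved, and a build that never earns approval raises instead of shipping.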

In its ideal state, verification closes the loop with the engineer's own copilot: a run on full production data carries findings back to the agentic system, which generates the right adjustments.

Self-verification shows up in three specific areas:

  1. When the system proposes a mapping, it reads both the column name and the values inside it, and flags the column when the two disagree. A customer_name field full of random identifiers gets caught instead of trusted.
  2. Each generated transformation ships with mechanical checks that prove it held: rows accounted for, keys unique, categories covered. Those checks live in the dbt project, so they travel with the code and re-run on production data long after generation is done.
  3. When the system hits a business requirements gap that data alone can't settle, it avoids hallucinating an answer. It hands the data engineer the exact question to take back to the business stakeholder, so an assumption never hardens into an incorrect number.
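The mechanical checks in the second item can be sketched in a few lines. This is a minimal illustration in plain Python, assuming a one-to-one transformation; the function name `check_transformation`, the column names, and the sample rows are hypothetical. In a dbt project, the key and category checks would more typically be expressed with dbt's built-in `unique` and `accepted_values` tests.

```python
from collections import Counter


def check_transformation(source_rows, output_rows, key, category_col, known_categories):
    """Return a list of failure messages; an empty list means every check held."""
    failures = []

    # Rows accounted for: a one-to-one transformation should not drop or add rows.
    if len(output_rows) != len(source_rows):
        failures.append(
            f"row count changed: {len(source_rows)} in, {len(output_rows)} out"
        )

    # Keys unique: each key value should appear exactly once in the output.
    counts = Counter(row[key] for row in output_rows)
    dupes = sorted(k for k, n in counts.items() if n > 1)
    if dupes:
        failures.append(f"duplicate keys: {dupes}")

    # Categories covered: every category in the output should be a known one.
    unknown = sorted({row[category_col] for row in output_rows} - set(known_categories))
    if unknown:
        failures.append(f"unmapped categories: {unknown}")

    return failures


source = [
    {"order_line_id": 1, "product_line": "fasteners"},
    {"order_line_id": 2, "product_line": "fittings"},
    {"order_line_id": 3, "product_line": "fasteners"},
]
# Hypothetical bad output: line 2 was duplicated and picked up an unknown category.
output = source + [{"order_line_id": 2, "product_line": "misc"}]

failures = check_transformation(
    source, output, "order_line_id", "product_line", {"fasteners", "fittings"}
)
for message in failures:
    print(message)
```

Because the checks take only rows as input, the same function can run against a sample during generation and against full production data later.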

Example: margin analysis across two systems

Take an industrial MRO distributor. They have tens of thousands of SKUs, fasteners and fittings moving through regional warehouses to contractors.

Revenue lives in the ERP, while the cost to pick, store, and ship each line lives in the WMS. The CFO wants margin by product line, and the request lands on a data team.

The obvious join takes minutes to write: match each order's revenue from the ERP to its fulfillment cost from the WMS, line the products up on SKU, group by product line. The number comes out clean and hits a dashboard. It is also wrong, and the dashboard looks identical either way:

  • The ERP and WMS use different identifiers for the same product. The ERP tracks a manufacturer part number while the WMS tracks an internal bin code. To join the two, someone builds a mapping table that says "part number X = bin code Y". This mapping is maintained separately from both systems, and it drifts every time the warehouse moves a product to a new location, purchasing adds a new part number, or either side retires an old product.
  • When a mapping entry is missing, a join between the ERP and WMS drops the row. When a stale entry leaves two bin codes pointing to the same part number, the join duplicates the row. Neither shows up as an error, and the output looks like a clean table with a slightly wrong number.
  • A returned order breaks the revenue-to-cost match. The ERP removes the revenue on a returned order, but the fulfillment cost stays in the WMS because the picking and shipping already happened.
  • When the join runs, that orphaned cost row finds no matching revenue and drops out. The return quietly erases both the revenue and its cost, so the margin reads higher than it should, and the product lines with the most returns show the widest gap between reported and real margin.
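The drop-and-duplicate behavior of a drifting mapping table can be reproduced with a toy example. The sketch below uses Python's standard `sqlite3` module; the table names, column names, and figures are invented for illustration and do not come from any real ERP or WMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE erp_revenue (part_number TEXT, revenue INTEGER);
CREATE TABLE sku_map (part_number TEXT, bin_code TEXT);

INSERT INTO erp_revenue VALUES ('P-100', 1000), ('P-200', 500), ('P-300', 250);
-- P-100 carries a stale second bin code; P-300 has no mapping entry at all.
INSERT INTO sku_map VALUES ('P-100', 'B-01'), ('P-100', 'B-09'), ('P-200', 'B-02');
""")

# Revenue as the ERP reports it.
true_total = conn.execute("SELECT SUM(revenue) FROM erp_revenue").fetchone()[0]

# Revenue after the inner join through the mapping table.
joined_total, joined_rows = conn.execute("""
    SELECT SUM(r.revenue), COUNT(*)
    FROM erp_revenue AS r
    JOIN sku_map AS m ON m.part_number = r.part_number
""").fetchone()

print("ERP total:", true_total)
print("Joined total:", joined_total, "from", joined_rows, "rows")
```

In this contrived data, P-300 drops out and P-100 is counted twice, so the joined total differs from the ERP total while the row count still happens to equal the number of source rows. No error is raised at any point, which is why a row count alone is not a sufficient check.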

This complexity is just an average Monday for a data engineer working with ERP data.

Pointing AI at this data produces an erroneous margin dashboard that reads as confident and precise. A coding agent can attempt the harmonization, guess that the part number maps to the bin code, pick a way to handle returns, and write clean SQL on top of those guesses. What it can't do is tell you whether the guesses were right.

That's the difference a self-verifying build makes. Agents in a self-verifying build can harmonize the two systems, yes, but they also check their own harmonization and show their work:

  • Agents test the mapping between the two identifier systems against the actual data, catching entries that are missing, stale, or pointing one part number to multiple bin codes.
  • They generate tests that fail loudly when rows drop out of a join or a key isn't unique, so a silent miscount can't reach the dashboard.
  • Where the sources can't settle how a return should flow, they raise the gap as a specific question for the engineer to resolve.
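As a rough sketch of the first check in the list above, a mapping table can be audited against the identifiers that actually appear in each system. The function `audit_mapping` and all identifiers below are hypothetical, and "stale" is defined here, as an assumption, as a mapping entry whose part number or bin code no longer exists in its source system.

```python
from collections import defaultdict


def audit_mapping(erp_part_numbers, wms_bin_codes, mapping):
    """Audit (part_number, bin_code) pairs against both source systems."""
    bins_by_part = defaultdict(set)
    for part, bin_code in mapping:
        bins_by_part[part].add(bin_code)

    erp_parts = set(erp_part_numbers)
    wms_bins = set(wms_bin_codes)

    return {
        # ERP part numbers with no mapping entry: these rows would drop in a join.
        "missing": sorted(erp_parts - set(bins_by_part)),
        # Part numbers mapped to several bin codes: these rows would duplicate.
        "one_to_many": {
            part: sorted(bins)
            for part, bins in sorted(bins_by_part.items())
            if len(bins) > 1
        },
        # Entries that reference an identifier absent from its source system.
        "stale": sorted(
            (part, bin_code)
            for part, bins in bins_by_part.items()
            for bin_code in bins
            if part not in erp_parts or bin_code not in wms_bins
        ),
    }


report = audit_mapping(
    erp_part_numbers=["P-100", "P-200", "P-300"],
    wms_bin_codes=["B-01", "B-02", "B-03"],
    mapping=[("P-100", "B-01"), ("P-100", "B-09"), ("P-200", "B-02"), ("P-900", "B-03")],
)
print(report)
```

Each key in the report corresponds to a different silent failure, so a non-empty value is something to fail the build on or hand to the engineer, rather than a warning to log and ignore.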

A self-verifying data pipeline flags exactly where it's unsure instead of papering over it. That hands the engineer something a coding agent alone does not: a precise list of the gaps that still need human attention.

The output the data engineer reviews is readable dbt code they own, with gaps called out, rather than a seemingly confident number they have to reverse-engineer their trust in.

The high bar for agentic data engineering

So far, AI coding assistants and copilots have mostly delivered one thing: the ability to write SQL faster.

Every risk that already makes data engineering slow compounds when a tool generates code with no read on whether the output is trustworthy. The wrong number just arrives faster, disguised as a verified fact.

Self-verification is the correction. When a data pipeline ships with its own verification (mappings tested against real data, mechanical checks that travel into production, business gaps surfaced as questions instead of guesses), trust stops depending on a human reviewing every line. This is what makes speed worth the investment for a data team evaluating agentic tools.

Anyone can generate a data pipeline now. The bar is whether the system can prove its output.