Your AI Isn't Failing. Your Data Is.

AI looks like the problem. It isn’t.

When an AI build fails inside a 5-to-50-person business, the operator almost always reaches the same conclusion: AI doesn’t work for businesses our size.

That’s the wrong diagnosis. The AI was probably fine. The data feeding it almost certainly wasn’t.

This is the most common failure pattern I see across SMB AI projects right now. The build looks like it’s going to work. The demos are convincing. The system goes live. Then the output starts disagreeing with reality, and a few months later the operator pulls the plug and concludes the tool was oversold.

Most of the time, the tool did what it was built to do. The data feeding it was the part nobody scoped, because nobody asked.

The reason this pattern is so persistent: AI on bad data fails in a way that delays the diagnosis. With traditional reporting, bad data produces output that obviously breaks. The sum doesn’t add up. A cell shows #N/A. The operator notices and goes hunting for the source.

With AI, bad data produces output that looks correct and is wrong. Quotes that are formatted professionally and miss the actual pricing. Chatbot answers that sound confident and cite a policy that doesn’t exist. Lead scores that rank junk traffic as high intent.

By the time the operator notices, weeks or months have passed and the team has been making downstream decisions based on confidently wrong output.

The output looks correct. That’s the trap.

The traditional failure mode for a bad report is loud. Numbers don’t reconcile. A chart shows nothing. A query errors out. The reader notices in seconds, opens the data, finds the corrupted source, and either fixes it or stops trusting the report.

AI doesn’t fail that way. AI on bad data is a confident, fluent, well-formatted producer of wrong output. A lead-scoring model trained on incomplete source data ranks junk leads as A-tier with full conviction, and the sales team chases them. A dashboard built on a CRM with three different conventions for “follow-up status” shows a green metric that should be red, and leadership makes the wrong call from data that looks clean.

The chatbot, the quoting agent, the inventory predictor, the AI report generator: all of them produce output that looks formatted, professional, plausible. And it’s wrong in ways that take weeks to surface.

That delay is what makes the AI-on-bad-data failure so expensive. The cost isn’t the AI build itself. The cost is the months of operational decisions made on confidently wrong output before anyone catches it. Wrong quotes accepted by customers who then dispute invoices. Wrong leads chased while real ones age out. Wrong inventory ordered against fictional demand patterns.

The build that looks like a winning automation is producing the most expensive class of error there is: errors that look like correct answers.

Three places I watch for this on every assessment

The six-pillar diagnostic I run on every new engagement has a data-discipline pillar specifically because of this pattern. Three things I look for first:

Intake that doesn’t capture source.

Pull up the last 50 leads. Where did each one come from? If the business can’t answer that cleanly, every marketing report it produces afterward is fiction. No channel can be evaluated. ROI calculations are guesswork. Any follow-up automation runs blind, because the system has no concept of which leads warrant which kind of follow-up.

This is the single most common gap I see in 5-to-50-person businesses. It looks like a marketing problem. It’s actually a data-capture problem. Until source lands in a structured field at the moment the lead arrives, nothing built on top of that data is trustworthy.
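The "last 50 leads" check can be run as a quick script against a CRM export. What follows is a minimal sketch, not a prescribed tool: it assumes leads arrive as a list of records with a `source` key, and the field name, the sample records, and the list of placeholder values that count as "no source" are all hypothetical.

```python
from collections import Counter

# Values that technically fill the field but say nothing (assumed list).
PLACEHOLDERS = {"unknown", "n/a", "other"}

def source_coverage(leads, sample_size=50):
    """Return (share of recent leads with a usable source, counts per source)."""
    recent = leads[-sample_size:]
    if not recent:
        return 0.0, Counter()
    # Treat None, empty strings, and whitespace as missing.
    cleaned = [(lead.get("source") or "").strip().lower() for lead in recent]
    known = [s for s in cleaned if s and s not in PLACEHOLDERS]
    return len(known) / len(recent), Counter(known)

# Hypothetical sample data, for illustration only.
leads = [
    {"name": "Acme Co", "source": "Google Ads"},
    {"name": "Birch LLC", "source": ""},
    {"name": "Cole & Sons", "source": None},
    {"name": "Delta Fab", "source": "referral"},
]

coverage, by_source = source_coverage(leads)
print(f"{coverage:.0%} of recent leads have a usable source")
print(dict(by_source))
```

If the coverage number is low, the channel report built on those leads is describing only the fraction that happened to get tagged.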

Customer notes living in five people’s heads.

The most experienced person on the team knows the customer history. The new hire knows nothing. The AI chatbot knows nothing. Tribal knowledge doesn’t scale, and the AI doesn’t get smarter than the documentation that feeds it.

If knowledge transfer happens at lunch instead of in a system, the chatbot you’re about to build will be polite, confident, and uninformed. It will tell customers the wrong thing about their own history because it has no access to the actual history.

A CRM with three conventions for “follow-up status.”

One salesperson writes “F/U Tuesday.” Another writes “to call back.” A third leaves the field blank and tracks it in their head. Dashboards become useless because they roll up incompatible vocabularies. Alerts fire randomly. The “automate the follow-ups” pitch lands the operator $14k poorer with a system that has nothing trustworthy to fire on.
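To see how far apart the vocabularies are before anything rolls up, one option is to map the free-text entries onto a single controlled list and count what falls through. This is a sketch under assumptions: the canonical status names and the phrase mapping are hypothetical, and a real CRM would need its own list built from the entries actually in use.

```python
# Hypothetical mapping from phrases found in free-text notes to one vocabulary.
CANONICAL = {
    "f/u": "follow_up_scheduled",
    "follow up": "follow_up_scheduled",
    "follow-up": "follow_up_scheduled",
    "to call back": "callback_needed",
    "call back": "callback_needed",
}

def normalize_status(raw):
    """Map a free-text follow-up note to a canonical status label."""
    text = (raw or "").strip().lower()
    if not text:
        return "missing"       # blank field: the status lives in someone's head
    for phrase, status in CANONICAL.items():
        if phrase in text:
            return status
    return "unmapped"          # needs a human decision, not a guess

# Sample entries modeled on the three habits described above.
entries = ["F/U Tuesday", "to call back", "", "left vm"]
print([normalize_status(e) for e in entries])
```

Note that "Tuesday" is dropped by this mapping. A due date belongs in its own structured field, which is the same fix again: capture it once, in one place.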

The fix in all three cases is the same: a captured field, structured consistently, populated reliably. Boring infrastructure work. Not AI.

What deliberate data looks like, and why perfect data is the wrong goal

The mistake on the other side of this trap is just as expensive: deciding that nothing can be automated until the data is perfect. That’s the path I watch consulting firms walk operators into. Eighteen months of “data warehouse readiness” work later, the operator has spent six figures and has nothing that produces a return.

The right bar isn’t perfect data. The right bar is deliberate data.

Three operations, in order:

  1. Capture clean data at every operational seam. Intake, handoff, completion. Wherever a process hands off to another process, that seam either gets captured properly or it leaks. Identify the seams. Make sure structured fields exist. Make sure they get populated.

  2. Structure it so people and tools can find what they need without re-asking. Consistent naming. One source of truth per piece of information. No “we track this in the spreadsheet and also in the CRM and also in Slack.” Pick one and treat the rest as derivative.

  3. Then automate. Once steps 1 and 2 are in place, automation has something to stand on. The chatbot has accurate history to draw from. The dashboards reflect real state. The quoting agent quotes from real pricing instead of guessing.

Skip steps 1 and 2 and step 3 produces output that looks correct and is wrong. Skip them at scale and that output compounds week over week.

The bar isn’t perfect. The bar is consistent and deliberate: the data has to be reliable enough that automation built on top of it tells the truth.

The 14-person shop that almost bought the wrong build

Last month I scoped a 14-person manufacturing shop. The owner had been pitched an AI quoting agent by another consultant. He liked the demo and wanted my read before he signed.

Three days into the assessment, here’s what we found about the quoting data the AI would actually be reading:

  • One pricing spreadsheet maintained by the office manager. Updated quarterly. Out of date on several common parts by the time we looked.
  • A second pricing spreadsheet maintained by the shop manager. Updated more often than the first. Disagreed with it on roughly a dozen items.
  • An email thread between the sales lead and the owner where unusual quotes got hashed out informally. Not searchable. Not folded back into either spreadsheet.
  • The shop manager’s notebook on the floor where he wrote down what he actually quoted on the phone. Never reconciled with anything else.

The AI quoting agent would have been built on top of all four sources, agreeing with none of them consistently. It would have produced quotes that looked correct, were professionally formatted, and were wrong often enough that customers would have stopped accepting them. The owner would have concluded AI didn’t work for his shop and pulled the plug. The pattern from the top of this post would have repeated, on his dime.

We pulled the quoting agent off the schedule. Instead we built a single source of truth for pricing first: the office manager and shop manager reconciling weekly into one structured system that both of them update. Boring work. A couple of weeks of disciplined cleanup.
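A first pass at that kind of reconciliation can be as simple as listing every part where the two sheets disagree or where one sheet has no entry. The sketch below assumes each spreadsheet has been exported to a part-to-price mapping. The part names, prices, and tolerance are made up for illustration and are not the shop's data.

```python
def price_conflicts(office, shop, tolerance=0.01):
    """Compare two {part: price} mappings; return disagreements and gaps."""
    conflicts = {}
    for part in sorted(set(office) | set(shop)):
        a, b = office.get(part), shop.get(part)
        if a is None or b is None:
            conflicts[part] = ("missing", a, b)    # only one sheet lists it
        elif abs(a - b) > tolerance:
            conflicts[part] = ("mismatch", a, b)   # both list it, prices differ
    return conflicts

# Hypothetical exports of the two pricing spreadsheets.
office_sheet = {"bracket-a": 12.50, "flange-2in": 40.00, "shaft-10": 88.00}
shop_sheet = {"bracket-a": 12.50, "flange-2in": 43.75, "gasket-s": 3.20}

conflicts = price_conflicts(office_sheet, shop_sheet)
for part, (kind, office_price, shop_price) in conflicts.items():
    print(part, kind, office_price, shop_price)
```

The script only surfaces the disagreements. Deciding which price is right is still a conversation between the two people who maintain the sheets, which is why the reconciliation is a weekly habit and not a one-time import.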

The quoting agent ships next month. It works because there’s something underneath it to stand on.

Fix the data. Then automate. In that order.