
Most leaders who greenlight an AI initiative assume the risk lives in the model. They worry about whether the technology is good enough, whether it will make things up, and whether a better version ships next quarter. The most credible research on enterprise AI says they are worried about the wrong thing.
The projects that fail almost never fail because the model was too weak. They fail because the AI never learned how the organization actually works. This piece is about that gap: why it is the real predictor of success or failure, and the unglamorous work required to close it.
The number that should worry every leader
Start with where things actually stand in 2026, because the headline has flipped. Adoption is no longer the question. In McKinsey's most recent global survey on AI, 88 percent of organizations reported using AI in at least one business function, up from 78 percent a year earlier. Nearly every organization now uses this technology in some form, which means "we use AI" is no longer a differentiator.
Value is a different story. In that same McKinsey research, only about 6 percent of organizations qualified as high performers, meaning they reported significant value and could attribute at least 5 percent of their profit to AI. Only 39 percent could point to any profit impact at all, and most of those put it below 5 percent. Deloitte's 2026 State of AI in the Enterprise report, based on responses from more than 3,000 senior leaders, found something even more pointed: organizations rated themselves less prepared in governance, data, and talent than the year before. The tools raced ahead. Organizational readiness did not.
This is not a new problem. It is an old one wearing new clothes, and the rigorous look at why came earlier. A 2024 RAND Corporation study, built from structured interviews with 65 experienced data scientists and engineers, found that more than 80 percent of AI projects fail to reach meaningful production, roughly twice the failure rate of technology projects that do not involve AI. Through 2025, that failure mostly looked like pilots that never shipped. S&P Global found 42 percent of companies had abandoned most of their AI initiatives that year, up from 17 percent the year before. MIT's Project NANDA estimated that around 95 percent of enterprise generative AI pilots produced no measurable financial return, a figure that has been fairly criticized on methodology and is best read as directional rather than precise.
Put the timeline together, and the lesson is hard to miss. Over two years, the models improved dramatically, and adoption went from common to nearly universal. The value gap did not close. It widened. If a better model were the answer, the last two years would have delivered it. Something else is the binding constraint.
The real reason: your AI does not know how you work
When RAND's researchers traced these failures to their root causes, the number-one cause was not technical. It was a misunderstanding or miscommunication about the actual problem the AI was supposed to solve. One leader in the study assumed the organization had good data because it produced weekly reports, without realizing that the data was never built for the new purpose it was being asked to serve. The technology worked. The understanding of the organization did not.
Ethan Mollick, the Wharton professor who studies AI adoption and wrote the book Co-Intelligence, has landed in the same place from a different angle. His argument, as of late 2025, is that the factor separating organizations that get value from AI from those that do not is no longer individual talent or even the model's capability. It is the organizational structure and the way leaders choose to deploy the tool. He puts it bluntly: meaningful AI use cannot just be staff asking a chatbot to summarize a document. It has to be a deeper form of work, which means the organization has to rethink its own processes first.
The most recent data says the same thing in the bluntest terms yet. When McKinsey examined what actually separates the high performers from everyone else, the single strongest correlate of bottom-line impact was not the model or the budget. It was a fundamental redesign of how the work gets done. High performers were also three times more likely to have senior leaders who genuinely owned the effort rather than handing it to a technical team. McKinsey's own summary of the gap is about as direct as a consultancy gets: it is a leadership and operating-model problem, not a technology problem. Deloitte's 2026 findings point the same way. The organizations that captured real value were those where senior leadership actively shaped how AI was governed, and the most common barrier to integrating AI into workflows was not the technology. It was that the people and the organization were not ready for it.
This is the quiet truth underneath the failure statistics. A general-purpose model is brilliant in the abstract and ignorant about you specifically. It has read most of the public internet and knows nothing about your roles, your handoffs, your constraints, or the reasons you do certain things in the particular way you do them. Bolt that model onto an organization that it does not understand, and you get exactly what most pilots produce: fluent, confident, generic output that misses the real job.
Context is the layer everyone skips
I think of organizational context as a missing layer. Not a data dump, and not a folder of files, but the actual texture of how work moves through your organization: who owns what, where the handoffs are, which rules are written down and which ones live only in people's heads, and the reasons behind the process that no one has ever bothered to document because everyone already knows them.
A chatbot starts every conversation from zero. That is its design. It has the entire public record and none of your operation, so it cannot tell the difference between the way your organization does something and the way it is done on average across a million companies it has read about. Much of what makes an organization actually function is tacit, carried by experienced people, and never captured in any system.
So the gap between a demo that impresses a room and a deployment that survives contact with real work is almost always this layer. The demo runs on a generic task. The deployment runs on your task, and your task is full of context the model was never given. Skip the layer and you are not deploying intelligence into your organization. You are deploying a confident stranger.
What capturing context actually looks like
The fix is the unglamorous work the research keeps pointing back to. Before you automate anything, you map the work. Done well, it follows a clear sequence. This is the approach we use with organizations, and we call it Groundwork.
It starts with a structured intake conversation about how the team actually operates, recorded so that nothing said in the room is lost. Then it takes in the documents that define the work: the standard procedures, the role descriptions, the process notes. That context gets stored and kept, so the system references it on every task instead of starting cold every time, and it keeps growing with each new meeting and each new document rather than freezing on day one.
Then comes the part most automation pitches skip. The work gets sorted, honestly, into three buckets. Some tasks can be fully automated. Some should be semi-automated, with a person reviewing the output before it goes anywhere. And some should stay fully human, because judgment, relationships, or stakes make automation the wrong call. The output of that sorting is a plan we call an Automation Blueprint: what to automate, what to leave alone, in what order, and the impact you should expect.
A human reviews and approves every recommendation before anything ships. The agent proposes. A person decides. The judgment about what not to automate matters as much as the judgment about what to automate, and that judgment stays with people who are accountable for the outcome.
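For readers who think in code, the sorting and the approval gate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Groundwork is built: the names Task, Bucket, Recommendation, propose_bucket, approve, and shippable are hypothetical, and the three sorting rules are a deliberately simple stand-in for what is, in practice, a judgment call made with the team.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Bucket(Enum):
    AUTOMATE = "fully automated"
    REVIEW = "semi-automated, a person reviews the output"
    HUMAN = "stays fully human"


@dataclass
class Task:
    name: str
    repeatable: bool      # are the steps the same every time?
    needs_judgment: bool  # do relationships or discretion drive the outcome?
    high_stakes: bool     # would an error be costly or hard to reverse?


def propose_bucket(task: Task) -> Bucket:
    """Agent-side proposal only. The rules are illustrative assumptions."""
    if task.needs_judgment or not task.repeatable:
        return Bucket.HUMAN
    if task.high_stakes:
        return Bucket.REVIEW
    return Bucket.AUTOMATE


@dataclass
class Recommendation:
    task: Task
    bucket: Bucket
    approved_by: Optional[str] = None  # stays empty until a named person signs off


def approve(rec: Recommendation, reviewer: str) -> Recommendation:
    """Record the accountable person who approved this recommendation."""
    rec.approved_by = reviewer
    return rec


def shippable(recs: List[Recommendation]) -> List[Recommendation]:
    """The gate: nothing without a named approver goes into the plan."""
    return [r for r in recs if r.approved_by]


tasks = [
    Task("send meeting reminders", repeatable=True, needs_judgment=False, high_stakes=False),
    Task("draft monthly invoices", repeatable=True, needs_judgment=False, high_stakes=True),
    Task("handle a family complaint", repeatable=False, needs_judgment=True, high_stakes=True),
]
recs = [Recommendation(t, propose_bucket(t)) for t in tasks]
approve(recs[0], "J. Rivera")  # hypothetical reviewer name
plan = shippable(recs)         # only the approved recommendation survives
```

The design choice worth noticing is that propose_bucket and approve are separate steps. The agent can fill in a bucket for every task, but only a person can fill in approved_by, and the plan is built only from recommendations that carry a name.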
The honest limits
I want to be direct about what this is and is not, because overpromising is how trust gets lost with the exact audience that matters here.
This is adjacent to something organizations have always needed: a good scoping process. A sharp consultant has been mapping work and recommending changes for as long as consulting has existed. The honest difference is depth, persistence, and pace. A consultant does this once and leaves with the knowledge in their head. A context layer does it continuously, holds the knowledge, and references it on every task going forward. That is a real difference, but it is a difference of degree, not a different universe, and anyone who tells you otherwise is selling.
AI also has failure modes that capturing context does not erase. Models still make things up. They drift over time. A recommendation can simply be wrong. That is the entire reason the human approval gate is not optional, and why anyone promising fully autonomous automation of consequential work is skipping the most important safeguard. RAND named other root causes too, including poor data quality and the temptation to point AI at problems that are genuinely too hard for it today. Capturing context helps with the first and helps you avoid the second, because an honest map tells you which work belongs in the human-only bucket.
This matters even more as the work shifts toward agents that take actions on their own, which is the direction Groundwork and most serious tools are heading. McKinsey's 2026 research on AI trust found that the biggest barrier to scaling agentic AI was not regulation or raw capability. Nearly two-thirds of leaders named security and risk, and about three-quarters flagged inaccuracy as a concern. What is holding back autonomous AI is confidence that it can be deployed safely, which is exactly why a human approving every consequential recommendation is not a limitation to apologize for. It is the design.
Over-automation is its own risk. The goal is never the maximum number of automated tasks. It is the right ones. Anywhere students, privacy, or high-stakes decisions are involved, the bar for keeping a human accountable goes up, not down.
And context capture is work. The most common way to get it wrong is to try to map the entire organization at once. Do not attempt that. Start with one workflow.
What this means for your organization
If you take one thing from the data, make it a question you ask before you buy your next AI tool: can this learn how we actually work, or does it start every task from zero? If the answer is the second, you are paying for a confident stranger, and the research suggests you will probably land outside the small group of organizations getting real value.
A few concrete moves you can make this week. Pick a single painful, repeatable workflow rather than the whole operation. Map it into the three buckets, and be honest about what should stay human. Put one person's name, in writing, against every output you decide to automate, so accountability never gets lost in the tooling.
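If the map of that one workflow lives in a spreadsheet or a simple list, the accountability rule is easy to check mechanically. The sketch below assumes a hypothetical layout in which each row has an "output", a "bucket" ("automate", "review", or "human"), and an "owner". The field names and the missing_owners helper are illustrative, not part of any particular tool.

```python
from typing import Dict, List


def missing_owners(workflow_map: List[Dict[str, str]]) -> List[str]:
    """Return the outputs marked for automation that have no named owner."""
    return [
        row["output"]
        for row in workflow_map
        if row["bucket"] != "human" and not (row.get("owner") or "").strip()
    ]


# A hypothetical map of a single workflow: new-hire onboarding.
onboarding = [
    {"output": "welcome email", "bucket": "automate", "owner": "Dana Lee"},
    {"output": "equipment request", "bucket": "review", "owner": ""},
    {"output": "first-week check-in", "bucket": "human", "owner": ""},
]

unowned = missing_owners(onboarding)
print(unowned)  # ['equipment request']
```

An empty result is the goal. Anything the check returns is an automated output with no accountable person attached to it yet.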
This is the work we do with organizations at 24/7 Teach. Over the past decade we have supported more than 50 organizations with Customized Training and AI integration, and in one recent engagement a team met 96 percent of its goals and saved roughly 28,000 dollars a year by being deliberate about which work to automate and which to keep human. If you want an honest read on where your own organization stands, our partnership conversations start with exactly that: a 30-minute working call where you bring the problem you are trying to solve and leave with a candid assessment of fit. [link: Scope a partnership -> /organizations]
The technology is going to keep getting better. That was never the variable that decided whether it worked for you.
About the Author:
Justice Jones is the co-founder and chief strategy officer of 24/7 Teach and Naomi-AI. He has worked as a K-12 principal, lead instructional designer, and AI strategist. 24/7 Teach was founded to close the gap between what schools teach and what learners actually need to succeed in college and in their careers. 24/7 Teach has since grown into Customized Training for teens, adults, and organizations, placing more than 600 adults into new jobs and careers at companies including Apple, Google, and JPMorgan Chase, and supporting more than 50 organizations with training and AI integration.