The three roles that decide whether AI works in your organization

In 2025, MIT's Project NANDA studied 300 enterprise AI deployments and reported a number that should stop every leader before they sign another AI contract. Ninety-five percent of corporate generative AI pilots delivered no measurable impact on profit and loss. Only five percent created real value. The lead author, Aditya Challapally, was direct about the cause. The failures were not about the quality of the models. They were about how organizations tried to use them.

That distinction is the whole game. Most organizations treat AI adoption as a buying decision (which tool to use) followed by a training decision (how to teach staff to use it). Both decisions age out fast. The organizations that capture durable value do something different. They build three internal capabilities, exploring, evaluating, and integrating, so the organization keeps absorbing new tools as fast as the old ones go obsolete. This piece lays out those three roles, what each does, and how we run them at 24/7 Teach.

The tool is the smallest part of the problem

Boston Consulting Group has a useful rule of thumb from its work across enterprise AI programs. They call it 10-20-70. Roughly 10 percent of AI value comes from the algorithms and models, 20 percent from technology and data infrastructure, and 70 percent from people and process. Most organizations invert it. They spend their attention on the 10 percent, the model, because the model is the part every vendor pitch leads with and the part that produces a clean number on a slide.

The MIT findings point in the same direction from a different angle. Pilots stalled not because the models were weak, but because of brittle workflows, a lack of contextual learning, and misalignment with how work actually happens day-to-day. MIT also found that buying from specialized vendors and building partnerships succeeded roughly two-thirds of the time, while internal build-it-yourself efforts succeeded about a third as often. The lesson is not "never build." The lesson is that integration, not the model, is the constraint.

There is a deeper reason the model is the smallest part. Frontier models are converging. When your competitor can call the same model through the same interface you can, the model stops being a source of advantage. Whatever edge exists moves to the 70 percent: the workflows you redesign, the judgment your people apply, and the speed at which your organization can put a new capability to work.

Agentic AI makes this sharper, not softer. In June 2025, Gartner predicted that more than 40 percent of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. Gartner analyst Anushree Verma described much of the current market as "agent washing": older products rebranded as agents without the underlying capability. A November 2025 study from MIT Sloan Management Review and BCG found agentic AI had reached 35 percent adoption in two years, faster than any prior AI wave, with most deployments still stuck in pilot. Adoption is easy. Value is not.

The half-life of an AI tool is shorter than your training cycle

Here is the trap that catches well-run organizations. You select a tool. You build a training program around it. By the time the training is built, reviewed, scheduled, and delivered, the tool has shipped three feature updates, a competitor has leapfrogged it, or a model release has made half of what you taught obsolete.

If your team's competence is anchored to a specific tool, that competence decays at the speed of the tool. And right now that speed is brutal. The skill that does not decay is the ability to learn a new tool quickly, evaluate whether it is worth adopting, and fold it into real work. That is a different thing to train for, and it is the thing that lasts.

This reframes what AI training should even mean. The training your team needs is not a tour of one product's menus. It is the habit of evaluating and integrating whatever shows up next. The vendor will sell you the first kind. The second kind you have to build.

The three roles that create durable value

When we map what actually distinguishes an organization that compounds its AI value from one that buys tools and stalls, it comes down to three functions. Call them roles, not necessarily job titles. In a large organization they may be three teams. In a small one, they may be three hats on the same person. What matters is that all three are owned and none is skipped.

The Explorer. Someone whose standing job is to scan the landscape for new tools and capabilities and bring the promising ones to the table. Not to adopt them. To surface them. Without an Explorer, an organization only learns about a new capability when a vendor cold-calls or a competitor's win shows up in the press, which is to say, too late.

The Evaluator. Someone who takes what the Explorer surfaces and runs it through a real test before anything scales. A useful evaluation answers five questions.

What does it cost per usable result, not per call? A pricier tool that gets the job right on the first pass can be cheaper than a cheap one you re-run three times.
How is it different from the tool we already use, if at all?
Does it make our work faster or higher quality at the point of production, measured against a real baseline rather than a gut feeling?
Does it give our people time back to do the work only people can do?
What is our exposure if the tool changes its pricing or disappears, since the market churns fast enough that durability is now part of the cost?

A tool that passes the demo but fails on cost per result, or quietly duplicates something you already own, gets stopped here. This is also where the unglamorous governance work lives: privacy, security review, and for schools, FERPA and student data.
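The first question, cost per usable result rather than cost per call, comes down to simple arithmetic. What follows is a minimal sketch, not a description of any particular tool's pricing. It assumes each call independently produces a usable result at a fixed rate, so the expected number of calls per usable result is one divided by that rate. The function name, the prices, and the rates are all hypothetical, chosen only to illustrate the comparison.

```python
def cost_per_usable_result(cost_per_call: float, usable_rate: float) -> float:
    """Expected spend to get one usable result.

    Assumption: each call independently yields a usable result with
    probability `usable_rate`, so the expected number of calls needed
    is 1 / usable_rate.
    """
    if not 0 < usable_rate <= 1:
        raise ValueError("usable_rate must be greater than 0 and at most 1")
    return cost_per_call / usable_rate


# Hypothetical numbers for illustration only.
# A pricier tool that is usually right on the first pass...
pricier = cost_per_usable_result(cost_per_call=0.05, usable_rate=0.95)
# ...versus a cheap tool that needs about three runs per usable result.
cheaper = cost_per_usable_result(cost_per_call=0.02, usable_rate=1 / 3)

print(f"pricier tool: ${pricier:.4f} per usable result")
print(f"cheaper tool: ${cheaper:.4f} per usable result")
```

With these made-up inputs, the tool with the higher per-call price comes out cheaper per usable result. A fuller version would also count the staff time spent reviewing and re-running, which this sketch leaves out.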

The Integrator. This is the role most organizations miss entirely, and it is where the MIT finding about workflow misalignment gets solved. A tool that the Evaluator approved is not finished. It has to be adapted to the specific way each team works, because a capability that is a strength for one team can be a liability for another. The Integrator's job is to find where a tool genuinely fits, adapt it for that context, and hand it off so the team can run it without the Integrator standing over them.

The handoff matters. The goal is not permanent dependency on whoever introduced the tool. The goal is shared capability. When the Integrator does the job well, the team owns the workflow afterward, and the organization's capacity has actually grown rather than been rented.

Note: AI assisted with structuring our research. The experiments, observations, and analysis are drawn from our own organizational practices.

What this looks like when you actually run it

We did not arrive at these three roles from a whiteboard. We have been building adaptive, data-driven systems into our operations since 2017, starting with adaptive scoring and feedback in our LMS. We moved early into generative AI as well, using a copywriting tool called Jasper before most people had heard of ChatGPT. Living through more than one wave is what taught us these roles, because they are what let an organization absorb the next wave rather than start over.

The work now spans admissions, content production, and quality review, under one principle: human-led, AI-facilitated. The machine does the work that scales. People keep the judgment, the relationships, and the final call. We built the three roles because we had to, and the lessons below came from making the mistakes first, so our partners do not have to repeat them.

A concrete example. We built an internal voice agent that supports our admissions team. When a prospective family interacts with it, the agent captures a transcript and sends a structured admissions plan to the right person in Slack. That did not come from buying an off-the-shelf admissions product. It came from the Explorer surfacing the capability, the Evaluator confirming that the cost made sense and that it gave our admissions people their time back for the conversations that genuinely need a human, and the Integrator wiring it into the workflow our team already uses rather than asking them to log into one more dashboard.

The Evaluator discipline is most visible in how we choose which model runs a given task. When we needed to generate practice math questions at volume, we did not pick a model by reputation. We ran the same set of 69 questions across four of the leading models available at the time, scored each against 18 state standards, and compared pass rates. One model cleared 62 of 69. Another cleared 48. The gap was not subtle, and it would have been invisible if we had trusted the marketing.
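The comparison is easy to reproduce as arithmetic. The sketch below uses only the two results quoted above, 62 and 48 out of 69. The labels model_a and model_b are placeholders, not the models we tested, and the other two models are left out because their counts are not given here.

```python
TOTAL_QUESTIONS = 69

# Questions passed, from the two results quoted in the text.
# "model_a" and "model_b" are placeholder labels.
passed = {"model_a": 62, "model_b": 48}

pass_rates = {name: count / TOTAL_QUESTIONS for name, count in passed.items()}

# Print models from highest pass rate to lowest.
for name, rate in sorted(pass_rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {passed[name]}/{TOTAL_QUESTIONS} = {rate:.1%}")

# Gap between the two models, in percentage points.
gap_points = (pass_rates["model_a"] - pass_rates["model_b"]) * 100
print(f"gap: {gap_points:.1f} percentage points")
```

That works out to roughly 90 percent against roughly 70 percent, a gap of about 20 percentage points on the same question set.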

Quality was only half the decision. The model that scored highest in generation also cost more per call, so we matched the model to the job. Heavy generation work, where accuracy compounds, goes to the stronger model. Routine evaluation work, where a lighter model is plenty, does not. Choosing the right model for each task, rather than the most powerful model for every task, is what keeps usage costs sane once you are running thousands of calls a week. For disclosure, we build with Anthropic's Claude across much of our stack, including this site, so we evaluate its costs and limits the same way we ask our partners to evaluate theirs.
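One way to make that matching explicit is a small routing table that maps each kind of task to a model tier. This is a sketch of the idea, not our production code. The tier names, the task names, and the choice to send unlisted tasks to the lighter tier are all assumptions made for illustration.

```python
from enum import Enum


class Tier(Enum):
    STRONG = "strong"  # higher accuracy, higher cost per call
    LIGHT = "light"    # good enough for routine work, lower cost per call


# Hypothetical task names. Each task is assigned a tier deliberately,
# based on an evaluation, rather than by defaulting to the biggest model.
ROUTES = {
    "generate_questions": Tier.STRONG,  # heavy generation, accuracy compounds
    "evaluate_answers": Tier.LIGHT,     # routine evaluation
}


def pick_tier(task: str) -> Tier:
    """Return the tier for a task.

    Assumption: tasks without an entry fall back to the light tier, so
    spending more per call always requires an explicit routing decision.
    """
    return ROUTES.get(task, Tier.LIGHT)


print(pick_tier("generate_questions").value)
print(pick_tier("evaluate_answers").value)
```

Keeping the table in one place also gives the Evaluator a single spot to revisit when a new model ships or pricing changes.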

Then there is the texture you only learn by running it. Two of our pipelines used the same underlying model, and one returned cleanly formatted question images while the other returned them cropped. Same model, different result, because the integration around it differed. No vendor slide warns you about that, and no amount of reading replaces it. It is also why we keep the Evaluator and Integrator roles live rather than one-time. New models ship constantly, pricing shifts, and a choice that was right last quarter is worth re-checking this one.

The Integrator role earned its place through a mistake we learned from. One of our team members built an internal process for debugging and quality review that worked beautifully for her own workflow. When we tried to hand it to the rest of the team as written, it did not fit how they worked, and it would have slowed them down rather than speeding them up. The fix was not to abandon it. It was to adapt it to each person's actual workflow before rolling it out. That is the Integrator's job in one sentence. A tool is only integrated when it fits the work, not when it merely functions.

This is also the model we use with the organizations we partner with. We do not hand a district or a company an off-the-shelf training and walk away. We co-design the work with your team and hand it off so you run it yourselves. Across our organization's engagements, the pattern holds. In one partnership, a small charter school network met 96 percent of its stated goals and saved roughly $28,000 a year, not because we handed them a model, but because we built the capability to use it into their team. We have supported more than 50 organizations this way. 24/7 Teach also operates Naomi-AI, an AI platform used in K-8 classrooms, and the operating model described here is the same one we run there.

Isn't this just change management?

Partly, yes, and that is the point. The single most common reason AI initiatives fail is that organizations treat change management as a separate problem to handle after the tool is installed, rather than as the core of the work. BCG's research found that peer learning is the primary way people actually pick up AI skills, with most respondents naming colleagues, not formal training, as their main channel. The three roles are change management, given a shape that specifically matches the speed of AI. The Explorer keeps you from falling behind. The Evaluator keeps you from spending money on hype. The Integrator is the peer-learning engine that turns an approved tool into a team capability.

The fair objection is the small-organization version: "We cannot staff three roles." You do not need to. These are functions, not headcount. A twelve-person nonprofit can have one operations lead who wears all three hats on a rotating basis, as long as that person is honest about which hat is on and does not skip the Evaluator step because the demo looked good. The failure mode is not having too few people. It is having no one who owns evaluation and integration, so every tool decision defaults to whoever is loudest about the newest tool.

What to do before you buy another AI tool

If you take one thing from the MIT number, let it be this. The 95 percent did not fail because they picked the wrong model. They failed because no one owned the integration work. Here is where to start this quarter.

Name the three roles, even informally. Decide who your Explorer, Evaluator, and Integrator are. At a small organization, write down which hat gets worn when. The act of naming them surfaces the gap immediately, because most organizations discover they have an Explorer, since everyone forwards AI articles, and no Evaluator or Integrator.
Write your evaluation rubric before your next vendor call. Five questions: cost per usable result, difference from what you already have, production speed or quality gain against a baseline, time given back to people, and your exposure if the tool changes or disappears. A tool that cannot answer all five does not advance.
Pick one workflow, not one tool. Choose a single, real, recurring task where a person is doing something a tool could assist with, and integrate there first. MIT found the highest returns in unglamorous back-office work, not the sales and marketing pilots that attract most of the budget. Start where the work is, not where the demo is shiniest.
Plan the handoff from day one. Decide in advance who owns the workflow after integration, so the capability resides with the team rather than the person who introduced it.
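The rubric in the second step can be written down as a simple gate before the next vendor call. The sketch below is one possible shape for it, assuming a written answer is recorded for each of the five questions. The class name and field names are hypothetical, and the rule it encodes is the one stated above: a tool with any unanswered question does not advance.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToolEvaluation:
    """One written answer per rubric question. None or blank means unanswered."""
    cost_per_usable_result: Optional[str] = None
    difference_from_current_tool: Optional[str] = None
    gain_against_baseline: Optional[str] = None
    time_returned_to_people: Optional[str] = None
    exposure_if_tool_changes: Optional[str] = None

    def unanswered(self) -> list[str]:
        """Names of rubric questions that still have no answer."""
        return [
            f.name
            for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]

    def advances(self) -> bool:
        """A tool advances only when all five questions are answered."""
        return not self.unanswered()


# Hypothetical example: a promising demo with only one question answered.
draft = ToolEvaluation(gain_against_baseline="Felt faster in the demo")
print(draft.advances())
print(draft.unanswered())
```

A gate like this only checks that each question has an answer. Judging whether the answers are any good is still the Evaluator's job.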

The organizations that win the next few years will not be the ones with the best tools. Everyone will have access to the same tools. They will be the ones who built the capability to keep absorbing new ones. That capability is buildable, and it is the work we do with our organization partners.

If you are scoping out how AI should actually run within your school network or your company, we will tell you on a call whether we are a fit; if we are not, we will point you somewhere better. Scope a partnership with 24/7 Teach →

About the author

Justice Jones is an instructional designer, AI strategist, and former K-12 principal. He co-founded 24/7 Teach to close the gap between what schools teach and what teens and professionals actually need to succeed, and he serves as CSO of Naomi-AI. 24/7 Teach has supported more than 50 organizations and placed 600 or more adults in new careers, and its graduates have collectively earned more than $5.7 million in scholarships.