The Control Gap: Enterprise AI organizations have an ownership problem, not a technology problem — and most are governing it by hand

Across enterprises, AI portfolios are expanding far faster than the ability to govern them. Most organizations run a contested field of platforms, each claiming to be the “primary” AI layer; few have automated means to detect a model drifting or failing in production; and the single most-cited barrier to control is the absence of any one owner accountable for AI across the stack. The result is a widening control gap, with ambition and spend racing ahead of visibility, ownership, and cost control, and with autonomous agents already producing real financial and operational failures.

This wave of VentureBeat Pulse Research examines the enterprise AI control gap: how many platforms claim to be the primary AI layer, who actually governs AI behavior across them, whether organizations could detect a model failing in production, what most blocks cross-platform governance, and how the financial and operational control failures of autonomous agents are already surfacing.

The central finding is a control gap — the distance between how aggressively enterprises are expanding AI and how little of it they can see, own, or govern. Just under three-fifths (58%) are net-adding AI initiatives, with “expanding significantly” the largest single posture.

Yet 85% run two or more platforms each claiming to be the “primary” AI layer and only 8% have consolidated to one. Against that contested surface, 40% say they are very confident they would detect a model drifting, behaving unsafely, or failing in production — but only 10% back that confidence with active monitoring and alerting, the rest leaning on manual human review. The machinery to expand AI is running well ahead of the machinery to control it.

The gap is, above all, a question of ownership. Fewer than two in five (38%) say a central team governs AI today, and a fifth (20%) say each platform team governs its own independently; the single most-cited barrier to cross-platform governance is the absence of a single accountable owner (32%), and roughly one in six (17%) say no role holds formal accountability at all. The same vacuum shows up in spend: just under half (49%) name shadow AI (unauthorized agentic pipelines run on corporate cards outside central oversight) as their most severe control failure, and another 25% have been hit by a runaway “infinite loop” agent bill. Enterprises have standardized the ambition well before they have standardized the control.

Methodology

VentureBeat fielded this survey as part of its ongoing Pulse Research series. This instrument focused on the enterprise AI control gap: governance, observability, and cost control across multiple AI platforms. Responses are filtered to organizations with 100 or more employees and, for this cut, exclude the respondents who selected “Other” as their job function, leaving a base of identifiable roles (n=145); all are drawn from a single Q2 2026 (June) wave.

By organization size the sample tilts toward the mid-market and lower-large bands: 100–499 and 500–2,499 employees (23% each) lead, with 10,000–49,999 (22%) and 2,500–9,999 (20%) close behind and 50,000+ at 11%. By role it is senior and technical: consultants and advisors (20%), CIO/CTO/CISO (18%), directors of engineering/IT (14%), product and program managers (13%), and enterprise architects (12%) make up the core. Technology/Software is the largest industry at 41%, followed by Financial Services and Professional Services (12% each) and Healthcare/Life Sciences and Manufacturing/Industrial (10% each).

The findings should be read as a directional signal rather than a precise measurement: the sample is self-selected and is not a probability sample. Where a single share would be fragile on its own, the report leans on the direction and grouping of responses rather than the exact percentage point.
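To give a rough sense of why single shares are fragile at this base size, the sketch below computes the textbook normal-approximation margin of error for a proportion at n=145. This is illustrative only: the formula assumes a simple random sample, which this survey is not, so the figures describe sampling noise alone and say nothing about self-selection bias. The function name and the choice of example shares are assumptions made for the illustration.

```python
import math


def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(share * (1.0 - share) / n)


N = 145  # respondent base stated in the methodology

# Example shares taken from the report; the labels are for readability only.
examples = [
    ("worst case", 0.50),
    ("missing owner", 0.32),
    ("active monitoring", 0.10),
]

for label, share in examples:
    moe = margin_of_error(share, N)
    print(f"{label}: {share:.0%} +/- {moe * 100:.1f} points")
```

Even under these idealized assumptions, the noise on a mid-range share is several percentage points, which is why gaps of two or three points between answer options are best not over-read.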

Finding 1: Expansion is outrunning control

AI portfolios are growing faster than the means to govern them

We asked enterprises to describe how their AI portfolio has changed over the past 12 months. Growth leads — with a meaningful minority deliberately pulling back.

Expansion leads. Combining “expanding significantly” (33%) and “net positive growth” (25%), just under three-fifths of enterprises (58%) are net-adding AI initiatives. Yet a substantial share is easing off deliberately: roughly a quarter (23%) are actively rationalizing — scaling what works and cutting the rest — and another 12% hold their portfolios flat. Only a handful (3%) have paused to get governance in order first.

This is the engine behind every gap that follows: enterprises are accelerating into a landscape they have not yet learned to see or own, and a notable 4% cannot even describe their own portfolio. The ambition documented here is exactly what makes the visibility and ownership shortfalls in Findings 3 and 4 consequential rather than academic.

Finding 2: No single “primary” AI layer — the surface is contested

More than four in five run multiple platforms each claiming primacy

We asked how many enterprise platforms currently claim to be the organization’s “primary” AI layer — the ERP, EHR, ITSM, productivity suite, or data platform each positioning itself as the center of gravity. Almost no one has a single answer.

The defining condition is contested primacy. Adding the two multi-platform bands, 85% of enterprises have at least two platforms each asserting itself as the primary AI layer, and more than a third (36%) describe an open four-way-or-more contest. Only 8% have consolidated to a single layer, and another 6% have not even mapped the question. This is the structural reason governance is hard: there is no agreed center of gravity to govern from. Each platform brings its own AI, its own controls, and its own assumptions — and, as Finding 3 shows, the question of who governs across them increasingly has no settled answer.

Finding 3: Governance is claimed at the center but contested in practice

A central team owns it on paper; in practice, it’s fragmenting

We asked who is actually responsible for governing AI behavior across all of those platforms today, and which function holds primary accountability. The headline answer is reassuring; the detail is not.

On the surface, a central governance function is the leading answer, but fewer than two in five (38%) claim one, well short of a majority. The rest of the distribution undercuts it further: a fifth (21%) say ownership is unclear or contested between teams, a fifth (20%) say each platform team simply governs its own AI independently, and 19% say no one has addressed it at all.

Accountability fragments further on the question of which role actually holds it: CIO/CTO/CISO leads at 27%, followed by a Chief AI Officer or equivalent at 22%, and a striking 17% say no one holds formal accountability yet. Even where a central team is claimed, the named owner is most often the general technology executive rather than a dedicated AI authority. The governance function exists more often as an org-chart aspiration than an operating reality, which is the precondition for the detection gap in Finding 4.

Finding 4: The detection gap — confidence is real but largely manual

Only one in 10 have active monitoring and alerting

We asked how confident enterprises are that they would detect an AI model in production that was drifting, behaving unsafely, or failing to complete tasks correctly. This is the heart of the control gap.

This is the report’s central number. While 40% say they are very confident they would detect a failing model, three-quarters of that confidence rests on manual human review (30% of respondents) rather than automation; just 10% have active monitoring and alerting actually in place.

At the other end, more than a quarter combine the two reactive answers — no systematic visibility (8%) and would hear it from end users first (19%) — meaning they would learn of a production failure after the fact, from the people it affected. The plurality (32%) sit in a hopeful middle, expecting to “catch most issues eventually.” Set against the aggressive expansion of Finding 1, this is the crux of the control gap — enterprises are scaling AI into production faster than they are building automated means to know when it breaks. Confidence is real, but it is largely manual, and automated detection remains the exception.
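As a rough illustration of what “active monitoring and alerting” can mean at its simplest, the sketch below tracks a rolling failure rate over recent model calls and signals when it crosses a threshold. It is a minimal sketch under stated assumptions, not a description of any respondent's tooling: the class name, window size, threshold, and the synthetic outcome stream are all hypothetical, and a real deployment would also need a definition of “failure” and a route to a human or pager.

```python
from collections import deque


class FailureRateMonitor:
    """Rolling-window check: alert when the recent failure rate exceeds a threshold."""

    def __init__(self, window: int = 50, threshold: float = 0.2, min_samples: int = 10):
        self.outcomes = deque(maxlen=window)  # True means the task failed
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, failed: bool) -> bool:
        """Store one outcome and return True if an alert should fire."""
        self.outcomes.append(failed)
        if len(self.outcomes) < self.min_samples:
            return False  # too little data to judge
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate > self.threshold


monitor = FailureRateMonitor(window=20, threshold=0.25, min_samples=10)

# Hypothetical stream: 10 healthy calls, then the model starts failing.
stream = [False] * 10 + [True] * 6
alerts = [monitor.record(failed) for failed in stream]
first_alert = alerts.index(True) if True in alerts else None
print("first alert at call index:", first_alert)
```

The point of the design is that detection does not depend on anyone reading outputs: the alert fires from the data stream itself, which is the difference between the 10% and the 30% described above.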

Finding 5: The missing owner is the biggest barrier

Governance stalls on accountability first, visibility second

We asked enterprises to name their single biggest barrier to governing AI across multiple platforms. The org chart tops the list.

The single missing owner leads at 32%, the most-cited barrier. Vendor opacity (25%) and the lack of tooling or infrastructure to observe across platforms (16%) sit behind, and together these two technical-visibility barriers (41%) outweigh the ownership gap. Leadership deprioritization accounts for another 17%, while a clear lack of talent is rare (5%). Rounding out the picture, another 5% say it isn’t a barrier for them at all — they’ve already solved it.

Read together, the picture is more contested than the headline suggests: enterprises still most often name a missing owner, but a good share locate the obstacle in vendor black boxes and the absence of cross-platform observability.

Asked in a free-text question what one thing they would fix, respondents converged from different directions on the same answer — a single accountable owner, and a control plane that abstracts cost, drift, and model choice away from the end user.

Finding 6: The fine-tuning ROI reckoning

Roughly seven in 10 have little to show for custom model investment

We asked what share of the proprietary foundation models enterprises have invested in fine-tuning over the past 18 months have delivered clear, measurable positive ROI in production today. Most describe a sandbox graveyard — or a deliberate decision to avoid one.

Custom fine-tuning has, for most, not paid off. Combining the three disappointing outcomes — sandbox graveyard, strategic avoidance, and total write-off — roughly seven in ten (73%) either failed to get custom models into productive use or deliberately declined to try, against 27% for whom fine-tuned models are a reliable advantage. The largest single group (45%) remains the graveyard: projects too expensive or complex to maintain, stranded in development. Another quarter (24%) never started — they priced in the downstream maintenance burden and avoided it.

The signal is that many enterprises still treat bespoke model training as a cost trap, which helps explain the pragmatic, buy-and-blend vendor posture in Finding 7.

Finding 7: Vendor posture — hybrid by default, with defection rising

Enterprises blend open and closed models; more are now trimming a vendor

We asked two related questions: whether enterprises are shifting workloads toward open-weight models to escape API costs and lock-in, and which proprietary vendor, if any, they are most likely to phase out over the next year. The answers describe hedging — and a rising willingness to cut.

On open weights, a narrow majority (51%) strike a hybrid balance, with a deliberate closed commitment second at 32% and a hard pivot to self-hosted open models at 16%. The hybrid majority reflects the same instinct visible throughout this survey (keep optionality, avoid being trapped), while the closed group remains candid that the operational overhead of self-hosting still outweighs the savings for them.

On vendor defection, loyalty by inertia no longer leads: Microsoft is now the single most-named target (29%, often citing Copilot/Azure cutbacks in favor of direct model access), narrowly ahead of the 27% who are downsizing no one at all. OpenAI follows at 21% (citing pricing volatility), with Anthropic at 15% and Google at 6%. No single vendor faces a wholesale exodus, but among identifiable roles the balance has tipped from “expanding across all” toward actively trimming at least one provider.

Finding 8: The agentic spending crisis — shadow AI leads the failures

Unauthorized pipelines, not runaway loops, are the top control failure

Finally, we asked what the most severe financial or operational control failure enterprises have experienced as autonomous agents run over longer execution windows. Shadow AI tops the list — and very few have escaped a scare.

The control gap has a price, and it is being paid. Just under half of enterprises (49%) cite shadow AI — unauthorized agentic pipelines spun up on corporate cards outside any central oversight — as their most severe failure, the operational twin of the “no single owner” barrier in Finding 5. Another 25% have been burned by a runaway infinite-loop agent bill, and 6% by an agent that degraded production databases. Only 21% report guarded stability — the minority that has imposed hard token throttling and budget caps at the infrastructure layer and avoided surprises.

Put differently, roughly four in five of these enterprises (79%) have already experienced a real financial or operational control failure from autonomous AI, rather than merely worrying about one. As with detection in Finding 4, the deterministic controls that would prevent these failures exist at only a fraction of organizations.
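The “hard token throttling and budget caps” that the guarded minority describes can be sketched in a few lines. The example below is a minimal illustration, assuming a hypothetical agent loop that never decides it is finished; the class names, limits, and per-step token figure are invented for the sketch, and a production control would sit at the gateway or infrastructure layer rather than inside the agent's own process.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run hits a hard limit."""


class AgentBudget:
    """Deterministic guard: caps both total tokens and total steps for one run."""

    def __init__(self, max_tokens: int, max_steps: int):
        self.max_tokens = max_tokens
        self.max_steps = max_steps
        self.tokens_used = 0
        self.steps = 0

    def charge(self, tokens: int) -> None:
        """Record one step's usage; raise before a limit would be breached."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded(f"step cap of {self.max_steps} reached")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded(f"token cap of {self.max_tokens} reached")
        self.steps += 1
        self.tokens_used += tokens


def run_agent(budget: AgentBudget, tokens_per_step: int) -> str:
    """Stand-in for a runaway agent loop; only the budget can stop it."""
    try:
        while True:
            budget.charge(tokens_per_step)
    except BudgetExceeded as stop:
        return str(stop)


budget = AgentBudget(max_tokens=10_000, max_steps=50)
reason = run_agent(budget, tokens_per_step=1_500)
print(reason, "| steps:", budget.steps, "| tokens:", budget.tokens_used)
```

Because the check runs before each step is charged, the loop is stopped before the cap is exceeded rather than after the bill arrives.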

The bottom line: A control gap that spending cannot close on its own

Organizations with 100 or more employees describe AI programs that are expanding fast and governing slowly. Just under three-fifths are net-adding to their portfolios; more than four in five run a contested field of platforms with no agreed primary layer; and the thing they most often name as their chief obstacle is the absence of a single accountable owner. The visibility to match the ambition is largely manual: only 10% have active monitoring and alerting, and confidence in detecting a failing model rests mostly on human review rather than automation.

The consequences are already concrete rather than hypothetical. Custom fine-tuning has disappointed more often than not, pushing enterprises toward a hedged, hybrid, buy-and-blend model posture; and the autonomous agents now reaching production have produced real control failures for roughly four in five respondents, led by shadow AI running outside any central oversight. This reads as a directional signal rather than a precise measurement — but the direction is consistent across every question: ambition, spend, and deployment are racing ahead of ownership, observability, and cost control. The control gap is not a tooling problem that more spending will close on its own; it is, first, a question of who owns the answer. 


Based on survey responses from 145 qualified enterprise respondents (100+ employees). Sample size is small; data should be treated as directional. Respondents include Directors, VPs, CIOs, CTOs, and Enterprise Architects across Technology, Financial Services, Retail, Healthcare, and other sectors.

The Agentic Reckoning: Enterprise AI organizations have a runtime problem, not a model problem — and most are building the wrong solution

In Q1 2026, VentureBeat’s Pulse Research surfaced the “Governance Mirage”: the gap between the governance org charts enterprises had drawn and the control layers they had actually built. Forty-three percent said a central team owned AI governance; 23% couldn’t agree on who owned it at all; and 31% named vendor opacity as the single biggest obstacle.

This new wave of research asks the next question: Once you’ve admitted the governance problem, what breaks first when you try to fix it? The answer from our respondents is unambiguous. The failure point is not the model. It’s the runtime.

Enterprises are discovering that AI agents built on stateless infrastructure — Python scripts, LangChain chains, ad hoc orchestration — cannot survive the operational realities of production. Container restarts erase context. Token costs breach business cases. Hallucinations in Step 3 compound into catastrophic failures by Step 12. And the majority of engineering teams are spending more time managing this “plumbing” than building the intelligence that was supposed to justify the investment.

What emerges from this survey is a picture of an industry at a critical fork. The organizations that survive the Agentic Reckoning will be those that treat runtime durability as a first-class engineering concern — not an afterthought to be patched with retries and prompting. The ones that don’t will find themselves back where RPA left enterprises a decade ago: a graveyard of clever pilots that couldn’t survive Day Two.
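The durability concern described above, where a container restart erases an agent's context, can be illustrated with a simple checkpointing pattern. This is a minimal sketch under stated assumptions rather than any respondent's or vendor's implementation: the function names, the JSON file used as a store, and the simulated crash are all hypothetical, and a real runtime would typically use a database or a durable-execution service instead of a local file.

```python
import json
import os
import tempfile


def load_state(path: str) -> dict:
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    return {"next_step": 0, "results": []}


def save_state(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write leaves the old checkpoint intact."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def run_workflow(path: str, total_steps: int, crash_at=None) -> dict:
    """Run steps in order, checkpointing after each one."""
    state = load_state(path)
    for step in range(state["next_step"], total_steps):
        if step == crash_at:
            raise RuntimeError("simulated container restart")
        state["results"].append(f"step-{step}-done")
        state["next_step"] = step + 1
        save_state(path, state)
    return state


checkpoint = os.path.join(tempfile.mkdtemp(), "agent_state.json")
try:
    run_workflow(checkpoint, total_steps=5, crash_at=3)
except RuntimeError:
    pass  # the process "dies" after completing steps 0 through 2

final = run_workflow(checkpoint, total_steps=5)
print(final["next_step"], len(final["results"]))
```

The second call picks up at step 3 instead of repeating the earlier steps, which is the property a stateless script lacks.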

Methodology

VentureBeat conducted this survey in May 2026 as part of its ongoing Pulse Research series on agentic AI adoption in the enterprise. Respondents were filtered to organizations with 100 or more employees. The final qualified sample consists of 132 verified, highly qualified technology leaders at the forefront of enterprise AI agent deployment. 

They span:

Directors of AI/Analytics (8%)

Directors of Engineering/IT (16%)

VPs of Data/AI/Analytics (5%)

VPs of Engineering/IT (5%)

CIOs/CTOs/CISOs (15%) 

Product and Program Managers (13%) 

Consultants (9%) 

Software and ML Engineers (9%) 

Enterprise Architects (8%) 

Other (12%)

Industries represented include Technology/Software (42%), Financial Services (20%), Professional Services (8%), Healthcare/Life Sciences (7%), Retail/Consumer (6%), Education (4%), and others.

Given our strict filtering criteria, this cohort offers a focused look at emerging agentic infrastructure trends, though at 132 respondents the findings are best read as directional.

Respondent demographics by company size:

  • Large enterprise (10,000+ employees): 35% of the sample

  • Mid-to-large enterprise (500–9,999 employees): 48% of the sample

  • Growth enterprise (100–499 employees): 17% of the sample

These quantitative findings capture a critical moment in infrastructure evolution and are best synthesized alongside VentureBeat’s Q1 2026 governance reports and our deep-dive practitioner conversations conducted throughout the quarter.

Finding 1: The runtime is the problem

The “spine vs. brain” debate is over

The foundational question of enterprise AI in 2026 is whether agent failures trace back to the model’s reasoning capability — the Brain — or to the runtime infrastructure’s inability to manage state, survive failures, and coordinate execution — the Spine. We asked our respondents directly.

Integration/governance challenges were the biggest problem. But Spine issues were close behind.

However, 17% still say the Brain is the primary failure mode. That’s not a rounding error; it’s a signal. The organizations in this cohort are not disputing the infrastructure problem; they are telling us that the models themselves are not yet reliable enough for the edge cases their workflows are generating. The model-versus-runtime debate is genuinely three-sided. Read together, though, the three answers are not fully in conflict. The Spine camp is struggling with infrastructure and the Gap camp (integration and governance) with governance, while the Brain cohort is struggling with something upstream: reasoning reliability at scale.

This is a significant finding. The frontier model wars — GPT-5 vs. Claude 4.7 vs. Grok — are consuming enormous mindshare in the enterprise technology press. Our respondents are telling us that war is, for now, beside the point. The models are smart enough, but the infrastructure around them is not.

“The models are smart enough, but our stateless infrastructure is too fragile to manage long-running, multi-step agentic processes.”

— Director of Engineering / IT, Financial Services, 10,000–49,999 employees

Finding 2: The DIY tax is eating teams alive

Engineering capacity is being consumed by plumbing, not intelligence

If the Spine is a primary failure mode, what does that cost in practice? We asked respondents what percentage of their team’s weekly engineering capacity is consumed by building and maintaining custom “plumbing” — manual retries, state-persistence, checkpointing — rather than actual agentic logic.

The results reveal a market in two distinct camps, with a dangerous middle.

The arithmetic is stark. Seventy-seven percent of respondents are spending meaningful engineering time on infrastructure overhead. Just 23% — those whose frameworks are handling reliability — have escaped the tax. The distribution is notably flat: the Crisis and Efficiency poles are the same sizes as the middle categories (Trap and Maintenance Tax). This is the signature of a market that has partially addressed the worst failures but has not yet escaped the structural overhead.

The Efficiency Zone respondents are not necessarily in a more sophisticated position. In many cases, they may be on managed platforms that abstract away the durability problem — or they may simply not yet have hit the scale at which stateless architectures begin to fail. The Complexity Trap is often where the Efficiency Zone ends.

There’s a direct business consequence for organizations in the Crisis zone. Every engineering hour spent writing retry logic or debugging a “ghost failure” — a silent API timeout that leaves an agent hanging without a traceback — is an hour not spent on the differentiated logic that was supposed to justify the AI investment in the first place.
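To make the "plumbing" concrete, here is a minimal sketch of the kind of hand-rolled reliability code this finding describes: a wrapper that puts a hard timeout and bounded retries around an agent's tool call, so that a silent stall surfaces as an explicit error. The function name call_with_retries and its parameters are hypothetical and are not drawn from any respondent's codebase; it uses only the Python standard library.

```python
import concurrent.futures
import time


def call_with_retries(fn, *, attempts=3, timeout_s=1.0, backoff_s=0.01):
    """Call fn with a hard timeout and a bounded number of retries.

    A call that hangs past timeout_s is treated like any other failure,
    so a silent stall becomes an explicit error instead of a ghost.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(fn).result(timeout=timeout_s)
        except Exception as exc:  # includes the timeout raised by result()
            last_error = exc
        finally:
            pool.shutdown(wait=False)  # do not block on a hung worker
        if attempt < attempts:
            time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Even this small wrapper has sharp edges: the timed-out call is abandoned rather than cancelled, and nothing here preserves the agent's state across a restart. Every team that writes its own version has to find and maintain those edges, which is where the engineering hours go.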

Finding 3: State amnesia is the production killer

The No. 1 technical obstacle has shifted: Cost and hallucination now lead state failures

When AI agents fail to reach production or scale, what is the primary technical obstacle? We named five candidates, ranging from model hallucination to cost overruns to latency failures.

Hallucination Propagation at 24% compounds silently — reasoning errors in early steps become catastrophic by Step 10. Ghost Failures at 20% are invisible by definition, which means their real prevalence is likely higher than this number suggests.
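The compounding effect can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that each step in a chain is correct with the same probability and that steps fail independently; the 95% per-step figure is a hypothetical input, not a survey result.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes independent steps with identical accuracy, which real
    agent workflows will only approximate.
    """
    return per_step_accuracy ** steps


# Hypothetical 95% per-step accuracy across chains of growing length.
for steps in (1, 3, 10, 12):
    print(steps, round(chain_success_probability(0.95, steps), 3))
```

Under those assumptions, a step that is right 95% of the time yields a fully correct ten-step chain only about 60% of the time, which is why an error rate that looks tolerable in a single call becomes untenable in a long workflow.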

Finding 4: The observability tax falls heaviest on Microsoft

Platform visibility costs are not equally distributed

Our Q1 2026 research identified vendor opacity as the single biggest obstacle to AI governance — ahead of talent gaps, tooling, and budget. That finding pointed to this question: Which vendor ecosystem, in practice, imposes the highest cost to achieve basic production visibility?

We asked respondents which platform requires the most custom telemetry, manual instrumentation, and “logging glue” to achieve visibility into agentic failures.

Microsoft’s position at the top of this ranking is not noise. It is a structural characteristic of the Microsoft agentic ecosystem — the same Azure/Copilot stack that dominates enterprise AI adoption requires the most instrumentation overhead to see inside.

It also reinforces the warning that Brian Gracely, Senior Director at Red Hat, made at VentureBeat’s Boston event in March: that building your control system entirely inside one cloud provider’s toolset means “renting a cage.” The organizations paying the highest observability tax are precisely those most locked into provider-native tooling.

The implication for teams currently evaluating orchestration architecture is direct: observability cost is a real budget item that should appear in any build-vs-buy analysis. A platform that appears cheaper at the API layer may impose substantially higher engineering costs at the telemetry layer.

Finding 5: The hype-reality gap belongs to OpenAI and Microsoft

Agentic coding marketing is significantly ahead of production reliability

We asked respondents a pointed question: Which major platform’s Agentic Coding marketing is the most disconnected from the actual technical reliability and fault-tolerance of its product? Thirty-two percent said they didn’t know, a figure that has held roughly constant across all three waves, suggesting persistent uncertainty is structural, not a sample artifact. Cursor also registered 6% in this wave.

Among those with enough production experience to have a view, Microsoft leads at 45% and OpenAI is second at 22%. The gap is too large to attribute solely to deployment footprint. It suggests that GitHub Copilot Workspaces and AutoGen are generating a specific category of disappointment, probably around the reliability of multi-agent orchestration in production, that accumulates with use. Footprint still plays a part, though: A platform that fewer enterprises are running in production will accumulate fewer credible disappointed practitioners.

The more significant observation is what this gap means for decision-makers evaluating new agentic tooling. The marketing around all major platforms describes agentic autonomy and reliability at a level that production deployments are not yet delivering. The organizations in our survey who have moved beyond pilots are encountering the difference firsthand.

Finding 6: The security mesh is being built from first principles

Enterprises are not waiting for vendors to solve agent security

How are enterprises protecting proprietary research data from AI leakage and prompt-driven exfiltration? The security architecture question is one of the most consequential in agentic AI, because agents — unlike static models — can actively call APIs, traverse file systems, and execute code. The blast radius of a security failure is qualitatively different.

Policy-as-Code is the leading security mechanism, but not by much.

The Non-Human Identity (NHI) and Policy-as-Code approaches are meaningfully different in their security philosophy. NHI is identity-centric: The question it answers is “who is this agent and what is it allowed to touch?” Policy-as-Code is rule-centric: The question it answers is “regardless of what the model decides to do, what hard stops exist at the infrastructure level?”
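The difference between the two philosophies is easier to see side by side. The following is a minimal sketch, not a depiction of any vendor's product: AgentIdentity, Action, HARD_STOPS, and the scope strings are all hypothetical names chosen for illustration. The identity check asks whether this particular agent holds a scope; the policy check applies the same hard stops to every agent, whatever it has been granted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentIdentity:
    """Identity-centric view: who the agent is and which scopes it holds."""
    name: str
    scopes: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Action:
    verb: str      # e.g. "read", "write", "exec"
    resource: str  # e.g. "repo:research", "net:external"


def identity_allows(agent: AgentIdentity, action: Action) -> bool:
    """Identity-style check: has this scope been granted to this agent?"""
    return f"{action.verb}:{action.resource}" in agent.scopes


# Rule-centric view: hard stops that apply to every agent, whatever its scopes.
HARD_STOPS = (
    lambda a: a.verb == "exec" and a.resource.startswith("shell"),
    lambda a: a.resource.startswith("net:external"),
)


def policy_allows(action: Action) -> bool:
    """Policy-style check: does any infrastructure-level rule forbid this?"""
    return not any(rule(action) for rule in HARD_STOPS)


def authorize(agent: AgentIdentity, action: Action) -> bool:
    """Layer both checks: the agent must hold the scope and no rule may object."""
    return identity_allows(agent, action) and policy_allows(action)
```

In this sketch an agent that has been granted a write scope on an external network destination is still blocked by the hard stop, which is the property the rule-centric approach is meant to provide. Layering the two checks is one possible design, not a pattern the survey measured.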

Rough parity across all four mechanisms is the headline finding. This is what market convergence looks like in early motion: No dominant pattern has emerged. That is notable given the maturity of the identity management and policy-as-code disciplines in traditional IT security. The AI security layer is, for now, being built largely from scratch. Egress-Locked Sandboxing, meanwhile, is a relatively new trend in agentic AI deployments, yet it’s already at 22%. As more agents gain terminal-level access to enterprise systems, the cost-benefit of sandboxing is improving.

The Egress-Locked Sandboxing number deserves attention despite its smaller share. Sandboxing untrusted code execution is the most technically intensive of the four approaches, but it is also the most direct defense against prompt injection attacks that try to execute malicious code through agent tooling. As agentic systems gain more terminal-level access — a trend our survey confirms is accelerating — this approach may prove more important than its current adoption rate suggests.

“How do we audit agentic tools that have terminal-level access to our proprietary repos?”

— Composite concern expressed by multiple respondents

Finding 7: The complexity cliff is real, and most are climbing it

The migration away from stateless architectures is underway — but fragmented

The central thesis of the Agentic Reckoning is that stateless Python/LangChain architectures cannot survive the complexity cliff, the point at which multi-step, long-running agent workflows begin failing at rates that make production deployment untenable. We asked respondents directly: Are you migrating toward durable execution frameworks to solve for state loss?

The answers reveal a market in transition, with meaningful disagreement about the right destination.

The 20% committed to stateless architectures — attempting to solve a structural durability problem through better prompting — are the cohort most likely to encounter State Amnesia and Ghost Failures as their workloads scale. It’s essentially the same trap that RPA teams fell into a decade ago, when brittle process automations were patched with increasingly elaborate rule sets rather than re-architected on more resilient foundations.

The Stateless Commitment cohort deserves a reinterpretation. These teams are not all naive: some are building on managed platforms that genuinely abstract state management. But a portion is patching structural fragility with prompting improvements, and the Ghost Failures data in Finding 3 suggests this approach may be encountering its ceiling.

The combined 59% who are either in Active Migration or in Governance-First Evaluation represent the market’s leading edge — organizations that have recognized the architectural problem and are investing to solve it structurally.
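For readers less familiar with the term, the core idea behind durable execution can be sketched in a few lines: record each completed step in persistent storage so that a restarted process resumes where it left off instead of starting over. The run_durably function and its JSON journal below are hypothetical and deliberately simplified; production frameworks add much more, such as handling for concurrency, timers, and step versioning.

```python
import json
import os


def run_durably(steps, journal_path):
    """Run named steps in order, journaling each result to disk.

    steps is a list of (name, fn) pairs; each fn receives the dict of
    results recorded so far. After a crash, calling this again with the
    same journal skips the steps that already completed.
    """
    done = {}
    if os.path.exists(journal_path):
        with open(journal_path) as fh:
            done = json.load(fh)
    for name, fn in steps:
        if name in done:
            continue  # completed before the restart; reuse the recorded result
        done[name] = fn(done)
        # Write to a temp file and rename it, so a crash mid-write
        # never leaves a half-written journal behind.
        tmp_path = journal_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(done, fh)
        os.replace(tmp_path, journal_path)
    return done
```

A stateless script holds the equivalent of done only in memory, so a container restart erases it. The structural fix is to move that state out of the process, which no amount of prompting can do.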

Finding 8: The “polyglot orchestration” lead is narrow — the field is fragmented

Architectural conviction is spread across multiple bets

What is the long-term architectural philosophy winning enterprises’ strategic investment? We offered four options representing the major bets available in the current market.

The Polyglot Bet’s lead suggests that enterprises see advantages in a flexible approach: using model-driven architectures where non-deterministic reasoning works well, and deterministic structures and pipelines where accuracy and mission-critical execution are at stake.

This has direct competitive implications for the frontier labs and cloud providers. The cohort saying they use a Cloud-Native Managed Stack is significant. This likely reflects the enterprise reality that Azure OpenAI Service and AWS Bedrock deployments come with built-in organizational gravity: procurement relationships, security approvals, and existing data pipelines. The Independent Durable Runtime bet at 16% signals that a cohort of teams has rejected both cloud lock-in and frontier lab dependency in favor of full architectural sovereignty.

The Polyglot result also helps explain why the observability and governance problems described in this survey are so persistent. When your architecture deliberately spans multiple orchestration layers and multiple providers, no single vendor’s telemetry gives you the full picture. The “Dynatrace for AI” — the unified observability platform called for by Mass General Brigham’s CTO Nallan Sriraman at the VentureBeat Boston event — becomes not just desirable but structurally necessary.

“Enterprises trust no single provider enough to give them full control, yet they lack the engineering capacity to build entirely from scratch.” 

— Survey respondent

Finding 9: User acceptance rate is the emerging production standard

The market is settling on a human-trust metric as its primary A-SLA

What metrics are enterprises actually using to determine whether an AI agent is ready for production? We asked respondents to identify their primary Agentic SLA (A-SLA) indicator — the number that, above all others, tells them whether an agent can ship.

User Acceptance Rate as the dominant production metric is significant because it is a human-trust measure, not a technical performance measure. It does not ask whether the agent ran fast or maintained state. It asks whether a human who reviewed its output chose to accept it. This is, in effect, a field-level Turing test applied at the action level. 
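As a metric, User Acceptance Rate is simple to compute, which may be part of its appeal. The sketch below assumes each human review is recorded as a plain accept-or-reject flag; the function names and the 0.9 shipping threshold are hypothetical placeholders, not figures reported by respondents.

```python
def user_acceptance_rate(reviews):
    """Share of human-reviewed agent outputs that the reviewer accepted.

    reviews is a sequence of booleans: True for accepted, False for rejected.
    """
    if not reviews:
        raise ValueError("no reviewed outputs yet")
    return sum(1 for accepted in reviews if accepted) / len(reviews)


def ready_to_ship(reviews, threshold=0.9):
    """Gate a release on acceptance rate; the 0.9 default is a placeholder."""
    return user_acceptance_rate(reviews) >= threshold
```

Note what the calculation leaves out: it says nothing about why an output was rejected, or whether an accepted output was actually correct. It measures trust, not truth.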

The persistence of UAR as the leading metric reflects the reality of where most enterprise agentic deployments still sit: in a human-in-the-loop posture, where agent actions require human review before execution. That is a rational response to the Hallucination Propagation and Ghost Failures described earlier in this survey. Organizations that have not yet solved runtime durability are, sensibly, keeping humans in the loop, and across the 132 respondents there is no evidence this is changing.

Context Fidelity’s position at 30% is the most significant finding. It tracks directly with the Active Migration data in Finding 7: As more teams move into durable execution frameworks, the 48-hour+ memory problem becomes their primary production concern. Teams that have solved State Amnesia are now focused on whether their agent can remember what it was doing yesterday. Latency Jitter’s collapse from 25% to 11% tells the complementary story: raw speed is no longer the primary anxiety. Correctness and durability have taken its place.

The bottom line: The reckoning is runtime, not reasoning

The data tells a consistent story: There’s a runtime deficit for agents. Enterprises are spending more time on infrastructure plumbing than on agent intelligence, and State Amnesia is still claiming production deployments. But fault lines are visible. The ROI Ceiling has overtaken State Amnesia as the leading production killer — which means the infrastructure problem is no longer purely a technical one. Token economics and orchestration overhead are now consuming enough business value that project sponsors are making the kill decision before engineering teams can solve the durability problem. Hallucination Propagation remains a big problem. The Brain vote in Finding 1 remains significant. And the Polyglot lead is fragile, with varied architectures well represented.

The models are, by most respondents’ own assessment, smart enough — but 17% disagree. What is not yet smart enough is the infrastructure surrounding them: the state management, the fault-tolerance, the observability, the identity governance, and the deterministic execution layer that turns a model’s judgment into something an enterprise can stake its operations on.

The 39% making the Polyglot Bet represent the current leading edge of enterprise architectural thinking. They are building systems where the model’s intelligence is preserved and leveraged, but where the execution layer — the Spine — is deterministic, auditable, and durable by design. They are not waiting for a frontier lab to solve this for them. They are not betting that better prompting will patch infrastructure fragility. They are building the control plane.

The organizations still committed to stateless architectures — still trusting that manual retries and clever prompting can substitute for durable execution — are the ones most likely to contribute to the next wave of this data. Ghost Failures are a primary obstacle. The pattern is familiar: Early adopters diagnose the problem architecturally, migrate to durable runtimes, and escape the failure mode. Late movers inherit it. The Complexity Cliff is not theoretical. It is the wall that most current agentic architectures are already climbing toward.

The reckoning is runtime and economics, not reasoning.


Based on survey responses from 132 qualified enterprise respondents (100+ employees). Sample size is small; data should be treated as directional. Respondents include Directors, VPs, CIOs, CTOs, and Enterprise Architects across Technology, Financial Services, Retail, Healthcare, and other sectors.
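To put "directional" in rough numerical terms, the sketch below computes a conventional 95% margin of error for a reported proportion. It assumes a simple random sample and the normal approximation, neither of which a filtered, opt-in survey strictly satisfies, so the true uncertainty is likely larger than these figures suggest. The function name is hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n responses."""
    return z * math.sqrt(p * (1 - p) / n)


# Worst case (p = 0.5) and the 17% Brain share, both at n = 132.
for share in (0.5, 0.17):
    print(share, round(margin_of_error(share, 132) * 100, 1), "points")
```

Under those assumptions, a result near 50% carries a margin of roughly 8.5 percentage points at this sample size, and the 17% Brain share roughly 6.4 points. Gaps of only a few points between answer options should therefore not be read as meaningful rankings.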