Anthropic finally beat OpenAI in business AI adoption — but 3 big threats could erase its lead

For the first time since the AI race began, more American businesses are paying for Anthropic’s Claude than for OpenAI’s ChatGPT.

Anthropic’s adoption rose 3.8 percentage points in April to 34.4% of businesses, according to the May 2026 release of the Ramp AI Index. OpenAI’s adoption fell 2.9 percentage points to 32.3%. Overall AI adoption among businesses rose 0.2 percentage points to 50.6%.

The crossover, published Tuesday by Ramp (the corporate card and finance automation platform that tracks spending patterns across more than 50,000 U.S. businesses), marks the culmination of a yearlong surge by Anthropic that few in the industry predicted. Anthropic has more than quadrupled its business adoption over the past year, while OpenAI’s adoption grew by only 0.3 percentage points.

But the same report that crowns a new market leader also warns that Anthropic’s position may be more fragile than it appears — threatened by escalating costs, compute constraints, and the very token-based pricing model that has fueled the company’s extraordinary revenue growth.

How Anthropic went from a niche player to the most popular AI model in corporate America

To appreciate the scale of the shift, consider where the two companies stood a year ago. In April 2025, OpenAI commanded roughly 32% of business AI adoption according to Ramp’s underlying data, while Anthropic stood at under 8%. OpenAI had built an early, commanding lead as the consumer default — ChatGPT was where most people first encountered AI, and that momentum carried into corporate purchasing decisions.

Anthropic’s path was different. The company was popular early on with the earliest adopters — engineers, AI evangelists, the technical vanguard inside organizations. As Ramp lead economist Ara Kharazian noted in the March 2026 edition of the index, Anthropic leveraged that early-adopter base to go mainstream. By February, Anthropic was winning about 70% of head-to-head matchups against OpenAI among businesses purchasing AI services for the first time — a complete reversal of the trends observed in 2025.

The trajectory is visible in Ramp’s underlying data. The company’s adoption figures show Anthropic climbing from 0.03% of businesses in June 2023 to 7.94% by April 2025, then rocketing to 34.44% by April 2026.
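The gap between a change in percentage points and a growth multiple is easy to blur in figures like these. The short sketch below works through the two April readings quoted above; the variable and function names are illustrative only and are not taken from Ramp.

```python
# Adoption shares (percent of businesses) as quoted above from Ramp's data.
anthropic_apr_2025 = 7.94
anthropic_apr_2026 = 34.44


def point_change(old: float, new: float) -> float:
    """Absolute change, measured in percentage points."""
    return new - old


def growth_multiple(old: float, new: float) -> float:
    """How many times larger the new share is than the old one."""
    return new / old


pp = point_change(anthropic_apr_2025, anthropic_apr_2026)
multiple = growth_multiple(anthropic_apr_2025, anthropic_apr_2026)

print(f"Change: {pp:.2f} percentage points")  # Change: 26.50 percentage points
print(f"Multiple: {multiple:.2f}x")           # Multiple: 4.34x
```

On these numbers, a gain of 26.5 percentage points corresponds to a share a little more than four times its year-earlier level.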

OpenAI, meanwhile, peaked near 36.5% in mid-2025 and has been slowly declining since. The engine behind much of Anthropic’s growth is a single product: Claude Code, the company’s agentic AI coding tool, which has become the fastest-growing product in Anthropic’s history. A recent analysis estimated that 4% of all public GitHub commits worldwide were being authored by Claude Code, double the share from just one month prior.

Business Insider reported in April that the crossover was imminent. A Ramp spokesperson told the outlet that “at the current pace, Anthropic is on track to surpass OpenAI within the next two months,” noting that it already led “among early adopters, including VC-backed companies, and in key sectors like software, finance, and professional services.” That prediction proved accurate almost to the day.

AI adoption reaches a workplace tipping point, but the productivity revolution hasn’t arrived yet

The Ramp data on business spending finds its complement in a separate workforce survey that underscores just how deeply AI has embedded itself into American economic life. For the first time in Gallup’s measurement, half of employed American adults say they use AI in their role at least a few times a year, up from 46% the previous quarter. Frequent use is also increasing, with 13% of employees now saying they use AI daily and 28% reporting they use it a few times a week or more.

But the Gallup data, based on a February 2026 survey of 23,717 U.S. employees, also suggests that the benefits of AI remain concentrated at the level of individual tasks rather than organizational transformation. Only about one in 10 employees in AI-adopting organizations strongly agree that artificial intelligence has transformed how work gets done. That finding is consistent with firm-level studies across the U.S., U.K., Germany, and Australia showing chief executives reporting minimal broad productivity effects from AI over the past three years — a notable gap between the hype cycle and operational reality.

The Ramp methodology captures a different but complementary signal. Where Gallup asks employees whether they use AI, Ramp measures whether their employer is writing checks for it. The index counts corporate card and invoice-based payments, identifying firms as AI adopters if they have a positive transaction amount for an AI product or service in a given month. As Ramp’s methodology page notes, its results likely underestimate actual adoption because many employees use free AI tools or personal accounts for work tasks. Taken together, the two datasets paint a picture of AI that is ubiquitous in the American workplace but has not yet delivered on its promise to fundamentally transform how organizations operate.
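The adopter rule described above can be sketched in a few lines. This is a minimal illustration, not Ramp's implementation: the transaction records, the category labels, and the choice of denominator (every business with any transaction in the month) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical transactions: (business_id, month, vendor_category, amount_usd).
transactions = [
    ("biz-1", "2026-04", "ai", 200.0),
    ("biz-1", "2026-04", "travel", 950.0),
    ("biz-2", "2026-04", "ai", 0.0),        # zero amount: not a positive payment
    ("biz-3", "2026-04", "software", 49.0),
    ("biz-4", "2026-04", "ai", 20.0),
    ("biz-4", "2026-03", "ai", 20.0),       # different month: ignored for April
]


def adoption_rate(transactions, month):
    """Share of businesses active in `month` with positive AI spend that month."""
    ai_spend = defaultdict(float)
    active = set()
    for business, tx_month, category, amount in transactions:
        if tx_month != month:
            continue
        active.add(business)  # assumed denominator: any activity in the month
        if category == "ai":
            ai_spend[business] += amount
    adopters = {b for b, total in ai_spend.items() if total > 0}
    return len(adopters) / len(active) if active else 0.0


rate = adoption_rate(transactions, "2026-04")
print(f"{rate:.0%}")  # 50%
```

A rule of this kind only sees paid usage, which is why free tools and personal accounts would not register in the count.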

Why Anthropic’s biggest threat might be the success of its own best-selling product

Perhaps the most striking aspect of Ramp’s analysis is its refusal to declare a lasting winner. Kharazian identified three specific risks facing Anthropic even as the company takes the lead — and the most serious one stems from a structural tension baked into the company’s business model.

Anthropic makes more money when businesses purchase more tokens, meaning the company is incentivized to drive users toward more expensive models even when cheaper ones are sufficient. This dynamic is already creating budget crises at major enterprises. Uber’s CTO revealed that the company spent its entire 2026 AI budget in just four months, largely on Claude Code and Cursor, with engineers reporting monthly API costs between $500 and $2,000 per person. Adoption jumped from 32% to 84% of Uber engineers in a matter of months, and about 70% of committed code at Uber now comes from AI. The Uber case is a microcosm of a broader tension: Claude Code works — perhaps too well. When a productivity tool becomes so valuable that an organization’s $3.4 billion R&D operation can’t afford to keep the lights on, the resulting cost scrutiny could push enterprises toward cheaper alternatives.

At the same time, quality and reliability have suffered under the weight of demand. In recent weeks, users have experienced frequent outages, rate limits, and increasing dissatisfaction with Claude’s results. Anthropic has responded by resetting usage limits and by striking a compute deal with SpaceX to access more than 300 megawatts of new capacity at the Colossus 1 data center in Memphis. CEO Dario Amodei said the company saw “80x growth per year in revenue and usage” for Q1 2026, when it had only planned for 10x. And Ramp economist Rafael Hajjar found that Anthropic’s latest model update would triple token costs for any prompt that includes an image — a change that seems at odds with the company’s already-acute cost and compute problems.

Open-source models and OpenAI’s Codex could quickly erode Anthropic’s narrow lead

The Ramp report points to competitive dynamics that could reshape the market within months. Some of the fastest-growing vendors on Ramp’s platform in April were AI inference platforms that give companies access to cheap, open-source models — offering enterprises a way to get “good enough” AI at a fraction of the cost, particularly for routine tasks that don’t require frontier model capabilities.

OpenAI’s Codex presents an even more direct threat. By most measures, it is a strong product that does many of the same tasks as Claude Code at a lower price point — and the switching cost between models is minimal. Uber itself is already testing Codex as a hedge, a move that could preview a broader pattern across enterprise tech. OpenAI also retains enormous structural advantages. ChatGPT reached 900 million weekly active users by March 2026, dwarfing Claude’s consumer footprint. Enterprise revenue now makes up more than 40% of OpenAI’s total and is on track to reach parity with consumer revenue by the end of 2026. And OpenAI’s $122 billion funding round, closed in March at an $852 billion valuation, gives it vast resources to compete on pricing, capacity, and product development.

Anthropic is not standing still on distribution. AWS recently launched Claude Platform on AWS, giving enterprises direct access to Anthropic’s native platform through existing AWS credentials, billing, and access controls — a move that lowers procurement friction considerably. Anthropic has also announced compute agreements totaling billions of dollars with Amazon, Google, Microsoft, Nvidia, and others, though much of that capacity won’t come online until late 2026 or 2027. Anthropic is reportedly in talks to raise another $50 billion at a valuation approaching $900 billion.

The unlikely reason businesses are choosing Claude over cheaper alternatives

Beneath the spending data and market share charts lies a more intriguing question: Why are businesses choosing Anthropic over a cheaper, comparably performing alternative?

Kharazian explored this in his March analysis. Claude Code and OpenAI’s Codex are roughly comparable products — on certain benchmarks, Codex is arguably better, and it’s also cheaper. Yet Anthropic can’t meet its own demand. Every plan still has usage limits and rate caps. The company is actively turning away revenue because it doesn’t have the compute to serve it. Despite charging more for roughly equivalent performance, Anthropic’s demand is growing.

Kharazian suggested the answer might be cultural. Earlier this year, Anthropic refused to agree to the Pentagon’s terms of use for Claude, resulting in a blacklisting by the Department of Defense. OpenAI stepped in to offer its services in Anthropic’s place. In the wake of that episode, users rallied around Anthropic, and Claude temporarily surpassed ChatGPT on the App Store. The question, Kharazian wrote, is whether choosing an AI model is becoming less like an enterprise procurement decision and “more like the green bubble/blue bubble distinction in iMessage: a signal of identity as much as a choice of technology.”

That observation may sound absurd for an enterprise software category. But Ramp’s data tells a story that pure economics cannot fully explain. In a market where the products perform similarly, where the cheaper option is arguably better on benchmarks, and where switching costs are negligible, something other than spreadsheet logic is driving the biggest shift in AI market share since the industry began. As Kharazian noted in his report: “We have never seen a software industry as dynamic, where newcomers can disrupt market leaders in a matter of months, and where the pace of development overrides the typical forces of vendor stickiness.”

That dynamism cuts both ways. The same forces that propelled a company from 8% to 34% market share in twelve months could just as easily work in reverse. Anthropic’s two-point lead was earned in the most volatile software market in modern history — and in this market, the distance between the throne and the floor has never been shorter.

Market research is too slow for the AI era, so Brox built 60,000 identical ‘digital twins’ of real people you can survey instantly, repeatedly

In a world where a viral TikTok video can cause a brand to trend globally in mere hours, the traditional market research cycle — often spanning 12 weeks — is becoming a liability.

Industry experts have observed that the lag between asking a survey question and collecting answers from a wide (or targeted) pool of respondents has become a primary bottleneck for Fortune 500 decision-makers, who must navigate volatile geopolitical and economic shifts with data that is frequently outdated by the time it reaches a slide deck.

Brox, a predictive human intelligence startup, recently announced a strategic funding round following a year in which it reported 10x revenue growth. Its proposition is as ambitious as it is technical: the creation of a “parallel universe” populated by 60,000 digital twins of real, living human beings, complete with their demographic profiles and consumer preferences, allowing enterprises to run unlimited experiments in hours rather than months.

“These digital twins are one-to-one replicas of actual, real individuals,” said Brox CEO Hamish Brocklebank in a recent video call interview with VentureBeat. “We recruit real people like a normal panel company does, pay them to interview them, and capture all the data around them — fully consent-driven.”

The company, currently a lean 14-person operation, is positioning itself as the antithesis of the “insane” research industry. By replacing statistical models with behavioral replicas, Brox aims to transform how the world’s largest banks and pharmaceutical giants anticipate human reactions to high-stakes global and market-shifting events, or narrow, targeted product releases and personnel news, and everything in between.

The kinds of surveys and specific questions that Brox asks its digital twins are completely open-ended and can be customized to fit any conceivable business customer’s use cases and goals.

According to Brocklebank, examples of survey questions include: “What happens if America invades Iran or Greenland? Will depositors at Bank of America put more money into their account or take more money out? Or, in pharmaceuticals, if RFK Jr. says something next week, will that make people more likely to take vaccines or less likely?”

Not synthetic people — AI copies of real ones

The core differentiator of Brox’s technology lies in the fidelity of its input data.

While many competitors in the “digital audience” space rely on purely synthetic identities, meaning generic personas generated by Large Language Models (LLMs), Brocklebank argues that these methods inevitably produce “AI slop”.

Purely synthetic audiences often cluster around a tight distribution of answers, over-indexing for “correct” or “healthy” behaviors (such as eating broccoli) because of inherent biases in the underlying models.

Brox’s “Digital Twins” are instead one-to-one behavioral replicas of real individuals who have been recruited and interviewed with exhaustive depth. The process is intensive:

  • Deep Interviews: The company conducts hours of real and AI-driven interviews with each participant.

  • Psychological Depth: The data collection seeks to understand fundamental “decision drivers,” including upbringing, relationships, and even marital stability.

  • Data Density: For some twins, Brox maintains up to 300 pages of text data, representing what Brocklebank calls “the deepest per person data set that exists”.

To solve the “black box” problem common in AI, Brox utilizes a “reasoning chain” for its predictive outputs. When a digital twin predicts a reaction — such as how a $2 billion net-worth individual might respond to a specific interest rate hike — the model introspects and provides a step-by-step explanation for that decision.

This allows clients to understand not just what will happen, but the underlying psychology of why it is happening.

Scaling the “unscalable” interview

The product offering is currently live in the US, UK, Japan, and Turkey. Brox has successfully digitized specific, high-value cohorts that are traditionally difficult for researchers to access.

This includes a panel of “high-net-worth” individuals (those worth over $5 million), among them a multibillionaire, and specialized medical professionals like dermatologists.

However, the largest value for customers is likely in the aggregate mass of individuals who can be polled en masse or segmented across demographics, especially those of medium and lower income levels, whose purchasing power and decision-making are more constrained.

One of the more distinctive aspects of the Brox platform is its incentive structure. To ensure twins remain up to date, real-world counterparts are re-contacted frequently.

For high-value individuals who are not motivated by small cash payments, Brox has issued Stock Appreciation Rights (SARs), essentially making these participants “investors” in the company’s success to ensure they continue to provide high-fidelity personal updates.

The platform’s use cases currently focus on two primary sectors:

  1. Pharmaceuticals: Predicting vaccine hesitancy or how physicians might react to new biologics based on shifting political climates.

  2. Finance: Simulating how depositors at major banks might move funds in response to geopolitical events, such as conflicts in the Middle East.

As for why Brox goes to the trouble of interviewing and digitally cloning real people instead of creating wholly fictitious, synthetic personas using LLMs and other AI models, Brocklebank offered his perspective.

“You can create 10,000 truly synthetic digital twins, but the answers will still normalize into a very tight distribution, which is not realistic when you’re actually asking real people,” Brocklebank said.

By maintaining a pre-built audience of 60,000 twins, the company enables clients to bypass the recruitment phase of research. A large US bank or a global pharma giant can now “query” the digital population and receive a validated analysis in a matter of hours.

Pricing and accessibility

Unlike traditional research firms that charge on a per-project or per-respondent basis, Brox operates as a high-end Software-as-a-Service (SaaS) platform with enterprise-level commercial licensing. The company avoids the “seat” or “usage” limits that often hinder rapid experimentation within large organizations.

  • Pricing Tiers: Subscriptions are sold as blanket flat fees, starting at a minimum of $100,000 per year.

  • Top-Tier Contracts: For larger deployments involving multiple teams and global data access, contracts scale up to $1.5 million per year.

  • Usage Rights: Clients are granted unlimited usage during the contract period. This allows them to run thousands of simulations without worrying about incremental costs, encouraging a culture of “testing everything” before deployment.

From a legal and privacy standpoint, the digital twins are built on a “fully consent-driven” framework. While the twins can be traced back to real human data for internal validation, the platform is designed to provide aggregated behavioral insights that protect the anonymity of the participants while maintaining the predictive power of their digital replicas.

Rejecting the rise of Kalshi, Polymarket and ‘prediction markets’

The tech industry has recently seen a surge in valuations and interest in “prediction markets” like Polymarket and Kalshi, which allow users to bet on the outcomes of various global events.

However, the leadership at Brox maintains a distinct distance from these platforms, citing a “personal disdain” for betting markets from both a moral and intellectual perspective.

Brocklebank argues that while betting markets can predict outcomes (e.g., who wins an election), they offer zero utility for business decision-makers because they fail to provide the “why”.

Knowing there is a 60% chance of a certain candidate winning does not help a company adjust its consumer strategy; knowing why a specific cohort of depositors is feeling anxious does.

Investors including Scribble Ventures, Wonder Ventures, and Vela Partners have backed this “human-first” approach to AI, betting that the moat created by deep human data will prove more resilient than the commoditized models of synthetic data providers.

As Brox prepares for launches in the Middle East and APAC, the company is moving toward its ultimate goal: simulating the entire world as a “parallel universe” for risk-free decision-making.

OpenAI turns its sold-out GPT-5.5 party into a monthlong Codex giveaway for 8,000 developers

OpenAI on Monday began emailing more than 8,000 developers who applied for its invite-only GPT-5.5 party with a surprise consolation prize: a tenfold increase in Codex rate limits on their personal ChatGPT accounts, effective immediately and lasting through June 5.

“We had over 8,000 people express interest in just 24 hours, and while we wish our office was big enough to welcome everyone, we weren’t able to make space for every person who applied,” the company wrote in the email, which VentureBeat obtained. “As a small token of appreciation, we’ve 10x’ed your Codex rate limits until June 5th on your personal ChatGPT account.”

The gift is not limited to the lucky few who scored invitations to the party itself. Everyone who raised their hand — whether they were accepted, waitlisted, or turned away — received the rate limit boost, according to the email and confirmed by multiple recipients on social media.

CEO Sam Altman telegraphed the move on X shortly before inboxes started lighting up. “We are gonna do something nice for everyone who applied for the GPT-5.5 party and that we didn’t have space for,” he wrote. “Hope you enjoy!” The post amassed more than 521,000 views within hours.

What a month of supercharged Codex access actually means for developers

The practical implications are huge. Codex, OpenAI’s AI-powered coding agent, operates under daily usage caps that vary by subscription tier. A tenfold increase to those caps gives developers dramatically more room to prototype, debug, and ship code using GPT-5.5 — which OpenAI says matches GPT-5.4’s per-token latency while performing at a higher level of intelligence and using significantly fewer tokens to complete tasks.

The 31-day window is generous enough to reshape habits. By flooding thousands of developers with expanded access during a critical adoption period, OpenAI is effectively subsidizing the kind of deep, sustained usage that turns a curious trial into a daily dependency. It is a bet that once developers experience Codex at full throttle, they won’t want to go back — and that when the limits reset on June 5, a meaningful number will upgrade their subscriptions to preserve the workflow they’ve built.

The developer community responded with a mix of glee and regret. “I’m literally not taking my Codex hat off for the month,” one developer declared on X. Others kicked themselves for not signing up. “That’s the last time I don’t sign up just because I’m not in SF,” one wrote.

Several users raised a question OpenAI has yet to answer publicly: does the boost stack with the existing Pro $200 tier’s 20x multiplier? One user reported that OpenAI support said no — users get whichever limit is higher, not a combined total. “The key question isn’t whether the 10x boost is only for party applicants,” they wrote. “It’s whether it stacks with Pro.”

OpenAI did not immediately respond to a request for comment on whether the boost stacks with Pro-tier limits.
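The difference between the two readings of the boost can be made concrete with a small sketch. It assumes, purely for illustration, that both multipliers are measured against the same baseline limit and that “stacking” would mean multiplying them; neither assumption is confirmed by OpenAI, and the function name is hypothetical.

```python
def effective_multiplier(tier_multiplier: float, promo_multiplier: float, stacks: bool) -> float:
    """Combined rate-limit multiplier under one of two hypothetical policies."""
    if stacks:
        # Assumed meaning of "stacking": the two multipliers compound.
        return tier_multiplier * promo_multiplier
    # Policy reported by one user: whichever limit is higher applies.
    return max(tier_multiplier, promo_multiplier)


PRO_MULTIPLIER = 20.0    # Pro tier multiplier, as cited above
PROMO_MULTIPLIER = 10.0  # temporary boost for party applicants

print(effective_multiplier(PRO_MULTIPLIER, PROMO_MULTIPLIER, stacks=False))  # 20.0
print(effective_multiplier(PRO_MULTIPLIER, PROMO_MULTIPLIER, stacks=True))   # 200.0
```

Under the higher-limit policy and these assumptions, a Pro subscriber would see no change from the promotion, while lower tiers would see the full tenfold increase.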

Inside the low-key meetup that an AI planned for itself

The rate limit gift is a sidecar to the main event: “GPT-5.5 on 5/5,” an invite-only gathering running tonight from 5:55 p.m. to 8:55 p.m. PDT at an undisclosed San Francisco venue. OpenAI billed the evening as “a low-key meetup with Sam and the team behind GPT-5.5,” promising food, drinks, community, giveaways, and swag — not a product announcement. Even the address remained secret until invitations were confirmed — a touch of exclusivity that generated its own buzz.

In a detail that doubles as a product demo, Altman revealed that GPT-5.5 itself planned the party. The model proposed the May 5 date, suggested that human developers give the toasts rather than the AI, and recommended setting up a suggestion box for the next-generation model. Altman described this as “weird emergent behavior.” Registrations closed shortly after opening due to overwhelming demand, with Codex handling the selection process.

Altman also extended an unlikely invitation. He publicly asked Elon Musk to attend, saying, “He can come if he wants… the world needs more love.” The gesture arrives amid Musk’s ongoing lawsuit against OpenAI seeking up to $150 billion in damages — a fact that makes the invitation read less like diplomacy and more like performance art.

Anthropic’s competing reception turns a scheduling overlap into a Silicon Valley spectacle

Here is where the story gets interesting. VentureBeat has confirmed that Anthropic is hosting its very own invite-only event in San Francisco on Tuesday evening — a “Media VIP Welcome Reception” at nearly identical times to OpenAI’s party. The reception serves as a warm-up for Anthropic’s Code with Claude developer conference, the company’s second annual gathering focused on its API, CLI tools, and Model Context Protocol (MCP). The conference proper takes place tomorrow.

The scheduling overlap is difficult to dismiss as coincidence. Both companies are hosting developer-focused events on the same evening, in the same city, targeting many of the same people. Whether this was deliberate counter-programming or genuine coincidence, the optics neatly capture where things stand in the industry’s most consequential rivalry.

Anthropic’s conference will feature its executive and product teams discussing Claude Code, agent implementation strategies, and the product roadmap — all squarely aimed at the same developer audience that just received a month of free Codex upgrades from OpenAI.

How Anthropic overtook OpenAI in revenue — and what it means for the coding wars

The dueling cocktail hours are a social manifestation of a far more consequential battle playing out in revenue, developer adoption, and investor confidence — one that has tilted sharply in Anthropic’s favor.

According to Counterpoint Research data, Anthropic surpassed OpenAI for the first time in global LLM revenue market share in Q1 2026, capturing 31.4% compared to OpenAI’s 29%. But the headline near-tie obscures a dramatic structural divergence. Counterpoint estimates Anthropic achieved that share with roughly 134 million monthly active users, compared to approximately 900 million for OpenAI — yielding average monthly revenue per active user of $16.20 for Anthropic versus $2.20 for OpenAI. OpenAI commands massive scale; Anthropic extracts roughly seven times more revenue per user. That gap is the central tension in this rivalry.
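The “roughly seven times” figure can be checked two ways from the numbers quoted above: directly from the per-user revenue estimates, and indirectly from revenue share divided by user counts. The sketch below does both; the dictionary layout and variable names are illustrative only.

```python
# Figures as quoted above from Counterpoint Research.
anthropic = {"share_pct": 31.4, "mau_millions": 134, "arpu_usd": 16.20}
openai = {"share_pct": 29.0, "mau_millions": 900, "arpu_usd": 2.20}

# Direct ratio of the quoted monthly revenue per active user.
arpu_ratio = anthropic["arpu_usd"] / openai["arpu_usd"]

# Cross-check: revenue share per user, which should land in the same range.
share_per_user_anthropic = anthropic["share_pct"] / anthropic["mau_millions"]
share_per_user_openai = openai["share_pct"] / openai["mau_millions"]
implied_ratio = share_per_user_anthropic / share_per_user_openai

print(f"Quoted ARPU ratio: {arpu_ratio:.2f}x")       # Quoted ARPU ratio: 7.36x
print(f"Implied by share/MAU: {implied_ratio:.2f}x")  # Implied by share/MAU: 7.27x
```

The two results agree to within rounding, so the per-user figures are consistent with the share and user estimates.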

The enterprise shift has been building for over a year. Menlo Ventures, whose portfolio includes Anthropic, estimates the company now captures 40% of enterprise LLM spend, up from 24% the prior year and 12% in 2023, while OpenAI’s share fell to 27% from 50% over the same period. Anthropic has maintained an almost unparalleled 18 months atop the LLM leaderboards for coding, starting with Claude 3.5 Sonnet in June 2024. That dominance in code, AI’s first true killer app, has become the on-ramp to broader enterprise adoption and the engine behind Anthropic’s revenue acceleration.

The top-line numbers tell the rest of the story. Anthropic said earlier this month that its annualized revenue has topped $30 billion, up from $9 billion at the end of 2025, with more than 1,000 business customers now spending over $1 million annually — a figure the company says has more than doubled since February.

Sources familiar with Anthropic’s financials told TechCrunch the run rate is currently closer to $40 billion, driven largely by demand for Claude Code and Cowork. OpenAI, meanwhile, topped $25 billion in annualized revenue as of February, according to Reuters — but the Wall Street Journal reported that the company has recently missed its own projections for user growth and revenue, with CFO Sarah Friar warning colleagues that if growth doesn’t accelerate, the company could face difficulty funding future compute agreements.

The momentum has carried into fundraising at a pace that could redraw the industry’s power map. Anthropic raised $30 billion at a valuation of $380 billion in February. Bloomberg reported last week that the company has begun weighing a fresh funding round that would value it at more than $900 billion, potentially leapfrogging OpenAI as the world’s most valuable AI startup. OpenAI was valued at $852 billion in late March after closing a record-breaking $122 billion funding round. If Anthropic proceeds at the terms described, the company would not only more than double its valuation but would also surpass OpenAI — a reversal that seemed unthinkable six months ago.

Two parties, two visions, and one city at the center of the AI industry’s defining rivalry

For the 8,000-plus developers who applied for the GPT-5.5 party, the immediate value is straightforward: a full month of dramatically expanded Codex usage, free of charge, during a period when both companies are shipping at a breakneck pace. For the industry, the signal is hard to miss. The two most valuable private companies in the world are competing for developer loyalty with a combination of free perks, invite-only parties, celebrity CEO engagement, and multi-billion-dollar enterprise ventures, all within the same 24-hour window, in the same seven-by-seven-mile city.

The broader stakes extend well beyond cocktail napkins and rate limits. Both companies are barreling toward potential IPOs. Both are courting the same Wall Street backers for enterprise joint ventures. Both are racing to define how the next generation of software gets built — and by whom. The developers caught between them are, for the moment, the beneficiaries of a spending war that shows no sign of cooling.

Tonight in San Francisco, the Anthropic reception starts at 5pm. The OpenAI party starts at 5:55pm. VentureBeat will be at both. And somewhere between the two venues, 8,000 developers who couldn’t get into either room will be burning through their new rate limits — building the future with whichever model they opened first.


Michael Nunez is an editor at VentureBeat covering artificial intelligence. He is attending both the Anthropic Code with Claude Media VIP Welcome Reception and the OpenAI GPT-5.5 launch party tonight in San Francisco.


The RAG era is ending for agentic AI — a new compilation-stage knowledge layer is what comes next

The vector database category is undergoing a shift in response to the needs of agentic AI. 

The retrieval-augmented generation (RAG)-to-vector database pipeline doesn’t cut it anymore; agentic AI requires a different approach that incorporates context. VentureBeat’s Q1 2026 Pulse survey underscores this trend: Every standalone vector database is losing adoption share, while hybrid retrieval intent has tripled to 33.3%, the fastest-growing strategic position in the dataset.

Vector database pioneer Pinecone recognizes this and is pivoting to meet the specific needs of agentic AI.

The company today announced Nexus, which it positions as a knowledge engine rather than an improvement on retrieval. Nexus introduces a context compiler that converts raw enterprise data into persistent, task-specific knowledge artifacts before agents query them, and a composable retriever that serves those artifacts with field-level citations and deterministic conflict resolution.

Alongside Nexus, Pinecone is releasing KnowQL, a declarative query language that gives agents a vocabulary to specify output shape, confidence requirements, and latency budgets. In Pinecone’s own internal benchmark, one financial analysis task that previously consumed 2.8 million tokens was completed by Nexus with just 4,000. By those figures, that is a reduction of more than 99%, although the company has not yet validated it in customer production deployments. Nexus is in early access starting today.
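Taking the two token counts from the benchmark at face value, the size of the reduction can be checked directly. The snippet below is simple arithmetic on the quoted figures, with illustrative variable names.

```python
tokens_before = 2_800_000  # tokens consumed by the earlier approach, as quoted
tokens_after = 4_000       # tokens consumed with Nexus, as quoted

reduction = 1 - tokens_after / tokens_before
fold = tokens_before / tokens_after

print(f"Reduction: {reduction:.2%}")  # Reduction: 99.86%
print(f"Fold change: {fold:.0f}x")    # Fold change: 700x
```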

“RAG was built for human users,” Pinecone CEO Ash Ashutosh told VentureBeat. “Nexus was built for agentic users, because their language is very different. The responses they expect are very different. The task that an agent is assigned to do is very different from what a chatbot is supposed to do.”

Why RAG was never built for what agents actually do

RAG assumes one query, one response, and a person in the loop to interpret the result. But agents work differently. They are assigned tasks, not questions, and completing those tasks requires assembling context from multiple sources, resolving conflicts, tracking what has already been retrieved, and deciding what to query next.

The distinction matters. A RAG pipeline retrieves documents and hands them to a model at inference time. Each agent session starts cold, with no compiled understanding of the enterprise data estate — which tables relate to which, which sources are authoritative for which questions, and which formats an agent downstream will actually be able to consume. Every session re-discovers that from scratch.

“At the heart of all this stuff was a very simple problem,” Ashutosh said. “You’re asking agents — machines — to work on systems and data that was designed for humans.”

Pinecone estimates that 85% of agent compute effort goes to the re-discovery cycle rather than task completion. The downstream effects compound: unpredictable latency, runaway token costs, and non-deterministic results. Run the same task twice against the same data, and an agent may return different answers with no record of which sources drove either result. For enterprises where auditability is a compliance requirement, that is a structural disqualifier, not a tuning problem.

What Nexus is and how it works

Nexus moves reasoning work from inference time to compilation time. In a conventional RAG pipeline, the reasoning required to interpret, contextualize, and structure knowledge happens at the moment an agent queries — every session, every time, burning tokens on work that could have been done in advance. But Nexus reasons just once during a compilation stage that runs before any agent query, then stores the result as a reusable knowledge artifact. The agent receives structured, task-ready context rather than raw documents to interpret on the fly.

The architecture Pinecone is shipping has three distinct components, each addressing a different layer of the agent retrieval problem.

  1. Context compiler. Nexus takes raw source data and a task specification and builds specialized knowledge artifacts — structured, task-optimized representations that agents consume directly without interpretation overhead. The same underlying data estate produces different artifacts for different agents: a sales agent gets deal context synthesized from CRM and call records, a finance agent gets revenue context linking contracts to billing schedules. Artifacts are persistent and reused across agent sessions, not regenerated at inference time.

  2. Composable retriever. Compiled artifacts are served at query time with typed fields, per-field citations with confidence levels, and deterministic conflict resolution. Output is shaped to match the agent’s specified format rather than returned as raw text for the agent to re-parse.

  3. KnowQL. Pinecone describes this as the first declarative query language designed for agents rather than humans. Six primitives (intent, filter, provenance, output shape, confidence, and budget) allow agents to specify structured responses, source grounding, and latency envelopes in a single interface. Ashutosh compared the structural gap that KnowQL fills to what SQL did for relational databases: Before a standard interface existed, every application built its own data access layer from scratch.
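To make the six primitives and the idea of deterministic conflict resolution concrete, here is a minimal Python sketch. It is not KnowQL syntax or any Pinecone API, neither of which is shown in the source. The `AgentQuery` and `Claim` classes, the `resolve` function, and the tie-breaking rule (most authoritative source first, then highest confidence, then source name) are all hypothetical, chosen only to show how a fixed rule yields the same answer and the same citation on every run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One candidate value for a field, with its provenance (hypothetical)."""
    field_name: str
    value: str
    source: str        # illustrative source identifier
    confidence: float  # 0.0 to 1.0
    priority: int      # lower number = more authoritative source


@dataclass(frozen=True)
class AgentQuery:
    """Hypothetical request carrying the six primitives named in the article."""
    intent: str          # carried for illustration, not interpreted here
    filter: dict         # carried for illustration, not interpreted here
    provenance: bool     # whether to attach a citation to each field
    output_shape: tuple  # field names the agent wants back
    confidence: float    # minimum acceptable confidence per field
    budget_ms: int       # latency budget, not enforced in this sketch


def resolve(query, claims):
    """Pick one claim per requested field using a fixed, repeatable rule."""
    result = {}
    for name in query.output_shape:
        candidates = [
            c for c in claims
            if c.field_name == name and c.confidence >= query.confidence
        ]
        if not candidates:
            result[name] = None  # nothing met the confidence floor
            continue
        # Assumed rule: authoritative source, then confidence, then name.
        best = min(candidates, key=lambda c: (c.priority, -c.confidence, c.source))
        entry = {"value": best.value, "confidence": best.confidence}
        if query.provenance:
            entry["citation"] = best.source
        result[name] = entry
    return result
```

Because the rule depends only on the claims themselves, shuffling the input order does not change the output, which is the property the article contrasts with non-deterministic agent runs.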

The relationship between Nexus and Pinecone’s underlying vector database is additive. The context compiler produces knowledge artifacts that are indexed and stored in the vector database; the compilation layer shapes and serves knowledge; the vector layer handles storage, retrieval speed, and scale.

“The vectors are still stored and managed by the Pinecone vector database,” Ashutosh said.

What analysts make of the architectural claim

Moving reasoning upstream from inference to a compilation stage is not a novel concept — ontologies, data catalogs, and semantic layers have pursued versions of it for years. What has changed is the ability to do this at scale without dedicated engineering teams for every domain. That is the specific argument Nexus is making, and it is where analysts see the genuine advance.

Stephanie Walter, practice leader for AI stack at HyperFRAME Research, told VentureBeat that Nexus is directionally important because it shifts knowledge work from runtime chaos to pre-compiled structure. She stressed, however, that it is an evolution of RAG architecture, not a complete reinvention. 

“The real innovation isn’t the idea itself, but the productization of knowledge compilation as a first-class infrastructure layer,” Walter said. “If Pinecone can operationalize that reliably, it becomes meaningful infrastructure, not just another RAG tuning trick.”

The technical mechanism behind that claim is what Gartner distinguished VP analyst Arun Chandrasekaran called the meaningful architectural distinction.

“Unlike traditional RAG, which relies on pure semantic search at runtime, architectural compilation embeds structural logic into the metadata layer, which can boost time to response and provide better reasoning,” Chandrasekaran told VentureBeat. “This is an important leap from simple retrieval to enhanced reasoning, allowing agents to navigate enterprise schemas and acquire better memory for contextualization.”

The competitive landscape

Multiple vendors acknowledge that a vector database and traditional RAG are not enough for agentic AI.

Microsoft has extended its FabricIQ technology to provide semantic context for agentic AI. Google recently announced its Agentic Data Cloud as an approach to help solve the same issues. There are also standalone contextual memory technologies, like Hindsight, that provide yet another option for users.

But analysts are less focused on the feature comparison than on what buyers should actually be evaluating.

“The agentic AI stack is fragmenting into dozens of features, but enterprise buyers shouldn’t chase features,” Walter said. “They should chase control: cost control, governance control, and security control.”

Most enterprise failures in agentic AI, she argued, will not be technical. They will be operational — tied to cost overruns, governance gaps, and security discipline.

The capability bar goes beyond retrieval speed.

“The true differentiator is deterministic grounding,” Chandrasekaran said, pointing to techniques like knowledge graphs that ensure agents understand structural relationships within enterprise data rather than returning surface-level matches. Interoperability is a related consideration: Standards like the Model Context Protocol (MCP) matter for connecting agents to legacy data sources without creating new dependencies.

What this means for enterprises

RAG and standalone vector databases were built for a different era. Agentic workloads are exposing the limits of both.

The retrieval cost problem is architectural

Teams running complex agentic workloads on conventional RAG pipelines are burning tokens at inference time on work that could be done in advance — interpreting, contextualizing, and structuring knowledge, every session, from scratch. That is a design problem. Tuning the retrieval layer will not fix it. The question for data engineering teams is whether their current stack is structurally capable of pre-compiling knowledge for specific agent tasks, or whether it was built for a human user who never needed that capability.

Governance is what separates a pilot from a production deployment

The capabilities that determine whether agentic AI gets approved for enterprise use are not performance metrics.

“The real enterprise value proposition isn’t just faster retrieval, but governed knowledge pipelines,” Walter said. “Those are the capabilities that turn agentic AI from an experiment into something finance and risk teams will actually approve.” 

The budget has shifted

VentureBeat’s Q1 Pulse data shows that retrieval optimization investment rose to 28.9% in March, overtaking evaluation spending for the first time in the quarter. Enterprises have finished measuring their retrieval problems. They are now spending to fix them. 

“The future of agentic AI won’t be decided by who has the longest context window,” Walter said. “It will be decided by who can operationalize trusted knowledge at scale without blowing up cost or governance.”

The retrieval rebuild: Why hybrid retrieval intent tripled as enterprise RAG programs hit the scale wall

Something shifted in enterprise RAG in Q1 2026. VB Pulse data spanning January through March tells a consistent story: the market stopped adding retrieval layers and started fixing the ones it already has. Call it the retrieval rebuild.

The survey covered three consecutive monthly waves from organizations with 100 or more employees, with between 45 and 58 qualified respondents per month across platform adoption, buyer intent, architecture outlook and evaluation criteria. The data should be treated as directional.

Enterprise intent to adopt hybrid retrieval tripled from 10.3% to 33.3% in a single quarter — even as 22% of qualified enterprise respondents reported having no production RAG systems at all. For data engineers and enterprise architects building agentic AI infrastructure, the data reveals a market in active transition: the RAG architecture most enterprises built to scale is not the one they expect to run by year-end. 

Hybrid retrieval has become the consensus enterprise strategy. Unlike single-method RAG pipelines that rely on vector similarity alone, hybrid retrieval combines dense embeddings with sparse keyword search and reranking layers, trading simplicity for the retrieval accuracy and access control that production agentic workloads require.
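As a rough illustration of what combining dense and sparse retrieval means in practice, the Python sketch below ranks a tiny corpus two ways and merges the rankings with reciprocal rank fusion. The survey does not say which fusion method respondents use; reciprocal rank fusion is one common choice and is an assumption here. The term-overlap scorer and the two-dimensional vectors are toy stand-ins for a real keyword index and a real embedding model.

```python
import math
from collections import defaultdict


def keyword_rank(query, docs):
    """Toy sparse retrieval: rank doc ids by number of shared query terms."""
    q_terms = set(query.lower().split())
    scores = {
        doc_id: len(q_terms & set(text.lower().split()))
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


def dense_rank(query_vec, doc_vecs):
    """Toy dense retrieval: rank doc ids by cosine similarity to the query."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank_fusion(rankings, k=60):
    """Merge rankings: each list adds 1 / (k + position) to a doc's score."""
    fused = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return sorted(fused, key=lambda d: (-fused[d], d))


docs = {
    "a": "quarterly revenue report",
    "b": "revenue recognition policy",
    "c": "office party photos",
}
doc_vecs = {"a": [0.9, 0.1], "b": [0.2, 0.9], "c": [0.1, 0.1]}  # made-up vectors

sparse = keyword_rank("revenue policy", docs)
dense = dense_rank([0.8, 0.3], doc_vecs)
fused = reciprocal_rank_fusion([sparse, dense])
```

In a production pipeline, a reranking layer would typically rescore the fused list, which is the extra complexity the article describes hybrid retrieval as trading for accuracy.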

The standalone vector database category is under pressure. Weaviate, Milvus, Pinecone and Qdrant each lost adoption share across the quarter in the VB Pulse data. Custom stacks and provider-native retrieval are absorbing their displaced share.

A growing minority of enterprises are stepping back from RAG altogether — a signal that the market’s maturity narrative has meaningful exceptions.

Organizations that went wide on RAG in 2025 are hitting the same failure point: the architecture built for document retrieval does not hold at agentic scale.

Enterprises that scaled RAG fast are now paying to rebuild it

The two largest intent movements in Q1 are directly connected — enterprises confronting retrieval quality problems at scale, and hybrid retrieval emerging as the consensus answer.

Investment priorities shifted in parallel. Evaluation and relevance testing led budget intent in January at 32.8% and fell to 15.6% by March. Retrieval optimization moved in the opposite direction, from 19.0% to 28.9% — overtaking evaluation as the top growth investment area for the first time. 

Steven Dickens, vice president and practice lead at HyperFRAME Research, described the operational burden enterprise data teams are facing in a VentureBeat interview in March on Oracle’s agentic AI data stack. “Data teams are exhausted by fragmentation fatigue,” Dickens said. “Managing a separate vector store, graph database and relational system just to power one agent is a DevOps nightmare.”

That fatigue shows directly in the platform data. The rise of custom stacks to 35.6% is not a rejection of managed retrieval; many organizations run both. It is a consolidation response from engineering teams that have hit the limits of assembling too many components.

Not every enterprise has made it that far. The VB Pulse data includes a signal that complicates the market’s overall growth narrative: 22.2% of qualified respondents reported no production RAG by March, up from 8.6% in January. The report attributes this cohort to organizations that have “not yet committed to any retrieval infrastructure, or have paused programs,” concentrated in Healthcare, Education and Government, the same sectors showing the highest rates of flat budgets.

Standalone vector databases are losing the adoption argument but winning the reliability one

Recent VentureBeat reporting on two enterprises building on Qdrant illustrates why purpose-built vector infrastructure still matters in production.

&AI builds patent litigation infrastructure and runs semantic search across hundreds of millions of documents. Grounding every result in a real source document is not optional: patent attorneys will not act on AI-generated text. That requirement makes the architectural choice clear.

“The agent is the interface,” Herbie Turner, &AI’s founder and CTO, told VentureBeat in March. “The vector database is the ground truth.”

GlassDollar, a company that helps enterprises such as Siemens and Mahle evaluate startups, runs an agentic retrieval pattern across a corpus approaching 10 million indexed documents. A single user prompt fans out into multiple parallel queries, each retrieving candidates from a different angle before results are combined and re-ranked. That combination of query volume and precision requirements is what drove the choice of purpose-built vector infrastructure.

“We measure success by recall,” Kamen Kanev, GlassDollar’s head of product, told VentureBeat in March. “If the best companies aren’t in the results, nothing else matters. The user loses trust.”
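The fan-out pattern and the recall measure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GlassDollar’s implementation. The `search` callable stands in for whatever vector store client is in use, and merging by each document’s best score is a simplification of the combine and re-rank step.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(sub_queries, search, top_k=3):
    """Run each sub-query in parallel, then merge by best score per document.

    `search` is a hypothetical callable that takes a query string and
    returns a list of (doc_id, score) pairs.
    """
    workers = max(1, len(sub_queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        result_lists = list(pool.map(search, sub_queries))

    best = {}
    for results in result_lists:
        for doc_id, score in results:
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))

    # Highest score first; ties broken by doc id so the order is repeatable.
    ranked = sorted(best, key=lambda d: (-best[d], d))
    return ranked[:top_k]


def recall_at_k(retrieved, relevant):
    """Share of the known-relevant documents that appear in the results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)
```

Recall is the natural measure here because it penalizes exactly the failure Kanev describes: a relevant company that never makes it into the result set.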

The VB Pulse data shows that framing — retrieval as ground truth rather than feature — is gaining traction across the broader enterprise market, even as standalone vector database adoption declines. 

Why enterprises say they need a dedicated vector layer shifted significantly across Q1. In January the top reasons were access control complexity (20.7%) and retrieval precision (19.0%). By March, operational reliability at scale had surged to 31.1% — more than doubling and overtaking everything else. Enterprises are no longer keeping vector infrastructure primarily for precision. They are keeping it because it is the part of the stack they can rely on when query volumes scale.

How enterprises are redefining what good retrieval means

How enterprises judge their retrieval systems shifted notably across Q1 — and the direction of that shift points to a market getting more sophisticated about what good retrieval actually means.

In January, response correctness dominated evaluation criteria at 67.2% — far above anything else. By March, response correctness (53.3%), retrieval accuracy (53.3%) and answer relevance (53.3%) had converged exactly. Getting the right answer is no longer enough if it came from the wrong document or missed the context of the question.

Answer relevance was the only criterion that rose across the quarter, gaining five percentage points. It is also the hardest to measure — whether the retrieved context is actually the right context for that specific question requires purpose-built evaluation infrastructure, not just pass-or-fail correctness checks. Its rise signals that a meaningful share of enterprise buyers have moved past basic RAG testing entirely. 

The market’s verdict: RAG isn’t dead. The original architecture is

The “RAG is dead” narrative had real momentum heading into 2026. It rested on two claims. The first: that long-context windows — models capable of processing hundreds of thousands of tokens in a single prompt — would make dedicated retrieval unnecessary. The second: that agentic memory systems, which store what an agent learns across sessions rather than retrieving it fresh each time, would absorb the knowledge access problem entirely.

The VB Pulse data is the enterprise market’s answer to the first claim. The long-context-as-dominant-architecture position collapsed from 15.5% in January to 3.5% in February before partially recovering to 6.7% in March. January’s sample was heavily weighted toward Technology and Software respondents — the segment most exposed to long-context model announcements in late 2025. As the sample diversified, the position evaporated.

On the memory question, Jonathan Frankle, chief AI scientist at Databricks, framed the architecture clearly in a March interview with VentureBeat: a vector database with millions of entries sits at the base of the agentic memory stack, too large to fit in context. The LLM context window sits at the top. Between them, new caching and compression layers are emerging — but none of them replace the retrieval layer at the base. New agentic memory systems like Hindsight, developed by Vectorize, and observational memory approaches like those in the Mastra framework address session continuity and agent context over time — a different problem than high-recall search across millions of changing enterprise documents.

The most consequential signal: the share of respondents not expecting large-scale RAG deployments by year-end grew from 3.4% to 15.6% — nearly 5x. That is not a verdict against retrieval. It is a verdict against the retrieval architecture most enterprises built first.

The retrieval rebuild is not optional

The retrieval rebuild is the cost of scaling RAG without first deciding what architecture could actually support it.

If your organization is among the 43.1% that entered Q1 planning to expand RAG into more workflows, the VB Pulse data suggests that plan has already changed for many of your peers — and may need to change for you. Hybrid retrieval is the consensus destination. Custom stack growth to 35.6% reflects teams building retrieval infrastructure around requirements that off-the-shelf products do not fully address.

RAG is not dead. The architecture most enterprises used to implement it is. The data suggests the rebuild is not a future decision. For 33% of enterprises, the rebuild is already the stated priority.

Definity embeds agents inside Spark pipelines to catch failures before they reach agentic AI systems

For most data engineering teams, managing pipeline reliability means waiting for an alert, manually tracing failures across distributed jobs and clusters, and fixing problems after they’ve already hit the business. Agentic AI needs the data to be there, clean and on time. A pipeline that fails silently or delivers stale data doesn’t just break a dashboard; it breaks the AI system depending on it.

That gap is what Definity, a Chicago-based data pipeline operations startup, is targeting: embedding agents directly inside the Spark or dbt driver to act during a pipeline run, not after it. One enterprise customer identified 33% of its optimization opportunities in the first week of deployment and cut troubleshooting and optimization effort by 70%, according to Definity. The company also claims customers are resolving complex Spark issues up to 10x faster.

“You need three big things for agentic data operations: full stack context that is real time and production aware. Control of the pipeline. And the ability to validate in a feedback loop. Without that, you can be outside looking in and read only,” Roy Daniel, CEO and co-founder of Definity, told VentureBeat in an exclusive interview.

The company on Wednesday announced that it has raised $12 million in Series A financing led by GreatPoint Ventures, with participation from Dynatrace and existing investors StageOne Ventures and Hyde Park Venture Partners.

Why existing pipeline monitoring breaks down at scale

Existing tools approach the problem from outside the execution layer — Datadog, which acquired data quality monitor Metaplane last year, Databricks system tables, and platforms like Unravel Data and Acceldata all read metrics after a job completes. Dynatrace has monitoring capabilities; it also participated in Definity’s Series A.

Definity differs in where its agent runs. According to Daniel, because conventional platform monitoring tools sit outside the execution layer, by the time one surfaces a problem the pipeline has already run, and the failure, the wasted compute or the bad data is already downstream.

“It’s always after the fact,” Daniel said. “By the time you know something happened, it already happened.”

How Definity’s in-execution agents work

The core architectural difference is where the agent sits — inside the pipeline rather than watching from outside it.

Inline instrumentation. The Definity system installs a JVM agent directly inside the pipeline execution layer via a single line of code, running below the platform layer and pulling execution data directly from Spark.

Execution context during the run. The agent captures query execution behavior, memory pressure, data skew, shuffle patterns and infrastructure utilization as the pipeline runs. It also infers lineage between pipelines and tables dynamically — no predefined data catalog is required.

Intervention, not just observation. The agent can modify resource allocation mid-run, stop a job before bad data propagates or preempt a pipeline based on upstream data conditions. Daniel described one production deployment where the agent detected that an upstream job had been preempted and the input table it was supposed to write was stale — and stopped the downstream pipeline before it started, before bad data reached any dependent system.

What is and isn’t real time. Detection and prevention are real time. Root cause analysis and optimization recommendations run on demand when an engineer queries the assistant, with full execution context already assembled.

Overhead and data residency. The agent adds approximately one second of compute on an hour-long run. Only metadata transmits externally; full on-premises deployment is available for environments where no metadata can leave the perimeter.
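The intervention Daniel describes, stopping a downstream pipeline because its input table is stale, comes down to a check that runs before the job rather than after it. The Python sketch below shows only that decision logic. It is not Definity’s agent, which the company says runs as a JVM agent inside the Spark driver. The `TableState` class, the `run_pipeline` wrapper and the 24-hour freshness threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableState:
    """Hypothetical record of when an input table was last written."""
    name: str
    last_written: datetime


class StaleInputError(RuntimeError):
    """Raised when an input table is older than the allowed age."""


def check_inputs(inputs, now, max_age):
    """Raise if any input table has not been refreshed within max_age."""
    stale = [t.name for t in inputs if now - t.last_written > max_age]
    if stale:
        raise StaleInputError("stale inputs: " + ", ".join(stale))


def run_pipeline(inputs, job, now, max_age=timedelta(hours=24)):
    """Gate the job on input freshness so bad data never reaches it."""
    check_inputs(inputs, now, max_age)  # stops here if anything is stale
    return job()


now = datetime(2026, 1, 2, tzinfo=timezone.utc)
fresh = TableState("orders", now - timedelta(hours=1))
stale = TableState("clicks", now - timedelta(hours=30))
```

Calling `run_pipeline([fresh, stale], job, now)` raises `StaleInputError` before `job` is invoked, which mirrors stopping the pipeline before it starts rather than reporting on it afterward.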

What in-execution intelligence looks like in a production environment

One early user of the Definity platform is Nexxen, an ad tech platform that runs large-scale, on-premises Spark pipelines for mission-critical advertising workloads.

Dennis Meyer, Director of Data Engineering at Nexxen, told VentureBeat that the core problem he was facing was not pipeline failures but the accumulating cost of inefficiency in an environment with no elastic cloud capacity to absorb waste.

“The main challenge wasn’t about pipelines breaking, but about managing an increasingly complex and large-scale environment,” Meyer said. “Because we operate on-prem, we don’t have the flexibility of instant elasticity, so inefficiencies have a direct cost impact.”

Existing monitoring tools gave Nexxen partial visibility but not enough to act on systematically. “We had existing monitoring tools in place, but needed full-stack visibility to understand workload behavior holistically and to systematically prioritize optimizations,” Meyer said.

Nexxen deployed Definity with no pipeline code changes. According to Meyer, the team identified 33% of its optimization opportunities within the first week, and engineering effort on troubleshooting and optimization dropped by 70%. The platform freed infrastructure capacity, allowing the team to support workload growth without additional hardware investment.

“The key shift was moving from reactive troubleshooting to proactive, continuous optimization,” Meyer said. “At scale, the biggest gap often isn’t tooling — it’s actionable visibility.”

What this means for enterprise data teams

For data engineering teams running production Spark environments, the shift from reactive monitoring to in-execution intelligence has architectural and organizational implications worth thinking through.

Pipeline ops is becoming an AI infrastructure problem. Data pipelines that previously supported analytics now carry AI workloads with direct business dependencies. Failures that were once an inconvenience are now blocking production AI delivery.

Troubleshooting time is a recoverable cost. According to Meyer, Nexxen cut engineering effort on troubleshooting and optimization by 70% after deploying Definity. For teams running lean, that time going back to the roadmap is the most direct near-term case for evaluating this category.

RAG precision tuning can quietly cut retrieval accuracy by 40%, putting agentic pipelines at risk

Enterprise teams that fine-tune their RAG embedding models for better precision may be unintentionally degrading the retrieval quality those pipelines depend on, according to new research from Redis.

The paper, “Training for Compositional Sensitivity Reduces Dense Retrieval Generalization,” tested what happens when teams train embedding models for compositional sensitivity. That is the ability to catch sentences that look nearly identical but mean something different, such as “the dog bit the man” versus “the man bit the dog,” or a negation flip that reverses a statement’s meaning entirely. That training consistently degraded dense retrieval generalization, meaning how well a model retrieves across broad topics and domains it wasn’t specifically trained on. Performance dropped by 8 to 9 percent on smaller models and by 40 percent on a current mid-size embedding model teams are actively using in production.

The findings have direct implications for enterprise teams building agentic AI pipelines, where retrieval quality determines what context flows into an agent’s reasoning chain. A retrieval error in a single-stage pipeline returns a wrong answer. The same error in an agentic pipeline can trigger a cascade of wrong actions downstream.

Srijith Rajamohan, AI Research Leader at Redis and one of the paper’s authors, said the finding challenges a widespread assumption about how embedding-based retrieval actually works. 

“There’s this general notion that when you use semantic search or similar semantic similarity, we get correct intent. That’s not necessarily true,” Rajamohan told VentureBeat. “A close or high semantic similarity does not actually mean an exact intent.”

The geometry behind the retrieval tradeoff

Embedding models work by compressing an entire sentence into a single point in a high-dimensional space, then finding the closest points to a query at retrieval time. That works well for broad topical matching — documents about similar subjects end up near each other. The problem is that two sentences with nearly identical words but opposite meanings also end up near each other, because the model is working from word content rather than structure.
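A deliberately extreme toy shows the direction of the effect. The Python sketch below represents each sentence as a bag-of-words count vector, which discards word order completely. Real embedding models are not pure bag-of-words, so treat this as an illustration of the failure mode and not a model of any production system.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Toy 'embedding': word counts only, so word order is thrown away."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


original = bag_of_words("the dog bit the man")
reversed_roles = bag_of_words("the man bit the dog")
unrelated = bag_of_words("quarterly revenue rose sharply")

same_words_score = cosine(original, reversed_roles)  # identical word counts
different_topic_score = cosine(original, unrelated)  # no shared words
```

The two sentences with opposite meanings are indistinguishable under this representation, while the unrelated sentence scores zero. Topical matching works and structural matching does not.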

That is what the research quantified. When teams fine-tune an embedding model to push structurally different sentences apart, teaching it, for example, that a negation flip that reverses a statement’s meaning is not the same as the original, the model uses up representational capacity it was previously devoting to broad topical recall. The two objectives compete for the same vector.

The research also found the regression is not uniform across failure types. Negation and spatial flip errors improved measurably with structured training. Binding errors — where a model confuses which modifier applies to which word, such as which party a contract obligation falls on — barely moved. For enterprise teams, that means the precision problem is harder to fix in exactly the cases where getting it wrong has the most consequences.

The reason most teams don’t catch it is that fine-tuning metrics measure the task being trained for, not what happens to general retrieval across unrelated topics. A model can show strong improvement on near-miss rejection during training while quietly regressing on the broader retrieval job it was hired to do. The regression only surfaces in production.

Rajamohan said the instinct most teams reach for — moving to a larger embedding model — does not address the underlying architecture.

“You can’t scale your way out of this,” he said. “It’s not a problem you can solve with more dimensions and more parameters.”

Why the standard alternatives all fall short

The natural instinct when retrieval precision fails is to layer on additional approaches. The research tested several of them and found each fails in a different way.

Hybrid search. Combining embedding-based retrieval with keyword search is already standard practice for closing precision gaps. But Rajamohan said keyword search cannot catch the failure mode this research identifies, because the problem is not missing words — it is misread structure.

“If you have a sentence like ‘Rome is closer than Paris’ and another that says ‘Paris is closer than Rome,’ and you do an embedding retrieval followed by a text search, you’re not going to be able to tell the difference,” he said. “The same words exist in both sentences.”

MaxSim reranking. Some teams add a second scoring layer that compares individual query words against individual document words rather than relying on the single compressed vector. This approach, known as MaxSim or late interaction and used in systems like ColBERT, did improve relevance benchmark scores in the research. But it completely failed to reject structural near-misses, assigning them near-identity similarity scores. 

The problem is that relevance and identity are different objectives. MaxSim is optimized for the former and blind to the latter. A team that adds MaxSim and sees benchmark improvement may be solving a different problem than the one they have.
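The blind spot is easy to reproduce with a stripped-down MaxSim scorer. In the Python sketch below, each query token takes its best match among the document tokens and the matches are averaged. The exact-match `token_sim` function is a stand-in for the cosine similarity between learned, contextual token embeddings that systems like ColBERT use, so real scores for a reordered sentence would be close to the maximum but not necessarily equal to it.

```python
def token_sim(a, b):
    """Toy token similarity: 1.0 for identical tokens, otherwise 0.0.

    A stand-in for cosine similarity between learned token embeddings.
    """
    return 1.0 if a == b else 0.0


def maxsim(query, doc):
    """Average, over query tokens, of the best similarity to any doc token."""
    q_tokens = query.lower().split()
    d_tokens = doc.lower().split()
    if not q_tokens or not d_tokens:
        return 0.0
    total = sum(max(token_sim(q, d) for d in d_tokens) for q in q_tokens)
    return total / len(q_tokens)


near_miss = maxsim("Rome is closer than Paris", "Paris is closer than Rome")
partial = maxsim("Rome is closer than Paris", "Rome is a city")
```

Every query token finds a perfect partner in the reordered sentence, so the near-miss gets the top score even though the meaning is reversed. The scorer rewards token coverage, which is a relevance signal, and has no term for the order in which tokens appear.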

Cross-encoders. These work by feeding the query and candidate document into the model simultaneously, letting it compare every word against every word before making a decision. That full comparison is what makes them accurate — and what makes them too expensive to run at production scale. Rajamohan said his team investigated them. They work in the lab and break under real query volumes.

Contextual memory. Also sometimes referred to as agentic memory, these systems are increasingly cited as the path beyond RAG, but Rajamohan said moving to that type of architecture does not eliminate the structural retrieval problem. Those systems still depend on retrieval at query time, which means the same failure modes apply. The main difference is looser latency requirements, not a precision fix.

The two-stage fix the research validated

The common thread across every failed approach is the same: a single scoring mechanism trying to handle both recall and precision at once. The research validated a different architecture: stop trying to do both jobs with one vector, and assign each job to a dedicated stage.

Stage one: recall. The first stage works exactly as standard dense retrieval does today — the embedding model compresses documents into vectors and retrieves the closest matches to a query. Nothing changes here. The goal is to cast a wide net and bring back a set of strong candidates quickly. Speed and breadth are what matter at this stage, not perfect precision.

Stage two: precision. The second stage is where the fix lives. Rather than scoring candidates with a single similarity number, a small learned Transformer model examines the query and each candidate at the token level — comparing individual words against individual words to detect structural mismatches like negation flips or role reversals. This is the verification step the single-vector approach cannot perform.

The results. Under end-to-end training, the Transformer verifier outperformed every other approach the research tested on structural near-miss rejection. It was the only approach that reliably caught the failure modes the single-vector system missed.

The tradeoff. Adding a verification stage costs latency, and how much depends on how much verification a team runs. For precision-sensitive workloads like legal or accounting applications, full verification at every query is warranted. For general-purpose search, lighter verification may be sufficient.
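The shape of the two-stage design can be sketched in Python, with one loud caveat: the verifier below is a hand-written word-bigram check, not the learned Transformer the research trained. It stands in only to show where an order-aware second stage sits in the pipeline. The bag-of-words recall stage, the function names and the 0.9 threshold are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recall_stage(query, corpus, top_k=2):
    """Stage one: cast a wide net with an order-blind similarity score."""
    q_vec = Counter(tokens(query))
    scored = [(cosine(q_vec, Counter(tokens(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for score, doc in scored[:top_k] if score > 0]


def bigrams(toks):
    return set(zip(toks, toks[1:]))


def verify(query, candidate, threshold=0.9):
    """Stage two (toy stand-in): accept only if word order largely agrees."""
    a, b = bigrams(tokens(query)), bigrams(tokens(candidate))
    if not a and not b:
        return tokens(query) == tokens(candidate)
    return len(a & b) / len(a | b) >= threshold


def two_stage(query, corpus, top_k=2):
    """Retrieve broadly, then keep only candidates the verifier accepts."""
    return [doc for doc in recall_stage(query, corpus, top_k) if verify(query, doc)]


corpus = [
    "Paris is closer than Rome",
    "Rome is closer than Paris",
    "The dog bit the man",
]
query = "Rome is closer than Paris"
candidates = recall_stage(query, corpus)
accepted = two_stage(query, corpus)
```

Stage one returns both Rome and Paris sentences because it cannot tell them apart. Stage two rejects the reversed one. The verifier runs only on the short candidate list, which is why the added latency scales with how many candidates a team chooses to verify.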

The research grew out of a real production problem. Enterprise customers running semantic caching systems were getting fast but semantically incorrect responses back — the retrieval system was treating similar-sounding queries as identical even when their meaning differed. The two-stage architecture is Redis’s proposed fix, with incorporation into its LangCache product on the roadmap but not yet available to customers.

What this means for enterprise teams

The research does not require enterprise teams to rebuild their retrieval pipelines from scratch. But it does ask them to pressure-test assumptions most teams have never examined — about what their embedding models are actually doing, which metrics are worth trusting and where the real precision gaps live in production.

Recognize the tradeoff before tuning around it. Rajamohan said the first practical step is understanding the regression exists. He evaluates any LLM-based retrieval system on three criteria: correctness, completeness and usefulness. Correctness failures cascade directly into the other two, which means a retrieval system that scores well on relevance benchmarks but fails on structural near-misses is producing a false sense of production readiness.

RAG is not obsolete — but know what it can’t do. Rajamohan pushed back firmly on claims that RAG has been superseded. “That’s a massive oversimplification,” he said. “RAG is a very simple pipeline that can be productionized by almost anyone with very little lift.” The research does not argue against RAG as an architecture. It argues against assuming a single-stage RAG pipeline with a fine-tuned embedding model is production-ready for precision-sensitive workloads.

The fix is real but not free. For teams that do need higher precision, Rajamohan said the two-stage architecture is not a prohibitive implementation lift, but adding a verification stage costs latency. “It’s a mitigation problem,” he said. “Not something we can actually solve.”

OpenAI launches Privacy Filter, an open source, on-device data sanitization model that removes personal information from enterprise datasets

In a significant shift toward local-first privacy infrastructure, OpenAI has released Privacy Filter, a specialized open-source model designed to detect and redact personally identifiable information (PII) before it ever reaches a cloud-based server.

Launched today on AI code sharing community Hugging Face under a permissive Apache 2.0 license, the tool addresses a growing industry bottleneck: the risk of sensitive data “leaking” into training sets or being exposed during high-throughput inference.

By providing a 1.5-billion-parameter model that can run on a standard laptop or directly in a web browser, the company is effectively handing developers a “privacy-by-design” toolkit that functions as a sophisticated, context-aware digital shredder.

Though OpenAI was founded with a focus on open source models such as this, the company shifted during the ChatGPT era to providing more proprietary (“closed source”) models available only through its website, apps, and API — only to return to open source in a big way last year with the launch of the gpt-oss family of language models.

In that light, and combined with OpenAI’s recent open sourcing of agentic orchestration tools and frameworks, the generative AI giant is clearly still invested in fostering this less immediately lucrative part of the AI ecosystem.

Technology: a gpt-oss variant built as a bidirectional token classifier

Architecturally, Privacy Filter is a derivative of OpenAI’s gpt-oss family, a series of open-weight reasoning models released last year.

However, while standard large language models (LLMs) are typically autoregressive—predicting the next token in a sequence—Privacy Filter is a bidirectional token classifier.

This distinction is critical for accuracy. By looking at a sentence from both directions simultaneously, the model gains a deeper understanding of context that a forward-only model might miss.

For instance, it can better distinguish whether “Alice” refers to a private individual or a public literary character based on the words that follow the name, not just those that precede it.

The model utilizes a Sparse Mixture-of-Experts (MoE) framework. Although it contains 1.5 billion total parameters, only 50 million parameters are active during any single forward pass.

This sparse activation allows for high throughput without the massive computational overhead typically associated with LLMs. Furthermore, it features a massive 128,000-token context window, enabling it to process entire legal documents or long email threads in a single pass without the need for fragmenting text—a process that often causes traditional PII filters to lose track of entities across page breaks.

To ensure the redacted output remains coherent, OpenAI implemented a constrained Viterbi decoder. Rather than making an independent decision for every single word, the decoder evaluates the entire sequence to enforce logical transitions.

It uses a “BIOES” (Begin, Inside, Outside, End, Single) labeling scheme, which ensures that if the model identifies “John” as the start of a name, it is statistically inclined to label “Smith” as the continuation or end of that same name, rather than a separate entity.
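
The decoding idea can be sketched in a few lines. What follows is a generic constrained Viterbi pass over BIOES labels, not OpenAI's code: the per-token scores are made up, a single entity type is assumed for brevity, and the transition table is the standard BIOES rule set rather than anything taken from the model's documentation.

```python
import math

LABELS = ["O", "B", "I", "E", "S"]

# Standard BIOES rules: which labels may follow each label.
ALLOWED = {
    "O": {"O", "B", "S"},
    "B": {"I", "E"},
    "I": {"I", "E"},
    "E": {"O", "B", "S"},
    "S": {"O", "B", "S"},
}
VALID_START = {"O", "B", "S"}  # a sequence cannot open inside an entity
VALID_END = {"O", "E", "S"}    # a sequence cannot end with an open entity

def constrained_viterbi(scores):
    """scores: one dict per token mapping label -> log-score.
    Returns the highest-scoring label path that obeys ALLOWED."""
    if not scores:
        return []
    neg_inf = -math.inf
    # best[i][label]: best total score of a valid path ending in label at token i
    best = [{l: (scores[0][l] if l in VALID_START else neg_inf) for l in LABELS}]
    back = []
    for tok in scores[1:]:
        cur, ptr = {}, {}
        for label in LABELS:
            prevs = [p for p in LABELS if label in ALLOWED[p]]
            p_best = max(prevs, key=lambda p: best[-1][p])
            cur[label] = best[-1][p_best] + tok[label]
            ptr[label] = p_best
        best.append(cur)
        back.append(ptr)
    # Pick the best legal final label, then follow back-pointers.
    path = [max(VALID_END, key=lambda l: best[-1][l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical log-scores for the tokens "John", "Smith", "called".
scores = [
    {"O": -3.0, "B": -0.2, "I": -4.0, "E": -4.0, "S": -2.0},
    {"O": -3.0, "B": -0.6, "I": -3.0, "E": -0.8, "S": -2.5},
    {"O": -0.1, "B": -4.0, "I": -4.0, "E": -4.0, "S": -4.0},
]
print(constrained_viterbi(scores))
```

With these scores, labeling each token independently would tag "Smith" as the start of a second name, because B narrowly outscores E for that token. Since B cannot follow B, the sequence-level pass settles on B, E, O and keeps "John Smith" as one span.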

On-device data sanitization

Privacy Filter is designed for high-throughput workflows where data residency is a non-negotiable requirement. It currently detects eight primary types of PII, grouped here into four categories:

  • Private Names: Individual persons.

  • Contact Info: Physical addresses, email addresses, and phone numbers.

  • Digital Identifiers: URLs, account numbers, and dates.

  • Secrets: A specialized category for credentials, API keys, and passwords.

In practice, this allows enterprises to deploy the model on-premises or within their own private clouds. By masking data locally before sending it to a more powerful reasoning model (like GPT-5 or gpt-oss-120b), companies can maintain compliance with strict GDPR or HIPAA standards while still leveraging the latest AI capabilities.
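
The mask-locally-then-send pattern might look like the sketch below. It assumes a detector has already returned character spans; the spans, category names and helper functions here are hypothetical and are not Privacy Filter's actual output format or API. Spans are assumed not to overlap.

```python
def mask_spans(text, spans):
    """Replace each (start, end, category) span with a numbered placeholder.
    Returns the masked text plus a placeholder -> original mapping that
    never leaves the local machine."""
    mapping = {}
    counters = {}
    pieces = []
    cursor = 0
    for start, end, category in sorted(spans):
        counters[category] = counters.get(category, 0) + 1
        placeholder = f"[{category}_{counters[category]}]"
        mapping[placeholder] = text[start:end]
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])  # remainder after the last span
    return "".join(pieces), mapping

def unmask(text, mapping):
    """Restore the original values in a response that echoes placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

text = "Email Jane Doe at jane@example.com today."
name, email = "Jane Doe", "jane@example.com"
# Hypothetical detector output: spans located by hand for this example.
spans = [
    (text.index(name), text.index(name) + len(name), "NAME"),
    (text.index(email), text.index(email) + len(email), "EMAIL"),
]
masked, mapping = mask_spans(text, spans)
print(masked)  # only this string would be sent to the remote model
print(unmask(masked, mapping))
```

Keeping the mapping on the local side means a remote model only ever sees placeholders, while the application can still restore names and addresses in whatever comes back.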

Initial benchmarks are promising: the model reportedly hits a 96% F1 score on the PII-Masking-300k benchmark out of the box.

For developers, the model is available via Hugging Face, with native support for transformers.js, allowing it to run entirely within a user’s browser using WebGPU.

Fully open source, commercially viable Apache 2.0 license

Perhaps the most significant aspect of the announcement for the developer community is the Apache 2.0 license. Unlike “available-weight” licenses that often restrict commercial use or require “copyleft” sharing of derivative works, Apache 2.0 is one of the most permissive licenses in the software world. For startups and dev-tool makers, this means:

  1. Commercial Freedom: Companies can integrate Privacy Filter into their proprietary products and sell them without paying royalties to OpenAI.

  2. Customization: Teams can fine-tune the model on their specific datasets (such as medical jargon or proprietary log formats) to improve accuracy for niche industries.

  3. No Viral Obligations: Unlike the GPL license, builders do not have to open-source their entire codebase if they use Privacy Filter as a component.

By choosing this licensing path, OpenAI is positioning Privacy Filter as a standard utility for the AI era—essentially the “SSL for text”.

Community reactions

The tech community reacted quickly to the release, with many noting the impressive technical constraints OpenAI managed to hit.

Elie Bakouch (@eliebakouch), a research engineer at agentic model training platform startup Prime Intellect, praised the efficiency of Privacy Filter’s architecture on X:

“Very nice release by @OpenAI! A 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion scale data cheaply. keeping 128k context with such a small model is quite impressive too”.

The sentiment reflects a broader industry trend toward “small but mighty” models. While the world has focused on massive, trillion-parameter giants, the practical reality of enterprise AI often requires small, fast models that can perform one task, like privacy filtering, exceptionally well and at a low cost.

However, OpenAI included a “High-Risk Deployment Caution” in its documentation. The company warned that the tool should be viewed as a “redaction aid” rather than a “safety guarantee,” noting that over-reliance on a single model could lead to “missed spans” in highly sensitive medical or legal workflows.

OpenAI’s Privacy Filter is clearly an effort by the company to make the AI pipeline fundamentally safer.

By combining the efficiency of a Mixture-of-Experts architecture with the openness of an Apache 2.0 license, OpenAI is providing a way for many enterprises to more easily, cheaply and safely redact PII data.

The modern data stack was built for humans asking questions. Google just rebuilt its for agents taking action.

Enterprise data stacks were built for humans running scheduled queries. As AI agents increasingly act autonomously on behalf of businesses around the clock, that architecture is breaking down — and vendors are racing to rebuild it. Google’s answer, announced at Cloud Next on Wednesday, is the Agentic Data Cloud.

The architecture has three pillars:

  • Knowledge Catalog. Automates semantic metadata curation, inferring business logic from query logs without manual data steward intervention

  • Cross-cloud lakehouse. Lets BigQuery query Iceberg tables on AWS S3 via private network with no egress fees

  • Data Agent Kit. Drops MCP tools into VS Code, Claude Code and Gemini CLI so data engineers describe outcomes rather than write pipelines

“The data architecture has to change now,” Andi Gutmans, VP and GM of Data Cloud at Google Cloud, told VentureBeat. “We’re moving from human scale to agent scale.”

From system of intelligence to system of action

The core premise behind Agentic Data Cloud is that enterprises are moving from human‑scale to agent‑scale operations.

Historically, data platforms have been optimized for reporting, dashboarding, and some forecasting — what Google characterizes as “reactive intelligence.” In that model, humans interpret data and decide what to do.

Now, with AI agents increasingly expected to take actions directly on behalf of the business, Gutmans argued that data platforms must evolve into systems of action.

“We need to make sure that all of enterprise data can be activated with AI, that includes both structured and unstructured data,” Gutmans said. “We need to make sure that there’s the right level of trust, which also means it’s not just about getting access to the data, but really understanding the data.”

The Knowledge Catalog is Google’s answer to that problem. It is an evolution of Dataplex, Google’s existing data governance product, with a materially different architecture underneath. Where traditional data catalogs required data stewards to manually label tables, define business terms and build glossaries, the Knowledge Catalog automates that process using agents.

The practical implication for data engineering teams is that the Knowledge Catalog scales to the full data estate, not just the curated subset that a small team of data stewards can maintain by hand. The catalog covers BigQuery, Spanner, AlloyDB and Cloud SQL natively, and federates with third-party catalogs including Collibra, Atlan and Datahub. Zero-copy federation extends semantic context from SaaS applications including SAP, Salesforce Data360, ServiceNow and Workday without requiring data movement.

Google’s lakehouse goes cross cloud

Google has had a data lakehouse called BigLake since 2022. Initially it was limited to Google data, but in recent years it has gained some limited federation capabilities that let enterprises query data stored in other locations.

Gutmans explained that the previous federation worked through query APIs, which limited the features and optimizations BigQuery could bring to bear on external data. The new approach is storage-based sharing via the open Apache Iceberg format. That means, he argued, it makes no difference whether the data sits in Amazon S3 or in Google Cloud.

“This truly means we can bring all the goodness and all the AI capabilities to those third-party data sets,” he said.

The practical result is that BigQuery can query Iceberg tables sitting on Amazon S3 via Google’s Cross-Cloud Interconnect, a dedicated private networking layer, with no egress fees and price-performance Google says is comparable to native AWS warehouses. All BigQuery AI functions run against that cross-cloud data without modification. Bidirectional federation in preview extends to Databricks Unity Catalog on S3, Snowflake Polaris and the AWS Glue Data Catalog using the open Iceberg REST Catalog standard.

From writing pipelines to describing outcomes

The Knowledge Catalog and cross-cloud lakehouse solve the data access and context problems. The third pillar addresses what happens when a data engineer actually sits down to build something with all of it.

The Data Agent Kit ships as a portable set of skills, MCP tools and IDE extensions that drop into VS Code, Claude Code, Gemini CLI and Codex. It does not introduce a new interface.

The architectural shift it enables is a move from what Gutmans called a “prescriptive copilot experience” to intent-driven engineering. Rather than writing a Spark pipeline to move data from source A to destination B, a data engineer describes the outcome — a cleaned dataset ready for model training, a transformation that enforces a governance rule — and the agent selects whether to use BigQuery, the Lightning Engine for Apache Spark or Spanner to execute it, then generates production-ready code.

“Customers are kind of sick of building their own pipelines,” Gutmans said. “They’re truly more in the review kind of mode, than they are in the writing the code mode.”

Where Google and its rivals diverge

The premise that agents require semantic context, not just data access, is shared across the market. 

Databricks has Unity Catalog, which provides governance and a semantic layer across its lakehouse. Snowflake has Cortex, its AI and semantic layer offering. Microsoft Fabric includes a semantic model layer built for business intelligence and, increasingly, agent grounding.

The dispute is not over whether semantics matter — everyone agrees they do. The dispute is over who builds and maintains them.

“Our goal is just to get all the semantics you can get,” Gutmans explained, noting that Google will federate with third-party semantic models rather than require customers to start over.

Google is also positioning openness as a differentiator, with bidirectional federation into Databricks Unity Catalog and Snowflake Polaris via the open Iceberg REST Catalog standard.

What this means for enterprises

Google’s argument — and one echoed across the data infrastructure market — is that enterprises are behind on three fronts:

Semantic context is becoming infrastructure. If your data catalog is still manually curated, it will not scale to agent workloads — and Gutmans argues that gap will only widen as agent query volumes increase.

Cross-cloud egress costs are a hidden tax on agentic AI. Storage-based federation via open Iceberg standards is emerging as the architectural answer across Google, Databricks and Snowflake. Enterprises locked into proprietary federation approaches should be stress-testing those costs at agent-scale query volumes.

Gutmans argues the pipeline-writing era is ending. Data engineers who move toward outcome-based orchestration now will have a significant head start.

Adobe’s new Firefly AI Assistant wants to run Photoshop, Premiere, Illustrator and more from one prompt

Adobe today launched its most ambitious AI offensive to date, unveiling the Firefly AI Assistant — a new agentic creative tool that can orchestrate complex, multi-step workflows across the company’s entire Creative Cloud suite from a single conversational interface — alongside a raft of new video, image, and collaboration features designed to position the company at the center of the rapidly evolving AI-powered content creation landscape.

The announcements, which also include a new Color Mode for Premiere Pro, the addition of Kling 3.0 video models to Firefly’s growing roster of third-party AI engines, and Frame.io Drive — a virtual filesystem that lets distributed teams work with cloud-stored media as though it lived on their local machines — represent Adobe’s clearest signal yet that it views agentic AI not as a feature upgrade but as a fundamental reshaping of how creative work gets done.

“We want creators to tell us the destination and let the Firefly assistant — with its deep understanding of all the Adobe professional tools and generative tools — bring the tools to you right in the conversation,” Alexandru Costin, Vice President of AI & Innovation at Adobe, told VentureBeat in an exclusive interview ahead of the launch.

The stakes could hardly be higher. Adobe is fighting to convince Wall Street, creative professionals, and a wave of well-funded AI-native competitors that its decades-old software empire can not only survive the generative AI revolution but lead it.

How Adobe turned a research prototype into a 100-tool creative agent

The centerpiece of today’s announcement is the Firefly AI Assistant, which Adobe describes as a fundamentally new way to interact with its creative tools. Rather than requiring users to manually navigate between Photoshop, Premiere, Illustrator, Lightroom, Express, and other apps — selecting the right tool for each step of a complex project — the assistant lets creators describe an outcome in natural language. The agent then figures out which tools to invoke, in what order, and executes the workflow.

The assistant is the productized version of Project Moonlight, a research prototype Adobe first previewed at its annual MAX conference in the fall of 2025 and subsequently refined through a private beta. “This is basically [Project] Moonlight,” Costin confirmed to VentureBeat. “We started with all the learnings from Moonlight, and we engaged with customers. We looked internally. We evolved that architecture to make it more ambitious.”

Under the hood, Adobe says it has assembled roughly 100 tools and skills that the assistant can call upon, spanning generative image and video creation, precision photo editing, layout adaptation, and even stakeholder review through Frame.io. The system is built around a single conversational interface inside the Firefly web app where users describe what they want and the assistant maintains context across sessions. Pre-built Creative Skills — purpose-built, multi-step workflow templates such as portrait retouching or social media asset generation — can be run from a single prompt and customized to match a creator’s own style. The assistant also learns a creator’s preferred tools, workflows, and aesthetic choices over time, and understands the content type being worked on — image, video, vector, brand assets — to make context-aware decisions.

Crucially, outputs use native Adobe file formats — PSD, AI, PRPROJ — meaning users can take any result into the corresponding flagship app for manual, pixel-level refinement at any point. “We always imagine this continuum where you can have complete conversational edits and pixel-perfect edits, and you can decide, as a creative, where you want to land,” Costin said. The Firefly AI Assistant will enter public beta in the coming weeks, though Adobe did not specify an exact date.

Why Wall Street is watching Adobe’s AI pricing model so closely

For a company whose AI monetization story has faced persistent skepticism from investors, the pricing structure of the Firefly AI Assistant will be closely watched. Costin told VentureBeat that, at launch, using the assistant will require an active Adobe subscription that includes the relevant apps — meaning users who want the agent to invoke Photoshop cloud capabilities, for instance, will need an entitlement that includes the Photoshop SKU. Generative actions will consume the user’s existing pool of generative credits, consistent with how Firefly credits work across the rest of Adobe’s platform.

“To use some of these cloud capabilities from Photoshop and other apps, you need to have a subscription that includes access to the Photoshop SKU,” Costin explained. “You’ll be consuming your credits when you use generative features.” He acknowledged, however, that the model could evolve: “As we better understand the value of this — and the costs of operating the brain, the conversation engine — things might change.”

The question of whether Adobe can convert AI enthusiasm into meaningful revenue growth is anything but theoretical. When Adobe reported its most recent quarterly results in March, it touted 10% year-over-year revenue growth to $6.4 billion and disclosed that annual recurring revenue from AI standalone and add-on products had reached $125 million — a figure CEO Shantanu Narayen projected would double within nine months.

Adobe adds Chinese AI video models to Firefly, raising commercial safety questions

Alongside the assistant, Adobe is expanding Firefly’s roster of third-party AI models to include Kling 3.0 and Kling 3.0 Omni, two video generation models developed by Kuaishou, the Chinese technology company. Kling 3.0 focuses on fast, high-quality production with smart storyboarding and audio-visual sync, while the Omni variant adds professional controls for shot duration, camera angle, and character movement across multi-shot sequences. The additions bring Firefly’s model count to more than 30, joining Google’s Nano Banana 2 and Veo 3.1, Runway’s Gen-4.5, Luma AI’s Ray3.14, Black Forest Labs’ FLUX.2[pro], ElevenLabs’ Multilingual v2, and others.

When asked whether Adobe had concerns about integrating a model from a Chinese tech company given the current geopolitical climate, Costin was direct: “We think choice is what we want to offer our customers.” He explained that Adobe’s strategy distinguishes between its own commercially safe, first-party Firefly models — trained on licensed Adobe Stock imagery and public domain content — and third-party partner models, which carry different commercial safety profiles. “For some use cases, like ideation, non-production use cases, we got requests from customers to support some external models,” Costin said. “If I’m in ideation, I might be more flexible with commercial safety. When I go into production, I’d want to have a model that gives you more confidence.”

This raises an important nuance for the agentic era. When the Firefly AI Assistant autonomously selects which model to use for a given task, the commercial safety guarantees may vary depending on which engine it invokes. Costin pointed to Adobe’s Content Credentials system — the metadata-and-fingerprinting framework developed through the Content Authenticity Initiative — as the mechanism for maintaining transparency. “The agentic power — and the fact that the assistant has access to all of those models — means it could decide to use a model that carries different content credentials,” he acknowledged. “But with the transparency of content credentials, the user will know how a particular piece of content was created and can decide whether that’s commercially safe or not.” Adobe offers commercial indemnity for its first-party Firefly models but applies different indemnity levels for third-party models — a distinction that enterprise buyers, in particular, will need to carefully evaluate.

Inside Adobe’s active collaboration with Nvidia on long-running AI agent infrastructure

Adobe’s agentic ambitions also intersect with its strategic partnership with Nvidia, announced earlier this year at Nvidia’s GTC conference. When asked whether the Firefly AI Assistant’s agentic capabilities are built on Nvidia’s agent toolkit and NeMo infrastructure, Costin revealed that the collaboration is active but has not yet made it into a shipping product.

“We’re in active discussions — investigating not only Nemotron,” Costin said. “They have this technology called Open Shell and Nemo Claw, which give us the ability to efficiently run long-running agentic workflows in a sandboxed environment.” He said the technology would become increasingly important as Adobe pushes the assistant to handle longer, more autonomous creative tasks — but cautioned that “it’s not shipping yet. It’s being actively explored.”

For Nvidia, which is building an ecosystem of enterprise AI agent platforms with partners like Adobe, Salesforce, and SAP, the partnership could eventually serve as a high-profile proof point for its agent infrastructure stack in the creative vertical. For Adobe, the ability to run complex, long-duration agentic workflows efficiently and securely in sandboxed environments could be the technical foundation that separates the Firefly AI Assistant from lighter-weight chatbot integrations offered by competitors. The partnership also signals Adobe’s recognition that the computational demands of agentic AI — where a single user request may trigger dozens of model calls and tool invocations — require infrastructure partnerships that go well beyond what a software company can build alone.

Premiere Pro’s new color grading mode and the tools Adobe is shipping today

Beyond the headline AI assistant announcement, Adobe’s broader set of updates reflects a company trying to strengthen its position across every phase of the content creation pipeline. Color Mode in Premiere Pro may be the most significant near-term upgrade for working editors. Entering public beta today, Color Mode is described as a first-of-its-kind color grading experience built specifically for the way editors — rather than dedicated colorists — think and work. Adobe notes that it was developed through an extensive private beta with hundreds of working editors, and that participants reported they “actually enjoy color grading” — a sentiment suggesting Adobe may have found a way to democratize one of post-production’s most intimidating disciplines. General availability is expected later in 2026.

The Firefly Video Editor gains audio upgrades including the Enhance Speech feature migrated from Premiere and Adobe Podcast, direct Adobe Stock integration with access to more than 800 million licensed assets, and simple color adjustment controls with intuitive sliders and one-click looks. On the image editing front, Adobe introduced Precision Flow, which generates a range of semantic variations from a single prompt and lets users browse them via an interactive slider — a novel approach that Costin described as “the best slider-based control mixed with the best semantic understanding of not only the existing scene, but what the scene could be.” AI Markup complements this by letting users draw directly on images to specify where and how edits should be applied. After Effects 26.2 adds an AI-powered Object Matte tool that dramatically accelerates rotoscoping and masking — create accurate mattes of moving subjects with a hover and click, refine with a Quick Selection brush, and perfect edges with a Refine Edge tool.

Frame.io Drive wants to kill the shipped hard drive and make cloud media feel local

Rounding out the announcements, Frame.io Drive addresses one of the most persistent pain points in distributed video production: getting media from point A to point B without losing hours — or days — to downloads, syncing, and shipped hard drives. Frame.io Drive is a desktop application that mounts Frame.io projects to a user’s computer so media appears in Finder or Explorer and behaves like local files. The underlying technology, called Frame.io Mounted Storage, streams media on demand as applications request it, while local caching ensures smooth playback. The product builds on streaming technology provided by Suite Studios, and the real-time file access capability is included with every Frame.io account. Adobe emphasized that all content lives solely within Frame.io and is never shared with third parties.

The move positions Frame.io not just as a review-and-approval tool at the end of the production pipeline but as the central media layer from the very beginning of a project — from first capture through final delivery. If successful, the strategy could significantly deepen Adobe’s lock-in with professional video teams by making Frame.io the single source of truth for distributed productions. Frame.io Drive and Mounted Storage will roll out in phases, with Enterprise customers gaining access starting today and accounts on other plans following shortly. Others can join a waitlist.

Adobe’s biggest challenge isn’t building the AI — it’s convincing creators to trust it

Taken together, today’s announcements paint a picture of a company executing aggressively across multiple fronts, but also one that is navigating a complex moment. Adobe first introduced Firefly in March 2023 as a family of generative AI models focused on image and text effects, with a strong emphasis on commercial safety through training on licensed Adobe Stock content. In the three years since, the company has rapidly expanded into video generation, multi-model access, and now agentic workflows, a trajectory that mirrors the broader industry’s shift from standalone AI features to AI-native systems.

But the competitive field has grown dramatically. Runway, Pika, and a host of AI-native video generation startups have captured mindshare among creators. Canva has aggressively integrated AI into its design platform. And the emergence of powerful foundation models from OpenAI, Google, and Anthropic — the latter of which Adobe says it will integrate with Firefly AI Assistant capabilities — means the barrier to building creative AI tools has never been lower. Adobe is also navigating these product ambitions against a complex corporate backdrop: the impending departure of CEO Shantanu Narayen, an actively exploited zero-day vulnerability in Acrobat Reader (CVE-2026-34621) that had been used by hackers for months before being patched this week, a U.K. antitrust investigation over cancellation fees, and a recent $75 million lawsuit settlement.

Adobe’s response, articulated clearly through today’s launches, is to lean into what it believes is its deepest moat: the integration of AI into a set of professional-grade, category-leading applications that no startup can replicate overnight. Costin framed the agentic transition as empowering rather than threatening to creative professionals, comparing Creative Skills to a next-generation version of Photoshop Actions — the macro-recording feature that has long allowed power users to automate repetitive tasks. “We want to help our customers become — from the ones doing all the work — to be creative directors, doing some of the work, but most importantly, guiding the assistant in executing some of those creative visions,” he said.

It is a compelling pitch — and, in its own way, a revealing one. For three decades, Adobe made its fortune by selling the tools that turned creative vision into finished pixels. Now it is asking its customers to let an AI agent handle more of that translation, trusting that the human role will shift from operating the tools to directing the outcome. Whether creators embrace that bargain — and whether Wall Street rewards it — will determine not just Adobe’s trajectory but the shape of an entire industry learning to create alongside machines.