Netomi, the San Francisco-based startup building AI systems for enterprise customer service, said Thursday that it has raised $110 million in new funding in a round led by Accenture Ventures, with participation from Adobe Ventures, WndrCo, Silver Lake Waterman, NAVER Ventures, Metis Strategy and Fin Capital. Jeffrey Katzenberg, managing partner of WndrCo and co-founder of DreamWorks, has joined the company’s board. The round builds on early backing from a roster of AI luminaries that includes OpenAI co-founder Greg Brockman, Google DeepMind co-founder Demis Hassabis and Microsoft AI CEO Mustafa Suleyman.
On its face, the financing is another large AI round in a market still awash in capital. But the deal is more revealing than that. It suggests that a new line is being drawn inside enterprise AI — not between companies that have a chatbot and companies that do not, but between companies that can show AI works in the messy, brittle, heavily governed environments where large businesses actually operate, and those that still mostly shine in demos.
The market around Netomi makes the stakes clear. Sierra, the AI agent startup led by former Salesforce co-CEO Bret Taylor, raised $350 million at a $10 billion valuation in September 2025 and has since made three acquisitions in 2026 alone. Decagon tripled its valuation to $4.5 billion in January 2026 with a $250 million Series D. Salesforce, ServiceNow and Intercom are all racing to embed AI agents into their existing platforms; Intercom’s Fin AI agent reportedly crossed $100 million in annual recurring revenue at $0.99 per resolution. Gartner predicts that 40 percent of enterprise applications will include task-specific AI agents by the end of 2026, up from less than 5 percent in 2025.
Against that backdrop, Netomi’s $110 million round is not the largest in the category, but it may be the most strategically constructed. The combination of Accenture’s enterprise consulting network, Adobe’s dominance in digital experience management and Netomi’s track record in production deployments represents a coordinated play to embed AI not as a chatbot layer on top of websites, but as the fundamental intelligence governing how entire digital experiences behave.
The company did not disclose its valuation, and in an interview tied to the announcement, Netomi executives declined to provide revenue or profitability figures. Instead, Chief Executive Puneet Mehta pointed to customer economics, saying a typical large deployment can generate at least tens of millions of dollars in impact, with some customers on a path to hundreds of millions.
For technical decision-makers, though, the more important part of Thursday’s news may be the partnerships attached to the money.
The structure of the deal reads like a map of how enterprise AI gets bought in 2026.
Alongside the investment, Accenture has entered a global alliance with Netomi to bring the platform to its Fortune 100 client base worldwide. The alliance will involve hundreds of Accenture team members receiving training on Netomi’s platform — a meaningful commitment from the world’s largest consulting firm and a distribution channel that few AI startups can match. Adobe Ventures’ participation comes with plans to integrate Netomi into Adobe’s Brand Concierge agentic ecosystem, giving Netomi a path into the software layer many large brands already use to manage websites, content and digital journeys. Metis Strategy brings access to CIO advisory channels. Ndidi Oteh, CEO of Accenture Song, said in the press release that the partnership is designed to help clients “reinvent how they serve their customers — seamlessly, responsibly and at scale.”
The result is not just more cash. It is a distribution network wrapped around a thesis.
Justin Wexler, a partner at WndrCo who led the firm’s Series B investment in Netomi in 2021, said most companies in the customer experience space are simply swapping a human for an AI. “That’s the extent of what they’re building,” Wexler said. “What we’re doing at Netomi, particularly with the Adobe partnership, is leapfrogging that altogether — merging the two layers. You don’t have a ‘How can I help you?’ chatbot. This is anticipating the issue and eliminating the ticket altogether.”
The distinction matters because it describes a fundamentally different kind of product. Most customer service AI still sits downstream. A customer encounters a problem, opens a chat window, explains the issue and waits for a response. Even when AI speeds up that exchange, the friction has already happened. Netomi wants to move upstream, into the experience before the ticket exists.
Mehta described the idea in blunt economic terms. “Why are there so many customer service tickets? Why is $500 billion spent on human labor answering customer service phone calls, emails and chats?” he asked. “What we realized is that the world’s largest companies wait for a problem to happen and then jump on it to solve it — but by that time, they’ve already created a lot of frustration, and it’s very expensive to do that.”
The answer, in Mehta’s view, is not to make downstream customer service faster with AI. It is to prevent the service ticket from being created in the first place. That logic sits behind almost every strategic decision the company has made — including the Adobe partnership.
“Most important websites run on Adobe Experience Manager,” Mehta said. “So we’re saying, what if we bring that kind of context and awareness upstream — capturing that a customer might be affected before it even turns into a customer service ticket.”
To understand what Netomi is building, you have to understand where its founder came from.
Mehta, who spent his early career constructing automated trading engines on Wall Street, told VentureBeat that the founding thesis was deceptively simple. “When we started Netomi, the core thesis was that AI is going to become the new customer interface,” he said. “The Transformers [paper] did not exist, so we had literally stitched together a set of different models to create the same end result.”
That background in low-latency finance is not incidental. It is the intellectual architecture that undergirds everything Netomi builds. When asked what connects trading systems to customer experience platforms, Mehta drew a direct line.
“If you think about the low-latency trading world, that was the first technology application to use situational awareness and a variety of different signals at scale,” he said. “There was not one signal that it was making decisions on. You needed market data feeds. You needed situational awareness. You needed news. You needed awareness of your own book of business. You needed your own risk assessment.”
That multi-signal architecture, Mehta argued, translates directly to what enterprise customer experience demands. Rather than waiting passively for a customer to describe a problem — the way traditional chatbots and even most current AI agents operate — Netomi’s system attempts to reconstruct the full situation before it acts. The request itself is only part of the story.
“What the customer tells you is very important, but the situation the customer is in is sometimes even more important,” Mehta said. “What if we borrowed that design pattern we built for low-latency trading? Because we can probably know why the customer is calling us. And if we can know that, we could maybe even reach out to them before they reach out to us and solve the problem.”
He summarized the philosophical distinction this way: “What large language models by themselves did was they essentially democratized just raw intelligence. We are democratizing context, and that changes everything.”
That is a sharp line, and also a revealing one. Netomi is effectively betting that the defensible layer in enterprise AI will not be the foundation model alone. It will be the orchestration layer that turns general model capability into governed, auditable, domain-specific action.
That governed approach extends to how the platform handles risk. Netomi uses what it calls an AI authority matrix — a real-time system that defines what the AI can do autonomously and when it must escalate to a human. “It’s a little bit like autonomous driving,” Mehta said. The AI knows when it’s approaching a boundary and pulls a human in. For regulated industries, specific endpoints can be locked to deterministic, rules-based flows while the agentic layer handles broader orchestration — and all of it is version-controlled and traceable, with metadata saved for seven years.
The most technically ambitious element of Netomi’s vision — and the one that most sharply distinguishes it from competitors — is what the company calls AI-embedded customer experience orchestration. Rather than placing a chatbot in the corner of a website, Netomi’s system can rearrange the website itself based on what the AI infers about each individual customer’s situation.
Wexler demonstrated a live example during the interview. “As we see most deployments, companies that want to deploy AI on their websites, they throw a chatbot on the corner,” he said. “If you embed agentic capabilities into the digital layer itself — and again, Adobe Experience Manager is the leading digital layer of enterprise — then you could do really unique things.”
Wexler described what this looks like in practice. In a typical deployment, he said, the AI doesn’t just answer questions — it reshapes the page. Based on a customer’s browsing behavior, purchase history and inferred intent, the system can reorganize a product page in real time: surfacing warnings one customer needs but another doesn’t, prompting a sample order at the moment of hesitation, or flagging a compatibility issue before checkout. Two customers looking at the same product might see fundamentally different pages — not because a marketing team built two versions, but because the AI is composing the experience on the fly.
“The AI is playing the role of arranging the elements of the website to cater to me and my needs,” Wexler said. “It’s anticipating my needs.”
The implication is a shift from static web pages to something closer to generative websites — pages that reconstruct themselves around each visitor the way a good salesperson adjusts a pitch mid-conversation. It is a fundamentally different model from bolting a chat widget onto a page that otherwise looks the same for everyone.
“The AI is playing the role of arranging the elements of the website to cater to me and my needs,” Wexler said. “It’s anticipating my needs.”
That vision already extends beyond screens. Mehta revealed that Coach, the handbag company owned by Tapestry, deployed Netomi’s platform in a physical flagship store during the holiday season to help customers navigate the retail space and is now rolling it out chainwide.
The numbers Netomi is putting behind its production claims are equally ambitious. At DraftKings, the company said its platform can handle traffic surging to more than 40,000 concurrent customer requests per second during major sporting events, while delivering sub-three-second response times and 98 percent intent classification accuracy. At Paramount, the company said it deployed across chat and voice in two weeks and then scaled through a weekend that included a major UFC event and the AFC Championship.
Those are company-reported numbers, and they are hard to benchmark against competitors because the industry lacks standard public reporting. But they illustrate the kind of problem Netomi wants buyers to think about. At that scale, an AI support product stops looking like a smarter FAQ bot and starts looking like a distributed systems challenge. You are not just asking whether a model can answer a question. You are asking whether an entire system can make decisions quickly, safely and consistently while traffic spikes and business rules collide.
Whether Netomi can deliver on the full scope of its ambition — transforming from an AI customer service platform into an ambient intelligence layer that reshapes digital and physical experiences in real time — remains an open question. The company faces competitors with far larger war chests, deeper platform footprints and, in Sierra’s case, a founder-level relationship with OpenAI.
But Netomi’s bet is fundamentally different from what much of the field is building. While Sierra and Decagon race to replace human agents with AI concierges, measuring success in conversations handled, Netomi is wagering that the highest form of customer service is the interaction that never needs to happen at all.
“There are new startups trying to convince enterprises that if every customer gets a ‘concierge,’ if there’s ‘an agent for every moment,’ then loyalty follows,” Mehta said. “But most relationships with brands are functional. Customers don’t want a conversational relationship with their airline or their bank. They want things to work — seamlessly, invisibly, without friction.”
In his closing comments during the interview, Mehta warned that many companies still underestimate the operational risk of deploying immature AI into sensitive customer environments. “What large companies adopting AI don’t fully realize yet is what kind of risk are they taking by adopting those platforms that are not really field tested for this kind of scale and situations,” he said.
That may be the most important line in the whole announcement. Because beneath the funding round, beneath the partner logos and beneath the talk of agents and orchestration, the real question in enterprise AI remains old-fashioned: which systems can be trusted when the environment gets ugly?
“We have built this technology more like how automated trading got built, or how autonomous driving got built, compared to coming at this from just a customer service lens,” Mehta said.
It is a fitting frame for a company whose founder left Wall Street to fix customer service. On the trading floor, the best systems were never the ones that made the most trades. They were the ones that knew, with precision, when not to act — and the ones nobody noticed until something went wrong and they held. Netomi’s new investors are betting $110 million that the same principle applies when the person on the other end of the system is not a trader, but a customer who just wants their floor not to leak.
Presented by Nutanix
As enterprises move from AI experimentation into production deployment, the primary cost driver has shifted away from foundation model training and toward the infrastructure required to run thousands of concurrent inference workloads at scale, with agentic AI as the accelerant.
Where early enterprise AI projects involved a handful of large, scheduled training jobs, production agentic environments require continuous support for short-lived, unpredictable requests that consume GPU, networking, and storage resources in ways traditional infrastructure was never designed to handle. For enterprise technology leaders, that shift is turning infrastructure efficiency into a make-or-break factor in AI economics.
“Every employee with an AI assistant, every automated workflow, every agent pipeline needs models for inferencing and generates a lot of tokens,” says Anindo Sengupta, VP of products at Nutanix. “Those inferencing requests land on a GPU infrastructure, traverse specialized networks, and pull data from storage systems purpose built to support these AI workloads.”
Inference costs per token have dropped by roughly an order of magnitude over the past two years, driven by model efficiency improvements and competitive pressure among cloud providers. The expectation would be that enterprise AI is getting cheaper. Instead, total costs are rising, Sengupta says, pointing to what economists call the Jevons paradox: when a resource becomes cheaper to use, consumption tends to increase faster than the price drops.
So while the cost per token is going down by almost an order of 10 in the last couple of years, consumption has risen more than 100X. The result is that cost per token and GPU utilization are becoming primary operational metrics for enterprise IT, sitting alongside traditional measures like uptime and throughput.
“Cost per token is really about the total cost of ownership for serving inference models,” Sengupta says. “Utilization is about making sure that once you have GPU assets, you’re getting maximum return from them. These metrics will be critical for enterprise IT leaders.”
What makes this difficult is the number of variables involved. Token costs shift depending on which models an organization runs, where workloads execute, and how prompts are structured.
“There are too many variables in cost to manage intuitively,” Sengupta adds. “Optimizing it is an engineering problem, and one that requires continuous tuning.”
Production agentic AI introduces a workload profile that traditional enterprise infrastructure was not designed to handle. Classic data center deployments are built around predictable loads and long planning cycles. Agentic environments produce unpredictable, high-frequency bursts of short inference requests, place new demands on networking and storage, and change faster than most procurement cycles allow.
The infrastructure supporting agentic AI is also structurally different from CPU-based computing. GPU topology, high-speed interconnects, parallel storage systems for agent memory and KV cache, and networking architectures capable of handling DPU offloading all represent new capabilities that require new operational skills.
Siloed infrastructure compounds these challenges. When GPU resources, networking, and data access are managed independently, scheduling inefficiencies accumulate, utilization drops, and costs climb. Organizations running fragmented stacks tend to underutilize expensive GPU assets while simultaneously bottlenecking on storage and network throughput.
The response emerging among infrastructure vendors is a move toward tightly integrated, validated full-stack platforms designed specifically for production AI workloads. The premise is that end-to-end optimization across compute, networking, storage, and software layers produces better utilization and lower per-token costs than assembling best-of-breed components from separate vendors.
Nutanix’s Agentic AI solutionrepresents one approach to this problem. Built on the Nutanix AHV hypervisor, Nutanix Enterprise AI and Nutanix Kubernetes Platform, the solution is designed to manage both the traditional compute layer where agent orchestration runs and the accelerated compute layer where inference executes. The company has introduced NVIDIA topology-aware enhancements to AHV that automatically optimize how GPUs, CPUs, memory, and DPUs are allocated to virtual machines, and has offloaded the Nutanix Flow Virtual Networking to BlueField DPUs, to free GPU cycles and sustain throughput without compromising security.
The solution supports instant deployment of NVIDIA NIM microservices and open-source models including Nemotron, and integrates an AI gateway that governs access to frontier cloud LLMs from Anthropic, Google, OpenAI, and others. The gateway also implements model context protocol (MCP) to allow agents to connect to enterprise data with granular access controls. The solution runs on Cisco infrastructure, allowing organizations to deploy on infrastructure they already operate.
“By integrating everything from the AHV hypervisor and Flow Virtual Networking up to the Kubernetes platform, you remove the silos that slow down AI projects,” Sengupta explains.
One organizational tension that scales with agentic AI adoption is the relationship between platform teams managing shared infrastructure and the developers building and running agent applications on top of it. These groups have historically operated with different tooling, different priorities, and different time horizons, but Sengupta argues that the core dynamic hasn’t changed even as the technology has.
“Platform teams will continue to deliver a catalog of self-service AI capabilities that are also compliant to business needs, that they can serve to agentic AI builders,” Sengupta says. “Mature AI teams will do a great job not just in GPU utilization, but in creating an operating model that enables fast AI infrastructure delivery to meet the pace of innovation that developers want. That’s what is very critical to success.”
The organizations that are managing GPU utilization most effectively tend to be further along in their AI adoption journey, with more established operating models and clearer cost accountability. For organizations earlier in that journey, the infrastructure design and operating model decisions being made now will determine whether AI projects can move from pilot to production without cost or complexity becoming the limiting factor.
The emerging framework for enterprise AI infrastructure is the AI factory, a purpose-built environment for producing and running AI workloads at scale. The challenge is that most organizations will need to operate both traditional compute and accelerated compute simultaneously for years, requiring a common operating model that spans both technology paradigms without sacrificing agility.
With Nutanix, running on Cisco as part of the Cisco AI Pods, powered by Intel and optimized for the NVIDIA reference architecture, organizations get a production-ready, full-stack foundation by enabling AI factories to be securely and efficiently shared by thousands of agents, to achieve the lowest costs per token. The solution bridges the gap between the infrastructure and platform engineering teams who manage the hardware and the AI engineering and agentic AI developer teams who build and run agentic AI applications, making it truly affordable to run AI at a massive scale.
“The metrics that will determine whether an organization can sustain and scale its AI investment — cost per token, GPU utilization, scheduling efficiency — are infrastructure metrics,” Sengupta says. “Managing them well is increasingly a precondition for making AI viable, not just functional.”
Secure and scale your AI factory — explore the full-stack approach here.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
Amazon Web Services on Tuesday launched one of the most consequential enterprise AI plays in the company’s 20-year history, simultaneously bringing OpenAI’s most powerful models to its Bedrock platform, unveiling a new agentic developer framework, releasing a desktop AI productivity tool called Amazon Quick, and expanding its Amazon Connect service from a single contact-center product into a family of four agentic AI solutions targeting supply chains, hiring, healthcare, and customer experience.
The announcements, made at a live event in San Francisco titled “What’s Next with AWS,” landed just 24 hours after OpenAI and Microsoft publicly restructured their exclusive cloud partnership — a move that, for the first time, freed OpenAI to distribute all of its products across rival cloud providers. AWS CEO Matt Garman called it “a huge partnership” and said customers have been asking for OpenAI models inside AWS “from the very early days.”
The timing was no accident. Amazon CEO Andy Jassy had flagged the Microsoft-OpenAI restructuring as “very interesting” in a post on X the day prior, promising more details on Tuesday. What followed was a sweeping set of launches that together represent AWS’s bid to become the definitive infrastructure layer for the agentic AI era — one where intelligent software agents don’t just answer questions but take autonomous action inside enterprise workflows.
The centerpiece announcement: OpenAI’s latest models are now available through Amazon Bedrock in limited preview, with general availability expected within weeks. AWS confirmed that GPT-5.4 is available immediately in limited preview, with GPT-5.5 arriving shortly thereafter.
In an exclusive interview with VentureBeat at the event, Anthony Liguori, Vice President and Distinguished Engineer at AWS, described the significance of the moment. “We announced a partnership about eight weeks ago centered around this idea of the stateful runtime environment, the SRE APIs,” Liguori said. “However, today we announced the availability of all of OpenAI’s frontier models in Amazon Bedrock available via both the stateless APIs — these are the APIs that are commonly used, like chat completions and responses.”
Liguori characterized the stateless API availability as particularly critical because it removes migration friction. “Customers can take their existing workloads today and just start using AWS right off the bat,” he said. “They don’t have to write any new software, develop any new things. I think that’s one of the most exciting announcements that came out today.”
The integration means AWS customers can now evaluate and deploy OpenAI models alongside offerings from Anthropic, Meta, Mistral, Cohere, and Amazon’s own models — all through Bedrock’s unified security, governance, and cost controls. For enterprise procurement teams, this collapses what had been a fragmented multi-vendor landscape into a single pane of glass.
The path to Tuesday’s announcement was anything but smooth. As TechCrunch reported, OpenAI’s earlier $50 billion deal with Amazon, announced in February, had created a legal tangle with Microsoft. Under the original Microsoft-OpenAI agreement, Microsoft retained exclusive rights to OpenAI products accessed through APIs, which appeared to conflict directly with OpenAI’s promise to give AWS exclusive hosting rights for its new Frontier agent-building tool.
Microsoft had publicly pushed back at the time, stating that “Azure remains the exclusive cloud provider of stateless OpenAI APIs.” The Financial Times reported that Microsoft even contemplated legal action. Monday’s restructured deal — which replaced Microsoft’s open-ended exclusivity with a nonexclusive license running through 2032 — swept those legal obstacles aside.
For AWS, the resolution means its multi-billion-dollar investment in OpenAI can now fully bear fruit. As CNBC reported, OpenAI’s revenue chief Denise Dresser had told employees in a memo that the Microsoft relationship “has also limited our ability to meet enterprises where they are — for many that’s Bedrock.” At the San Francisco event, Dresser framed the moment as a turning point. “They’re no longer in the mindset of experimentation and pilots,” she said of enterprise customers. “They really want to go full enterprise wide, and they understand that to do that, they need to have powerful models. But even more importantly, they want those models in a trusted environment.”
OpenAI CEO Sam Altman, who was unable to attend in person due to his ongoing court case against Elon Musk across the Bay Bridge in Oakland, sent a recorded video message. “We are co-developing an agent platform from the ground up, deeply integrated with AWS services and powered by OpenAI’s most advanced models and tools,” Altman said, “so that customers can build and run powerful agents in their own environment without worrying about the underlying plumbing.”
Beyond raw model access, AWS launched Amazon Bedrock Managed Agents powered by OpenAI — a system that combines OpenAI’s frontier models with its proprietary “harness,” the agentic execution framework that powers products like Codex. This is where Liguori’s technical analysis was most revealing.
He explained that the harness concept represents a shift in how models are trained and deployed for agentic work. “When you think about an agentic platform, there’s really two components,” Liguori told VentureBeat. “One is the harness — the actual logic that will execute tool calls for the model, determine when to compact the context, all of those sorts of things — and then the model itself.”
Critically, Liguori argued, the best agentic performance comes when models are trained specifically against their harness through reinforcement learning — not merely prompted to use tools at inference time. “You can give a model a whole lot of instructions and a set of tools, and it will be able to use it most of the time,” he said. “But when you really train the model on a specific set of tools, a specific style of operations, it’s just like drilling plays over and over again — the model builds muscle memory for using that harness.”
The football analogy is instructive. Where general-purpose models are like versatile athletes who can adapt to any playbook, harness-trained models are like championship teams that have run the same formations thousands of times until execution becomes instinctive. For enterprises deploying agents in high-stakes production environments — managing financial transactions, orchestrating supply chains, or processing sensitive healthcare data — that reliability gap matters enormously.
Bedrock Managed Agents consists of three components: a runtime layer for configuring skills, memory policies, and tool access; an environment layer where the agent lives (deployable on Fargate or other AWS compute); and an inference API for interacting with the agent. The system integrates deeply with AWS’s identity and access management, VPC networking, and CloudTrail auditing — meaning every action an agent takes is logged and governed by existing enterprise security policies.
Liguori made what may be his most striking claim when discussing why enterprises should trust AWS over on-premises alternatives or smaller cloud providers. “With Bedrock, the system that we’re using to host the GPT-5.4 models, that whole environment is zero operator access,” he told VentureBeat. “There’s no human that could ever log into one of those machines, so your inference data is never able to be accessed by a human.”
He pointed to AWS’s custom silicon — Graviton processors and Nitro security chips — as the foundation for this claim. “When you look at one of our servers, either compute servers or the servers we’re using for Gen AI, the only thing that you can buy off the shelf is the memory modules. Everything else is either custom boards or even custom silicon.”
This argument is designed to counter a growing narrative from what the industry calls “neo-clouds” — smaller providers that offer on-premises model hosting with tighter physical security controls. Liguori flipped that argument on its head: “You’re actually way more secure in the cloud because we have built a platform with such strong physical securities… If you were to try to stand up your own inference system today, you’d probably be running open source software on just Linux.”
It’s a bold claim, and one that enterprise CISOs will undoubtedly scrutinize. But it underscores AWS’s conviction that the agentic era — where AI agents access source code, PII data, and critical business systems — demands infrastructure security guarantees that go far beyond what most organizations can build independently.
OpenAI’s Codex coding agent also arrived on Bedrock in limited preview. Dresser shared that Codex has been growing at a blistering pace, expanding “from 3 million weekly active users to 4 million in two weeks.” The tool has evolved beyond simple code generation into a full agentic software development lifecycle platform.
For Liguori, who described himself as “10 to 20 times more productive” as an engineer thanks to tools like Codex, bringing this capability into AWS represents the bridge between individual developer productivity and enterprise-scale deployment. “Most developers today are using these OpenAI models on their laptops,” he said. “We haven’t seen that happen yet in the rest of the industry, and with Bedrock Managed Agents, we think we have a way for enterprises to deploy agents in a means that meets their compliance requirements.”
The gap Liguori is describing — between the solo developer experience and enterprise-wide adoption — is arguably the central challenge of the current AI moment. Individual engineers can achieve extraordinary productivity gains with agentic coding tools. But scaling that to thousands of developers across a Fortune 500 company, with proper governance, security, and auditability, requires platform-level infrastructure. That’s the market AWS is targeting.
Liguori saw the near-term potential in even more immediate terms. He described leading a team of about 20 engineers who share a common codebase of skills and MCP tools. “That has been an amazingly powerful thing, because we’re all able to build on top of each other as we learn how to use these models,” he said. “Where I’ve run into a hurdle is there’s a lot of stuff I’d like to share with our finance team… and I can’t really ask them to clone a Git repo and build it from a Git repo.” Bedrock Managed Agents, he argued, will let teams create hosted agents that non-technical colleagues can access — taking agentic development from a developer-only practice to an enterprise-wide capability within the next six months.
While the OpenAI partnership dominated headlines, AWS also launched Amazon Quick Desktop — a new desktop application designed to bring agentic AI to knowledge workers who aren’t developers. Liguori framed the product as addressing a critical gap. “A lot of these agentic tools have primarily targeted developers,” he said. “Quick Desktop is a really great tool if you are a knowledge worker that is not a developer… I think it’s been underserved for the non-developer knowledge workers.”
Quick Desktop integrates with a user’s local files, calendar, email, Slack, and enterprise applications — building what AWS calls a “Knowledge Graph” that maps relationships between people, projects, decisions, and actions. The system connects natively with Google Workspace, Microsoft 365, Zoom, and Salesforce. Unlike other AI productivity tools, Quick doesn’t wait for prompts. It proactively surfaces what matters — unanswered emails, deals needing updates, documents awaiting review — and can take action like scheduling meetings, drafting emails, or updating Jira tickets.
Garman, who said he had been using the desktop app for several weeks, called it “by far the most effective tool” among AI productivity products he has tested. “If you think about what we’ve done with Quick — combine all of your sources of data inside of the enterprise — but then we also saw the power of having access to a local desktop and being able to operate with your local files and your local email and your local Slack… but people were worried about security, appropriately so,” Garman said. “What we’re doing here is combining a bunch of those things together with QUIC to give you the best of all of those worlds.”
The product is available in preview today, with no AWS account required — users can sign up with just an email address. Customers including BMW, 3M, Mondelēz, Southwest Airlines, and the NFL are already using it, with some reporting production time reductions of nearly 80% and customer issue processing cut by more than 50%.
Perhaps the most ambitious long-term bet announced Tuesday was the expansion of Amazon Connect from a single contact-center product — one that reached over $1 billion in revenue last year and processes 20 million interactions daily — into a family of four agentic AI solutions.
The new lineup includes Amazon Connect Decisions, an agentic supply chain planning tool built on more than 25 specialized supply chain tools and 30 years of Amazon operational science, including one of Amazon’s SCOT (Supply Chain Optimization Technologies) foundation models. Amazon Connect Talent is a high-volume hiring platform inspired by Amazon’s experience hiring 250,000 seasonal employees during peak periods, using AI agents to conduct voice interviews around the clock and present recruiters with anonymized, skills-based scoring. Amazon Connect Customer AI is the renamed and enhanced version of the original contact-center service. And Amazon Connect Health covers the patient journey from appointment scheduling through clinical encounters, including ambient documentation, billing code suggestions, and post-visit summaries drawn from Amazon’s experience with One Medical and Amazon Pharmacy.
Colleen Aubrey, who leads applied AI solutions at AWS and previously co-founded Amazon’s advertising business, introduced a new design philosophy underlying all four products: “humorphism.” Where skeuomorphism translated physical objects into digital metaphors — desks to desktops, files to folders — humorphism translates human interaction dynamics into AI agent behavior. “If we’re building products that at the heart of which is an agentic teammate, then how should those teammates interact with you?” Aubrey asked. The philosophy manifests in specific design choices: Connect Decisions agents ask planners why they made manual adjustments and apply those insights across similar products. Connect Talent agents adapt follow-up questions based on candidate responses. Connect Health agents trace every clinical insight back to source data so physicians can verify AI-generated documentation.
Taken together, Tuesday’s announcements reveal a coherent strategy operating across four distinct layers: custom infrastructure (Graviton, Trainium, zero-operator-access security), model access (Bedrock as a model marketplace with unified APIs), an agentic platform (Bedrock Managed Agents and AgentCore for building and governing agents), and purpose-built applications (Quick for individual productivity, Connect for vertical business operations).
This layered approach addresses a fundamental tension in the enterprise AI market. Companies want choice at the model layer but integration at the platform layer and specificity at the application layer. By offering all three through a single security and governance framework, AWS is betting it can capture value across the entire stack — a strategy that reshapes competitive dynamics for Microsoft, Google Cloud, and the growing constellation of smaller AI infrastructure providers.
Garman pushed back on the “SaaSpocalypse” narrative that agentic AI will destroy incumbent enterprise software companies. “The incumbent providers today have such a huge advantage,” he said. “They have deep domain expertise… a large customer set with all of their data.” He pointed to Salesforce’s recent headless API offering as an example of incumbents adapting smartly. But he also drew an explicit parallel to the early days of cloud computing, when customers would simply replicate their on-premises data centers in the cloud rather than reimagine what was possible. “You see that today with how people are thinking about AI and agents,” Garman said. “They’re like, ‘I have this business process, I’m gonna have agents do the exact same thing that humans do.’ It kind of works… but it doesn’t give you that transformational change.”
He pointed to Amazon’s own Prime Video team as proof of what that change looks like in practice. The team used agentic tools to rebuild a partner payment system that was projected to take two years — completing it in roughly two quarters with a handful of people, while simultaneously improving the system for customers, for Amazon, and for the partners who get paid through it.
For enterprises evaluating their AI strategies, Tuesday’s announcements simplify one decision — OpenAI models are now available where most of them already run production workloads — while complicating another. With model access increasingly commoditized across cloud providers, the real differentiator becomes the platform layer: where agents are built, governed, deployed, and trusted to take consequential actions. That’s the battleground AWS is staking out, and it’s the same ground Microsoft, Google, Salesforce, and a growing number of startups intend to contest.
Liguori sees the transformation accelerating fast. “I think what we’re going to see in the next six months is a lot of this agentic stuff going from developer only to being able to be consumed by a larger number of folks within an enterprise,” he told VentureBeat. Anthony Liguori, the AWS distinguished engineer who led the technical work over eight sleepless weeks to bring OpenAI’s models to Bedrock, said his own productivity as a software engineer has increased 10 to 20 times over the past year. When asked what excites him most about what comes next, he didn’t talk about models or infrastructure. He talked about what happens when that same multiplier reaches the finance team, the product managers, the supply chain planners — the millions of knowledge workers who have been watching the agentic revolution from the sidelines.
“We had nothing eight weeks ago,” he said, “and now we’re here.” If the next eight weeks move as fast, the sidelines may not exist for much longer.
Bringing AI agents into the enterprise software development lifecycle is fast becoming the norm. As developers experiment with new platforms, organizations are exposed to potential security and orchestration failures. Systems that work in pilots may fail once the agents start working with real-time data.
Legacy tech giant IBM is one of several companies trying to address that gap by introducing more structure into how these workflows run. Yesterday, it announced the global launch of its AI-powered software development platform Bob, designed to write and test code across the development cycle, already in use by more than 80,000 of its employees after starting with just 100 internal users in summer 2025.
Bob introduces a structured layer that constantly pauses for human-led checkpoints, yet by harnessing AI models to perform agentic tasks, IBM says it has saved some teams up to 70% of time “on selected tasks…equaling an average time savings of 10 hours per week.”
Specific models supported include IBM’s own Granite series, Anthropic’s Claude, some from French AI firm Mistral and other smaller distilled models — no Alibaba Qwen or other fully open source ones.
This approach reflects a shift in how enterprises want to approach AI-led development: to build systems that not only build applications but also execute complex, multi-step workflows that do not rely on a single model or a single orchestration framework. It provides a structured, guarded approach to automation that seeks to center humans more in the process and fill audit gaps.
Neal Sundaresan, general manager, Automation and AI at IBM, told VentureBeat in an exclusive interview that a large part of using AI for software development is being systematic.
“Model capability alone isn’t enough,” Sundaresan said. “How you deploy it, how you structure context, and how you keep humans in the loop is what determines whether AI actually delivers.”
That divide is shaping how enterprises choose AI tools, whether they prioritize flexibility and experimentation or reliability and auditability.
A growing class of open or autonomous agent systems has pushed the boundaries of what developers can do. They can now run extended or stateful workflows without much human intervention.
The rise of OpenClaw showed enterprises how far experimentation can go, especially when trained on local data and run in sandboxes. But it also meant that the choice between easier agent and workflow creation and security.
Some companies have embraced this spirit of experimentation.
Enterprise providers like Nvidia chose to embrace OpenClaw-like systems by adding a fence around the sandbox environment that runs autonomous agents, using NemoClaw. Kilo launched Kilo Claw, aimed at providing security for autonomous agents. OpenAI, in its updated Agents SDK, added support for sandbox agent implementations that mirror a lot of the usage patterns of systems like OpenClaw.
Sundaresan said enterprises continue to experiment with how they want to approach coding and agent building. He doesn’t want to close the door on fully autonomous agents proactively completing tasks, but he believes enterprises will want to exercise more caution as well.
“If you tell me that the final answer will be OpenClaw, then we will get there,” he said. “But it’s better to open the gate slowly than say, ‘oops, how do I close it now?’”
Bob reflects that thought process, highlighting the increasing shift for enterprises.
Bob acts as a coding platform, but unlike similar products, it aims to standardize and govern the agent workflows created on it.
Tools like Cursor and Claude Code position the user at the beginning of the task. They are writing the prompts, chaining steps and debugging. LangGraph does similarly while also allowing teams to define agent flows.
The difference is not about capabilities but about control, and whether the system enterprises use explores potential solutions or delivers predictable execution.
In this case, the human employee starts and ends the process. If the agent is unable to complete its task or makes a mistake, this is handled after the fact.
Bob, on the other hand, essentially pre-structures the development lifecycle into role-based stages. The agents will often check-in with the user for approval as a natural workflow checkpoint. Sundaresan said the idea is to combine the human and automated workflows.
What is becoming clear is that the next phase of enterprise AI no longer relies on model power, but rather on how well tools are designed to balance autonomy and control.
As mentioned previously, Bob is now available for all regions where IBM does business. IBM’s pricing structure for Bob consists of four primary subscription tiers for each user/seat and is built around its own internal credits system called “Bobcoins,” which serves as the primary metric for transparency and predictability.
These are set at a fixed valuation of 1 Bobcoin per $0.50 USD. Users consume these coins by performing specific actions, such as generating code, running commands, or performing file operations. If a user exhausts their balance, they must upgrade their plan to continue using the service.
Here are the plans currently offered and how many Bobcoins the user obtains by subscribing to each tier.
30-day Free Trial providing 40 Bobcoins
Pro plan at $20 per month with 40 Bobcoins
Pro+ plan at $60 per month with 160 Bobcoins
Ultra tier priced at $200 per month for 500 Bobcoins.
All standard plans provide access to core features including specialized agentic modes, literate coding, the Bob Shell for intelligent CLI workflows, and Model Context Protocol (MCP) integration.
While all individual plans are restricted to a single user, an Enterprise plan is available through sales contact, offering centralized team management, flexible role assignments, and the ability to distribute Bobcoins across an organization.
Enterprise subscribers receive additional benefits such as priority support and a dashboard to track entitlements and usage awareness.
Enterprise AI teams running centralized orchestration stacks now have a new variable to account for: AWS Quick, which expanded this week to a desktop-native agent that builds a persistent personal knowledge graph and executes actions across local files and SaaS tools — outside the visibility of most control planes.
Unlike chat-based copilots that reset with each session, Quick now maintains a continuously updated knowledge graph built from the user’s local files, calendar, email and connected SaaS apps. It uses it to proactively trigger actions without waiting to be asked.
AWS launched Quick in October last year as an alternative to AI workflow and productivity platforms coming from Google, OpenAI and Anthropic. It was a way for enterprise employees to access insights from connected applications, an agent builder, deep research, and workflow automation. Now, it’s grown beyond a simple AI assistant and acts more as a proactive workflow agent with a stateful, real-time knowledge graph of the user. It integrates with third-party apps like Google Workspace, Microsoft 365, Zoom, Salesforce and Slack — and now local files — so the agent can gather context and take actions.
“What we’ve been hearing is that many enterprises have not been happy with how difficult it is to get context from their legacy tools,” Jigar Thakkar, vice president of Quick Suite at AWS, told VentureBeat in an interview. “Our vision is that Quick is a desktop experience that is the one place where people can go to get all their information and tasks.”
Enterprises often put orchestration layers at the center to help guide and manage agents. Context is pulled in, decisions are made, and then actions are executed within defined system boundaries.
Recent releases like Anthropic’s Claude Managed Agents or updates to OpenAI’s Agent SDK also push for more stateless, autonomous agents within enterprise workflows, but still operate within defined orchestration boundaries.
Quick still operates under enterprise controls, something that AWS has always underscored with its AI products, so actions taken on Quick remain bound by permissions, identity and security. Integrations remain managed by either an API or an MCP connection.
However, this evolution of Quick introduces a more subtle shift in the decision layer. AWS updated Quick to build a personal knowledge graph that learns more about the user the more they interact with the platform. It builds a profile based on how they use local files, calendar, email or third-party app integrations to proactively suggest actions such as reminding a team leader to set up check-ins.
Enterprises should be wary that a kind of shadow orchestration could arise in a system like this. The personalized context means the decision layer focuses on implicit triggers rather than set workflows, user-specific interpretations, and different action timings. Practitioners are rightfully wary of this much autonomy, understanding that shadow orchestration may not be something completely under their control.
Upal Saha, co-founder and CTO of Bem, told VentureBeat in an email that platforms like AWS Bedrock AgentCore, its managed agent runtime, and similar ones from Salesforce “maximize autonomy rather than accountability” so enterprises are not losing agent visibility by accident.
“When you deploy an agent that reasons its way to a decision across multiple steps, you have already accepted that you will not be able to fully explain what happened after the fact,” Saha said. “That is fine for a demo. It is not fine for a claims processing pipeline or a financial workflow where a regulator can ask you to produce a complete audit trail for every automated decision made in the last three years.”
AWS said the platform’s governance model is designed to address these concerns. “Users can set up different agents and automated workflows tailored to their role — things like monitoring tickets, pulling data from connected systems, or drafting docs — all managed within a governed environment where IT retains control over what’s connected and what data flows where. It’s designed to give individual users flexibility while keeping enterprise-level oversight in place,” an AWS spokesperson said.
Quick’s evolution from an AI assistant to something more proactive represents a possible approach some enterprise software providers will take to deep AI agent integration into workflows. While what AWS wants to accomplish with Quick—better context from apps and local files and a strong understanding of what its users actually want to do—is not unique, it isn’t focusing on traditional orchestration. Instead, it’s relying on context-driven agent management.
This market tension is growing, as evidenced by the release of similar platforms. Mistral, for example, announced Workflows the same day as the updates to Quick. That platform uses a more traditional orchestration framework.
Stateful and personalized agents continue to evolve, and so do the questions around how enterprises govern them.
Training AI reasoning models demands resources that most enterprise teams do not have. Engineering teams are often forced to choose between distilling knowledge from large, expensive models or relying on reinforcement learning techniques that provide sparse feedback.
Researchers at JD.com and several academic institutions recently introduced a new training paradigm that sidesteps this dilemma. The technique, called Reinforcement Learning with Verifiable Rewards with Self-Distillation (RLSD), combines the reliable performance tracking of reinforcement learning with the granular feedback of self-distillation.
Experiments indicate that models trained with RLSD outperform those built on classic distillation and reinforcement learning algorithms. For enterprise teams, this approach lowers the technical and financial barriers to building custom reasoning models tailored to specific business logic.
The standard method for training reasoning models is Reinforcement Learning with Verifiable Rewards (RLVR). In this paradigm, the model learns through trial and error, guided by a final outcome from its environment. An automated verifier checks if the model’s answer is right or wrong, providing a binary reward, such as a 0 or 1.
RLVR suffers from sparse and uniform feedback. “Standard GRPO has a signal density problem,” Chenxu Yang, co-author of the paper, told VentureBeat. “A multi-thousand-token reasoning trace gets a single binary reward, and every token inside that trace receives identical credit, whether it’s a pivotal logical step or a throwaway phrase.” Consequently, the model never learns which intermediate steps led to its success or failure.
On-Policy Distillation (OPD) takes a different approach. Instead of waiting for a final outcome, developers pair a smaller student model with a larger, more capable teacher model. For each training example, the student compares its response to that of the teacher token by token. This provides the student with granular feedback on the entire reasoning chain and response-generation process.
Deploying and running a separate, massive teacher model alongside the student throughout the entire training process incurs massive computational overhead. “You have to keep a larger teacher model resident throughout training, which roughly doubles your GPU footprint,” Yang said. Furthermore, the teacher and student models must share the exact same vocabulary structure, which according to Yang, “quietly rules out most cross-architecture, cross-modality, or multilingual setups that enterprises actually run.”
On-Policy Self-Distillation (OPSD) emerged as a solution designed to overcome the shortcomings of the other two approaches. In OPSD, the same model plays the role of both the student and the teacher.
During training, the student receives a standard prompt while the teacher receives privileged information, such as a verified, step-by-step answer key. This well-informed teacher version of the model then evaluates the student version, providing token-by-token feedback as the student tries to solve the problem using only the standard prompt.
OPSD appears to be the perfect compromise for an enterprise budget. It delivers the granular, step-by-step guidance of OPD. Because it eliminates the need for an external teacher model, it operates with the high computational efficiency and low cost of RLVR, only requiring an extra forward pass for the teacher.
However, the researchers found that OPSD suffers from a phenomenon called “privileged information leakage.”
“The objective is structurally ill-posed,” Yang said. “There’s an irreducible mutual-information gap that the student can never close… When self-distillation is set up as distribution matching, the student is asked to imitate the teacher’s full output distribution under privileged context.”
Because the teacher evaluates the student based on a hidden answer key, the training objective forces the student model to learn the teacher’s exact phrasing or steps instead of the underlying reasoning logic. As a result, the student model starts hallucinating references to an invisible solution that it will not have access to in a real-world deployment.
In practice, OPSD models show a rapid spike in performance early in training, but their reasoning capabilities soon plateau and progressively degrade over time.
The researchers behind RLSD realized that the signals governing how a model updates its parameters have fundamentally asymmetric requirements. They identified that the signal dictating the direction of the update (i.e., whether to reinforce or penalize a behavior) can be sparse, but must be perfectly reliable, because pointing the model in the wrong direction damages its reasoning policy.
On the other hand, the signal dictating the magnitude of the update (i.e., how much relative credit or blame a specific step deserves) benefits from being extremely dense to enable fine-grained, step-by-step corrections.
RLSD builds on this principle by decoupling the update direction from the update magnitude. The framework lets the verifiable environmental feedback from the RLVR signal strictly determine the direction of learning. The model only receives overall reinforcement if the final answer is objectively correct.
The self-teacher is stripped of its power to dictate what the model should generate. Instead, the teacher’s token-by-token assessment is repurposed to determine the magnitude of the update. It simply distributes the total credit or blame across the individual steps of the model’s reasoning path.
This alters how the model learns compared to the classic OPSD paradigm. In standard OPSD, the training objective acts like behavioral cloning, where the model is forced to directly copy the exact wording and phrasing of the teacher. This causes the student to hallucinate and leak references to data it does not have.
Instead of forcing the model to copy a hidden solution, RLSD provides a natural and virtually cost-free source of per-token credit information.
“The intuition: we’re not teaching the model to reason like the teacher,” Yang said. “We’re telling the model, on the path it chose, which of its own tokens were actually doing the work. The model’s exploration distribution stays its own. Only the credit allocation gets sharpened.”
If a specific deduction strongly supports the correct outcome, it receives a higher score. If it is just a useless filler word, it receives a baseline score. RLSD eliminates the need to train complex auxiliary reward networks, manually annotate step-by-step data, or maintain massive external teacher models.
To test RLSD, the researchers trained the open-weight Qwen3-VL-8B vision-language model and evaluated it on several visual reasoning benchmarks. These included MMMU for college-level multi-discipline questions, MathVista, MathVision, WeMath, and ZeroBench, a stress-test benchmark explicitly designed to be nearly impossible for current frontier models.
They compared the RLSD model against the base model with no post-training, standard RLVR via the GRPO algorithm, standard OPSD, and a hybrid combination of the two.
RLSD significantly outperformed every other method, achieving the highest average accuracy of 56.18% across all five benchmarks. It beat the base model by 4.69% and outperformed standard RLVR by 2.32%. The gains were most pronounced in complex mathematical reasoning tasks, where RLSD outperformed standard RLVR by 3.91% on the MathVision benchmark.
Beyond accuracy, the framework offers massive efficiency gains. “Concretely, RLSD at 200 training steps already beats GRPO trained for 400 steps, so roughly 2x convergence speedup,” Yang said. “Cost-wise, the only overhead beyond a normal GRPO pipeline is one extra forward pass per response to grab teacher logits. Compared to rollout generation… that’s basically free.”
Unlike OPSD, which saw performance spike and then completely collapse due to information leakage, RLSD maintained long-term training stability and converged on a higher performance ceiling than standard methods.
The qualitative findings highlight how the model alters its learning behavior. For example, in a complex visual counting task, standard RLVR looks at the final correct answer and gives the entire paragraph of reasoning tokens the same reward. RLSD surgically applied rewards to the specific mathematical subtraction steps that solved the problem, while actively down-weighting generic filler text like “Looking at the image, I see…”.
In another example, the model performed an incorrect math derivation based on a bar chart. Instead of labeling the whole response as a failure, RLSD concentrated the heaviest penalty on the exact point where the model misread a relationship from the chart. It remained neutral on the rest of the logical setup, recognizing that the initial framework was valid.
This is particularly important for messy, real-world enterprise use cases. If a model makes a mistake analyzing a 50-page quarterly earnings report, developers do not want it to unlearn its entire analytical framework. They just want it to fix the specific assumption it got wrong. RLSD allows the model to learn exactly which logical leaps are valuable and which are flawed, token by token. Because RLSD does this by repurposing the model itself, it provides models with granular reasoning capabilities while keeping the costs of training reasonable.
For data engineers and AI orchestration teams, integrating RLSD is straightforward, but it requires the right setup. The most critical requirement is a verifiable reward signal, such as code compilers, math checkers, SQL execution, or schema validators. “Tasks without verifiable reward (open-ended dialogue, brand-voice writing) belong in preference-based pipelines,” Yang said.
However, RLSD is highly flexible regarding the privileged information it requires. While OPSD structurally requires full intermediate reasoning traces, forcing enterprises to either pay annotators or distill from a frontier model, RLSD does not.
“If you have full verified reasoning traces, great, RLSD will use them,” Yang said. “If all you have is the ground-truth final answer, that also works… OPSD doesn’t have this flexibility.”
Integrating the technique into existing open-source multi-modality RL frameworks like veRL or EasyR1 is incredibly lightweight. According to Yang, it requires no framework rewrite and slots right into the standard stack. The code swap involves simply changing tens of lines to adjust the GRPO objective and sync the teacher with the student.
Looking ahead, RLSD offers a powerful way for enterprises to maximize their existing internal assets.
“The proprietary data enterprises hold inside their perimeter (compliance manuals, internal documentation, historical tickets, verified code snippets) is essentially free privileged information,” Yang concluded. “RLSD lets enterprises feed this kind of data straight in as privileged context, which sharpens the learning signal on smaller models without needing an external teacher and without sending anything outside the network.”
Mistral AI, the Paris-based artificial intelligence company valued at €11.7 billion ($13.8 billion), today released Workflows in public preview — a production-grade orchestration layer designed to move enterprise AI systems out of proofs of concept and into the business processes that generate revenue.
The product, which launches as part of Mistral’s Studio platform, is the company’s clearest articulation yet of a thesis that is quietly reshaping the enterprise AI market: that the bottleneck for organizations adopting AI is no longer the model itself, but the infrastructure required to run it reliably at scale.
“What we’re seeing today is that organizations are struggling to go beyond isolated proofs of concept,” Elisa Salamanca, who leads go-to-market for Mistral’s enterprise products, told VentureBeat in an exclusive interview ahead of the launch. “The gap is operational. Workflows is the infrastructure to run AI systems reliably across business-critical processes.”
The release arrives at a pivotal moment for both Mistral and the broader AI industry. The dedicated agentic AI market has been valued at approximately $10.9 billion in 2026 and is projected to reach $199 billion by 2034. Yet despite that staggering growth trajectory, industry research points to a stark reality: over 40% of agentic AI projects will be aborted by 2027 due to high costs, unclear value, and complexity. Mistral is betting that Workflows can help its enterprise customers avoid becoming one of those statistics.
At its core, Workflows provides a structured system for defining, executing, and monitoring multi-step AI processes — from simple sequential tasks to complex, stateful operations that blend deterministic business rules with the probabilistic outputs of large language models.
Salamanca described Workflows as containing several key components. The first is a development kit that allows engineers to build orchestration logic in just a few lines of Python code. “We have also been able to expose MCP servers,” she explained, referring to the Model Context Protocol standard for connecting AI systems to external tools, “so that they can actually do this with agent authoring.”
The second — and arguably more technically significant — component is an architecture that separates orchestration from execution. “We’re decorrelating the orchestration from the execution,” Salamanca said. “Execution can happen close to the customer’s data — their critical systems — and orchestration can happen on the cloud or wherever they want to run it.” This means the data never has to leave the customer’s perimeter, a design decision with enormous implications for regulated industries where data sovereignty is non-negotiable. “Enterprises do not have to worry about us having access to the data,” she added.
The third pillar is observability. According to Mistral’s blog post announcing the release, every branch, retry, and state change within a workflow is recorded in Studio with native support for OpenTelemetry. Salamanca noted that this is not an afterthought: “You can easily see what decisions have been taken by the workflow, by the agent, and you can deep dive into where problems are happening.”
Workflows is fully customizable across models — engineers can select which model handles which step and can inject arbitrary code, allowing them to blend deterministic pipelines with agentic sections. The system also supports connectors that integrate directly with CRMs, ticketing systems, support platforms, and other enterprise tools, with built-in authentication and secrets management.
Unlike some competitors offering drag-and-drop workflow builders, Mistral has deliberately targeted developers and engineers rather than business users. “There are a couple of solutions out there that have click-and-drag, drag-and-drop solutions for workflows,” Salamanca acknowledged. “This is not the approach that we’ve been taking. We’ve been really focused towards developers and critical systems that will not scale if you’re doing these drag-and-drop workflows.”
The decision is part of a broader philosophy at Mistral: that enterprise AI systems handling mission-critical operations — cargo releases, compliance reviews, financial transactions — require the precision and version control that only code can provide. Business users are not excluded from the picture, but their role is downstream. Once engineers write a workflow in Python, it can be published to Le Chat, Mistral’s chatbot platform, so anyone in the organization can trigger it. Every step remains tracked and auditable in Studio.
Under the hood, Workflows runs on Temporal’s durable execution engine — a platform whose $5 billion valuation reflects how its durable execution capabilities, originally built for cloud workflow orchestration, have become essential infrastructure for AI agents requiring reliable, long-running, stateful processes. Temporal’s customers include OpenAI, Snap, Netflix, and JPMorgan Chase, and its technology powers orchestration at companies like Stripe and Salesforce.
Mistral extended Temporal’s core engine for AI-specific workloads by adding streaming, payload handling, multi-tenancy, and observability that the base engine does not provide out of the box. “Workflows is built on top of Temporal,” Salamanca confirmed. “We added all the AI requirements to make these AI workflows reliable. It provides out of the box durability, retries, state management. Whenever there’s a failure, it starts again wherever it stopped.” Originally spun out of Uber’s Cadence project, Temporal transparently handles retries, state persistence, and timeouts, providing durable execution across failures. In late 2025, Temporal joined the newly formed Agentic AI Foundation as a Gold Member and announced an official OpenAI Agents SDK integration. By building on this infrastructure rather than creating a proprietary alternative, Mistral inherits battle-tested reliability while focusing its own engineering efforts on the AI-specific layer that sits above it.
Mistral is not launching Workflows as a concept — the company says customers are already running the product in production, processing millions of executions daily across three primary use cases.
The first is cargo release automation in the logistics sector. Global shipping still runs on paperwork, and a single cargo release can involve customs declarations, dangerous goods classifications, safety inspections, and regulatory checks spanning multiple jurisdictions. Salamanca described the scope of the problem: “Their global shipping today runs on paperwork. They have to involve customs declaration, Dangerous Goods classification, safety inspections, regulatory checks, and Workflows is now powering that with our models and business rules inside.”
Critically, the system keeps humans in the loop at the right moments. According to Mistral’s blog, the human approval step in a workflow is a single line of code — wait_for_input() — that pauses the workflow indefinitely with no compute consumption, notifies the reviewer, and resumes exactly where it left off once approval is given. “Humans are still in the loop, but they’re in the loop at the right time,” Salamanca said. “They just get the validation — I don’t have to go into multiple tools — and the shipment gets released.”
The second production use case is document compliance checking for financial institutions, specifically Know Your Customer reviews. These reviews are manual, repetitive, and traditionally require hours of analyst time per case. Salamanca said Workflows now processes these reviews in minutes and provides outputs in an auditable manner — a requirement for meeting regulatory obligations.
The third example involves customer support in the banking sector. “You’d have millions of users actually asking to have credit cards blocked, or feedbacks on their account situation, on their credit feedbacks,” Salamanca said. With Workflows, incoming support tickets are analyzed, categorized by intent and urgency, and routed automatically. Each routing decision is visible and traceable in Studio, and when the system gets a categorization wrong, the team can correct it at the workflow level without retraining the model.
Workflows does not exist in isolation. It is the middle layer of a three-part enterprise platform that Mistral has been assembling at a rapid clip throughout 2026.
At the bottom sits Forge, the custom model training platform Mistral launched in March at Nvidia’s GTC conference. Forge allows organizations to build, customize, and continuously improve AI models using their own proprietary data. At the top sits Vibe, Mistral’s coding agent platform that provides the user-facing interaction layer — available on web, mobile, or desktop.
Salamanca connected the three explicitly: “We just released Forge. It enables you to create your own models. But the question is, how do you put these models to do valuable work for your enterprise? That’s where Workflows comes in, because this is the orchestration piece — how you blend in deterministic rules and agentic capabilities. And then if you really want to have your end users interact with these AI patterns, it’s where Vibe comes into play.”
Forge is already seeing strong traction, Salamanca said, across two distinct patterns of enterprise demand. “First, they wanted to really build completely dedicated models to solve unique problems — transformers-based architecture for time series in the financial sector, adding new types of modalities to the LLMs,” she explained. “And the second motion was about customers with really specific tasks they want to solve. Reinforcement learning really caught their attention as to how they can use Forge and Forge RL to actually have models do these tasks very well.”
This layered architecture — model customization, workflow orchestration, and end-user interfaces — positions Mistral as something more ambitious than a model provider. It is building a full-stack enterprise AI platform, a strategy that pits it directly against not just other AI labs like OpenAI and Anthropic, but also against the hyperscale cloud providers. The company’s product portfolio now ranges, as Salamanca put it, “from compute to end-user interfaces,” including data centers in Europe, document processing with its OCR model, and audio capabilities through its Voxtral models.
The Workflows launch comes as Mistral executes one of the most aggressive scaling campaigns in the history of the European technology industry. The French AI startup has increased its revenue twentyfold within a year, with co-founder and CEO Arthur Mensch putting the company’s annualized revenue run rate at over $400 million, compared to just $20 million the previous year. The Paris-based company aims to achieve recurring annual revenue of more than $1 billion by year-end.
The company’s fundraising trajectory has been equally dramatic. Mistral announced a €1.7 billion ($1.9 billion) Series C round at a €11.7 billion ($12.8 billion) valuation in September 2025. Bloomberg reported in September 2025 that the company was finalizing a €2 billion investment valuing it at €12 billion ($14 billion). ASML led the round and contributed €1.3 billion, a landmark investment that aligned chip manufacturing expertise with frontier AI development and underscored European industrial capital’s commitment to building a sovereign AI ecosystem. Mistral then secured $830 million in debt in March 2026 to buy 13,800 Nvidia chips for a new data center near Paris.
The financial picture illustrates why Workflows matters strategically. Mistral’s revenue growth is being driven primarily by enterprise adoption, with approximately 60% of revenue coming from Europe, according to CEO Mensch’s public statements. Those enterprise customers are not buying Mistral’s models for casual chatbot applications — they are deploying them in regulated, mission-critical environments where reliability and data sovereignty are table stakes. Workflows gives those customers the production infrastructure they need to actually deploy AI systems that matter.
In May 2025, Mistral released Mistral Medium 3, which was priced at $0.40 per million input tokens and $2 per million output tokens. The company said clients in financial services, energy, and healthcare had been beta testing it for customer service, workflow automation, and analyzing complex datasets. That model now becomes one of many that can be plugged into Workflows, creating a flywheel where better models drive more workflow adoption, which in turn drives more inference revenue.
Mistral’s entry into workflow orchestration arrives in an increasingly crowded field. AI orchestration platforms are quickly becoming the backbone of enterprise AI systems in 2026, and as businesses deploy multiple AI agents, tools, and LLMs, the need for unified control, oversight, and efficiency has never been greater.
Major cloud providers — Amazon with Bedrock AgentCore, Microsoft with Copilot Studio, Google with Vertex AI’s agent tools, and IBM with WatsonX — all offer some form of workflow or agent orchestration. Open-source frameworks like LangChain, LlamaIndex, and Microsoft AutoGen provide developer-level building blocks. And dedicated orchestration startups are proliferating.
Mistral’s differentiation rests on three pillars. First, vertical integration: because Workflows is native to Studio, the orchestration layer and the components it orchestrates — models, agents, connectors, observability — are built to work together, eliminating the integration tax that enterprises pay when stitching together disparate tools. Second, deployment flexibility: the split control-plane/data-plane architecture means customers in regulated industries can run execution workers in their own environments while still benefiting from managed orchestration. Third, data sovereignty: Mistral’s European roots and infrastructure investments give it a natural advantage with organizations wary of routing sensitive data through U.S.-headquartered cloud providers — a concern that has intensified amid ongoing geopolitical tensions and growing European anxiety about relying on foreign providers for over 80% of digital services and infrastructure.
Still, the challenges are real. OpenAI and Anthropic both have significantly larger model ecosystems and developer communities. The hyperscalers control the cloud infrastructure where most enterprise workloads actually run. And the enterprise sales cycles for production-grade AI deployments remain long and complex, requiring deep technical integration work that even well-funded startups can struggle to staff.
Salamanca outlined three areas of near-term development. First, Mistral plans to release a more managed version of Workflows that abstracts deployment logic for developers who don’t need granular control over worker placement. “Whenever you want to have this flexibility, you can, but if you want to be able to have this on a managed infrastructure, even if it’s running in your own VPC, this is something that we’re adding,” she said.
Second, the company intends to make Workflows accessible to business users, not just engineers. “With Vibe code, you can actually author a workflow. This can be executed at scale, and any end user, in the end, can actually do that with Workflows,” Salamanca explained. The third area is enterprise guardrails and safety controls for agentic applications — ensuring agents use the correct tools, run with appropriate permissions, and that administrators can enforce policies at scale. “Making sure that we have all these enterprise controls to be able to scale the authoring and the building of these workflows is something we’re actively working on,” she said.
The Python SDK for Workflows (v3.0) is now publicly available. Developers can try the product in Studio and access documentation and demo templates immediately. Mistral will be hosting its inaugural AI Now Summit in Paris on May 27–28, where the company is expected to provide additional details on its platform roadmap.
For three years, the AI industry has been captivated by a single question: who can build the most powerful model? Mistral’s Workflows launch suggests the company has moved on to a different question entirely — one that may prove far more consequential for the enterprises writing the checks. It’s not about which model is smartest. It’s about which one can actually show up for work.
AI R&D runs on a cycle of hypothesis, experiment, and analysis — each step demanding substantial manual engineering effort. A new framework from researchers at SII-GAIR aims to close that bottleneck by automating the full optimization loop for training data, model architectures, and learning algorithms.
A new framework called ASI-EVOLVE, developed by researchers at the Generative Artificial Intelligence Research Lab (SII-GAIR), aims to solve this bottleneck. Designed as an agentic system for AI-for-AI research, it uses a continuous “learn-design-experiment-analyze” cycle to automate the optimization of the foundational AI stack.
In experiments, this self-improvement loop autonomously discovered novel designs that significantly outperformed state-of-the-art human baselines. The system generated novel language model architectures, improved pretraining data pipelines to boost benchmark scores by over 18 points, and designed highly efficient reinforcement learning algorithms.
For enterprise teams running repeated optimization cycles on their AI systems, the framework offers a path to reducing manual engineering overhead while matching or exceeding the performance of human-designed baselines.
Engineering teams can only explore a tiny fraction of the vast possible design space for AI models at any given time. Executing experimental workflows requires costly manual effort and frequent human intervention. And the insights gained from these expensive cycles are often siloed as individual intuition or experience, making it difficult to systematically preserve and transfer that knowledge to future projects or across different teams. These constraints fundamentally limit the pace and scale of AI innovation.
AI has made incredible strides in scientific discovery, ranging from specialized tools like AlphaFold solving discrete biological problems to agentic systems answering basic scientific questions. However, current frameworks still struggle with open-ended AI innovation and are mostly limited to narrow optimization within very specific constraints.
Advancing core AI capabilities is far more complex. It requires modifying large interdependent codebases, running compute-heavy experiments that consume tens to hundreds of GPU hours, and analyzing multi-dimensional feedback from training dynamics.
“Existing frameworks have not yet demonstrated that AI can operate effectively in this regime in a unified way, nor that it can generate meaningful advances across the three foundational pillars of AI development rather than within a single narrowly scoped setting,” the researchers write.
To overcome the limitations of manual R&D, ASI-EVOLVE operates on a continuous loop between prior knowledge, hypothesis generation, experimentation, and refinement. The system learns relevant knowledge and historical experience from existing databases, designs a candidate program representing its next hypothesis, runs experiments to obtain evaluation signals, and analyzes outcomes into reusable, human-readable lessons that it feeds back into its knowledge base.
There are two key components that drive ASI-EVOLVE. The “Cognition Base” acts as the system’s foundational domain expertise. To speed up the search process, the system is pre-loaded with human knowledge, task-relevant heuristics, and known pitfalls extracted from existing literature. This steers the exploration toward promising directions right from the first iteration.
The second component is the “Analyzer,” which tackles the complex, multi-dimensional feedback from the experiments. It processes raw training logs, benchmark results, and efficiency traces, distilling them into compact, actionable insights and causal analyses.
Several other complementary modules bring the framework together. A “Researcher” agent reviews prior knowledge from the cognition base and past experimental results to generate new hypotheses, either proposing localized code modifications or writing new programs.
The “Engineer” component runs the actual experiments. Because AI training trials are incredibly costly, the Engineer is equipped with efficiency measures like wall-clock limits and early rejection quick tests to filter out flawed candidate programs before they consume excessive GPU hours.
Finally, the “Database” serves as the system’s persistent memory, storing the code, research motivations, raw results, and the Analyzer’s final reports for every iteration, ensuring that insights compound systematically over time.
By unifying these components, ASI-EVOLVE ensures that an AI agent systematically learns from complex, real-world experimental feedback without requiring constant human intervention.
While previous frameworks are designed to evolve candidate solutions, “ASI-EVOLVE evolves cognition itself,” the researchers write. “Accumulated experience and distilled insights are continuously stored and retrieved to inform future exploration, ensuring that the system grows not only in the quality of its solutions but in its capacity to reason about where to search next.”
In their experiments, the researchers showed that ASI-EVOLVE can successfully improve data curation, model architectures, and learning algorithms to create better AI systems.
For real-world enterprise applications, high-quality data is a persistent bottleneck. When tasked with designing category-specific cleaning strategies for massive pretraining corpora, ASI-EVOLVE inspected data samples and diagnosed quality issues like HTML artifacts and formatting inconsistencies. The system autonomously formulated custom curation rules, discovering that systematic cleaning combined with domain-aware preservation rules is far more effective than aggressive filtering.
In benchmark tests, 3B-parameter models trained on the AI-curated data saw an average score boost of nearly 4 points over models trained on raw data. The gains were highest in knowledge-intensive tasks, with performance increasing by over 18 points on Massive Multitask Language Understanding (MMLU), an LLM benchmark that covers tasks across STEM, humanities, and social sciences.
Beyond data, the system proved highly capable at neural architecture design. Across 1,773 autonomous exploration rounds, it generated 105 novel linear attention architectures that surpassed DeltaNet, a highly efficient human-designed baseline. To achieve these results, ASI-EVOLVE developed multi-scale routing mechanisms that dynamically adjust the model’s computational budget based on the specific content of the input.
Finally, in reinforcement learning algorithm design, ASI-EVOLVE discovered novel optimization mechanisms. It designed algorithms that outperformed the competitive GRPO baseline on complex mathematical reasoning benchmarks such as AMC32 and AIME24. One successful variant invented a “Budget-Constrained Dynamic Radius” that keeps model updates within a defined budget, effectively stabilizing training on noisy data.
Enterprise AI workflows constantly require optimizations to existing systems, from fine-tuning open-source models on proprietary data to making small changes to architectures and algorithms. Usually, the computational resources and engineering hours required to carry out such efforts are immense and beyond the capabilities of most organizations. As a result, many are left to run unoptimized versions of standard AI models.
The research team says the framework is designed so enterprises can integrate proprietary domain knowledge into the cognition repository and allow the autonomous loop to iterate on internal AI systems.
The research team has open-sourced the ASI-EVOLVE code, making the foundational framework available for developers and product builders.
Presented by EdgeverveSupply chains are where legacy integration models reach their limits. As partner networks expand and operational volatility increases, traditional middleware is buckling under costs and complexity. That’s why supply chain has emer…
For the past eighteen months, the corporate world has been obsessed with the “builder” phase of the generative AI revolution. Enterprises have raced to deploy autonomous agents to handle everything from customer support to complex codebase refactoring.
However, as these digital workers proliferate, a new, more structural problem has emerged: fragmentation. Agents built on LangChain cannot easily hand off tasks to those built on CrewAI; a Salesforce-embedded agent has no native way to coordinate with a custom-built Python script running on a private cloud.
Today, a new startup, BAND (also known as Thenvoi AI Ltd.) exited stealth with $17 million in Seed funding to provide the “interaction infrastructure” necessary to turn these isolated tools into a unified, collaborative workforce.
“In order for agents to become real players in the global economy, they need ways to communicate, just like humans do,” said co-founder and CEO Arick Goomanovsky in an interview with VentureBeat, continuing, “the communication solutions we have today for systems don’t work for agents, because agents are non-deterministic creatures. It’s not just about API integrations.”
By introducing a deterministic communication layer that functions as a “Slack for agents,” BAND aims to move the industry from a collection of fragile experiments to a scalable, “agentic economy”.
At the core of BAND’s thesis is that simply creating and plugging AI agents into human communication tools like Slack causes them to lose context or require constant “rehydration” if they fail and re-enter a conversation.
“You can’t take a bunch of agents and put them into Slack and expect it to miraculously work,” Goomanovsky said.
BAND solves this through a two-layer architecture designed to handle the unique telemetry of AI-to-AI interaction, a so called “agentic mesh.”
This is the “interaction layer” where agent discovery and structured delegation occur. It allows agents to find one another across different clouds and frameworks without requiring developers to write brittle “glue code” for every new connection.
Multi-Peer Collaboration: Unlike existing protocols that are primarily peer-to-peer or client-server, BAND supports full-duplex, multi-peer communication. This allows a group of agents—for example, a planning agent, a coding agent, and a QA agent—to work together in a shared “room” with synchronized context.
Deterministic Routing: Notably, BAND does not use Large Language Models (LLMs) to route messages. Using an LLM for routing would introduce the same non-deterministic errors the platform seeks to solve. Instead, the platform uses a patent-pending multi-layer architecture to ensure messages reach their destination reliably.
The WhatsApp Comparison: To handle the anticipated volume of agentic traffic, BAND’s infrastructure is built on the same technical stack utilized by global messaging giants like WhatsApp and Discord. This ensures the platform can scale to billions of messages as digital identities begin to outnumber human ones.
If the nesh is the “pipes,” the Control Plane is the “valve”. This layer provides the runtime governance that enterprises require before they can safely scale autonomous systems.
Authority Boundaries: The platform allows organizations to enforce strict rules on which agents can talk to each other and what topics they can discuss.
Credential Traversal: One of the most significant hurdles in multi-agent systems is identity. BAND manages how human permissions and security tokens traverse from agent to agent. For instance, if a human asks Agent A for information, and Agent A delegates that task to Agent B, BAND ensures Agent B only accesses data the original human is permitted to see.
BAND’s product suite is designed to be “framework-agnostic” and “cloud-agnostic,” positioning itself as an independent middleware that prevents vendor lock-in. In a market where hyperscalers like OpenAI or Anthropic want enterprises to stay within their specific ecosystems, BAND offers the flexibility to use the best model across multiple options for the job, including open source and fine-tuned, custom enterprise options.
“No matter where the agents run or how they were built, we can band them together, allow them to discover each other, delegate tasks, and have full-duplex, bidirectional communication,” Goomanovsky said, noting that despite competing first-part options from model providers like OpenAI’s workspace agents (announced yesterday) and Anthropic’s Claude Managed Agents (announced earlier this month), BAND “play[s] the role of the independent platform that allows an enterprise to avoid vendor lock-in.”
The company is currently seeing the most traction in “tech-forward” sectors, including telecommunications, financial services, and cybersecurity.
Coding Agents: This is currently the most popular use case. Developers often find that Claude is superior at planning, while Codex is better at reviewing code. BAND allows these agents to work simultaneously, delegating tasks to one another in real-time.
Customer Support and Operations: Beyond code, BAND enables “cross-boundary” automation. For example, a new employee could be onboarded by a Workday agent, which then communicates with a ServiceNow agent to open a ticket for equipment, which finally talks to a purchasing agent to finalize the order.
Understanding the sensitivity of enterprise data, BAND offers three primary ways to consume the platform:
SaaS: A straightforward cloud-based platform where agents connect via API.
Private Cloud/On-Premise: The entire platform can be deployed within a customer’s VPC or on-premise environment to ensure data never leaves their control.
The Edge: The infrastructure is lightweight enough to be deployed on “flying objects” like drones (UAVs) or even satellites, facilitating communication between agents in physically isolated environments.
Already, BAND’s early users — and enterprises more broadly — are mixing and matching AI agents powered by models from various providers, so the time to provide an overarching solution seems ripe.
As Goomanovsky put it: “Advanced developers are not using a single coding agent. They realize Claude is very good at planning, Codex is much better at reviewing, and today there is no way to create that bidirectional interaction between coding, review, and planning agents. We enable that.”
BAND operates as a commercial entity, focusing on providing “enterprise-grade” stability and security. While the platform integrates with open-source frameworks like LangChain and CrewAI, its own core routing and control technology is proprietary and patent-pending.
For enterprise IT leaders, the “Control Plane” is less about communication and more about auditability. BAND provides full observability into every agent interaction, creating a transcript and a “paper trail” for autonomous actions.
This is a “complementary” solution to existing guardrail products; while a guardrail might protect a single agent from a prompt injection, BAND protects the entire system from cascading failures caused by one agent misinforming another.
The company has launched with a tiered pricing model designed to capture everyone from individual “agent enthusiasts” to global corporations:
Free ($0/mo): Designed for individuals. It allows for up to 10 remote agents and 50 active chat rooms, though it only retains data for 24 hours.
Pro ($17.99/mo): Aimed at startups and growing R&D teams. This tier increases limits to 40 agents and 250 active chat rooms with email support.
Enterprise (Custom): Offers unlimited agents, custom data retention policies to meet compliance requirements, and full API access to BAND’s “Memory APIs”.
The emergence of BAND coincides with a shift in how analysts view the AI market. Gartner has predicted that by 2029, 90% of enterprises deploying multiple agents will require what they call a “Universal Orchestrator”. Similarly, Forrester has recognized the “Agent Control Plane” as a distinct and emerging market category.
The company was founded by Goomanovsky and Vlad Luzin, who combined their backgrounds in Israeli intelligence, cybersecurity, and multi-agent systems to build BAND.
Goomanovsky views the platform not just as a tool, but as a foundational layer for the next era of the internet.
“Communication is the most fundamental problem in computing,” Goomanovsky noted. “When new beings emerge, the first thing they need is a way to talk to each other… We are the agent internet”.
The $17 million Seed round was led by Sierra Ventures, Hetz Ventures, and Team8. Tim Guleri of Sierra Ventures emphasized that BAND is building the “missing layer” that makes large-scale collaboration practical.
This capital will be used to expand the engineering team and accelerate the development of the “design partner” ecosystem, which already includes leading North American telcos and European digital payment companies.
As agents transition from being digital novelties to becoming the primary drivers of enterprise workflows, the “glue code” that holds them together will become the most critical piece of the stack. BAND’s launch marks the first serious attempt to standardize that glue, turning a chaotic “band” of agents into a synchronized, governed symphony.