2026 Predictions
As 2025 comes to a close, I’ve been reflecting on a paradox: AI has never been more powerful, yet most enterprises still struggle to deploy it in ways that create real value.
This past year revealed the hidden complexities of putting AI to work. Companies discovered that accuracy without auditability is worthless in regulated industries. That human-in-the-loop deployments can destroy margins at scale. That agents can reason brilliantly but can’t execute transactions. That generic benchmarks miss the edge cases that define entire verticals.
These are structural gaps in how AI systems interface with the real world, and they represent massive opportunities for the companies that fill them. We’re moving from a world where raw model capability was the bottleneck to one where orchestration, trust, and domain expertise determine who wins.
Below are 10 predictions for how this will unfold in 2026:
Vertical-specific benchmarks will become the core trust mechanism for AI in accuracy-critical industries.
Unlike generic evals, vertical-specific benchmarks encode institutional knowledge, edge cases, and industry-specific nuances that outsiders consistently miss. Over time, these vertical-specific benchmarks compound: every failure, exception, and corner case gets codified into the benchmark itself. As they mature, they become the internal bar teams will build against and the external artifact customers will come to rely on. The companies that own the benchmark will increasingly define what “good” looks like in their vertical.
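To make the compounding concrete, here is a minimal sketch of such a benchmark, in which every production failure is codified as a permanent test case. All names and structures here are hypothetical, not any particular vendor’s format:

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkCase:
    """One codified edge case: an input the system once got wrong."""
    case_id: str
    prompt: str
    expected: str
    source: str  # e.g. "production failure, 2025-11-03"

@dataclass
class VerticalBenchmark:
    """A benchmark that compounds: every failure becomes a permanent test."""
    cases: list = field(default_factory=list)

    def codify_failure(self, case: BenchmarkCase) -> None:
        self.cases.append(case)  # the benchmark only ever grows

    def score(self, model) -> float:
        """Fraction of codified cases the model now handles correctly."""
        if not self.cases:
            return 1.0
        hits = sum(1 for c in self.cases if model(c.prompt) == c.expected)
        return hits / len(self.cases)
```

The key property is that `codify_failure` is append-only: the bar for “good” in the vertical can only rise over time.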
World models and generative simulation will unlock the next major wave of physical-world AI.
Much of today’s AI progress has stalled at the boundary of the physical world, where continuous, high-dimensional data dominates. This is where approaches like Yann LeCun’s JEPA models matter, shifting learning away from token prediction toward abstract world modeling. The next leap is generative simulation, where models don’t just replay reality but generate plausible variations of it. Rather than merely augmenting datasets, these simulations act as data multipliers by expanding the space of actions, environments, and outcomes a system can reason over. As models learn from both real-world feedback and simulated variation, they form compounding learning loops that improve robustness, planning, and generalization – broadening what AI systems can safely do in the physical world.
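As a toy illustration of the data-multiplier idea, the sketch below expands one recorded trajectory into many perturbed variants. A real system would use a learned world model or generative simulator rather than Gaussian noise; everything here is a simplification for illustration:

```python
import random

def simulate_variants(trajectory, n_variants=10, noise=0.05, seed=0):
    """Data multiplier: expand one recorded trajectory into many
    plausible variations by perturbing each state (a hypothetical
    stand-in for a learned generative simulator)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        variants.append([
            [x + rng.gauss(0.0, noise) for x in state]
            for state in trajectory
        ])
    return variants

# One real trajectory of 2-D states becomes an 11-sample training set.
real = [[0.0, 0.0], [0.5, 0.1], [1.0, 0.3]]
dataset = [real] + simulate_variants(real)
```

The point is the ratio: one real-world rollout seeds an order of magnitude more training signal, which is where the compounding learning loop comes from.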
The FDE-heavy deployment model will face a margin reckoning, and spawn new tooling.
Forward-deployed engineers have been critical for early enterprise AI adoption, but they are expensive and hard to scale. As budgets tighten, CFOs and CTOs will scrutinize the margins these models consume. This pressure will catalyze a new class of software built specifically for deployed engineering organizations. These tools will capture product signal, automate customer feedback loops, and systematize what today lives in Slack threads and tribal knowledge. In effect, these tools will become an operational layer for human-in-the-loop deployments. The goal is leverage: enabling one FDE to oversee ten or twenty accounts instead of one or two. Over time, this infrastructure will determine which AI companies can scale beyond bespoke implementations. Deployment efficiency will become a competitive moat.
Pricing AI based on human labor replacement will give way to pricing based on outcomes and upside.
Up until now, AI pricing has mostly leaned on labor arbitrage as a convenient entry point for buyers. But as agents proliferate, value will no longer be benchmarked against human headcount. Instead, buyers will compare one agent against another, or against the opportunity cost of not automating at all. Labor-based pricing is fragile because it caps upside and commoditizes differentiation. AI effort doesn’t map cleanly to hours worked; ten minutes of deep planning and tool use is not equivalent to ten minutes of simple classification. Costs are lumpy, nonlinear, and driven by inference depth, retrieval, and orchestration. As a result, efficiency narratives will plateau. The finish line is moving from cost reduction to business expansion. The enduring pricing models will anchor on outcomes, risk, and revenue ceilings that can keep rising.
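One hedged sketch of what outcome-anchored pricing could look like: a small platform fee plus a share of measured value delivered, rather than a per-seat price benchmarked against headcount. All fee levels and parameter names below are purely illustrative:

```python
def outcome_price(measured_outcome: float,
                  platform_fee: float = 1_000.0,
                  outcome_share: float = 0.15) -> float:
    """Illustrative outcome-anchored pricing: a fixed platform fee plus
    a share of measured value delivered (e.g. recovered revenue).
    Unlike seat-based pricing, the ceiling rises with the outcome."""
    return platform_fee + outcome_share * max(measured_outcome, 0.0)

# $100k of measured value -> $16k fee; zero value -> just the floor.
fee = outcome_price(100_000.0)
```

The design choice worth noting: upside is uncapped (it scales with `measured_outcome`), which is exactly what labor-replacement pricing forecloses.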
We will see the rise of batch AI systems across enterprise settings.
While real-time generative AI gets the spotlight today, many of the highest-value enterprise use cases are fundamentally offline. In batch systems, latency is traded for depth, context, and accuracy. A job may take hours, but it can pull in massive datasets, enrich metadata, filter noise, and run multiple models in sequence. Cheap models handle extraction and structuring; more powerful ones synthesize final outputs. What emerges is executive-quality analysis rather than chat-grade answers. These systems are especially powerful for reporting, investigations, audits, and strategic planning. As enterprises realize this, batch workflows will proliferate across critical functions.
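A minimal sketch of such a batch pipeline shows the cheap-extract / strong-synthesize split. Both models are hypothetical callables (prompt in, text out); a production system would add retrieval, enrichment, and filtering stages:

```python
def batch_analyze(documents, cheap_model, strong_model):
    """Offline batch pipeline: trade latency for depth and accuracy.
    A cheap model structures each document; a stronger model runs once
    over the pooled evidence to synthesize a single report."""
    structured = [cheap_model(f"Extract key facts as bullets:\n{doc}")
                  for doc in documents]           # cheap, parallelizable
    evidence = "\n".join(structured)
    return strong_model(                          # expensive, runs once
        f"Write an executive summary from this evidence:\n{evidence}")
```

Because nothing here is latency-bound, the document list can be enormous and each stage can be retried or audited, which is what makes the output executive-quality rather than chat-grade.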
Orchestration will matter more than intelligence.
Competitive advantage is shifting from model size to system design. Outcomes will be determined by how effectively models are wired into tools, context, and feedback loops rather than how large or expensive they are. Synthetic data, verifiable rewards, and multi-turn objectives are redefining how systems learn from their own outputs. This dramatically lowers the cost of personalization and allows systems to align directly with real-world performance metrics. Taken together, these shifts will continue to push the frontier toward smaller, cheaper, specialized models that cooperate through orchestration layers.
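A toy orchestration layer along these lines might route each task type to a small specialized model and fall back to a generalist only when nothing matches. Every model and task name below is hypothetical:

```python
def build_orchestrator(routes, fallback):
    """Orchestration sketch: dispatch each task to a small specialized
    model instead of sending everything to one large one. `routes` maps
    a task type to a model callable."""
    def run(task_type, payload):
        model = routes.get(task_type, fallback)
        return model(payload)
    return run

run = build_orchestrator(
    routes={
        "classify": lambda p: f"label:{p[:3]}",    # tiny fine-tuned model
        "extract":  lambda p: f"fields:{len(p)}",  # cheap structurer
    },
    fallback=lambda p: f"general:{p}",             # larger generalist
)
```

The routing table, not any single model, is where the system design lives: swapping in a cheaper specialized model is a one-line change that never touches the rest of the stack.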
The most thoughtfully designed AI products will be almost invisible to their users.
Winning AI products won’t feel like “AI products” at all. The best designs will disappear into existing workflows, making work feel faster, cleaner, and more reliable without demanding behavioral change. Trust is built through subtle reassurance mechanisms: confidence indicators, comparisons to the old way, or simple green-yellow-red signals. These cues emerge from sitting next to users and watching how work actually happens. Orchestration layers will matter more than standalone interfaces. Only after trust and dependence are established can products pull users into deeper platforms.
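The green-yellow-red idea can be sketched in a few lines. The thresholds below are illustrative, not recommendations; in practice they would come from sitting with users and calibrating against real error rates:

```python
def trust_signal(confidence: float) -> str:
    """Map a model confidence score to a simple reassurance cue
    (thresholds are illustrative placeholders)."""
    if confidence >= 0.90:
        return "green"    # safe to act on
    if confidence >= 0.60:
        return "yellow"   # glance before accepting
    return "red"          # route to a human
```

The cue is deliberately low-information: the user learns to trust the green path without ever thinking about the model behind it.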
Agent-native transaction infrastructure will become core enterprise plumbing.
Today, most agents can reason, recommend, and trigger workflows, but still rely on humans to execute economic actions. In the background, however, enterprises are already granting non-human actors scoped permissions, automated purchasing authority, and auditable execution inside finance, infrastructure, and procurement systems. In 2026, these fragmented capabilities will consolidate into purpose-built layers for agent identity, authentication, permissions, and payments. Agents will operate with dedicated wallets, programmable spend limits, policy-driven approvals, and immutable audit trails, and they will be able to buy, sell, reserve, cancel, or negotiate only inside predefined constraints tied to business context and compliance requirements. Much of this infrastructure will live deep inside enterprise workflows, tightly coupled to approvals and compliance, and it will unlock a new class of automation.
Litigation will become the dominant pressure point in insurance, driving a new wave of companies focused on AI-native claims defense stacks.
Modern claims are increasingly data-dense, spanning telematics, sensors, medical records, vendors, and regulatory disclosures. At the same time, decision timelines are compressing. Plaintiff firms are using AI to surface inconsistencies, reconstruct timelines, and identify comparable case law earlier than carriers can respond, while many insurers still rely on manual workflows designed for human pacing. In 2026, we will see the rise of AI-native insurance claims defense stacks that assemble evidence, generate narratives, and model exposure continuously, often before litigation is formally filed. Outcomes will be decided weeks earlier (or more) than they are today.
A new wave of AI-native compliance will emerge.
A new wave of compliance companies will emerge to address a fundamentally new class of risk created by AI systems operating in the real world. Industries like robotics, construction, and insurance are especially ripe for this shift. Traditional compliance frameworks were built for static software and human decision-makers; they break down when autonomous or semi-autonomous systems act continuously, adapt over time, and interact directly with physical environments or financial outcomes. As a result, compliance will move from checklist-based controls to continuous, system-level assurance to monitor how AI behaves in production rather than what it was designed to do on paper. These companies will provide always-on audit trails, policy enforcement, and real-time guardrails, effectively translating regulations into executable constraints inside AI systems.
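A hedged sketch of what “regulations as executable constraints” might look like: each rule becomes a predicate checked on every action, with an append-only audit trail. Rule names and actions below are invented for illustration:

```python
import time

class Guardrail:
    """Compliance-as-code sketch: regulations expressed as executable
    predicates, checked on every action, with an append-only audit log."""
    def __init__(self):
        self.rules = {}      # rule name -> predicate(action) -> bool
        self.audit_log = []  # append-only record of every decision

    def add_rule(self, name, predicate):
        self.rules[name] = predicate

    def check(self, action: dict) -> bool:
        violations = [n for n, p in self.rules.items() if not p(action)]
        self.audit_log.append({
            "ts": time.time(), "action": action,
            "allowed": not violations, "violations": violations,
        })
        return not violations

g = Guardrail()
g.add_rule("spend_limit", lambda a: a.get("amount", 0) <= 10_000)
g.add_rule("approved_vendor", lambda a: a.get("vendor") in {"acme", "globex"})
```

This is the shift from paper to production: the check runs on every action the system takes, and the audit trail is generated as a side effect rather than reconstructed after the fact.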
I keep coming back to a line from Andrej Karpathy’s 2025 LLM Year in Review:
“Will the LLM labs capture all applications or are there green pastures for LLM apps? Personally I suspect that LLM labs will trend to graduate the generally capable college student, but LLM apps will organize, finetune and actually animate teams of them into deployed professionals in specific verticals by supplying private data, sensors and actuators and feedback loops.”
That framing resonates deeply with how I’m thinking about 2026. The opportunity isn’t just in better models, but in the vertical-specific systems that organize, constrain, and operationalize them. The next generation of enterprise AI will be built by companies that own the orchestration layers around models as well as the infrastructure that allows those systems to be trusted to act. Accuracy, auditability, economics, and governance are key unlocks. And I’m increasingly optimistic that many of these pieces will come together over the next 12 months.


