AI’s Next Phase: From Research to Real-World Value

If you want to understand just how quickly AI is moving from the lab to the real world, look no further than the past 12 months. The pace is staggering: models that once took months to trickle into developer hands are now available on open platforms the day before their official release. Research papers are dissected, replicated, and implemented by global teams within hours, not quarters.

I had the chance to explore this phenomenon during a live discussion with AI2 CEO Ali Farhadi, Luke Zettlemoyer of Meta and UW, and Brian Raymond, Founder & CEO of Unstructured, during our Annual Meeting in March. Their insights crystallized something I’ve been thinking about for months: this isn’t just a story about speed — it’s about the convergence of research, open source, and developer energy, and how that’s changing the way we build, deploy, and extract value from AI.

So why does this matter? Because the next phase of AI isn’t about incremental benchmarks or clever demos. It’s about how the fusion of research breakthroughs and developer ingenuity is creating entirely new capabilities — multimodal agents, domain-specialized models, and infrastructure that turns “vaporware” into production value. If you’re building, investing, or leading in this space, here’s what you need to know about where we are, what’s working, and what’s coming next.

Research Velocity Is Now Measured in Hours

Ali Farhadi opened the conversation with a data point that stopped me in my tracks: new models are now served on production platforms the day before they’re officially released. And with 6,000+ AI papers dropping on arXiv daily, even staying aware of what’s out there has become a full-time job. Luke Zettlemoyer described how Meta’s teams had to scramble the moment DeepSeek was released — open-weight, more efficient, and already in developers’ hands before the marketing cycle even started. That speed isn’t just a novelty — it’s a competitive forcing function. If your teams aren’t paying attention in real time, you’re already behind.

Takeaway for founders: Build muscle for rapid technical absorption. If you’re not reading and replicating key papers within days, your competitors are. Invest in dev tools and culture that make this possible.

From Vaporware to Value: Multi-Agent Systems Are Real

The last year has seen a decisive shift from text-only chatbots to agents that are “hooking up to everything, interfacing with everything, all multimodal, much more integrated, not so isolated only in text interactions,” Luke explained. The rise of multi-agent systems, once considered vaporware, is now practical enough to build real companies around. The Model Context Protocol (MCP) is just one example of how developer activity is exploding around new standards for connecting models to tools and data.
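To make MCP concrete: it is built on JSON-RPC 2.0, and a tool invocation travels as a `tools/call` request. Here is a minimal Python sketch of that message shape — the `search_docs` tool and its arguments are hypothetical, and this builds the payload only, not a full client.

```python
import json

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request in the shape MCP uses to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # MCP's standard method for tool invocation
        "params": {"name": tool_name, "arguments": arguments},
    })

# An agent asking a connected server to run a hypothetical search tool:
message = make_tool_call(1, "search_docs", {"query": "quarterly revenue"})
```

Because every MCP server speaks this same envelope, an orchestrator can treat tools from different vendors interchangeably — which is exactly what makes multi-agent wiring tractable.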

Brian Raymond made a compelling point: 18 months ago, multi-agent systems were barely more than slideware. Today, companies like his are deploying real-world systems powered by coordinated agents, thanks to improved tooling and open-source contributions. At Unstructured, for example, the ability to build pipelines that transform unstructured enterprise data into structured, model-ready inputs is unlocking a new generation of use cases.
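As a toy illustration of that kind of pipeline — the chunking rule and record fields here are invented for this sketch, not Unstructured’s actual API — raw document text goes in, and size-bounded, source-tagged chunks suitable for a model come out:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str  # where the text came from
    index: int   # position within the document
    text: str    # model-ready content

def chunk_document(source: str, raw_text: str, max_chars: int = 200) -> list[Chunk]:
    """Split raw text on blank lines, then pack paragraphs into size-bounded chunks."""
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    chunks, buf = [], ""
    for p in paragraphs:
        if buf and len(buf) + len(p) > max_chars:
            chunks.append(Chunk(source, len(chunks), buf))
            buf = ""
        buf = (buf + "\n\n" + p) if buf else p
    if buf:
        chunks.append(Chunk(source, len(chunks), buf))
    return chunks
```

Real pipelines add parsing for PDFs, tables, and images, plus metadata extraction — but the shape is the same: messy input, structured, addressable records out.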

Takeaway for founders: Multi-agent systems are no longer speculative — they’re actionable. If you’re building in AI, look beyond single-model interactions. Embrace emerging standards like the Model Context Protocol to connect models with tools, data, and other agents — and turn orchestration into advantage.

The State of Reasoning: Progress and Limits

So, what can today’s models really do? The panel was clear: we’ve made real progress, especially in narrow domains like math or law, where focused training on deep pools of domain data can “absolutely dazzle your users,” as Brian put it. Startups like Harvey AI — built on tight, vertical datasets — are demonstrating just how much value can be unlocked when the scope is constrained.

Ali framed it well: “When you narrow down the scope, you can do well. But is it generic enough to deploy at scale? I am not convinced.” In other words, the more we zoom in, the more capable these systems appear. But zooming out — deploying in messy, cross-functional enterprise environments — is still a dogfight.

Luke Zettlemoyer offered a helpful frame: today’s models are essentially compressing the internet. “They’ve seen so much data, they can recreate study guides, solve law school problems,” he said. “But we’re still far from real novel reasoning or inventing new things that open up whole new fields.”

Ali echoed that concern, emphasizing the need for better taxonomies and evaluation methods to measure what’s truly reasoning versus sophisticated pattern recognition. And when we move into less structured, multimodal domains — let alone embodied agents in the real world — those gaps only grow.

Takeaway for founders: Reasoning exists — but only in tightly scoped arenas. If you want your AI to deliver real value today, build narrow and go deep. General intelligence is still out of reach, but specialized intelligence can create meaningful business outcomes now.

Open Source: More Than Just Weights

Open source AI isn’t just about giving everyone access to model weights — it’s about unlocking both research transparency and practical customization. Enterprises may say they’re not planning to train or tune their own models, but in practice, they’re learning that’s where the real leverage lies.

Ali Farhadi challenged a common assumption: “The key power of these models is in tuning. Over the last year, we’re learning how much you can squeeze out of post-training to make models significantly better in your domain… The boundaries between pre-training and post-training are already very diffused.” That diffusion is where things get interesting — and difficult.

Tuning isn’t as simple as throwing your data at a model. Enterprises quickly discover that improving performance in one area often causes regression in another. The solution? Sophisticated post-training pipelines that blend reinforcement learning, verifiable reward signals, and even a splash of pre-training data to preserve generality. These aren’t plug-and-play workflows — they’re experimental, iterative, and deeply dependent on visibility into the model architecture and behavior.
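One piece of that recipe can be sketched simply. Mixing a small fraction of general, pre-training-style data back into each fine-tuning batch — often called rehearsal or replay — is a common way to limit regression outside the target domain. A minimal sketch; the 10% ratio is an illustrative default, not a recommendation:

```python
import random

def build_batch(domain_pool, general_pool, batch_size=32, replay_frac=0.1, seed=None):
    """Sample a fine-tuning batch that mixes domain examples with replayed
    general (pre-training-style) examples to limit out-of-domain regression."""
    rng = random.Random(seed)
    n_general = max(1, int(batch_size * replay_frac))
    batch = rng.sample(domain_pool, batch_size - n_general)
    batch += rng.sample(general_pool, n_general)
    rng.shuffle(batch)  # avoid the model seeing the two sources in fixed order
    return batch
```

In practice the right ratio is found empirically — which is exactly why these pipelines are iterative rather than plug-and-play.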

That’s why open source matters. It gives builders the transparency and flexibility to diagnose, adapt, and evolve models as their needs shift. And beyond the enterprise use case, it’s a lifeline for researchers trying to understand how these systems actually work.

Takeaway for founders: Open source isn’t just a deployment strategy — it’s a tuning strategy, a research accelerator, and a competitive advantage. If you want to push a model to its full potential, you’ll need transparency, tooling, and a tuning loop you actually control.

The Next Frontier: Customization, Efficiency, and Real-World Impact

The panel’s consensus: the next wave of AI value will come from those who can specialize, customize, and efficiently deploy models for real-world problems.

That means:

  • Embracing the pace of research and developer innovation.
  • Building infrastructure that turns multimodal, multi-agent, and domain-specific models into production systems.
  • Investing in evaluation and understanding, not just model size or demo quality.
  • Recognizing that the “one model to rule them all” era is over; future winners will be those who can orchestrate, tune, and combine models for specific needs.

The Bottom Line

We’re in a new era for AI — one where research, open source, and developer energy are converging to create real-world value at unprecedented speed. The winners will be those who can sift signal from noise, specialize and tune for real problems, and build the infrastructure to make it all work at scale. The future isn’t just happening faster — it’s being built by those who can keep up.

Let’s get to work. If you’re building, investing, or leading in this space, I’d love to hear from you. The next phase is here — and it’s only just begun.

Related Insights

    Winning the Wedge: The Flywheels for Durable AI-Native Companies
    What MCP’s Rise Really Shows: A Tale of Two Ecosystems
    Madrona’s 30-Year Journey That Has Just Begun
