Madrona Partner Jon Turow hosts Gabriel Hubert and Stan Polu, the co-founders of a 2023 IA40 winner, Dust. Dust enables information workers to create LLM-based agents that help them do their own work more efficiently. Gabe and Stan go way back, having started a data analytics company that was bought by Stripe in 2015 where they both stayed for about five years. After stints at a couple other places like OpenAI for Stan, they decided to come back together in 2023 to work on Dust.
In this episode, Jon talks to Gabe and Stan about the broader AI/LLM ecosystem and the classic adoption curve of deploying models within an enterprise. They talk about Dust’s bet on horizontal versus vertical, the builders of tomorrow, limitations of AI assistants, and so much more.
This transcript was automatically generated and edited for clarity.
Jon: You have made a fundamental product choice that I’d like to study a little bit, which is to build something horizontal and recomposable. Depending on what choice you made, I imagine we’d be having one conversation or the other. On horizontal, on composable, you’re going to say, “There’s so many use cases. Flexibility is important, and every different workload is a special snowflake.”
If you had decided instead to build something vertical, it doesn’t have to be AGI, but you’d be saying, “To really nail one use case, you need to go all the way to the metal.” You need to describe the workflow with software in a very, very subtle and optimized way. Both of these would be valid. How do you think about the trade-off between going vertical and going horizontal? How have you landed on that choice that you made?
Stan: It comes with complexities and risks to go horizontal. At the same time, if you assume that the product experience will evolve so fast, it’s also the best place to do that kind of exploration, that playground for product exploration. It’s the intersection of all the use cases at the intersection of all the data. During the current age of GenAI where we are in a conversational agent interface, we are making the right bet. It means we are on par with the UX, but we have access to much more data and much more flexibility.
Empirically, we have checked that hypothesis. This means that some of our users have been confronted with verticalized solutions. We beat the verticalized solution in many cases because we provide more customizability. Customizability comes in two flavors. First, providing instructions that better match what the user wants to do in their company, depending on the culture or existing workflows. Second, being able to access multiple sources of information.
On some use cases, having two sources of information and one being the one associated with the verticalized use case, you are much better off because you have access to all the information that lives on Slack, all the information that lives on Notion that’s related to the best case use case. Many of them could be sales and customer support use cases. That enables people to create assistants that are much better by being able to tap into that information that the verticalized SaaS products will never be able to tap into. As a verticalized product, it will be either the incumbent or somebody building on an incumbent, somebody building on a customer support platform, somebody going on the sales data platform, whatever it is.
In that interim, where the UX is being defined, we have an incredible advantage of being horizontal. That may close in the future because as we realize what it is to use models in a particular use case efficiently, then it’s all going to be about not getting a text, but it’s going to be about getting an action ready to be shipped. It’s about being able to send an email to prospects automatically from the product, etc. There, we might have a steeper curve to climb with respect to the verticalized solution because they’ll be more deeply integrated and have the ready-to-go action built probably before us. It’s still an open-ended question. As it stands today, it creates a tremendous advantage.
We’re still in the process of exploring what it means to deploy that technology within companies. So far, that bet has been good, but that’s the most important product choice we’ve made. We’re checking it every week, every month, almost every day. Was that the right choice? We can comfortably feel that it is the right choice today. We also realize we’re inside an ecosystem that is on moving ground. That’s something that has to be revisited every day.
Gabe: Some of the convictions from data access being a horizontal problem rather than a vertical problem when you distribute to a company have helped with that choice. You can convince a CISO of the validity of your security measures and practices. It’s just as hard to do whether you need one sliver of that data set or all. Your bang for the buck is better when you want to play with the more concrete use case of having access to different sources of information to deliver on a fairly simple example. Imagine you had access to a company’s production databases. You could generate very smart queries on those production databases any day of the week. That’s a product that we hope to see today with these SQL editor interfaces that everybody in the team is supposed to use.
Where in that product is the half-page document describing an active user? Or what a large account is? What do we consider to be a low performance on customer support scores? Those documents live in a different division as the company grows. It’s a division that doesn’t even know a code version exists. It’s meetings where their definitions are updated in a very business-driven setting. That constitution of what an active user is lives in a separate set of data.
For somebody within the company at a low cost, to be able to ask an agent a question about the number of active users in Ireland or the number of active and satisfied users, you have to cross those data sets. That’s almost systematically the case. A lot of the underperformance you could look at or a skeleton audit in companies today comes from this siloed conversation. To us, being excited to start another company, trying to build an ambitious product company with the experience we’ve had from companies that have seen traction, that have grown fast, that have burnt out, incredible individual contributors that start seeing the overhead and the slowness and how tricky it is to just get a simple decision pushed through because nobody has the lens to see through all these divisional silos, it seems more exciting too to build a solution to that problem. When we pitch it to people who’ve seen and felt that pain, that’s the spark that goes off.
It is a risk. But the people excited about that type of software existing in their team are, I’d argue, the people we’re excited to build for in the years to come. Come back in five years, and let’s see if we were right on it.
Jon: Oh, come on. It’s going to be one year.
Gabe: Yeah. That’s true. Fair enough. Fair enough.
Jon: It’s so tempting to project some Stripe influence on the horizontal strategy that you guys have taken. Before I hypothesize about what that would be, can you tell me what you see? What is the influence of how Stripe is put together on the way you’re building Dust?
Stan: In terms of how Stripe was operating, which influenced us in defining what we were to build with Dust, there is a lot of creating a company OS where people have the power to do their job at that desk, which means having access to information and a lot of trust. Some of that has trickled into our product decisions. We’ve built a product that’s not a great fit for a company with granular access control on who has access to that very small folder inside that Google Drive, where those people are added only manually on a case-by-case basis.
We are optimistic that the companies that will work the best in the future are the ones that make their internal information available. That is, at the same time, a great fit for building AI-based assistants.
Gabe: Regarding the product, there’s a ton of inspiration from what Stripe did to developers. It gave them the tools to have payments live in the app before the CFO managed to meet with an acquiring bank to discuss a payments pool. It was like if the developer came to the meeting and said, “It’s live. We’ve already generated a hundred bucks in revenue. What are the other questions?”
I think if we can build a platform that puts to bed some of the internal discussions as to which provider of a frontier model we should go and trust for the next two years, a builder internally says, “I just built this on Dust, and we can switch the model in the back if we feel like it’s going to improve over the next months.”
That’s a scenario where the aggregation position is a good one. It requires you to be incredibly trusted. It requires composability. It does mean a set of technical decisions that are more challenging locally. But it enables, optimistically, I think to Stan’s point, some of the smarter, more driven people to have the right toolkit. That’s something that we take from Stripe. Stripe was not a fit for some slower companies when we were there, which ended up okay.
Jon: When we think about the mission you folks are going after, there’s so much innovation happening at the model layer. One thing that we’ve spoken about before is there’s a lot you can accomplish today. When we start talking about what it is that you can accomplish with Dust, can you talk about the requirements that you have for the underlying tech?
Gabe: One of the core beliefs we had in starting the company, which essentially dates back to Stan’s time at OpenAI and his front-row seat to the progression of the various GPT models, is that the models were pretty good at some tasks and surprisingly great at some tasks. We will face a deployment problem for that type of performance to reach the confines of the online world. The opportunity was to focus on the product and application layer to accelerate that deployment. Even if models didn’t get significantly more intelligent in the short term, and even if you’re a little hesitant about calling an AGI timeline in the years to come, there’s an opportunity to increase the level of effectiveness people can have when they have a job that involves computers with the current technology.
In terms of requirements with the technology, for us, it’s let’s make the models that are available today, whether they’re available via API as business models or they become available because they’re open source and people decide to host them and make them available via API, et cetera, et cetera, and package them such that smart, fast-moving teams can access their potential in concrete day-to-day scenarios that are going to help them see value.
Stan: The interesting tidbit here is that the models will get better as research progresses. On their own, they’re not enough for deployment and for acceptance by workers. At the same time, the kind of use cases that we can cover depends on the capability of the model. That’s where it’s different from a classical SaaS exercise because you’re operating in an environment that moves fast. The nature of your product itself changes with the nature of the models as they evolve.
It’s something that you have to accept when you walk in that space — the hypothesis that you make about a model might be changed or might evolve with time, and that will probably require changing or evolving your own products as that happens. You have to be very flexible and be able to react quite rapidly.
Jon: There are two vectors we spoke about before when we’ve discussed this in the past. One is that if you stop progress today with the underlying models, there are years of progress that we can make. The other is that if we go one or two clicks back in history to say mobile apps, we saw that there were years of flashlight apps before the really special things started to arrive.
Where would you put us on that two by two of early models versus advanced and how much it matters versus not?
Gabe: What’s interesting is to talk about people who’ve been on the early adopter side of the curve, who’ve been curious, who’ve been trying models out, and who’ve probably been following the evolution of ChatGPT as a very consumer-facing and popular interface. You get this first moment of complete awe where the model responds in something particularly coherent. I’m asking it questions, and the answers are just amazing. Then, as I repeat use cases and try to generate outputs that fit a certain scenario, I’m sometimes either surprised or disappointed.
The stochastic nature of the output of the models is something other than what people think about at the very beginning. They attribute all of the value to pure magic. Then, as they get a little more comfortable, they realize that it might still be magic, or they might be unable to explain it technically. Still, the model isn’t always behaving in a way that’s effectively predictable enough to become a tool.
We’re early in understanding what it means to work with stochastic software. We’re on the left side of the quadrant. In terms of applications, the cutting-edge applications are already pretty impressive. By impressive, I mean that they fairly systematically deliver at a fraction of the cost or at a multiple of the usual speed, an outcome that is relatable with or on par with human performance.
Those use cases exist already. You can ask GPT-4 or a similarly sized performant model to analyze a 20-page PDF. In seconds, you will get several points that no human could compete with. You can ask for a drill-down analysis or a rewrite of a specific paragraph at a fraction of the cost of what you can ask on a Fiverr marketplace or an Upwork marketplace for some tasks. We already have that like 10x faster, 10x better.
In terms of broad adoption, especially by companies, if you take ChatGPT with a few hundred million users, that still leaves 5.4 billion adults who have never touched it and don’t even know what it means by some scale for sure. If you go into the workplace, there are very few companies that are employing generative artificial intelligence at scale in production that were not created around the premise of generative artificial intelligence being available.
Some companies have been created in the last years and do, but most companies that existed before that timeline are exploring. They’re exploring use cases. They’re rebuilding their risk appetite and policies around what a stochastic tool might bring regarding upside and downside opportunities and risk. We’re still very early in the deployment of those products.
One indication is that the conversational interface is still the default that most people are using and interacting with when it’s likely that it shouldn’t just be conversational interfaces that provide generative artificial intelligence-powered value in the workplace. Many things could be workflows; many things could be cron jobs. Many applications of this technology could be non-chat-like interfaces, but we’re still in a world where most of the tests and explorations are happening in chat interfaces.
We still want the humans to be in the loop. We still want to be able to check or correct the course. It’s still early.
Stan: One analogy I like to use is that today, to recall what Gabriel just said on the conversational interface, it really feels like we are in the age of Pong, the game for models. You’re facing the model. You’re going back and forth with it. We still need to start scratching the multiplayer aspect of it. We haven’t yet started scratching and interacting with the model in new ways and more efficient ways.
You have to ask yourself, what will be the Civilization of LLMs? What’s going to be the Counter-Strike of LLMs? That is equally important to dig into compared to model performance. The mission we set for ourselves is to be the best for our users to be the people who dig in that direction and try to invent what’s going to be the Civ 5 of interacting with models in the workspace.
Jon: Can you talk about the mission for Dust in the context of the organization that’s going to use it?
Stan: We want to be the platform that companies choose as a bet to augment their teams. We want it to be a no-brainer that you must use Dust to deploy the technology within your organization. Do it at scale. Do it efficiently. This is where we’re spending cycles. It’s funny to realize that every company is an early adopter today. The board talks about AI, the founders talk about AI, and so the company itself is an early adopter. But once you get inside the company, you face a classical adoption curve. That’s where product matters because that’s where the product can help deploy the companies through that chasm of innovation inside the teams. We want to be the engine of that.
We’re not focusing on the models; we’re trying to be agnostic of the models, getting the best models where they are for the needed use cases. Still, we want to be that product engine that makes deploying GenAI within companies faster, better, safer, and more efficient.
Gabe: One of the verbs that we use quite a lot that is important is augmenting. We see a tremendous upside scenario for teams with humans in them. We don’t need to spend a lot of our time working for companies that are trying to aggressively replace the humans on their teams with generative artificial intelligence because that’s shaving a few percentage points of operational expenditure. There’s a bigger story, a play here, which is if you gave all of the smartest members of your team an exoskeleton, an Iron Man costume now, how would they spend their day, their week, their quarter? What could happen a few quarters from now if that opportunity compounds?
When we decide at Dust about different avenues to prioritize, one that’s consistently a factor is whether we are augmenting or replacing. By replacing, there’s a high likelihood that we, one, provide faster horses to people who see the future as extrapolated from the present. It’s like, “I need to replace my customer service team with robots because robots are cheaper,” when the entire concept of support and tickets as an interface between users and a company is to discuss how a product is misbehaving and may be challenged in the future.
It’s a tension for us because there are some quick wins or deployment scenarios that many companies are considering. It helps us explore and spend time on some of the cars instead of the faster horses scenarios dawning upon us.
Jon: I think it has implications not just for the individual workers, but to your point, Stan, and to your point, Gabriel, there’s going to be a difference in how the employees interact with one another. I’ll just put it to you one way. If I’m going to decide whether to join your company or not, and you’re going to tell me, “You should because I have Dust,” what would be the rest of that explanation?
Gabe: It’s a great point. I think that that’s an example we sometimes use to describe what an ambitious outcome for the company in a few years’ time or what an ambitious state of the world would be for our company in a few years’ time. If you take a developer today — the senior developer getting offers from a number of companies — and in the interview process getting to ask questions about how that company runs its code deployment pipeline. I can ask how easy it is to push code into production, how often the team pushes code into production, and what a review process looks like.
I can read a lot into the culture of that company on how it respects and tries to provide a great platform for developers. Today, developers are at the forefront of being able to read in the stack that the company has chosen, how they prioritize their experience. If you do not have a version control software that allows for pull requests, reviews and cloud distribution that works and is fast, I don’t think you’re very serious about pushing code.
We think that the future has more of the roles within a company having associated software. You could argue that, to a degree, Slack has created that before and after aspect, where if you’re applying at a company today and you ask how do employees check in with each other in an informal way to get a quick answer on something that they’re blocked on and the employer says, “We have a vacuum tube system where you can write a memo and just pipe it in one of the vacuum tubes that’s available at the end of the floor and you’ll get a response within two days,” that should help.
You’re like, “Okay, great. I don’t think that real-time collaboration is prioritized.” We think there’s a continuum of those scenarios that can be built. For us to be able to imagine a future where employees say, “Hey, we run on Dust.” We would love that to be synonymous with, “Hey, we don’t prioritize or incentivize busy work.” Everything that computers can do for you, which really computers should have been doing for you decades ago, we’ve invested in a technology that helps that happen. We’ve built a technology that helps burn through overhead and time sinks of finding the information, where it is, understanding when two pieces of information are contradictory within a company and getting a fast resolution as to which one should be trusted. The OS of that smart fast-growing team is something that we hope to be a part of the strategy for.
Jon: That’s such an evocative image of the vacuum tube. I actually bet if there were a physical one, people would like that as long as there was also Dust.
Gabe: It could be a fun gadget that people actually send employee kudos notes to at the end of the week and just team phrase updates.
Jon: What we’re talking about, though, is there’s a metaphor of the agent. In our 2023 and 2024 brain, we really think of that as another member of the team. Maybe a junior member of the team at the moment. I think it was something you said, Gabriel: that the binary question of whether it’s good enough or not is actually not useful. But rather, how good is it? How should I think about that?
Gabe: Yeah. I stole it from the CIO of Bridgewater, whose communication around GPT-4 compared it to a median associate or analyst; I can’t remember what the name of the roles was. “We believe that it performs slightly higher than a junior-level analyst on these tasks.” Bridgewater is a specific company that has an opportunity to segment a lot of its tasks in a way that maybe not all companies are able to do.
As soon as you’ve said that, a number of logical decisions can be made around that. We often get asked about specific interactions that people have had with Dust assistants. Like, “Hey, why did it get that answer wrong?” I was like, “Assistants are getting a tough life in those companies because a lot of your colleagues get a lot of stuff wrong a lot of the time, and they never get specifically called out for that one mistake.”
That’s part of the adoption curve that we’re going through where it’s at the level of an interaction. You’re looking at a model that might be, on average, quite good and sometimes quite off. Instead of turning your brain off, you should probably turn your brain on and double-check. At the level of the organization, you’re getting performance that is, in specific scenarios, potentially higher than the median. Then, it was higher than if it got pushed to another team member for that specific set of tasks.
As models get better and the structural definition of the tasks we give them gets clearer, and as the interfaces that help feedback mechanisms get more and more used, those scenarios will progress, and so will the number of times you feel like the answer is good enough, better than what you would’ve come up with on your own, or better than what you could have expected if you had asked a colleague.
One of the things that we systematically underestimate here is also the latency. Ask a good colleague to help you with something. If they’re out for lunch, they’re out for lunch. That’s two hours. General Patton is the one who says, “A good plan violently executed today beats a perfect plan executed tomorrow.” If, as a company, you can compound and rely on that compounding edge in terms of execution speed, the outcomes will be great.
Jon: What we’re talking about is assessing agents not by whether they’re right or wrong but by their percentile of performance relative to other people. Yet, there’s another thing that you both have spoken about: the failure modes will be different. It’s easy for a skeptical human, especially, to say that one weird mistake means this thing must be done. I don’t think it would be the first time in history that we’ve mis-evaluated what technology would bring us by judging it on some anecdotal dimensions.
Stan: Something interesting, to jump back on your 2023 brain and how we might not be foreseeing correctly: there’s a massive difference between having intern-level or junior-level assistants where this is a human, so you want to leave the task to them entirely and leave some agency to them, and having assistants where the agency is kept on the side of the person who has the skills. With a human, the shape of tasks that can be given to that person is defined by their capability and the fact that they’re junior. There’s a massive difference between what you can do with junior-level assistants where you keep the agency versus just junior assistants for the humans.
It will be interesting to see how that automation and the augmentation of humans play out. It might be the case that it will be very different from adding 10 interns to a human and adding 10 AI-based assistants to a human. It may well be the case that 10 AI assistants augment humans much more than having 10 interns. There’s going to be an interesting landscape to discover.
Jon: Depending on how you frame this, I’m either going forward or backward in history from unlimited manual labor to spreadsheets. A Dust agent reminds me in many ways of a spreadsheet.
Gabe: In terms of capability and the level of abstraction versus the diversity of the tasks, that’s not a bad analogy. It’s unclear if the primitives that currently exist on Dust are sufficient to describe the layer and space that we really intend on being a part of. If we are successful, Dust will probably retain some properties of malleability, the ability to be composable, programmable, and interpretable by the various humans that are using it, which does remind me of spreadsheets, too.
Jon: One thing that you see in your product today is a distinction between the author and the consumer of a Dust agent. It’s reasonable to expect there’s going to be a power law of distribution of more people consuming these things than creating them. Were there some way to really measure spreadsheet adoption? I’m quite sure we’d see the same. That a handful of spreadsheets, especially the most important ones, get created by a small number of us and then consumed by many more of us.
These things are infinitely malleable, and many people can create a silly one that is used once and thrown away.
Gabe: We see that today in some of our customers, who will see the assistants. I had a product manager admitting that they had created a silly assistant mixing company OKRs and astrology to give people a one-paragraph answer on how they should expect to be doing in the quarter to come. They were admitting that it was a distribution mechanism for them. It’s like, “I just want people to know how easy it is to create a Dust assistant, how easy it is to interact with it, and how non-lethal it is to interact with it, too.” There’s always that fear of use case, all of this usage scenario.
The reason we believe that they’re not going to be developers is that the interface has become a natural language in many cases; you’re essentially just looking at the raw desire for some humans to bend the software around them to their will. I think the builders of tomorrow with this type of software have more in common with very practical people around the house who are always fixing things and who won’t let a leak go for two weeks unattended, who’ll just fix the leak with some random piece of pipe and an old tire. It just works, and it’s great. That is seeing opportunity and connecting the Lego bricks of life.
One of the big challenges for companies like us is how to identify them. How do you let them self-identify? How do you quickly empower them such that the rest of the team sees value rapidly? One of the limitations of assistants like Dust within a company is access to the data that the company has provided to Dust. The number of people controlling access to the data gates is even smaller than the number of people who can build and experiment with assistants in some cases. How can a builder reach out to somebody at the company with the keys to a specific data set and say, “Hey, I have this really good use case? This is why I feel we should try it out. How could I get access to the data that allows me to run this assistant on it?” Those are all the product loops.
They have nothing to do with model performance. They have everything to do with generating trust internally about the company, the way the product works, the technology, and where the data goes, all these things that are product problems.
Jon: If I move half a click forward in history, you start to think about data engineering and how empowering it was for analysts to get tools like dbt and other things that allowed them to spin up and manage their own data pipelines without waiting for engineers to turn the key for them. That created a whole wave of new jobs, a whole wave of innovation that wasn’t possible before. To the point that now, it’s impossible to hire enough data engineers.
There’s this empowering effect that you get from democratizing something within a company that was previously secured — even if for a really good reason. I’m connecting that to the point that you made, Gabe, the data itself that feeds the agents is often the key ingredient and has been locked down until today. Based on the use cases that you’re seeing, this is going to be a fun lightning round. My meta question is, has the killer agent arrived? Maybe you can talk about some of the successes and maybe even some of the fun things that aren’t quite working that your customers have tried.
Gabe: I think that killer agent is a product marketable concept that you can slap on your website, and 90% of people who visit upgrade, regardless of their stage of company, their developing, et cetera, et cetera. I don’t think we’re there yet. They ask questions that Dust, let alone an LLM without a connection to the data, would have no chance of answering.
Those are some interesting cases where I think we’re failing locally and tactically because the answer is not satisfying. Where I’m seeing weak signals of success is that people are going to Dust to ask the question in the first place.
On some of the use cases that we’re incredibly excited about, it’s almost similar situations, but with satisfactory answers, where people are asking surprisingly tough questions that require crisscrossing documents from very different data sources and getting an answer that they unmistakably judge as being way better than what they would’ve had by going to the native search of one SaaS player or by asking a team member, et cetera, et cetera. In some cases, the number of assistants that midsize companies generate on Dust is high. Do you see that as a success or a failure? Does that mean that you’ve been able to give the tools for a very fragmented set of humans to build what they need, and you interpret it as success? Or have we essentially capped the upside that they can get from these too-specific assistants? That’s still one of the questions that we’re spending a lot of time on today.
Jon: If we go back to our trusty spreadsheet metaphor, there are many spreadsheets. They’re not all created equal.
Gabe: Yeah, it’s fine. Maybe it’s fine. Maybe yeah, not all spreadsheets need to be equal.
Jon: Thank you so much for talking about Dust and your customers. I think customers are going to love it.
Gabe: Awesome. Thank you very much for having us.
Stan: Thank you so much.
Coral: Thank you for listening to this IA40 Spotlight Episode of Founded & Funded. Please rate and review us wherever you get your podcasts. If you’re interested in learning more about Dust, visit www.Dust.tt. If you’re interested in learning more about the IA40, visit www.IA40.com. Thanks again for listening, and tune in in a couple of weeks for the next episode of Founded & Funded.