Starburst’s Justin Borgman on entrepreneurship, open source, and enabling intelligent applications

Starburst CEO Justin Borgman

This week on Founded and Funded, we spotlight our next IA40 winner – Starburst Data. Managing Director Matt McIlwain talks to co-founder and CEO Justin Borgman about how launching his first company was like getting a Ph.D. in entrepreneurship, and then they dive into the customer problem Justin saw that made him believe the time was right to launch his second — Starburst. The two discuss open-source alignment, why making use of cloud partnerships early, especially cloud marketplaces, can be so beneficial for startups, why Starburst had to change the name of its query engine from Presto to Trino, and Justin’s guidance for creating a future-proof architecture.

This transcript was automatically generated and edited for clarity.

Coral: Welcome to Founded and Funded. This is Coral Garnick Ducken, Digital Editor here at Madrona Venture Group. And this week we’re spotlighting another 2021 IA40 winner. Today Madrona Managing Director Matt McIlwain is talking with Justin Borgman, founder and CEO of Starburst Data, which was selected as a Top 40 intelligent application by over 50 judges, across 40 venture capital firms. We define intelligent applications as the next generation of applications that harness the power of machine intelligence to create a continuously improving experience for the end-user and solve a business problem better than ever before.

These applications require enabling layers. And we’re delighted to have Justin on today to talk more about the enabling company he co-founded in 2017. Justin walks us through how launching his first company – Hadapt – was basically like getting a Ph.D. in entrepreneurship and then through the customer problem he saw that led to the launch of his second company – Starburst. Matt and Justin discuss why making use of cloud marketplaces early can be so beneficial for startups, why Starburst had to change the name of its query engine from Presto to Trino, and Justin’s guidance for creating a future-proof architecture. But I don’t want to give it all away. So, with that, I’ll hand it over to Matt and Justin.

Matt: Well, hello everybody. I’m Matt McIlwain, I’m a Managing Director here at Madrona Venture Group, and I’m just delighted to welcome Justin Borgman, Founder and CEO of Starburst Data. Starburst is really behind the popular Presto-based open-source project called Trino that helps customers carry out complex analytics on disparate distributed data sources. We’re going to talk all about that here with Justin and, you know, Starburst was selected as one of the top 40 intelligent applications, as an enabling application. And as you’ll see, Starburst is very much at the core of that. And one of the things we’re going to dig into today a bit is at what layer of abstraction this next generation of data enablers actually lives. But before we get into all of that, Justin, welcome.

Justin: Thank you, Matt. You know, we’re honored to be selected, and it’s a pleasure to be here with you today.

Matt: I think it would be just great because prior to Starburst, you’ve done some really amazing things, and I think they kind of inform ultimately how you got energized and excited to create Starburst. Can you, for our audience, just walk us through the time before Starburst?

Justin: Yeah, sure. My journey, at least in big data and analytics, really begins back in 2010. So, 12 years ago with the founding of my first company, which was called Hadapt. And that business was really based on some research by the folks who became my co-founders in that company, Daniel Abadi and Kamil Bajda-Pawlikowski, who were a professor and Ph.D. student at Yale University and co-wrote a paper called HadoopDB. And the basic idea they had back in 2010, and they really were pioneers with this paper, was: could we turn Hadoop, which was becoming the data lake, into a data warehouse? In fact, the term data lake was really created in the context of Hadoop back then. Could you actually run SQL analytics on data in Hadoop? Could you connect BI tools? Could you use this effectively as an open-source data warehouse? And I was in business school at the time. I had a computer science degree previous to that. I was a software engineer for the first few years of my career before going to business school. I read this paper, and I was like, this is the coolest thing ever. I walked over to the computer science department and talked those guys into starting Hadapt with me, which was really the commercialization of that research.

Ultimately, we built that business over four years and learned a tremendous amount in that process, both in terms of the market but also as an entrepreneur, as a first-time CEO. Even though I was in business school, maybe my Ph.D. I guess you could say was going through that first startup. Cause there’s so much that you learn through experience that you really can’t read about and almost can’t be taught without going through it. And some of the lessons of that startup that we saw, and this was particularly evident to me when the company was acquired by Teradata in 2014. So, I became a VP and GM at Teradata. And one of the things that became very clear to me at Teradata, which is by the way, like the pioneer of the enterprise data warehouse, right. They’ve been around 40 years and they kind of created this concept of a single source of truth, get all of your data into one place. And what I found was that despite their success none of their customers had gotten all of their data into one place. And that was a really eye-opening moment to me that centralization might not be possible. If the leading company for 40 years couldn’t do it, why should we expect we can do it now? That got me thinking about the future of data warehousing in a more decentralized fashion. And that coincided with me meeting the creators of an open-source project at Facebook called Presto at the time. And we began to collaborate — Teradata and Facebook — which may seem like an unlikely pair. We started working on how we could make Presto an enterprise-grade solution, to really allow you to query data anywhere. And that was what excited me about the technology. It was a query engine for anything.

Matt: Wow. Can’t wait to dive more into that. It’s interesting, your observation about Teradata, which really was a pioneer in data warehouses, and this point of how hard it is, almost more from a sociological perspective, to get all the data into one centralized place. Was there also, as you learned more about Teradata, a technological constraint? And what did you find? I mean, congratulations. It was incredible to build Hadapt and to be acquired by one of the really, truly great technology companies. But what were the constraints there, too?

Justin: Those are great questions. By the way, I want to put an exclamation point on the sociological piece. I think as technologists, we naturally think that – it was a great engineer and leader who gave me the advice maybe 10 or 12 years ago. He said, “There are no technical problems, only people problems.” And that has stuck with me because I think as technologists, we often underestimate that. But to your point on the technical side, and I would say this is maybe just partly a function of the business model of the day, Teradata sold their product as an appliance. And an appliance, for anyone listening who doesn’t know what an appliance is, is just hardware and software combined.

And the goal of an appliance is well-intentioned: it’s to provide simplicity to the customer. You just plug it in and go. But it also makes it very inflexible to the world that’s evolving around you. So, I think that was one of the challenges. You were basically buying high performance, almost like a supercomputer database, and you were paying a lot for that as a result. So, you really couldn’t take advantage of increasingly low-cost commodity hardware, and then even more so, you couldn’t take advantage of the elasticity and the separation of storage and compute that the cloud provides. Incidentally, that was, I think, what really helped give rise to one of your portfolio companies, which is Snowflake, right? Snowflake really was the first to take advantage of that storage-compute separation.

Matt: Yes. And then to effectively say, well, I’m going to let the cloud be the kind of underlying resource around which I can build an abstraction layer on top of that, which in that case was a cloud-native data warehouse. But you have, in a sense, taken a different approach, complementary but different. Bringing us back to the story of the founding of Starburst, tell us a little bit about the Presto team, maybe build on the beginnings of that story of that collaboration and how that led ultimately to the formation of Starburst.

Justin: Absolutely. Presto was first created by Martin, Dain, David and Eric. They all are here at Starburst of course today, but they created it in 2012 at Facebook and then open-sourced it in 2013. One of the goals for them was to provide a much faster interactive query engine compared to Hive, which was the previous generation, also created at Facebook by the way. So, Facebook was very much a pioneer in open data lake and data warehousing analytics. But Hive was not fast enough. Presto was designed to be much faster, and it had this really interesting abstraction where it was truly disconnected from storage, meaning that it was agnostic to data source. So it wasn’t just a SQL engine for Hadoop. It was a SQL engine for anything. You could query MySQL, you could query Postgres, you could query Kafka, you could query Teradata, you could query anything. That was what attracted me to it and began the collaboration. And you’re absolutely right, I think this is one of the hidden secrets of the Presto/Trino history. Teradata played a really important role in those early days in terms of making it usable by companies outside of Silicon Valley, companies who need access controls, security, and enterprise features.

Matt: Enterprise abilities and your insight to listen to the customer and understand that those abilities were going to be needed, especially when you’re talking about data and accessing data. You know, it’s a little before your time, but one of the very first companies that I became familiar with at Madrona, and it was an investment we’d already made when I joined in 2000, was a company called Nimble Technologies. And this was a precursor, and it didn’t work, to be candid. And part of it was the sociological reasons, you know, who moved my cheese, who moved my data. It was trying to do it in a way that was distributed like Presto and Starburst do, but there was so much concern about the abilities – the securability, the reliability, the availability – and at that point in time, I don’t think the technologies were ready either, which created the challenges. What were the early use cases that you were seeing? I mean, I’m sure there were some inspired by Facebook where people said, this is such a problem, I’m willing to go take the risk on this new open-source project and this company building a hardened layer on top of it.

Justin: Well, there are really two categories of use cases. I think the Silicon Valley internet companies that were using the technology at the time, and still do today, the Airbnbs, Netflix, Lyft, LinkedIn, Twitter, Uber, Dropbox, were effectively using this as a data warehouse alternative. Those companies deal with such a volume of data they just couldn’t possibly fathom buying expensive appliances, let’s say, to store all of this data and analyze it. And so this became the way that they ran all of their analytics. So that was one category: essentially, I have my data in a data lake. In the early days that was Hadoop. In the more recent years, that’s probably S3 on Amazon or Azure Data Lake Storage or Google Cloud Storage. So, you know, I’ve got really cheap storage. I can store my data in open data formats so I can use different tools to interact with it. I can train a machine learning model using Spark, and I can query it with Starburst or, at the time, Presto, which later became known as Trino. And the reality is, that has been a very core bread-and-butter use case. Some call that use case now a lakehouse, basically doing a data warehouse in a data lake.

The other category though, which I think you’ll find interesting, Matt, and which was a big reason why we built a business around this: we were seeing that Fortune 1000 global customers had a slightly different need that I think we could uniquely solve, which was the fact that they had data silos. They had data in a variety of different systems. So, if you’re a big bank, a big retailer, a big healthcare company, particularly in regulated industries, you have decentralization, and that’s never going to change. Centralizing is just impossible, truly, for those types of enterprises. So, what we were able to do is essentially join tables in different systems and give you fast results.

So maybe you’ve got product data or customer behavior data in a data lake, and you’ve got billing data or finance data in a data warehouse. And you want to be able to join these two together to understand how the customer behavior is driving profitability or revenue, or what have you. So those are classically living in maybe different data sources, and we can execute those queries in effectively real time or at query time and give you fast results. And some people will say, well, that sounds a lot like data virtualization of 10 or 15 years ago. The big difference here is that Trino and Starburst are actually an MPP execution engine. MPP just means massively parallel processing. So, it’s running on a parallel cluster, not just one machine. And because of that, you can get performance and scale that you could never get with those previous generations.
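To make the cross-system join Justin describes concrete, here is a minimal sketch in Python. It is not Starburst or Trino code: two in-memory SQLite databases stand in for the data lake and the data warehouse, and the table names, column names, and the `revenue_by_behavior` helper are all invented for illustration. In Trino the tables would be addressed as `catalog.schema.table`, but the shape of the query is the same: one SQL statement that joins data living in two different places.

```python
import sqlite3

# Stand-ins for two separate systems (an assumption for illustration):
# "main" plays the data lake, "warehouse" plays the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.execute("CREATE TABLE main.page_views (customer_id INTEGER, views INTEGER)")
conn.execute("CREATE TABLE warehouse.billing (customer_id INTEGER, revenue REAL)")
conn.executemany("INSERT INTO main.page_views VALUES (?, ?)", [(1, 40), (2, 5), (3, 12)])
conn.executemany("INSERT INTO warehouse.billing VALUES (?, ?)", [(1, 900.0), (2, 50.0)])


def revenue_by_behavior(connection):
    """Join behavior data from one database with billing data from the other."""
    query = """
        SELECT v.customer_id, v.views, b.revenue
        FROM main.page_views AS v
        JOIN warehouse.billing AS b ON b.customer_id = v.customer_id
        ORDER BY v.customer_id
    """
    return connection.execute(query).fetchall()


rows = revenue_by_behavior(conn)
print(rows)
```

Customer 3 has behavior data but no billing record, so the inner join leaves that customer out of the result.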

Matt: And I think that was the technological limitation back in the day is that you didn’t have this MPP capability that has subsequently come along. And for that matter networks so that you could do that in a distributed way.

Justin: That’s exactly right. People ask me, “Well, what’s different now?” It is those two points. It’s MPP and it’s network bandwidth. You’re a hundred percent spot on.

Matt: And so what’s interesting there is that it enables these big institutions to create their own intelligent applications effectively, or their own intelligent analytics platform. They may not turn it into an application. They may just choose to use it for some in-house continuous insights. Is that where you have found more of those types of use cases in contrast to somebody using Trino and Starburst as a platform to build an intelligent application as a service?

Justin: So, in house, I would say, was definitely where the business started. And it really began with power users who really understand the data that exists in the organization and just don’t have the ability to access it or query it. It really started with doing exploratory analysis. I’ve got an idea and I want to go test my hypothesis. I need to run some ad hoc queries and get results. And my goodness, it’s going to take me weeks if I have to go to the data engineering team to create pipelines and move data and get it into our data warehouse. And I need to iterate at a much faster speed. So that time to insight was a real driver of early use cases. The other driver was a need for accuracy, or freshness I guess I will say, because we allow you to effectively skip ETL. And we try not to be too dogmatic about this. We’re not saying that ETL is dead or we’re getting rid of ETL. It’s just that we make it optional. And there are going to be cases where it may be advantageous to just connect to your data source and query it rather than moving it. And that gives you some really interesting optionality as you’re doing your analysis.

Matt: And ETL, of course, meaning extract, transform, and load the data. It’s a set of preparations that make the data more queryable and more usable.

Justin: Exactly. So, with the classic data warehousing model pioneered by Teradata and of course Oracle and IBM, it was all about extracting, that’s the E of ETL, your data from the different data sources you have, doing some kind of transformation to normalize it or get it prepared and then loading it into this new enterprise data warehouse.

And that process, that ETL process, ends up taking a tremendous amount of time, particularly human time in terms of creating those pipelines and maintaining those pipelines. Cause you might add a new field in a source database, and now you need to go add that field in your data warehouse, and you’ve got to keep these in sync and so forth. That’s part of the disruption, I guess you could say that we’re offering the market – the ability to skip that process where it makes sense and just query the data where it lives directly.

Matt: Say more about that? Cause I do think that’s one of the transformative capabilities of Starburst. I mean, how do you do that?

Justin: At a technological level, the easiest way to think about our architecture is that we’re a database without storage. That’s the way I explain it to people. Database geeks will understand the full stack: there’s a SQL parser, cost-based optimizer, query engine and execution engine, and often a storage engine where you’re storing the data. It’s the storage engine piece that we intentionally don’t have. And that’s what gives us a different perspective on how we design and build the system, where we are intentionally reliant on the storage systems that you connect to. And so, we connect to a catalog that you have. That is either a universal catalog (some companies have all their data in one central catalog, and we partner with Alation and Collibra and Glue Catalog on AWS and so forth), or it is the catalog of the individual source systems (Teradata, Oracle, Hive Metastore and Hadoop), and that is effectively how we know where the data lives. And then our engine is going to execute that query, push the query processing down to where the data lives as much as possible to minimize traffic over the network, and then pull back what’s necessary to complete the query and execute the join in memory. And back to that point about MPP: that parallel processing is what’s able to give it the performance and scale. Often I have these conversations with customers who maybe are hearing this the first time and they say, “This sounds too good to be true. How can you possibly do this?” It is that MPP aspect that makes this possible in a performant way.
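The flow described above (push filtering down to each source, pull back only what is needed, then join in memory) can be sketched as a toy in Python. This is an illustration under stated assumptions, not how Trino is implemented: the `Source` class, `hash_join`, and `federated_join` are hypothetical names, and two threads on one machine stand in for what would be many workers on an MPP cluster.

```python
from concurrent.futures import ThreadPoolExecutor


class Source:
    """Hypothetical connector: holds rows and can filter them at the source."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.rows_sent = 0  # how many rows crossed the simulated network

    def scan(self, predicate=None):
        # Pushdown: the filter runs where the data lives, so only matches come back.
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_sent += len(result)
        return result


def hash_join(left_rows, right_rows, key):
    # Build a hash table on the left side, probe it with the right side, all in memory.
    table = {}
    for row in left_rows:
        table.setdefault(row[key], []).append(row)
    joined = []
    for row in right_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def federated_join(left, right, key, left_filter=None, right_filter=None):
    # Scan both sources concurrently, then join the already-reduced results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        left_future = pool.submit(left.scan, left_filter)
        right_future = pool.submit(right.scan, right_filter)
        return hash_join(left_future.result(), right_future.result(), key)


lake = Source("lake", [{"customer_id": i, "views": i * 10} for i in range(1, 6)])
warehouse = Source("warehouse", [{"customer_id": i, "revenue": i * 100.0} for i in range(1, 6)])

result = federated_join(lake, warehouse, "customer_id", left_filter=lambda r: r["views"] >= 40)
print(result, lake.rows_sent, warehouse.rows_sent)
```

Because the filter on `views` runs inside the lake source, only two of its five rows are transferred before the join; the warehouse side has no filter and sends all five.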

Matt: And in that sense how should I think about where the quote “file system” lives or the data and metadata system that even if I’m not having to deal with the underlying storage, I still need to know the metadata about all the data that I’m trying to access, so I can do a query.

Justin: Different customers have slightly different approaches here. Some leverage a third-party tool, you know, like Alation or Collibra, which might be a solution. Others maybe are just joining between data lakes and might be leveraging the Hive Metastore. To me, the lasting legacy of Hadoop is really the Hive Metastore. That seems to continue to persist even in the cloud age, if you will. Or, if they’re in an AWS stack, Glue Catalog is a great way of keeping all of your metadata across a variety of Amazon products in one place. We can leverage that, and we can collect statistics. Collecting statistics is really important because it allows us to optimize the way we execute the queries when we know how the data is laid out and where it lives.
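As one small illustration of why statistics matter, a cost-based optimizer can use row counts to decide which side of a join to hold in memory. The sketch below is a simplified, hypothetical example of that single decision; the `choose_build_side` function, the table names, and the row counts are all made up, and a real optimizer weighs far more than row counts.

```python
def choose_build_side(row_counts, left_table, right_table):
    """Return (build, probe): build the in-memory hash table on the smaller table."""
    if row_counts[left_table] <= row_counts[right_table]:
        return left_table, right_table
    return right_table, left_table


# Row counts as they might be collected from a catalog (values are invented).
stats = {"lake.page_views": 5_000_000, "warehouse.customers": 20_000}

build, probe = choose_build_side(stats, "lake.page_views", "warehouse.customers")
print(build, probe)
```

Without the statistics, the engine would have no basis for preferring one side over the other.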

Matt: That’s great. Maybe also so that people that are not familiar with these things, is this a read-only capability or is there a write-back capability? So, I do a query. I can do some analytics. I want to write something back to those underlying distributed data stores. Tell us about that.

Justin: That’s a really important question. And for anyone in the audience, the reason that question is so important is that historically, if we go back to my first startup, in the land of Hadoop, if you will, the early data lakes, you really couldn’t write data effectively. You couldn’t do updates and deletes. It was really designed to be an append-only system. You just keep adding more data to it, but you couldn’t modify the data that existed. And that was a real limiting factor for a lot of use cases. For example, one of the most popular examples is probably GDPR or other data privacy rules that say, look, Matt wants himself out of our database. He doesn’t want us to keep sending him emails. You have to go in and then remove Matt from the database. And that was very challenging to do in a data lake world. And that was one of the reasons, quite frankly, that you still had to have a data warehouse in your ecosystem. You couldn’t just do everything in a data lake. Now that has changed in the last few years in a very important way on two levels: on both the query and the storage level. And I’ll explain what I mean by that.

So, first of all, on the storage level. There have been new table formats that allow you now in a data lake to make updates and deletes. And there are really three that are important today. There’s one called Delta, which was created by Databricks and then later open-sourced. There’s one called Iceberg, which is definitely a fast mover. And, I would say, keep an eye on Iceberg. That was built at Netflix and is used by many of the internet companies today. And then there’s a third one called Hudi, which came out of Uber. And all three of these approaches effectively allow you to do updates and deletes. So no longer is this a limitation of a data lake model or a lakehouse model.

The other piece is on the query engine side, where over the last year or two we’ve added that on the query side. So now you can write data back. You can do updates and deletes in a data lake. You can even create tables in other data sources. We have some customers that use us as part of a cloud migration, where they’re taking data out of a traditional on-prem data warehouse and moving it into a cloud data warehouse and are able to do that through a SQL query engine effectively.
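One common way a table format can support deletes on top of files that are never modified in place is copy-on-write: rewrite only the affected file and commit a new snapshot that points at the new set of files. The sketch below is a toy model of that general idea, not the actual Delta, Iceberg, or Hudi design; the `Table` class, its methods, and the file names are hypothetical.

```python
class Table:
    """Toy table: immutable data files plus a list of snapshots."""

    def __init__(self):
        self.files = {}        # file name -> tuple of rows, never modified once written
        self.snapshots = [[]]  # each snapshot lists the files that make up the table
        self._next_id = 0

    def _write_file(self, rows):
        name = f"data-{self._next_id}.parquet"  # illustrative name only
        self._next_id += 1
        self.files[name] = tuple(rows)
        return name

    def append(self, rows):
        new_file = self._write_file(rows)
        self.snapshots.append(self.snapshots[-1] + [new_file])

    def delete_where(self, predicate):
        next_snapshot = []
        for name in self.snapshots[-1]:
            rows = self.files[name]
            kept = [r for r in rows if not predicate(r)]
            if len(kept) == len(rows):
                next_snapshot.append(name)  # untouched file is reused as-is
            elif kept:
                next_snapshot.append(self._write_file(kept))  # rewritten without deleted rows
        self.snapshots.append(next_snapshot)

    def read(self, snapshot=-1):
        return [row for name in self.snapshots[snapshot] for row in self.files[name]]


table = Table()
table.append([{"user": "matt"}, {"user": "justin"}])
table.append([{"user": "coral"}])
table.delete_where(lambda row: row["user"] == "matt")
print(table.read())
```

In this toy, the earlier snapshot still references the old file, so `table.read(snapshot=2)` would still return Matt's row. A real system serving a privacy request would also need to expire old snapshots and remove the files they reference.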

Matt: I’m going to pop this back up for a second to the open-source history here. So it starts out and you’ve got Presto and then I’m curious how it became Trino and then how the Starburst complements and works with the Trino ecosystem. And what are the types of things you’ve built for the commercial product that are complementary to the open source?

Justin: First of all, I’ll just say for me, as I was thinking about starting my second company, open source was an important criterion for the type of business that I wanted to build, because I think there are some really inherent advantages both for the company and customers. The first is, you get the benefit of contributions from a wide audience. I think that really enriches the technology and allows it to grow and evolve at a faster rate than perhaps a single vendor pushing it forward. And what I mean by that is, for example, in the early days the geospatial functions were created by the ride-sharing companies. We didn’t build those. I mean, maybe we would’ve gotten to it eventually. I don’t know, but they built that. So as a result, pretty much every single ride-sharing company in the world now uses this technology. The other benefit is it gives you very broad distribution. It is open source and therefore it is free. Let’s not mistake the fact that it is free. And like anything that’s free, people are going to download it and start using it and use it on a global basis. So, we’ve had customers in Asia Pacific, Europe, Africa, you know, everywhere from the earliest days of the business because of that distribution.

That was one of the lessons, painful lessons for me, actually, that I learned in my first business, Hadapt. Although it ran on top of Hadoop, we were selling proprietary software, and then Cloudera introduced Impala, which was free and open source, included with the distribution. So, you know, that was really hard for us because we weren’t getting the same number of looks or evaluations, if you will. The last piece I’ll mention on why open source is, I think for customers, it brings the benefit of not feeling locked in to a specific vendor. And I think at least in the data world that has been a historical pain point, where the Oracles and even Teradatas of the world effectively increased prices, became very, very expensive, and customers felt kind of captive to their vendors. The notion of an open-source project offers customers the freedom to potentially say, you know what, this vendor isn’t adding the value that I want, but I want to continue to use the technology. They have that flexibility. And this is another reason why I think open data formats are really good for customers because then your data is not locked into a proprietary format either.

So that’s a little bit about the why of open source. Then you asked the question about Trino and Presto and how we interact with the community today. So, the original Presto was created at Facebook, as I mentioned, by my co-founders, and the creators effectively left Facebook, joined us and, in the process, created PrestoSQL. And so, you actually had two Prestos. A lot of people didn’t know this, but there was PrestoDB and PrestoSQL. Unless you were really involved in the space, it was, you know, potato/potahto, I guess, for a lot of folks back then.

Matt: Yeah.

Justin: But the reality was that the community effectively moved with PrestoSQL. That’s where we were investing. That’s where LinkedIn and the other large community players were investing. The name change was more recent. That was a little over a year ago, and that was driven by a trademark issue because PrestoDB was the first name. It was created at Facebook. Even though it was created by the folks here, it was created while they were employed there. And the way trademark law works, of course, is your employer owns the IP that you create when you’re employed. And so, basically, PrestoSQL had to change its name. So PrestoSQL became Trino a little over a year ago.

Matt: Got it. That’s super helpful. And I think also helpful for the audience. So now we have you know, this open-source Trino and maybe connect the dots between the underlying open-source capabilities and what Starburst is building on top of that.

Justin: First of all, I will say that the open-source aspect of this is still very core to what we do. And my co-founders are deeply involved in the open-source community. And there is a real, I would say philosophical aspect to wanting to make the open-source project a hundred-year project. I think we look at Postgres maybe as a good example of a database created many, many years ago that is still super relevant today. And in order to do that, you have to really have a vibrant community and you have to be making sure that you’re continuously improving it in a meaningful way.

So, the majority of the performance improvements and scalability improvements go right into the engine. The engine remains 100% open source. We build our product off of that open source. We do not have our own proprietary fork. Some open-source companies do things that way; we don’t. We build directly off of the open source. And what that means is that effectively, when somebody adds a new feature or capability to the open source, our customers are able to pick it up right away because we’re building off of that. But it also means that we’re continuously invested in the success of the open-source project, because the stability of the underlying technology impacts our own stability for our own customers.

So, we invest a lot of time and energy in that and continue to do so both in terms of code quality and testing and code reviews and so forth.

Matt: And that’s a great mindset to have for both the longevity of the underlying Trino open-source movement, and I think it also serves your customers very well. I know this is a simplification — When I think about another company — Databricks is to Spark as Starburst is to Trino, right? And so, in the case of Databricks, they have done some things to supercharge performance to create a managed service and then create a lot of integrations that make it easier to move things in and out of its managed service in the cloud. And then there’s some of these abilities, these commercial abilities that we’ve talked about that kind of wrap around all of that, that seemed to be some of the core things that you get in Databricks that you wouldn’t get naturally, in this underlying Spark open source. Are those the kinds of things that you all differentiate Starburst from Trino on or complement Trino on? Or how do you think about that?

Justin: In many ways. Yes. I think there are probably a little bit of subtle differences to the philosophy. My co-founders are very adamant that we not have different engines, like core elements of the engine. We just don’t do that in a way that Databricks, I think, does in a few areas. So, you’re getting the same core engine on the open source and Starburst. So, that’s maybe one difference. But I think there are a lot of common themes there. I mean, I think really what we’re trying to do is make the technology accessible, useful, and valuable to customers both in terms of the enterprise features and capabilities they need around security or access controls or connectivity to various different data sources — performance as well. We have this notion of materialized views, which is pretty cool, as well as making it just easier to deploy.

We started with a product called Starburst Enterprise that is self-managed, meaning customers have to run it and manage it. That’s been very successful, but we just introduced Starburst Galaxy, which is intended to be super easy. And on having two products, we debated this a lot. Like, are we just pivoting this? Or is this two products? What does this mean? And it is intentionally two products with different criteria. And what I mean by that is Starburst Enterprise is, and always will be, intended to be maximally flexible to deploy in your environment, whatever you have. So you’re a big bank. You’ve got Kerberos, you’ve got LDAP, you’ve got Oracle and Db2, and you’ve got all these different things. We’re going to make sure that Enterprise works for you within your environment. Galaxy is optimized for ease of use and time to value. It’s kind of the difference between Linux and your Apple iPhone, right? The iPhone is meant to be useful to even your grandmother, hopefully, so that even she can get value out of it. Linux, of course, is infinitely flexible. And that’s the way we’ve kind of approached those.

Matt: Just to make sure that I and our audience are understanding Galaxy, how similar is the analogy to Mongo Classic and Mongo Atlas, where Atlas is the cloud version, a managed service with ease-of-use dimensions to it? Is that a good analogy or not?

Justin: It is. I think it’s probably one of the best analogies. I would say Mongo and maybe Confluent are probably our top two role models in terms of balancing a self-managed enterprise product and a cloud product that are similar and different in important ways. To the point about Mongo and that being a great role model for us, we’re lucky enough to have Carlos Delatorre, the former CRO of Mongo, as an angel investor very early on. I’ve learned a lot from him over the years. And then, we just hired as our CRO a guy named Javier Molina, who ran sales for that Atlas product specifically. And one of the reasons we were so attracted to him was because he understands that go-to-market motion, and we think that’s going to be really big for us in terms of the market. Today we do very well in the large enterprise. We think that this technology could be applicable to thousands of customers, not just hundreds of customers.

Matt: That is a great hire because that Atlas product, from sort of a standing start four years ago, now represents more than half of all of Mongo’s sales. It’s just incredible to see the team at Mongo in that way. But maybe take us a little bit into the decision to build and then launch Galaxy and how that’s additive to both your existing customers and how it opens the door to some new customers.

Justin: I will preface by saying, and some of the audience may know this, we started Starburst as a bootstrapped business. We didn’t actually raise venture right away. And that’s important context because, while I loved that part of the company’s history, and I recommend that to any founder who’s able to get a business off the ground that way initially, the one drawback, of course, is you don’t have the capital to go make huge technology bets necessarily. Right? We were funded by revenue. We were a profitable, cash-flow-positive business. So the moment that we did raise venture, a couple of years into it, that’s when we said, “Okay, we’re going to build this SaaS solution.” So, one part was like, it takes capital to build a SaaS solution, and that was an important trigger. The other motivator though, which kind of gave us confidence that this would work out, is that we were very early in making our self-managed product available on AWS Marketplace. And the reason I mention AWS Marketplace is that it was a self-service way of buying and consuming our product.

Now it’s not a SaaS solution per se, but it is a self-service way of transacting, deploying via a CFT, and using our technology. What was very interesting to us is we launched that when we were, I don’t know, 20 people, a bootstrapped, tiny little company nobody had ever heard of. And we did it mostly just because we thought the marketplace was interesting. It wasn’t necessarily any genius idea, although it looks maybe genius in retrospect. But what we saw with that was organic adoption. We didn’t market our marketplace offering. We didn’t push our marketplace offering. We weren’t doing any outbound back then, and we saw more and more people start to use it. What was really interesting about that was not only was it growing on its own without us really doing anything to it, but it was also a very long tail of customers. And that was what kind of told us, okay, we’re obviously having a lot of success with Fortune 1000, but there are companies using our stuff that I’ve never heard of before. And that’s super exciting.

Matt: Yeah, that’s awesome.

Justin: And so, for us, that was the signal that there was a market beyond what we were seeing at that point in time.

Matt: I would imagine, especially since it was a self-service offering, that somebody had to have some degree of technical acumen to kind of stand it up and run it. Were they most often then running it in the cloud? I guess, in theory, I could buy it in the marketplace and then operate it on my own desktop, too.

Justin: I think that’s true in theory, but you’re right that it required some heavy lifting on their part. It was a real effort A) to find us and B) to deploy this, to stand it up and manage it all on their own. To us, it was kind of like, imagine how many people might use it if we could make this easy. And that was the motivation for Galaxy.

Matt: Say a little bit more about how it’s been working with, you know, the big cloud service providers to go to market with Galaxy.

Justin: It is actually available on all three major public clouds. And we designed it that way from the start. But, they’re great partners. And look, I’ll preface by saying of course there’s going to be some coopetition and overlap because every cloud provider has an enormous portfolio of products. So there are overlapping points. But at the end of the day, the field organizations, the sellers, just care about driving consumption of those clouds.

And that’s what we do. You know, the more queries you run on Starburst, the more AWS compute or Azure compute or Google compute, you’re consuming. So, they’ve been great to partner with that way. And the marketplaces, going back to that point, turn out to be a great transaction vehicle. I can’t stress this enough for any aspiring entrepreneur. Get your Ph.D. in marketplaces. And by the way, there are a lot of ecosystem partners now that help you with that, like Tackle for example.

Matt: Are you finding, I mean, I’m sure there are differences. Is there naturally better alignment with your products and the kind of customers you’re trying to reach, between the different cloud service providers or is it too early to tell?

Justin: Well, I think we partner with all of them. We enjoy working with all of them. If I was going to maybe single one out just a little bit, I would say that I think Google’s philosophy or approach to the market is interesting to me and well aligned to some of our own fundamental beliefs.

And what I mean by that is I think Google, as the challenger in the market, acknowledges, understands, and embraces that they’re never going to own all of the data in the world. And that’s important, at least for me, and important for customers, because they’re willing to approach the market from the standpoint of not necessarily saying everything has to be in Google, creating more freedom for customers to basically do different things in different clouds. They’re much more, I guess I would just say, open to the fact that it’s a heterogeneous world, which is a very core aspect of what we believe.

Matt: And so, to that end, do you find that whether it’s in Google or otherwise, that when I deploy Galaxy in somebody’s cloud, and I’m running it in the cloud, that I’m querying data sources that are back on-premise as part of the queries that I do. Or is it strictly the data that’s living in different data repositories or in a data lake in the cloud?

Justin: It can be either one. And that’s part of the power, I think, for customers: that flexibility, that optionality, that ability to modernize their architecture before they migrate. We’re not saying don’t migrate, but we’re saying we can give you access to everything you want today. And then you can migrate at your own pace, which I think is very powerful. And just to close on the Google point, we just announced a partnership that allows Google customers to leverage BigQuery to access data in different clouds, different data sources on-prem, etc., and effectively extend beyond Google. And I think that’s an important thing to note as well.

Matt: I do think that this whole thing about data and really workload migrations, you referenced it a couple of times. You know, you and I have lived in the cloud and data world for decades now, and it seems like it’s still relatively early innings, but what are you seeing from a customer perspective, especially the enterprise customer, on their kind of cloud migration journey?

Justin: I will preface by saying it varies. Some are further along in that journey. Some are just getting started. I think one of the biggest things that I find interesting and really try to drill into when I’m talking to customers is to what degree they think they are going to consolidate all of their data into one place. Because what I have seen, and I think this is a risk, so if there are any potential customers listening to this, keep this in mind: customers have a fantasy, and I can understand why you would like this fantasy, of saying, “Oh cool, we’re going to turn off all of these different databases that we have, this total mess that we have on-prem, and we’re going to just get it all into one cloud data warehouse.” And I’m not picking on Snowflake. I’ve heard the same story repeated with every one of the cloud data warehouses out there. My word of caution would be, we’ve seen that movie before over 30 or 40 years, and the greater the extent to which you do that, the more you’re beholden to a particular vendor, which is going to get expensive for you. What I like to remind people is, all these new companies are very charming and attractive today, but Larry Ellison was charming in 1979, and how many of you are still charmed by him today would be my question.

Right? So just be careful in that. Think from a long-term perspective. Create a future-proofed architecture — those would be just some of our pieces of advice.

Matt: That’s good advice. It might be one thing to say I’m going to retire your old employer, you know, Teradata data warehouse in favor of a more modern cloud-based data warehouse. But I do think it’s highly unlikely and ill-advised to think that you’d ever have all your data in one data store to rule them all, as it were, for all kinds of reasons. But that, I think, brings us to this Data Cloud Alliance. I note that Google is a part of that, Databricks, Confluent, several others. What was the genesis? What are you trying to accomplish there in service of your customers?

Justin: It’s around trying to create openness, freedom for customers to be able to work in an interoperable fashion across the different clouds that they may participate in. This is another maybe fantasy that I’ll mention. A lot of companies, I think particularly those early in their journey, will say, no, no, no, we’re just doing one cloud. We’re not doing multicloud. We’re just doing one cloud. It’s all going in cloud X. And the reality is that changes very quickly. One of the fastest ways that changes is when you make an acquisition. You just bought a new company, and they’re cloud Y, so now you’re multicloud, whether you want to be or not. We have a vested interest in trying to give customers choice and the freedom to operate across these different clouds. And I think Google is very forward-thinking in embracing that as well.

Matt: That leads to an interesting question. I mean, I like to think that, infrastructure as a service or kind of the core elements of cloud service providers, was an abstraction layer effectively on top of hardware. To kind of oversimplify it. But is there a new abstraction layer emerging that maybe we could think of as data lakes, data lake houses, cloud-native data warehouses, or how do you think about that layer of abstraction relative to infrastructure, and then relative on top of it to applications?

Justin: Abstraction is such a powerful vehicle, I think, for application developers, anyone building an architecture. Abstraction gives you a lot of freedom to change the components, of course, underneath. For us, what we’re obviously most interested in is being that abstraction layer for SQL-based access to all of the different data sources that you have, so that you have the freedom to change those pieces. Maybe it’s Hadoop and Teradata today and tomorrow it’s S3 and Snowflake. Great, so long as your applications, your BI tools, everything that speaks SQL are pointing to Starburst. And then you have the ability to make those changes underneath, around storage, and effectively commoditize storage, which is also very powerful for customers. And there is an emerging name, or a category, if you will, that we’re pretty excited about, which is this notion of a Data Mesh, which is really sort of speaking to this idea of decentralized data and creating a mesh that sort of works across that. Now that is back to one of the first things you said on this podcast: there’s a sociological component to it. In fact, the creator of this concept is a woman named Zhamak Dehghani. And if anyone’s interested, I encourage you to buy her book. Actually, we’re giving it away for free on our website. But she describes it as a socio-technical sort of movement, if you will. Which is to say it is people, process, and technology all together. But we think we can be the technology to enable that. The people and process side is very interesting because part of what that enables is the opportunity to decentralize not just access to data, but also decision-making and ownership of the data. So, this is kind of like a way of putting more power in the hands of the data producers, the ones who are responsible for that data and know the data the best, to also participate in the creation of data as a product that can be shared and consumed by others in the organization.
So, it’s a really interesting philosophy, one that we see certainly gaining a lot of attention, and I think it will be gaining more and more momentum over time.

Matt: We touched on some of the technological reasons around the why now. Is there evidence of the why now on sort of these more sociological dimensions, and how much has the fact that we all had to live in a digital-only world for a while played into it? And we now believe, I think we all do, that we’re going to be living in a hybrid working world. Has that been part of the why now, that sociologically people are saying, “Hey, we just gotta change so we can do more of a decentralized approach,” or am I just kind of speculating here?

Justin: I think that’s right. I think the things driving that, in my view, are, first of all, just the complexity of data sources. We’ve got more data. Everything is collecting data, right? As we’ve digitally transformed, and the pandemic has only accelerated this, we have now more opportunities to analyze and understand and make data-driven decisions. But to do that, it’s just not scalable for everything to always run through one team, one person, one brain. And that’s where I think decentralization is a great way of giving you velocity by delegating and putting more power in the hands of individuals. And I think consistent with that, we operate in an ever more competitive world and companies have to adapt quickly. The speed of adaptation genuinely impacts your top line and your bottom line. So, I think these are some of the things that are driving serious thought around it.

Matt: That’s well said. I have just a couple of fun questions as we wrap up here, but I just wanted to see if there’s anything else that we didn’t cover that’s important about what Starburst is trying to accomplish.

Justin: I would just say, you know, at the end of the day, what we’re trying to do, and I hope this doesn’t sound cheesy, but we want to do the right thing for our customers. We want to be on the right side of history. And one of the things that motivated me to found Starburst in the first place was that, in my time in the database industry up to that point, I met a lot of customers who just felt very trapped, locked in. They weren’t choosing their technology choices. Those choices had already been made and they were stuck with them. They were living with them. Philosophically this notion of freedom is just core to what we’re trying to do. I think you’ll continuously see that in all of our design decisions. We want to be able to support multiple data sources, multiple data formats, be able to operate anywhere. We want customers to be in control, and we think that’s a slightly different perspective than many in the database world at least have historically had.

Matt: I think one other thing that I was curious about is use cases around taking that freedom and distributed, decentralized approach, and then using some of those data sources to help train models from a machine learning perspective. Are you seeing kind of a growth in those kinds of use cases that Starburst could help unlock?

Justin: Yeah, absolutely. And I always try to be clear that obviously, we don’t do machine learning. We don’t train machine learning models, but I think we’re a very important partner to that process because you need the data to train the model and the more access to data, the better your model is going to be. And so, getting data is the first step to ML and AI. And we think we’re an important part of that.

Matt: We agree. And that’s why we were delighted that, I mean, it was a very strong endorsement of you all being in this enabler bucket for the Intelligent Application 40, and we certainly see and know about those kinds of use cases. A fun question is, outside of your company, what’s a startup that you’re most excited about that’s related to this broader world of intelligent applications?

Justin: That is a great question. I think Clari is a really interesting example of this. Clari is really the interface that I’m using to understand my business because it ties in all the important aspects of what we’re doing and provides not only a great summarized view, but also predictive analytics about where we’re going to end up. And particularly as you scale, being able to forecast is so critical, especially in the path to an IPO, which we hope we’ll be able to achieve in the next two to three years.

Matt: So, you’ve now been a successful founder, built two companies. Starburst is still a work in progress, but you’re doing incredible things. What’s a lesson or two for those in the audience that are either on their own startup journey or considering the startup journey that has been really valuable to you, whether it’s from your first-hand experience or advice from others or a combination?

Justin: Oh man. There’s a lot I can say there. I think first of all, the advice that I give to any entrepreneur at any stage in the journey, particularly those that are just thinking about maybe being an entrepreneur, is that the single most important attribute is strictly perseverance. You have to have a high pain threshold and a willingness to push through that pain because it is not for the faint of heart. It is not easy. I think just some people are built for that. They have the stubbornness, the drive, to push through that, and others get overwhelmed by it and bogged down. So, that’s kind of like a look inside yourself type of thing to evaluate and consider. The piece of advice I will give is one that I heard myself. I actually asked a now public company CEO founder, “Does this ever get easier?” Because as you’re building, you always think like, okay, at some point, it’s just going to get easy, right? Like I’m going to be relaxing on the beach, this thing’s going to run itself. And he said, “No, it’s just different kinds of hard.” And that stuck with me because particularly as you scale, every new chapter has been a new challenge and in a totally different way. That’s part of what’s amazing about startups, I think, just from like a personal growth perspective. You are always having to improve yourself, scale to the next level. And so, that really stuck with me. It never gets easier, just different kinds of hard.

Matt: Different kinds of hard. I love that. I don’t know if I’ve heard it phrased that way. So, I really appreciate you sharing that with us, Justin, and yes, you’re always building these new skills for the next phase of the journey, too. And having to let go of things that you did more of so that you can empower others and scale the organization. It has been an absolute pleasure, Justin, visiting with you and incredible what Starburst has accomplished and your role as an enabler of all kinds of data analytics, including those things that go into building machine learning models and intelligent applications. So, thank you very much for taking time with us today and look forward to seeing the continued success of Starburst.

Justin: Thank you, Matt. I sincerely appreciate it. It’s really been my pleasure.

Coral: Thank you for joining us for this IA40 spotlight episode of Founded and Funded. If you’d like to learn more about Starburst, they can be found at To learn more about IA40, please visit Thanks again for joining us and tune in, in a couple of weeks for Founded and Funded’s next spotlight episode on another IA40 winner.

Related Insights

    RunwayML Co-Founder Cristobal Valenzuela on the Intersection of Art and Technology
    Founder Voices from Madrona’s 2022 Annual Meeting
    Qumulo CEO Bill Richter on the Benefits of Enterprise Partnerships
    How SpiceAI is Tackling the AI Tooling Gap with Luke Kim