Numbers Station Founders on Applying Foundation Models to Data Wrangling

Numbers Station Co-founders Chris Aberger and Ines Chami talk about applying the transformational power of foundation models to data prep and data wrangling.

This week, Madrona Managing Director Tim Porter talks to Numbers Station Co-founders Chris Aberger and Ines Chami. We announced our investment in Numbers Station’s $17.5M Series A in March and are very excited about the work they’re doing with foundation models, which is very different from what has been making headlines this year. It isn’t content or image generation – Numbers Station is bringing the transformational power of AI inside of those foundation models to the data-wrangling problems we’ve all felt! You can’t analyze data if the data is not prepared and transformed, which in the past has been a very manual process. With Numbers Station, the co-founders are hoping to reduce some of the bifurcation that exists between data engineers, data scientists, and data analysts, bridging the gaps in the analytics workflow! Chris and Ines talk about some of the challenges and solutions related to using foundation models in enterprise settings, the importance of having humans in the loop — and they share where the name Numbers Station came from. But, you’ll have to listen to learn that one!

This transcript was automatically generated and edited for clarity.

Tim: Well, it’s so great to sit down and be able to have a conversation here on Founded & Funded with Chris Aberger and Ines Chami from Numbers Station. How are you both doing today?

Chris: Doing great. Thanks for having us.

Tim: Why don’t we just start off and tell the audience what Numbers Station is? What exactly is it that you’re doing and building?

Chris: So, Numbers Station at a high level is a company that’s focused on automating analytics on the modern data stack. And the really high-value proposition that we’re providing to customers and enterprises is the ability to accelerate the time to insight for data-driven organizations. We are all built around and started around this new technology of foundation models. I know it’s kind of the hot thing now, but when we refer to foundation models, we’re referring to technology like GPT-3, GPT-4, and ChatGPT, and bringing the transformational power of AI inside of those foundation models to the modern data stack and analytics workflows, in particular, is what we’re doing here at Numbers Station.

Tim: We at Madrona were super excited to lead the financing in your last round, which we announced not too long ago. And those who’ve been listening to our podcast know that we’re all in on foundation models and GenAI, and we think Numbers Station is one of the most exciting teams and approaches that we’ve come across. So, we’re excited to dig in with both of you here. Maybe tell us a little bit about your backgrounds. How did you meet? How did you come up with the idea for this business?

Chris: Yeah, so I’ll let Ines jump in here in a minute because she’s the brains behind a lot of the technology we have at the company. We all met at the Stanford AI Lab, where we were all doing our Ph.D.s on a mix of AI and data systems. That’s where I met Ines, as well as Sen Wu, who’s another co-founder, and our fourth and final co-founder is Chris Re, who was our adviser in the Stanford lab. We came together a couple of years ago now and started playing with these foundation models, and we made a somewhat depressing observation after hacking around with these models for a matter of weeks. We quickly saw that a lot of the work we did in our Ph.D.s was easily replaced in a matter of weeks by using foundation models. So, somewhat depressing from the standpoint of why we spent half a decade of our lives publishing these legacy ML systems on AI and data. But also really exciting, because we saw this new technology trend of foundation models coming, and we were excited about taking it and applying it to various problems in analytics organizations. Ines, do you want to give a quick intro on your side and a lot of the work you did in your Ph.D.?

Ines: Yeah, absolutely. And thanks for having us, Tim. So, my background, as Chris mentioned, is in AI. I did my Ph.D. at Stanford with Chris Re. My research was focused on applying AI and machine learning to data problems like creating knowledge graphs, for instance, and finding missing links in data using embedding-based approaches. So, these were the more traditional methods that we were using prior to foundation models. Toward the end of my Ph.D., I started applying techniques like foundation models and LLMs to these problems, and we realized, as Chris mentioned, that it made our lives much easier. That’s where we got really excited and started Numbers Station.

Chris: Ines is being modest, so I’ll just throw in a quick plug on some of the work that she did. She was actually one of the first people to show that you could apply these foundation models, like GPT, to various data wrangling and data preparation problems and replace a lot of the legacy systems, some of which we built, as I alluded to earlier. She authored the seminal paper that came out and proved a lot of these things were possible, along with some other team members who are here at Numbers Station, but she has really been at the forefront of a lot of what you can do with these foundation models.

Tim: That’s awesome, and it’s a bit of a feeling of getting the gang back together again in how Madrona got involved and how I met both of you. Chris Re had a previous company called Lattice Data that we were fortunate to invest in, which is where I originally met Chris. The Factory was the original investor and sort of incubator for the company, and Andy Jacks had been the CEO of Lattice Data, which ended up being bought by Apple. And then there’s Diego Oppenheimer, who introduced us all; he’s another board member, part of The Factory, and former CEO of Algorithmia, which was another investment. So, you know, many times we invest in brand-new founders that we had never met before and had no connections with. In this case, there was some nice surround sound, and to build on your point, Diego first sent me a demo video and was like, “Hey, you’ve got to check this out.” And I thought what you were doing was pretty magical. Then I read your data wrangling paper, Ines, and some of the other papers you wrote, and I was just struck by how you’re a team that brings together cutting-edge fundamental research with a business problem that we’ve seen to be red hot and has been a glaring pain point for many years, along with bringing to bear a differentiated, defensible technology in this space, which we’ll talk about. So, a little bit of the background from our end as well. But it’s so fun to be working together with both of you and the rest of the incredible team that you’ve begun to build.

So, Chris, you mentioned data analytics upfront, so maybe say more about that. Why did you choose data analytics? You came from the Stanford AI Lab, literally the crucible of the research around foundation models, which I think coined the term. Why did you pick this problem? And then tell us a little bit more specifically about the persona that you’re going after initially with Numbers Station.

Chris: Yeah, so when we were looking at where we wanted to take this technology and apply it, there were a couple of different observations that we made about why we decided to go into the data analytics space. The first is something near and dear to our hearts. You can look at all of our backgrounds, Chris Re and myself in particular, and we all have a mix of databases plus cutting-edge AI and ML. Data analytics is this nice sweet spot that’s near and dear to our hearts that we’ve all been working in for the better part of our careers. The second observation we made was that when we looked at the data analytics space and a lot of the tools that are out there, we still saw people who were in a ton of pain. We looked at what practitioners were doing today, and there were still so many hair-on-fire problems in terms of getting their data into the right format so that they can get usable insights out of it. And so this really excited us: a lot of tools have entered this space, but there’s still a lot of pain from the customer’s perspective in terms of their day-to-day jobs.

We’re really excited about taking this transformational technology and applying it to those kinds of hair-on-fire problems that we saw with different customers. And the third point — this one’s changed a little bit since the space has become so hot in, let’s say, the last three or four months. When we were starting the company, we looked at where most of the AI talent was flocking. Like, where are the Ineses of the world going? And a lot of them were going to image generation or content generation or image detection, things of that nature. So, for lack of a better word, kind of sexier applications, not “how do I normalize the data inside of your database?”

So we saw this talent mismatch, too, in that we could bring some of our expertise on the ML side and really apply it to an area that has been underserved, in our opinion, by ML experts and the community. We’re really excited about bridging that talent gap as well. And those are all the reasons we decided to go after the data analytics space as a whole.

Tim: This down-and-dirty enterprise problem has been, as you said, hair on fire for many years, with lots of dollars spent on solving some of these issues. You hear repeatedly that so much of the time and effort of teams goes into the end-to-end challenge of data analytics. Maybe we can break it down a little bit. There are front-end issues around how you prep the data, wrangle the data, and analyze the data. You mentioned the seminal paper around using FMs for data wrangling. There’s how do you ask questions about the data? How do you put it into production? Talk a little bit about how you apply Numbers Station’s product and technology across that pipeline.

Ines: Yeah, so that’s a great question. At Numbers Station, we started with data preparation, or data wrangling as we like to call it, because we think it’s basically step zero of any data analytics workflow. You can’t analyze data if the data is not prepared and transformed and in a shape where you can visualize it. So that’s really where we’re spending our time today, and it’s the first type of workflow we want to automate with foundation models. But ultimately our vision is much bigger than that, and we want to go up the stack and automate more and more of the analytics workflow. The next step would be automating the generation of reports: asking questions in natural language and answering them, assuming the data has already been prepared and transformed. That’s something foundation models can do by generating SQL, or other types of code like Python. Even further up the stack, we can start generating visualizations as well as automating some of the downstream actions. Let’s say I generate a report and figure out there’s an issue or an anomaly in my sales data; can we generate an alert and automate some of the downstream actions that come with it? The vision is really big, and there are a lot of places where we can apply this technology. For Numbers Station today, it’s really that first problem, data preparation, which is probably one of the hardest problems. If we can nail this, there are a lot of downstream use cases that can be unlocked once the data is clean and prepared.

Chris: And just to riff off what Ines said, we looked at a lot of the tools that were out there in the market, and some of them, from our perspective, kind of skipped steps and went straight to the end state and the bigger vision that Ines just alluded to. We noticed that over time, a lot of those tools had to add in data preparation or data cleaning techniques in order to make their tools work. So, the way we view this is building the bricks of the house first, working on data transformations in particular and the things that build data preparation pipelines, and then building on top of that to enable our more ambitious vision over time.

Tim: Yeah, I have to say that the data prep challenges are what initially got me really excited as well. The broader vision over time is going to really come to bear. We just see people wasting so much time on this fuzzy front end of getting data ready to actually do the analytics or the machine learning on it. It’s been a forever problem. There have been other products that have tried to address this but just don’t fully answer it. And, you know, seeing your early prototypes, our thought was that foundation models provide a zero to one here, where previous products fell short. Maybe say a little bit more, Chris or Ines — what’s different now with foundation models that allows you to solve some of these front-end data prep and wrangling problems in really magical ways?

Ines: Yeah, there’s an interesting shift in terms of the technology, and something that is enabled by foundation models is who can do this transformation and who can do this wrangling. We’ve seen a lot of tools in the self-service data preparation world, like Tableau Prep or Alteryx, to automate some of these workflows. But it’s all drag-and-drop UIs and click-based approaches. So, in terms of capabilities, it’s still pretty constrained and limited by whatever is presented in the user interface and whatever rule is encoded in the backend. With foundation models, we’re basically empowering users who may not know how to write SQL or Python, or may not know anything about machine learning, to do these things the same way an engineer would. That’s where it’s really interesting, and it’s the turning point, we think, in terms of the technology, where we can enable more and more users to do this work. And that’s why we’re pretty excited for Numbers Station, in particular, to enable more users to do data wrangling.

Tim: You know, with some of the things you’re talking about, writing Python, writing SQL, historically there’s been a bit of a divide, or maybe a lot of a divide, between the data analyst, who sort of works at her workbench, maybe using a tool like Tableau or Looker or others to take data from different sources and create dashboards and outputs that she shares with her team, et cetera, and the data engineers who are building ETL flows and dbt scripts — do you think of Numbers Station more as a workbench product for the data analyst or more as a production workflow product for the data engineer?

Chris: I would say it’s even more bifurcated than you just mentioned, because you left out one camp, which is data scientists, right? You’ve got that whole other team sitting over there that does a lot of ML tasks, often on the output of the two teams that you just mentioned. So, I think the world is pretty bifurcated right now. One of the exciting things about this technology is that it can cut down this bifurcation. There doesn’t need to be such a hard divide between all the teams that you mentioned. I think each of them still serves a purpose, and it’ll take a little bit of time to fully meld them and have the intelligent AI system that can bridge the gap between all of them. But at Numbers Station, what we can do is bring some of that data science capability over to the data engineering teams and data analysts, and bring some of that data engineering capability up to the data analyst. Our high-level goal is to enable powerful systems such that it’s not just prototyping at the highest layer; it’s prototyping pipelines that can then be easily deployed into production, so that you have fewer of these handoffs between teams.

Tim: So, not too long ago, you opened your waitlist and are bringing customers onto the product, and people can go to numbersstation.ai and check it out. What have you seen from customers? Where have you seen early traction? Are there certain use cases? I mean, gosh, data analytics literally touches, you know, every business in the world — where do you see early opportunity and results?

Chris: I think this is true with all products: when you build it, you have your preconceived notions, of course, going in, of where you think people will use the tool. Some of those have turned out to be true, but some of the really exciting things have come from customers hopping on the platform and using it in ways that we never even imagined and didn’t have in mind when we built it. A lot of the early things we see with customers coming onto the platform involve what’s called schema mapping: onboarding customer data such that you have a consistent view and a consistent schema of that data that can easily be used downstream. There are also a lot of problems that look like entity resolution. We call this record matching in our system, but it’s effectively fuzzy joins, where I don’t have a primary and foreign key yet but still want to get a unified view of my data inside of my system. And then it opens up even further from there in terms of different SQL transformations and AI transformations, which are classes of transformations we have inside of our system that customers have used for a variety of things related to their businesses. But to answer your question, really, those first two points, a lot of onboarding problems and a lot of matching problems in particular, are where people are finding a lot of value from our system right now.

Ines: Yeah. And just to add on, a lot of the use cases are for organizations that onboard data from customers that work with different systems. We’ve seen, for instance, Salesforce data being extremely messy, with open text fields and sales assistants writing reasons and comments about their pipelines. In insurance as well, claim agents are inputting some entries. Whenever there are multiple systems like this that don’t talk to each other (in marketing, for instance, HubSpot, etc.), it becomes really, really challenging for these organizations to put the data into a normalized, standardized schema to deliver their services. And that’s where using foundation models to automate some of that onboarding process provides a lot of value for these organizations.
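To make the schema mapping use case concrete, here is a minimal sketch of what prompting a foundation model to map messy source columns onto a clean target schema could look like. The column names, prompt, model choice, and JSON handling are illustrative assumptions, not Numbers Station’s actual implementation.

```python
# Hypothetical sketch of LLM-assisted schema mapping: ask a foundation model
# to map messy source columns onto a clean target schema.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

source_columns = ["cust_nm", "acct_open_dt", "sales_rep_note"]  # made-up names
target_schema = ["customer_name", "account_opened_date", "notes"]

prompt = (
    "Map each source column to the best-matching target column.\n"
    f"Source columns: {source_columns}\n"
    f"Target schema: {target_schema}\n"
    'Respond with JSON only, e.g. {"cust_nm": "customer_name"}.'
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
# A production system would validate the model's output before trusting it.
mapping = json.loads(response.choices[0].message.content)
print(mapping)  # e.g. {"cust_nm": "customer_name", ...}
```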

Tim: When you’re describing some of the customer scenarios, maybe paint a picture for people. What does it mean when I said this is magical? The end user types in what, and sees what happen? Maybe paint the picture for people at home of what the actual power of using something like this is on a day-to-day basis.

Ines: Yeah, absolutely. We can just take an example like entity resolution and look into the details. With entity resolution, essentially the idea is: given two tables that have records like customer names or product names, we want to find a join key. There’s no join key in these two tables, so we want to derive one based on textual descriptions of the entities or the different rows. The way this used to be done is by having data engineers or data scientists write a bunch of rules, either in Python or SQL, that say match these two things if they have the same name, and if there are one or two characters that differ, it’s still a match. It becomes really, really complex, and people start adding a bunch of hard-coded logic. With a foundation model, we don’t need that. The really amazing thing is that, out of the box, it can tell us what it thinks is a match or not. It’s not going to be perfect, but it removes that barrier to entry of having a technical expert write some code and some rules. Then, ultimately, the user can go and look at the predictions from the foundation model, analyze them, and say yes or no, the model was correct, to further improve it and make it better over time.

But really, the person doing the work now is the person who understands the data, and that’s really where the value comes from: they understand the definition of a match, and they understand the business logic behind it. That’s a big shift from how it used to be done before to how it can be done today with foundation models.
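To illustrate the shift Ines describes, here is a toy sketch contrasting a hand-tuned matching rule with the kind of plain-language yes/no question you could pose to a foundation model instead. The function names, similarity threshold, and example records are hypothetical, not Numbers Station’s implementation.

```python
# Toy contrast between rule-based record matching and framing the same
# question for a foundation model.
from difflib import SequenceMatcher

def rule_based_match(a: str, b: str) -> bool:
    """The old way: a hand-tuned string-similarity rule that grows brittle
    as edge cases pile up."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > 0.9

def llm_match_prompt(a: str, b: str) -> str:
    """The foundation-model way: a question the person who understands the
    data can read, and whose answers they can correct over time."""
    return (
        "Do these two records refer to the same entity? Answer Yes or No.\n"
        f"Record A: {a}\n"
        f"Record B: {b}"
    )

# The rule misses an obvious match because the strings differ too much:
print(rule_based_match("Acme Corp.", "ACME Corporation"))  # False
# The prompt below, sent to a capable model, would likely come back "Yes":
print(llm_match_prompt("Acme Corp.", "ACME Corporation"))
```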

Tim: This was the magic for me. As someone who could, maybe on a good day, write a little bit of SQL, being able to go in, upload a CSV, connect to my data warehouse, choose columns, type in what I want to happen, and watch in real time as the product takes care of it is the magical zero to one that we’ve been talking about.

So, okay. We’re throwing out words like magic and foundation models, so what are you actually using? Today, when you say foundation models, a lot of people think ChatGPT. Maybe talk a little bit, Ines, about what’s under the covers. Without giving away anything confidential here, what is the secret sauce?

Ines: So, for Numbers Station, we need our models to run at scale on very large data sets that can be millions or even billions of rows, and that’s just impossible to do with OpenAI models or other very large models. Part of our secret sauce is distilling these models into very, very small, tiny models that run at scale on the warehouse. At a high level, there are two steps in using a foundation model at Numbers Station. There’s a prototyping step where we want to try many things. We want that magic capability, and we want things out of the box. For that, we need very large models: models that have been pre-trained on large corpuses of data and that have these out-of-the-box capabilities. And that piece is swappable: it can be OpenAI, it can be Anthropic models, it can be anything that’s out there, essentially. We’re really leaning into open-source models like the Eleuther models as well, partly because of privacy and security issues. Some customers really want their own private models that can be fine-tuned and pre-trained on their data. So that’s the large-model prototyping piece. Then, for the deployment piece, which is running at scale on millions of records, we’re also using open-source foundation models, but they’re much, much smaller: hundreds of millions of parameters, to be more concrete, compared to the hundreds of billions, or tens of billions, in the prototyping phase.

Chris: Yeah. One thing to add on here is that our goal is not to reinvent the wheel, right? Our goal is not to train and compete with OpenAI and all these companies that are in this arms race to train the best foundation model. We want to pick up the best of what’s coming out and be able to swap that in per customer, and then have this fine-tuning and personalization to your data, where you have model weights that you can own for your organization. This is something we’ve always had in mind in architecting the system and the vision for the company. Our view was always that foundation models are going to continue to become more and more commoditized over time. This was more of a daring statement when we started the company, maybe two years ago. It’s less of a daring statement now. I don’t even know how many open-source foundation models were released in the past week. It seems like a safer statement at this point to say this is going to continue to be more and more commoditized, and it’s really all about that personalization piece: how do I get it to work well for the task at hand (in our case, data analytics tasks), and how do I get it to personalize for your data and the things that are important for your organization? Those are the high-level viewpoints that have always been important in how we architected this system.

Tim: You’ve both used some different terms that I think the audience, and I know even I, would appreciate some discussion around. You mentioned fine-tuning as an approach for personalizing. Ines, you mentioned distillation, or distilling. There’s another related concept around embeddings. Maybe just talk through the different ways that Numbers Station, or anyone in general, can personalize a foundation model and how some of those things are different?

Ines: Yeah, it’s a great question, and I would even start by talking about how these models are trained: by using very large amounts of unlabeled data, a bunch of text, for example. That’s essentially the pre-training phase, which makes these models really good at general-purpose tasks. Fine-tuning is used to take these large pre-trained models and adapt them to specific tasks. There are different ways to fine-tune a model, but essentially we’re tweaking the weights to adapt them to a specific task. We can fine-tune using labeled data; we can fine-tune using weakly supervised data, so data that can be generated by rules or a heuristic approach; and we can also fine-tune using the labels generated by a much larger and better model. That last one is essentially what we call model distillation: it’s when we take a big model to teach a smaller model how to perform a specific task. At Numbers Station, we use a combination of all of these concepts to build small foundation models specifically for enterprise data tasks that can be deployed not only at scale in the data warehouse but also privately and securely, to avoid some of the issues that can appear with the very, very large models.
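As a rough illustration of the distillation idea Ines describes (a large teacher model labeling data that then trains a small student), here is a minimal sketch. The teacher is stubbed out with toy logic; in a real pipeline it would be a large foundation model, and the resulting training set would be used to fine-tune a small model.

```python
# Minimal sketch of model distillation: a "teacher" labels unlabeled records,
# producing a training set for a much smaller "student" model.
from typing import Callable

def teacher_label(record: str) -> str:
    """Stand-in for a large foundation model's zero-shot prediction."""
    return "company" if "corp" in record.lower() else "person"  # toy logic

def distill(records: list[str], teacher: Callable[[str], str]) -> list[tuple[str, str]]:
    """Build a weakly supervised training set from teacher predictions."""
    return [(r, teacher(r)) for r in records]

training_set = distill(["Acme Corp.", "Jane Doe", "ACME Corporation"], teacher_label)
print(training_set)
# A small model (hundreds of millions of parameters) would then be fine-tuned
# on training_set and deployed at warehouse scale.
```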

The other aspect of the question was embeddings. Embeddings are a slightly different concept in the sense that they don’t involve changing the weights of the model. Embeddings are essentially vector representations of data. If I have text or images, I can use a foundation model to translate that representation of pixels or words into a numerical vector representation. The reason this is useful is that computers and systems can work much more effectively with this vector representation. At Numbers Station, for instance, we use embeddings for search and retrieval. If I have a problem like entity resolution and I want to narrow down the scope of potential matches, I can search my database using embeddings to essentially identify the right match for my data.
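Here is a small sketch of that search-and-retrieval idea: embed records on both sides, then only compare nearest neighbors instead of every possible pair. The model name and example records are illustrative, not what Numbers Station uses.

```python
# Sketch of embedding-based candidate retrieval for entity resolution:
# embed every record, then shortlist likely matches by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

left = ["Acme Corp.", "Globex LLC"]
right = ["ACME Corporation", "Initech", "Globex"]

left_vecs = model.encode(left, normalize_embeddings=True)
right_vecs = model.encode(right, normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
sims = left_vecs @ right_vecs.T
for i, name in enumerate(left):
    best = int(np.argmax(sims[i]))
    print(f"{name} -> candidate match: {right[best]} (score {sims[i][best]:.2f})")
```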

Tim: I think a lot of people have heard about fine-tuning and think about, you know, prompt engineering, trying different prompts or putting in some of your own data to get the generative answer that you want. You’re obviously at a different level of sophistication here. You mentioned the pre-training piece. So, for a customer today using Numbers Station out of the box, is there training that has to take place? Do they have to give you data? What’s the customer experience as you apply these technologies?

Ines: There’s no required pre-training. They can start using it out of the box, but as they use the platform, that log of interactions is something we can capture to make the model better and better over time. It’s not a hard requirement the minute they come onto the platform, so they can get the out-of-the-box feel without necessarily paying the cost of pre-training.

Chris: And that improvement is just per customer, right? We don’t take the feedback that we’re getting from one customer and use it to improve the model for another. It’s really personalized improvement, with the continual pre-training and fine-tuning that Ines alluded to.

Tim: Across these different technologies that you’re providing, what do you think provides the moat for your business? Maybe you could even extend it a little bit to other AI builders out there who maybe haven’t come from the Stanford AI Lab: how can they establish their moat? And for investors who might be listening, as they look at companies, how should they think about where the moat is?

Chris: I really think about this in kind of a twofold manner. One is where we started. We came from the Stanford AI Lab, our background is in research, and we still have that research nature in the company in terms of pushing the forefront of what’s possible with these foundation models and how you actually personalize them to customer use cases. A lot of that secret sauce and technical moat is in that fine-tuning, continual pre-training, and, eventually, a private FM per organization. And when I say FM, I mean a foundation model that can be hosted inside of an organization and personalized to its data. So, a lot of our technical moat is along that end.

There’s another whole host of issues, which I would call last-mile problems, in terms of using these models and actually solving enterprise-scale problems. There, it’s all about making sure that you plug and integrate into workflows as seamlessly as possible. For that, we’re laser-focused on these data analytics workflows and the modern data stack in particular, making sure we don’t lose sight of that and go after a broader, more ambitious vision to solve AGI. So it’s really twofold: it’s the ML techniques that we’ve pioneered and are using behind the scenes, where we’ll continue to push the boundaries of what’s possible, and it’s making it as seamless and easy to use as possible for customers where they are today on the modern data stack.

Tim: Any other thoughts on this, Ines?

Ines: No, I one hundred percent agree with Chris. There are technical moats around how we personalize the models and make them better, and there’s the UI and experience moat of embedding these models in existing workflows seamlessly and making people love working with them. Some people may say, “Oh, it’s just a wrapper around OpenAI.” But actually, it’s a completely new interaction, with the feedback, etc. Capturing that interaction seamlessly is a challenge and an interesting moat as well.

Chris: Just to double-click on that point. I think UI/UX is a huge portion of it and a huge part of that moat in the second bin I was talking about. But it goes even deeper than that, too. Just think about it: I have, let’s say, a hundred million records inside of my Snowflake database that I want to run through a model. If you go and try a hundred-billion-parameter-plus model, it’s just not practical to run that today; the cost and the time it takes are really impractical. So, when I say solving those last-mile problems, I also mean: how do we train and deploy a very small and economical model that can still get really high quality on what we like to call enterprise-scale data? This is really flipping the switch from “Oh, I can hack up a prototype really fast or play with something in ChatGPT” to “I can actually run a production workflow.” And it goes back to that earlier point about workbench versus workflow, Tim, and how these worlds can nicely meld together.

Tim: I’d say, as we talk to lots of different enterprise customers broadly about how they’re thinking about using foundation models or how they’re using them today, the first question that always comes up is one we’ve talked about: How do we train on our data? How do we maintain privacy, etc.? Maybe tied for first is a question we get a lot, and I’m curious how you’re handling it: hallucination, or the confidently wrong problem. This manifests itself in obvious ways if you’re using ChatGPT and ask a factual question, and it confidently gives you an incorrect answer.

Here you can imagine things like: I use Numbers Station to parse a million-row CSV and de-duplicate it. How do I know it worked? I can’t go through all million rows. How do you ensure there isn’t something confidently wrong in there that, you know, might cause problems down the road in the business?

Ines: Yeah, that’s a very good question and something we get a lot from our customers, because most of them are data analysts, and they’re used to everything being deterministic, either SQL or rules. They’re not used to having a probability in the output space, and they’re sometimes not okay with it. The way we approach this is twofold. First, we can propose to generate the rule for them. Let’s say, as you said, they’re parsing a CSV. I don’t need to use AI to do that, right? I can use a rule. So behind the scenes, the model is basically just generating the rule for the user, and they don’t have to go through the process of writing that SQL or Python for the parsing. Second, for use cases where it’s just impossible to do with a rule (that’s what we call an AI transform), like a classification task or something that’s just really impossible to do with SQL, we need to educate the users and make them trust the platform, as well as show them when we’re confident and when we’re not. Part of that is around the workflow: showing confidence scores, letting them inspect the predictions, monitoring the quality of the ML model, and tackling use cases where a 2% error rate is still okay. For instance, if I’m trying to build a dashboard and I want macro statistics about my data, it’s fine if I miss 2% of the predictions. So that’s the balance we’re playing with, essentially: either generating some code or using the model to generate the predictions, but really making the user comfortable with it.
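As a concrete picture of that confidence-score workflow, here is a hedged sketch: predictions above a threshold are accepted automatically, and the rest are routed to a human reviewer. The threshold and records are illustrative, not Numbers Station’s actual logic.

```python
# Sketch of confidence-based routing: auto-accept confident predictions,
# send the uncertain ones to a person for review.
predictions = [
    {"pair": "Acme Corp. / ACME Corporation", "label": "match", "confidence": 0.97},
    {"pair": "J. Smith / John Smyth", "label": "match", "confidence": 0.61},
]

# Tuned per use case; a macro-statistics dashboard may tolerate ~2% error.
CONFIDENCE_THRESHOLD = 0.9

auto_accepted = [p for p in predictions if p["confidence"] >= CONFIDENCE_THRESHOLD]
needs_review = [p for p in predictions if p["confidence"] < CONFIDENCE_THRESHOLD]

print(f"auto-accepted: {len(auto_accepted)}, sent to human review: {len(needs_review)}")
```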

Chris: And just to add on to that: these models aren’t perfect right now, right? As you said, Tim, anyone who’s played with these models knows there are some limitations and flaws in using them. A lot of the use cases we’ve seen to date are ones where humans were manually going through and labeling or annotating something. We’re not completely eliminating the human from the process; we’re speeding them up 10x, or even 100x in some cases, by having this AI system provide suggestions to them as they go through that process. So, it’s not completely human out of the loop yet. Of course, that’s the vision of where we think the world will eventually go, but right now it’s still very much human in the loop, just accelerating that journey for the analyst to get to the end result.

Tim: I’ll take a hundred-x speedup. In that vein, maybe we change gears a little bit here. You’ve built this small, high-performing team already at Numbers Station. I’m curious how you use foundation models on a day-to-day basis. I was recently talking to another founder who said he told his dev team: on our next one-month sprint, I want everyone to stop everything they’re doing for the first week and just go figure out how you can maximally use all the different tools, and then start working on your deliverables. We will finish our sprint faster and more effectively than if you just started working right away and worked for the next month. Anything you’ve seen in terms of how you use these tools internally and the productivity increases compared to years past?

Ines: I can speak for myself. Obviously, code generation is a big thing; everyone uses it now. One thing I found funny is wrangling our own data with Numbers Station. We have sales data coming in from our CRM with our pipeline, and we wanted to do some analysis and statistics on that, so we ended up using Numbers Station, which was a very fun way of dogfooding our own product. We analyze telemetry data as well: product usage and people on the platform. And obviously, for all the outreach and marketing, it’s quite useful to have a foundation model write the emails and the templates. So, I’m not going to lie, I’ve used that in the past. I don’t know, Chris, if you have more to add to this.

Chris: What I was doing right before this call was using a foundation model to help with some of my work. One ever-present problem in working with customers is that you talk to a customer and you want to create a personalized demo for them, but they can’t give you access to the data, right? Because it’s proprietary, and they’re not going to throw any data over the wall. So what I’ve started using foundation models a lot for is: okay, I understand their problem; now can I generate synthetic data that looks very close to their problem and then show them a demo in our platform, to really hit home the value proposition of what we’re providing here at Numbers Station?
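For readers curious what that looks like, here is a minimal sketch of prompting a foundation model for synthetic demo data. The domain, columns, prompt, and model name are all illustrative assumptions, not Chris’s actual workflow.

```python
# Sketch of generating synthetic, customer-like demo data with an LLM.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = (
    "Generate 5 rows of realistic but entirely fictional insurance-claim "
    "records as CSV with columns claim_id, claimant_name, claim_reason, "
    "amount_usd. Make the free-text claim_reason field messy and "
    "inconsistent, the way real operational data looks."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```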

Tim: We were talking about productivity gains from using foundation models broadly at our team meeting yesterday at Madrona, and one colleague who has run a lot of big software teams over the years said, “Hey, if we wanted to prototype something in the past, you’d put eight or 10 people on it, and it would take weeks, maybe months. Now it’s the type of thing one developer, one engineer, could potentially do in weeks or days” using code generation and some of the dev tools. And Numbers Station is bringing that type of superpower to the data analyst, the way some of these code-gen tools bring it to the developer.

I’ve alluded to the great team you’ve assembled in a short period of time. It’s a super exciting area, and there are a lot of talented people who want to come work in this space, but I think you’ve done an extra effective job of hiring quickly and hiring great culture fits. And Chris, we haven’t talked about it, but you spent four or five years at SambaNova before this, where you built a big team of machine learning software folks.

How have you been so effective at hiring? And what do you think the hiring market is like right now, in this interesting time?

Chris: Lots of practice and lots of failures, I would say, is how we’ve gotten here in terms of hiring. At SambaNova, it was an ultra-competitive bull market at the time, and hiring ML talent was really tough. I had a lot of early failures hiring engineers, eventually found my groove, and built a pretty decent-sized organization around me there. In terms of the market right now, with all these layoffs going on, there’s a lot of noise in the hiring process. But there are a lot of really high-quality candidates out there looking for a job. So it’s really about judging: hey, do you actually want to work at a startup, or do you want to work at a big company? Those are two very different things, and there’s nothing wrong with either, but getting to the root of that early on is usually a good thing to look at. And right now, there’s just a ton of high-quality talent, and it’s a little less competitive to get that talent than it was, say, three or four years ago, at the height of a bull market.

Tim: So many topics, so little time. I would love to dig deep into so many of the areas we’ve only been able to touch on today, but I’ll just end with this: What is a numbers station? How did you come up with the name?

Chris: Yeah, so they were towers that, I think, were used in one of the World Wars, or the Cold War, that could send encrypted messages to spies. It was really about securely broadcasting information. That’s one of the things we do here at Numbers Station, broadcasting information to various data organizations, and that’s how we decided on the name.

Tim: Chris, Ines, thank you so much. Really enjoyed the discussion today and look forward to working together here in years to come.

Chris: Awesome. Thank you so much, Tim.

Coral: Thank you for listening to Founded & Funded. If you’re interested in learning more about Numbers Station, visit NumbersStation.ai. If you’re interested in learning more about foundation models, check out our recent blog post at madrona.com/foundation-models. Thanks again for listening and tune in in a couple of weeks for our next episode of Founded & Funded with Airtable CEO Howie Liu.

Related Insights

    Panther Labs Founder Jack Naglieri on Cloud-Native SIEM and Self-Growth
    MotherDuck’s Jordan Tigani, DuckDB’s Hannes Mühleisen Commercializing Open-source Projects
    GitHub CEO Thomas Dohmke on Generative AI-powered Developer Experiences
