Founded and Funded – Deploying ML Models in the Cloud, on Phones, in Devices with Luis Ceze and Jason Knight of OctoML

Photo: Luis Ceze

OctoML on Octomizing/Optimizing ML Models and Helping Chips, Servers and Devices Run Them Faster; Madrona doubles down on the Series A funding

Today OctoML announced the close of their $15 million Series A round led by Amplify Partners. Madrona led the seed (with Amplify participating) and we are excited to continue to work with this team that is building technology based on the Apache TVM open source program. Apache TVM is an open-source deep learning compiler stack for CPUs, GPUs, and specialized accelerators that the founders built several years ago. OctoML aims to take the difficulty out of optimizing and deploying ML models. Matt McIlwain sat down with Luis Ceze and Jason Knight on the eve of their Series A, to talk about the challenges with machine learning and deep learning that OctoML helps manage. Listen below!

Transcript below:
Intro
Welcome to Founded and Funded. My name is Erika Shaffer, and I work at Madrona Venture Group, and we are doing something a little different here. We’re here with Matt McIlwain, Luis Ceze, and Jason Knight to talk about OctoML. I’m going to turn it over to Matt, who has been leading this investment. We are all super excited about OctoML and about hearing what you guys are doing.

Matt McIlwain
Well, thanks very much, Erika. We are indeed super excited about OctoML. And it’s been great to get to know Luis and Jason over many years, as well as the whole founding team at OctoML. And we’ll get to their story in just a second. The one reflection that I wanted to offer was that this whole era of what we think of as the intelligent applications era has been building in its momentum over the past several years. We think back to companies like Turi that we were involved with, and Algorithmia, and more recently Xnor, and now I think a lot of those pieces are coming together in the fullest of ways in what OctoML is doing. But rather than hear it from me, I think you’ll all enjoy hearing it more from the founders. So I want to start off with a question of going back, Luis, to the graduate school work that some of your PhD students were doing at the University of Washington. Tell us a little bit about the founding story of the technology and the Apache TVM open source project.

Luis
Yeah, absolutely. First of all, I would say that if you’re excited, we’re even more excited about this, and super excited about the work you’ve been doing with us. So the technology came to be because of an observation that Carlos Guestrin and I had a few years ago, actually four years ago now: quite a few machine learning models were becoming more popular and more useful, and people were starting to use them, but there was also a growing set of hardware targets one could map these models to. So you have a great set of models and a growing set of hardware targets, and that raised the question: “What’s going to happen when people start optimizing models for different hardware and trying to make the most out of their deployments?” That was the genesis of the TVM project. It essentially became what it is today: a fully automated flow that ingests models expressed in a variety of the popular machine learning frameworks and then automatically optimizes them for chosen deployment targets. We couldn’t be more grateful to the big open source community that grew around it, too. The project was open source from the beginning, and today it has over 200 contributors and is in active deployment in a variety of applications you probably use every day from Amazon, Microsoft, and Facebook.

Matt
I think that our listeners always enjoy hearing about founding stories of companies and your founding team, and principally some of the graduate students that you and Carlos had been working with. Maybe tell a little bit about that and then it’d be great to have Jason join in since he joined up with all of you right at the beginning.

Luis
Absolutely, and this is a great way of looping in Jason at the right moment, too. So as TVM started getting more and more traction, we held a conference at the end of 2018, and we had well over 200 people come, and we were like, “Oh, wow, there’s something interesting happening here.” It was one of those moments where all the stars align: the key PhD students behind the project, including Tianqi Chen, Thierry Moreau, and Jared Roesch, were all close to graduation and thinking about what’s next. And I was also thinking about what to do next. Jason was at Intel at that time and was a champion of TVM on the Intel side. He then said, “Oh, it turns out that I’m also looking for opportunities.” So he came and visited us, we started talking more seriously, and the thing evolved super quickly from there. And now you can hear from Jason himself.

Jason
Yeah, actually, my background is as a data scientist. Through a complicated backstory, I ended up at Intel through the acquisition of a silicon hardware startup. I was running a team of product managers looking at the software stack for deep learning and how a company like Intel was going to make inroads here and continue to impress and delight our huge customer base. I was helping to fund some of the TVM work as a result of that, and really seeing that, despite my best efforts at Intel, pushing the big ship a few degrees at a time towards these new compiler approaches to supporting this new type of workload and new hardware targets, it was clear that the traction was already taking place with the open source TVM project, and that was where the action was happening. So it was natural timing and a natural opportunity for something to happen here, in terms of not only Intel’s efforts but, more broadly, the entire ecosystem needing a solution like this, and the pain points I’d seen over and over again at Intel of end users wanting to do more with the hardware they had available and the hardware that was coming to them, and what needed to happen to make that realistic. So that was a natural genesis for me and Luis to talk about this and make something happen here.

Matt
That’s fantastic. And of course we had known Jason for a little while at Madrona, and we’re just delighted that all these pieces were coming together. Hey, Luis, can you say a little bit more? You had that first conference in December of 2018 and then a subsequent one in December of 2019. It seemed that not only the open source community was coming together, but folks from some of the big companies that might want to help somebody build and refine their models or deploy their models were coming together too, and that’s kind of a magical combination when you get all those people working together in the same place.

Luis
Yes, absolutely. As I said, the conference that made us realize something big was going on was in December 2018. A year later, we ran another conference, and by that time OctoML had already been formed. We formed the company in late July of 2019, and by December we already had a demo of our initial product, which Jason demoed at the conference. At the December 2019 conference, pretty much all of the major players in machine learning, those that use machine learning and those that develop it, were present. For example, we had several hardware vendors join us. Qualcomm has been fairly active in deploying hardware for accelerating machine learning on mobile devices, and they had Jeff Gehlhaar there on the record saying that TVM is key to accessing their new hardware, called Hexagon. ARM came and talked about their experience building a unified framework for machine learning support across CPUs, GPUs, and their upcoming accelerators. We had Xilinx and a few others, and Intel came and talked about their experience in this space. To add to that, what was interesting during that conference was having companies like Facebook and Microsoft talk about how TVM was really helpful in reducing their deployment pains, optimizing models enough that they can scale in the cloud and also run well enough on mobile devices. This was very heartwarming for us because it confirms our thesis that a lot of the pain in using machine learning in modern applications is shifting from creating the models to deploying them and making the best use of them. And that’s really our central effort right now: to make it super easy for anyone to get their models optimized and deployed by offering our TVM-in-the-cloud flow. Maybe Jason can add a little bit to that from the product side.

Jason
Yeah, it’s great seeing the amount of activity and innovation happening in the TVM space at the TVM conference. But it’s clear that there’s still a long, long way to go in terms of better supporting the long tail of developers who maybe don’t have the amount of experience that some of these TVM developers do in terms of taking their model, optimizing it, and running it on a difficult target like a phone or an embedded platform. So yeah, we’re happy to talk more about that. We actually just put up a blog post detailing some of the things we talked about at the TVM conference, and we’ll be giving out more details soon.

Matt
Yeah, what’s interesting, if I think about it from a business perspective, is that on the one hand you have all kinds of folks with different levels of skills and experience building models, refining their models, and optimizing their models so that they can be deployed. And then you’ve got this whole fragmented group of not just chip makers, as you’re referencing, but also the hardware devices that those chips go into, whether that’s a phone or a camera or other kinds of devices that can be anywhere in a consumer or commercial sense. What I like about the business is that you guys are helping connect the dots between those worlds in a simplified, end-to-end sort of way. It would be interesting to spend a little more time talking about the Octomizer, your first product, specifically, but more generally about what you’re trying to do in connecting those worlds.

Jason
Yeah, definitely. One way to look at this is that we’ve seen a lot of great work from TensorFlow from Google, PyTorch from Facebook, and others on the training side, for creating deep learning models and training them from data sets. But when you look at the next step in the lifecycle of a machine learning model, there’s a lot less hand-holding and tooling available to get those models deployed into production devices, when you care about things like model size, computational efficiency, portability across different hardware platforms, etc. This actually sits right at one of the difficulties of the underlying infrastructure and how it’s built, with its dependence on hardware kernel libraries. These are handwritten, hand-optimized kernel libraries built by each vendor, and they are somewhat holding the field back and making it more difficult for end users to get their models into production. TVM, and the Octomizer that we’re building on top of it, makes it easier to just take a model, run it through a system, look at the optimized benchmark numbers for that model across a variety of hardware targets, and then get that model back in a packaged format that’s ready to go for production use, whether you’re writing a Python application, or you need to bring out every bit of performance with a C shared library and a C API, or you want a Docker image with a gRPC wrapper for some easy serverless access. So that’s what we’re building with the Octomizer. It’s designed to be a kind of one pane of glass for your machine learning solutions across any hardware that you care about. And then we build on top of that with things like quantization, compression, and distillation as we move into the future.

Luis
A couple more points on that. Those are definitely important, and that is the very first step we’re taking. One interesting thing to realize about what we’re doing here is that TVM really offers the opportunity to provide a unified foundation for machine learning software on top of a variety of hardware. By unifying the foundation that one uses to deploy models, you also create the perfect wedge to add more machine learning ops into the flow. People are starting to realize more and more that regular software development has enjoyed incredible progress in DevOps, but machine learning doesn’t have that yet, right? So we see the Octomizer as a platform where we start with model optimization and packaging, but it’s the perfect point for us to build on, to offer things like instrumenting models to monitor how they’re doing during deployment, to help understand how models are doing, and essentially to provide a complete solution of automated services for machine learning ops.

Jason
One of those applications as well is training on the edge, in the sense that training is no more than a set of additional operations that are required. With a compiler-based approach, it’s quite easy to add this extra set of ops and deploy those to hardware. So getting things like training on the edge is a target for us in the future as we look forward here.
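To illustrate Jason’s point that training is just a set of additional operations, here is a minimal sketch in plain Python. It is not TVM code: the op-list format, the `run` helper, and the one-weight model are all hypothetical, chosen only to show that the same tiny executor can run inference ops and, with a few more ops appended, a gradient-descent training step.

```python
# Toy illustration only; the op format and helper names are hypothetical.
# Each op is (output name, function, input names). One small executor runs any op list.

def run(ops, env):
    """Execute ops in order, reading inputs from and writing outputs to a copy of env."""
    env = dict(env)
    for out, fn, ins in ops:
        env[out] = fn(*(env[name] for name in ins))
    return env

# Inference: a single multiply, y = w * x.
inference_ops = [
    ("y", lambda w, x: w * x, ("w", "x")),
]

# Training: the same forward op plus a loss, its gradients, and a weight update.
training_ops = inference_ops + [
    ("loss", lambda y, t: (y - t) ** 2, ("y", "t")),
    ("dy", lambda y, t: 2.0 * (y - t), ("y", "t")),          # d(loss)/dy
    ("dw", lambda dy, x: dy * x, ("dy", "x")),               # d(loss)/dw by the chain rule
    ("w", lambda w, dw, lr: w - lr * dw, ("w", "dw", "lr")),  # gradient-descent update
]

def train(w, x, t, lr, steps):
    """Run the training op list repeatedly, carrying the updated weight forward."""
    for _ in range(steps):
        w = run(training_ops, {"w": w, "x": x, "t": t, "lr": lr})["w"]
    return w

if __name__ == "__main__":
    learned_w = train(w=0.0, x=2.0, t=6.0, lr=0.1, steps=50)
    print(learned_w, run(inference_ops, {"w": learned_w, "x": 2.0})["y"])
```

In this sketch the executor never changes between inference and training; only the op list grows, which is the sense in which a compiler that already handles the forward ops can be extended to handle training.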

Matt
That’s great. Well, I want to come back a little bit to the prospect side, but I’m super curious. We’ve talked about the company name, OctoML. We’ve talked about the product name, Octomizer. How did this all come about? How did you guys come up with this name? It’s a lot of fun. I know the story, but for most of the folks here, what’s the story?

Luis
Okay, all right. I’m sure Jason and I can interleave on this, because there are multiple angles here. It turns out that both Jason and I, and other folks in our group, have an interest in biology. Nature has been an incredible source of inspiration in building better systems, and nature has evolved incredible creatures. When you look around and think about creatures like an octopus, you see how incredibly smart they are. They have distributed brains, right? They are incredibly adaptable, and they’re very, very smart, plus they’re very happy, light-hearted, and creative creatures, so this is something that resonated with everyone. So it really stems from the octopus, and a lot of what we do now has a nautical theme. We have the Octomizer, and you’re going to hear more in the future about things called Aquarium and Kraken and the Barnacles, which are all part of our daily communication, and that makes it super creative and light-hearted. All right, Jason, maybe I talked too much. It’s your turn now.

Jason
I guess one thing to point out is that we really applied a machine learning slant even to our name selection, with an objective function, or set of regularizers, that we applied to the name selection process itself: the name needs to be relatively short, easy to spell, easy to pronounce, and somewhat unique, but not too unique. And then it has all these other associations that Luis was mentioning, or similar associations. Those were definitely in the objective function as we were working through this process. It also rhymes with “optimal.” So, yeah, it took us a while to get there, but we were happy with the result.

Matt
I think you guys did a great job. I also like the visual notion that, even though they’ve got distributed brains, there is this central part of an octopus, and then it can touch anything. So it’s kind of this build-once, run-many-places image that flows through. But maybe I’m stretching it too much now.

Luis
No, that’s an excellent point. We do think about TVM and our technologies as really being a central place that can touch multiple targets in a very efficient, adaptable, and automatic way, right? It’s definitely within scope of how we’re thinking as well. So, great.

Jason
Also, eight bits in a byte, with eight being a core primitive computational power of two.

Matt
Very good. Coming back to the open source community, you guys have been involved in it, partly because of your academic backgrounds and partly in other ways. So how are things working within the Apache TVM community alongside OctoML? It’s a very important time in the life of both, and I’m curious to get your thoughts on that.

Jason
Yeah, we really see OctoML as doing and pushing a lot of the work that needs to be done in the open source community: eating our vegetables. We’re currently ramping up the team to put more of that vegetable-eating spirit into the TVM project, helping pitch in on documentation, packaging, and all those things that need to be done. But it’s difficult. Open source is known to attract people who scratch their own itch and solve their own problems, but these less sexy tasks often go undone for long periods of time. So we hope to be a driving force in doing a lot of that, and of course in working with the community more broadly to connect the dots and help coordinate larger structural decisions that need to be made for the project. All of this is being done under the Apache Foundation umbrella and governance process, so we’re working closely with the Apache folks and continuing to work smoothly under that umbrella.

Luis
Yeah, just to add a couple more thoughts here: we are contributing heavily to the Apache TVM project in multiple domains, as Jason said, and we think this is also very fortuitous for us, because one could go and use TVM directly to do what they want. But then, as they start using it, they realize that there are a lot of things that a commercial offering could do, for example make it much more automated, make it plug and play. The core idea of TVM from the start, which began as a research idea and is now part of what it is, is using machine learning for machine learning optimization, and that can be made much more effective with the right data, which we are helping to produce as well. So we couldn’t be happier with the synergy between the open source project, the open source community, and what we’re doing on our private side as well.
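As a rough illustration of the measure-and-search idea behind this kind of automated optimization, the sketch below tunes one parameter, the tile size of a blocked matrix multiply, by timing each candidate and keeping the fastest. It is plain Python, not the TVM API, and it deliberately swaps in exhaustive measurement where the speakers describe machine learning guiding the search; the function names and the candidate list are assumptions made for the example.

```python
import time

def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists), visiting them in tile x tile blocks."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c

def tune(n, candidates, repeats=3):
    """Time every candidate tile size and return (fastest tile, all timings).

    Exhaustive measurement stands in here for a learned cost model that would
    predict which candidates are worth measuring.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_tiled(a, b, n, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings

if __name__ == "__main__":
    best_tile, timings = tune(n=48, candidates=[4, 8, 16, 48])
    print("fastest tile size on this machine:", best_tile)
```

Every tile size computes the same result; only the running time differs, and the winner depends on the machine it runs on. That hardware dependence is why the choice is worth automating, and why measurement data from many targets can make a learned search more effective.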

Jason
Also, one thing that’s been nice to see is that, in talking to users or soon-to-be users of the TVM project, they’ll say, “Oh, it’s great to see you guys supporting TVM. We were hesitant about jumping in because we didn’t want to jump in and then be lost without anyone to turn to for help. But knowing that someone like yourselves is there for support makes us feel better about putting those initial feet on the ground.” So that’s been really nice to see as well.

Matt
Now, that’s really interesting. We are recording this at a time when, in fact, we’re all in different places because we’re in the midst of the Covid-19 crisis. I’m curious on a couple of different levels: one is with the open source community, two is with the folks who are interested in becoming early customers, and third is with your team. How are all those things going for you all, working in this environment? Certainly there are companies like GitLab and others that have had lots of success both working as distributed teams and working with their customers in a distributed way. What are some of the early learnings for you all on that front?

Jason
Well, since TVM started as an open source project, a lot of us have that distributed, collaborative blood flowing through our veins to begin with. Working remotely in a distributed, asynchronous capacity is part and parcel of working with an open source community. So luckily, both the community and we as a company have been relatively untouched on that front.

Luis
Oh, absolutely. When we started the company, we were heavily based in Seattle, but, you know, Jason is based in San Diego, so we started out somewhat distributed, and then we started growing more distributed: we hired people in the Bay Area, and we have people in Oregon on the team. It’s working so well and it’s been so productive. We were very, very fortunate and lucky that we had already started somewhat distributed to begin with, and now it’s serving us really well. We had great investors who stuck with us and funded us right at the moment when we needed to continue growing. And in fact, we are hiring people in a distributed way. Just yesterday we had another person that we really wanted to hire sign and join our team. So we are fully operating in all capacities, including interviewing and hiring, in this distributed way, and I haven’t noticed any hit in productivity whatsoever. If anything, I think we’re probably even more productive and focused, right?

Jason
And on the customer side, I would say it’s been a mixed bag. There are those customers that have some wiggle in their direction or roadmaps here and there, but then there are also customers that have an orders-of-magnitude increase in their product demand because they’re serving voice over IP or something to that effect, which is really heavily in demand in this time of need. So it just depends, and luckily, there haven’t been any negative shifts there.

Matt
Yeah, you guys, I’ve really been blown away by your ability to attract some just incredible talent to the team in a short period of, I don’t know, seven or eight months of really being a company, and I get the sense that that momentum is just going to continue. So congratulations on that front. I’m curious on the customer front, to pick up on what you were saying, Jason: what are you finding in terms of customer readiness? I think back to even a few years ago, and it seemed like it was almost still too early. There was a lot of tire kicking around applied machine learning and deep learning, and people were happy to have meetings, but they were more curiosity meetings. It seems like there’s a lot more doing going on now, but I’d be interested in your perspectives on the state of play.

Jason
Yeah, but I would say it’s more than timing; it’s variance, in that we see a huge range. There are customers that have deep pain in this today, who needed to get the computational costs on their cloud bill down yesterday, because they’re spending tons of GPU hours on every customer inference request that comes in. And then you have really large organizations with hundreds of data scientists trying to support a very complex set of deployments across a half dozen or dozens of different model and hardware endpoints. So there’s a lot of pain from a lot of different angles, and it’s spread over the set of value propositions that we have: performance, ease of use, and portability across hardware platforms. It’s been really nice to see. We were just talking to a large telecom company the other night, and yeah, there are just huge amounts of demand. It’s really nice to have the open source ecosystem as well, because it’s a natural funnel for picking up on this activity. We see someone coming through, using the open source project and talking about it on the forums, and we go and have a conversation with them, and there’s naturally already a need there, because otherwise they wouldn’t be looking at the open source project.

Luis
Yeah. Just one more thing that I think is interesting to observe: there is indication that it’s early, but already big enough to have serious impact. For example, we hear companies wanting to move computation to the edge, not only to save on cloud costs but to be more privacy conscious. Right now, as you can imagine, with a lot of people working from home, all of a sudden we see a huge spike in computational demand in the cloud. And we have some reason to believe that a lot of that involves running machine learning models in the cloud, whose cost companies will have to reduce and whose performance they will have to improve, because otherwise there’s simply no way to scale as fast as they need to. So we’re seeing this spike in demand for cloud services as a source of opportunity for us as well.

Jason
Also, one thing I’m excited about is the embedded side of things. There’s pent-up demand there, but there hasn’t been much activity in terms of machine learning on the embedded side, because there haven’t been solutions out there that people can use to deploy machine learning models onto embedded processors. So being able to unlock that chicken-and-egg problem, to crack the egg, essentially, and have a chicken come out and start that cycle, and really unlock the embedded ML market, is a really exciting proposition to me as we get there through our cloud, mobile, and embedded efforts.

Matt
And I think that’s what we saw too, having been fortunate to provide the seed capital last summer into the early fall, and really to be alongside you from day one on this journey. I’m interested in two things. One is that, in retrospect, you all made this decision in the early part of this year that there was enough visibility and enough evidence that you were going to go ahead and raise a round. That’s looking like it was well timed now, but tell us a little bit about why you decided to do that. And the second question is, what are you going to do with this $15 million that you’ve just raised, and what’s the plan in terms of growing the business side of the TVM movement?

Luis
Yeah, absolutely. As I said, it was incredibly well timed, by luck and by good advice as well. At that time, what motivated us was that we had an opportunity to hire incredible people, and it was going quite fast. We had actually been more successful in hiring than we could have hoped for even in the best case. So we thought, why not? In this climate, when we have interesting and amazing people to hire, we should just go and hire them, and we need resources for that. That was the first reason to do this early. And now, as Jason said, we have started to engage with more customers and get our technology into their hands, and this immediately puts more pressure on us to hire more people to make sure that our customer engagements are successful. So we’re going to staff that up and make sure that we have the right resources to make them successful. Also, going to market and exploring more theses on how we build a business around the Octomizer requires effort. So what we’re going to use the funds for is to grow our machine learning systems technology team and also grow our platform team, because what we’re building here is essentially a cloud platform to automate all of these processes, and that requires a significant amount of engineering. We’ve been very, very engineering heavy so far, naturally, because we’re building the technology and we are very much technologists first. But now is the time to definitely beef up our business development side as well, and that’s where a good chunk of our resources is going to go.

Jason
Also, one thing to point out is where the TVM project sits in the stack. It has the capability to support pretty much any hardware platform for machine learning, so you’re talking about dozens of hardware vendors, silicon vendors, and then it can cater to basically any machine learning or deep learning workload on top, whether it’s in the cloud, mobile, or embedded. So you’re talking about a huge space of opportunity, right? And that’s just the beginning, in that there are extensions upstream to training and downstream to post-deployment, and there’s classical ML and science as well. Each one of these permutations is a huge effort in itself, so just trying to take even small chunks of this huge pie is a big engineering effort. That’s definitely where a lot of the money is going at this point.

Matt
Well, we’re really excited and honored to be continuing on this journey with both of you, and with not only the founding team but, of course, all the talented folks that you’ve hired. I think from a timing perspective, the fundraise was well timed. But from a market perspective, the role that you all are trying to play and the real problems that you’re trying to solve are exceptionally well timed. So we’re looking forward to seeing how that develops in the months and years ahead.

And we’re excited to be here. Thanks, Matt.

We couldn’t be more excited. Thank you. Thank you for everything.