Cohere’s Ivan Zhang on Foundation Models, RAG, and Feedback Loops


Today, partner Jon Turow and Cohere CTO and Co-founder Ivan Zhang dive into the world of foundation models for the enterprise. Cohere was founded in 2019, a time when even the most passionate believers did not realize how soon the world would be struck by the capabilities of foundation models. The company announced a $270 million raise in June, which put it at an over $2 billion valuation. And Ivan’s co-founder Aidan Gomez was a co-author of the seminal paper, Attention Is All You Need.

Notwithstanding the big numbers, Cohere keeps itself scrappy and hungry, having still raised only a fraction of what others in the space have to date. And in this week’s episode, Ivan shares his decision not to finish school and instead gain practical experience working in startups and publishing research, basically to prove that he could without a degree. He dishes about applying that same renegade spirit to hiring, shares his thoughts on the importance of feedback loops with customers, as well as the differences between fine-tuning and teaching a model how to retrieve the right knowledge, and so much more.

The 2023 Intelligent Applications Summit is happening on October 10th and 11th. If you’re interested, request an invite here.

This transcript was automatically generated and edited for clarity.

Jon: Ivan Zhang, it’s so great to have you here today. Really excited to talk about you and your journey with Cohere.

Ivan: Thanks, Jon.

Jon: Now you folks got started in 2019, you and Aidan and Nick Frosst. Ivan, so before we go back in time, let’s start today in 2023. What is Cohere?

Ivan: Cohere is an AI company based in Toronto building foundational models. We’re at about 200 employees now. We have offices in Toronto, SF, and London. Our mission is to accelerate the next leap in productivity for knowledge workers. That manifests in two ways. So one, we provide a platform where you can access foundational models to build features like summarization, search, and chatbots. And another way is to actually use these models to make employees more productive.

Jon: Let’s rewind all the way back to 2019 or a little bit further. Tell me about your journey into deep learning, and then what was the path from that to Cohere?

Ivan: I like to describe my journey into even tech itself as being a bit of an underdog. A bit more background about how I even got into that position in the first place. So at the time, I knew that I was a builder, and that’s how I learned best. And I wasn’t much of a sit-in-a-classroom-and-absorb-a-lot-of-information kind of guy. I needed to tinker. I needed to get my hands on the technology to learn. So when the opportunity came up to drop out of school to work at my friend’s startup, I obviously did it as a backend and infrastructure engineer. I wanted to expand my skills and learn a bit more. And that’s when I met Aidan Gomez, one of my co-founders now, who was interested in starting an indie research group. We wanted to be independent and do research basically for fun and, in a way, prove that we could do it. And yeah, I got pretty inspired by that, and I thought it would be pretty badass to publish papers as a dropout.

And so we just started working together. And after a few years, we felt like we were ready to start a company. We learned our working styles, and we both got more experience just working and learned how things were done properly. Myself, I was exposed to more and more founders just from the space and seeing how the sausage is made. I felt comfortable with the idea of starting a company.

And so in 2019, I pitched Aidan like, “Hey, why don’t we start something new together?” We tried a few ideas at the time, and they were all AI companies, and it was very difficult. Our first idea was a platform where you upload your neural net, and we’ll compress it and make it more efficient. But I mean, the thing is, nobody was using deep learning in 2019, so there wasn’t much of a market there. But what he had seen within Google was that he had co-invented this thing called transformers, and internally it just proliferated. Every single product team was adopting this architecture for solving language problems, and the improvement gains they were seeing were crazy. Absolutely unbelievable.

Just this one tiny change outpaced 20 years of heuristic engineering. And so we thought it was really cool, and we saw the potential of this technology like, hey, we can actually use this as a way to help computers interact with humans. So that was one thing. So we saw that, okay, computers being able to understand language was basically solved by BERT-style models at the time. And then we also saw, oh, GPT-2 came out, and that was sort of a hint that scaling these things up was important, increasing the capacity, and also like, wow, these things are very efficient, so we can actually feed a ton of data into it.

And with the architectural change of making it decoder-only, now these things can write. And so the two key pieces were there to actually build a system where it can read and write language. And we thought that was quite exciting, and we decided to quit our jobs and bring Nick along as well to build this company. And at the time, we had no idea what the product was going to be. We were just so excited about the idea of making computers understand language and talk to us. And that’s how I got into deep learning. And being an outsider doing deep learning almost gave me so much energy to invest all my after-work time every day, working till 3:00 AM just managing experiments and making the code base a little bit better. It was quite an exciting project.

Jon: Were a lot of people doing research outside universities and corporate labs at the time?

Ivan: So at the time, no. And I think a lot of listeners might know this group called EleutherAI who started a bit after us. We were For AI — and how we approached it was a bit more closed off. We would be open to applicants, but we would be very selective about who was actually doing research and who we were giving compute to. I think Eleuther had a different strategy where they were way more open about who was collaborating, and they were also doing open research. But at the time, basically nobody was doing research outside of labs. And one shout-out I want to give is that GCP’s Research Cloud program made this possible. They gave researchers like us free access to TPUs. In fact, we were one of the first users, first power users, of TPUs outside of Google. And that’s how we ran basically all our experiments.

Jon: How do you think the renegade spirit that you had in those early days with Aiden reflects in the way Cohere operates today?

Ivan: We hire differently. We look for people like us who are from very different backgrounds, interested in the field and want to make a big impact. We’ve brought For AI into Cohere as a way to keep the effort going and provide people with unconventional backgrounds a path into research. And what we learned with For AI is that, hey, you don’t need this perfect background of doing all this research at FAIR or DeepMind or Google to make impactful work happen. Some of the papers we published were with people doing research for the first time, and they brought some interesting ideas from whatever field they were coming from, and that made a big impact. So we took that philosophy in how we built the early team at Cohere. We didn’t look for the marquee-brand-name, 10-years-at-Google-Brain sort of talent. We found folks who are very clearly builders and are interested in the field. We gave them a chance, and a lot of those bets have paid off.

So I think even though Cohere is our first official startup, Aidan and I learned a lot from that For AI experience. We’re also way more risk-tolerant, I think. The culture is also very playful in that sense. So we like to do a lot of exploring on the technical front. We do word it that way: we’re playing with the technology to find breakthroughs, and we’re very practical. When you’re running a research lab with basically no money, you have to be very practical about what you can and can’t do. And so we’ve spent some time in our engineering to make sure our tools and platforms are working well for our researchers.

Jon: So when you started Cohere, was it clear you were going to provide foundation models as a service?

Ivan: No, at the time, we weren’t sure what to do with a thing that can talk back to you. It’s such a bizarre experience. I think anyone who tried GPT-2 at the time has shared the same experience. It’s cool, but we’re not sure what we could do with it. So, we tried to build a product as our first project at Cohere, which was an autocomplete that sat in every text box, and we thought we were just going to throw ads on it and start making money. But I think we were very naive at the time about how difficult the front end would be for something like that. Because even at the time, the text boxes were coded in all these weird ways. They weren’t just native HTML text boxes anymore. There were weird React components, and there were spell-check components.

So there’s all these edge cases. It just wasn’t our competitive advantage to focus on. So we decided to just rip out the front end and provide an API service. And we knew at some point, well, we’d find some use cases for the generative model. But we thought the embeddings and fine-tuning were also important features to have for developers trying to build something for production. At the time, 99% of NLP use cases required word embeddings, and we knew to take any model to production required fine-tuning some way to customize the product for your problem. So after a few months, we pivoted the Chrome extension into this API platform with generations, embeddings, and fine-tuning.
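For readers less familiar with the embeddings use case Ivan mentions: semantic search ranks documents by the similarity of their embedding vectors to the query’s vector, typically cosine similarity. This is only a minimal sketch — the tiny three-dimensional vectors below are hypothetical stand-ins for what a real embedding endpoint would return, so the ranking logic can be shown self-contained:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-dimensional "embeddings"; a real embedding model returns
# hundreds of dimensions, but the ranking step is identical.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "api quickstart": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I get my money back?"

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # the refund document ranks first
```

In production the vectors would come from an embedding model, but the sort-by-cosine step is unchanged.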

Jon: If I kind of zoom forward today, what we have as Cohere today is a collection of really high-quality foundation models as a service behind APIs and a particular traction with enterprises, a particular focus on that. Can you talk about what are the most urgent kinds of things that enterprises are doing with foundation models today?

Ivan: Yeah, so at Cohere, we apply an enterprise lens to all we do, whether that’s the product roadmap, resource allocation, or go-to-market. And so there are very particular product problems for the enterprise, and that’s not endpoint-specific. There’s fine-tuned control over how scalable you want your models to be, making the precise trade-off between throughput and cost, and endpoints that actually make it easy for you to adopt this use case. So stuff like better search, summarization, building assistants for your customer support workforce. Stuff like platform management, which is a non-starter for enterprises if you don’t have it. Being cloud-agnostic, with a comprehensive set of deployment options, whether that’s using the API, deploying into your VPC, being where the cloud AI products are, like SageMaker, or serving people on-prem. Beyond just the raw endpoints themselves, we think about the whole enterprise experience and the precise problems they have.

Jon: So, what are some of the misconceptions that CIOs have when they start engaging in these products?

Ivan: I think the biggest one that’s made the news cycles is hallucination. It’s totally reasonable to have an issue with these models basically making things up from your prompt. I think it’s actually a feature, not a bug. The model is doing its best to complete your request. The solution to hallucination isn’t to say, oh, well, we’ll just fine-tune the model with all our company documents. Well, no, no, no. That’s not what you do with your employees either. You don’t make your employees memorize all your internal processes and documents. You actually just give them a tool, your internal document store; they do a search against that tool, find the relevant information, and then they produce an email, an answer, a report, whatever it is. We have to think about models in a very similar way. We don’t need these things to memorize. Instead, we need to teach them how to use our tools to retrieve the right information. So I think retrieval augmented generation is a big deal, and that’s something we’re serving enterprises with.
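The retrieval-augmented pattern Ivan describes can be sketched in a few lines: search the internal document store first, then place what was found into the prompt instead of relying on the model’s memorized knowledge. `search_store` and `build_rag_prompt` below are hypothetical names, and the keyword retriever is a toy stand-in for a real search tool:

```python
def search_store(query, store):
    # Hypothetical retriever: score each document by keyword overlap
    # with the query. In practice this would be an embedding-based search.
    words = set(query.lower().split())
    scored = [(len(words & set(doc.lower().split())), doc) for doc in store]
    scored.sort(reverse=True)
    return [doc for score, doc in scored[:2] if score > 0]

def build_rag_prompt(question, store):
    # Retrieve first, then ground the generation in what was found.
    context = "\n".join(search_store(question, store))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

store = [
    "Expense reports are due on the 5th of each month.",
    "The office wifi password rotates quarterly.",
]
prompt = build_rag_prompt("When are expense reports due?", store)
```

The assembled prompt would then be sent to a generation endpoint, so the model answers from the retrieved context rather than from memorization.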

Jon: When we touch on retrieval augmented generation, RAG, and we talk about things like in-context learning, that’s often juxtaposed against things like fine-tuning, which you also mentioned. How do you guide your customers about when each of those would be more appropriate?

Ivan: I think the nice thing about this version of AI that we know with transformers and LLMs is that it’s actually quite intuitive: you can reason about how these models learn and what their limits are by applying our intuitions about how humans operate and learn. I would say you need to fine-tune if you need to teach the model a new capability. Like a new process, a new function. Whereas if you’re finding that, oh, it’s not accurate in its knowledge, if you just want to give it more knowledge, that’s a RAG problem. We need to teach the model how to retrieve the right knowledge. So basically, fine-tuning for new functions, and give it a database for everything else, like knowledge.

Jon: Let’s talk about the limits of feeding data into a model. In the context of a bunch of announcements that have been made recently from large data platforms, there’s kind of a debate in our industry over whether it’s appropriate to feed lots of raw data into an ever-expanding context window or whether we should instead be building up a semantic model inside an LLM and feeding that back to run a deterministic code in some traditional environment.

Ivan: I’d like to clarify that it’s not one or the other. I think, obviously, there is a balance based on the product problem that you’re trying to solve. As for the limits of these models today, I definitely don’t think these models are at the capacity of solving RAG perfectly — or even learning how to use tools. We have more scaling to do. From what we’re seeing internally, 52 billion parameters or 10 billion parameters, those models solve a specific set of problems, but customers are demanding more and more complex and sophisticated use cases, so we actually do need to scale up these models. So I think there’s more scaling work to do to raise the limits, yeah.

Jon: What do you see as the role of proprietary or enterprise home-rolled models in an ensemble together with hosted models from something like Cohere?

Ivan: I think there will always be model builders within enterprise because it’s such an interesting problem to work on. However, I think most users will just go with an out-of-the-box solution. It’s very hard to hire MLEs and keep them motivated. So if enterprises have the option to just pull an API or pull a SageMaker deployment to solve the problem that they actually want to solve, they’re going to do that. So I think it’s not unlike open source software where people pay for services on top, people pay for a managed offering because they don’t want to be experts in managing Postgres. They just want a database to store information. And I think we’re going to see a similar thing with foundational models. People don’t want to build up the internal expertise to pre-train a 200 billion parameter model. They just want to solve the language problem in their product.

Jon: When I think about this from an implementation standpoint, what are the most important kinds of data sources that enterprises want to prioritize?

Ivan: Yeah, so I think it’s less important about what the source is. I think the most important thing is the data feedback discipline. How well does their feedback loop work within their product? Are they able to get feedback from their users relatively quickly and easily and then feed that back to their ML team? I think that gives me more of a signal on whether that company would be successful with LLMs or not.

Jon: Does that have implications for who should be doing the work within the organization, whether it’s a data engineering team or a product service team? Who ends up doing the integration work within the enterprises?

Ivan: Typically, it’s a product owner who’s entrepreneurial enough to try this new technology, and they see the opportunity because they realize, hey, I have this feedback loop that gives me great RLHF data to further enhance my models. The great thing about these models is that they’re so flexible in whatever language format you give them that it’s quite easy to fine-tune these models.

Jon: What we see in a lot of waves of disruptive new technology is individual developers and teams excited to adopt the technology bottom-up without a lot of top-down controls. But senior executives are rightly concerned about the sensitivity of the data that’s flowing through these models. So what should enterprises know about the security of using foundation models in the enterprise?

Ivan: Yeah, I think with bottom-up adoption, it’s about education. I mean, they should know that they’re potentially sending a ton of sensitive information to a third party like Microsoft or OpenAI. But the solution isn’t to ban this technology. There’s so much productivity to be gained that we should instead find an alternative. And that’s Cohere’s bread and butter: we provide a private LLM that can deploy into your VPC or on-prem, so that you don’t have these risks of sending us any data. We don’t actually see any of our customers’ data if they’re deploying with those options. So if you can’t ban it, well yeah, you have to provide a better alternative. Right.

Jon: So I’ll do a couple of lightning round questions, Ivan, that we can use to wrap this up. So one, what has been the biggest surprise to you about the developments of the past six months in our field?

Ivan: I’m surprised that the iPhone moment came so early. My timeline for this tech was three to four more years out, but unexpected things happen.

Jon: What would you expect to see happen in the next six months? What are some of your predictions of developments in our field?

Ivan: I think most work in front of a computer will be fundamentally different. I think the barrier of leveraging computers to do your work will be way lower. You don’t have to learn a special language to make computers do what you want. You don’t have to learn a special graphical interface to navigate programs. I think in a couple of years, if you know how to describe what you want to do, the computer will do exactly that, and that’s beyond just generating emails, but also taking a series of actions and doing most of your work.

Jon: Well, Ivan Zhang of Cohere. This has been just so much fun. Thank you so much for spending time on the discussion.

Ivan: Yeah, thanks for having me, Jon. This was very, very fun.

Coral: Thank you for listening to this week’s episode of Founded and Funded. If you’re interested in learning more about Cohere, visit If you’re interested in attending our IA summit, visit to request an invite. Thank you again for listening, rate and review the show wherever you get your podcasts, and tune in in a couple of weeks for our next episode of Founded and Funded with Chroma Co-founder and CEO Jeff Huber.

Related Insights

    Data Visionary Bob Muglia on Data, AI, and New Book — ‘The Datapreneurs’
    Announcing the Second Annual Intelligent Applications Summit
    IA Summit 2023 Announcement
    Google’s James Phillips and Former Tableau CEO Mark Nelson on PLG and Scaling
