The Future of Biology Is Generative: Inside Synthesize Bio’s RNA AI Model


In this episode of Founded & Funded, Madrona Investor Joe Horsman sits down with Jeff Leek and Rob Bradley, co-founders of Synthesize Bio, a foundation model company for biology that’s unlocking experiments researchers could never run in the lab.

Jeff, Chief Data Officer at the Fred Hutch Cancer Center, and Rob, the McIlwain Endowed Chair in Data Science, share:

  • Why a startup is the right fit for generative genomics
  • How generative genomics could reshape research, drug trials, and more
  • Why now is biology’s “ChatGPT moment”
  • What makes Synthesize a true foundation model for biology (not a point solution)

Whether you’re a founder, biotech innovator, or AI researcher, this is a must-listen conversation about the intersection of AI, biology, and the future of medicine.

Listen on Spotify, Apple, and Amazon | Watch on YouTube.


This transcript was automatically generated and edited for clarity.

Joe: Maybe we can start off from the beginning. What is the founding story of Synthesize? Take us back to the conversation that kicked off this company, and why we’re having this conversation today.

Jeff: Rob and I have known each other for a long time. We’ve been academic colleagues for probably 20 years and have followed each other’s science. I moved back to Seattle about three years ago, and as academic leaders, we were running into each other in the halls. This idea of building a foundation model for biology was something that was on both of our minds. We started talking about it, and there was just enough information and just enough of an idea out there that we felt like we might be able to take a crack at it.

And so we started thinking about it a little bit and then immediately emailed you and Chris here at Madrona and said, “We need to talk to you right away.” Because we sort of felt like this was the convergence of the moment in time when there was the right kind of data and the right kind of technologies to make this possible. And both Rob and I were really excited about taking a big swing and trying something cool.

Joe: Maybe to take a click up in elevation. Today, there’s this Cambrian explosion of AI and bio. And I’ll maybe put “AI” in quotes here. There are lots of people who are building things that are probably equally well done by linear regression. But as you point out, there have been some amazing advancements in things like protein design.

There’s a Nobel Prize; lots of companies are now using this to develop drugs, or to develop drugs in collaboration with the biopharma ecosystem. But you’re doing something different at Synthesize. I’m curious why you’re doing something different, but also how you’re thinking about where this fits into the broader ecosystem, everything from people in your labs getting their PhDs and master’s degrees, all the way through big pharma trying to find the next blockbuster drug?

Jeff: When we set out to do this project, we didn’t think about solving one particular biological context. There are certainly specific problems we could have tackled, and we both have in our academic careers, problems that are very highly contextual to a specific disease in a specific set of people. But our goal from the start was to build a model that, in a similar way to large language models, would enable many different applications. And so I think that’s one difference in why we did it.

One way in which we distinguish ourselves from a lot of different groups is that we’re trying to build this broad-based foundation model with lots of applications. And so whether the pharma company of interest is interested in cancer, or in neuro, or in cardiovascular disease, our model has capabilities in all of those areas. And so for us, it’s less about one specific target and more about building that foundation that lots of people can use to accelerate most, if not all, of science across drug development. And so we’re excited about putting it in people’s hands and seeing how they can try it out and use it in a lot of different contexts.

Rob: I think there are a lot of reasons from the computer science and machine learning literature to think that modeling the most diverse possible training data set will give the best results. It’s clear that with language models, this is the case. For a long time, people made really focused ML models, and they made progress. But it just turned out that the winning approach was taking the biggest corpus of text you could, shoving it into a model, and letting it model that.

Joe: Yeah. The bitter lesson.

Jeff: It’s pretty tough if you were working on a very specific application to then have these foundation models come in and just do better on all of the specific applications.

Rob: It’s incredible. Right? I think very few people would’ve predicted that if your goal, for example, is to help people with legal contracts, maybe you’ll do better by starting with modeling the entire internet.

Jeff: It was not intuitive that that was going to be the solution. And I think probably the same for us.

Rob: Exactly.

Jeff: If you care a lot about this gene in a particular brain of a particular human, it’s not clear that modeling the entire corpus of gene expression data ever collected is the right way to solve that problem.

Rob: Exactly. I mean, I’ve built various ML models throughout my career for my highly specific scientific problems. And some have been useful, some have not. But I would say none of them have made me say, “I can do something I couldn’t do before.”

Joe: Can you explain how this fits into how we think of biology, and maybe the central dogma of biology that we all learned in high school?

Jeff: To recall your high school biology — everyone has the same DNA sequence in all of the cells in their body, more or less. But then there are little chunks of DNA sequence called genes. Those are encoded in RNA, and we can measure that RNA quantitatively. And so your heart cell is different than your brain cell because you have different abundances of these genes’ expression, which ultimately get translated into proteins and perform functions in your cell.

And RNA is a molecule that’s much easier to measure than something like protein. So there’s a large abundance of measurements of this molecule. And so when we were thinking about this idea, we were really focused on where the capacity is right now to generate the kind of training data that would enable us to build a foundation model that really spanned a lot of different applications. And for us, that was RNA.

And it helps that both of us have a lot of experience scientifically; both of our careers are kind of built in that area. And so that was where we focused at the beginning. But the nice thing about an RNA molecule is it’s dynamic. It responds to environmental stimuli, it responds to drugs, it responds to what you eat during the day. And so you actually get this readout of biology from this molecule. And so if we can model it, if we can generate data that looks like realistic data from humans, we actually get a real window into the biology of what’s happening in those humans.

Joe: Totally. Like real-time biology?

Jeff: Yeah, exactly.

Joe: And so this is an RNA model that you’re building. Where does this fit into your everyday experiments and the biology you’re trying to unravel at the end of the day?

Rob: I think it’s helpful to always think about analogies with other AI models. At this point, we’re all pretty familiar with large language models, right? And these are generative AI models in the sense that you ask them to do something and then they generate something for you. So with a large language model, you give it a prompt, you ask it to do something, and then it creates a bunch of text. And that’s really useful, because that’s how lots of us communicate.

We write lots of text. So we wanted to do something like that for biology. What scientists, what biologists, do most of the time is generate data and analyze it. We do experiments. We might run a clinical trial, and then we get the data and analyze it. And so we wanted to build the analogy of an LLM, but for what biologists actually do every day, which is to do an experiment, generate data, and use that data to inform the next experiment.

Joe: And so in building this platform, how do you think about the problems that need to be solved? What can’t be done today that is going to be possible in this new world of Synthesize?

Rob: I think about my own personal scientific experience as a person working at the computer, at the bench, running a lab, et cetera, where I, and people like me, people like Jeff, we’re constantly faced with the need to make decisions when we don’t have enough data. And sometimes that’s because we just don’t have time to get the data. Sometimes it’s because it would cost so much money that it’s not possible, but a lot of the time it’s because the data is just not reachable.

There’s no way we can get it. Now imagine, for example, somebody’s developing a drug to treat neurodegeneration, and that drug acts on cells in the brain. There’s no way that you’re going to be able to look inside and see what’s happening in the cells in the brain of a patient who’s taking this drug. But we need this information to make a decision. And so scientists are constantly faced with this impossible task. We have to make decisions, like whether to proceed with drug development, when we just can’t get the data that we need. And so we wanted to build a model that would let us get these data.

Joe: I quite like it when all my brain matter stays in my brain. So-

Jeff: That’s the right place for it to stay. Yeah. Exactly.

Rob: Even if there are ethical challenges, you might not want to participate in this experiment.

Jeff: That doesn’t make it less important. It’s such a critical piece of understanding how a drug might function. And so a lot of these experiments that we want to do — an example from my background is the very first data I ever analyzed, which came from a study where they randomized patients to get either endotoxin or saline solution. And endotoxin is a horrible thing to get; it makes you really, really sick. And so people were randomized, and they had to kind of wait and see whether they were in the control group or the get-sick group.

But they could only do it on a very small number of people. And they were trying to study the genomics of blunt force trauma. And so this is something that’s pretty hard because you can’t put people in car crashes. So we could do it only at this very small scale where it was just a few people that got randomized to get this really bad intervention. But if you can do that experiment in a model instead of doing it in a human, we could do it for hundreds of people, thousands of people.

There are no constraints on the experiments you can do. And similarly, if you can sample people’s brains, if you can sample all the tissues in their body, you can get a much more comprehensive view of what’s happening in response to disease, in response to trauma, in response to treatment. And you can do it at a scale and a speed that’s really hard to pull off in a traditional laboratory experiment.

Joe: So I think, as you’re right to point out, this is solving impossible problems. These are inaccessible samples. These are experiments that are unethical to run. Was there a light bulb moment for you where you’re like, “Wait a minute, I think I can actually simulate these things”? It’s maybe not intuitive to me that this would actually work.

Rob: That’s an important question and something that we thought about a lot. We’re both scientists, I think scientists spend decades being trained to be

Joe: Skeptical by nature?

Rob: very skeptical. Very rigorous. So I mean, we really started out by thinking not about how do we make the best model possible, but if we had a model, how do we test whether it’s working? What are the ways we can assess whether it’s doing something useful? Is it producing data that’s meaningful, that would actually be useful to people like us? And we actually played this game (this was Jeff’s idea, and a great one): we started taking the model and simulating an experiment.

We would take the model, have it generate data for an experiment, and then put that data next to data from a parallel experiment in a lab. So data either from a scientist doing an experiment in cell culture, for example, in the petri dishes, or from our AI model doing the same thing, but of course in seconds or minutes instead of weeks or months. And then we would put those data together. And then Jeff would send me these data and say, “Can you tell which is from a lab and which is the AI data?” And I’ll say, for a long time, it would take me one or two seconds to say, “That’s the AI data.” But there came a time when I couldn’t tell.

Jeff: And the story within the company is that this is reinforcement learning with Rob feedback. He was one of the best people at picking out which data set was the AI data and which was the lab data. And once we could get it past Rob, we were like, “Okay, we’re kind of onto something here. We’re at a point where these data really look like what you would get from an experiment. They’re sort of indistinguishable.” And a really important point, which is actually Rob’s point, is that when you’re evaluating machine learning models, you usually do it with bulk measurements.

You’re measuring their overall accuracy, root mean squared error, things like that. But when you’re evaluating a biological foundation model, it’s about what one gene is doing in one environment, in one context. And so the way that you measure errors isn’t in this bulk context; it’s like looking at this particular receptor in this particular tissue under these conditions. And so we would look at areas that were very specific to Rob’s research area and have him look for the exact genes that should be turned on and turned off in the right context. And if you’re savvy about this, you’ll be able to detect it pretty quickly if the models aren’t really accurately describing the whole distribution of what’s going on.

Rob: And I think this is a really important point, and it’s both a challenge and an opportunity, right? The challenge is that in order to build an AI model, train it, do inference, all these kinds of things, you need these aggregate statistics that describe how well the model is recapitulating the kind of whole shape of your data. That’s how you train a model. That’s how you assess it. But at the same time, exactly like Jeff was saying, much of biology, maybe even almost all of biology is about highly specific things. Right?

I’m an RNA biologist, but what I really know about is a couple of genes, and I know about how those genes interact. They make proteins that interact with a couple of other proteins, and this is my area of expertise. And the same goes for most other biologists, and the same is true for drugs. Drugs ideally tend to have a few specific targets that they act on. The same is true for physicians treating patients; they specialize in specific areas.

And so we had to kind of merge these two goals, right? To have a representation of the whole shape of biology, of gene expression data, all these experiments that people have done while also making sure that we captured all the fine details. Because I can tell you that for me as a scientist, if somebody comes to me with a machine learning model and says, “This represents all the data really well. Look at my statistics” and then I look at the one gene that I’m an expert in and it doesn’t seem to understand what that gene is up to, this model is not useful to me.
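
To make Jeff and Rob’s evaluation point concrete, here is a minimal sketch in Python, using entirely made-up arrays and a hypothetical gene index, of how an aggregate metric can look healthy while the one gene an expert cares about is badly wrong:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: log-expression for 20,000 genes in one condition.
lab = rng.normal(5.0, 2.0, size=20_000)               # "ground truth" from the bench
generated = lab + rng.normal(0.0, 0.1, size=20_000)   # model output, mostly accurate

# Suppose the model badly misses the one receptor a domain expert knows well.
MY_RECEPTOR = 137                                     # hypothetical gene index
generated[MY_RECEPTOR] = 0.0

# Bulk metric: one bad gene among 20,000 barely moves the aggregate number.
rmse = np.sqrt(np.mean((generated - lab) ** 2))
print(f"overall RMSE: {rmse:.3f}")                    # still looks healthy

# Gene-level check: the error on the expert's receptor is glaring.
per_gene_error = np.abs(generated - lab)
print(f"error on my receptor: {per_gene_error[MY_RECEPTOR]:.3f}")
print(f"median per-gene error: {np.median(per_gene_error):.3f}")
```

In this toy example the overall RMSE barely changes, while the expert’s gene is off by several log units, which is exactly the failure mode the “Rob feedback” review was designed to catch.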

Joe: That’s exactly what I did the first time you shipped me over the model. I was like, “I’m going to plug in the experiments I know.”

Jeff: I remember that. You sent us back exactly your area of expertise.

Joe: Here’s the genes I want to see.

Jeff: Exactly, these are the genes I want to see.

Rob: That’s exactly right. I mean, maybe it’s like if you have a large language model and it produces text that looks pretty good, but there are four words that it always misspells. We, as users of language, are going to notice this and fixate on it.

Joe: So I want to come back to the data, but first I have to ask: why a company? Right? You both are professors; this is your bread and butter. You could build this, get some amazing papers out. Why do this inside of a company as opposed to just, “Hey, my lab now does Synthesize”?

Jeff: That goes back to the email I sent you and Chris right at the start: this was such a cool idea, and we wanted to get going on it immediately. We didn’t want to have to wait. And I mean, there are many amazing things about being an academic researcher, but being able to capitalize on a big-swing idea on a very short time scale is a hard thing to do, just the way the systems are set up. And so we wanted to move really fast and we wanted to go really big, and it felt like the best context to be able to do that was as a company. I feel like that was what drove a lot of our interest in moving this way. What do you think?

Rob: I totally agree. Velocity and scale. We sent you this email, we had some conversations, and then we were going. We were doing the things. And that’s exactly what this needs. And I think the second point is scale. We want to build something. Like Jeff mentioned, we’ve done a lot of things in our careers and it’s been really awesome, but we want to do something that’s going to affect a lot of scientists, maybe all scientists in biology. And to do that we need scale. We didn’t want to model just a couple of gene expression experiments. We wanted to model every one we could access.

Joe: Yeah. I think this rhymes with a lot of what we see on the tech side, where there is a moment right now. Do you think there’s an inflection point specific to biology? Kind of a two-parter here: has life sciences and biopharma had its ChatGPT moment, or is that still around the corner? Have we truly had the aha as a field?

Jeff: So to answer kind of both questions at once, I would say I don’t feel like we’ve really had our ChatGPT moment, in the sense that there haven’t been a lot of these models that have been deployed in a way that anybody could use them, access them, and build on top of them. Even people who are building something that would be akin to a foundation model have tended to do it inside a single company and not share it with other groups. And so I think there haven’t been many swings at these sorts of foundation models that anybody can use, except in one space: the protein space, where there has been some of that work, like the protein structure and protein design space.

So some of our colleagues work in that area, and they’ve built in a similar way on open datasets that they then put models on top of. And we’ve seen the explosion of interest as people have made those available. And so we feel the same thing is possible in all the downstream consequences of biology, past the drug target identification with proteins and things like that. And so we’re excited about really contributing to that. And I think that moment is coming, because there is the availability of these huge collections of data that have been supported by federal funding and lots of other organizations, and now there’s an opportunity to capitalize on doing the same kinds of things that were done with large language models. And that’s certainly been our approach to this problem.

Rob: I think there are even closer analogies to be made with large language models. The thing that I find so inspirational about the large language models we have now is not just that they can do things that I can do really well. Right? I mean, it’s cool that they can write text; this is very useful to me. But they can do things that I can’t do and never could do. Right? They can translate between any two languages instantly. They can program way faster than any human ever can.

They can do these things that are just beyond human capabilities right now and are probably never going to be within the scope of human capabilities as we understand them. And by analogy, what I find equally inspirational is this: it’s amazing that the protein structure problem has, in many ways, been at least partially solved. I think that’s incredible. But what I find truly inspirational is protein design, making novel proteins that didn’t exist before, that don’t occur in nature. And I think we can do the same thing in other areas of biology. That’s what we’re trying to do here for gene expression.

Joe: That comes back to the data, where there’s no internet to scrape for biology. I mean, there are lots of papers out there, but it’s messy. Where do you think the field needs to go on data? Why do you think there is sufficient data to build what you’re doing at Synthesize, and what is the data foundation you’ve been building for over a year at this point to get to a generative model for biology?

Jeff: I think this is where picking the right molecule is so important. Of the molecules you see in the central dogma of biology, gene expression is the one that’s been measured in the most conditions and studied in the most contexts. And so there was a real opportunity to capitalize on the fact that the field in general has measured the experience of humans in a variety of different contexts and measured their RNA. And so while there isn’t an internet to scrape, there is a huge collection of existing experiments. The big challenge, though, is that it’s other people’s data, in the sense that these are other experiments that have been done.

They aren’t normalized and synthesized to work together in one consistent context. And so it’s a huge amount of both intellectual work and engineering work to bring all these data sets together and set them up in such a way that you can actually train a model on top of them. And so we’ve been capitalizing on the ability of our team to bring together a large collection of data sets, and the expertise they have in synthesizing and normalizing the metadata, so that the descriptions of those experiments are common and unified across thousands and thousands of human experiments. That lets us build models that understand the contextual representation of gene expression across different conditions, across different tissues, across different treatments.

Rob: I think here we really have to give a shout-out to Jeff, who saw, maybe not the exact use of training generative AI models, but certainly the importance and potential of creating harmonized, standardized data sets a long time ago.

Jeff: So my lab has been doing that for a while, and I didn’t realize it was going to be a training set when we started. We were normalizing and synthesizing data largely for reproducibility, for helping scientists do their work. And that work was where we built our original prototype: on those data that I had developed in an academic context.

We were able to use those to build our first prototype of the model that we showed to you when we got together. Ultimately, our team has gone wildly beyond where we were when we started with that data set, especially on the side of normalizing and harmonizing the descriptions of the experiments. But yeah, that was part of the reason why we had the aha moment: we had already been thinking about these big collections of data that we were using in an academic context.

Rob: One of the things that’s been interesting about building this big proprietary data set, where we have paired gene expression experiments and this highly curated metadata we put together, is we can see really unexpected things. So one thing that we noticed is that a surprising fraction of experiments are closely related to ones that our model understands well from seeing them in the training data. This is kind of getting into the weeds, if you’ll bear with me, but we really were skeptical scientists, et cetera. We wanted to validate our model, and so we thought, “Okay, we want to predict future gene expression experiments, so let’s validate it doing that.”

So we picked a date, and that was our training data cutoff. All public data generated before that date we trained on, and everything that was generated subsequently by scientists and then deposited in public archives we called our validation set, or test set. So we never looked at those data, right? They were totally held-out data, and those were future experiments for the purposes of our model. And we can go into how well our model did on those data; it did very well and surprised all of us. Really kind of amazing.

But the point I was going to make was about metadata. One thing that’s really interesting is that because we created all the metadata, we could look at statistics aggregated across experiments. And one interesting thing to note is that approximately 95% of all experiments conducted after our training data cutoff date were either in biological contexts, like say primary tissues or cell lines, or involved specific perturbations, like small-molecule drugs, biologic compounds, or gene knockdowns and perturbations to genes with CRISPR, et cetera, that we’d seen before. So the key point is that 95% of all future experiments were in a domain very close to our training data, where we had extremely strong reasons to believe our model not just might perform well but should perform extremely well.
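
The temporal holdout Rob describes is simple to express in code. Here is a minimal sketch using pandas; the table, column names, and cutoff date are hypothetical stand-ins, not Synthesize’s actual pipeline:

```python
import pandas as pd

# Hypothetical harmonized metadata: one row per public experiment.
meta = pd.DataFrame({
    "experiment_id": ["E1", "E2", "E3", "E4"],
    "deposit_date": pd.to_datetime(
        ["2021-03-01", "2022-11-15", "2023-06-20", "2024-01-05"]),
    "context": ["cell line", "primary tissue", "cell line", "primary tissue"],
})

CUTOFF = pd.Timestamp("2023-01-01")  # illustrative training-data cutoff

# Everything deposited before the cutoff is training data; everything
# after is a true "future experiment" test set that is never peeked at.
train = meta[meta["deposit_date"] < CUTOFF]
test = meta[meta["deposit_date"] >= CUTOFF]

# The "95%"-style observation: what fraction of future experiments fall
# in contexts already represented in the training data?
seen = set(train["context"])
in_domain = test["context"].isin(seen).mean()
print(f"future experiments in previously seen contexts: {in_domain:.0%}")
```

Nothing here is model-specific; the discipline is in never letting post-cutoff rows influence training or tuning.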

Joe: So can you maybe give a specific example of how you, or someone in your labs, or someone in the biopharma ecosystem would use this? I think putting a concrete example together would be useful for people.

Jeff: I’ll say from my lab. So my background is in biostatistics, that’s where I got my PhD, and so I end up helping people design studies all the time, whether those are clinical trials or preclinical studies or just research studies. And in all of those, you have to figure out what sample size to collect, which population to look at, how to sample. There’s a wide variety of questions, and usually, you’re just making it up. You’re trying to figure out what it might be, and you’re gambling on a lot of assumptions. And so now, we don’t have to make those assumptions. We can just generate the data from all these different circumstances, and then we can pick the design that’s going to maximize our chance of success. So I think this is going to accelerate a lot of things where you have to design studies in advance, and we can now get a sneak preview of what the study’s going to look like before we ever do it, which we couldn’t do before. So it’s really exciting.
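
A minimal sketch of the design workflow Jeff is describing, in Python. Here the simulated expression values come from a placeholder normal distribution; in the workflow he describes, those draws would instead come from a generative model conditioned on each arm’s experimental description. All effect sizes and thresholds are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def estimated_power(n_per_arm, effect=0.5, sd=1.0, alpha=0.05, n_sims=2000):
    """Estimate power for a two-arm comparison of one gene's expression.

    The normal draws below are placeholders; the idea is to swap in
    samples generated by a model for each arm's condition.
    """
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, sd, size=n_per_arm)
        treated = rng.normal(effect, sd, size=n_per_arm)
        _, p_value = stats.ttest_ind(control, treated)
        if p_value < alpha:
            hits += 1
    return hits / n_sims

# Compare candidate designs and pick the smallest that clears a power target.
for n in (10, 25, 50, 100):
    print(f"n per arm = {n:3d}: estimated power = {estimated_power(n):.2f}")
```

The same loop extends to whole designs: vary the population, sampling scheme, or arm sizes, regenerate the data, and compare estimated success probabilities before committing real resources.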

Rob: I’m going to give an answer that requires really going into the weeds.

Joe: I love getting in the weeds.

Rob: So one of the things that we’ve done that is super technical but also super exciting is that we have developed a model that lets you not just specify an experiment and then generate the data that results, but also add in data from a lab or a clinical sample and then see what might happen if you modify it. So what that might actually look like, for example, is you could take an experimental description, like say a sample of a particular cancer treated with a drug of interest, and then simulate the gene expression result. That’s the capability we’ve been talking about. The new thing, this in-the-weeds technical thing I’m talking about, is that we can also give it information about what that sample might look like without the drug.

And this is super interesting because that information could, for example, come from a biopsy of a patient who has an active cancer. We’re trying to figure out which treatment course is best, so we can give that information to the model, and then it can give a patient-specific prediction about the effect of the drug. And that is what I see as totally transformative. As you know, there are lots of academics and lots of companies who are trying to do precision medicine, and these efforts are incredibly important, incredibly exciting. But I think our contribution is going to be that we can have this huge model that can model essentially anything, and we can also tailor the results to one person, to one sample.
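
To make the two modes Rob contrasts concrete, here is a toy sketch in Python. This is not Synthesize’s API: the function, its parameters, and the toy math are all invented for illustration (the real interface lives at Synthesize.bio):

```python
import numpy as np

def generate_expression(condition, baseline=None, n_genes=20_000, seed=0):
    """Toy stand-in for a generative expression model.

    condition: experiment description, e.g. tissue, drug, dose.
               (This toy ignores it; a real model would condition on it.)
    baseline:  optional measured profile, say from a patient biopsy.
               When given, the prediction is anchored to that specific
               sample instead of a population-average profile.
    """
    rng = np.random.default_rng(seed)
    generic = rng.normal(5.0, 2.0, size=n_genes)      # toy average profile
    drug_shift = rng.normal(0.0, 0.3, size=n_genes)   # toy perturbation effect
    if baseline is None:
        return generic + drug_shift     # mode 1: generic simulated experiment
    return baseline + drug_shift        # mode 2: conditioned on a real sample

condition = {"tissue": "lung tumor", "perturbation": "drug X", "dose_uM": 1.0}

# Mode 1: simulate the experiment from the description alone.
population_prediction = generate_expression(condition)

# Mode 2: condition on a patient's measured (untreated) biopsy profile.
biopsy = np.random.default_rng(1).normal(5.0, 2.0, size=20_000)  # stand-in data
patient_prediction = generate_expression(condition, baseline=biopsy)
```

The only difference between the two modes is the anchor: the same perturbation effect is applied to a generic profile or to one patient’s measurements, which is the essence of the reference conditioning idea.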

Jeff: I think that speaks generally to the approach that we’ve taken, which is we want to build these big foundation models that allow you to tackle many different applications. The two things we just talked about: if you went to scientists and asked, “Are these two things related to each other?”, they’re super, super far apart.

They’re totally different academic disciplines. You would be talking to totally different humans. But our sort of underlying foundation model enables both of these kinds of applications and many others. And so if you ask me what I’m most excited about, it’s actually the things neither of us has even thought about yet.

It’s the grad student who’s staying up late, trying to figure out how to solve their problem, who tries to apply this and can move forward a whole field that didn’t have answers before. And we don’t know what all of those applications are, but what’s so exciting about this to me is: sure, we can come up with our ideas, but I’m excited to see what other people come up with.

Joe: On that, I guess let’s fast-forward 10 years. What do you envision the new lab is going to look like with tools like the ones you’re building in the hands of every scientist, out there for anyone to use? Where do you see this being game-changing? How does it impact not only the day-to-day but, I’m going to call it, the year-to-year for the biopharma ecosystem?

Rob: Well, we’re really different scientists, so maybe we can each give our own answer. I would say for me, the future I’m excited about, that I want to help build, is one where there’s a seamless blend of what we’re calling generative genomics with wet lab experimentation and clinical trials. I think there needs to be a constant flow of ideas, data, everything between these different areas, so that a scientist in 10 years, or hopefully, since we’re trying to move quickly, two or three years, can do something like simulate a drug screen using our model in an hour on their computer.

Then use that to choose a cell line where they’re going to do an experiment in the wet lab. Get those results back, and then use those to inform, “Okay, we actually need data from this other system using the AI model,” et cetera. So there’s just this constant interplay between different sources of information and data.

Jeff: So my lab is largely computational, and I don’t have a wet lab like Rob does. So for me, it’s really empowering for all the students that work with me. We typically have to form collaborations with people like Rob to generate these data, and sometimes that’s the only way to do a kind of experiment, and sometimes there are creative ideas that we just can’t find the right collaborator for. And so this is empowering students and postdocs and research scientists to try experiments that they would maybe not have the right collaborator for, or maybe not have the right funding for, and be able to pursue that kind of wild idea that would be tough to pursue otherwise.

So I think I’m really excited about that enablement. The second thing that I think about a lot is how many bets we make. We have to bet people’s time and resources, and those bets are often made on relatively little information. You read the literature, you know what your friends are working on, and you’re sort of gambling that this next idea is going to be the right idea and that the experiment’s going to work out.

And as a person who collaborates with lots of different labs, I’ve seen firsthand how many of those bets don’t pay off and what the consequences are for science, both in speed and for the people actually working on those projects. It can have a huge impact on their careers and their lives. And so I’m really excited about increasing the win probability on every science bet that people have to take. They get to see a little bit in advance; they get to make better bets. Even if we increase that by 15, 20, 30 percent, that’s a lot of resources, that’s a lot of speed that we’re buying for the whole field.

Rob: Our vision is that what we’re building can be used throughout the research chain and the drug development value chain. The examples we just gave are of basic research, or maybe translational research, but we aspire for our models to be equally useful in a clinical setting. Earlier on I was talking about how scientists have to make these impossible decisions. Right? You have a certain amount of data, you’re not going to get any more, it’s not enough to know, but you’ve got to make a call. And I think a great example is with clinical trials. Phase one trials are designed to test drug safety. That is their purpose.

Nonetheless, if you can get any information on efficacy, that is going to be very useful. And so people find themselves in a situation where they’re using a trial that was not powered to make efficacy statements, and they’re kind of reading the tea leaves: are there efficacy signals? Is this going to inform the decision that we make about moving forward? Which is a very important decision. Because if you move forward with one trial, you probably won’t move forward with another one, right?

It’s not just about that one drug, it’s about all your shots on goal. You could imagine, and we’re actively working on this, taking our model and conditioning it (this is the reference conditioning, the in-the-weeds thing I was talking about earlier) on the results from your small, say N = 12 patients, phase one trial, and then inferring what the results would look like if you’d run a fully powered trial with hundreds of patients. Now, of course, this isn’t the same as running a trial that costs a hundred million dollars, but it’s a lot more information than you had before. And it’ll-

Jeff: And cheaper too.

Rob: It’s a lot cheaper. And it’ll help you make a better decision.

Jeff: And if you’re going to make a $100 million bet, you might want to have some information before you make that $100 million bet.

Rob: Right? I mean, that would just be so useful to get more information to say, “I have a little bit more confidence now in either dropping this program or doubling down.”

Joe: Rob, Jeff, I really appreciate the time today. I want to leave the last minute to you. You’re building something for scientists that they can pick up today. Where can people go to learn more about Synthesize, to get access to your models, and to start building the future of science, where a scientist can be empowered by a foundation model?

Jeff: First of all, thanks for having us. We really appreciate it, and thanks for being such great supporters in general. It’s been amazing to work with you and Chris and Matt and the rest of the team here at Madrona. People can go to Synthesize.bio and access our models directly through our web platform.

We also have API access, so they can work with the models in R and Python, which is where a lot of computational biologists live. So they can go access those today, and they can read our preprint about our GEM-1 model that’s online right now and see how we’ve carefully evaluated our results and made sure we’re being skeptical scientists.

Rob: Just to double-click on that, we really want as many people as possible to use our models. We’re making them available for free right now so that anybody, regardless of where they are or what they’re doing, can experiment with our models, see where they work well for them, and let us know if there are areas where they don’t. Because one of the cool things, one of the opportunities, about a model like ours is that it can be used for almost anything in biomedical research.

And so we’re still figuring out what the areas are where there’s going to be the most transformative impact. Just like with LLMs: I mean, five years ago, I wouldn’t have told you that LLMs would revolutionize programming. I don’t think anybody would’ve said that. And we’re trying to figure out what the areas are where we’re going to see the biggest increases in velocity and scientific power from using the model that we built.

Joe: That’s amazing. I’m super excited. I think we’re in the very early days of this, and I’m really excited to see where this goes. Like I said, not the ChatGPT moment yet, but I think this is going to be impactful for the future of drug development. So thank you both so much for coming on today, and I look forward to continuing to work with you.

Related Insights

    The Era of Generative Genomics with Synthesize Bio
    Building an AI-Driven Biotech: Why Product-Led Platforms Will Win
    Beyond the Buzzword: Why ‘TechBio’ Needs to Go and What’s Next for Biotech
