There’s a theorem being tested about how AI reaches general intelligence. Carina Hong’s answer: through mathematics.
Carina is the founder of Axiom, and in less than a year of building, her team’s AI has scored a perfect 120/120 on the Putnam mathematical competition — a test where more than 50% of brilliant undergraduates score zero. More concretely, Axiom Prover has reached 98.93% on a Lean software verification benchmark that leading alternatives solve at 11–12%.
In this conversation with Matt McIlwain, Carina explains her central thesis: that math and code are the two pillars of the digital world, and that any AI infrastructure missing a formal verification layer is structurally incomplete. She walks through the history of verified AI research at Google, DeepMind, OpenAI, and Meta, and explains why each effort stalled just as commercial pressure mounted. She describes what makes hardware and software verification the natural first commercial market, and what Axiom discovered when they tested their prover against circuits that industry-standard formal checkers could not verify.
For founders and operators trying to understand what’s actually changing in AI capability, and for anyone building in adjacent infrastructure spaces, this is a map of where the frontier is and where it’s heading.
Listen on Spotify, Apple, and Amazon | Watch on YouTube.
This transcript was automatically generated and edited for clarity.
Matt: It’s great to be with you, Carina. To me, what’s interesting, having been an investor from day one in Axiom, is there are these debates about what is artificial general intelligence and what is artificial super intelligence. Curious how you define, and whether you choose to define it as AGI or ASI, how you define that, and then we’ll go from there.
Carina: Thanks, Matt. It’s always great to chat with you. I feel like the word AGI actually currently, it has this emotion element in it. It’s the belief that reasoning capability can be so abundant, and that you can have copies of any genius you pick in the world. And it compounds from there. You can have these geniuses work with each other, collaborate, and then ideas fuse between different scientific fields.
And then that capability then trickles down to the day-to-day life, for ordinary Joe to enjoy the same amount of capability and speed of knowledge, discovery, generation, and also applications. So that’s why I think a lot of people when they say, “We are AGI-pilled,” that’s what they mean. They believe that future is going to come.
Then there are also other people who define AGI as more practically. When that capability really flows through everyone’s consumer’s day-to-day life and enterprise workflow, because I think we’re quite ahead in the capability, and there are so many people building applications, which is great, but it’s not like a time that someone, say, in Alabama, would use whatever that is the most state-of-the-art coding model, and say UN code verification tool to build a project that is in their passion. And we’re not there yet.
Matt: Yeah. I was just in Georgia, and I think people are using the state-of-the-art of AI without realizing they’re using the state-of-the-art of AI. And of course, as you know, and we’ll talk about, the state-of-the-art is moving rapidly.
Carina: That’s right.
Matt: So you’ve made this provocative statement that math is AGI. What does that mean? And what should we take about the statement?
Carina:Yeah, I think that is the DNA of Axiom. We’re very clear that, in the future, we’re going to see so many promising markets. The first one is best versus verification. In the future, we’re going to see more markets, but the DNA of the company remains math. And in our view, we think that from math and code, which, if you understand it in a sort of Curry-Howard way, are actually twins. You can then do a lot of things in the kind of use software and generation verification to control things that are in the real world, like physical engineering stuff. And then you can go to everything else. So from our worldview, math is AGI because that is the starting point of reasoning.
A lot of other people have different definitions of AGI. I would say that if you take the view that math is AGI, you have your action plan, you execute that a lot faster, because math is a lot cleaner. So, I was on this AI for science conference in San Francisco. It’s a really great panel, and everyone is doing experiments with the physical world. And obviously, the latency of iteration is quite long. In math, just like in coding, the digital world, your iteration time is a lot shorter.
And the other thing that’s really interesting is that because math is data-scarce compared to the other domains. So you hit some of the roadblocks a lot faster, which allows you to debug those roadblocks. So that’s, I think, one interesting difference between math and coding. With coding, we were able to make a lot of progress at the start locally, just because there’s a lot of coding data.
Matt: There’s a ton of data, yeah.
Carina: That’s right.
Matt: Repositories everywhere.
Carina: And that’s just massive amounts of data compared to, say, informal math data, or formal math data, which is even smaller.
Matt: Yeah, say more about this math scarcity, data scarcity issue, because I think that that’s one of the areas that you all are creatively trying to address.
Carina: That’s right. So, math is such a data-scarce domain. I think a lot of researchers at the beginning were kind of blindsided by the idea that, well, we just take whatever has more data in math, and take that research path, which is informal math. So what they do is they start scraping textbooks, and papers, and chain of thought reasoning data through expert data labeling companies. And they try to basically build a math model that way.
It worked for a bit. So, it worked for, say, high school level, where you really just have a lot of data. And it’s a lot easier to say find human experts around the world at the high school, international math Olympia level, than say at the postdoc, or junior researcher, algebraic geometry level. We thought a bit more first principally. We were like, “Well, we think we need structured data.” And even if that means there isn’t a lot of structured data right now, how do we invent to create structured data? Instead of going to the unstructured route, because today we have a lot more unstructured than structured. And the structured data looks a lot like code. I mean, it’s mostly written in the theorem-proving language called Lean, which has its own history, that’s separate from the deep learning development timeline.
And just wasn’t any Lean data. And people had to hand-code the undergraduate textbooks. They start with algebra, and then they did analysis. I remember this was 2020. I was at MIT. Fast-forward to, I think we started July last year. And we started under the belief that we can try to make progress on formal proving pretty fast. We were unclear what was the moment that formal math can catch up to informal math.
Matt: So take us a little bit, because there’s, I think, two interesting things here. One is this idea of formal and informal AI. Let’s unpack that.
Carina: It’s almost like a religion.
Matt: Yeah, go ahead.
Carina: People just debate about it. And I cannot convince someone who does informal math to look at the formal math. I’m like, “You’re missing out.” Really are. And it’s usually the people who are working on reinforcement learning for other domains, like coding. They know that if you do RLVR, it’s incredible, they know that math is most interesting when it’s proofs, and it’s most interesting if you assign reward for the proof instead of just a numerical answer. So, they tend to join this movement.
Matt: Some people ask the question, “Why now? Why is this all important today?” And I usually start with, well, in a world where we’re building these LLMs, which by definition are non-deterministic, they’re predictive systems, you need some kind of system that’s a prover, that’s a verifier. How do you think about that on the why now question?
Carina: So, first of all, we could not have been early. I think that’s right. Because we’re seeing the… I mean, informal is doing pretty well. It’s at a time when we needed to get started to play catch-up in the data-scarce domain.
I think around the time of 2024, there’s this amazing achievement of AlphaProof. And they use something called Monte Carlo Tree Search. That is quite an expensive system for a startup, expensive method for a startup to do. It was not clear to me that was the only way. Now, in terms of why the urge to make it happen, both from my perspective and a lot of others who join us, I think it was at a time when people had more faith in structured data. So, I think if we talk about the latter half of 2024, people thought about coding as just another enterprise application. Coding is no different from developing an AI financial analyst. It’s no different from developing an AI McKinsey consultant.
Now, obviously, we know now coding’s a lot different than that. I don’t want to talk about recursive self-improvement, but perhaps if you can agree on, at least, that is like autocatalytic. So, coding does have that advantage. We’re not far, I think, from building an AI that can do AI research. Now, the definition of that is very loose. You can define that as getting publications, or even spotlight papers, and ICLR, ICML, NeurIPS. You can also define that as being as good as one of those researchers out there at frontier labs.
Matt: Yes. And some of the great researchers in academia, I was talking with our mutual friend, Carlos Guestrin, and he and some of his teams have been working on some of this area of continuous learning with reinforcement learning.
Carina: Yeah. But I’m sort of fortunate and grateful that the RL coding community, they do strengthen humanity space in this other religion. And I think that’s one reason. So the code gen development. The other is the informal reasoner getting stronger, which actually also helps us. In the AI for math community, there’s this paper called “Draft, Sketch, and Prove.” The idea is after you produce an informal proof, and you try to produce a Lean sketch, and you try to fill in the stories that are the voids of the Lean sketch, it works better. Now, the first part is obviously beneficial if the informal models are doing better. So, September 2024 was 01.
Matt: And so in a sense, it’s iron sharpens iron. They’re making each other better.
Carina: That is correct. And I think that is something that Axiom can talk about a lot and a lot, right? And we’re not talking about the only formal Monte Carlo Tree Search rigid stiff approach. I do think that we are able to get the benefits of both systems. And I think the third thing is just like after September 2023, Lean4 becomes a lot more workable from an engineering perspective. It is still surprisingly not workable. That’s why we built AXLE, Axiom Lean Engine, which is this sort of infrastructure. There are a dozen tools, and we’re adding tools to it on a weekly basis. I believe there are people around the world, literally all across the world, using it. And people are using it interestingly with their favorite model. We see the choice of literally any model. It can be a large language model that’s off the shelf, it can be the specialized model. Even some of our competitors’ models.
Matt: I love this.
Carina: But they use all these models with the infrastructure of AXLE because it does solve a pain point that Lean is slow.
Matt: We were together the week you launched AXLE. And it’s incredible to see the progress.
Carina: Yeah. The growth curve of the user metric is quite high.
Matt: And I think it’s at building that ecosystem out where the iron is sharpening iron that is, I think, going to help the whole community benefit. I love that framing. And I think for some of the folks in the audience that are saying, okay, how much is this getting applied, or beginning to be applied? This tension as it were between math proofs and coding. There are companies, big companies in some cases like the Amazons and Googles that are already working on this area. You’re a little bit familiar with that. Maybe talk about what they’re doing, and then maybe where you think the world’s going.
Carina: So, I think, I certainly draw a timeline of verified AI. So 2016, Google made a pretty important bottoms-up strategic investment. There are good papers coming out of it. Playing with a language, they’re improving language called HOL. HOL List is one of the papers. And then there are white papers, I think in 2019 by Christian Szegedy, a friend of mine, when he and Tony Wu, who later became xAI’s co-founder, that was like 2016, 2019. And then that’s Google, right? Mothership.
And then 2021 was DeepMind. Also bottoms up effort. One intern decides to apply this geometry. It’s just personal hobby. So, when this person was a high schooler, he has no interest in the other Math Olympiad problems compared to his interest for geometry. So he just said, “I just want to do geometry.” And that’s such a smart idea. I sometimes think of it, because when I was a kid, I could never solve the geometry problem on the Math Olympiad in the geometric way. There’s something about my brain, and I think it’s still like now, I don’t have any sense of direction, I can’t envision spatial objects. I can’t do topology in the visual way. When I was studying geometry, I couldn’t picture those curves and surfaces. It’s almost like my brain is inhibited in that way. On the other hand, I overcompensate by being the strongest algebraic brute force person you can find. I mean, when I was a kid, I would eat all the inequality problems.
So, from a strategic point of view, I just figured out that I can express all these lines, points, intersection points, circles, triangles, in complex coordinates, and this is a brute force method. So then I just throw away that picture. I just look at work with this complex coordinates, and I do it the algebraic way. And it would took me three times the time.
But it tells you something philosophical. That’s not quite exactly the AI corresponding thing, but it’s like, if you have a geometric figure, you can try to express it in a symbolic language, for both that human kid, and in this case, this intern of Google DeepMind, who then that was a wildly successful project. They expressed the geometry problem in the vector language, not Lean, even more smaller domain specific vector language. So, all these pictures suddenly become computer code.
I think they later gave up on it. I don’t really quite know. After Alphaproof, it was such a success. And so somehow the informal math team, Gemini had the mainstream voice and the non-lean approach, which is Gemini post-training, took over, they started develop evolve, that is discovery, construction not proving. So we didn’t quite hear much from them.
Matt: Well, I think they got pulled a little bit given some of the new model companies that were making such good progress, and they had to focus for a season on the here and now, which of course creates the opportunity for Axiom Math. So, where do we go from here? Where does the field go, and then where does Axiom Math go?
Carina: I still think sometimes it’s just so interesting to think about why things stopped at each of these players.
Matt: Yeah, that’s an interesting question.
Carina: And we just covered the Google story. The OpenAI story was just like, when Elon was there, with Stan Polu and other folks, they had so many papers. That was frequent within a couple months. Meaning F2F was the high school benchmark they established. That was later sort of really trained on, contaminated by everyone, but it was one of the initial benchmarks pointing at least a direction to go. I think there was GPT-F. They tried to have early version GPT to maybe stand for mathematical functions, I think. And then they just didn’t… I mean, Elon left. They just wasn’t supported.
Facebook was doing it. Facebook was doing it. The Llama team, they spun out to be Mistral, and that was the end of story. I mean, there was Guillaume Lample, there was Fabian. You still see some of the bottoms up Meta researchers, but not really feeling like AI from X was supported. So they came to Axiom. Yeah, great opportunity.
But you just see kind of how short-lived each of these excellent projects are in the big places.
Matt: I think that probably in some of these places the research had to be in service of what ultimately became the commercial pressures. And that perhaps, a few years back, the commercial opportunity to have provers that are the complementary tension to systems that are predictors, was not quite ready for prime time. But now it is.
Carina: But why did they start it? So that’s kind of my…
Matt: Well, I think they knew that it would come.
Carina: I see.
Matt: And it was maybe a timing question.
Carina: Oh, they did it too early?
Matt: They did it too early. That’s my theory. But now we are at this time, where people realize, especially as we go from things like chats that, okay, if it gets a little hallucinated, okay, whatever, you can go find another way to check it, check your work. But in the world of all kinds of commercial use cases, I mean, think about hardware verification, think about software verification, think about a transactional verification, or a security use case, you want some certainty that’s going to counterweight the predictability of the, as we think of them today, LLMs.
Carina: It’s shocking to see how many people are using AXLE to verify things in blockchain.
Matt: Interesting.
Carina: We’re just seeing, wow, all these interesting use cases coming in.
Matt: And a blockchain supposedly is provable. I mean, in and of itself, the system is supposed to be provable.
Carina: I do think it’s interesting, because it’s like the benefit of being in a startup, it declares DNA, hopefully it’s doing well, it’s that you have actually that research security. You’re the core engineering group, you’re pushing on something that is on the critical path, and that critical path does not favor. We’re talking about years of good work, and at a sort of prime stage of this researcher’s career, and just compound from there. After a win, unlike in some other company, there’s a rifting company, you have finally reached that. So your point of too early is so funny. They have outdone themselves.
Matt: They have outdone themselves, and I think-
Carina: Should be forced to pivot.
Matt:… the researchers that I have loved getting to know the most are doing research for impact. And I think at Axiom Math, you are bringing people onto the team that are excited about research for research, sake, and impact. It’s an and equation.
Carina: Yeah.
Matt: So tell us about the Putnam competition from last December and how this came about.
Carina: That’s right. So I think early stage startup, people get excited by a common goal. And for us, that goal, that sort of mission was Putnam. Putnam exam is very hard. So I think if people choose Putnam, that generally mean, one, they have taken that exam, which we found a lot of people whose Math Olympiad trajectory stop at the high school level, and two, they’re not totally hopeless yet. I mean, so a lot of people go to Putnam and more than 50% will get a zero out of 120.
Matt: That’s discouraging, because these are very, very bright people, and to get a zero.
Carina: Yeah, partial credit. Still, more than 50% of about 4,000 participants each year, who represent their respective universities, the Putnam teams, got zero.
Matt: And so many, many people are getting zero out of 120, and there’s only been a few people that have ever gotten a perfect score.
Carina: Out of, I think, 99 years of Putnam, there are five human perfect scores. We got a perfect score, and this is an AI perfect score. And this is just quite, I think, striking, even from our perspective. I personally did not do that well. I got like 39, so not even 40, and got an award for it.
Matt: So how did you prepare for your systems to take on the Putnam test?
Carina: Surprisingly, a pretty hard part was aligning on the school, just because people were like, “Well, this is quite hard.” I mean, you probably hope to start with a high school real time tournament. But timing wise, there just weren’t any. So we started on the day of the international Math Olympia, when we had no money. So the company literally just operated that week. I mean, we had campaign chairs, foldable tables, we didn’t have a line of software written.
Matt: Perfect startup dynamic.
Carina: Yeah. So we didn’t have such a high school tournament to help climb before Putnam.
The other thing was, later we realized that this year’s Putnam, the 120 perfect score is hard for two reasons. First, top performing human of this year is 110. And the best informal LLM, sort of general off-the-shelf models is DeepSeek — 103. So this is the first time a formal AI beats an informal one in the data scar sort of disadvantage kind of starting.
Matt: I think related to this is this whole idea that the pace of innovation is… I’ve never seen it. And being in venture for over 25 years, the pace of change. You’ve been so deeply involved in this area for the last several years. How are you seeing the pace of development in applied AI and where it’s going, and the research?
Carina: So, I think generally progress happen for a couple reasons. Some are random, some are not. The sort of random reasons are like, somehow people start believing buy-in the religion. So I think if you think about how many companies are pitching continual learning after the Ilya circus for the podcast, you can call that like an Ilya surge.
Matt: Yes.
Carina: Continued learning, people have been pushing it since forever. Before the AI people got interested in it, the neuro AI people were quite interested in it as well, just because of a meta learning sort of in the cognitive science literature. Well, interestingly, I think they were studying why it was hard. I did my master at Oxford Neuroscience, and one of my master dissertations actually on continual learning, but from the angle of why it’s hard, from a theoretical perspective, obviously that’s not going to make things easier, better on the practical side. And there are some other not so random reasons, which is, people just realize that hallucination’s getting bad and you need a reasoning model. So I think around the time of 2022 or 2023, I looked into formal theorem proving. My main motivation has never been about hallucination. It just has never been, because I know that’s going to be a controlled problem. Just from a law of physics perspective, humanity’s not going to let this level of BS hallucination happen forever.
Matt: Yeah. It’s almost like the benefit of having systems that can hallucinate, is that in a sense they can be creative, but the risk of a system that hallucinates is that you can’t prove it to be true.
Carina: Back then it wasn’t even creative, but I think it’s like also you can choose to build contractually, kind of like controlling the downside. You can also build expensively. And I’m always the kind of people that’s in favor expanding the universe choice for both myself and others. I kind of jump around and academic disciplines, I do whatever I want. I don’t question why I’m suited to do it. And in that sense, what my interest in formal theorem proving has always been expensively. It’s like, what if we can understand the human brain? And what if we can understand the universe?
Matt: And as you pointed out, I mean, this field, if we’ll call it artificial intelligence, actually, it’s almost the exact 70th year anniversary of that. There was a gathering at my alma mater, Dartmouth College, in the summer of 1956.
Carina: Wow.
Matt: And these different pathways are now in some ways in contention with each other, but I really believe, and I think you believe, that they’re each pushing each other at a pace that we’ve never seen before. And that leads to lots of debates: reasoning systems, continuous learning, where do we get most of the benefit? There’s been a mindset of, well, pre-training was dead, that it was all over. It was all going to be in post-training, and then it was all going to be continuous. And now we’re kind of coming back around and saying, “Wait a minute, some of the newest models.”
Carina: I had that line in our initial seed round pitch deck.
Matt: I think you did. I think you did.
Carina: I think something like, “Scaling is dead, but long live scaling.”
Matt: Yes. I like that framing.
Carina: I think that the point was, it wasn’t scaling instead, is that we want to scale promo data. And it was fascinating.
Matt: And that was a year ago.
Carina: Yeah.
Matt: Right?
Carina: That was so bad. That’s my point. I want to put it in today.
Matt: Well, it’s a fun way to catch people’s attention.
Carina: That’s right. It’s like clickbait.
Matt: So people are curious, I’m sure, how you’re thinking about, okay, so we’re building these really interesting models, we’re succeeding at the Putnam competition. What are the commercial use cases you’re exploring? What might a verification system be able to do?
Carina: I think a very specific call to action is to call everyone who wants to reimagine the EDA space to just come and join us. That’s one. I would say that we have learned a lot in the hardware verification space thanks to a lot of subject matter experts, both in terms of potential customers and in terms of people who have been working in the field, and wanted to be done differently. I think that a lot of the SMT-based formal checking tools are widely used. They are widely used. But they cannot make it autonomous in a way where you need humans to do some handholding, adjusting the counter value to a smaller amount where it does not suffer from the complexity explosion. So we have learned a lot in the hardware verification space.
I think the dream really is to… I mean, if you think about verification cycles being three to four times design cycle in time, and three to four times number of people in headcount, then designing, then surely there’s something that needs to be improved. And then a big sort of question here is, are these existing formal checking tools good enough for the more elastic demand future? And what we have found is that Lean through Improver could verify circuits that a formal checker could not.
Matt: Wow, that’s a big learning.
Carina: Yeah. Not an entire gigantic chip yet, I want to clarify this, but it is a small local part of the circuit. That is still too large for those formal checkers. So this is some sort of interesting margin or gain that we are observing.
And another thing I would say that’s also interesting and counterintuitive is the proof of that case is hundreds of lines of link code. When most of our platinum problems are thousands of lines of link code, and in some of our research problems, tens of thousands of lines of link code.
So, I think it is possible that we could have not one Putnam and be able to serve this customer. So that is interesting, right? It is the sort of center capability overhang that we observe, and just across AI. But it doesn’t mean we shouldn’t win Putnam, because it’s frankly quite fun. And I think that there could be a lot of reasons why you want to push on the math self-improving capability side, because of software verification. So, what we have found is I think we have one of the strongest math provers in the world. And that also corroborates with we saturated the code variant benchmark, a Lean software verification benchmark. I think DeepSeek Prover was at 11, 12%. Goedel-Prover’s at 11, 12%. We are 98.93%.
Matt: That’s incredible. Yeah. And I think this is part of what you’re beginning to learn is that being able to build verification systems, prover systems, can be generalizable at some level. And not only generalizable, but then order magnitude better.
Carina: Yes. And I could argue something that’s interesting, which is, if you think about, “Hey, I just don’t have an affinity for math,” supposedly. “I just don’t care about math.” This is a very sort of extreme brainstorming exercise. And I want to train a model that can do all these things. So, I will need to collect some hardware data. I will then need to collect some software data. And all these data have commercial value attached to it, versus math is kind of free. So, if you train on math data and it transfers quite efficiently to these domains, then this is sort of generalizable, not universally, because you can’t make a good writer or poet out of it, but it’s generalizable in the best way in that you are taking a short, efficient path to tackle some of the most valuable domains where you might not have a data mode if you just pursue those domains.
Matt: I think that makes sense.
So, let’s come back a little bit then to this question of what does this mean for the definition of artificial general intelligence/artificial superintelligence? How do we put all these things we’ve been talking about in the context of math provers systems into that broader world?
Carina: So, think about the word GenAI applications. And so there’s this internet meme of, they’re like a huge building, a lot of bricks, all this stuff. And then there are only two pillars, literally just two bricks here and there. And there’s all this stuff on top of it. And these two pillars, in my view right now, the infrastructure layer, are LLM and predictive analytics. Nothing else. The whole world of genAI is built on probabilistic LLM, which we don’t understand, and predictive analytics.
So it’s just from a symmetry standpoint, Matt, just from a symmetry standpoint. If you have such a stochastic system, it makes sense for you to have some deterministic component. It’s just from a symmetric. So then we want to basically be, we think of ourselves as this sort of edition, an inevitable edition in all possible futures in the AI in front layer.
Matt: Yeah. You need determinism in a world of non-determinism.
Carina: Yeah. I’ll add something that is another… I like symmetry, I think it’s a beautiful concept. In group theory, you have symmetry everywhere. Another symmetry here is, isn’t it beautiful that you can express math in code, and you can verify code with math?
Matt: I love that.
Carina: I think that is just beautiful, because it’s like, in a way, there is a very clear path in front of us to unify the two pillars of the digital world. I kind of agree by so many experts. Eric Schmidt talk about it all day. Two pillars of the digital world, math and programming, math and programming. Programming gives you output; math gives you property. And output is meaningless if you don’t know what property is satisfied. My age is a number. That property, it is my age. I mean, in that sense, you have logic, and you have computation, and they’re unified, I think, in what we’re doing, and what we’re set out to do.
So I think that’s something that is oriented to today’s AI landscape, but there is a future. There’s a future one.
Matt:
Which maybe brings me to one last question, which is, in less than a year, you’ve already built this incredible team with complementary coding, and math, and infrastructure, and a whole set of capabilities. How have you been able to do that? And what are you learning from being a founder, and company builder, and team builder?
Carina: There are a few theories I have. One is, it could be that just formal math was so data poor. It’s like this plant is almost dead, but then you just water it, and it has beautiful flowers coming out of it. That’s one theory, right? Which means very good sample efficiency. That’s kind of the technical idea, is that if you have verified output and pairs, then grounding the natural language, have formal logic ground the natural language, you’re going to do really, really, really well.
In terms of the other sort of team building aspect, I think sometimes, and this is quite encouraging, when I was talking to folks like Francois Charton, for example, and Ken Ono, I felt like it was like there is a shared dream, and there is this shared yearning. Some people don’t state it; some people state it. That is that we all yearn for a superhuman mathematician. And this is very interesting because if you think about in the context of job replacement, say if you have a super human AI lawyer, that means some lawyer is going to be out of job, and some lawyers might be averse to that.
Math is a field where there is a common goal that people are aligned around, and that is the pursuit of knowledge. And so it’s like, if that is a common goal, there isn’t a lot of ego around who’s going to make it happen. And the feeling of being able to clone yourself, you’re a great mathematician, and extend yourself, and clone someone who has passed away decades and centuries ago, and talk to that, these are, I think, beautiful concepts that, somehow, because of technical reasons we have talked about, are coming into life. And what are we seeing every day? And this is maybe why people feel motivated to work even harder. We’re seeing every day is like, we will get an email about a math problem from, say, a professor in Germany that is in Ken’s network at 10:00 AM. By 2:00 PM, four hours later, Axiom Prover generated a whole link proof without human intervention, autonomously checked, and sent it back to that professor. Here is a proof for your unsolved math problem. And that is daily mathematical Pokémon hunting in a funny way, but there’s something deeper, and we’re still slow to react to that fact.
Matt:cYou’re learning your way through that.
Carina: We’re so to react to that fact that there is hardware and software verification. I think there is endless market that is in the category of mathematical superintelligence.
Matt: Well, that shared mission about mathematical superintelligence, the incredible team you’re building, and this ongoing curiosity to discover what the use cases are for that to impact the world, is what’s inspired us to be partners and colleagues with you.
Carina: I don’t know if the last time I saw you, I told you this news, but just their mathematical results, that after I tell people, they’re like, “I don’t really know how to technically prove it, but it sounds beautiful.” And here’s one. It’s something recently autonomously proved by Axiom Prover. So, the Ramanujan tau function is tau, and it is a function that appears frequently in Ramanujan’s lost notebook. It’s one of the fundamental functions in number theory. Axiom Prover, the AI, autonomously proved that the Ramanujan tau function misses 100% of the primes.
Matt: Wow.
Carina: And this is assuming the ABC conjecture, of course, which is something like the Ramanujan hypothesis; a lot of people, or generalized people, just agree or assume all, but this is something that we don’t know. And so that is a beautiful result. I mean, just when we shared it, people felt like that is… I like to be this math evangelist, because I just like talking about it. Relatedly, there are mathematicians like Ramanujan in this case and Maryam Mirzakhani. They left behind bodies of work in terms of Mirzakhani, there is the dynamical systems. And recently, there is a paper that we published, in which Michigan Professor Alex Wright posed the problem. And in that case, we were able to also formalize the solution. So, verify it is correct, and actually optimize the proof and correct the proof mistakes. So, comes in different forms, some autonomous proving, some auto formalization, but we are at a point where we have, I think, seven papers on archived now, where the main engine is done by Axiom Prover. I think five for proving, and two for other formalizations.
Matt: Well, I’m excited about this idea of the beauty of math, and the applied beauty of math that Axiom, you, Carina, and your team are going to help us discover in the years ahead.
Carina: Thanks, Matt.
Matt: So thanks so much for being here with me today.
Carina: It’s really great.
Matt: I really enjoyed the conversation.
Carina: Thank you.