To the “LLMs just interpolate their training data” crowd:
Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.
I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.
So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.
stego-tech 8 minutes ago [-]
As others have pointed out, both can be true:
* LLMs do just interpolate their training data, BUT-
* That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data
In the case of mathematics, LLMs are essentially just brute-forcing the glorified calculators they run on with pseudo-random data regurgitated along probabilities; in that regard, mathematics is a perfect field for them to be wielded against in solving problems!
As for organic chemistry, or biology, or any of the numerous fields where brand new discoveries continue happening and where mathematics alone does not guarantee predicted results (again, because we do not know what we do not know), LLMs are far less useful for making new discoveries than for eliminating potential combinations of existing data or surfacing overlooked ones for study. These aren't "new" discoveries so much as data humans missed for one reason or another - quack scientists, buried papers, or just sheer data volume overwhelming a limited pool of experts.
For further evidence that math alone (and thus LLMs) don't produce guaranteed results for an experiment, go talk to physicists. They've been mathematically proving stuff for decades that they cannot demonstrably and repeatedly prove physically, and it's a real problem for continued advancement of the field.
midtake 20 minutes ago [-]
You have a good point about the human rate of mathematical discovery, but Ayer was an idiot and later Witt contradicted early Witt. For the "already implicit" claim to be true, mathematics would have to be a closed system. But it has already been proven that it is not. You can use math to escape math, hence the need for Zermelo-Fraenkel and a bunch of other axiomatic pins. The truth is that we don't really understand the full vastness of what would objectively be "math" and that it is possible that our perceived math is terribly wrong and a subset of a greater math. Whether that greater math has the same seemingly closed system properties is not something that can be known.
bwfan123 11 minutes ago [-]
> Whether that greater math has the same seemingly closed system properties is not something that can be known
Negative numbers were invented to solve equations which only used naturals. Irrationals were invented to solve equations which could be expressed with rationals. Complex numbers were invented to represent solutions to polynomials. So on and so forth. At each point, new ideas are invented to answer questions that were unanswerable before. There is a long history of this. "Any closed system has unanswerable questions within itself" is a loose paraphrase of Gödel's incompleteness theorem.
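The first few steps in that history can be sketched in code. This is only an illustration, not a proof: the function name `witnesses` is made up for this example, and the square root of 2 appears as a floating-point approximation of the real number. Each equation is written using only the smaller number system, but its solution lives one system up.

```python
from fractions import Fraction
import cmath
import math


def witnesses():
    """Return (equation, solution) pairs; each solution needs a larger number system."""
    return [
        ("x + 1 = 0", -1),                # no natural-number solution; the integers supply -1
        ("2x = 1", Fraction(1, 2)),       # no integer solution; the rationals supply 1/2
        ("x^2 = 2", math.sqrt(2)),        # no rational solution; float approximation of a real
        ("x^2 + 1 = 0", cmath.sqrt(-1)),  # no real solution; the complex numbers supply i
    ]


for equation, solution in witnesses():
    print(f"{equation:12} solved by {solution!r}")
```

The pattern is the same at every step: the question can be asked inside the old system, but the answer requires a new kind of object.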
pseudocomposer 1 hours ago [-]
I'd hope most functional adults understand that the Fields Medal and basically every other annual "prize" out there is awarded to both "recombinant" innovations and "new-dimensional thinking" innovations. Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.
I'd say yes, LLMs "just" recombine things. I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.
bbor 50 minutes ago [-]
To keep my usual rant short: I think you’re assuming a categorical distinction between those two types of innovations that just doesn’t exist. Calculus certainly required some fundamental paradigm shifts, but there’s a reason that they didn’t have to make up many words wholesale to explain it!
Also we shouldn’t be thinking about what LLMs are good at, but rather what any computer ever might be good at. LLMs are already only one (essential!) part of the system that produced this result, and we’ve only had them for 3 years.
Also also this is a tiny nitpick but: the Fields Medal is awarded every 4 years, AFAIR. For that exact reason, probably!
pegasus 8 minutes ago [-]
The fundamental paradigm shift is the categorical distinction. And what would constitute many new words for you? It introduced a bunch of concepts and terms which we take for granted today, including "derivative", "integral", "infinitesimal", "limit" and even "function", the latter two not new words, but what does it matter? – the associated meanings were new.
symfrog 42 minutes ago [-]
We have had LLMs for much longer than 3 years.
Nevermark 14 minutes ago [-]
It took humans thousands of years, then hundreds of years, to come to terms with very basic concepts about numbers.
It's amazing to me when people talk about recombining things, or following up on things, as somehow lesser work.
People can't set aside the perspective they were given when they learned the concepts, a perspective that those who developed the concepts didn't have, because the concepts didn't exist yet.
Simple things are hard, or everything simple would have been done hundreds of years ago, and that is certainly not the case. Seeing something others have not noticed is very hard, when we don't have the concepts that the "invisible" things right in front of us will teach us.
danielmarkbruce 19 minutes ago [-]
No, we haven't, for any reasonable definition of L.
oncallthrow 35 minutes ago [-]
[flagged]
kelseyfrog 31 minutes ago [-]
> I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.)
The experiment is feasible. If it were performed and produced a positive result, what would it imply/change about how you see LLMs?
sumeno 25 minutes ago [-]
How are you going to train a frontier level LLM with no references to post-1700 mathematics?
kelseyfrog 11 minutes ago [-]
Time cutoff LLMs are regularly posted to HN. It takes just one success to prove feasibility.
Besides, we can forecast our thoughts and actions to imagined scenarios unconditioned on their possibility. Something doesn't have to be possible for us to imagine our reactions.
bjt 20 minutes ago [-]
"frontier level" is doing a lot of work there, but the idea would be to only feed it earlier sources.
The problem is that the amount of data with that cutoff is far too small to produce anything powerful. You might be able to generate a lot of 1700s-sounding data, but you’d have to be careful not to introduce newer concepts or ways of thinking in that synthetic data. A lot of modern texts talk about rates of change and the like in ways that are probably influenced by preexisting knowledge of calculus.
dvt 2 hours ago [-]
> I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.
Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created"—and you have people like Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois, etc. that treat math more like an art than a science.
For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
pulkitsh1234 1 hours ago [-]
Creation is done by humans who have been trained on the data of their life experiences. Nothing new is being created, just changing forms.
A scientist has to extract the "Creation" from an abstract dimension using the tools of "human knowledge". The creativity is often selecting the best set of tools or recombining tools to access the platonic space. For instance a "telescope" is not a new creation, it is recombination of something which already existed: lenses.
How can we truly create something ? Everything is built upon something.
You could argue that even "numbers" are a creation, but are they ? Aren't they just a tool to access an abstract concept of counting ? ... Symbols.. abstractions.
Another angle to look at it: even in dreams, do we really create something new ? Or do we dream about "things" (i.e. data) we have ingested in our waking life ? Someone could argue that dreams truly create something, as the exact set of events never happened anywhere in the real world... but we all know that dreams are derived.. derived from brain chemistry, experiences and so on. We may not have the reduction of how each and every thing works.
Just like energy is conserved, IMO everything we call "created" is just a changed form of "something". I fully believe LLMs (and humans) both can create tools to change the forms. Nothing new is being "created", just convenient tools which abstract upon some nature of reality.
bwfan123 22 minutes ago [-]
> Aren't they just a tool to access an abstract concept of counting ?
Humans and animals have intuitive notions of space and motion since they can obviously move. But, symbolizing such intuitions into forms and communicating that via language is the creative act. Birds can fly, but can they symbolize that intuitive intelligence to create a theory of flight and then use that to build a plane ?
ulbu 57 minutes ago [-]
that’s why we say that with such discoveries we receive a new way – of looking, of doing, of thinking… these new paths preexist in the abstract, but they can be taken only when they’ve been opened. and that is as good as anything “new” gets.
(and such discoveries are often also inventions, for to open them, a ruse needs to be applied in a specific way for the way to open).
wslh 8 minutes ago [-]
[dead]
kenjackson 1 hours ago [-]
"new kind of math"
Well I think the point is there is no "new kind of math". There's just types of math we've discovered and what we haven't. No new math is created, just found.
grey-area 1 hours ago [-]
The map is not the territory.
cthalupa 8 minutes ago [-]
I don't know what you're even trying to argue here.
We're not comparing math to reality (though there's a strong argument to be made that reality has a structure that is mathematical in nature - structural realism didn't die as a scientific philosophy just because someone came up with a pithy saying), we're talking about if math is discovered or invented.
Most mathematicians would argue both - math is a language, we have created operations, axioms are proposed based on human creativity, etc., but the actual laws, patterns, etc. are discovered. Pi is going to be pi no matter if you're a human or someone else - we might represent it differently with some other number system or whatever, but that's a matter of representation, not mathematical truth.
bbor 45 minutes ago [-]
Does that correction matter, tho…? Discovered or created, it would be new to us, and is clearly not easy to reach!
Someone 44 minutes ago [-]
I think “new math” is ‘just’ humans creating new terminology that helps keep proofs short (similar to how programmers write functions to keep the logic of the main program understandable), and I agree that is something LLMs are bad at.
However, if that idea about new math is correct, we, in theory, don’t need new math to (dis)prove the Riemann hypothesis (assuming it is provable or disprovable in the current system).
In practice we may still need new math because a proof of the Riemann hypothesis using our current arsenal of mathematical ‘objects’ may be enormously large, making it hard to find.
bbor 46 minutes ago [-]
> math more like an art than a science.
That’s a fun turn of phrase, but hopefully we can all agree that math without scientific rigor is no math at all.
> we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
Do you think it’s possible/likely that any AI system could? I encourage us to join Yudkowsky in anticipating the knock-on results of this exponential improvement that we’re living through, rather than just expecting chatbots that hallucinate a bit less.
In concrete terms: could a thousand LLM-driven agents running on supercomputers (500 of which are dedicated to building software for the other 500) come up with new math?
Tenobrus 2 hours ago [-]
what basis do you have for assuming an LLM is fundamentally incapable of doing this?
truncate 2 hours ago [-]
What's your basis for assuming LLM is capable of doing this?
I honestly don't know personally either way. Based on my limited understanding of how LLMs work, I don't see them making the next great song or next great book, and based on that reasoning I'm betting that they probably won't be able to do whatever the next "Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois" are going to do.
Of course, if AI as a wider field comes up with something more powerful than LLMs, that would be different.
EMM_386 34 minutes ago [-]
"I don't see them making the next great song"
Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
Also - music is subjective. Mathematics isn't.
And in this case, an LLM discovered a new way to reason about a conjecture. I don't know how much proof is needed - since that is literally proof that it can be done.
truncate 3 minutes ago [-]
>> Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
There are quite a few questions around that. Music is subjective and obviously different people have different taste, but I wouldn't call any of them actually good music / real hits.
>> LLM discovered a new way to reason about a conjecture
I wasn't questioning LLMs' ability to prove things. Parent threads were talking about building a new kind of maths, or approaching it in a creative/artistic way. That's what I was referring to.
I can't speak for maths or hard science as I'm not trained in that, but the creativity aspect in code is definitely lacking when it comes to LLMs. May not matter down the line.
dist-epoch 41 minutes ago [-]
LLMs are already making the next great songs. Just check out the Billboard charts.
truncate 50 seconds ago [-]
I'm sorry, I don't consider them "great songs". Obviously, different people have different taste.
redsocksfan45 2 hours ago [-]
[dead]
blueone 2 hours ago [-]
> what basis do you have for assuming an LLM is fundamentally incapable of doing this?
because I have no basis for assuming an LLM is fundamentally capable of doing this.
sswatson 1 hours ago [-]
Good on you for spelling out this reasoning, but it is manifestly unsound. For a wide variety of values of X, people a few years ago had no reason to expect that LLMs would be capable of X. Yet here we are.
TheOtherHobbes 1 hours ago [-]
In 1989, Garry Kasparov said that it was "ridiculous!" to suggest a computer would ever beat him at chess.
"Never shall I be beaten by a machine!”
In 1997 he lost to Deep Blue.
FartyMcFarter 1 hours ago [-]
Yeah, and back then people moved the goal posts too, saying Deep Blue was just "brute-forcing" chess (which isn't even true since it's not a pure minimax search).
bananaflag 41 seconds ago [-]
Deep Blue was brute forcing chess in the sense that AlphaGo wasn't brute forcing Go.
zardo 54 minutes ago [-]
This is something that could be demonstrated rather than just argued.
Train an LLM only on texts dated prior to Newton and see if it can create calculus, derive the equations of motion, etc.
If you ask it about the nature of light and it directs you to do experiments with a prism I'd say we're really getting somewhere.
pickleRick243 1 hours ago [-]
Except this has been said since the 2010s and has been proven wrong again and again. Clearly the theory that LLMs can't "extrapolate" is woefully incomplete at best (and most likely simply incorrect). Before the rise of ChatGPT, the onus was on the labs to show it was plausible. At this point, I think the more epistemologically honest position is to put the burden back on the naysayers. At the least, they need to admit they were wrong and give a satisfactory explanation why their conceptual model was unable to account for the tremendous success of LLMs and why their model is still correct going forward. Realistically, progress on the "anti-LLM" side requires a more nuanced conceptual model to be developed, carefully outlining and demonstrating the fundamental deficiencies of LLMs (not just deficiencies in current LLMs, but a theory of why further advancements can't solve the deficiencies).
Incidentally, similar conversations were had about ML writ large vs. classical statistics/methods, and now they've more or less completely died down since it's clear who won (I'm not saying classical methods are useless, but rather that it's obvious the naysayers were wrong). I anticipate the same trajectory here. The main difference is that because of the nature of the domain, everyone has an opinion on LLMs, while the ML vs. statistics battle was mostly confined within technical/academic spaces.
voooduuuuu 2 hours ago [-]
Ask an LLM to invent a new word and post it here. You will see that it simply combines words already in the training data.
Nevermark 9 minutes ago [-]
You must be joking? Unless by combining words you mean digging deep into Latin and Greek etymology, finding something pithy and linguistically associative.
I can assure you, the percentage of people who can do what they do when it comes to crafting terms, and related sets of terms, for nuanced and novel ideas is very very small.
It happens this is something I do nearly every day.
Models respond to the level of dialogue you have with them. Engage with an informed perspective on terminological issues and they respond with deep perspectives.
I am routinely baffled at the things people say models can't do, that they do effortlessly. Interaction and having some skill to contribute helps here.
satvikpendem 1 hours ago [-]
Funny that the replies are dead. It's true that generally we shouldn't have AI output on HN but this case is an exception as we are explicitly asking for it, so it's interesting that people still flag the replies.
baq 1 hours ago [-]
Mathematics can be mostly boiled down to a few sentences with very lengthy possible combinations, so yeah that is not a problem
konart 1 hours ago [-]
So LLM is German?
Garlef 1 hours ago [-]
What does "new word" even mean?
robmccoll 1 hours ago [-]
[flagged]
dpoloncsak 2 hours ago [-]
[flagged]
dmos62 2 hours ago [-]
[flagged]
SparkyMcUnicorn 1 hours ago [-]
[flagged]
dvt 2 hours ago [-]
Because by definition LLMs are permutation machines, not creativity machines. (My premise, which you may disagree with, is that creativity/imagination/artistry is not merely permutation.)
fnordpiglet 2 hours ago [-]
I prefer to think of it as they’re interpolation machines not extrapolation machines. They can project within the space they’re trained in, and what they produce may not be in their training corpus, but it must be implied by it. I don’t know if this is sufficient to make them too weak to create original “ideas” of this sort, but I think it is sufficient to make them incapable of original thought vs a very complex to evaluate expected thought.
lukol 2 hours ago [-]
This "new math" might be a recombination of things that we already know - or an obvious pattern that emerges if you take a look at things from a far enough distance - or something that can be brute-forced into existence. All things LLMs are perfectly capable of.
In the end, creativity has always been a combination of chance and the application of known patterns in new contexts.
dvt 2 hours ago [-]
> This "new math" might be a recombination of things that we already know
If you know anything about the invention of new math (analytic geometry, Calculus, etc.), you'd know how untrue this is. In fact, Calculus was extremely hand-wavy and without rigorous underpinnings until the mid 1800s. Again: more art than science.
jfyi 1 hours ago [-]
Newton and Leibniz were "hand-waving"?
If anything, they were fighting an uphill battle against the perception of hand-waving by their contemporaries.
dvt 56 minutes ago [-]
> Newton and Leibniz were "hand-waving"?
Yes, and it's pretty common knowledge that Calculus was (finally) formalized by Weierstrass in the mid-19th century, having spent almost two centuries in mathematical limbo. Calculus was intuitive, solved a great class of problems, but its roots were very much (ironically) vibes-based.
This isn't unique to Newton or Leibniz, Euler did all kinds of "illegal" things (like playing with divergent series, treating differentials as actual quantities, etc.) which worked out and solved problems, but were also not formalized until much later.
jfyi 47 minutes ago [-]
I think that I just take issue with the term "hand-waving" as equated to intuition. Yeah it lacked formal rigor, but they had a solid model that applied in detail to the real world. That doesn't come from just saying, "oh well, it'll work itself out". I guess if you want to call that "hand-wavy" we'll just have to disagree.
baq 1 hours ago [-]
And yet nowadays you can restate all of it using just combinations of sets of sets and some logic operators.
nh23423fefe 2 hours ago [-]
god of the gaps
iwontberude 18 minutes ago [-]
non overlapping magisteria
satvikpendem 1 hours ago [-]
What is creativity if not permutation? A brain has some model of the world and recombines concepts to create new concepts.
d3ffa 50 minutes ago [-]
you have clearly never innovated in your life. so why post this nonsense?
rowanG077 29 minutes ago [-]
This is really not an acceptable reply. How about actually engaging with the point the commenter made instead of stamping your foot and throwing a tantrum.
KoolKat23 2 hours ago [-]
It pretty much is, otherwise it is randomness or entropy.
lajamerr 2 hours ago [-]
LLMs by themselves are not able to but you are missing a piece here.
LLMs are prompted by humans and the right query may make it think/behave in a way to create a novel solution.
Then there's a third factor now: agentic AI system loops with LLMs, where it can research, try, and experiment in its own loop that's tied to the real world for feedback.
Agentic + LLM + Initial Human Prompter by definition can have it experiment outside of its domain of expertise.
So that's extending the "LLM can't create novel ideas" claim, but I don't think anyone can disagree that the three elements above are enough ingredients for an AI to come up with novel ideas.
awesome_dude 1 hours ago [-]
You're proving the GP's argument - LLMs aren't creative, you say as much; it's the driving that is the creative force
charlie90 1 minutes ago [-]
I believe when we have AI Agents "living" 24/7, they will become creative machines. They will test out their own ideas experimentally, come across things accidentally, synthesize new ideas.
We just haven't let AI run wild yet. But it's coming.
lajamerr 1 hours ago [-]
You can tell an agentic system: "Go and find a novel area of math that has unresolved questions and solve it mathematically with verified properties in Lean. Verify before you start working on a problem that no one has solved this area of math."
That's not a creative prompt. That's a driving prompt to get it to start its engine.
You could do that nowadays, and while it may spend $1,000 to $100,000 worth of tokens, it will create something humans haven't done before, as long as you set it up with all its tool calls/permissions.
Barbing 1 hours ago [-]
If that’s a requirement, aren’t LLMs driven by pretraining which was human driven?
Who decides the last point at which it’s OK to provide text to the model in order to be able to describe it as creative? (non-rhetorical)
hammock 1 hours ago [-]
Recombining existing material is exactly right, and in this case LLMs were uniquely positioned to make the connection quicker than any group of humans.
The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.
Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.
Apocryphon 1 hours ago [-]
Monstrous Moonshine?
throw-the-towel 2 hours ago [-]
See the longstanding debate on whether new math is "invented" or "discovered". Most mathematicians I knew thought it's discovered.
amelius 1 hours ago [-]
This is like saying a sculpture always existed, the sculptor just had to remove the superfluous material.
Or like a musical octave has only 12 semitones, so all music is just a selection from a finite set that already existed.
Sure the insane computation we're throwing at this changes our perspective, but still there is an important distinction.
paulddraper 32 minutes ago [-]
The difference is that math answers (can answer) specific questions.
Like, "does the Riemann zeta function have zeroes that don't have real part 1/2," or "is there a better solution to the Erdős Unit Distance Problem."
The selection of question is a matter of taste, but once selected, there is a definitive precise answer.
skybrian 2 hours ago [-]
Any design already exists as a possibility, so it could be said to be both invented and discovered, depending on how you look at it.
cubefox 2 hours ago [-]
All inventions are discoveries, though not all discoveries are inventions.
FrustratedMonky 1 hours ago [-]
Depending on your point of view? I see what you did there.
Who knew Obi-Wan was just smoking and pontificating on Wittgenstein.
ASalazarMX 1 hours ago [-]
Math is an abstraction of reality, it had to be invented, so more inventions or discoveries could be made within it.
pigpop 1 hours ago [-]
What is an abstraction? It is something that arises from human thought and human thought arises from the activity of neurons which are a part of reality. You can't escape reality unless you invoke some form of dualism.
2ddaa 47 minutes ago [-]
abstractions are objects that come into existence via design and iteration to refine their form. This right here is invention not discovery.
baq 1 hours ago [-]
The test goes like ‘is our universe, or any other universe, required for the axioms to exist’ and I don’t see how ‘yes’ is a defensible answer.
protoplancton 2 hours ago [-]
One can argue that mathematical facts are discovered, but the tools that allow us to find, express them and prove them, are mostly invented. This goes up to the axioms, that we can deliberately choose and craft.
atmosx 2 hours ago [-]
...long standing indeed. It can be traced back to Plato's works.
lioeters 1 hours ago [-]
"The European philosophical tradition consists of a series of footnotes to Plato."
soupspaces 2 hours ago [-]
Regardless of which, both Newton and Leibniz imprint in their findings a 'voice' and understanding different from each other and that of an LLM (for now?)
sillysaurusx 1 hours ago [-]
It’s easy to see that LLMs don’t merely recombine their training data. Claude can program in Arc, a mostly dead language. It can also make use of new language constructs. So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.
baq 1 hours ago [-]
LLMs ingest and output tokens, but they don’t compute with them. They have internal representations of concepts, so they have some capability to work with things which they didn’t see but can map onto what they know. The surprise and the whole revolution we’re going through is that it works so well.
wren6991 43 minutes ago [-]
> they don’t compute with them
Isn't this exactly what chain-of-thought does? It's doing computation by emitting tokens forward into its context, so it can represent states wider than its residuals and so it can evaluate functions not expressed by one forward pass through the weights. It just happens to look like a person thinking out loud because those were the most useful patterns from the training data.
austinl 1 hours ago [-]
I'm not sure how feasible this is, but I love the thought experiment of limiting a training set to a certain time period, then seeing how much hinting it takes for the model to discover things we already know.
E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.
I feel this is the case whenever I "problem solve". I'm not really being creative, I'm pruning a graph of a conceptual space that already exists. The more possibilities I see, the easier it is to run more towards an optimal route between the nodes, but I didn't "create" those nodes or edges, they are just causal inevitabilities.
HDThoreaun 45 minutes ago [-]
I don't know, this sort of just seems like you're really stretching the meaning of "creative". The conceptual space of the graph already exists, but the act of discovering it or whatever you want to call that is itself creative. Unless you're following a pre-defined algorithm (certainly sometimes, arguably always I suppose), seeing the possibilities has to involve some creativity.
nomel 22 minutes ago [-]
> seeing the possibilities has to involve some creativity.
I would claim the graph exists, and seeing it is more of a knowledge problem. Creativity, to me, is the ability to reject existing edges and add nodes to the graph AND mentally test them to some sufficient confidence that a practical attempt will probably work (this is what differentiates it from random guessing).
But, as you become more of an expert on a certain problem space (graph), that happens less frequently, and everything trends towards "obvious", or the "creative jumps" are super slight, with a node obviously already there. If you extended that to the max, an oracle can't be creative.
Maybe I just need sparser graphs to play in. :)
libraryofbabel 1 hours ago [-]
This is a good point, and there’s some deep philosophical questions there about the extent to which mathematics is invented or discovered. I personally hedge: it’s a bit of both.
That said, I think it’s worth saying that “LLMs just interpolate their training data” is usually framed as a rhetorical statement motivated by emotion and the speaker’s hostility to LLMs. What they usually mean is some stronger version, which is “LLMs are just stochastically spouting stuff from their training data without having any internal model of concepts or meaning or logic.” I think that idea was already refuted by LLMs getting quite good at mathematics about a year ago (Gold on the IMO), combined with the mechanistic interpretability research that was actually able to point to small sections of the network that model higher concepts, counting, etc. LLMs actually proving and disproving novel mathematical results is just the final nail in the coffin. At this point I’m not even sure how to engage with people who still deny all this. The debate has moved on and it’s not even interesting anymore.
So yes, I agree with you, and I’m even happy to say that what I say and do in life myself is in some broad sense an interpolation of the sum of my experiences and my genetic legacy. What else would it be? Creativity is maybe just fortunate remixing of existing ideas and experiences and skills with a bit of randomness and good luck thrown in (“Great artists steal”, and all that.) But that’s not usually what people mean when they say similar-sounding things about LLMs.
zerr 1 hours ago [-]
There is a creational aspect in math - definitions and rules are created.
sigbottle 56 minutes ago [-]
And this is one of the many issues with invoking the logical positivists here...
I'm not even sure why they were invoked. Even disregarding the big technical debunks such as two dogmas, sociologically and even by talking to real mathematicians (see Lakatos, historically, but this is true anecdotally too), it's (ironically) a complete non-question to wonder about mathematics in a logical positivist way.
block_dagger 1 hours ago [-]
This is the second reference to Wittgenstein I’ve seen today in totally different contexts. Reminded me how much I vibe with his Tractatus.
adam_arthur 2 hours ago [-]
Pretty much everything that appears novel in life is derivative of other works or concepts.
You can watch a rock roll down a hill and derive the concept for the wheel.
Seems pretty self evident to me
smaudet 48 minutes ago [-]
If anything, this is more illustration of how llms are not useful to us...
They will do their own thing, don't need us. In fact, we will be in the way...
We can choose to study them and their output, but they don't make us better mathematicians...
justinnk 35 minutes ago [-]
I see where you are coming from.
However, in the role of personal teachers they may allow especially our young generations to reach a deeper understanding of maths (and also other topics) much quicker than before. If everyone can have a personal explanation machine to very efficiently satisfy their thirst for knowledge this may well lead to more good mathematicians.
Of course this heavily depends on whether we can get LLMs‘ outputs to be accurate enough.
cyanydeez 40 minutes ago [-]
I think someone should be talking to Godel.
BoredPositron 40 minutes ago [-]
Post hoc ergo propter hoc
awesome_dude 1 hours ago [-]
There was a project long long ago where every piece of knowledge known was cross pollinated with every other piece of knowledge, creating a new and unique piece of knowledge, and it was intended to use that machine to invalidate the patent process - obviously everything had therefore been invented.
But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged upon to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.
THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.
Take fire, for example, it was in nature for billions of years before we as a species understood that it needed air, fuel, and heat in order for it to exist at all, and we then leveraged that knowledge into controlling fire - creating, growing, reducing, destroying it.
LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor do they appear to know how to leverage them.
edit: I am going to go further
We have only in the last couple of hundred years realised how to see things that are smaller than what our eyes can naturally see - we've used "glass" to see bacteria, and spores, and we've realised that we can use electrons to see even smaller
We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms
That much is derived from previous knowledge
What isn't, and it's what LLMs cannot create - are the tools by which we can detect or see these incredibly small things
paulddraper 2 hours ago [-]
"LLMs just interpolate their training data"
Cracks me up.
What exactly do we think that human brains do?
omnimus 1 hours ago [-]
That has been the question since the beginning of humans.
Maybe computers can help understand better because by now it's pretty clear brains aren't just LLMs.
baq 1 hours ago [-]
The optimists believe brains are very special and we’re far from replicating what they do in silicon.
The pessimists just see a 20W meat computer.
ActorNightly 39 minutes ago [-]
I love this comment because it so clearly highlights the difference between intelligence and reasoning.
A lot of people across all fields seem to operate in a mode of information lookup as intelligence. They have the memory of solving particular problems, and when faced with a new problem, they basically do a "nearest search" in their brain to find the most similar problem, and apply the same principles to it.
While that works for a large number of tasks this intelligence is not the same as reasoning.
Reasoning is the ability to discover new information that you haven't seen before (i.e., growing a new branch on the knowledge tree instead of interpolating).
Think of it like filling a space on the floor of arbitrary shape with smaller arbitrary shapes, trying to fill as much space as possible.
With interpolation, your smaller shapes are medium size, each with a non rectangular shape. You may have a large library of them, but in the end, there are just certain floor spaces that you won't be able to fill fully.
Reasoning on the flip side is having access to very fine shapes, and knowing the procedure of how to stack shapes depending on what shapes are next to them and whether you are on a boundary of the floor space or not. Using these rules, you can fill pretty much any floor space fully.
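To make the "nearest search" mode concrete, here is a minimal sketch in Python. It is only an illustration, not a model of how brains or LLMs actually work: the `SOLVED` table, its feature vectors, and the method names are all made up for the example.

```python
from math import dist

# Hypothetical memory of solved problems: feature vector -> remembered method.
# The vectors and method names are invented purely for illustration.
SOLVED = {
    (0.0, 0.0): "method A",
    (1.0, 0.0): "method B",
    (0.0, 1.0): "method C",
}


def nearest_lookup(problem):
    """Return the method attached to the most similar remembered problem."""
    closest = min(SOLVED, key=lambda known: dist(known, problem))
    return SOLVED[closest]


# A problem close to something already seen gets a sensible answer.
near = nearest_lookup((0.9, 0.1))
# A problem far outside the memory still gets *some* stored answer,
# whether or not that method actually applies to it.
far = nearest_lookup((5.0, 4.0))
```

The lookup never produces a method that is not already in the table, which is the limitation the floor-filling analogy is pointing at.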
gpugreg 1 hours ago [-]
Maybe the human brain also does other things besides interpolation?
paulddraper 41 minutes ago [-]
There is pre-training, and then empirical observations.
Yes?
voooduuuuu 2 hours ago [-]
I think you are conflating composition and prediction. LLMs don't compose higher abstractions from the "axioms, symbols and rules", they simply predict the next token, like a really large spinning wheel.
peterlk 2 hours ago [-]
Yes they do…? Who cares if they just predict the next token? The outcome is that they can invent new abstractions. You could claim that the invention of this new idea is a combination of an LLM and a harness, but that combination can solve logic puzzles and invent abstractions. If a really large spinning wheel could invent proofs that were previously unsolved, that would be a wildly amazing spinning wheel. I view LLMs similarly. It is just fancy autocomplete, but look what we can do with it!
Said differently, what is prediction but composition projected forward through time/ideas?
voooduuuuu 2 hours ago [-]
Ask an LLM to invent a new word and post it here, I will be waiting. You will see that it simply combines words already in the training data.
romanhn 1 hours ago [-]
I'm not sure what the point of this exercise is. My prompt to ChatGPT: "Create a new English word with a reasonably sounding definition. That word must not come up in a Google search." Two attempts did come up in a search, the third was "Thaleniq (noun)". Definition: The brief feeling that a conversation has permanently changed your opinion of someone, even if nothing dramatic was said. Nothing in Google. There, a new word, not sure it proves or disproves anything. Or is it time to move the goal posts?
jimmaswell 2 hours ago [-]
Why is everyone who responds to this with a real example immediately flagged/dead?
sillysaurusx 1 hours ago [-]
HN autokills LLM generated comments. People don’t seem to believe this, but there’s proof for you.
bossyTeacher 2 hours ago [-]
Does a random sequence of letters qualify as a new word?
planetafro 2 hours ago [-]
[dead]
motoxpro 2 hours ago [-]
[flagged]
peterlk 1 hours ago [-]
[dead]
FrustratedMonky 1 hours ago [-]
"Who cares if they just predict the next token?"
Exactly. I also only write one word at a time. Who knows what is going on in order to come up with that word.
frozenseven 2 hours ago [-]
Show me on the anatomical prop where the magical "real reasoning" gland is.
sunshowers 2 hours ago [-]
One might argue that the composition of higher abstractions is the next token predicted after "here is a higher abstraction:"
adampunk 2 hours ago [-]
How sure are you that this is correct?
lubujackson 2 hours ago [-]
For anyone using LLMs heavily for coding, this shouldn't be too surprising. It was just a matter of time.
Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.
Reading the article, that seems exactly how the discovery was made, an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
daishi55 1 hours ago [-]
> the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
Isn't this just anthropocentrism? Why is understanding only valid if a human does it? Why is knowledge only for humans? If another species resolved the contradictions between gravity and quantum mechanics, does that not have meaning unless they explain it to us and we understand it?
tern 18 minutes ago [-]
Do the forms etched into stone by weather over millennia in Moab matter to the wind? Certainly yes, in one sense, but not in the same sense we mean when we say things matter to us, or to animals, or even bacteria.
interroboink 55 minutes ago [-]
It's a bit of an "if a tree falls in the forest but nobody hears it, does it make a sound?" quandary. Sure, maybe some aliens in a distant galaxy understand quantum mechanics better than we do. That's great, but it has no bearing on our little bubble of existence.
Though perhaps more to your point, if some superhuman AI is developed, and understands things better than us without telling us about it (or being unable to), it could perform feats that seem magical to us — that would concern us even if we don't understand it, since it affects us.
But I think in the frame of reference of the commenter you were replying to, they're just saying that the low-level AI used in this specific case is not capable of making its results actually useful to us; humans are still needed to make it human-relevant. It told us where to find a gem underground, but we still had to be the ones to dig it out, cut it, polish it, etc.
nextaccountic 12 minutes ago [-]
It's less likely that aliens of distant galaxies will appreciate this rather than, you know, AI themselves
We are in the birth of the AI age and we don't know what it will look like in 100 or 1000 or 10000 or 100000 years (all those time frames likely closer than possible encounters with aliens from distant galaxies). It's possible that AI will outlast humans even
> The measure of our success is whether what we do enables people to understand and think more clearly and effectively about mathematics.
I just wanted to highlight this very correct human-centric thought about the purpose of intellection.
zem 2 hours ago [-]
wow, that was indeed a brilliant essay. i particularly liked the framing that "solving a big conjecture was a cryptographic proof that you had come up with a genuine conceptual innovation".
mooreat 2 hours ago [-]
I think one interesting thing to point out is that the proof (disproof) was done by finding a counterexample of Erdős' original conjecture.
I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.
In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.
Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics the model is still just connecting existing ideas.
Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.
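For readers unfamiliar with why a single counterexample settles a "for all" conjecture, here is a toy sketch in Python. It has nothing to do with the actual Erdős construction, which by all accounts needed far more than enumeration; it uses the classic claim that n^2 + n + 41 is prime for every n >= 0, and the helper names are invented for the example.

```python
def is_prime(n):
    """Trial-division primality test, fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_counterexample(claim, candidates):
    """Return the first candidate for which the claim fails, else None."""
    for n in candidates:
        if not claim(n):
            return n
    return None


def euler_claim(n):
    """The (false) universal claim: n^2 + n + 41 is always prime."""
    return is_prime(n * n + n + 41)


witness = first_counterexample(euler_claim, range(1000))
print(witness, witness * witness + witness + 41)  # 40 1681, and 1681 = 41 * 41
```

Once a witness is found, checking it is trivial, and no surrounding theory is needed. Proving a universal statement true has no such shortcut, which is the asymmetry described above.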
recitedropper 53 minutes ago [-]
This is impressive, no question.
Without knowing all this model has been trained on though, it is pretty hard to ascertain the extent to which it arrived at this "on its own". The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.
It isn't likely that someone solved this and then just put it in the training data, although I honestly wouldn't put that past OpenAI. More interesting though is the extent to which they've generated training data that may have touched on most or all of the "original" tenets found in this proof.
We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.
Rover222 45 minutes ago [-]
Seems like a very tin-foil-hat-take to me
net01 22 minutes ago [-]
I’m quite certain that a few months ago, some problems were claimed to be solved by AI. However, those claims were actually false and were exactly that: Erdos problems that had already been solved but were not marked as solved, where the existing solution was "found" by AI.
The corollary is that this is a very valuable capability of AI!
The ability to find incredibly obscure facts and recall them to solve "officially unsolved" problems in minutes is like Google Search on steroids. In some sense, it is one core component of "deep expertise", and humans rely on the same methodology regularly to solve "hard" problems. Many mathematicians have said that they all just use a "bag of tricks" they've picked up and apply them to problems to see if they work. The LLMs have a huge bag of very obscure tricks, and are starting to reach the point that they can effectively apply them also.
I suspect the threshold of AGI will be crossed when the AIs can invent novel "tricks" on their own, and memorise their own new approach for future use without explicitly having to have their weights updated with "offline" training runs.
mrdependable 24 minutes ago [-]
How is that a "tin-foil-hat" take? It's not a secret, and in fact widely reported, that these companies are spending billions on creating training data.
recitedropper 34 minutes ago [-]
I'm not letting the government read my brainwaves.
In all seriousness though: My suggestion is that those shepherding the frontier of AI start acting with more transparency, and stop acting in ways that encourage conspiratorial thinking. Especially if the technology is as powerful as they market it as.
vatsachak 3 hours ago [-]
As I have stated before, AI will win a Fields Medal before it can manage a McDonald's
A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.
trostaft 2 hours ago [-]
> A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.
ComplexSystems 44 minutes ago [-]
That may be true for now, but it seems clear enough that letting the model use Lean in its internal reasoning process would be a great idea
trostaft 41 minutes ago [-]
That I'd agree with! I really need to get around to learning Lean myself. It might be interesting to try and formalize some missing theoretical pieces from my field (or likely start smaller).
vatsachak 1 hours ago [-]
Yeah, but I wouldn't be surprised if they train the model on verification assisted by Lean.
trostaft 54 minutes ago [-]
Arguing similarly to how stockfish, the chess engine, trains I would not be surprised if this is more common in the future. I don't know if they use any proof verification tools during their reinforcement learning procedure right now, as far as I know they've been focusing more on COT based strategies (w/o Lean). But I'm hardly an LLM expert, I don't know.
Terr_ 2 hours ago [-]
> manage a McDonald's
Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.
> At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]
> At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.
Casual reminder that the author's proposed solution to the labor-automation dystopia is to invent a second identity-verification dystopia. Also casual reminder that the author wanted the death penalty to anyone over the age of 65.
Lerc 2 hours ago [-]
I disagree. It will be able to perform work deserving of a Fields Medal before it is capable of running a McDonalds. I think it will be running a McDonalds well before either of those things happen, and a Fields Medal long after both have happened.
c7b 2 hours ago [-]
One could hardly ask for a task better suited for LLMs than producing math in Lean. Running a restaurant is so much fuzzier, from the definition of what it even means to the relation of inputs to outputs and evaluating success.
vatsachak 1 hours ago [-]
Not necessarily. Obviously playing Kasparov on the board requires more planning ability than managing a McDonald's but look at where chess bots are now.
There's much more to being human than our "cognitive abilities"
baq 1 hours ago [-]
Conjecture: the first AI to successfully manage a McDonald’s will be a Gemini.
edbaskerville 2 hours ago [-]
I just visited a McDonald's for the first time in a while. The self-order kiosk UI is quite bad. I think this is evidence in favor of the idea that an incompetent AI will soon be incompetently running a McDonald's.
Silamoth 2 hours ago [-]
Out of curiosity, what issue did you have with the McDonald’s self-order kiosk? I actually think McDonald’s has the best kiosk I’ve ever encountered. The little animation that plays when you add an item to your cart is a little annoying (but I think they’ve sped that up). But otherwise, it’s everything I’d want. It shows you all the items, tells you every ingredient, and lets you add or remove ingredients. I have a better experience ordering through the kiosk than I do talking to a cashier.
ndiddy 1 hours ago [-]
It takes longer than ordering with a cashier, it keeps trying to upsell you, and it's always out of receipt paper because unsurprisingly the company that isn't willing to pay a person to take orders is also not willing to pay a person to maintain the kiosks.
auggierose 17 minutes ago [-]
> A difficult part was constructing a chess board on which to play math
We have had that chess board for quite a while now, over 40 years. And no, there is nothing special about Lean here, it is just herd mentality. Also, we don't know how much training with Lean helped this particular model.
evenhash 2 hours ago [-]
The proof is not written in Lean, though. It’s written in English and requires validation by human experts to confirm that it’s not gibberish.
vatsachak 1 hours ago [-]
Yeah, but I wouldn't be surprised if they train the model on verification assisted by Lean
KalMann 42 minutes ago [-]
I think your analogy is good but I don't believe modern LLMs use Lean or any lean-like structure in their proofs. At least recent open source ones like DeepSeek can do advanced math without it (maybe the most cutting edge ones are doing it I can't say).
sigmoid10 2 hours ago [-]
Managing a McDonalds is a question of integration and modalities at this point. I don't think anyone still doubts that these models have the reasoning capability or world knowledge needed for the job. So it's less of a fundamental technical problem and more of a process engineering issue.
andy12_ 1 hours ago [-]
I disagree. Even frontier models still achieve way worse results than the human baseline in VendingBench. As long as models can't manage optimally something as simple as a vending machine, they have no hope of managing a McDonalds.
throw-the-towel 2 hours ago [-]
The capability they lack is being able to be sued.
pear01 2 hours ago [-]
Police officers are human. In the United States in the vast majority of cases you can't sue the police, only the community responsible for them.
Assuming you can still sue McDonalds I am not sure if this is a problem in the robotic llm case. I'm also trying to imagine a case where you would want to sue the llm and not the company. Given robots/llm don't have free will I'm not sure the problem with qualified immunity making police unaccountable applies.
There already exist a lot of similar conventions in corporate law. Generally, a main advantage of incorporation is protecting the people making the decisions from personal lawsuits.
nemomarx 2 hours ago [-]
McDonald's are franchises - you generally want to sue the local owner or threaten them in addition to the holding company.
That only requires someone own the ai managed McDonald's though. so long as they can't avoid responsibility by pointing to the AI I don't see why you couldn't sue them.
logicchains 2 hours ago [-]
>Police officers are human. In the United States in the vast majority of cases you can't sue the police, only the community responsible for them.
Police are a monopoly; nobody has a choice about which police company to use. McDonalds are not a monopoly, and many customers would prefer to eat at competitors run by entities that could be sued or jailed if they did anything particularly egregious.
pear01 1 hours ago [-]
You are missing the point. The point is you can still sue the McDonalds. With the police there is a human intuition to also want to sue the officer, given the officer is a human being who has free will and thus made a choice to violate your rights.
The same intuition applies if you walk into McDonald's and a person there mistreats you. You want that person held responsible.
But the LLM is not a person. What is there to even sue? It just seems like it would simply pass through to the corporate entity without the same tension of feeling like we let a human get away with something. Because there is no human, just a corporation and the robot servicing the place.
Put another way - if the LLM is not a person, what is the advantage of a personal lawsuit?
Just sue the McDonalds. Even in a case where the LLM is extremely misaligned and acts in a way where you might normally personally sue the McDonald's employee, I'm just not sure the human intuition about "holding someone accountable" would have its normal force because again - the LLM is not a person.
So given we already have the notions of incorporation and indemnification it doesn't make sense to say what is precluding LLMs from running McDonald's is they can't be sued. If McDonald's can still be sued, then not only is there no problem, there is very likely not even a change in the status quo.
volkercraig 1 hours ago [-]
> we'll see more specialized math AI resembling StockFish soon
Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.
vatsachak 1 hours ago [-]
My claim is that LLMs waste a lot of time training on all available data.
Math is a sequence of formal rules applied to construct a proof tree. Therefore an AI trained on these rules could be far more efficient, and search far deeper into proof space
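As a tiny illustration of "formal rules applied to construct a proof tree", here is a sketch in Lean 4. The theorem name `swap_conj` is made up for the example, and this says nothing about how any particular model is trained or whether Lean was involved in the result under discussion.

```lean
-- The proof term is literally a tree: the root is the rule `And.intro`,
-- and its two children are the projections `h.right` and `h.left`.
theorem swap_conj (p q : Prop) (h : p ∧ q) : q ∧ p :=
  And.intro h.right h.left
```

A checker only has to confirm that each node applies a rule correctly to its children, which is what makes this kind of search space attractive for a specialized system.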
forinti 2 hours ago [-]
AI is already too old for that.
whimsicalism 2 hours ago [-]
the only thing keeping the mcdonalds from happening will be political, likewise the same with fields medal
segmondy 2 hours ago [-]
our local AI models are already capable of running McDonalds.
ori_b 1 hours ago [-]
We're automating art and science so that we can flip burgers. This future sucks.
vatsachak 1 hours ago [-]
Math is a very specialized subset of art and science more amenable to automation.
My claim is that we haven't even witnessed the move 37 of math yet. I am claiming that math AI is going to get even better
dyauspitr 1 hours ago [-]
Nonsense. Have you been watching the figure live stream? Or the Unitree video from yesterday with real time novel action generation? We’re less than a year away. If you can cook a burger, assemble a sandwich and clean up surfaces you’re all of the way there.
vatsachak 1 hours ago [-]
Fair. Let's see in a year. I'm willing to bet that nothing happens.
dyauspitr 39 minutes ago [-]
Yeah, it’s gonna be an exciting year. I still think you’ll need one human in there, but that’s about it.
raincole 27 minutes ago [-]
I like how everyone laughed when OpenAI said their models will have "PhD-Level Intelligence" and now the goalpost has been moved to if AI can create new math (i.e., not PhD-Level, but Leibniz/Euler/Galois level.)
dawnerd 15 minutes ago [-]
Yet it still codes like a junior developer that memorized all of stack overflow.
dilap 10 minutes ago [-]
Personally I don't find this to be true anymore! It's not always great and still often tends towards unneeded complexity (especially if not pushed a bit), but I often find GPT 5.5 writing code I would have written myself. This was very much not true with earlier models (which would make something that worked, but that I'd always have to rewrite to make it "good code").
raincole 10 minutes ago [-]
PhDs code like that too. Especially if they're statisticians :)
zeofig 24 minutes ago [-]
I still laugh.
cpard 1 hours ago [-]
The proof brings unexpected, sophisticated ideas from algebraic number theory to bear on an elementary geometric question.
The more I read about these achievements the more I get a feeling that a lot of the power of these models comes from having prior knowledge on every possible field and having zero problems transferring to new domains.
To me the potential beauty of this is that these tools might help us break through the increasing super specialization that humans in science have to go through today, which on the one hand is important but on the other hand does limit the person in terms of the tooling and inspiration they have access to.
margorczynski 8 minutes ago [-]
Yep. The thing is people (maybe because of our limited scope) just focus on the depth and not the breadth. Because this is a general purpose model - it also has PhD+ knowledge in Physics, Biology, History, etc.
I think we still don't really comprehend how much can be achieved by a single "mind" that has internalized so much knowledge from so many areas.
doubledamio 56 minutes ago [-]
I’ve always been skeptical about the role of LLMs in mathematics, but this is the first time I’ve seen this argument, and I actually find it very compelling. Maybe LLMs will help us develop more horizontal understanding of the field.
cpard 14 minutes ago [-]
It's up to us I think. We can use LLMs to generate web pages in Candy Crush style and end up dumber by outsourcing thinking to the machines or we can use them to expand our cognitive capabilities.
What makes me more of an optimist in this case is that people who today decide to go into these sciences are mostly people who are driven by intellectual activity so I feel they are the right ones to figure this out, probably more so than us the engineers.
zozbot234 2 hours ago [-]
The summarized chain of thought for this task (linked in the blogpost) is 125 pages. That's an insane scale of reasoning, quite akin to what Anthropic has been teasing with Mythos.
Today I generated the equivalent of two LOTR books just to fix three missing rows in my SQL models (and open a PR), so +1
dwa3592 5 minutes ago [-]
Few questions that the blog did not answer, if anyone knows that'll be great:
- Does anyone know if this was a 1 minute of inference or 1 month?
- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?
- One of the graphs says - the model produced the right answer almost half the time at the peak compute??? did i understand that right? what does peak compute mean here?
0x5FC3 2 hours ago [-]
Is there a reason why we only hear of Erdos problems being solved? I would imagine there are a myriad of other unsolved problems in math, but every single ChatGPT "breakthrough in math" I come across on r/singularity and r/accelerate are Erdos problems.
jltsiren 1 hours ago [-]
Erdős problems form a substantial fraction of all mathematical problems that have been explicitly stated but not solved; are sufficiently famous that people care about them; and are sufficiently uninteresting that people have not spent that much effort trying to solve them.
Solving problems people have already stated is a niche activity in mathematical research. More often, people study something they find interesting, try to frame it in a way that can be solved with the tools they have, and then try to come up with a solution. And in the ideal case, both the framing and the solution will be interesting on their own.
bananaflag 2 hours ago [-]
Erdos problems are easier to state, thus they make a great benchmark for the first year of AI mathematics.
tonfa 2 hours ago [-]
Afaik this is because there is a community and database around them.
0x5FC3 2 hours ago [-]
Interesting. OpenAI could also be trying to solve other problems, but Erdos problems may be falling first?
CSMastermind 2 hours ago [-]
No, Erdos problems were accepted as sort of a benchmark. There's a bunch of reasons they're favorable for this task:
1. They have a wide range of difficulties.
2. They were curated (Erdos didn't know at first glance how to solve them).
3. Humans already took the time to organize, formally state, add metadata to them.
4. There's a lot of them.
If you go around looking for a mathematics benchmark it's hard to do better than that.
They're just famous because Erdos was a great mathematician, kinda like the Hilbert problems a century earlier.
odie5533 25 minutes ago [-]
I was promised a cure for cancer, but all I got was this disproof of an Erdos problem.
empath75 2 hours ago [-]
It's a large set of problems that are both interesting and difficult, but not seen as foundational enough or important enough that they have already had sustained attention on them by mathematicians for decades or centuries, and so they might actually be solvable by an LLM.
1qaboutecs 2 hours ago [-]
Also fewer prerequisites to understand the statement than the average research problem.
aurareturn 3 hours ago [-]
One thing that seems certain is that OpenAI models hold a distinct lead in academia over Anthropic and Google models.
For those in academia, is OpenAI the vendor of choice?
Jcampuzano2 2 hours ago [-]
OpenAI specifically targeted Academia a lot and gave out a lot of free/unlimited usage to top academics and universities/researchers.
They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.
tracerbulletx 2 hours ago [-]
Hasn't AlphaFold been used to make real discoveries for a few years now?
KalMann 38 minutes ago [-]
I think he's talking about reasoning models.
karmasimida 2 hours ago [-]
I think the mathematicians on X are all using GPT 5.5 Pro
bayindirh 2 hours ago [-]
From my limited testing, Gemini can dig out hard to find information, provided your prompt is detailed enough.
Given that Google is the "web indexing company", finding hard to find things is natural for their models, and this is the only thing I need these models for.
If I can't find it for a week digging the internet, I give it a colossal prompt, and it digs out what I'm looking for.
senrex 59 minutes ago [-]
This is my experience too. Gemini and Gemini deep research are awesome. Claude's deep research is pretty bad really relative to ChatGPT or Gemini.
Overall, I still love Claude the best but it is not what I would want to use if I wanted to really dig into deep research.
The export to google docs in Gemini deep research is tough to beat too. I haven't used Gemini since January but have probably years of material from saved deep research in google docs. Almost an overwhelming amount of information when I dive into what I saved.
FloorEgg 2 hours ago [-]
Gemini seems better trained for learning and I think Google has made a more deliberate effort to optimize for pedagogical best practices. (E.g. tutoring, formative feedback, cognitive load optimization)
As far as academic research is concerned (e.g. this thread's topic), I can't say.
snaking0776 2 hours ago [-]
Agreed I usually use Gemini for explaining concepts and ChatGPT for getting things done on research projects.
aurareturn 2 hours ago [-]
Yes, I meant academic research.
cute_boi 2 hours ago [-]
Gemini is like someone with short-term memory loss; after the first response, it forgets everything. That being said, I have checked multiple models, and Gemini can sometimes give accurate answers.
logicchains 2 hours ago [-]
OpenAI models seem to have been trained on a lot of auto-generated theorem proving data; GPT 5.5 is really good at writing Lean.
causal 2 hours ago [-]
A simpler explanation is that more people are using ChatGPT
endymi0n 2 hours ago [-]
To paraphrase Gwynne Shotwell: “Not too bad for just a large Markov chain, eh?”
rhubarbtree 2 hours ago [-]
Erdos, or the model?
agentultra 18 minutes ago [-]
I’m curious about the “autonomous” claim. Usually these systems require a human to guide and verify steps, clarify problems, etc. Are they claiming that the reinforcement model wasn’t given any inputs, tools, guidance, or training data from humans?
dwroberts 2 hours ago [-]
Would be interesting to know what kind of preparatory work actually went into this - how long did it take to construct an input that produced a real result, and how much input did they get from actual mathematicians to guide refining it
Jeff_Brown 3 hours ago [-]
Can anyone find (or draw) a picture of the construction?
gibspaulding 2 hours ago [-]
This is only a proof that a field with more connections is possible, not what it looks like.
I’m very out of my depth, but the structure of the proof seems to follow a pattern similar to a proof by contradiction. Where you’d say for example “assume for the sake of contradiction that the previously known limit is the highest possible” then prove that if that statement is true you get some impossible result.
ninjha 2 hours ago [-]
They only proved that one exists; computing the actual construction is non-obvious (the naive way to construct it is computationally infeasible).
pradn 2 hours ago [-]
They have a "before" picture but not an "after"!
paulddraper 37 minutes ago [-]
Yeah, unfortunately, they just proved there existed a better solution, they didn't construct it.
(Though in some ways that's actually more impressive.)
Fraterkes 2 hours ago [-]
I guess if this stuff is going to make my employment more precarious, it’d be nice if it also makes some scientific breakthroughs. We’ll see
ausbah 2 hours ago [-]
shame we won’t see any of these medical breakthroughs when we all lose our jobs and thus our healthcare
karmasimida 2 hours ago [-]
There is a world that AI makes medical breakthroughs, but there is 0 guarantee it is going to be affordable
cubefox 2 hours ago [-]
Breakthroughs in pure mathematics aren't scientific though. They tell us nothing about the world, and they are not useful.
CGMthrowaway 1 hours ago [-]
How do you even get an LLM to try to solve one of these problems? When I ask it just comes back with the name of the problem and saying "it can't be done"
KalMann 15 minutes ago [-]
Maybe you need to phrase it better. Like with a more specific direction of thinking.
throwaway2027 2 hours ago [-]
Not to dismiss the AI but the important part is that you still need someone able to recognize these solutions in the first place. A lot of things were just hidden in plain sight before AI but no one noticed or didn't have the framework either in maths or any other field they're specialized in to recognize those feats.
famouswaffles 2 hours ago [-]
Another entry in a growing list of the last couple months (interestingly mostly OpenAI):
> AI is about to start taking a very serious role in the creative parts of research, and most importantly AI research itself. While this progress is not unexpected, it reinforces the urgency we feel about understanding this next phase of AI development, the challenges of aligning very intelligent systems, and the future of human-AI collaboration.
I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.
trostaft 1 hours ago [-]
Speaking as a postdoc in math, I must say that this is rather exciting. This is outside of my field, but the companion remarks document is quite digestible. It appears as though the proof here is fairly inspired by results in the literature, but the tweaks are non-trivial. Or, at least to me, they appear to be substantial to where I would consider the entire publication novel and exciting.
Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up with a back and forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =) .
Nevertheless, one thing that struck me in this writeup, was the lack of attribution in the quoted final response from the model. In a field like math, where most research is posted publicly and is available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading, it's very important to me that these are contained in the actual model output.
One more note: the comments on articles like these on HN and otherwise are usually pretty negative / downcast. There's great reason for that, what with how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.
auggierose 21 minutes ago [-]
Which model did this? Is it available to the public?
atleastoptimal 1 hours ago [-]
To all AI skeptics:
What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?
KalMann 10 minutes ago [-]
I think there's been natural but steady progress since 2024 with the release of the o1 model, which showed impressive reasoning capabilities. But I think it's wrong to look at the magnitude of the accomplishments and assume that will be field independent. We don't know the range of problems reasoning techniques are useful for. What we see here is refinement of capabilities that have been noticeable for years.
enoint 1 hours ago [-]
That’s one possibility. If it fails to convince a critical mass that it’s a net improvement in their lives, then the impediment to continual improvement will be sabotage.
rzmmm 29 minutes ago [-]
Maybe after decades. 2022 models were microscopic compared to latest models.
xandrius 60 minutes ago [-]
You should really look up a video about what GPTs fundamentally are.
Rover222 44 minutes ago [-]
You should also really look up a video about what neural synapses really are.
alansaber 3 hours ago [-]
AI isn't going to supercharge science but I wouldn't be as dismissive as other posters here.
tombert 2 hours ago [-]
I'm not a scientist but I like to LARP as one in my free time, and I have found ChatGPT/Claude extremely useful for research, and I'd go as far as to say it supercharged it for me.
When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.
This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one can simply be because you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. By working from the "lowest common denominator" of papers first, it generally makes the entire process easier.
I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice pretty graph for me makes it considerably easier for me to read and understand more papers.
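For anyone who wants to try the "dependency graph, then read bottom-up" workflow described above, here is a minimal sketch. The paper names and citation edges are made up for illustration, and it assumes the citation graph has no cycles; it is not how any particular LLM or tool builds the graph, only one way to turn such a graph into a reading order.

```python
from graphlib import TopologicalSorter

# Hypothetical citation data: each paper maps to the set of papers it cites.
citations = {
    "advanced-paper": {"survey", "core-method"},
    "survey": {"foundations"},
    "core-method": {"foundations"},
    "foundations": set(),
}


def reading_order(cites):
    """Order papers so that every cited paper comes before the paper citing it."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so cited papers are emitted first (assumes no citation cycles).
    return list(TopologicalSorter(cites).static_order())


order = reading_order(citations)
print(order)
```

The most foundational paper comes out first and the paper you actually care about comes out last, which matches the bottom-up reading described in the comment.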
kingkongjaffa 2 hours ago [-]
One heuristic I used during my masters degree research thesis was to look for the seminal people or papers in a field by using google scholar to find the most cited research papers and then reading everything else by that author / looking at the paper's references for others. You often only need to go back 3-4 papers to find some really seminal/foundational stuff.
tombert 2 hours ago [-]
Yeah, that's actually how I discovered Leslie Lamport like ten years ago. I was looking for papers on distributed consensus, and it's hard not to come across Paxos when doing that. It turns out that he has oodles of really great papers across a lot of different cool things in computer science and I feel like I understand a lot more about this space because of it.
It doesn't hurt that Lamport is exceptionally good at explaining things in plain language compared to a lot of other computer scientists.
vatsachak 2 hours ago [-]
I absolutely believe that AI will supercharge science.
I do not believe it will replace humans.
unsupp0rted 2 hours ago [-]
I absolutely believe that AI will supercharge science and replace humans.
Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together
geraneum 1 hours ago [-]
> Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together
Goodness gracious!
vatsachak 1 hours ago [-]
Well, for starters AI doesn't have goals. If there was a super intelligence with goals, why would they work for us?
devttyeu 58 minutes ago [-]
Fwiw if you trained an LLM in an RL sandbox that would require it to have goals, the output llm probably would "have goals"
stonogo 2 hours ago [-]
Not like large language models, which only required tens of megawatts of power and use highly efficient monte carlo methods, eh
TheOtherHobbes 1 hours ago [-]
Individual humans are processing nodes on human culture as a whole, which runs on rather more than tens of megawatts.
unsupp0rted 44 minutes ago [-]
Also it costs a lot to train and run individual humans, and they can only be run for brief periods per day before they crash, hallucinate and possibly get irretrievably broken.
seydor 2 hours ago [-]
replace, no. obsolete, yes
dvfjsdhgfv 2 hours ago [-]
lol
(That's the first time I used that expression on HN.)
OldGreenYodaGPT 3 hours ago [-]
Isn’t that a joke? It already has supercharged science
ks2048 2 hours ago [-]
Since "supercharged science" is as ill-defined as AGI, ASI, etc., people will be able to debate it endlessly for no reason.
datsci_est_2015 2 hours ago [-]
Where are the second order effects of this supercharging of science? Or has it not been enough time for those to propagate?
comboy 2 hours ago [-]
Not only has it supercharged science, it supercharges scientists. Research on any narrow topic is a different world now. Agents can read 50 papers for you and tell you what's where. This was impossible with pure text search. Looking up non-trivial stuff and having complex things explained to you is also amazing. I mean they don't even have to be complex, but can be from an adjacent field where these are basics of the other field but happen to be useful in yours. The list goes on. It's a hammer: you need to watch your fingers, and it's not good at cutting wood, but it's definitely worth having.
dvfjsdhgfv 2 hours ago [-]
It's a very heavy hammer. I used it in the way you describe and after double-checking noticed some crucial details were missed and certain facts were subtly misrepresented.
But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.
Karrot_Kream 2 hours ago [-]
I don't think there's a substitute for reading the source material. You have to read the actual paper that's cited. You have to read the code that's being sourced/generated. But used as a reasoning search engine, it's a huge enabler. I mean so much of research literally is reasoning through piles of existing research. There's probably a large amount of good research (especially the kind that doesn't easily get grant funding) that can "easily" shake out through existing literature that humans just haven't been able to synthesize correctly.
karmasimida 2 hours ago [-]
To be strict, Math is not Science.
But AI is supercharging Math like there is no tomorrow.
renegade-otter 2 hours ago [-]
It will notice things that humans may have missed. That said - it can only work off the body of work SOMEONE did in the past.
throw-the-towel 2 hours ago [-]
> it can only work off the body of work SOMEONE did in the past.
And so do humans. Gotta stand on these shoulders of giants.
bel8 2 hours ago [-]
Can't the previous body of work be from AI too?
renegade-otter 44 minutes ago [-]
Of course it can be, but it's overeager. No matter what your context window is, we will use AI collectively to flood the zone with shit.
ks2048 2 hours ago [-]
Timothy Gowers' tweet about this: "If you are a mathematician, then you may want to make sure you are sitting down before reading further."
woah.
dadrian 2 hours ago [-]
While the result is impressive, this blog post is extremely disappointing.
- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)
- It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.
- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.
If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.
Al-Khwarizmi 2 hours ago [-]
Indeed, it's a pity. While many advanced math problems are highly abstract or convoluted to explain to a layman audience, this one in particular is about points in a 2D plane and distances. A drawing would have been nice.
changoplatanero 2 hours ago [-]
apparently the proof is not constructive in the sense of not giving an easy to compute recipe for generating a set of points that you can plot on a 2d plane
yusufozkan 2 hours ago [-]
"The proof came from a general-purpose reasoning model, not a system built specifically to solve math problems or this problem in particular, and represents an important milestone for the math and AI communities."
seydor 2 hours ago [-]
all reasoning is .. well problem reasoning. restricting black-box AIs to specific human-defined domains because we believe that's better is such a human-ist thing to do.
Kwantuum 2 hours ago [-]
I trust openAI's marketing team 100%
krackers 2 hours ago [-]
It seems plausible given that people have been using off the shelf 5.5 xhigh to decent success with some Erdos problems. There is likely still some scaffolding around it though (like parallel sampling or a separate verifier step) since it's not clear if you can just "one shot" problems like this.
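To make "parallel sampling plus a separate verifier step" concrete, here is a toy sketch of the generic pattern. Nothing here reflects how OpenAI actually ran the model: `propose`, `verify`, and `best_of_k` are hypothetical names, the "model" is a random guesser, and the task (finding a nontrivial factor) is chosen only because it is cheap to check. Samples are drawn in a loop, but they are independent, so they could run in parallel.

```python
import random


def propose(n, rng):
    """Stand-in for one model sample: guess a candidate factor of n at random."""
    return rng.randrange(2, n)


def verify(n, candidate):
    """Independent check: accept only a nontrivial divisor of n."""
    return 1 < candidate < n and n % candidate == 0


def best_of_k(n, k, seed=0):
    """Draw up to k samples and return the first one the verifier accepts, else None."""
    rng = random.Random(seed)
    for _ in range(k):
        candidate = propose(n, rng)
        if verify(n, candidate):
            return candidate
    return None


answer = best_of_k(91, k=200)
print(answer)  # either None or a verified factor of 91
```

The point of the pattern is that the verifier, not the sampler, decides what counts as an answer, so unreliable samples only cost compute.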
phkahler 2 hours ago [-]
I would have thought a triangular grid works better than a grid of squares. You get ~3n links vs ~2n for the square grid. Curious what the AI came up with.
comboy 2 hours ago [-]
Yes, not providing visualization of the solution seems criminal.
red_admiral 2 hours ago [-]
Unless it's a non-constructive proof.
kmeisthax 2 hours ago [-]
Knowing OpenAI, the solution's probably being withheld as a trade secret, lest it fall victim to distillation attacks (i.e. exactly the same shit they did to the open Internet).
bustermellotron 1 hours ago [-]
The grid of squares actually gets > Cn for any C. (More in fact… C can grow like n^(a/loglog(n)).) The AI proved > n^{1 + b} for some small b > 0, which a human (Will Sawin) has now proved can be about b = 0.014. The grid can be rescaled so the edges are not necessarily length 1, but other pairs will have length 1; that is necessary to get more than 2n unit distances.
kilotaras 2 hours ago [-]
Both 3n and 2n are linear, the broken conjecture is that you can't do better than linear.
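The "~2n vs ~3n" figures in this sub-thread are easy to check by brute force. The sketch below counts unit-distance pairs in a k-by-k patch of the plain (unscaled) square lattice and of the triangular lattice, using exact integer arithmetic. The helper names are made up for this example, and it says nothing about the rescaled grids or the new construction discussed above; it only confirms that both plain lattices give a count that grows linearly in n.

```python
from itertools import combinations


def count_unit_pairs(points, sq_dist):
    """Count unordered pairs of points whose squared distance is exactly 1."""
    return sum(1 for p, q in combinations(points, 2) if sq_dist(p, q) == 1)


def square_sq_dist(p, q):
    """Squared Euclidean distance between points of the integer grid."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return dx * dx + dy * dy


def triangular_sq_dist(p, q):
    """Squared distance in the triangular lattice.

    Lattice coordinates (a, b) stand for the point a*(1, 0) + b*(1/2, sqrt(3)/2),
    whose squared length works out to a^2 + a*b + b^2, so no floats are needed.
    """
    da, db = p[0] - q[0], p[1] - q[1]
    return da * da + da * db + db * db


def patch(k):
    """All k*k lattice coordinates (a, b) with 0 <= a, b < k."""
    return [(a, b) for a in range(k) for b in range(k)]


for k in (5, 10, 20):
    n = k * k
    sq = count_unit_pairs(patch(k), square_sq_dist)
    tri = count_unit_pairs(patch(k), triangular_sq_dist)
    print(n, sq, tri, round(sq / n, 2), round(tri / n, 2))
```

As k grows, the ratios approach 2 and 3 respectively (the shortfall is a boundary effect), which is the linear behavior the comments refer to.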
_heimdall 39 minutes ago [-]
As this becomes more common it makes me wonder where the LLM ends and the harness begins.
The underlying model may still effectively be a stochastic parrot, but used properly that can do impressive things and the various harnesses have been getting better and better at automating the use of said parrot.
seydor 2 hours ago [-]
can the AI please tell us what to do now that all knowledge work will become unemployment?
bmacho 55 minutes ago [-]
Physical labour?
solomatov 2 hours ago [-]
How central is it in discrete geometry? Could anyone with knowledge of the field reply?
sigmar 2 hours ago [-]
The blog post links a pdf that OpenAI put together of nine mathematicians that commented on the proof. Each is quite brief and written in accessible terms (or more accessible terms, at least). https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...
energy123 2 hours ago [-]
There's pages of comments from like 8 mathematicians in the attached pdf
taimurshasan 2 hours ago [-]
I wonder how much this cost vs a Math Professor or a team of Math Professors.
Karrot_Kream 2 hours ago [-]
Sadly math professors aren't very expensive. Academics are paid terrible wages. Until recently, tenure was the carrot at the end of a grueling education. But now that tenure positions are getting rarer (well, tenure positions aren't growing while the number of aspiring academics is), they continue to be cheap highly educated labor.
forgot_old_user 2 hours ago [-]
it will only get cheaper in the long run
aspenmartin 2 hours ago [-]
40x cheaper per year if trends continue
dvfjsdhgfv 2 hours ago [-]
for a sufficiently long definition of long
aspenmartin 2 hours ago [-]
No, for a very short definition of long. Look at the data on how fast prices decrease for a constant level of performance.
pizzao 2 hours ago [-]
Can someone explain to me what is their "prompting-scaffolding" to make it work ?
yusufozkan 2 hours ago [-]
"This is a general-purpose LLM. It wasn’t targeted at this problem or even at mathematics. Also, it’s not a scaffold. We have not pushed this model to the limit on open problems. Our focus is to get it out quickly so that everyone can use it for themselves." - Noam Brown (OpenAI reasoning researcher) on X
catigula 2 hours ago [-]
Every time I interact even with OpenAI's pro model, I am forced to come to the conclusion that anything outside the domain of specific technical problems is almost completely hopeless outside of a simple enhanced search and summary engine.
For example, these machines, if scaling intellect so fiercely that they are solving bespoke mathematics problems, should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.
Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.
Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.
dvfjsdhgfv 2 hours ago [-]
Yeah, I remember it was one of my biggest disappointments with LLMs.
somewhereoutth 1 hours ago [-]
The real test would be if an LLM makes an important conjecture.
Kye 2 hours ago [-]
Is this something that can be made explainable to someone without any of the relevant background, or is this one of those things where all that background is needed to understand it? Because I have no idea what's going on here, but would like to.
empath75 3 hours ago [-]
Important note: this was not done with a special mathematics harness or specialized workflow.
dwroberts 2 hours ago [-]
How/why should we know this? The text does not explain the process.
analognoise 23 minutes ago [-]
Back when “term rewriting” was “AI”, multiple math tools were released that took known math facts and did tricks like uncovering new integrals - apply the pattern in some depth in a tree, see what pops out.
What was discovered were numerous mistakes in the published literature on the subject. “New math! AI!” No, just mechanical application of rules, human mistakes.
There were things that were theorized, but couldn’t be exhaustively checked until computers were bigger.
Once again, a tool is applied, it has the AI label - it's progress! But it isn’t something new. It’s just an LLM.
There’s a consistent underappreciation of AI (and math, honestly), but watching soulless AI mongers declare that their toy has created the new is something of a new low; uninspired, failed creatives, without rhyme or context; this is a bigger version of declaring that your spell checker has created new words.
The result is more impressive than what was done with tables of integrals and SAINT in 1961, sure.
Apparently if you add a “temperature” knob to a text predictor, otherwise sane individuals piss themselves and call it new.
Then again I thought NFTs, crypto, and the Metaverse were stupid, so what do I know.
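For readers unfamiliar with the "term rewriting" mentioned in the comment above, here is a toy sketch of the idea: a handful of algebraic rules applied mechanically over an expression tree until nothing changes. It is not SAINT or any historical system; the tuple encoding and the rule set are assumptions chosen to keep the example small.

```python
def rewrite_once(expr):
    """Apply simplification rules bottom-up over a nested-tuple expression tree.

    Expressions are either atoms (numbers or variable names) or
    tuples of the form (operator, left, right).
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = rewrite_once(left), rewrite_once(right)
    if op == "+" and right == 0:      # x + 0 -> x
        return left
    if op == "+" and left == 0:       # 0 + x -> x
        return right
    if op == "*" and (left == 0 or right == 0):  # x * 0 -> 0
        return 0
    if op == "*" and right == 1:      # x * 1 -> x
        return left
    if op == "*" and left == 1:       # 1 * x -> x
        return right
    return (op, left, right)


def simplify(expr):
    """Rewrite repeatedly until a fixed point is reached."""
    while True:
        new = rewrite_once(expr)
        if new == expr:
            return expr
        expr = new


print(simplify(("+", ("*", "x", 1), ("*", "y", 0))))
```

Real systems use far larger rule tables and search over which rule to apply where, but the loop of "match a pattern in the tree, replace, repeat" is the same.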
arsan87 2 hours ago [-]
neato. can we do anything with this newfound knowledge or is this mathematical sports?
can we please put these groundbreaking AIs to work on actual problems humans have?
clarle 2 hours ago [-]
People thought neural networks were just an interesting thought exercise a few decades ago and not for practical ML problems, and look what happened since then.
dist-epoch 3 hours ago [-]
[flagged]
embedding-shape 3 hours ago [-]
> It's not a new result, LLMs can't produce new results
Who else disproved this longstanding conjecture before the model did so, since obviously it must have been in the training data since before?
ekjhgkejhgk 3 hours ago [-]
Your understanding of this technology is out of date, and getting out of date faster as time goes by.
throwaw12 3 hours ago [-]
Thanks for giving me hope that there is still a place for human knowledge workers.
bradleykingz 2 hours ago [-]
ok. so what are the implications for math
brcmthrowaway 2 hours ago [-]
End times are approaching
reactordev 2 hours ago [-]
I dunno, I'm skeptical without proof. I've had the MAX+ plan for a while and I'm sorry, the quality between GPT vs Claude is night and day difference. Claude understands. GPT stumbles over every request I give it.
nathan_compton 2 hours ago [-]
Weird thing to say about a report which literally has the attached mathematical proof.
reactordev 2 hours ago [-]
Except it's not a proof. It's an existential proof of what? Projecting points and losing density? Nah, it's wrong. At least with Erdos you could solve f(x) or not solve it (inf). You cannot with this. All they did was balance a really fancy quadratic equation. The projection from C^f to R² doesn't demonstrate geometric injectivity, so nⱼ = |X| isn't established, and the bound collapses.
Negative numbers were invented to solve equations which only used naturals. Irrationals were invented to solve equations which could be expressed with rationals. Complex numbers were invented to represent solutions to polynomials. So on and so forth. At each point new ideas are invented to answer some otherwise unanswerable questions. There is a long history of this. That any closed system has unanswerable questions within itself is a paraphrase of Gödel's incompleteness theorem.
I'd say yes, LLMs "just" recombine things. I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.
Also we shouldn’t be thinking about what LLMs are good at, but rather what any computer ever might be good at. LLMs are already only one (essential!) part of the system that produced this result, and we’ve only had them for 3 years.
Also also this is a tiny nitpick but: the Fields Medal is awarded every 4 years, AFAIR. For that exact reason, probably!
It's amazing to me when people talk about recombining things, or following up on things, as somehow lesser work.
People can't set aside the perspective they were given when they learned the concepts, a perspective that those who developed the concepts didn't have, because the concepts didn't exist yet.
Simple things are hard, or everything simple would have been done hundreds of years ago, and that is certainly not the case. Seeing something others have not noticed is very hard, when we don't have the concepts that the "invisible" things right in front of us will teach us.
The experiment is feasible. If it were performed and produced a positive result, what would it imply/change about how you see LLMs?
Besides, we can forecast our thoughts and actions to imagined scenarios unconditioned on their possibility. Something doesn't have to be possible for us to imagine our reactions.
There are people working on this.
e.g. https://github.com/haykgrigo3/TimeCapsuleLLM
Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created"—and you have people like Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois, etc. that treat math more like an art than a science.
For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
A scientist has to extract the "Creation" from an abstract dimension using the tools of "human knowledge". The creativity is often selecting the best set of tools or recombining tools to access the platonic space. For instance a "telescope" is not a new creation, it is recombination of something which already existed: lenses.
How can we truly create something ? Everything is built upon something.
You could argue that even "numbers" are a creation, but are they ? Aren't they just a tool to access an abstract concept of counting ? ... Symbols.. abstractions.
Another angle to look at it, even in dreams do we really create something new ? or we dream about "things" (i.e. data) we have ingested in our waking life. Someone could argue that dreams truly create something as the exact set of events never happened anywhere in the real world... but we all know that dreams are derived.. derived from brain chemistry, experiences and so on. We may not have the reduction of how each and every thing works.
Just like energy is conserved, IMO everything we call "created" is just a changed form of "something". I fully believe LLMs (and humans) both can create tools to change the forms. Nothing new is being "created", just convenient tools which abstract upon some nature of reality.
Humans and animals have intuitive notions of space and motion since they can obviously move. But, symbolizing such intuitions into forms and communicating that via language is the creative act. Birds can fly, but can they symbolize that intuitive intelligence to create a theory of flight and then use that to build a plane ?
Well I think the point is there is no "new kind of math". There's just types of math we've discovered and what we haven't. No new math is created, just found.
We're not comparing math to reality (though there's a strong argument to be made that reality has a structure that is mathematical in nature - structural realism didn't die as a scientific philosophy just because someone came up with a pithy saying), we're talking about whether math is discovered or invented.
Most mathematicians would argue both - math is a language, we have created operations, axioms are proposed based on human creativity, etc., but the actual laws, patterns, etc. are discovered. Pi is going to be pi no matter if you're a human or someone else - we might represent it differently with some other number system or whatever, but that's a matter of representation, not mathematical truth.
However, if that idea about new math is correct, we, in theory, don’t need new math to (dis)prove the Riemann hypothesis (assuming it is provable or disprovable in the current system).
In practice we may still need new math because a proof of the Riemann hypothesis using our current arsenal of mathematical ‘objects’ may be enormously large, making it hard to find.
In concrete terms: could a thousand LLM-driven agents running on supercomputers (500 of which are dedicated to building software for the other 500) come up with new math?
I honestly don't know either way. Based on my limited understanding of how LLMs work, I don't see them making the next great song or the next great book, and based on that reasoning I'm betting that they probably won't be able to do whatever the next "Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois" are going to do.
Of course, if AI as a wider field comes up with something more powerful than LLMs, that would be different.
Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
Also - music is subjective. Mathematics isn't.
And in this case, an LLM discovered a new way to reason about a conjecture. I don't know how much proof is needed - since that is literally proof that it can be done.
There are quite a few questions around that. Music is subjective and obviously different people have different tastes, but I wouldn't call any of those songs actual good music / real hits.
>> LLM discovered a new way to reason about a conjecture
I wasn't questioning LLMs' ability to prove things. Parent threads were talking about building a new kind of maths, or approaching it in a creative/artistic way. That's what I was referring to.
I can't speak for maths or hard science as I'm not trained in them, but the creativity aspect in code is definitely lacking when it comes to LLMs. May not matter down the line.
because I have no basis for assuming an LLM is fundamentally capable of doing this.
"Never shall I be beaten by a machine!”
In 1997 he lost to Deep Blue.
Train an LLM only on texts dated prior to Newton and see if it can create calculus, derive the equations of motion, etc.
If you ask it about the nature of light and it directs you to do experiments with a prism I'd say we're really getting somewhere.
Incidentally, similar conversations were had about ML writ large vs. classical statistics/methods, and now they've more or less completely died down since it's clear who won (I'm not saying classical methods are useless, but rather that it's obvious the naysayers were wrong). I anticipate the same trajectory here. The main difference is that because of the nature of the domain, everyone has an opinion on LLMs, while the ML vs. statistics battle was mostly confined to technical/academic spaces.
I can assure you, the percentage of people who can do what they do when it comes to crafting terms, and related sets of terms, for nuanced and novel ideas is very very small.
It happens this is something I do nearly every day.
Models respond to the level of dialogue you have with them. Engage with an informed perspective on terminological issues and they respond with deep perspectives.
I am routinely baffled at the things people say models can't do, that they do effortlessly. Interaction and having some skill to contribute helps here.
In the end, creativity has always been a combination of chance and the application of known patterns in new contexts.
If you know anything about the invention of new math (analytic geometry, Calculus, etc.), you'd know how untrue this is. In fact, Calculus was extremely hand-wavy and without rigorous underpinnings until the mid 1800s. Again: more art than science.
If anything, they were fighting an uphill battle against the perception of hand-waving by their contemporaries.
Yes, and it's pretty common knowledge that Calculus was (finally) formalized by Weierstrass in the mid-to-late 19th century, having spent almost two centuries in mathematical limbo. Calculus was intuitive, solved a great class of problems, but its roots were very much (ironically) vibes-based.
This isn't unique to Newton or Leibniz, Euler did all kinds of "illegal" things (like playing with divergent series, treating differentials as actual quantities, etc.) which worked out and solved problems, but were also not formalized until much later.
LLMs are prompted by humans and the right query may make it think/behave in a way to create a novel solution.
Then there's a third factor now: agentic AI systems that run LLMs in a loop, where the model can research, try, and experiment in its own loop that's tied to the real world for feedback.
Agentic + LLM + Initial Human Prompter by definition can have it experiment outside of its domain of expertise.
So that's extending the "LLMs can't create novel ideas" framing, but I don't think anyone can disagree that the three elements above are enough ingredients for an AI to come up with novel ideas.
We just haven't let AI run wild yet. But it's coming.
That's not a creative prompt. That's a driving prompt to get it to start its engine.
You could do that nowadays, and while it may spend $1,000 to $100,000 worth of tokens, it will create something humans haven't done before as long as you set it up with all its tool calls/permissions.
Who decides the last point at which it’s OK to provide text to the model while still being able to describe it as creative? (non-rhetorical)
The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.
Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.
Or like a musical octave has only 12 semitones, so all music is just a selection from a finite set that already existed.
Sure the insane computation we're throwing at this changes our perspective, but still there is an important distinction.
Like, "does the Riemann zeta function have nontrivial zeroes that don't have real part 1/2," or "is there a better solution to the Erdős Unit Distance Problem."
The selection of the question is a matter of taste, but once selected, there is a definitive, precise answer.
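For reference, the first of those questions can be written down precisely. A short LaTeX statement, using the standard definition of the zeta function (extended to the rest of the complex plane by analytic continuation); the symbol $\rho$ for a zero is just notation chosen here:

```latex
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}} \quad \text{for } \operatorname{Re}(s) > 1,
\qquad
\text{RH:}\quad \zeta(\rho) = 0 \ \text{and}\ 0 < \operatorname{Re}(\rho) < 1
\;\implies\; \operatorname{Re}(\rho) = \tfrac{1}{2}.
```

Whatever one thinks of the choice of question, a single zero in the critical strip off the line $\operatorname{Re}(s) = \tfrac{1}{2}$ would settle it.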
Who knew Obi-Wan was just smoking and pontificating on Wittgenstein.
Isn't this exactly what chain-of-thought does? It's doing computation by emitting tokens forward into its context, so it can represent states wider than its residuals and so it can evaluate functions not expressed by one forward pass through the weights. It just happens to look like a person thinking out loud because those were the most useful patterns from the training data.
E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.
I would claim the graph exists, and seeing it is more of a knowledge problem. Creativity, to me, is the ability to reject existing edges and add nodes to the graph AND mentally test them to some sufficient confidence that a practical attempt will probably work (this is what differentiates it from random guessing).
But, as you become more of an expert on a certain problem space (graph), that happens less frequently, and everything trends towards "obvious", or the "creative jumps" are super slight, with a node obviously already there. If you extended that to the max, an oracle can't be creative.
Maybe I just need sparser graphs to play in. :)
That said, I think it’s worth saying that “LLMs just interpolate their training data” is usually framed as a rhetorical statement motivated by emotion and the speaker’s hostility to LLMs. What they usually mean is some stronger version, which is “LLMs are just stochastically spouting stuff from their training data without having any internal model of concepts or meaning or logic.” I think that idea was already refuted by LLMs getting quite good at mathematics about a year ago (Gold on the IMO), combined with the mechanistic interpretability research that was actually able to point to small sections of the network that model higher concepts, counting, etc. LLMs actually proving and disproving novel mathematical results is just the final nail in the coffin. At this point I’m not even sure how to engage with people who still deny all this. The debate has moved on and it’s not even interesting anymore.
So yes, I agree with you, and I’m even happy to say that what I say and do in life myself is in some broad sense an interpolation of the sum of my experiences and my genetic legacy. What else would it be? Creativity is maybe just fortunate remixing of existing ideas and experiences and skills with a bit of randomness and good luck thrown in (“Great artists steal”, and all that.) But that’s not usually what people mean when they say similar-sounding things about LLMs.
I'm not even sure why they were invoked. Even disregarding the big technical debunks such as Quine's "Two Dogmas of Empiricism", sociologically and even by talking to real mathematicians (see Lakatos, historically, but this is true anecdotally too), it's (ironically) a complete non-question to wonder about mathematics in a logical positivist way.
You can watch a rock roll down a hill and derive the concept for the wheel.
Seems pretty self evident to me
They will do their own thing, don't need us. In fact, we will be in the way...
We can choose to study them and their output, but they don't make us better mathematicians...
However, in the role of personal teachers they may allow especially our young generations to reach a deeper understanding of maths (and also other topics) much quicker than before. If everyone can have a personal explanation machine to very efficiently satisfy their thirst for knowledge this may well lead to more good mathematicians.
Of course this heavily depends on whether we can get LLMs’ outputs to be accurate enough.
But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged upon to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.
THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.
Take fire, for example, it was in nature for billions of years before we as a species understood that it needed air, fuel, and heat in order for it to exist at all, and we then leveraged that knowledge into controlling fire - creating, growing, reducing, destroying it.
LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor does it appear to know how to leverage them.
edit: I am going to go further
We have only in the last couple of hundred years realised how to see things that are smaller than what our eyes can naturally see - we've used "glass" to see bacteria, and spores, and we've realised that we can use electrons to see even smaller
We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms
That much is derived from previous knowledge
What isn't, and what LLMs cannot create, are the tools by which we can detect or see these incredibly small things
Cracks me up.
What exactly do we think that human brains do?
Maybe computers can help understand better because by now it's pretty clear brains aren't just LLMs.
The pessimists just see a 20W meat computer.
A lot of people across all fields seem to operate in a mode of information lookup as intelligence. They have the memory of solving particular problems, and when faced with a new problem, they basically do a "nearest search" in their brain to find the most similar problem, and apply the same principles to it.
While that works for a large number of tasks this intelligence is not the same as reasoning.
Reasoning is the ability to discover new information that you haven't seen before (i.e., growing a new branch on the knowledge tree instead of interpolating).
Think of it like filling a space on the floor of arbitrary shape with smaller arbitrary shapes, trying to fill as much space as possible.
With interpolation, your smaller shapes are medium size, each with a non rectangular shape. You may have a large library of them, but in the end, there are just certain floor spaces that you won't be able to fill fully.
Reasoning, on the flip side, is having access to very fine shapes, and knowing the procedure for how to stack shapes depending on what shapes are next to them and whether you are on a boundary of the floor space or not. Using these rules, you can fill pretty much any floor space fully.
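A toy sketch of that contrast, under heavy assumptions: the "problems" are integers, the "answer" is their square, and `memory`, `nearest_lookup` and `apply_rule` are made-up names for illustration. It is only meant to show where a nearest-match lookup stops working, not to model how people or LLMs actually think.

```python
# Toy contrast between "nearest search" over remembered problems
# and applying a known procedure. Purely illustrative.

# Assumption: the solved problems in memory are the squares of 1..10.
memory = {n: n * n for n in range(1, 11)}


def nearest_lookup(x):
    """Answer by recalling the most similar memorized problem."""
    closest = min(memory, key=lambda n: abs(n - x))
    return memory[closest]


def apply_rule(x):
    """Answer by applying the underlying procedure directly."""
    return x * x


for x in (7, 25):
    print(x, nearest_lookup(x), apply_rule(x))
```

Inside the memorized range (7) both approaches agree. Outside it (25), the lookup can only return the answer to its closest remembered problem, while the rule still applies.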
Yes?
Said differently, what is prediction but composition projected forward through time/ideas?
Exactly. I also only write one word at a time. Who knows what is going on in order to come up with that word.
Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.
Reading the article, that seems exactly how the discovery was made, an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
Isn't this just anthropocentrism? Why is understanding only valid if a human does it? Why is knowledge only for humans? If another species resolved the contradictions between gravity and quantum mechanics, does that not have meaning unless they explain it to us and we understand it?
Though perhaps more to your point, if some superhuman AI is developed, and understands things better than us without telling us about it (or being unable to), it could perform feats that seem magical to us — that would concern us even if we don't understand it, since it affects us.
But I think in the frame of reference of the commenter you were replying to, they're just saying that the low-level AI used in this specific case is not capable of making its results actually useful to us; humans are still needed to make it human-relevant. It told us where to find a gem underground, but we still had to be the ones to dig it out, cut it, polish it, etc.
We are at the birth of the AI age and we don't know what it will look like in 100 or 1000 or 10000 or 100000 years (all those time frames likely closer than possible encounters with aliens from distant galaxies). It's even possible that AI will outlast humans.
I just wanted to highlight this very correct human-centric thought about the purpose of intellection.
I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.
In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.
Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics the model is still just connecting existing ideas.
Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.
Without knowing everything this model has been trained on, though, it is pretty hard to ascertain the extent to which it arrived at this "on its own". The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.
It isn't likely that someone solved this and then just put it in the training data, although I honestly wouldn't put that past OpenAI. More interesting though is the extent to which they've generated training data that may have touched on most or all of the "original" tenets found in this proof.
We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.
edit: >> https://techcrunch.com/2025/10/19/openais-embarrassing-math/
The ability to find incredibly obscure facts and recall them to solve "officially unsolved" problems in minutes is like Google Search on steroids. In some sense, it is one core component of "deep expertise", and humans rely on the same methodology regularly to solve "hard" problems. Many mathematicians have said that they all just use a "bag of tricks" they've picked up and apply them to problems to see if they work. The LLMs have a huge bag of very obscure tricks, and are starting to reach the point that they can effectively apply them also.
I suspect the threshold of AGI will be crossed when the AIs can invent novel "tricks" on their own, and memorise their own new approach for future use without explicitly having to have their weights updated with "offline" training runs.
In all seriousness though: My suggestion is that those shepherding the frontier of AI start acting with more transparency, and stop acting in ways that encourage conspiratorial thinking. Especially if the technology is as powerful as they market it as.
A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.
However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.
Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.
> At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]
> At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.
[0] https://en.wikipedia.org/wiki/Manna_(novel)
There's much more to being human than our "cognitive abilities"
We have had that chess board for quite a while now, over 40 years. And no, there is nothing special about Lean here, it is just herd mentality. Also, we don't know how much training with Lean helped this particular model.
https://en.wikipedia.org/wiki/Qualified_immunity
Assuming you can still sue McDonalds I am not sure if this is a problem in the robotic llm case. I'm also trying to imagine a case where you would want to sue the llm and not the company. Given robots/llm don't have free will I'm not sure the problem with qualified immunity making police unaccountable applies.
There already exist a lot of similar conventions in corporate law. Generally, a main advantage of incorporation is protecting the people making the decisions from personal lawsuits.
That only requires that someone own the AI-managed McDonald's, though. So long as they can't avoid responsibility by pointing to the AI, I don't see why you couldn't sue them.
Police are a monopoly; nobody has a choice about which police company to use. McDonalds are not a monopoly, and many customers would prefer to eat at competitors run by entities that could be sued or jailed if they did anything particularly egregious.
The same intuition applies if you walk into McDonald's and a person there mistreats you. You want that person held responsible.
But the LLM is not a person. What is there to even sue? It just seems like it would simply pass through to the corporate entity without the same tension of feeling like we let a human get away with something. Because there is no human, just a corporation and the robot servicing the place.
Put another way - if the LLM is not a person, what is the advantage of a personal lawsuit?
Just sue the McDonalds. Even in a case where the LLM is extremely misaligned and acts in a way where you might normally personally sue the McDonald's employee, I'm just not sure the human intuition about "holding someone accountable" would have its normal force because again - the LLM is not a person.
So given we already have the notions of incorporation and indemnification it doesn't make sense to say what is precluding LLMs from running McDonald's is they can't be sued. If McDonald's can still be sued, then not only is there no problem, there is very likely not even a change in the status quo.
Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.
Math is a sequence of formal rules applied to construct a proof tree. Therefore an AI trained on these rules could be far more efficient, and search far deeper into proof space
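A tiny Lean 4 illustration of what "formal rules applied to construct a proof tree" looks like in practice. The theorem name `swap_conj` is made up for this sketch; the statement is just that a conjunction can be reordered:

```lean
-- The proof term is itself the tree: the root is the pair constructor for ∧,
-- and its two children are the projections h.2 and h.1 of the hypothesis.
theorem swap_conj (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩
```

Searching "proof space" then amounts to searching for terms like this one that the checker accepts, which is why a mechanical verifier makes the search well-defined.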
The more I read about these achievements the more I get a feeling that a lot of the power of these models comes from having prior knowledge on every possible field and having zero problems transferring to new domains.
To me the potential beauty of this is that these tools might help us break through the increasing super-specialization that humans in science have to go through today, which on one hand is important but on the other hand limits a person in terms of the tooling and inspiration they have access to.
I think we still don't really comprehend how much can be achieved by a single "mind" that has internalized so much knowledge from so many areas.
What makes me more of an optimist in this case is that people who today decide to go into these sciences are mostly people who are driven by intellectual activity so I feel they are the right ones to figure this out, probably more so than us the engineers.
- Does anyone know if this was a 1 minute of inference or 1 month?
- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?
- One of the graphs says the model produced the right answer almost half the time at peak compute??? Did I understand that right? What does peak compute mean here?
Solving problems people have already stated is a niche activity in mathematical research. More often, people study something they find interesting, try to frame it in a way that can be solved with the tools they have, and then try to come up with a solution. And in the ideal case, both the framing and the solution will be interesting on their own.
1. They have a wide range of difficulties. 2. They were curated (Erdős didn't know at first glance how to solve them). 3. Humans already took the time to organize them, formally state them, and add metadata to them. 4. There's a lot of them.
If you go around looking for a mathematics benchmark it's hard to do better than that.
For those in academics, is OpenAI the vendor of choice?
They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.
Given that Google is the "web indexing company", finding hard-to-find things is natural for their models, and this is the only thing I need these models for.
If I can't find it for a week digging the internet, I give it a colossal prompt, and it digs out what I'm looking for.
As far as academic research is concerned (e.g. this thread's topic), I can't say.
I’m very out of my depth, but the structure of the proof seems to follow a pattern similar to a proof by contradiction. Where you’d say for example “assume for the sake of contradiction that the previously known limit is the highest possible” then prove that if that statement is true you get some impossible result.
(Though in some ways that's actually more impressive.)
1. Erdos 1196, GPT-5.4 Pro - https://www.scientificamerican.com/article/amateur-armed-wit...
There are a couple of other Erdos wins, but this was the most impressive, prior to the thread in question. And it's completely unsupervised.
Solution - https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
2. Single-minus gluon tree amplitudes are nonzero, GPT-5.2 - https://openai.com/index/new-result-theoretical-physics/
3. Frontier Math Open Problem, GPT-5.4 Pro and others - https://epoch.ai/frontiermath/open-problems/ramsey-hypergrap...
4. GPT-5.5 Pro - https://gowers.wordpress.com/2026/05/08/a-recent-experience-...
5. Claude's Cycles, Claude Opus 4.6 - https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...
I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.
Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up with a back and forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =) .
Nevertheless, one thing that struck me in this writeup, was the lack of attribution in the quoted final response from the model. In a field like math, where most research is posted publicly and is available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading, it's very important to me that these are contained in the actual model output.
One more note: the comments on articles like these on HN and otherwise are usually pretty negative / downcast. There's great reason for that, what with how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.
What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?
When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.
This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one can simply be because you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. By working from the "lowest common denominator" of papers first, it generally makes the entire process easier.
I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice pretty graph for me makes it considerably easier for me to read and understand more papers.
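The bottom-up reading order described here is essentially a topological sort of the citation graph. A minimal sketch, assuming a hand-written `cites` mapping (the paper names are placeholders, not real papers) and using `graphlib` from the Python standard library:

```python
from graphlib import TopologicalSorter

# Hypothetical citation graph: each paper maps to the papers it cites.
# These names are placeholders, not real papers.
cites = {
    "advanced_paper": {"intermediate_a", "intermediate_b"},
    "intermediate_a": {"foundational"},
    "intermediate_b": {"foundational"},
    "foundational": set(),
}


def reading_order(graph):
    """Return papers so that each one comes after everything it cites."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so cited papers are emitted before the papers that cite them.
    return list(TopologicalSorter(graph).static_order())


order = reading_order(cites)
print(order)
```

Real citation data can contain cycles (two papers citing each other's preprints), in which case `static_order()` raises `graphlib.CycleError`, so the graph an LLM hands back would need a sanity check first.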
It doesn't hurt that Lamport is exceptionally good at explaining things in plain language compared to a lot of other computer scientists.
I do not believe it will replace humans.
Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together
Goodness gracious!
(That's the first time I used that expression on HN.)
But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.
But AI is supercharging Math like there is no tomorrow.
And so do humans. Gotta stand on these shoulders of giants.
woah.
- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)
- It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.
- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.
If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.
The underlying model may still effectively be a stochastic parrot, but used properly that can do impressive things and the various harnesses have been getting better and better at automating the use of said parrot.
For example, if these machines are scaling in intellect so fiercely that they are solving bespoke mathematics problems, they should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.
Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.
Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.
What was discovered were numerous mistakes in the published literature on the subject. “New math! AI!” No, just mechanical application of rules, human mistakes.
There were things that were theorized, but couldn’t be exhaustively checked until computers were bigger.
Once again, a tool is applied, it has the AI label - it’s progress! But it isn’t something new. It’s just an LLM.
There’s a consistent underappreciation of AI (and math, honestly), but watching soulless AI mongers declare that their toy has created the new is something of a new low; uninspired, failed creatives, without rhyme or context; this is a bigger version of declaring that your spell checker has created new words.
The result is more impressive than what was done with tables of integrals and SAINT in 1961, sure.
Apparently if you add a “temperature” knob to a text predictor, otherwise sane individuals piss themselves and call it new.
Then again I thought NFTs, crypto, and the Metaverse were stupid, so what do I know.
Can we please put these groundbreaking AIs to work on actual problems humans have?
Who else disproved this longstanding conjecture before the model did, since obviously it must already have been in the training data?