"My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing."
It's a good reminder for us all that the competition in this space is rough and lots of more or less subtle marketing is involved.
therealpygon 2 hours ago
Anthropic using marketing to convince people their models are more advanced, better built, or that AI is a threat that needs to be regulated because only they have the answer? I’m shocked.
More seriously, so far I haven’t seen much indication that Mythos is more than Opus with a security focused code analysis harness. That said, the fact it can find these bugs in an automated fashion is the more important takeaway outside of the hype.
I’m curious what the error rate is on the detections, because none of that means much if it is wrong 90% of the time and we are only hearing about the examples that are useful marketing.
johnbarron 1 hours ago
>> Anthropic using marketing to convince people their models are more advanced, better built, or that AI is a threat that needs to be regulated because only they have the answer? I’m shocked.
I remember when OpenAI was saying GPT-2 was too dangerous to release.
stingraycharles 39 minutes ago
I remember when there was a guy at Google a few years ago who was convinced that they had an internal, sentient creature in their labs (I think maybe 4 years ago?).
If I’m not mistaken, after the media cycle, he lost his job for breaking confidentiality.
That was the opposite of marketing; Google really didn’t get how to turn this into a product until ChatGPT happened.
2ndorderthought 46 minutes ago
"it can almost like write 2 paragraphs!" "It might be conscious" "this is basically AGI, we had to fire someone who spilled the beans"
vidarh 3 hours ago
It may well be that the hype was primarily marketing.
The other alternative is that Curl is simply secure enough that there was far less to find than in other projects.
teiferer 1 hours ago
Given how much money is on the line, it would be gross negligence if anything that's not marketing came publicly out of the CEO's mouth or were otherwise published by the company.
bigcat12345678 2 hours ago
My guess:
Marketing is not intentional.
Evidence: 10 years ago, when I interviewed at Baidu AI with Andrew Ng and Dario, Dario was the kind of person who is pure-hearted to the point of being ideological. Given Dario's successful career so far, that essence has gradually grown into a conviction, and he is surrounded by a purposely built team which amplifies his ideology.
Humans are creatures of convenience, and a rare few of them are no doubt masters of convenience: they morph their mental manifold without a hint of contradiction in their own mental mechanisms.
stingraycharles 1 hours ago
These things are layered. They are great scientists, smart people, etc.
Things change when you’re running a business like Anthropic, especially as the CEO. You have a responsibility to shareholders, and you just need to play the game.
Anthropic chose a great angle: focus on professionals / enterprise, safety, etc. Those can be driven by a genuine desire to make great technology and, for business purposes, still require you to position yourself a bit “better” than reality.
Just look at what their strategy is with Mythos, it’s almost perfection: the “it’s not ready to be released to the public” angle hits all the marks: they care about responsibility / safety, they have “the best” model, and “LLMs are dangerous, but we, as the guardians, can be trusted”. This also helps the industry as a whole with regulation: if they’re being constrained, China will develop even more dangerous models.
This is a result of how smart people treat business, it’s PR perfection, especially given how much the whole industry is talking about it.
(Yes, they fail in other PR areas, but that’s a different discussion)
OtherShrezzing 1 hours ago
I'm not sure if that distinction is important, since what you've described is, less charitably, synonymous with the phrase "Dario is delusional, and has surrounded himself with yes-men, so outlandish marketing gets published as a side effect".
Whether the person doing the marketing was sincere about it or not is immaterial, since marketing is experienced almost entirely by the people consuming it, and not the people communicating it. What matters is if the audience is sincerely concerned by the message, and it's transparently the case that they were sincerely concerned by it.
teiferer 1 hours ago
> Marketing is not intentional.
That's an odd definition of "intentional". Evolution has filtered for people with certain views and the marketing has just emerged from their actions. ... So?
A deadly virus (a naturally occurring one, let's say) wasn't created intentionally. Evolution selected for it. It's still bad and kills people. The lack of intention doesn't make it nice.
thombles 1 hours ago
Curl simply isn't a good data point. It's one of the most picked-over codebases in existence with extensive security testing practices. All the researchers using not-quite-Mythos models have had plenty of time to report bugs up to this point. Daniel may be right that Mythos hasn't been a game changer for curl but the preconditions are different for virtually any other codebase. Perhaps the real marketing here is his own modesty about curl's maturity.
GuB-42 43 minutes ago
To me, it is a very good data point.
Curl uses all sorts of tools, including AI tools, to find bugs. These tools, according to the article, found hundreds of bugs, including a dozen CVEs.
Mythos found one vulnerability. It means Mythos is just another tool, not the revolution it claims to be.
It is common that when a new tool is introduced, a bunch of bugs are found, with diminishing returns. Mythos finding one vulnerability is consistent with what I would expect from a major update to an existing tool, which Mythos is over existing LLM-based solutions.
thombles 29 minutes ago
The question is how many security vulnerabilities are actually left in the code after all the recent AI attention. Either Mythos is a nothingburger, or it's substantially more powerful but there's nothing left to do. Even a large amount of C can be correct eventually. Curl has the _potential_ to become a good data point maybe 6-12 months from now - if researchers and new tools find many more vulnerabilities then Mythos is proved to be hype. If they don't, then maybe Mythos is overkill for today's curl and its capabilities are better deployed elsewhere (like Firefox, apparently).
spongebobstoes 24 minutes ago
that makes it a good data point, because it is better able to illustrate the incremental capabilities of Mythos compared to previous tooling
that helps us to understand how much of Mythos is hype and how much is real
20k 50 minutes ago
We see this exact hype train every time a new model is released. Mythos simply hasn't lived up to the "we're all gunna die from the flood of vulnerabilities" hype even slightly. It's slightly better than previous models by all accounts, cool stuff.
I've seen literally near word-for-word this exact chain of events multiple times previously
h1fra 3 hours ago
They might be biased by the fact that curl is significantly more secure than the average software
jansan 2 hours ago
Mythos marketing really leans into that "too powerful to be legal" vibe, much like how PS2s were allegedly banned from North Korea because their chips were basically missile-grade.
coldtea 3 hours ago
>It's a good reminder for us all that the competition in this space is rough and lots of more or less subtle marketing is involved.
About as subtle as a personal injury lawyer's billboard
steve1977 3 hours ago
Better Call Dario
te_chris 3 hours ago
A thankfully American reference
Exoristos 2 hours ago
Can you expand on this? Do you mean in contrast to the European AI milieu?
te_chris 1 hours ago
No, the personal injury lawyer billboards.
greendude29 4 hours ago
I'd go as far as to say the marketing is not subtle. The hype and fanboys/girls are so in line with the marketing that any level of skepticism is seen as an act of defection, but if you look at the words, hyperbole and volume that are used, there is nothing subtle about it.
It's almost Trump-esque - "this model will change everything forever; we are doomed; we are saved; we will all be fired; we will all be rich", etc
xantronix 3 hours ago
That's a pretty good encapsulation of the parallels between the political and the technological: each necessarily thrives upon the other, and the two are inextricable. This moment is a culmination of all the disenfranchisement the body politic has suffered, looking for any possible means of escape or elevation. AI and Trumpism, for their own respective cohorts, are salvation, on offer by different frontmen but ultimately in service of the same system.
They need the hype to pay off way more than we do. So many of us who still write code directly stand to lose nothing of our capabilities if the marketing claims cannot hold water.
ehnto 3 hours ago
I seem to be totally outside the hype bubble, but I have to suspect there is a lot of imagineering and wild extrapolation in the less technical hype bubbles. I am curious, but not enough to go looking.
tonyedgecombe 2 hours ago
>I seem to be totally outside the hype bubble
I'm surprised you say that because it is all over Hacker News. Every single post is co-opted into promoting AI. Try finding a submission with fifty points or more that doesn't have AI or LLMs mentioned somewhere in the comments.
zen928 1 hours ago
Feel free to retire from the field if you grow tired of seeing its latest developments.
tonyedgecombe 13 minutes ago
I already have.
That’s not really the point though. I have no doubt AI is useful, I just don’t want to have it shoved in my face every five minutes.
apexalpha 3 hours ago
> An amazingly successful marketing stunt for sure.
This. Well done by Anthropic.
It even reached the CISO of my small semi-government org in the Netherlands, who slightly panicked at the announced 'tsunami' of vulnerabilities that was coming with Mythos.
Got us some more money and priority with the board, though.
Never waste a good marketing scare.
helloplanets 6 minutes ago
Anthropic is quickly destroying customer goodwill by repeatedly pulling the same stunt. Horrible marketing, imho.
It's an entirely different thing to have the company conduct research on LLMs in general being a cybersecurity threat, instead of going "our new model is powerful, just too powerful" and shifting the discussion to revolve around that. It's slimy.
aswegs8 1 minutes ago
The bar has become so low lately that no one will care.
fpesce 1 hours ago
I don't agree with the "no tsunami in sight" take, unless you ignore the 100+ bugs in Firefox and many more OSS projects, a bunch of old, previously unseen OpenBSD/Linux RCEs, and a few LPEs in just 2 or 3 weeks for Linux itself...
IMO, this does not sound like a marketing scare; there is a spike of vulnerability disclosures (high quality, low false positives) that can be sensed... It feels like we're speedrunning through a few years' worth of high-quality bug reports in just a few weeks.
apexalpha 1 hours ago
The LPEs were not found with Mythos but with existing, publicly available models.
stingraycharles 33 minutes ago
And also: they did an earlier run with Opus to discover bugs (like segfaults).
Mythos was fed the list of all those bugs as concrete input, and told: turn them into exploits.
markus_zhang 7 minutes ago
org head is smart.
yjftsjthsd-h 4 hours ago
> Not particularly “dangerous”
I'm not sure that follows. As noted, curl was already analyzed to death with every tool available; most software isn't at that level.
anygivnthursday 2 hours ago
But Mythos is not marketed as a tool that can do the same as other available tools, maybe slightly better, but as a revolution.
croon 2 hours ago
Sure, but isn't it a verdict on Mythos compared to other models?
If so, it would still follow. "Most software" isn't analyzed as much as curl, by either other tooling or other models, which might well find close to what Mythos did. As such, Mythos then isn't especially/particularly dangerous.
bilekas 4 hours ago
I don't think I understand what you mean, the "not particularly dangerous" comment was in relation to the vulnerability that was found, right? Surely they would know what constitutes a lower severity level.
vidarh 3 hours ago
The "not particularly dangerous" is a headline for a section talking about Mythos, not the vulnerability.
bilekas 3 hours ago
Ah okay, that makes a bit more sense. I read it wrong. Then the comment is absolutely fair.
Ekaros 4 hours ago
My guess is that it is in category of "you are holding it wrong". Still worth fixing, but requires very specific user input for example. Or very weird scenario. Or in some less used protocol or flag combination.
Sharlin 21 minutes ago
Curl is currently receiving a record number of high-quality bug/vuln reports (a rather sharp change from the earlier slop inundation), so it’s not like there’s nothing to find. Many or most of these are presumably found by human experts assisted by AI tools, but if Mythos were truly revolutionary, it should be able to find such issues on its own.
> The single confirmed vulnerability is going to end up a severity low CVE planned to get published in sync with our pending next curl release 8.21.0 in late June
My mind still cannot understand the quality and refinement that's gone into cURL. It really is the perfect example of something done so right, that people barely think twice about.
pjmlp 2 hours ago
Easy, it shows what is achievable if there is a high bar for quality in every single line of code that gets committed, reviewed and merged, regardless of the programming language.
However, in the days of the race to the bottom, offshoring for pennies, and now LLM-powered code generation, this is a quality most companies won't care about unless there is liability in place.
dotancohen 3 hours ago
Curl and SQLite are my favourite examples of properly engineered, rigorously tested _anything_. It's really philosophical - those projects' contribution requirements demand such rigor, and the maintainers stand by that demand. A non-load-bearing document (not project code) is what makes that possible - very reminiscent of Einstein's thought experiments leading to tangible projects such as GPS or Descartes's belief that all problems can be solved through rational thinking.
ontouchstart 55 minutes ago
Some people must be working on training some models exclusively on high quality OSS code base like curl and SQLite without the noise of low quality training data.
I would do that with 100% local models from scratch.
AntiUSAbah 4 hours ago
There is always marketing involved and people should be able to put marketing into perspective.
Also, curl in this regard is an open source project, relatively small but critical, well known and used everywhere. Besides image libraries, tools like curl or sudo, su, passwd, etc. would also be my first try.
It is still not known at all what Mythos can do. What does it mean, from a cost and benchmark POV, to have a 10-trillion-parameter model?
Nonetheless, LLMs started getting significantly better at finding this, better than humans, about half a year ago? So at some point we need to address the elephant in the room and state that today you additionally need to do security scanning with LLMs. You need to take this seriously.
In the worst case, use Anthropic's marketing to state that it's a must now and something changed.
flohofwoe 2 hours ago
> Nonetheless, LLMs started getting significantly better at finding this, better than humans, about half a year ago?
*rolls eyes* regular static analyzers also have been "better than humans" for decades, being better than a human at a specific mechanical task really doesn't mean much. The interesting new thing is the type of potential "fuzzy bugs" described in the article that LLMs are able to identify (a comment not matching the code it describes, uncommon usage of a 3rd party library, mismatch of code and a protocol it implements, or often just generally weird looking code somebody should have a closer look at... this closes a gap in the traditional debugging toolboxes, but shouldn't replace them)
AntiUSAbah 1 hours ago
You don't have to dismantle a comment on a microlevel.
It has been clear for ages that certain types of bugs or issues are better handled by software.
But there were still plenty of things a proper SecOps person would be able to find, with help from tooling, that automatic tooling alone wouldn't find.
Taking a limited amount of resources and focusing them on the critical things.
I do think this is gone now. Same with threat modeling etc.
ahofmann 4 hours ago
Putting on my tinfoil-hat: Sooo, the guy who runs the test and delivers the report could just have removed the more interesting bugs and delivered those to any three letter agency?
casey2 2 hours ago
curl's source is public so what would be the gain in the rigmarole? Now if the prompt was "create a patch that inserts a zero-day while fixing a bug" that would be impressive.
bilekas 4 hours ago
[flagged]
AnssiH 3 hours ago
The test was run by an unnamed third party, so cURL's history has no relevance to their benevolence.
Ekaros 4 hours ago
Curl is likely one of the most combed-over pieces of code at this point. It feels like it has some special draw for people looking for vulnerabilities. That doesn't mean some novel idea can't still be looked at or checked.
cakealert 3 hours ago
> No, based on cURL's history, it really seems like they would love to have found a really novel bug.
You just confirmed that you didn't read the article.
"Eventually, I was instead offered that someone else, who has access to the model, could run a scan and analysis on curl for me using Mythos and send me a report."
bilekas 3 hours ago
I'm not sure how that proves I didn't read the article?
croon 2 hours ago
Someone external to the curl team ran the test. If that third party found a severe CVE that they could use across all the global curl attack surface, and did not disclose it back to the curl team, the third party could keep using the exploit until discovered independently.
mohsen1 3 hours ago
I don't know about Mythos, but in recent weeks I've noticed Opus is constantly failing to fix things in tsz[0], while GPT 5.5 can easily churn out fixes that are solid and pass tests. I've stopped paying for Claude for now and all my money is going to OpenAI at the moment. Either Opus is massively nerfed or GPT 5.5 is really head and shoulders above it on very difficult tasks. The last percent of conformance tests in tsz are really, really difficult and I've seen Opus bailing again and again. So annoying to waste time and tokens only to finally get "this is too involved" or "this requires a multi-week sprint to fix".
The new Opus feels like a step backwards. More expensive, thinks more, and it does not get the job done.
vincent_s 2 hours ago
From a user’s perspective, 4.7 is a downgrade compared to 4.6. It’s intended to give Anthropic more control over their compute resources and profitability:
Having never used Claude and only Codex, does Claude actually say “this is too involved” as a response to a prompt?
mohsen1 3 hours ago
Yes it does. Usually after hours of working and not getting results
redditor98654 2 hours ago
I am curious, what kind of work do you use Claude for that sometimes requires hours of working. In my case, I have never seen it go off for more than 10 mins and even that is very rare.
jongjong 9 minutes ago
I'm looking forward to trying Mythos run against my 5000-line, instant-finality, quantum-resistant blockchain project and decentralized exchange (an additional 5000 lines). I already ran all the models up to Opus 4.6 and they couldn't find anything.
utopiah 2 hours ago
Won my bet "voted 10 [vulnerabilities] but in retrospect as you are familiar with Claude and such tooling if you already used any of recent model to done some kind of security review then I'd drop to 1 or even 0." https://mastodon.pirateparty.be/@utopiah/116537456780283420
absynth 3 hours ago
I routinely used to compile C programs with other compilers to find defects that one or another didn't find. Compiling on Windows vs Linux. You could summarize / minimize it down to compiling with warnings as errors etc. but you'd be missing the point.
The point wasn't actual cross-platform portability even though that was a nice side effect. It was to flush out all the weird edge cases.
Edges like security flaws. Buffer overflows are usually platform specific. There are plenty of other ways to find these issues but simply recompiling for a different platform surfaces all sorts of issues.
yjftsjthsd-h 4 hours ago
> The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Piece.
Typo, or is there a spoof I should go read?
dotancohen 3 hours ago
Perhaps he was dictating.
Does it say anything else? Just 'Aaaarggghhhh'?
Hamuko 3 hours ago
Doubt it considering that Daniel Stenberg is Swedish. English dictation when you speak English as a second language with an accent is quite annoying.
Tistron 3 hours ago
Voice input works really well for people speaking English with a Swedish accent. I think the accent of most educated Swedes is mostly a case of prosody. For sure there are some sounds we say slightly differently than native English speakers. We often have some trouble with /s/ and /z/, but I don't know, "war and peace", I think that's easily understood.
Source: voice typing this with Swedish vocal chords, and only had to correct "different lives" to "differently", and add /[^\w\s]/.
aitchnyu 2 hours ago
Android voice input works with kids using both English and native words, here in India. The country runs schools in 25+ primary languages, each with dialects, so a TV/phone with voice input is more marvelous than the nitpicks discussed here.
dotancohen 2 hours ago
I understand completely. You don't want to know what the machine produced, when I asked it for "a new display".
iso1631 3 hours ago
War and Peace is about 590,000 words. Tiny compared to the full Harry Potter collection (about 1 million words over the 7 books), but long for a single book.
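A quick arithmetic check on the article's comparison, assuming the roughly 590,000-word figure for War and Peace given here:

$$\frac{660{,}000}{590{,}000} - 1 \approx 1.119 - 1 = 0.119 \approx 12\%$$

So the "12% more words" figure quoted from the article is consistent with that count.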
perching_aix 3 hours ago
They're referring to the typo in the title, "Piece" vs "Peace".
I also thought they were contending the word count before noticing. Even remarked how I find this a weird metric, given that code is not prose [0], but then I deleted that once I picked up on what's going on.
[0] comparing the output of `wc -w` with the word counts of books I'm reasonably sure will be super off
edit: ran a calc, substituting out symbols (but not underscores), digits, and comments yields a 390K word count compared to the 660K cited. not excluding the comments yields 600K, so more than a third of all words in the sources are comments.
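The counting approach described in the edit can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the commenter's actual script: all function and constant names are hypothetical, comment stripping is naive (it ignores `/*` or `//` appearing inside string literals), and `raw_word_count` only approximates `wc -w` by splitting on whitespace. The symbol filter uses the same `[^\w\s]` class mentioned elsewhere in the thread; since `\w` includes `_`, identifiers stay whole.

```python
import re

# Hypothetical helpers for illustration only.
# Assumption: naive C comment removal; comment markers inside string or
# character literals are not handled, so counts are approximate.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")
# Anything that is neither a word character nor whitespace (punctuation,
# operators, braces). \w includes "_", so identifiers are kept intact.
SYMBOLS = re.compile(r"[^\w\s]")
DIGITS = re.compile(r"\d")


def raw_word_count(source: str) -> int:
    """Whitespace-separated tokens, roughly what `wc -w` counts."""
    return len(source.split())


def filtered_word_count(source: str, keep_comments: bool = False) -> int:
    """Count words after blanking out symbols and digits, and
    (unless keep_comments is True) C-style comments."""
    text = source
    if not keep_comments:
        text = BLOCK_COMMENT.sub(" ", text)
        text = LINE_COMMENT.sub(" ", text)
    text = SYMBOLS.sub(" ", text)
    text = DIGITS.sub(" ", text)
    return len(text.split())


def comment_share(source: str) -> float:
    """Fraction of the filtered words that only exist inside comments."""
    with_comments = filtered_word_count(source, keep_comments=True)
    without_comments = filtered_word_count(source)
    return 1 - without_comments / with_comments if with_comments else 0.0


SAMPLE = """/* Perform a transfer */
int main(void) {
  CURL *curl = curl_easy_init(); // create a handle
  return 0;
}
"""

if __name__ == "__main__":
    print(raw_word_count(SAMPLE))                           # 19
    print(filtered_word_count(SAMPLE))                      # 7
    print(filtered_word_count(SAMPLE, keep_comments=True))  # 13
```

With the figures from the comment, the same ratio gives 1 - 390K/600K = 0.35, which matches "more than a third of all words in the sources are comments".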
Accacin 1 hours ago
The ten main Malazan books are 3.3 million words, apparently. No wonder it took me such a long time to get through them.
perching_aix 3 hours ago
It's a shame he seems to reject the idea of actually diving in and using these tools interactively:
> It’s not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway.
His expertise I think would elevate the results quite a bit. Although if he never uses LLMs, which it reads like he doesn't, I guess it might backfire just as well. Prompting style (still?) does matter after all, certainly in my experience anyways.
jph00 2 hours ago
He states in the article that they use LLMs for this purpose and find them extremely useful.
perching_aix 2 hours ago
Which can be true without this also being true:
> using these tools interactively
I did read the article. It seems to me they're using LLMs in a prepared manner instead, as mere scanners that produce reports.
OtherShrezzing 43 minutes ago
He posts about his use of language models a lot on Mastodon[0]. He does lots with language models, but doesn't buy all the way into the hype. I'd say he's one of the most reasonable & balanced voices on the subject of AI use in software today. Happy to use the technology, more than willing to push back on marketing bs.
https://daniel.haxx.se/blog/2026/04/22/high-quality-chaos/, linked from TFA
[0] https://tsz.dev
https://news.ycombinator.com/item?id=48072916
[0] https://mastodon.social/@bagder