I wonder how much the 'inflection point' is a thing vs marketing. I'm sure the models got somewhat better, but even now when I'm trying to 'vibe code' a game with the latest models (combination of Codex w/ gpt5.5 and gpt5.3-codex), they really do struggle.
They definitely get something barebones up and running, but it's far from a fully fledged application.
adgjlsfhk1 5 minutes ago
It's very real. Just in the past 2 months or so, IMO, there's been a pretty big improvement in Claude for local dev (although I think a lot of that is less model strength and more harness capability). 1M context is a huge difference: going from ~30 min to ~2.5 hr between compactions significantly increases the scope of what I can get the AI to do before it goes stupid. The other big difference I've noticed is a better balance between actually doing the work and pushing back on bad ideas. I want the AI to tell me if it thinks what I'm telling it is wrong or a bad idea, but if I confirm, I want it to do it anyway. A couple of months ago, Claude was a lot more likely to say "This is too much work, I'm not going to do all of it", tell me the idea was genius (and then pretend to do it), or do something equally useless.
shepherdjerred 32 minutes ago
> and there’s zero chance any AI lab would train a model for such a ridiculous task.
I'm not sure that's true anymore considering how popular Simon's blog is
rTX5CMRXIfFG 6 minutes ago
Am I crazy, or are the differences between the best models so marginal that you'd get roughly the same performance if you used the same high-quality harness (i.e., preloaded instructions from md files, including custom skills)?
Sparkyte 3 minutes ago
No, you're not wrong. Many people will see what you see. Enthusiasts will see squeezing out that last drop of performance as monumental, and I think it's okay for them to feel that way. I'm just satisfied with getting a tool as an aid.
Personal opinion: we need to focus more on efficiency instead of on how large or complex a model can get, since bigger models keep demanding more resources. If the goal is something that costs a billion dollars to operate, then we've really lost sight of what models are supposed to be achieving.
throwaway2027 30 minutes ago
December 2025 was the breakthrough for me.
January: Claude was euphoric, ChatGPT was up there. February: Gemini cooked for a second there. March: amazing. April: the big bad nerf. May: GPT 5.5 is just pure bliss, although the 2x limits are temporary. Not sure about Claude; it's sort of okay, still not as good as it felt before, but they're slowly increasing limits with more compute and rebuilding goodwill.
zarzavat 44 minutes ago
Somewhere right now some human artist is being tasked with drawing illustrations of pelicans riding bicycles to be used as training data at a big AI lab.
minimaxir 36 minutes ago
Every modern image-generation model can generate a pelican on a bicycle trivially. The point of the test is to generate SVG text that represents an image, which is more complicated.
Yes, there are ways to convert raster images to SVG for use in training data but it's not a good use of anyone's time.
jofzar 6 minutes ago
I wouldn't wish creating an SVG pelican on a bicycle on my worst enemy.
bb88 46 minutes ago
I met Simon for the first time this year at PyCon. Wow, what a great guy.
aizk 28 minutes ago
I'm so glad Simon is documenting this. The field is evolving so rapidly, and is so hungry for data and money, that few are willing to zoom out and document the big picture so we can see the changes over time.
I mean, do you guys remember "Do Anything Now"? Just a distant memory, a funny party trick.
iekekke 53 minutes ago
It’s good to see dates being pinned down for the model improvements that are supposed to deliver material gains.
As time progresses, we now have a yardstick to measure progress against. No more excuses: show me the money, baby.