Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

172 points by tosh 4 hours ago | 117 comments

TeMPOraL 3 hours ago [-]

Oh boy. Someone didn't get the memo that for LLMs, tokens are units of thinking. I.e. whatever feat of computation needs to happen to produce results you seek, it needs to fit in the tokens the LLM produces. Being a finite system, there's only so much computation the LLM internal structure can do per token, so the more you force the model to be concise, the more difficult the task becomes for it - worst case, you can guarantee not to get a good answer because it requires more computation than possible with the tokens produced.

I.e. by demanding the model to be concise, you're literally making it dumber.

(Separating out "chain of thought" into "thinking mode" and removing user control over it definitely helped with this problem.)

kogold 1 hours ago [-]

Let me rephrase that for you:

"Interesting idea! Token consumption sure is an issue that should be addressed, and this is pretty funny too! However, I happen to have an unproven claim that tokens are units of thinking, and therefore, reducing the token count might actually reduce the model's capabilities. Did anybody using this by chance notice any degradation (since I did not bother to check myself)?"

Have a nice day!

estearum 3 minutes ago [-]

Can't you know that tokens are units of thinking just by... like... thinking about how models work?

ShowalkKama 54 minutes ago [-]

the fact that more tokens = more smart should be expected given cot / thinking / other techniques that increase the model accuracy by using more tokens.

Did you test that ""caveman mode"" has similar performance to the ""normal"" model?

Garlef 45 minutes ago [-]

Yes but: If the amount is fixed, then the density matters.

A lot of communication is just mentioning the concepts.

mynegation 51 minutes ago [-]

No, let me rephrase it for you. “tokens used for think. Short makes model dumb”

jstummbillig 2 hours ago [-]

What do you mean? The page explicitly states:

> cutting ~75% of tokens while keeping full technical accuracy.

I have no clue if this claim holds, but alas, just pretending they did not address the obvious criticism, while they did, is at the very least pretty lazy.

An explanation that explains nothing is not very interesting.

prodigycorp 1 hours ago [-]

The burden of proof is on the author to provide at least one type of eval for making that claim.

jstummbillig 55 minutes ago [-]

I notice that the number of people confidently talking about "burden of proof" and whose it allegedly is in the context of AI has gone up sharply.

Nobody has to proof anything. It can give your claim credibility. If you don't provide any, an opposing claim without proof does not get any better.

prodigycorp 38 minutes ago [-]

Sorry I don't know how engaging in this could lead to anything productive. There's already literature out there that gives credence to TeMPOraL claim. And, after a certain point, gravity being the reason that things fall becomes so self evident that every re-statements doesnt not require proof.

systoll 1 hours ago [-]

The author pretended they addressed the obvious criticism.

You can read the skill. They didn't do anything to mitigate the issue, so the criticism is valid.

getpokedagain 1 hours ago [-]

In the age of vibe coding and that we are literally talking about a single markdown file I am sure this has been well tested and achieves all of its goals with statistical accuracy, no side effects with no issues.

vova_hn2 2 hours ago [-]

Yeah, I don't think that "I'd be happy to help you with that" or "Sure, let me take a look at that for you" carries much useful signal that can be used for the next tokens.

jerf 1 hours ago [-]

There is a study that shows that what the model is doing behind the scenes in those cases is a lot more than just outputting those tokens.

For an LLM, tokens are thought. They have no ability to think, by whatever definition of that word you like, without outputting something. The token only represents a tiny fraction of the internal state changes made when a token is output.

Clearly there is an optimal for each task (not necessarily a global one) and a concrete model for a given task can be arbitrarily far from it. But you'd need to test it out for each case, not just assume that "less tokens = more better". You can be forcing your model to be dumber without realizing it if you're not testing.

DonHopkins 29 minutes ago [-]

High dimensional vectors are thought (insofar as you can define what that even means). Tokens are one dimensional input that navigates the thought, and output that renders the thought. The "thinking" takes place in the high dimension space, not the one dimensional stream of tokens.

wzdd 41 minutes ago [-]

They carry information in regular human communication, so I'm genuinely curious why you'd think they would not when an LLM outputs them as part of the process of responding to a message.

kubb 2 hours ago [-]

This is condescending and wrong at the same time (best combo).

LLMs do stumble into long prediction chains that don’t lead the inference in any useful direction, wasting tokens and compute.

prodigycorp 55 minutes ago [-]

Are you sure about that? Chain of thought does not need to be semantically useful to improve LLM performance. https://arxiv.org/abs/2404.15758

davidguetta 52 minutes ago [-]

still doesn't mean all tokens are useful. it's the point of benchmarks

prodigycorp 50 minutes ago [-]

Care to share the benchmarks backing the claims in this repo?

hackerInnen 19 minutes ago [-]

You are absolutely right! That is exactly the reason why more lines of code always produce a better program. Straight on, m8!

NiloCK 2 hours ago [-]

I agree with this take in general, but I think we need to be prepared for nuance when thinking about these things.

Tokens are how an LLM works things out, but I think it's just as likely as not that LLMs (like people) are capable of overthinking things to the point of coming to a wrong answer when their "gut" response would have been better. I do not content that this is the default mode, but that it is both possible, and that it's more or less likely on one kind of problem than another, problem categories to be determined.

A specific example of this was the era of chat interfaces that leaned too far in the direction of web search when responding to user queries. No, claude, I don't want a recipe blogspam link or summary - just listen to your heart and tell me how to mix pancakes.

More abstractly: LLMs give the running context window a lot of credit, and will work hard to post-hoc rationalize whatever is in there, including any prior low-likelihood tokens. I expect many problematic 'hallucinations' are the result of an unlucky run of two or more low probability tokens running together, and the likelihood of that happening in a given response scales ~linearly with the length of response.

samus 2 hours ago [-]

The solution to that is turning off thinking mode or reducing thinking budget.

avaer 2 hours ago [-]

That was my first thought too -- instead of talk like a caveman you could turn off reasoning, with probably better results.

Additionally, LLMs do not actually operate in text; much of the thinking happens in a much higher dimensional space that just happens to be decoded as text.

So unless the LLM was trained otherwise, making it talk like a caveman is more than just theoretically turning it into a caveman.

DrewADesign 2 hours ago [-]

> much of the thinking happens in a much higher dimensional space that just happens to be decoded as text.

What do you mean by that? It’s literally text prediction, isn’t it?

cyanydeez 2 hours ago [-]

There was a paper recently that demonstrated that you can input different human languages and the middle layers of the model end up operating on the same probabilistic vectors. It's just the encoding/decoding layers that appear to do the language management.

So the conclusion was that these middle layers have their own language and it's converting the text into this language and this decoding it. It explains why sometime the models switch to chinese when they have a lot of chinese language inputs, etc.

DrewADesign 2 hours ago [-]

Ok — that sounds more like a theory rather than an open-and-shut causal explanation, but I’ll read the paper.

pennaMan 2 hours ago [-]

>It’s literally text prediction, isn’t it?

you are discovering that the favorite luddite argument is bullshit

ericjmorey 56 minutes ago [-]

I don't consider these researchers luddites.

https://machinelearning.apple.com/research/illusion-of-think...

https://arxiv.org/abs/2508.01191

DrewADesign 2 hours ago [-]

Feel free to elucidate if you want to add anything to this thread other than vibes.

electroglyph 2 hours ago [-]

after you go from from millions of params to billions+ models start to get weird (depending on training) just look at any number of interpretability research papers. Anthropic has some good ones.

HumanOstrich 1 hours ago [-]

> things start to get weird

> just look at research papers

You didn't add anything other than vibes either.

DrewADesign 1 hours ago [-]

Getting weird doesn’t mean calling it text prediction is actually ‘bullshit’? Text prediction isn’t pejorative…

vova_hn2 2 hours ago [-]

> instead of talk like a caveman you could turn off reasoning, with probably better results

This is not how the feature called "reasoning" work in current models.

"reasoning" simply let's the model output and then consume some "thinking" tokens before generating the actual output.

All the "fluff" tokens in the output have absolutely nothing to do with "reasoning".

throw83849494 2 hours ago [-]

You obviously do not speak other languages. Other cultures have different constrains and different grammar.

For example thinking in modern US English generates many thoughts, to keep correct speak at right cultural context (there is only one correct way to say People Of Color, and it changes every year, any typo makes it horribly wrong).

Some languages are far more expressive and specialized in logical conditions, conditionals, recursion and reasoning. Like eskimos have 100 words for snow, but for boolean algebra.

It is well proven that thinking in Chinese needs far less tokens!

With this caveman mod you strip out most of cultural complexities of anglosphere, make it easier for foreigners and far simpler to digest.

suddenlybananas 2 hours ago [-]

>Some languages are far more expressive and specialized in logical conditions, conditionals, recursion and reasoning. Like eskimos have 100 words for snow, but for boolean algebra.

This is simply not true.

throw83849494 35 minutes ago [-]

Well, just take varous english dialects you probably know, there are wast differences. Some strange languages do not even have numbers or recursion.

It is very arrogant to assume, no other language can be more advanced than English.

mylifeandtimes 1 hours ago [-]

Really? Because if one accepts that computer languages are languages, then it seems that we could identify one or two that are highly specialized in logical conditions etc. Prolog springs to mind.

malnourish 57 minutes ago [-]

Yes, really. The concept GP is alluding to is called the Sapir-Worf hypothesis, which is largely non scientific pop linguistics drivel. Elements of a much weaker version have some scientific merit.

Programming languages are not languages in the human brain nor the culture sense.

baq 3 hours ago [-]

Do you know of evals with default Claude vs caveman Claude vs politician Claude solving the same tasks? Hypothesis is plausible, but I wouldn’t take it for granted

zozbot234 58 minutes ago [-]

Grug says you quite right, token unit thinking, but empty words not real thinking and should avoid. Instead must think problem step by step with good impactful words.

afro88 2 hours ago [-]

IIUC this doesn't make the LLM think in caveman (thinking tokens). It just makes the final output show in caveman.

2 hours ago [-]

andai 3 hours ago [-]

I remember a while back they found that replacing reasoning tokens with placeholders ("....") also boosted results on benchies.

But does talk like caveman make number go down? Less token = less think?

I also wondered, due to the way LLMs work, if I ask AI a question using fancy language, does that make it pattern match to scientific literature, and therefore increase the probability that the output will be true?

PufPufPuf 47 minutes ago [-]

You mention thinking tokens as a side note, but their existence invalidates your whole point. Virtually all modern LLMs use thinking tokens.

raincole 2 hours ago [-]

When it comes to LLM you really cannot draw conclusions from first principles like this. Yes, it sounds reasonable. And things in reality aren't always reasonable.

Benchmark or nothing.

samus 2 hours ago [-]

There have been papers about introducing thinking tokens in intermediary layers that get stripped from the output.

agumonkey 2 hours ago [-]

How do we know if a token sits at an abstract level or just the textual level ?

cyanydeez 2 hours ago [-]

It's not "units of thinking" its "units of reference"; as long as what it produces references the necessary probabilistic algorithms, itll do just fine.

otabdeveloper4 1 hours ago [-]

LLMs don't think at all.

Forcing it to be concise doesn't work because it wasn't trained on token strings that short.

HumanOstrich 1 hours ago [-]

> Forcing it to be concise doesn't work because it wasn't trained on token strings that short.

This is a 2023-era comment and is incorrect.

otabdeveloper4 24 minutes ago [-]

LLMs architectures have not changed at all since 2023.

> but mmuh latest SOTA from CloudCorp (c)!

You don't know how these things work and all you have to go on is marketing copy.

Rexxar 2 hours ago [-]

  > Someone didn't get the memo that for LLMs, tokens are units of thinking.

Where do you get this memo ? Seems completely wrong to me. More computation does not translate to more "thinking" if you compute the wrong things (ie things that contribute significantly to the final sentence meaning).

staminade 2 hours ago [-]

That’s why you need filler words that contribute little to the sentence meaning but give it a chance to compute/think. This is part of why humans do the same when speaking.

jaccola 2 hours ago [-]

Do you have any evidence at all of this? I know how LLMs are trained and this makes no sense to me. Otherwise you'd just put filler words in every input

e.g. instead of: "The square root of 256 is" you'd enter "errr The er square um root errr of 256 errr is" and it would miraculously get better? The model can't differentiate between words you entered and words it generated its self...

muzani 1 hours ago [-]

It's why it starts with "You're absolutely right!" It's not to flatter the user. It's a cheap way to guide the response in a space where it's utilizing the correction.

staminade 1 hours ago [-]

What do you think chain of thought reasoning is doing exactly?

lijok 2 hours ago [-]

You’re conflating training and inference

FurstFly 22 minutes ago [-]

Okay, I like how it reduces token usage, but it kind of feels that, it will reduce the overall model intelligence. LLMs are probabilistic models, and you are basically playing with their priors.

teekert 3 hours ago [-]

Idk I try talk like cavemen to claude. Claude seems answer less good. We have more misunderstandings. Feel like sometimes need more words in total to explain previous instructions. Also less context is more damage if typo. Who agrees? Could be just feeling I have. I often ad fluff. Feels like better result from LLM. Me think LLM also get less thinking and less info from own previous replies if talk like caveman.

jaccola 2 hours ago [-]

Yes because in most contexts it has seen "caveman" talk the conversations haven't been about rigorously explained maths/science/computing/etc... so it is less likely to predict that output.

2 hours ago [-]

cyanydeez 2 hours ago [-]

Fluff adds probable likeness. Probablelikeness brings in more stuff. More stuff can be good. More stuff can poison.

nayroclade 2 hours ago [-]

Cute idea, but you're never gonna blow your token budget on output. Input tokens are the bottleneck, because the agent's ingesting swathes of skills, directory trees, code files, tool outputs, etc. The output is generally a few hundred lines of code and a bit of natural language explanation.

DimitriBouriez 1 hours ago [-]

Good point and it's actually worse than that : the thinking tokens aren't affected by this at all (the model still reasons normally internally). Only the visible output that gets compressed into caveman... and maybe the model actually need more thinking tokens to figure out how to rephrase its answer into caveman style

zozbot234 48 minutes ago [-]

Grug says you can tune how much each model thinks. Is not caveman but similar. also thinking is trained with RL so tends to be efficient, less fluffy. Also model (as seen locally) always drafts answer inside thinking then output repeats, change to caveman is not really extra effort.

Hard_Space 3 hours ago [-]

Also see https://arxiv.org/pdf/2604.00025 ('Brevity Constraints Reverse Performance Hierarchies in Language Models' March 2026)

ryanschaefer 3 hours ago [-]

Kinda ironic this description is so verbose.

> Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens", "be brief", or invokes /caveman

For the first part of this: couldn’t this just be a UserSubmitPrompt hook with regex against these?

See additionalContext in the json output of a script: https://code.claude.com/docs/en/hooks#structured-json-output

For the second, /caveman will always invoke the skill /caveman: https://code.claude.com/docs/en/skills

bjackman 2 hours ago [-]

If this really works there would seem to be a lot of alpha in running the expensive model in something like caveman mode, and then "decompressing" into normal mode with a cheap model.

I don't think it would be fundamentally very surprising if something like this works, it seems like the natural extension to tokenisation. It also seems like the natural path towards "neuralese" where tokens no longer need to correspond to units of human language.

ajd555 31 minutes ago [-]

So, if this does help reduce the cost of tokens, why not go even further and shorten the syntax with specific keywords, symbols and patterns, to reduce the noise and only keep information, almost like...a programming language?

amelius 30 minutes ago [-]

By the way why don't these LLM interfaces come with a pause button?

amelius 20 minutes ago [-]

And a "prune here" button.

It often happens that the interesting information is in the first paragraph or so, and the remainder is all just the LLM not knowing when to stop. This is super annoying as a conversation then ends up being 90% noise.

stainablesteel 16 minutes ago [-]

i imagine they're doing superman level distributed compute across multiple clouds somewhere and cared more about delivering the final result of that than having the ability to pause. which is probably possible, but would require way more work than would be worthwhile. they probably thought the ability to stop and resubmit would be an adequate substitute.

amelius 11 minutes ago [-]

These models are autoregressive so I doubt they are running them across multiple clouds. And besides, a pause button is useful from a user's pov.

veselin 1 hours ago [-]

This is an experiment that, although not to this extreme, was tested by OpenAI. Their responses API allow you to control verbosity:

https://developers.openai.com/api/reference/resources/respon...

I don't know their internal eval, but I think I have heard it does not hurt or improve performance. But at least this parameter may affect how many comments are in the code.

virtualritz 3 hours ago [-]

This is the best thing since I asked Claude to address me in third person as "Your Eminence".

But combining this with caveman? Gold!

eMPee584 2 hours ago [-]

f.e.?

fny 35 minutes ago [-]

Are there any good studies or benchmarks about compressed output and performance? I see a lot of arguing in the comments but little evidence.

samus 2 hours ago [-]

There's linguistic term for this kind of speech: isolating grammars, which don't decline words and use high context and the bare minimum of words to get the meaning across. Chinese is such a language btw. Don't know what Chinese think about their language being regarded as cavemen language...

akdor1154 2 hours ago [-]

I thought the term for those were 'sane languages', and I say that as a native English speaker :)

VadimPR 3 hours ago [-]

Wouldn't this affect quality of output negatively?

Thanks to chain of thought, actually having the LLM be explicit in its output allows it to have more quality.

gozzoo 3 hours ago [-]

I think this could be very useful not when we talk to the agent, but when the agents talk back to us. Usually, they generate so much text that it becomes impossible to follow through. If we receive short, focused messages, the interaction will be much more efficient. This should be true for all conversational agents, not only coding agents.

p2detar 2 hours ago [-]

That’s what it does as far as I get it. But less is not always better and I guess it’s also subjective to the promoter.

pixelpoet 2 hours ago [-]

> Usually, they generate so much text that it becomes impossible to follow through.

Quite often on reddit I'll write two paragraphs and get told "I'm not reading all that".

Really? Has basic reading become a Herculean task?

0xpgm 2 hours ago [-]

Not specifically about your case, but some people are usually just more verbose than others and tend to say the same thing more than once, or perhaps haven't found a clear way of articulating their thoughts down to fewer words.

golem14 2 hours ago [-]

I think the sentiment here is that the short formulation of Kant's categorical imperative is as good and easier to read than the entirety of "types of ethical theory" (J.J. Martineau).

vova_hn2 2 hours ago [-]

> Has basic reading become a Herculean task?

I find LLM slop much harder to read than normal human text.

I can't really explain it, it's just a feeling.

The feeling that it draaaags and draaaaaags and keeeeeps going on and on and on before getting to the point, and by the time I'm done with all the "fluff", I don't care what is the text about anymore, I just want to lay down and rest.

adam_patarino 30 minutes ago [-]

Or you could use a local model where you’re not constrained by tokens. Like rig.ai

vivid242 2 hours ago [-]

Great idea- if the person who made it is reading: Is this based on the board game „poetry for cavemen“? (Explain things using only single-syllable words, comes even with an inflatable log of wood for hitting each other!)

rschiavone 2 hours ago [-]

This trick reminds me of "OpenAI charges by the minute, so speed up your audio"

https://news.ycombinator.com/item?id=44376989

vntok 23 minutes ago [-]

Which worked great. Also, cut off silences.

> One half interesting / half depressing observation I made is that at my workplace any meeting recording I tried to transcribe in this way had its length reduced to almost 2/3 when cutting off the silence. Makes you think about the efficiency (or lack of it) of holding long(ish) meetings.

fzeindl 1 hours ago [-]

I tried this with early ChatGPT. Asked it to answer telegram style with as few tokens as possible. It is also interesting to ask it for jokes in this mode.

amelius 28 minutes ago [-]

It's especially funny to change your coworker's system prompt like that.

zahirbmirza 3 hours ago [-]

You can also make huge spelling mistakes and use incomplete words with llms they just sem to know better than any spl chk wht you mean. I use such speak to cut my time spent typing to them.

floriangoebel 2 hours ago [-]

Wouldn't this increase your token usage because the tokenizer now can't process whole words, but it needs to go letter by letter?

literalAardvark 41 seconds ago [-]

It doesn't go letter by letter, so not with current tokenizers.

owenthejumper 1 hours ago [-]

What is that binary file caveman.skill that I cannot read easily, and is it going to hack my computer.

Robdel12 24 minutes ago [-]

I didn’t comment on this when I saw it on threads/twitter. But it made it to HN, surprisingly.

I have a feeling these same people will complain “my model is so dumb!”. There’s a reason why Claude had that “you’re absolutely right!” for a while. Or codex’s “you’re right to push on this”.

We’re basically just gaslighting GPUs. That wall of text is kinda needed right now.

andai 3 hours ago [-]

So it's a prompt to turn Jarvis into Hulk!

norskeld 2 hours ago [-]

APL for talking to LLM when? Also, this reminded me of that episode from The Office where Kevin started talking like a caveman to make communication efficient.

andai 3 hours ago [-]

No articles, no pleasantries, and no hedging. He has combined the best of Slavic and Germanic culture into one :)

samus 2 hours ago [-]

Both Slavic languages and German have complex declination systems for nouns, verbs, and adjectives. Which is unlike stereotypical caveman speech.

kgeist 3 minutes ago [-]

I wonder why it's assumed cavemen spoke that way. If you go back in time, most languages become more and more synthetic, with more complex declension and conjugation paradigms. The more you move toward modern times, the more our languages simplify, lose complexity, and become analytical. If you think about it, modern European languages are literally 'caveman speech' compared to Proto-Indo-European when you compare the grammars.

iammjm 57 minutes ago [-]

I speak German, Polish, and English fluently and my take is: German is very precise, almost mathematical, there is little room to be misunderstood. But it also requires the most letters. English is the quickest, get things done kind of language, very compressible , but also risks misunderstanding. Polish is the most fun, with endless possibilities of twisting and bending it's structures, but also lacking the ease of use of English or the precision of German. But it's clearly just my subjective take

stared 2 hours ago [-]

I would prefer to talk like Abathur (https://www.youtube.com/watch?v=pw_GN3v-0Ls). Same efficiency but smarter.

ArekDymalski 3 hours ago [-]

While really useful now, I'm afraid that in the long run it might accelerate the language atrophy that is already happening. I still remember that people used to enter full questions in Google and write SMS with capital letters, commas and periods.

vova_hn2 1 hours ago [-]

> I still remember that people used to enter full questions in Google

I think that, in the early days of internet search, entering full questions actually produced worse results than just a bunch of keywords or short phrases.

So it was a sign of a "noob", rather than a mark of sophistication and literacy.

doe88 2 hours ago [-]

> If caveman save you mass token, mass money — leave mass star.

Mass fun. Starred.

sillyboi 42 minutes ago [-]

Oh, another new trend! I love these home-brewed LLM optimizers. They start with XML, then JSON, then something totally different. The author conveniently ignores the system prompt that works for everything, and the extra inference work. So, it's only worth using if you just like this response style, just my two cents. All the real optimizations happen during model training and in the infrastructure itself.

kukakike 1 hours ago [-]

This is exactly what annoys me most. English is not suitable for computer-human interaction. We should create new programming and query languages for that. We are again in cobol mindset. LLM are not humans and we should stop talking to them as if they are.

zozbot234 1 hours ago [-]

Grug says Chinese more suitable, only few runes in word, each take single token. Is great.

3 hours ago [-]

hybrid_study 39 minutes ago [-]

Mongo! No caveman

cadamsdotcom 2 hours ago [-]

Caveman need invent chalk and chart make argument backed by more than good feel.

DonHopkins 37 minutes ago [-]

Deep digging cave man code reviews are Tha Shiznit:

https://www.youtube.com/watch?v=KYqovHffGE8

2 hours ago [-]

saidnooneever 3 hours ago [-]

LOL it actually reads how humans reply the name is too clever :').

Not sure how effective it will be to dirve down costs, but honestly it will make my day not to have to read through entire essays about some trivial solution.

tldr; Claude skill, short output, ++good.

setnone 2 hours ago [-]

caveman multilingo? how sound?

vova_hn2 2 hours ago [-]

I don't know about token savings, but I find the "caveman style" much easier to read and understand than typical LLM-slop.

bhwoo48 3 hours ago [-]

I was actually worried about high token costs while building my own project (infra bundle generator), and this gave me a good laugh + some solid ideas. 75% reduction is insane. Starred

bogtog 3 hours ago [-]

I'd be curious if there were some measurements of the final effects, since presumably models wont <think> in caveman speak nor code like that

Rendered at 13:03:36 GMT+0000 (Coordinated Universal Time) with Vercel.