Great concept. It would've been even more amusing if the entire thing were generated with AI instead, ironically.
kami23 2 hours ago [-]
This read like poetry to me. Thank you for sharing it.
I have a linguistics background, and a lot of my philosophizing lately has been on whether the emergent abilities of LLMs come, deep down, from a mechanism similar to the one that creates our consciousness.
For a little bit I was working on linguistics-based evals for a Kaggle competition. My challenge was whether or not I could mask things well enough to not trigger its internal state of certain phenomena, and that sent me down a rabbit hole that I'm still exploring.
This story resonated with a lot of the questions that come out of trying to find a good, solid answer to the "what is consciousness" question. The one it triggered for me is: Is our perception of time just a slow thread in the giant GPU we are running the universe on? Or more generally, what is time? That's a fun YouTube rabbit hole if you ever need one.
eszed 1 hours ago [-]
Yeah, I currently suspect that consciousness is an emergent property. I read elsewhere (it's somewhere in my HN history, I'm sure) that the biggest compute we can currently muster is something like three or four orders of magnitude away from the number of neurons / connections (or their analog) that our brains have, so it may be a while until we can expect to see it in our machines. But, if the emergent phenomenon hypothesis is correct, then we eventually will. I'm more scared than pleased by the prospect, but there you are.
kevin_thibedeau 20 minutes ago [-]
Our machines won't have biological systems driving their needs which in turn fuel behaviors like desire and planning for the future. They may imitate them but it won't be innate.
lotyrin 11 minutes ago [-]
I think those are things human consciousness has, not is.
slopinthebag 53 minutes ago [-]
This is not meant as a gotcha, I am genuinely curious how you believe consciousness can be an emergent property. I assume you don't believe consciousness is a physical property in the brain, so what entity is actually experiencing that consciousness? Or, what does it even mean to experience consciousness? Or are these not even the right questions?
pixl97 27 minutes ago [-]
Is a video game a physical property of a computer?
slopinthebag 7 minutes ago [-]
Yes
therealdrag0 35 minutes ago [-]
Those are the questions and there’s stacks and stacks of philosophy pages written about it. Go have a whirl.
kridsdale3 1 hours ago [-]
Time is entropy unfolding as things with nonzero temperature do what they do.
Psychological time is your own weights being updated in response to stimuli and internal processing.
When there isn't anything interesting happening, no updates are needed, and you don't perceive much time. That's why there's a logarithmic effect on the "density" of time as you age.
hippich 1 hours ago [-]
This is actually something I was always confused about. If nothing interesting happens as we get old, it should be boring and, as a result, a slow slog. Yet it feels like time accelerates as I get older.
agumonkey 9 minutes ago [-]
It's coherent. More newness => more memories per period ~ slower to go through. Less newness => fewer memories ~ nothing to go through (faster sense of time).
pixl97 24 minutes ago [-]
Myself, I believe the opposite. The brain itself is one of the most powerful filters that exists, and it attempts to be lazy, fill things in, and compress away the common. All that time we're not doing anything novel just gets compressed away to almost nothing. When you're a kid and seeing new things, feeling new things, learning new things, you can't compress that away.
BobbyTables2 53 minutes ago [-]
I’ve wondered the same myself, without being a cunning linguist.
I understand the math pretty well but still find it crazy that a bunch of matrices can converse in human languages without ever being “taught”.
Imagine decoding an encyclopedia written in a foreign language where the characters, punctuation, and grammar are unknown — supplemented by a million other texts the same way. Feels like it should be utterly impossible with any amount of computing power…
Today I asked my employer’s Claude to proofread a short software user manual written in Markdown. (Trying this with an LLM was a first for me!) It pointed out not only grammar mistakes but also cases where I did not follow my own self-imposed conventions that were never explicitly stated. (I didn’t have a chapter detailing all the typographical conventions the way specification documents often do.)
I also asked it what parts might be unclear to a user. The response was surprisingly good — no worse than asking the QA tester for the same feedback.
I also find the LLM seems to “comprehend” subtle technical details of obscure technical specification documents that nobody on the Internet ever discusses.
As for time and the universe, Stephen Wolfram’s theories seem intriguing. He seems a bit obsessed with pretty diagrams, but the idea of time dilation being the result of computation seems somewhat more appealing than trying to imagine relationships between time, gravity, and the speed of light.
Obscurity4340 30 minutes ago [-]
If time dilation is said to be a product of computation, why is it that anaesthetic drugs, taken short of the point of actual unconsciousness, cause it? Don't anaesthetics sort of shut everything down / inhibit all that kind of cognitive activity (compute?)
I have to agree. It is messed up that transformers can just talk, and it's been pretty much normalized. We are only talking about the impact they will have and whether they can do what people say they can, but we aren't talking about how crazy it is that they can talk.
modzu 13 minutes ago [-]
if you've ever seen a pile of wrinkly mush and wondered.. pretty damn crazy too
It's not often I see something that's fractally wrong but here we are.
There is a dictionary, it's called the tokenizer.
There are grammar rules, they are just very weak because the structure of human language is generally quite weak. When presented with languages which have strong consistent grammars the weights are very easily interpretable as a grammar: https://arxiv.org/abs/2201.02177
The point of the original short story is that the computational substrate doesn't matter when you have Turing completeness. This one seems to think that you don't need structure and interpretability just because you change substrates.
famouswaffles 41 minutes ago [-]
>There are grammar rules, they are just very weak because the structure of human language is generally quite weak. When presented with languages which have strong consistent grammars the weights are very easily interpretable as a grammar: https://arxiv.org/abs/2201.02177
That paper did not train the models on 'a language with strong consistent grammars'. Mathematical operation tables are not a language. Grammar itself is a post-hoc rationalization, and there's no evidence LLMs follow 'grammar rules' any more than the brain follows grammar rules. Of course, that's not to say transformers can't learn simple rules if the dataset calls for it.
glitchc 1 hours ago [-]
> fractally wrong
fractally or factually? You mean wrong on so many levels you need a fractal to capture them? If so, what if you could use a neural network instead?
The tokenizer is, at best, a sensory mechanism, as evidenced by 1) the random generation of the tokenization scheme, and 2) the fact that vastly different tokenization schemes produce virtually identical behavior. It'd be like if Noah Webster threw a bunch of movable type into a bucket (breaking some words in half) and then drew randomly to make the first English dictionary.
EDIT: I was too cavalier with the comparison of tokenizer to sensory modality; my ultimate point is that direct byte-to-token transformers can achieve similar overall performance, which to me makes a weights-to-meat comparison pretty straightforward, but the particular tokenizer in use certainly has a large impact on both efficiency and accuracy on specific problems (e.g. digit representation).
noosphr 2 hours ago [-]
I'm kind of stunned that someone is using my work to tell me I'm wrong. I wrote the code for the dish brain pong and encoding information was a huge part of what that experiment was about.
So when I say that the grok paper and the pong paper fundamentally agree, I have some idea of what I'm talking about.
benlivengood 1 hours ago [-]
I might have misunderstood the point you are making. I read the original article as "weights are like meat", and so I'm confused by what you consider fractally wrong.
noosphr 1 hours ago [-]
The point is that when the rules the model learns are simple enough, they stop being spread out over all the layers and become as easily interpretable as any expert system.
It's just that the rules we feed into the model are extremely poorly defined, and we end up with a soup of disjoint rules smeared all across the weights.
This isn't a feature of the models. It's a feature of the training set.
Being shocked that you can store rules in floating point numbers is the same as being shocked you can store rules in integers. It's been nearly a century since Gödel numbering was invented; we should be used to it by now.
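A minimal sketch of the "rules in integers" idea, assuming the textbook prime-power form of Gödel numbering. The function names are made up for illustration, and nothing here comes from either paper mentioned in the thread. A sequence of positive integers (standing in for symbols) is packed into one integer and recovered again by factoring:

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short sequences)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def godel_encode(symbols):
    """Pack positive integers s1, s2, s3, ... into 2**s1 * 3**s2 * 5**s3 * ..."""
    number = 1
    for prime, s in zip(primes(), symbols):
        if s < 1:
            raise ValueError("symbols must be positive integers")
        number *= prime ** s
    return number


def godel_decode(number):
    """Recover the sequence by reading off the exponent of each prime in turn."""
    symbols = []
    for prime in primes():
        if number == 1:
            break
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        symbols.append(exponent)
    return symbols


code = godel_encode([3, 1, 2])   # 2**3 * 3**1 * 5**2
print(code, godel_decode(code))
```

The single integer carries the whole sequence, but nothing about it looks like the sequence until you factor it, which is roughly the situation with rules stored in weights.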
simonh 47 minutes ago [-]
Right, but all of that is still in the weights. The point of the article/joke isn’t literally that there is no grammar, it’s that there is no grammar separate from the weights. It’s all in the weights. And yes, it’s absurd. It’s a joke, but a thought provoking one.
Hubris much? I don't see a necessary contradiction in using someone's work to disprove another aspect of that same person's work.
dpark 2 hours ago [-]
A tokenizer is not a dictionary any more than an alphabet is a dictionary.
noosphr 1 hours ago [-]
The Chinese alphabet is very much a dictionary. All the major tokenizers are far larger.
dpark 1 hours ago [-]
That doesn’t make any sense. An alphabet is a list of valid characters. A dictionary is not just a list. Even in a language like Chinese where individual characters carry meaning, a dictionary tells you what that meaning is. It’s not just a list of characters.
Or, to echo the article, the dictionary is made out of weights.
simonh 52 minutes ago [-]
A list of words isn’t a dictionary. What a dictionary adds over a list of words is all the relationships between the words needed to interpret them and use them, and all of that is in the weights.
canjobear 1 hours ago [-]
A mapping of Chinese characters to integers (like a tokenizer) would not be a dictionary. You’d also need definitions. At best it’s an index to a hypothetical dictionary.
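A toy illustration of that distinction, with a three-character vocabulary invented for the example (real tokenizers are learned from data and far larger). The tokenizer side is only a reversible character-to-integer table; the meanings live in a separate structure:

```python
# Hypothetical toy vocabulary: the ids are arbitrary and carry no meaning.
vocab = {"水": 0, "火": 1, "山": 2}

# What a dictionary adds: a definition for each entry.
definitions = {"水": "water", "火": "fire", "山": "mountain"}


def encode(text):
    """Map each character to its integer id."""
    return [vocab[ch] for ch in text]


def decode(ids):
    """Invert the mapping; this round-trips without knowing any meaning."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


ids = encode("火山")
print(ids, decode(ids))                     # ids alone say nothing about meaning
print([definitions[ch] for ch in "火山"])    # meaning needs the second table
```

Shuffling the ids in `vocab` would change every encoded sequence and leave `definitions` untouched, which is the sense in which the table is an index and not a dictionary.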
throw310822 2 hours ago [-]
> There are grammar rules
And they're made out of weights.
Waterluvian 45 minutes ago [-]
It must have been kind of incredible early on to be exploring this tech and suddenly get what look like sentences.
luca-ctx 59 minutes ago [-]
Truly fantastic bridge from the original, this deserves an award
MaxLeiter 53 minutes ago [-]
All credit to the original author. I just had to think of analogues.
turtleyacht 3 hours ago [-]
Numbers that dream.
oofbey 2 hours ago [-]
I love this. For anybody not getting the joke, it’s riffing on the classic 1990s short story “They’re made out of meat.”
The original author is mentioned in the second sentence of the linked article, and then again in the third sentence, along with a link to the original story.
CSSer 3 hours ago [-]
It works until they get to the sentience part. Neat idea!
margalabargala 2 hours ago [-]
Even there it works a bit.
> These models are the only other things we've ever met that can hold a conversation, and they're made out of weights
Is a fair point.
RodgerTheGreat 2 hours ago [-]
Not especially. Depending on where you set your standards for "holding a conversation", you can satisfy the requirement with a classical Markov chatterbot, a well-trained parrot, a copy of ELIZA, or a telemarketer flowchart drawn on a sheet of paper. Only the Markov bot is made out of "weights" in the sense of a statistical model.
Parrots are intelligent animals, albeit with a limited capacity for vocabulary and syntax compared to a human, and Eliza and the flowchart are made out of explicitly encoded rules and conversational tactics.
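For reference, a minimal sketch of what a word-level Markov chatterbot looks like, assuming a first-order chain over whitespace-separated words (`train` and `babble` are names made up here, not any particular bot's API). The "weights" are just transition counts:

```python
import random
from collections import Counter, defaultdict


def train(text):
    """Count word -> next-word transitions; the counts are the 'weights'."""
    words = text.split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def babble(table, start, length, rng):
    """Walk the chain, sampling each next word in proportion to its count."""
    out = [start]
    for _ in range(length - 1):
        options = table.get(out[-1])
        if not options:
            break  # dead end: this word was never followed by anything
        words, counts = zip(*options.items())
        out.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(out)


table = train("the cat sat on the mat the cat ran")
print(dict(table["the"]))
print(babble(table, "the", 6, random.Random(0)))
```

The whole model is a lookup from the previous word to a count table, which is why it counts as a statistical model but has no memory beyond one word.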
margalabargala 1 hours ago [-]
The quality of "conversation" you can have with everything on your list is highly limited, and is categorically different from the sort of conversation you are able to have with any modern AI.
solenoid0937 51 minutes ago [-]
Weights hold a better conversation at this point than the overwhelming majority of humans.
It stars Tom Noonan and Ben Bailey!
https://web.mit.edu/people/dpolicar/writing/prose/text/think...