Hacker News

I think both "LLMs can produce outcomes akin to those produced by human intelligence (in many but not all cases)" and "LLMs are intelligent" are fairly defensible claims.

> I see no reason whatsoever to believe that what your wet meat brain is doing now is any different from what an LLM does.

I don't think this follows though. Birds and planes can both fly, but a bird and a plane are clearly not doing the same thing to achieve flight. Interestingly, both birds and planes excel at different aspects of flight. It seems at least plausible (imo likely) that there are meaningful differences in how intelligence is implemented in LLMs and humans, and that that might manifest as some aspects of intelligence being accessible to LLMs but not humans and vice versa.



> It seems at least plausible (imo likely) that there are meaningful differences in how intelligence is implemented in LLMs and humans

Intelligence isn’t "implemented" in an LLM at all. The model doesn’t carry a reasoning engine or a mental model of the world. It generates tokens by mathematically matching patterns: each new token is chosen to best fit the statistical patterns it learned from its training data and the immediate context you give it. In effect, it’s producing a compressed, context-aware summary of the most relevant pieces of its training data, one token at a time.
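To make the "statistical patterns" claim concrete, here's a minimal sketch (emphatically not a real LLM): a toy bigram model that picks the next token purely from co-occurrence counts in its training text. The training string and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy "training data": a tiny corpus, split into tokens.
training_text = "the cat sat on the mat the cat ate the fish".split()

# Count which token follows which: counts[prev][next] = frequency.
counts = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    # Return the statistically most likely continuation seen in training.
    return counts[token].most_common(1)[0][0]

print(predict_next("the"))  # prints "cat": it follows "the" most often
```

A real transformer replaces the count table with billions of learned weights and conditions on far more context, but the output is still "the continuation that best fits the learned distribution."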

The training data is where the intelligence happened, and that's because it was generated by human brains.


There doesn't seem to be much consensus on defining what intelligence is. By the definitions of at least some reasonable people of sound mind, I think it is defensible to call them intelligent, even if I don't necessarily agree. I sometimes call them "intelligent" because many of the things they do seem to me like they should require intelligence.

That said, to whatever extent they're intelligent or not, by almost any definition of intelligence, I don't think they're achieving it through the same mechanism that humans do. That is my main argument. I think confident arguments that "LLMs think just like humans" are very bad, given that we clearly don't understand how humans achieve intelligence, and given the vastly different substrates and constraints that humans and LLMs are working with.


I guess to me, how is the ability to represent the statistical distribution of outcomes of almost any combination of scenarios, expressed as textual data, not a form of world model?


I think you're looking at it too abstractly. An LLM isn't representing anything; it has a bag of numbers that some other algorithm produced for it. When you give it some numbers, it takes them and does matrix operations with them in order to randomly select a token from a softmax distribution, one at a time, until the EOS token is generated.
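The decoding loop described above can be sketched in a few lines. This is a hedged illustration: `fake_logits` is a made-up stand-in for the matrix math a real transformer would do, and the three-token vocabulary is invented for the example.

```python
import math
import random

VOCAB = ["hello", "world", "<eos>"]

def fake_logits(context):
    # Hypothetical stand-in for the model's matrix operations.
    # Here the EOS logit simply grows as the output gets longer.
    return [1.0, 0.5, 0.2 + 0.8 * len(context)]

def softmax(logits, temperature=1.0):
    # Convert raw scores into a probability distribution.
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(max_len=10, seed=0):
    random.seed(seed)
    out = []
    while len(out) < max_len:
        probs = softmax(fake_logits(out))
        # Randomly select one token from the softmax distribution.
        token = random.choices(VOCAB, weights=probs)[0]
        if token == "<eos>":
            break  # stop when the EOS token is generated
        out.append(token)
    return out

print(generate())
```

Everything a chat model "says" comes out of a loop shaped like this: score, normalize, sample, repeat.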

If they don't have any training data that covers a particular concept, they can't map it onto a world model and make predictions about that concept based on an understanding of the world and how it works. [This video](https://www.youtube.com/watch?v=160F8F8mXlo) illustrates it pretty well. These things may or may not end up being fixed in the models, but that's only because they've been further trained with the specific examples. Brains have world models. Cats see a cup of water, and they know exactly what will happen when you tip it over (and you can bet they're gonna do it).


That video is a poor and misunderstood analysis of an old version of ChatGPT.

Analyzing the image-generation failure modes of the DALL-E family of models isn't really helpful in understanding whether the invoking LLM has a robust world model or not.


The point of me sharing the video was to use the full glass of wine as an example of how generative AI models doing inference lack a true world model. The example is just as relevant now as it was then, and it applies to inference done by LMs and SD models in the same way. Nothing has fundamentally changed in how these models work. Getting better at edge cases doesn't give them a world model.


That's the point though. Look at any end-to-end image model. Currently I think nano banana (Gemini 2.5 Flash) is probably the best in prod. (It looks like ChatGPT has regressed its image pipeline with GPT-5, but I'm not sure.)

SD models have a much higher propensity to fixate on proximal, in-distribution solutions because of the way they denoise.

For example, you can ask nano banana for a "completely full wine glass in zero g", which I'm pretty sure is way more out of distribution, and the model does a reasonable job at approximating what that might look like.


That's a fairly bad example. They don't have any trouble taking unrelated things and sticking them together. A world model isn't required for you to take two unrelated things and stick them together. If I ask it to put a frog on the moon, it can know what frogs look like and what the moon looks like, and put the frog on the moon.

But what it won't be able to do, which does require a world model, is put a frog on the moon, and be able to imagine what that frog's body would look like on the moon in the vacuum of space as it dies a horrible death.


Your example is a good one. The frog won't work easily because the model's safety tuning makes it reluctant to show a dead frog, BUT if you ask nano-banana for:

"Create an image of what a watermelon would look like after being teleported to the surface of the moon for 30 seconds."

You'll usually see a burst, frozen melon.



