A good example: you can teach any three-year-old what a stop sign is (even if they can't read!) with one* "sample image." And this will be robust to all sorts of mangled, defaced, or partial stop signs. Current state-of-the-art ML models take thousands (millions?) of samples to learn a stop sign, and they are still foiled by pieces of masking tape on the sign, or by the sign being 10% obscured by a fencepost or something. Similarly, you can teach children with not much effort that eating rocks is not a reasonable behavior, nor is making pasta with gasoline sauce.
The issue is that that child didn't come out of nowhere; they are the product of billions of years of adversarial training. Yes, after that training (and multiple years of fine-tuning) it only takes a single example to teach them some things, but getting there required massive amounts of time and information.
But also yes, if you make up a sign, show it to an AI, tell it what it is, and then ask about it, it will totally get it after seeing it once.
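For what it's worth, in API terms that one-shot teaching is just a short conversation. Here is a rough sketch using the OpenAI Python client with a vision-capable model; the image URL and the made-up sign name are placeholders I invented for illustration, not a real test:

    # One-shot teaching sketch: show a made-up sign once, name it, then ask about it.
    # SIGN_URL is a placeholder, not a real image of anything.
    from openai import OpenAI

    client = OpenAI()
    SIGN_URL = "https://example.com/made-up-sign.png"  # hypothetical invented sign

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": [
                {"type": "text", "text": "This is a 'yield to geese' sign I just invented."},
                {"type": "image_url", "image_url": {"url": SIGN_URL}},
            ]},
            {"role": "assistant", "content": "Got it."},
            {"role": "user", "content": [
                {"type": "text", "text": "What should a driver do when they see this sign?"},
                {"type": "image_url", "image_url": {"url": SIGN_URL}},
            ]},
        ],
    )
    print(response.choices[0].message.content)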
It's true their image recognition isn't as good, but again, they have been training for a hell of a lot less time than humans.
LLMs have cognitive automation, but they don't have intelligence.
General intelligence, artificial or natural, does not exist.
Cats, dogs, humans and all animals have specialized intelligence.
They have different collections of skills and an ability to acquire new ones quickly.
Much of animal and human intelligence is acquired through observation of -- and interaction with -- the physical world.
That's the kind of learning that we need to reproduce in machines before we can get anywhere close to human-level AI.
There is no such thing as intelligence. Or rather there is, but it's made up of a vast number of different categories, in the same way that charisma and agility are.
LLMs have most of the puzzle pieces needed for "intelligence" as understood by humans: they can generalize, they can plan, they can learn information, they have theory of mind, they have object recognition, etc. But there are a lot of different pieces, and they don't have them all yet.
They are still lacking many things, and fundamental breakthroughs are indeed needed. Breakthroughs like those needed to go from GPT-2 -> GPT-3 -> GPT-3.5 -> GPT-4 -> GPT-4o, or GPT-4 -> Gemini, or DALL-E -> Sora.
To me the most notable things AI is lacking are 1) the ability to do long-term tasks, 2) long-term memory, and 3) the ability to learn fundamentally new skills.
IMHO none of these are impossible to fix or require a fundamentally new model.
In the end I suspect #3 is the greatest roadblock, but we will of course see.
Talent is a real thing that exists, and current models of AI cannot copy it, definitionally.
Why definitionally? Honestly, there is a pretty solid argument that AIs are nothing *but* talent and intuition.
Because, in the end, transformers are just a sophisticated way to predict text.
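To make "predict text" concrete, here's roughly the loop a decoder-only transformer runs, sketched with GPT-2 from the Hugging Face transformers library (a small public stand-in I picked for illustration, not a claim about how GPT-4 is built or served):

    # Sketch of next-token prediction: score the whole vocabulary, append the
    # most likely token, repeat. GPT-2 here is just a small public stand-in.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    input_ids = tokenizer("A stop sign means", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(20):                        # generate 20 tokens, one at a time
            logits = model(input_ids).logits       # shape: (1, seq_len, vocab_size)
            next_id = logits[0, -1].argmax()       # greedy: take the single most likely token
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))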
With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.
Incidentally, this is already wrong and outdated. GPT-4o is inherently multimodal, and thus transformers have shown the ability to work on multiple different inputs (senses) and convert them into multiple different outputs, not just text.