No, not really. And it is not a matter of bigger context window sizes or model sizes.
Right, I agree with this. Naive scaling will result in minor gains across every category, but will not result in massive fundamental breakthroughs (e.g. trying to go from GPT-4 to GPT-5 just by making it bigger would require a huge increase in scale).
Long output tasks will not be solved just by making it bigger either.
What will make it better is that compute increase in combination with all the other work and learning that’s going on.
(8:45) Performance on complex tasks follows a log scale. It gets it right one time in a thousand, then one in a hundred, then one in ten. So there is a clear window where the thing is in practice useless, but you know it soon won’t be. And we are in that window on many tasks. This goes double if you have complex multi-step tasks. If you have a three-step task and are getting each step right one time in a thousand, the full task is one in a billion, but you are not so far from being able to do the task in practice.
…
(9:15) The model being presented here predicts scary capability jumps in the future. LLMs can actually (unreliably) do all the subtasks, including identifying what the subtasks are, for a wide variety of complex tasks, but they fall over on subtasks too often and we do not know how to get the models to correct for that. But that is not so far from the whole thing coming together, and that would include finding scaffolding that lets the model identify failed steps and redo them until they work, provided which subtasks fail is sufficiently non-deterministic rather than driven by the core difficulties.
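To make the (8:45) arithmetic concrete, here is a minimal sketch, assuming per-step successes are independent (a simplification):

```python
# Toy illustration of how per-step reliability compounds across a multi-step task.
# If each step succeeds independently with probability p, an n-step task
# succeeds with probability p ** n.
def chain_success(p: float, n_steps: int) -> float:
    return p ** n_steps

for p in (1 / 1000, 1 / 100, 1 / 10, 0.9, 0.99):
    print(f"per-step {p:.3f} -> 3-step task {chain_success(p, 3):.2e}")

# per-step 0.001 -> 3-step task 1.00e-09   (the "one in a billion" above)
# per-step 0.010 -> 3-step task 1.00e-06
# per-step 0.100 -> 3-step task 1.00e-03
# per-step 0.900 -> 3-step task 7.29e-01
# per-step 0.990 -> 3-step task 9.70e-01
```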
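And a minimal sketch of what the (9:15) scaffolding could look like, assuming you have some way to attempt a subtask and some way to check the result (`attempt_subtask` and `check_subtask` here are hypothetical stand-ins, not any particular API):

```python
# Hypothetical retry scaffolding: attempt each subtask, verify it, and redo
# failed steps until they pass or a retry budget runs out. This only helps
# when failures are sufficiently non-deterministic, as described above.
def run_with_retries(subtasks, attempt_subtask, check_subtask, max_retries=10):
    results = []
    for subtask in subtasks:
        for _ in range(max_retries):
            result = attempt_subtask(subtask)       # e.g. a model call
            if check_subtask(subtask, result):      # e.g. a verifier or test
                results.append(result)
                break
        else:
            raise RuntimeError(f"subtask kept failing: {subtask!r}")
    return results
```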
Long output tasks will not spontaneously get better; what will make them better is people constantly working to make them better at that exact thing, altering things like the data formatting, the training structure, the shape and functions of the neural net architecture, hyperparameter values, etc.
This isn’t hypothetical or just copium either; the size of outputs AI can coherently create has been ballooning over the past few years and shows no sign of stopping or slowing down.
But can it tie into the rest of a larger story? Can it direct combat in a way that benefits the overall plot? Correctly take into account the established traits of the captains of the ships? Understand the intricacies of space combat in this exact universe?
Yes to all of the above.
It can’t write a whole book properly AFAIK, but if you just tell it to write a few paragraphs or pages? Yeah. Like a lot of other "AI can't do this" claims, once the task gets properly defined it turns out it can, in fact, already do it.
But no, I don't think that, for example, we can get an LLM that can GM a Bay12 multiplayer forum game without it breaking apart and being filled with mechanical and plot holes. Even if we train it on all forum games in existence and pour millions into training it. Such tasks require properties LLMs lack.
Can it happen with some major breakthroughs and new type(s) of AI? Perhaps, but why should we assume that a major breakthrough of this nature will happen?
No, it requires properties that just aren’t powerful enough yet. The difference between being able to do something at 30% and at 90% is the difference between uselessness and (with a framework around it) actually doing the task fairly reliably.
In practice, 30% and 90% aren’t actually that far apart, and a model that messes up a rule or forgets some key setting detail every other post isn’t that far from doing so every ten posts, then every hundred posts, then not doing so at all.
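As a toy calculation (again assuming independent per-post error rates), the gap between "messes up every other post" and "basically never messes up" looks like this:

```python
# Probability of getting through an N-post game cleanly if the GM messes up
# with probability q on each post, independently.
def clean_run(q: float, n_posts: int) -> float:
    return (1 - q) ** n_posts

for q in (0.5, 0.1, 0.01, 0.001):
    print(f"error rate {q:<5} per post -> P(clean 100-post game) = {clean_run(q, 100):.3f}")

# error rate 0.5   per post -> P(clean 100-post game) = 0.000
# error rate 0.1   per post -> P(clean 100-post game) = 0.000
# error rate 0.01  per post -> P(clean 100-post game) = 0.366
# error rate 0.001 per post -> P(clean 100-post game) = 0.905
```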
I would confidently be willing to bet that GPT-6 could run a forum game without issues, but obviously the tech is nowhere near there yet, even if you tried pouring hundreds of millions in. (I could even see a late-in-the-cycle GPT-5-equivalent AI doing so, but that's much more iffy.)
Perhaps, but why should we assume that a major breakthrough of this nature will happen?
So they really don’t *need* a ton of breakthroughs (E: well, fundamental breakthroughs, that is; they still need a ton more of the minor breakthroughs we get every day) (again, a lot of this stuff is already there, they just need to make it better).
But the things that make me confident there will be breakthroughs are 1) the fact that new breakthroughs are coming out literally every day, which is at least partially demonstrative of our position on the S-curve, and 2) neural nets and a lot of modern AI architecture are designed to mimic neurons (e.g. even some of the circuits are the same, such as those for addition), and we already know that neurons can do all of this.
A growing body of research is making some surprising discoveries about insects. Honeybees have emotional ups and downs. Bumblebees play with toys. Cockroaches have personalities, recognize their relatives and team up to make decisions.
You don’t need a human-sized brain to have agency or time recognition or emotions or a lot of the other "AI can't" stuff out there; even tiny insect brains can do much of it.
The idea that neural nets (and by extension AI) are fundamentally unable to do things at the level of an insect and that these will prove huge roadblocks feels a bit funky to me.