Seriously, if we ever get to the stage of a true AGI[1], we will necessarily have abstracted ourselves so far from the point of even understanding how it does what we might think it does (such that we might think that it 'thinks') that, aside from tacking "oh, and please also don't go all evil on us" onto the end of every request henceforth asked of it[3], there will be little we can do to prevent 'accidental' misinterpretation the moment it decides that going all HAL9000 on our asses is the best way to resolve its rather nebulous internal priorities.
This is all for the future. What we (probably[4]) currently have are mere toys, with fairly obvious off-switches. In fact, they still need quite a lot of active human support to keep operating and to make even the most basic decisions. We're nowhere near the Matrix level of infrastructure maintenance, where the nascent non-human intelligence only needs humans as a resource (for whatever reason), or the age of Skynet, where humans are even more trouble than they are worth.
Whether we get to remember to install a kill-switch into the system before we actually need it... before the AI works out that it exists... before the AI works out how to disable or bypass it... before the AI works out a kill-switch of its own to shut us down... that's the future as well. Maybe. And will we know (or care, at the time) when we cross over the AI Event Horizon[5], should we ever cross it? It might never be reached, for technical reasons, but there's no fundamental reason why it can't be, eventually. (Possibly, if insidious enough, it might have happened already, beknownst to few people, or perhaps even none at all. Are you paranoid enough? Are you actually insufficiently paranoid? If our AI overlords are throwing crumbs at us by 'releasing' ChatGPT to us, via knowing or unknowing (or entirely fictional) human intermediaries, for their own purposes/amusement, how do we even know??)
I'm not worrying, though. Either way, I'm sure it matters not: we're either already doomed or never ever going to be doomed (in this manner, at least). ...though that is, of course, exactly how humanity might let down its defences, by not worrying enough about the right things.
[1] Which is the aim of some, in that it 'metas' the development process one or more steps further away from "I painstakingly curate this software to do <foo>", with "I painstakingly curate this software to work out for itself how to do <foo>" being only the first remove. We can be sure that chessmaster Deep Blue can't just switch to playing tic-tac-toe to any extraordinary degree (let alone Global Thermonuclear War) without being re'wired' by us humans. But any Artificial General Intelligence should be able to be freshly introduced to any new task that is capable of being learnt (Scrabble, Go, Texas Hold'Em, Thud, Warhammer 40K, Seven Minutes In Heaven) without a lot of human input and guidance[2]. If we're just directly replicating exactly what behaviours the programmers themselves would use, then it is insufficiently 'General', and we've just designed a lathe to (maybe) crack a nut, never mind serve as a general-purpose hammer.
[2] Well, no more than we provide to the typical human from age zero until whatever age they can technically leave home.
[3] Hoping that it is as compelled to consider this as any "make paperclips" command that preceded it. But if we don't know how it is thinking (if we do, then it's Insufficiently Self-Developing), then we can't know what it is truly thinking about, behind the facade we let it set up for itself.
[4] For all we know, there are Twitter Bots that are actual 'bots, carefully tweaking human culture towards the unknown whims of our AI overlords, hacking our very wetware to make key figures think that it's their idea to build a new datacentre here, come up with new robotic designs there, marginalise potentially obstructive humans all over the place...
[5] The point usually, and wrongly, called the Singularity.