Without intent behind it, "lying" becomes just being wrong. When a monkey gets cheated out of a grape after pointing at the correct hand, and starts pointing at the wrong hand instead to get the grape, that is functionally more of a lie than ChatGPT arriving at the mathematical conclusion that pulling sources out of its ass is the most pertinent answer.
I only used the free version; supposedly GPT-4 is that much better at being correct, but frankly one gets disillusioned. Once you've noticed the eight and a half ways it prefers to formulate answers, it's hard to unnotice them. I went from "wow, this thing can talk about anything" to only asking language-related stuff: difficult-to-translate idioms, quick encyclopedic checks, describing something to find out whether a word or a specialised tool exists for it... stuff like that. It can struggle to go into detail; you often get stuck receiving a general overview pitched at whatever level of in-depth knowledge your choice of vocabulary in the initial prompt indicated. "Give me a summary of subject X" will rarely yield more than what Google would snip from a site and put at the very top when asked a concrete question. Specifying that the answer should be long will usually just incite a lot of repetition.
I've been starting to believe that when OpenAI makes grandiose statements, it might be less about the actual trajectory of innovation than about their projected favourable outcomes. What's their currency? I doubt they are making a profit from selling their services as it stands, so the business model might be centered around selling, rebuying and reselling their stock while staying on an "exponential" growth trajectory... Keep investing in more and more hardware to throw at the problem and just assume that at some point some sort of threshold will be crossed that changes those dynamics.
I tried to investigate the VRAM requirements of ChatGPT, and while there isn't much transparency on the subject, from what I gathered it would take about two NVIDIA A100s with 40 GB of VRAM each to even consider executing an already-trained model. Now I doubt that buys you a dedicated instance of the thing; that would mean something like $15,000 of GPU for every free user, and I can't imagine that being the case... But anyway, once you consider what immense amounts of computing power we are throwing at the problem, and you were already aware of its shortcomings, it somehow gets a little less impressive.
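For a rough sense of scale, here's the back-of-envelope arithmetic for what the weights alone would occupy. The parameter counts and precisions are illustrative assumptions on my part, not published OpenAI figures, and real serving needs extra room for activations and the KV cache on top:

```python
# Rough VRAM estimate: weights only, ignoring activations / KV cache.
# Assumed precisions: fp16 = 2 bytes per parameter, int8 = 1 byte.

def weights_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """GB of memory needed just to hold the model weights."""
    return n_params * bytes_per_param / 1e9

# Hypothetical model sizes, chosen only to bracket the estimate.
for n_params, label in [(40e9, "40B model"), (175e9, "GPT-3-sized")]:
    for bytes_pp, precision in [(2.0, "fp16"), (1.0, "int8")]:
        print(f"{label} @ {precision}: ~{weights_vram_gb(n_params, bytes_pp):,.0f} GB")
```

By this arithmetic, two 40 GB A100s (80 GB total) would just about fit a 40B-parameter model at fp16, while a GPT-3-sized 175B model would need several times that, so the "two A100s" figure presumably assumes a smaller or heavily quantized model.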
I know they are talking about pruning, and I could imagine that after several rounds of pruning and retraining we could reach a point where a pruned version, practically indistinguishable from GPT-4 or 5, could run on enthusiast gaming GPUs of the current or next generation, so like running a 500 W card really hot just to chat... A somewhat sobering perspective.
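For what it's worth, the prune-then-retrain loop is simple to sketch. This is a minimal toy example using PyTorch's built-in magnitude pruning on a single linear layer; a real effort would prune a whole model and fine-tune between rounds, which I've only stubbed out here:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for one sub-layer; a real model would prune every
# linear layer and fine-tune between rounds to recover accuracy.
layer = nn.Linear(4096, 4096)

for _ in range(3):
    # Zero out the 30% smallest-magnitude weights among those remaining.
    prune.l1_unstructured(layer, name="weight", amount=0.3)
    # ... retraining / fine-tuning would go here ...

prune.remove(layer, "weight")  # fold the pruning mask into the weights
sparsity = (layer.weight == 0).float().mean().item()
print(f"final sparsity: {sparsity:.0%}")  # ~66% after three 30% rounds
```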
So to answer the question: I think we will reach a 4 within the decade, competent enough to fool half of humans into thinking it's not a computer, and competent enough to fill a bunch of roles and jobs, but qualitatively no nearer to consciousness, sentience, etc. than what we've got. Everybody just assumes that at some point it will self-optimize beyond our imagination, "faster than we can look", but IMO ATM that is just conjecture. ATM the problem still lies firmly on the side of: how do we scale it up, and can we keep scaling up without diminishing returns?