Python gets around the single-thread issue by running multiple processes and communicating between them. That's probably the approach a lot of scripting languages can or will take, come to think of it.
I guess someone could implement Python without the global interpreter lock, but then you'd have to extend it with multithreading support: thread creation, pooling, events, mutexes, semaphores, and so on...
Many Python programs run multiple processes like that, but it didn't get standard library support until 2.6, in the form of the multiprocessing module. The threading module, by contrast, seems to have been added back in 1.5.1, as a simpler interface over the even older thread module. And no, communicating between processes is not simple in the slightest; a general language-level solution would probably perform worse than the GIL.
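For what it's worth, the multiprocessing flavor of "run multiple processes and pass messages between them" looks roughly like this. A minimal sketch; the square worker and run helper are names I made up for illustration:

```python
from multiprocessing import Process, Queue

def square(numbers, out):
    # Runs in a separate process, with its own interpreter and its own GIL.
    for n in numbers:
        out.put(n * n)

def run(numbers):
    # Messages travel through a Queue, which pickles each item across
    # the process boundary -- this is the IPC cost the GIL avoids.
    out = Queue()
    p = Process(target=square, args=(numbers, out))
    p.start()
    results = [out.get() for _ in range(len(numbers))]
    p.join()
    return results

if __name__ == "__main__":
    print(run([1, 2, 3]))  # [1, 4, 9]
```

The serialization on every message is exactly why this is convenient for coarse-grained work but a poor general substitute for shared-memory threading.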
Meanwhile, Python has been implemented without a global interpreter lock, a few times over. And yes, thread creation, pooling, events, mutexes, and semaphores are also implemented in each one, as part of the standard library. The main difference is that destructors are even less predictable, but that's why we have the with block these days.
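The with-block point, sketched: on implementations without refcounting you can't rely on __del__ firing promptly, so cleanup gets tied to block exit instead. The Resource class below is a made-up example, not anything from the stdlib:

```python
# Made-up Resource class: cleanup runs deterministically at block exit
# via __exit__, instead of whenever the GC gets around to __del__.
events = []

class Resource:
    def __enter__(self):
        events.append("acquired")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs as soon as the with block ends, even if it raised.
        events.append("released")

with Resource():
    events.append("working")

print(events)  # ['acquired', 'working', 'released']
```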
Even with a GIL, threading makes it convenient to issue blocking system calls or long-running C library calls without stalling the rest of the program, and it enables interleaved processing over shared memory structures. It's kind of like running the asyncio loop without leaving yields everywhere. But yes, it makes Python slower than it should be for CPU-bound work on multi-core machines.
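A sketch of the blocking-calls point, with time.sleep standing in for any syscall or C call that releases the GIL: four threads block in parallel and finish in roughly one sleep's worth of wall time, not four, while all writing to one shared list:

```python
import threading
import time

shared = []  # plain shared-memory structure, visible to every thread

def worker(tag):
    time.sleep(0.1)     # GIL is released while blocked in the call
    shared.append(tag)  # list.append is thread-safe under the GIL

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

print(sorted(shared))   # [0, 1, 2, 3]
print(elapsed < 0.4)    # the sleeps overlapped instead of running serially
```

Swap the sleep for a CPU-bound loop and the overlap disappears, which is the multi-core complaint in a nutshell.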
Then again, most programs are I/O-bound, which is why removing the GIL has never become a big enough priority for the core developers to outweigh the problems the GIL was designed to solve in the first place.