I took the original challenge ("outputs the numbers") as including printing...
Yup, I somewhat concede that if each core got a shot at poking the GPU directly (still, consider the access-blocking and permitting needed while each is doing its half-, quarter- or sixth-of-a-million pokes), followed by the "display page swap" (or whatever you do, these days, on modern GPUs), it might be quicker. I'm worried about the aforementioned shared-memory issue, over the lower bandwidth of the bus towards it, though.
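For the sake of argument, here's a rough Python sketch of how the million might be carved up into those half/quarter/sixth shares before any core starts poking. The function name split_range is purely my own invention, and it says nothing about the locking or the bus contention, which is the actual worry.

```python
def split_range(total, workers):
    """Split 1..total into contiguous (start, stop) shares, one per worker.

    Both ends are inclusive. Any remainder is spread over the first few
    workers, so no share differs from another by more than one number.
    """
    base, extra = divmod(total, workers)
    chunks = []
    start = 1
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


if __name__ == "__main__":
    for cores in (2, 4, 6):
        print(cores, split_range(1_000_000, cores))
```

Each core would then only ever touch its own (start, stop) share, which at least keeps the number-crunching side free of overlap, whatever happens at the screen-memory end.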
A multicore GPU being given the task, internally, now... Hmmm... Same technique as getting modern graphics cards to do some of the physics calculations (but far simpler, maybe not even needing to 'report back' any more than a cursory "done" to the motherboard). The card has probably already been given the firmware for efficiently delineated/shared access to screen memory, and that wouldn't even need a multicore CPU to take advantage, the CPU having abdicated all responsibility beyond the minimum necessary to set the situation up and resolve any "Done!" signals. It would highly depend on the card (and the GPU's core speed/suitability for a non-graphics command-set compared to the CPU), of course, assuming I'm not giving them credit for far more autonomy and self-determination than they actually deserve at today's tech-level. I know I saw proof-of-concept simulations of the kind of tech I'm describing, a few years back, and am rather assuming that it is ubiquitous (though maybe non-unified and proprietary to each manufacturer) on the actual high-end stuff of today.
Maybe we have a plan! Now, who wanted these million numbers printing out again, and when does he/she want it doing by? 'Cos I probably need to source some hardware, get an intense refresher in Assembler, subscribe to the card manufacturer's developer programme, etc... What's this project's budget? Do I get any staff? I positively demand a cool domain name. We even need to work out whether we're comma-delimiting the thousands (with optional dot-delimiting for systems with regional settings specifying that!), whether it's one number per line or space-separated... So much to think about, so much to do, but I'm feeling confident that we can get a 95% bug-free program before we even move it to the beta stage.
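Until the budget comes through, the delimiter questions at least can be prototyped in plain Python. This is only a sketch: format_number and render are names I've made up, and the separators are passed in by hand rather than read from any regional settings.

```python
def format_number(n, thousands_sep=","):
    """Format an integer with the chosen thousands separator ("" for none)."""
    # Python's "," format spec always groups with commas; swap in our choice.
    return f"{n:,}".replace(",", thousands_sep)


def render(numbers, thousands_sep=",", item_sep="\n"):
    """Join the formatted numbers, one per line or space-separated."""
    return item_sep.join(format_number(n, thousands_sep) for n in numbers)


if __name__ == "__main__":
    # Comma-delimited thousands, one number per line.
    print(render(range(999_998, 1_000_001)))
    # Dot-delimited thousands, space-separated.
    print(render(range(999_998, 1_000_001), thousands_sep=".", item_sep=" "))
```

Whoever commissioned the thing still has to pick which combination they actually want, of course.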