better yet, use all 4 cores to simulate a single one. not sure how it is done, but I know it is possible
Not possible.
1: some problems are not parallelizable.
2: most code is not written to be run in parallel.
3: automatically converting code to run in parallel is a non-trivial task.
4: parallelization is not free. Spawning and disposing of threads, synchronization, and deadlock detection all take time; for threads of sufficiently short duration (the ones most easily identified), these costs outweigh the benefits gained.
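To put a rough number on point 4, here is a minimal sketch in Python that does the same tiny piece of work inline and with one thread per item. The function names (square, run_serial, run_thread_per_item, timed) are made up for this illustration. Note that CPython's global interpreter lock keeps these threads from running Python code in parallel anyway, so this only shows the cost of creating and joining threads, not any speedup.

```python
import threading
import time


def square(x, out, i):
    # The "work": far too small to be worth a thread of its own.
    out[i] = x * x


def run_serial(values):
    # Do every item inline on the calling thread.
    out = [0] * len(values)
    for i, x in enumerate(values):
        square(x, out, i)
    return out


def run_thread_per_item(values):
    # Spawn one thread per item, then wait for all of them.
    out = [0] * len(values)
    threads = [
        threading.Thread(target=square, args=(x, out, i))
        for i, x in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def timed(fn, values):
    # Return the function's result and the wall-clock seconds it took.
    start = time.perf_counter()
    result = fn(values)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    values = list(range(2000))
    serial, t_serial = timed(run_serial, values)
    threaded, t_threaded = timed(run_thread_per_item, values)
    assert serial == threaded  # same answer either way
    print(f"serial: {t_serial:.6f}s  thread per item: {t_threaded:.6f}s")
```

On most machines the thread-per-item version should come out noticeably slower, because starting and joining a thread costs far more than one multiplication. Exact timings will vary.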
hmm, could it have been single core processors that simulated multi core processors? I usually don't care too much about that kind of stuff, I just remember seeing some software on a friend's computer
Not required. A single core can handle as many threads as a multi-core system can; it's just that it can only run them one at a time.
There are things that can be done to speed up a single process on multicore machines.
1: move all other threads to a different core, so that the single thread has full use without context switches.
2: (requires special hardware) run a nostore/noop predictive thread on the other cores to prefetch data from memory to cache. This reduces the latency involved in waiting to retrieve data from memory.
At most, these would give a few percent increase in performance, but they do work, marginally, with preexisting software.
The only real way to deal with it is to educate programmers to understand and use parallelization, and to provide tools and language features that make parallelization easier to use. And these solutions don't really help preexisting software.
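For the first trick, a rough sketch of the "keep this process on one core" half, assuming Linux, where Python exposes os.sched_getaffinity and os.sched_setaffinity. The helper names pin_to_cpu and restore_affinity are hypothetical. Keeping every other thread off that core is a separate job for the OS or the admin and is not shown here.

```python
import os


def pin_to_cpu(cpu=None):
    """Restrict the calling process to one CPU.

    Returns the previous CPU set so it can be restored later,
    or None on platforms without sched_setaffinity (assumption:
    only Linux-like systems provide it).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    previous = os.sched_getaffinity(0)  # 0 means "the calling process"
    if cpu is None:
        cpu = min(previous)  # pick the lowest-numbered allowed CPU
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {cpu})
    return previous


def restore_affinity(previous):
    # Undo pin_to_cpu; does nothing if pinning was unsupported.
    if previous is not None:
        os.sched_setaffinity(0, previous)


if __name__ == "__main__":
    saved = pin_to_cpu()
    if saved is None:
        print("CPU affinity is not supported on this platform")
    else:
        print("now restricted to:", sorted(os.sched_getaffinity(0)))
        restore_affinity(saved)
```

Pinning only stops the scheduler from migrating the process between cores. Whether that buys anything measurable depends on what else is running.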
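As one example of the kind of tool meant here, a small sketch using Python's standard concurrent.futures module. The prime-counting workload and the helper names count_primes and split_range are invented for the illustration; the point is only that the programmer has to split the problem into independent chunks, and the library then handles the workers.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    # Count primes in the half-open range [lo, hi) by trial division.
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    # Cut [lo, hi) into roughly equal, independent chunks.
    step = max(1, (hi - lo + parts - 1) // parts)
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


if __name__ == "__main__":
    chunks = split_range(0, 50_000, 8)
    # Each chunk runs in its own worker process; results come back in order.
    with ProcessPoolExecutor() as pool:
        total = sum(pool.map(count_primes, chunks))
    print("primes below 50000:", total)
```

This only works because the chunks do not depend on each other. The library cannot find that independence in existing serial code, which is why none of this helps software that was never written with it in mind.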