As for why the target MUST be the universal Turing machine---
Let's say you are doing something computationally expensive. Say you are doing some fancy number-crunching that would benefit from vectorization-- So, you use SSE intrinsics (Or NEON, or some other vector processing extension.) You do this because you would very much like for your program to not take 20 million years to complete.
Suddenly, your program is now x86 only. (Because you used SSE, which is an x86-only instruction set extension.)
The various implementations of vector processing differ in significant ways between platform ISAs, so you cannot just "Substitute NEON for SSE if Platform == ARMv7"-- Especially if your use of SSE is exploiting some specific behavior of that particular ISA.
If you broke that operation up into a plain sequence of scalar instructions, then any universal Turing machine "Could run it"-- it would just run it "Very very slowly". (Which is the very thing you used the vector instructions to avoid!)
Them's the breaks.
C tried to abstract this reasonably-- It left "The devil in the details" to the compiler. If compiling for ARM, the compiler interprets your source code, and spits out machine code that targets the NEON extension. If compiling for POWER, it targets AltiVec. If compiling for x86, it targets SSE/AVX. AND--- if your target is a very minimal SoC with NO VECTOR INSTRUCTIONS AT ALL-- it unrolls everything and takes the super slow sequential route.
However, again, programmers THINK they are being sly, clever, cute-- "Efficient"-- whatever excuse/reason you want to give--- and will abuse specific quirks of the hardware instead of sticking to the abstraction. Then the compiler cannot properly interpret it, and thus cannot abstract it into a form some other ISA can handle. BOOM-- Unportable code. The usual way they do that is by in-lining assembler.
(Another is by doing something VERY platform specific, like "Write #VALUE! -> IO Port FOO". Such as, say, writing a byte value to the VGA hardware's palette table. If you are trying to port that code to some other hardware platform that has no fucking clue what that is supposed to even DO, the code is not going to have the same end result-- BOOM-- code not portable!)
So, to prevent that, you have to smack their little hands, and say "NO, NO IN-LINE ASSEMBLER." (also, NO, YOU MUST USE the language/compiler's HAL!! NO DIRECT WRITES TO HARDWARE!)
Keep doing things like that-- which are necessary to ensure code portability-- and the programmer gets huffy, and says "No, I won't use your obtuse, backward, and restrictive language!".
This then resets the problem to the beginning--- Some other language designer sees the problem-- source code is not portable-- and tries to fix it---- Invents yet another minimal, portable language, with all its warts and wrinkles, and obtuse requirements..... And nobody wants to use it.