Lol, __m128 isn't in the C standard. It represents an XMM register and you know it.
Asking for a register and not caring which one you get doesn't make it C, because assemblers have already done that. What was that one assembler Atari or someone made that could use infinite registers?
int a is a variable. It doesn't ever have to touch a register, the compiler can do with it whatever it wants, and it certainly isn't specific to the Intel platform.
And I showed you how to avoid trashing memory in the example above. I told it I was going to clobber the EAX register, so the compiler can make a choice. If it can avoid having EAX in use during my code, it can let my code clobber it without any ill effects. If EAX is in use and that can't be avoided, it saves it (pushing it to the stack if it has to) and restores it after my code. GCC makes that choice. It doesn't have to save the state of all of the registers if I declare my clobbers correctly.
Since I defined my inputs, if it can arrange for my input to already be in a register, my inline assembly will use that instead of a variable on the stack. If it sees my output is going to be used soon, it will leave it in a register instead of storing it back to memory.
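To make that concrete, here is a rough sketch of the kind of thing I mean, not the exact code from my earlier post. The function name times_three is made up for illustration, and the plain C fallback is only there so it still builds on non-x86 targets. It uses GCC's extended asm syntax: one output, one input, and a clobber list telling GCC that I use EAX as scratch.

```c
/* Hypothetical helper for illustration: returns x * 3.
 * On x86 with GCC (or a GCC-compatible compiler) it uses extended inline asm
 * with EAX as a scratch register; elsewhere it falls back to plain C. */
static int times_three(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result;
    __asm__ (
        "movl %1, %%eax\n\t"   /* copy the input into EAX          */
        "addl %1, %%eax\n\t"   /* EAX = 2 * x                      */
        "addl %1, %%eax\n\t"   /* EAX = 3 * x                      */
        "movl %%eax, %0"       /* hand the result back to GCC      */
        : "=r" (result)        /* output: any general register     */
        : "r" (x)              /* input: any general register      */
        : "eax", "cc"          /* clobbers: EAX and the flags      */
    );
    return result;
#else
    return x * 3;              /* portable fallback, no asm        */
#endif
}
```

Because EAX is in the clobber list, GCC keeps the input and output out of EAX and only has to preserve EAX if something live is actually sitting in it. The "r" constraints leave the choice of the other registers to the compiler, so times_three(7) gives 21 without me naming where x lives.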
I've looked at the code GCC outputs when using inline asm, and there is nothing wrong with it at all. I'm not against using intrinsics, and I have been using them. I just haven't seen a case where GCC did something better with the intrinsic than with my inline asm.
I have never used MSVC, so I don't know the first thing about it. I do know it doesn't even support inline asm for 64-bit, so I guess if you're using MSVC you totally should use intrinsics only.