I'm not. I'll be clearer, though.
#define NROTATE_LEFT(x, n, m) (((x) << (n)) | ((x) >> (m)))
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))
#define NFF(a, b, c, d, x, s1, s2, ac) { (a) += F ((b), (c), (d)) + (x) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b); }
#define NGG(a, b, c, d, x, s1, s2, ac) { (a) += G ((b), (c), (d)) + (x) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b); }
#define NHH(a, b, c, d, x, s1, s2, ac) { (a) += H ((b), (c), (d)) + (x) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b); }
#define NII(a, b, c, d, x, s1, s2, ac) { (a) += I ((b), (c), (d)) + (x) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b); }
#define NFF0(a, b, c, d, s1, s2, ac) { (a) += F ((b), (c), (d)) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b); }
#define NGG0(a, b, c, d, s1, s2, ac) { (a) += G ((b), (c), (d)) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b); }
#define NHH0(a, b, c, d, s1, s2, ac) { (a) += H ((b), (c), (d)) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b); }
#define NII0(a, b, c, d, s1, s2, ac) { (a) += I ((b), (c), (d)) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b); }
#define NRHH0(a, b, c, d, s1, s2, ac) { (a) -= (b); (a) = NROTATE_LEFT ((a), (s2), (s1)); (a) -= (H ((b), (c), (d)) + (ord32)(ac)); }
#define NRII0(a, b, c, d, s1, s2, ac) { (a) -= (b); (a) = NROTATE_LEFT ((a), (s2), (s1)); (a) -= (I ((b), (c), (d)) + (ord32)(ac)); }
int md5_reverse()
{
    register unsigned int a1, b1, c1, d1;

    a1 = *digest2;
    b1 = *(digest2+1);
    c1 = *(digest2+2);
    d1 = *(digest2+3);

    NRII0 (b1, c1, d1, a1, S44, SS44, 0xeb86d391);
    NRII0 (c1, d1, a1, b1, S43, SS43, 0x2ad7d2bb);
    c1 -= x1[2];
    NRII0 (d1, a1, b1, c1, S42, SS42, 0xbd3af235);
    NRII0 (a1, b1, c1, d1, S41, SS41, 0xf7537e82);
    NRII0 (b1, c1, d1, a1, S44, SS44, 0x4e0811a1);
    NRII0 (c1, d1, a1, b1, S43, SS43, 0xa3014314);
    NRII0 (d1, a1, b1, c1, S42, SS42, 0xfe2ce6e0);
    NRII0 (a1, b1, c1, d1, S41, SS41, 0x6fa87e4f);
    NRII0 (b1, c1, d1, a1, S44, SS44, 0x85845dd1);

    *working = a1;
    *(working+1) = b1;
    *(working+2) = c1;
    *(working+3) = d1;
    return(1);
}
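For anyone wondering why swapping the two shift counts undoes a step, here is a minimal standalone sketch. It is not mdcrack code. It assumes the second count is the complement of the first (s2 == 32 - s1), which is what S44/SS44 appear to be, and it takes the shift 21 and the I function from RFC 1321. The names step_forward, step_reverse and roundtrip_ok are made up for illustration.

```c
#include <stdint.h>

/* Same shape as the macro quoted above. Assumption: m == 32 - n with
   0 < n < 32, so the two shifts together form a 32-bit left rotation. */
#define NROTATE_LEFT(x, n, m) (((x) << (n)) | ((x) >> (m)))

/* MD5 round-4 auxiliary function as given in RFC 1321. */
#define I(x, y, z) ((y) ^ ((x) | (~(z))))

/* Forward step with a zero message word (same shape as NII0).
   Hypothetical helper name, not taken from mdcrack. */
static uint32_t step_forward(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                             unsigned s1, unsigned s2, uint32_t ac)
{
    a += I(b, c, d) + ac;
    a = NROTATE_LEFT(a, s1, s2);   /* rotate left by s1 */
    a += b;
    return a;
}

/* Reverse step (same shape as NRII0): undo the add, rotate left by the
   complementary count, then subtract what the forward step added. */
static uint32_t step_reverse(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                             unsigned s1, unsigned s2, uint32_t ac)
{
    a -= b;
    a = NROTATE_LEFT(a, s2, s1);   /* rotate left by 32 - s1 == rotate right by s1 */
    a -= I(b, c, d) + ac;
    return a;
}

/* Returns 1 if reversing a forward step recovers the original a. */
int roundtrip_ok(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    const unsigned s1 = 21;          /* MD5's S44 per RFC 1321 */
    const unsigned s2 = 32 - s1;     /* assumed value of SS44 */
    const uint32_t ac = 0xeb86d391u; /* constant of the last MD5 step */
    uint32_t fwd = step_forward(a, b, c, d, s1, s2, ac);
    return step_reverse(fwd, b, c, d, s1, s2, ac) == a;
}
```

Rotating left by s1 and then left by 32 - s1 is a full 32-bit turn, so the reverse step gets back the sum the forward step rotated, and the subtraction removes the rest. That only works here because the message word is zero (or known) in these steps.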
The majority of mdcrack was written prior to 2001. It isn't "hand crafted" assembly, and even if it was, a lot has changed since it was written. The "sse" version of mdcrack is merely compiled with SSE optimizations turned on.
It's no surprise that a program using things like SSE2 intrinsics and/or CUDA (which isn't C lol) blows a straight C program out of the water. It's also no surprise that the compiler's "sse" optimizations don't help that much. Thanks for proving my point.
EDIT: Which proves ANOTHER point. Imagine if Java used some fancy SSE2 in its md5 code. That would make a Java based cracker faster than mdcrack lol. Straight portable Java > straight portable C!