Topic: What language is Dwarf Fortress made in? (Read 48001 times)

devek · « **Reply #90 on:** September 13, 2010, 11:47:18 am »

Actually, let me take that back.

There are some auxiliary lists.. but they are not used for what you think they are used for. The ones I looked at were all temporary and regenerated when they were needed.

EDIT: Does it matter anyway? Every frame (or graphics frame, would have to ask Baughn) it needs to display what is on the screen. To do so requires iterating through every item to find out what is in your viewing area.
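
For readers following along, the per-frame scan described above can be sketched roughly as below. This is only an illustration: nothing here is taken from Dwarf Fortress itself, and `Item`, `Viewport`, `item_in_view` and `collect_visible` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record: just a map position, for illustration only. */
typedef struct {
    int x, y, z;
} Item;

/* Hypothetical viewing area: one z-level, half-open x and y ranges. */
typedef struct {
    int x0, y0, x1, y1, z;
} Viewport;

/* Returns 1 if the item lies inside the viewport, 0 otherwise. */
int item_in_view(const Item *it, const Viewport *vp)
{
    return it->z == vp->z &&
           it->x >= vp->x0 && it->x < vp->x1 &&
           it->y >= vp->y0 && it->y < vp->y1;
}

/* Linear scan over every item: the cost is O(n) per frame no matter how
   few items are actually on screen. Writes the indices of visible items
   into out (capacity cap) and returns how many were written. */
size_t collect_visible(const Item *items, size_t n, const Viewport *vp,
                       size_t *out, size_t cap)
{
    size_t found = 0;
    assert(items != NULL || n == 0);
    assert(vp != NULL && (out != NULL || cap == 0));
    for (size_t i = 0; i < n && found < cap; ++i) {
        if (item_in_view(&items[i], vp)) {
            out[found++] = i;
        }
    }
    return found;
}
```

The point of the sketch is only that a flat item list forces a full pass; a spatial index would avoid it, at the price of keeping that index up to date.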

bartavelle · « **Reply #91 on:** September 13, 2010, 12:01:27 pm »

Quote from: devek on September 11, 2010, 11:11:42 am

They are not functions, they are literally assembly instructions inside of your code. The advantage of intrinsics is that you can use high level if statements and such with them, so if anything you're really using high level assembly. If you are not using if statements or whatever, your code is identical to the assembly code.

And that you don't have to bother about register allocation or memory moves, which can make a *huge* difference when a proper compiler (read ICC) is doing it for you. GCC and ICC actually produce fairly different code from the same intrinsics.

devek · « **Reply #92 on:** September 13, 2010, 12:41:12 pm »

Quote from: bartavelle on September 13, 2010, 12:01:27 pm

And that you don't have to bother about register allocation or memory moves, which can make a *huge* difference when a proper compiler (read ICC) is doing it for you. GCC and ICC actually produce fairly different code from the same intrinsics.

I couldn't care less about which is easier when neither option is that hard.

When it comes to what the compiler can optimize more, I am not going to argue that point (since I don't know). Is there any difference between the gcc code with or without intrinsics? To be more specific, when you use gcc, what information does it get from an intrinsic that it doesn't get from the inline assembly?

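Code:

int a = 10, b;
/* Copy a into b through eax. Each instruction is its own string literal,
   since a literal cannot contain a raw line break. */
asm ("movl %1, %%eax\n\t"
     "movl %%eax, %0"
     : "=r"(b)        /* output */
     : "r"(a)         /* input */
     : "%eax"         /* clobbered register */
    );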

I'm also starting to hate icc and am trying to get us to use clang at work: http://blogs.amd.com/work/2010/01/22/chipping-away-the-facade-on-compilers-and-benchmarks-for-amd-processors/

I would rather use a compiler that optimizes based on CPU features, not one that runs the worst code possible on non-Intel CPUs. Of course, we use nothing but Intel, but... it still annoys me.

Thief^ · « **Reply #93 on:** September 13, 2010, 01:17:02 pm »

Oh, back to the earlier debate for a moment. I checked on asm usage in UE3. Matrix multiply uses vector intrinsics on ALL platforms, or straight C++ float code if vector instructions aren't available. No asm at all.
A couple of other functions in the math lib have asm blocks in them, but that codepath is only used if compiling for x86 using VC++ 2003 or below (which, as there's a compile-time check for VC++ 2008 elsewhere, will never happen). The sqrt implementation even contains a comment that the C-lib sqrt is 60% more efficient than the (commented out) SSE asm code below it.

So there you go.
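
A rough sketch of that pattern (vector intrinsics where the target has them, plain float code otherwise) is below. It is not UE3 code; `mat4_mul` and `MAT4_USE_SSE` are hypothetical names, the matrices are assumed to be row-major, and the preprocessor check only covers the common GCC/Clang/MSVC macros.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: these macros are how the compilers advertise SSE support. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define MAT4_USE_SSE 1
#else
#define MAT4_USE_SSE 0
#endif

/* out = a * b for row-major 4x4 float matrices. out must not alias a or b. */
void mat4_mul(const float a[16], const float b[16], float out[16])
{
    assert(out != a && out != b);
#if MAT4_USE_SSE
    /* Each output row is a linear combination of the four rows of b. */
    const __m128 b0 = _mm_loadu_ps(b + 0);
    const __m128 b1 = _mm_loadu_ps(b + 4);
    const __m128 b2 = _mm_loadu_ps(b + 8);
    const __m128 b3 = _mm_loadu_ps(b + 12);
    for (size_t r = 0; r < 4; ++r) {
        __m128 row = _mm_mul_ps(_mm_set1_ps(a[4 * r + 0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * r + 1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * r + 2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * r + 3]), b3));
        _mm_storeu_ps(out + 4 * r, row);
    }
#else
    /* Plain float fallback for targets without SSE. */
    for (size_t r = 0; r < 4; ++r) {
        for (size_t c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (size_t k = 0; k < 4; ++k) {
                sum += a[4 * r + k] * b[4 * k + c];
            }
            out[4 * r + c] = sum;
        }
    }
#endif
}
```

Note that no register is named anywhere in the SSE path; the compiler decides where `b0` to `b3` and `row` live.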

devek · « **Reply #94 on:** September 13, 2010, 01:27:59 pm »

I think we settled that argument already though.

You say that intrinsics are not assembly, I say that they are. That was the crux of our argument all along.

You can't point to anywhere in the C or C++ standard that says you are right... a compiler will let you use intrinsics, but it lets you use inline assembly too (neither is considered C/C++).

I think we both agree a straight C/C++ implementation would be quite a bit slower, no?

bartavelle · « **Reply #95 on:** September 13, 2010, 01:51:51 pm »

Quote from: devek on September 13, 2010, 12:41:12 pm

I couldn't care less about which is easier when neither option is that hard.

On the contrary, for several applications register allocation is actually the problem, and it is hard to do properly. That's why barswf ran 3 to 4 times faster than mdcrack, using only intrinsics, while mdcrack was "state of the art" hand-written assembly. Of course you could do it by hand, but it would just be the most tedious job ever for a reasonably large application. And gcc sucks at that. Clang is okay, but still not comparable to ICC for my applications.

And there is another good thing that comes from intrinsics: while they are not standard, they are somewhat portable, contrary to inline asm. You could possibly compile the same thing on gcc and Visual Studio. I didn't test Visual Studio performance, so I have no clue if that would be sensible.

devek · « **Reply #96 on:** September 13, 2010, 01:58:33 pm »

Are you smoking crack?

Dude, look at the requirements for BarsWF.

Quote

# CUDA version only: nVidia GeForce 8xxx and up, at least 256mb of video memory.
# LATEST nVidia-driver with CUDA support. Standard drivers might be a bit older (as CUDA 2.0 is still beta)

No shit, the CUDA based fft library beats the snot out of libfft too.

bartavelle · « **Reply #97 on:** September 13, 2010, 02:02:43 pm »

Quote from: devek on September 13, 2010, 01:58:33 pm

Are you smoking crack?

No shit, the CUDA based fft library beats the snot out of libfft too.

Oh rly ? I'm talking about cpu only performance.

devek · « **Reply #98 on:** September 13, 2010, 02:18:28 pm »

I'm not. I'll be more clear though.

Code:

#define NROTATE_LEFT(x, n, m) (((x) << (n)) | ((x) >> (m)))
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))

#define NFF(a, b, c, d, x, s1, s2, ac) {(a) += F((b), (c), (d)) + (x) + (ord32)(ac);(a) = NROTATE_LEFT ((a), (s1), (s2));(a) += (b); }
#define NGG(a, b, c, d, x, s1, s2, ac) { (a) += G ((b), (c), (d)) + (x) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b);   }
#define NHH(a, b, c, d, x, s1, s2, ac) { (a) += H ((b), (c), (d)) + (x) + (ord32)(ac); (a) = NROTATE_LEFT ((a), (s1), (s2)); (a) += (b);  }
#define NII(a, b, c, d, x, s1, s2, ac) { (a) += I ((b), (c), (d)) + (x) + (ord32)(ac);  (a) =NROTATE_LEFT ((a), (s1), (s2));  (a) += (b);   }
#define NFF0(a, b, c, d, s1, s2, ac) {  (a) += F ((b), (c), (d)) + (ord32)(ac);  (a) = NROTATE_LEFT ((a), (s1), (s2));  (a) += (b);   }
#define NGG0(a, b, c, d, s1, s2, ac) {  (a) += G ((b), (c), (d)) + (ord32)(ac);  (a) = NROTATE_LEFT ((a), (s1), (s2));  (a) += (b);   }
#define NHH0(a, b, c, d, s1, s2, ac) {  (a) += H ((b), (c), (d)) + (ord32)(ac);  (a) = NROTATE_LEFT ((a), (s1), (s2));  (a) += (b);   }
#define NII0(a, b, c, d, s1, s2, ac) {  (a) += I ((b), (c), (d)) + (ord32)(ac);  (a) = NROTATE_LEFT ((a), (s1), (s2));  (a) += (b);   }
#define NRHH0(a, b, c, d, s1, s2, ac) {  (a) -= (b);  (a) = NROTATE_LEFT ((a), (s2), (s1));  (a) -= (H ((b), (c), (d)) + (ord32)(ac));   }
#define NRII0(a, b, c, d, s1, s2, ac) {  (a) -= (b);  (a) = NROTATE_LEFT ((a), (s2), (s1));  (a) -= (I ((b), (c), (d)) + (ord32)(ac));   }


int md5_reverse()
{
register unsigned int a1,b1,c1,d1;

a1 = *digest2;
b1 = *(digest2+1);
c1 = *(digest2+2);
d1 = *(digest2+3);

NRII0 (b1, c1, d1, a1, S44, SS44, 0xeb86d391); 
NRII0 (c1, d1, a1, b1, S43, SS43, 0x2ad7d2bb); 
c1 -= x1[2];
NRII0 (d1, a1, b1, c1, S42, SS42, 0xbd3af235); 
NRII0 (a1, b1, c1, d1, S41, SS41, 0xf7537e82); 
NRII0 (b1, c1, d1, a1, S44, SS44, 0x4e0811a1); 
NRII0 (c1, d1, a1, b1, S43, SS43, 0xa3014314); 
NRII0 (d1, a1, b1, c1, S42, SS42, 0xfe2ce6e0); 
NRII0 (a1, b1, c1, d1, S41, SS41, 0x6fa87e4f); 
NRII0 (b1, c1, d1, a1, S44, SS44, 0x85845dd1); 

*working=a1;
*(working+1)=b1;
*(working+2)=c1;
*(working+3)=d1;	  


return(1);
}
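
As a reading aid, here is a sketch of what `NRII0` appears to be doing: undoing one round-4 MD5 step, using the fact that a left rotate by `32 - s` is a right rotate by `s`. The function names are hypothetical (not mdcrack's), the auxiliary function `I` is the one from RFC 1321, and the shift 21 in the check corresponds to `S44` in the RFC reference code, which is assumed to match the constants used above.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, for 0 < n < 32. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* MD5 round-4 auxiliary function I, as defined in RFC 1321. */
uint32_t md5_I(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* Forward round-4 step: a = b + rotl(a + I(b,c,d) + x + ac, s). */
uint32_t md5_ii(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                uint32_t x, unsigned s, uint32_t ac)
{
    a += md5_I(b, c, d) + x + ac;
    a = rotl32(a, s);
    return a + b;
}

/* Inverse step: subtract b, rotate right by s (written as a left rotate
   by 32 - s), then subtract what the forward step added. */
uint32_t md5_ii_reverse(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                        uint32_t x, unsigned s, uint32_t ac)
{
    a -= b;
    a = rotl32(a, 32 - s);
    return a - (md5_I(b, c, d) + x + ac);
}

/* Round-trip check on arbitrary inputs. */
void md5_ii_selfcheck(void)
{
    const uint32_t a = 0x67452301u, b = 0xefcdab89u;
    const uint32_t c = 0x98badcfeu, d = 0x10325476u;
    const uint32_t fwd = md5_ii(a, b, c, d, 0u, 21, 0xeb86d391u);
    assert(md5_ii_reverse(fwd, b, c, d, 0u, 21, 0xeb86d391u) == a);
}
```

With `x` fixed at 0 this matches the shape of the `NRII0` macro, which has no message-word argument.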

The majority of mdcrack was written prior to 2001. It isn't "hand crafted" assembly, and even if it was, a lot has changed since it was written. The "sse" version of mdcrack is merely compiled with sse optimizations.

The fact that a program that uses things like SSE2 intrinsics and/or CUDA, which isn't C lol, blows a straight C program out of the water is no surprise. The fact "sse" optimizations from the compiler don't help that much is also no surprise. Thanks for proving my point.

EDIT: Which proves ANOTHER point. Imagine if Java used some fancy SSE2 in its md5 code. That would make a Java based cracker faster than mdcrack lol. Straight portable Java > straight portable C!

Thief^ · « **Reply #99 on:** September 13, 2010, 02:32:42 pm »

Quote from: bartavelle on September 13, 2010, 01:51:51 pm

And there is another good thing that comes from intrinsics: while they are not standard, they are somewhat portable, contrary to inline asm. You could possibly compile the same thing on gcc and Visual Studio.

Intrinsics are most definitely not portable.

Quote from: devek on September 13, 2010, 01:27:59 pm

You say that intrinsics are not assembly, I say that they are. That was the crux of our argument all along.

They are also not assembly, due to not having to assign registers yourself, and the fact that many of them are actually wrappers for multiple asm instructions. If you want to claim that they are assembly, you should also claim that the C code "int a = 1; int b = 2; int c = a + b;" is assembly: two set register to constant instructions, followed by an add instruction.

Typically the advantage with intrinsics over assembly is that the compiler can optimise around them. Most compilers won't reorder instructions into/past an asm block, for example, and will force member variables to be written to RAM before the asm and reloaded after, instead of caching them in registers (on the assumption that the asm could read/write anything). I'm not sure if C functions containing asm blocks will be inlined into their callers either, guaranteeing you the overhead of a function call. In all, that makes asm inside C actually quite expensive.
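
To make the comparison concrete, here is the same four-float add written once with an intrinsic and once with GCC-style extended asm. This is a sketch under assumptions: it only takes the asm path on GCC or Clang targeting x86 with SSE (anything else uses a scalar fallback), and `add4_intrinsic`, `add4_asm` and `HAVE_X86_SSE_GNU` are made-up names. The compiler does not parse the asm template, so all it knows about that statement is what the constraints say; with the intrinsic it knows the operation is an add and is free to fold or combine it.

```c
#include <assert.h>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__SSE__)
#include <xmmintrin.h>
#define HAVE_X86_SSE_GNU 1
#else
#define HAVE_X86_SSE_GNU 0
#endif

/* out[i] = a[i] + b[i] for i in 0..3, using an intrinsic. */
void add4_intrinsic(const float *a, const float *b, float *out)
{
    assert(a && b && out);
#if HAVE_X86_SSE_GNU
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}

/* Same result, but the add itself is an opaque asm statement. */
void add4_asm(const float *a, const float *b, float *out)
{
    assert(a && b && out);
#if HAVE_X86_SSE_GNU
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* "+x": va is read and written in an SSE register.
       "x":  vb is only read, also from an SSE register.
       No "memory" clobber, so nothing has to be flushed to RAM. */
    __asm__("addps %1, %0" : "+x"(va) : "x"(vb));
    _mm_storeu_ps(out, va);
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}
```

With constraints like these the register allocation is still left to the compiler, which is a different situation from an MSVC-style `__asm` block where the registers are written out by hand.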

bartavelle · « **Reply #100 on:** September 13, 2010, 02:38:26 pm »

Quote from: devek on September 13, 2010, 02:18:28 pm

The majority of mdcrack was written prior to 2001. It isn't "hand crafted" assembly, and even if it was, a lot has changed since it was written. The "sse" version of mdcrack is merely compiled with sse optimizations.

Ah, I'm wrong! I always figured it was SSE2 assembly, as it was touted to be so much better than the competition. However, I do have a working example: the SHA1 and MD5 implementations in John the Ripper. The original MD5 is just plain x86, and not too fast. For 32-bit versions I provided an SSE implementation years ago, with the reverse trick (for MD5, obviously). I admit it could probably be made much faster, but not without breaking the very convenient macros I used, and thus ending up with a nightmare of hand-adjusting everything.

Or you could just use the intrinsics code I provided years later, after I looked at that barswf program in IDA and realized why it was fast. It's much easier to write than the assembly, doesn't include the reverse trick, but is probably more than twice as fast (I don't have hard numbers, but you are free to download and check it). It is even more dramatic with 64-bit.

I agree I'm far from being a good developer, but I still believe that:
* in the worst case, with applications that are not designed to break ICC, it is on par with hand crafted assembly
* on certain applications (like these hash functions) it will give you a much better result than most hand crafted assembly
* it will always be faster to write something fast with it
* it will be more portable (yes, not portable)

devek · « **Reply #101 on:** September 13, 2010, 03:01:53 pm »

Quote from: Thief^ on September 13, 2010, 02:32:42 pm

They are also not assembly, due to not having to assign registers yourself, and the fact that many of them are actually wrappers for multiple asm instructions. If you want to claim that they are assembly, you should also claim that the C code "int a = 1; int b = 2; int c = a + b;" is assembly: two set register to constant instructions, followed by an add instruction.

Typically the advantage with intrinsics over assembly is that the compiler can optimise around them. Most compilers won't reorder instructions into/past an asm block, for example, and will force member variables to be written to RAM before the asm and reloaded after, instead of caching them in registers (on the assumption that the asm could read/write anything). I'm not sure if C functions containing asm blocks will be inlined into their callers either, guaranteeing you the overhead of a function call. In all, that makes asm inside C actually quite expensive.

I hate to go all Wikipedia here, but let's talk about what assembly language does.

Quote

It implements a symbolic representation of the binary machine codes and other constants needed to program a particular CPU architecture. This representation is usually defined by the hardware manufacturer, and is based on mnemonics that symbolize processing steps (instructions), processor registers, memory locations, and other language features. An assembly language is thus specific to a certain physical (or virtual) computer architecture. This is in contrast to most high-level languages, which are ideally portable.

If it looks like a duck, smells like a duck, blah blah, it's pretty much a duck.

Code:

m4 = _mm_sqrt_ps(m3);

Mnemonics that symbolize binary machine codes? Check
Mnemonics that symbolize processor registers? Check
Specific to a certain physical architecture? Check

Code:

int a = 1; int b = 2; int c = a + b;

Mnemonics that symbolize binary machine codes? Negative
Mnemonics that symbolize processor registers? Negative
Specific to a certain physical architecture? Negative

This is a semantic argument now, but.. my argument that writing intrinsics is assembly is very valid. People have written all sorts of high level features into assembly and it is still assembly. There isn't a standards body that says what assembly is or isn't, while there are standards that make it very clear what C/C++ is or isn't.

The fact of the matter is.. I get it now. You have to understand though, and see it from my position. If someone came up to you and said that (without inline assembly or intrinsics) a C++ compiler will make code as good as assembly can, you would think they were a total idiot.

bartavelle · « **Reply #102 on:** September 13, 2010, 03:11:35 pm »

Oh and there is a case against intrinsics, you can cover yourself with shame if you don't realize that your compiler will optimize them, and possibly remove them altogether if you don't use the data they compute : http://www.pcworld.com/article/140064/hacker_uses_sony_playstation_3_to_crack_passwords.html
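
The pitfall is easy to reproduce with plain C, no intrinsics needed. In the sketch below, `work` is a hypothetical stand-in for an expensive kernel; the first loop never uses its results, so an optimizer is allowed to delete it entirely, while the second folds every result into a value the caller consumes. (A compiler may still simplify the second loop; it just cannot drop the result.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an expensive computation such as a hash. */
static uint32_t work(uint32_t i)
{
    return i * i + 1u;
}

/* The results are never used, so the whole loop is dead code and an
   optimizing compiler may remove it. Timing this measures nothing. */
void bench_discarded(uint32_t n)
{
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t unused = work(i);
        (void)unused;
    }
}

/* Every result feeds a checksum that is returned to the caller, so the
   computation has an observable effect and cannot simply be dropped. */
uint32_t bench_kept(uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; ++i) {
        sum += work(i);
    }
    return sum;
}

/* Usage check: 1 + 2 + 5 + 10 for i = 0..3. */
void bench_selfcheck(void)
{
    assert(bench_kept(4) == 18u);
}
```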

devek · « **Reply #103 on:** September 13, 2010, 03:15:34 pm »

GCC does that exact same optimization with inline assembly... That is why you specify your inputs and outputs, the constraints on them, and the register usage when you write the inline asm. Just so it can optimize it.

I would like to see something specific GCC could optimize from an intrinsic that it couldn't from inline assembly. I'm totally open-minded to that.

Thief^ · « **Reply #104 on:** September 13, 2010, 03:21:18 pm »

Quote from: devek on September 13, 2010, 03:01:53 pm

If someone came up to you and said that (without inline assembly or intrinsics) a C++ compiler will make code as good as assembly can, you would think they were a total idiot.

For vector code, sure, but only because for some reason vector types haven't been added to C++ itself.
EDIT: Just checked and VC++ (even 2010) at least won't try to use SSE2 vector instructions automatically, only SSE2 scalar instructions. Don't know about GCC.
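
For anyone wanting to test this on GCC: it does have an auto-vectorizer (the `-ftree-vectorize` option), and a loop shaped like the one below is the usual kind of candidate. This is only a sketch, `scale_add` is a hypothetical name, and whether a particular compiler version really emits packed instructions for it is something to confirm in the generated assembly rather than assume.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] = a[i] * scale + b[i].
   restrict promises the arrays do not overlap, which removes the
   aliasing doubt that often stops a vectorizer. */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float scale, size_t n)
{
    assert(n == 0 || (dst != a && dst != b));
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * scale + b[i];
    }
}
```

This is plain standard C99: no intrinsics, no asm, and the same source compiles for any target.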

Quote from: devek on September 13, 2010, 03:01:53 pm

Quote from: Thief^ on September 13, 2010, 02:32:42 pm
They are also not assembly, due to not having to assign registers yourself, and the fact that many of them are actually wrappers for multiple asm instructions. If you want to claim that they are assembly, you should also claim that the C code "int a = 1; int b = 2; int c = a + b;" is assembly: two set register to constant instructions, followed by an add instruction.

I hate to go all Wikipedia here, but let's talk about what assembly language does.

Quote
It implements a symbolic representation of the binary machine codes and other constants needed to program a particular CPU architecture. This representation is usually defined by the hardware manufacturer, and is based on mnemonics that symbolize processing steps (instructions), processor registers, memory locations, and other language features. An assembly language is thus specific to a certain physical (or virtual) computer architecture. This is in contrast to most high-level languages, which are ideally portable.

If it looks like a duck, smells like a duck, blah blah, it's pretty much a duck.

Code:
m4 = _mm_sqrt_ps(m3);

Mnemonics that symbolize binary machine codes? Check
**Mnemonics that symbolize processor registers? Check**
Specific to a certain physical architecture? Check

Code:
int a = 1; int b = 2; int c = a + b;

Mnemonics that symbolize binary machine codes? Negative
**Mnemonics that symbolize processor registers? Negative**
Specific to a certain physical architecture? Negative

The two lines I've bolded are in opposition. Both bits of code just use variables, "m3" and "m4" aren't register names, but are variables just like a, b and c in the second example.
