Bay 12 Games Forum


Topic: Automatic Address & Offset Finder: September 19, 2008 (Dtil, Tweak, Foreman) (Read 3926 times)

Jifodus

  • Bay Watcher
  • Resident Lurker

It's finally here: the release of well over 80 hours of coding, more hours spent finding and generating the patterns, and nearly 7800* total lines of code. (Plus a couple of weeks' wait for Dtil to gain the ability that let me write the plugin.)

Introducing PatternsEngine

PatternsEngine is a multi-targeted plugin & library. It's aimed at end users, plugin writers, and utility writers. It's written in C# for .NET 2.0 and is in the public domain.

The whole purpose of this plugin is...
to allow patterns to be created so that tools can automatically find the memory offsets based on pieces of assembly code that use them, even when a new version is released (as long as the pattern still holds)
(Thanks to Baboonanza for pointing out what a moron I was: I had completely left the point of this plugin/tool out of the post, other than in the title.)

End Users

If you're an end user, which should be everyone who uses Dtil (and hopefully, eventually, Tweak), this is for you. Why should you use this? First, it takes the burden off the utility/plugin writers. Second, if you don't, I'll kill you and maybe use your bones for my next construction.

For installing in Dtil:

How do you test to see if it's working? Easy.
  • Exit Dtil
  • Rename Memory.ini to DontUseMemory.ini
  • Start Dtil (and DF if it isn't already running)
  • Click Game>Attach to Process
  • When it asks you if you want to automatically try to discover the addresses, click yes
  • Wait for it to finish compiling the modules
  • If you didn't get any errors, congratulations! It worked!
    If you did get errors, in <Dtil>\Plugins\Discovery, you'll notice a file called: Jifodus.Identification.log
    Post here with the contents of that log.
  • Now delete the generated Memory.ini and rename the previously renamed DontUseMemory.ini back to Memory.ini

As of September 19, 2008 you can now update configuration for Tweak, Dwarf Foreman, and 3Dwarf/Dwarvis. It's still easy, but it isn't as clean as the integration with Dtil.
  • Download http://www.geocities.com/jifodus/discovery/discovery_standalone.zip (age: September 19, 2008)
  • Extract to a folder
  • Run Jifodus.PatternsEngine.Standalone.exe
  • Click File>Open
  • Navigate to the Dwarf Fortress folder and open up dwarfort.exe
  • Ensure the information displayed in the "Status" tab is correct
  • On the "Tools" tab, set up the paths for the tools whose configuration you want updated
  • Click Update
  • Wait for it to report that it has finished

How likely is it to break? That depends on whether I selected stable patterns. Odds are they will remain stable between bug-fix versions. If not, when I get around to updating the pattern or adding a new one, you'll find the updated DF.Base.xml at http://www.geocities.com/jifodus/discovery/df.base.xml.zip (age: nonexistent). Of course, others can provide their own versions/changes.

Note: As of September 19, 2008, when you generate the configuration for Tweak, it will warn that the patterns enable_magma_forge and enable_magma_furnaces don't exist. This is expected, since Tweak can look up those two values for itself; I'll eventually add patterns for them, but until then they will remain missing.

Plugin Writers

You might be asking yourself how this can be targeted at plugin writers. (By plugin writers I mean those who write plugins for Dtil, and eventually Tweak.) My answer is simple: you may never need to use the library directly, and you may never need to find custom addresses and offsets (or you might hire me or someone else knowledgeable about the subject to do the dirty work; I work pro bono, of course, but I'd need free time, and you must provide me with the addresses/offsets you need added). But you should at least know how it works and what causes it to fail.

The theory behind the library is that some code patterns stay fixed between versions. By searching for these fixed code patterns, it becomes possible to extract the information directly from the Dwarf Fortress executable. Actually, I lied: whether a code pattern stays fixed depends heavily on what the compiler's optimizer does.

Take this assembly code for example (not actually from DF):
Code:
mov     ebx, viewport_x
mov     ecx, viewport_y
mov     dword_9FB29C, edx
mov     edx, viewport_z
Now suppose Toady changes the function that contains the above assembly code. The problem is that the optimizer may decide to use different registers, so the assembly code now looks like this:
Code:
mov     edx, viewport_x
mov     ebx, viewport_y
mov     dword_9FB29C, ecx
mov     ecx, viewport_z
Oops, now the naïve way of pattern matching (comparing the raw bytes) breaks. That's what Tweak's lookup does, and what typical usage of 0x517A5D's hexsearch does.

Enter: PatternsEngine

The trick with PatternsEngine is that its pattern matcher understands x86 machine code. So all the pattern writer has to do is tell the matching engine to ignore the registers edx, ecx, and ebx.
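Here's a rough sketch of the idea in modern C# (top-level statements, C# 9 or later, not the .NET 2.0 the project targets), using the first two instructions of each listing above. It is not PatternsEngine's code: it's a plain masked byte search (the hypothetical `MaskedSearch` helper) in which every pattern byte has a mask. Masking a ModRM byte with 0xC7 keeps its mode and base bits but drops the three register bits. The addresses are wildcarded (mask 0) because they're what we're trying to find.

```csharp
using System;

// Index of the first place where `pattern` matches `code`, comparing only the bits
// set in `mask`, or -1 if there is none.
int MaskedSearch(byte[] code, byte[] pattern, byte[] mask)
{
    for (int i = 0; i + pattern.Length <= code.Length; i++)
    {
        bool ok = true;
        for (int j = 0; j < pattern.Length && ok; j++)
            ok = (code[i + j] & mask[j]) == (pattern[j] & mask[j]);
        if (ok) return i;
    }
    return -1;
}

// mov ebx, viewport_x / mov ecx, viewport_y   (first listing)
byte[] before = { 0x8B, 0x1D, 0xFC, 0x47, 0xD4, 0x00, 0x8B, 0x0D, 0x70, 0x28, 0xD7, 0x00 };
// mov edx, viewport_x / mov ebx, viewport_y   (after the optimizer reshuffles registers)
byte[] after  = { 0x8B, 0x15, 0xFC, 0x47, 0xD4, 0x00, 0x8B, 0x1D, 0x70, 0x28, 0xD7, 0x00 };

// Naive: opcodes and ModRM bytes must match exactly; addresses are wildcards.
byte[] exactMask = { 0xFF, 0xFF, 0, 0, 0, 0, 0xFF, 0xFF, 0, 0, 0, 0 };
// Register-agnostic: 0xC7 keeps ModRM's mode/base bits and ignores the reg field (bits 3-5).
byte[] regMask   = { 0xFF, 0xC7, 0, 0, 0, 0, 0xFF, 0xC7, 0, 0, 0, 0 };

Console.WriteLine(MaskedSearch(after, before, exactMask)); // -1
Console.WriteLine(MaskedSearch(after, before, regMask));   // 0
```

Like the real engine, this only tolerates register changes. It still needs the same instructions in the same order.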

However, PatternsEngine is not without its limitations. What if the code changes to this:
Code:
mov     edx, viewport_x
mov     ebx, viewport_y
mov     ecx, viewport_z

Well, this is something that PatternsEngine can't handle in the current version.

There's actually one more case where PatternsEngine will break. You may have noticed that the code sample isn't actually from DF; it's based on this code (v0.28.181.40d):
Code:
mov     eax, viewport_x
mov     ecx, viewport_y
mov     dword_9FB29C, edx
mov     edx, viewport_z

That's right, I changed one register. That's because x86 machine code has a shorter encoding for mov eax, viewport_x (opcode A1, which takes one byte less than the general 8B form used for the other registers). This means that if the registers were to change as in the second example (eax swapped for another register, or vice versa), the pattern again won't match, since the instructions are different at the machine level.
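To see why eax is special, here's a small C# sketch (the `EncodeLoad` helper is hypothetical) that encodes `mov r32, [address]` the way assemblers typically do. For eax it emits the dedicated 5-byte A1 form. For every other register it emits the 6-byte 8B form, whose ModRM byte is (reg << 3) | 5, meaning mode 00 with an absolute 32-bit address. The eax line reproduces `A1 FC 47 D4 00`, the machine code of the first instruction in the v0.28.181.40d listing.

```csharp
using System;

// Encodes "mov r32, [address]" with registers numbered eax=0, ecx=1, edx=2, ebx=3.
byte[] EncodeLoad(int reg, uint address)
{
    // x86 stores the 4-byte address little-endian
    byte[] a = { (byte)address, (byte)(address >> 8), (byte)(address >> 16), (byte)(address >> 24) };
    if (reg == 0)   // eax: dedicated opcode A1 + address, no ModRM byte
        return new byte[] { 0xA1, a[0], a[1], a[2], a[3] };
    // other registers: opcode 8B + ModRM (mode 00, reg field = register, base 101 = [disp32])
    byte modrm = (byte)((reg << 3) | 0x05);
    return new byte[] { 0x8B, modrm, a[0], a[1], a[2], a[3] };
}

string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

string[] names = { "eax", "ecx", "edx", "ebx" };
for (int r = 0; r < names.Length; r++)
    Console.WriteLine($"mov {names[r]}, viewport_x -> {Hex(EncodeLoad(r, 0x00D447FC))}");
// mov eax, viewport_x -> A1 FC 47 D4 00
// mov ecx, viewport_x -> 8B 0D FC 47 D4 00
// mov edx, viewport_x -> 8B 15 FC 47 D4 00
// mov ebx, viewport_x -> 8B 1D FC 47 D4 00
```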

Now that the theory is over, here comes the practice (if you don't plan on ever generating patterns, you can stop here):
Code:
<pattern>
  <name alias="viewport_z" />
  <name alias="screen_z" />
  <export name="z" />
  <import name="viewport_x" as="x" />
  <import name="viewport_y" as="y" />
  <fragment section=".text">
    <!--
.text:004B782F 36C A1 FC 47 D4 00    mov eax, viewport_x
.text:004B7834 36C 8B 0D 70 28 D7 00 mov ecx, viewport_y
.text:004B783A 36C 89 15 9C B2 9F 00 mov dword_9FB29C, edx
.text:004B7840 36C 8B 15 4C 28 D7 00 mov edx, viewport_z
    -->
    <instruction opcode="A1" immediate="#x" />
    <instruction opcode="8B">
      <modrm register="$ecx0" base="[#y]" />
    </instruction>
    <instruction opcode="89">
      <modrm register="$edx0" base="[#dw9FB29C]" />
    </instruction>
    <instruction opcode="8B">
      <modrm register="$edx1" base="[#z]" />
    </instruction>
  </fragment>
  <check>
    A1 FC 47 D4 00
    8B 0D 70 28 D7 00
    89 15 9C B2 9F 00
    8B 15 4C 28 D7 00
  </check>
</pattern>
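The <check> bytes and the fragment fit together like this. The C# sketch below hand-codes this one fragment as a small table and runs the check bytes through it to show the bindings a match produces. It doesn't read the XML, and it only knows the two instruction shapes used here: A1 with a 4-byte operand, and 8B/89 with a [disp32] ModRM operand. The `MatchAt` helper and the table layout are assumptions for illustration, not PatternsEngine's internals. Because x86 stores the 4 bytes little-endian, `4C 28 D7 00` binds #z to 0x00D7284C, the value the pattern exports under viewport_z/screen_z.

```csharp
using System;
using System.Collections.Generic;

// Returns the wildcard bindings if `spec` matches `code` at `pos`, otherwise null.
Dictionary<string, uint>? MatchAt(byte[] code, int pos, (byte Opcode, string? Reg, string Addr)[] spec)
{
    var bound = new Dictionary<string, uint>();
    foreach (var ins in spec)
    {
        if (pos >= code.Length || code[pos++] != ins.Opcode) return null;
        if (ins.Reg != null)
        {
            // ModRM must be mode 00 / base 101 ([disp32]); its reg field is the wildcard.
            if (pos >= code.Length || (code[pos] & 0xC7) != 0x05) return null;
            bound["$" + ins.Reg] = (uint)((code[pos] >> 3) & 7);
            pos++;
        }
        if (pos + 4 > code.Length) return null;
        // 4-byte wildcard, stored little-endian
        bound["#" + ins.Addr] = (uint)(code[pos] | code[pos + 1] << 8 | code[pos + 2] << 16 | code[pos + 3] << 24);
        pos += 4;
    }
    return bound;
}

// Hand-coded stand-in for the <fragment>: (opcode, register wildcard or null, 4-byte wildcard).
var fragment = new (byte Opcode, string? Reg, string Addr)[]
{
    (0xA1, null,   "x"),         // mov eax, #x   (immediate="#x" in the XML)
    (0x8B, "ecx0", "y"),         // mov $ecx0, [#y]
    (0x89, "edx0", "dw9FB29C"),  // mov [#dw9FB29C], $edx0
    (0x8B, "edx1", "z"),         // mov $edx1, [#z]
};

// The bytes from the pattern's <check> element.
byte[] check =
{
    0xA1, 0xFC, 0x47, 0xD4, 0x00,
    0x8B, 0x0D, 0x70, 0x28, 0xD7, 0x00,
    0x89, 0x15, 0x9C, 0xB2, 0x9F, 0x00,
    0x8B, 0x15, 0x4C, 0x28, 0xD7, 0x00,
};

var m = MatchAt(check, 0, fragment);
if (m != null)
    foreach (var key in new[] { "#x", "#y", "#dw9FB29C", "#z" })
        Console.WriteLine($"{key} = 0x{m[key]:X8}");
```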

I'll break this down into its different tags:
  • <pattern>
    it's just a container for all the information of a single pattern
  • <name>
    each pattern can have multiple name tags, and you specify what the name is through the alias attribute
  • <export>
    you can export one of the pattern's named wildcard values through the name attribute
  • <import>
    this is the complement of <export>, this lets you add a dependency on another pattern identified by the name attribute; you then set the name used in the pattern with the as attribute
  • <fragment>
    this has one attribute, section, which specifies what part of the executable to search; normally it'll be the ".text" section, since that's where the executable code is. Sometimes you're looking for a string, in which case you'd use ".rdata" (read-only data)
  • <instruction>
    this is the workhorse of the pattern: you can define up to 3 opcode bytes (I call them octets); you can also add up to 3 prefix bytes (their order won't matter) and an immediate attribute, which can be a wildcard or a base-16 integer (see below for details)
  • <modrm>
    it'll have either an extended opcode or a register attribute, in addition to a base attribute (see below)
  • <check>
    this lets you check that your pattern actually works; typically I just copy the machine code (you can see it in the comment, right next to the disassembly). Don't underestimate its usefulness
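The header tags are plain XML, so any XML reader can pull out a pattern's aliases, export and imports. Here's a sketch using `System.Xml.Linq` on a trimmed copy of the example pattern. LINQ to XML arrived after .NET 2.0, so this is an illustration of the tag meanings rather than how the library itself reads them.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// The header of the example pattern (fragment and check left out).
const string patternXml = @"<pattern>
  <name alias=""viewport_z"" />
  <name alias=""screen_z"" />
  <export name=""z"" />
  <import name=""viewport_x"" as=""x"" />
  <import name=""viewport_y"" as=""y"" />
</pattern>";

XElement pattern = XElement.Parse(patternXml);

// Every alias the pattern's result can be looked up under.
string[] aliases = pattern.Elements("name").Select(e => (string)e.Attribute("alias")!).ToArray();
// The wildcard whose value becomes the pattern's result.
string export = (string)pattern.Element("export")!.Attribute("name")!;
// Dependencies: another pattern's result, bound to a local wildcard name.
var imports = pattern.Elements("import")
    .Select(e => (Pattern: (string)e.Attribute("name")!, Local: (string)e.Attribute("as")!))
    .ToArray();

Console.WriteLine("aliases: " + string.Join(", ", aliases) + "; exports #" + export);
foreach (var (source, local) in imports)
    Console.WriteLine($"#{local} <- result of pattern {source}");
```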

Size matters: it tells the matching engine how to process the instruction. Therefore, whenever you specify an "immediate" value, you must be clear about how big it's supposed to be.

For example, these hexadecimal values are each interpreted differently: 0x01, 0x0001, 0x00000001. The first is interpreted as a single byte, the second as a 2-byte word, and the third as a 4-byte double word.

What about when it's supposed to be a wildcard? Then you prefix the name with one of '!', '@', or '#' to specify 1-byte, 2-bytes, and 4-bytes respectively. How do you remember the obtuse prefix scheme? By noticing that '!' is Shift+1, '@' is Shift+2, and '#' is Shift+3.
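These sizing rules fit in a few lines. Below is a C# sketch written as a hypothetical `ImmediateSize` helper. It assumes hex immediates are written with a `0x` prefix as in the examples above. The engine's real lexer (Lexers.cs) may accept other spellings, and the wildcard names `!b` and `@w` are made up.

```csharp
using System;

// Size in bytes of an immediate, following the rules above.
int ImmediateSize(string token)
{
    if (string.IsNullOrEmpty(token)) throw new FormatException("empty immediate");
    switch (token[0])
    {
        case '!': return 1;   // Shift+1
        case '@': return 2;   // Shift+2
        case '#': return 4;   // Shift+3
    }
    // Hex literal: the number of digits fixes the size, so leading zeros matter.
    if (token.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
    {
        switch (token.Length - 2)
        {
            case 2: return 1;
            case 4: return 2;
            case 8: return 4;
        }
    }
    throw new FormatException($"can't tell the size of '{token}'");
}

foreach (string t in new[] { "0x01", "0x0001", "0x00000001", "!b", "@w", "#x" })
    Console.WriteLine($"{t} -> {ImmediateSize(t)} byte(s)");
```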

The base attribute builds on the above numbering system and abuses it further. All unknown registers are prefixed with a '$'. Those who have done assembly programming should feel right at home, since I tried to keep the syntax as close as possible to what you'd be expected to write in assembly. Just remember to keep the different parts in the following order: [Base+Index*Scale+Address/Offset] (and if all you want is a register: Register).
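To make the ordering rule concrete, here's a hypothetical `ParseBase` helper in C# that splits a base attribute into base register, index register, scale and address/offset. It assumes parts are joined with '+', that a register followed by `*n` is the index, and that anything not starting with '$' is the address/offset (a wildcard or a hex number). `[#y]` comes from the pattern above; `[$eax0+$ecx1*4+#off]` is a made-up example.

```csharp
using System;

bool IsRegister(string s) => s.Length > 0 && s[0] == '$';

// Splits "[Base+Index*Scale+Address/Offset]" (any part optional) or a bare "Register".
(string? Base, string? Index, int Scale, string? Disp, bool Memory) ParseBase(string text)
{
    bool memory = text.Length >= 2 && text[0] == '[' && text[text.Length - 1] == ']';
    if (!memory) return (text, null, 1, null, false);       // just a register

    string? baseReg = null, index = null, disp = null;
    int scale = 1;
    foreach (string part in text.Substring(1, text.Length - 2).Split('+'))
    {
        int star = part.IndexOf('*');
        if (star >= 0)
        {
            index = part.Substring(0, star);                  // Index*Scale
            scale = int.Parse(part.Substring(star + 1));
        }
        else if (IsRegister(part) && baseReg == null)
            baseReg = part;                                   // first register is the base
        else if (IsRegister(part))
            index = part;                                     // second register: index, scale 1
        else
            disp = part;                                      // wildcard or hex Address/Offset
    }
    return (baseReg, index, scale, disp, true);
}

foreach (string example in new[] { "[#y]", "$edx0", "[$eax0+$ecx1*4+#off]" })
    Console.WriteLine($"{example} -> {ParseBase(example)}");
```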

Yes, you are expected to read machine code (to a certain point), so reading the x86 instruction set manuals is highly recommended. (I'll provide a download link later; neither AMD nor Intel puts them in a convenient location.) In the meantime, here's a crash course in reading machine code. I'm assuming you're using a disassembler that breaks up the instructions for you automatically and shows the machine code for each instruction.
  • prefix bytes come first and can be in any order, such as the segment overrides (in hex): 2E, 36, 3E, 26, 64, 65
  • modrm is a 3-field byte: mmrr rbbb
    mm - 2 bits for the mode: 11 (or 3) means the bbb field names a register; the other values select an addressing mode
    rrr - 3 bits for the register (or an opcode extension, if the instruction doesn't use a register)
    bbb - 3 bits for the register operand when mm is 11; otherwise, the base register of the address
  • If bbb is 4 (100) and it's in addressing mode (mm isn't 11), an SIB byte follows. (If bbb is 5 and mm is 00, there's no base register at all; a 4-byte address follows instead, as in 8B 0D 70 28 D7 00.) I don't have to explain more, since the disassembler shows you the approximate formatting you'll need anyway.
  • If there's an Address/Offset then it'll follow the ModRM byte (or SIB byte if it has it), it should be easy to tell from the machine code upon inspection how many bytes the Address/Offset has
  • The immediate value comes last, again by inspecting the machine code and comparing it to the disassembled instruction it should be pretty easy to tell how many bytes it has.
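The rules above are enough to work out how long a ModRM-based instruction is. Below is a C# sketch (the `ModRmLength` helper is hypothetical) for 32-bit addressing without an address-size (67) prefix. It adds the SIB byte when bbb is 4 in an addressing mode, the 4-byte address in the mm=00, bbb=5 case, and the usual 1- or 4-byte offsets for modes 01 and 10. The samples are standard encodings with a one-byte opcode and no immediate; the first one is taken from the pattern above.

```csharp
using System;

// Bytes taken by a 32-bit ModRM operand: ModRM + optional SIB + optional displacement.
int ModRmLength(byte[] code, int pos)
{
    int modrm = code[pos];
    int mod = modrm >> 6, rm = modrm & 7;
    int length = 1;                              // the ModRM byte itself
    if (mod == 3) return length;                 // register operand: nothing follows

    if (rm == 4)
    {
        length++;                                // SIB byte follows
        if (mod == 0 && (code[pos + 1] & 7) == 5)
            length += 4;                         // SIB with no base register: disp32
    }
    else if (mod == 0 && rm == 5)
        length += 4;                             // absolute address [disp32]

    if (mod == 1) length += 1;                   // disp8
    else if (mod == 2) length += 4;              // disp32
    return length;
}

byte[][] samples =
{
    new byte[] { 0x8B, 0x0D, 0x70, 0x28, 0xD7, 0x00 },  // mov ecx, [0x00D72870]
    new byte[] { 0x8B, 0x44, 0x24, 0x10 },              // mov eax, [esp+0x10]
    new byte[] { 0x89, 0xC8 },                          // mov eax, ecx
};
foreach (byte[] insn in samples)
{
    int m = insn[1];
    Console.WriteLine($"ModRM {m:X2}: mm={m >> 6} rrr={(m >> 3) & 7} bbb={m & 7}, length {1 + ModRmLength(insn, 1)}");
}
```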

Tip: Don't spend more than a few hours writing patterns, unless you like not having a brain (or don't have one to begin with, in which case it won't matter).

Utility Writers/Code Hackers (for Plugin Writers too!)

You can grab the source from: http://www.geocities.com/jifodus/discovery/discovery_src.zip (age: September 19, 2008)

The code is public domain, so you're free to use it, change it, or abuse and reuse it however you want.

This part is easier than generating patterns. Hopefully I'll write a standalone executable so that utilities such as Dwarf Foreman (which is not a .NET application) can benefit from it with a simple CreateProcess() (or, even easier, system()).

At the bare minimum you'll need to know this interface:
Code:
namespace PatternsEngine
{
    public interface PatternSource
    {
        string[] GetPatternNames();
        string[] Patterns { get; }
        string[] GetAlternateNames(string name);

        bool IsReady();
        bool Ready { get; }
        string ExecutableSource { get; set; }
        void SetExecutableSource(string file);
        void SetExecutableMemory(byte[] memory);

        bool ScanOnDemand { get; set; }
        bool IsAutoScanOnDemand();
        void SetAutoScanOnDemand(bool enable);

        ulong GetCachedValue(string name);
        ulong GetMatchedValue(string name);
        ulong GetValue(string name);

        void SetCacheValue(string name, ulong value);
    }
}

These exceptions are handy to know about when you use GetValue.
Code:
using System;

namespace PatternsEngine
{
    public class PatternMatcherException : Exception
    {
        public PatternMatcherException(string what) : base(what) { }
    }

    public class ValueNotFoundException : PatternMatcherException
    {
        public ValueNotFoundException(string what) : base(what) { }
    }

    public class ValueNotCachedException : ValueNotFoundException
    {
        public ValueNotCachedException(string what) : base(what) { }
    }

    public class NamedPatternNotFoundException : PatternMatcherException
    {
        public NamedPatternNotFoundException(string what) : base(what) { }
    }
}

Finally the class that drives the generic MatchingEngine (implementation of PatternSource):
Code:
namespace PatternsEngine
{
    public sealed partial class DFPatterns : MatchingEngine
    {
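        public DFPatterns() { /* body omitted here; see the source */ }

        // Dummy implementations; release builds use the real versions described below.
        public bool HasVersionName { get { return false; } }
        public string VersionName { get { return "version unavailable"; } }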
    }
}

Astute coders will notice in the source that the class is partial. That's because HasVersionName and VersionName have alternate implementations; the one provided is a dummy. When I make a release, I'll compile it with the real versions, which know how to process the index file in the DF data directory. For now the real implementation is kept secret, since it involves actual reverse-engineered DF code: the last time I talked to Toady about reverse-engineered encryption/decryption code, he asked me not to publish source for it, and I'm honoring that request unless he permits otherwise.

I'll make a guarantee about the PatternSource interface: it'll remain unchanged. If I do need to change the interface, I'll just add a new one (extended, branched, whatever).

PatternSource 101:
  • Construct DFPatterns (requires no parameters, it does all the hard stuff)
  • Call SetExecutableSource with the path to the executable, or call SetExecutableMemory if all you have access to is the raw memory. (I should mention that I've never actually used SetExecutableMemory, so I don't know whether it works. If it doesn't, the likely culprit is the way PEUtils handles the virtual addresses.)
  • Call GetValue with the names of the patterns you'd like to use. Note that it can throw a slew of exceptions: PatternMatcherException, ValueNotFoundException, NamedPatternNotFoundException, ArgumentException, InvalidOperationException, and maybe a couple of others.
    PatternMatcherException - is typically thrown when it can't understand a pattern.
    ValueNotFoundException & NamedPatternNotFoundException - I hope are self explanatory...
    ArgumentException - may be thrown in certain circumstances when a pattern does some nasty stuff with wildcards
    InvalidOperationException - is thrown by the pattern matching engine when it enters an invalid state (i.e., an engine bug)
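Putting those steps together, here's a C# sketch of a helper that looks up several pattern names and records which ones failed instead of stopping at the first exception. `ResolveAll`, `FakeGetValue` and its lookup table are hypothetical. In real code you would construct DFPatterns, call SetExecutableSource, and pass its GetValue method in place of `FakeGetValue`. The helper catches `Exception` broadly because the list of exceptions above isn't exhaustive.

```csharp
using System;
using System.Collections.Generic;

// Looks up every name, collecting failures instead of stopping at the first exception.
(Dictionary<string, ulong> Found, Dictionary<string, string> Failed) ResolveAll(
    Func<string, ulong> getValue, IEnumerable<string> names)
{
    var found = new Dictionary<string, ulong>();
    var failed = new Dictionary<string, string>();
    foreach (string name in names)
    {
        try
        {
            found[name] = getValue(name);
        }
        catch (Exception ex)
        {
            // e.g. ValueNotFoundException, NamedPatternNotFoundException, PatternMatcherException,
            // ArgumentException, InvalidOperationException (the last one signals an engine bug)
            failed[name] = ex.GetType().Name + ": " + ex.Message;
        }
    }
    return (found, failed);
}

// Hypothetical stand-in for DFPatterns.GetValue so the sketch runs on its own. Real usage:
//   var engine = new DFPatterns();
//   engine.SetExecutableSource(pathToDwarfort);
//   var result = ResolveAll(engine.GetValue, namesYouNeed);
var fake = new Dictionary<string, ulong> { ["viewport_x"] = 0x00D447FC, ["viewport_z"] = 0x00D7284C };
ulong FakeGetValue(string name) =>
    fake.TryGetValue(name, out ulong v) ? v : throw new KeyNotFoundException("no pattern named " + name);

var (resolved, failures) = ResolveAll(FakeGetValue, new[] { "viewport_x", "viewport_z", "enable_magma_forge" });
foreach (var kv in resolved) Console.WriteLine($"{kv.Key} = 0x{kv.Value:X8}");
foreach (var kv in failures) Console.WriteLine($"{kv.Key} failed: {kv.Value}");
```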

The source project uses VS2008; however, it targets .NET 2.0, since I originally started writing it in VS2005 and still have the original project files.

I'm sort of working on a standardized GUI that can give multiple plugins a consistent interface. It would probably also be used by any standalone application I write.

The TweakModule is there, but currently conditionally compiled out for 2 reasons:
First, Dtil uses GetTypes(), and since the module adds a dependency on Tweak's library, GetTypes() throws an exception because it can't load that type.
Second, it's broken. Tweak loads it just fine; however, Tweak won't let my plugin run even though it doesn't demand any addresses, so it's pointless. I also never wrote the file handling code, since it isn't straightforward to automatically determine which version is running (I could use the screen-scraping technique that Dtil uses, though).

You'll also notice in the source, through abuse of conditional compilation, that it can use a centralized config file. What's stopping me from using it? I don't have a way to update the XML file from multiple sources yet.

I'm also planning more work on the pattern config class, like using multiple patterns to (hopefully) add multi-fragment matching, adding pattern matching compatibility with older versions, and more! Also, the code doesn't know how to deal with circular import dependencies (it crashes with a stack overflow instead). It'd also be useful for one pattern to be able to export to other patterns (since a couple of patterns are verbatim duplicates of others).
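One common way to avoid that stack overflow is a depth-first search with cycle detection: track the chain of patterns currently being resolved and fail with a clear error when a name shows up in it again. Below is a C# sketch with a hypothetical import map shaped like the `<import>` tags. viewport_z importing viewport_x and viewport_y comes from the example pattern; everything else in the data is made up.

```csharp
using System;
using System.Collections.Generic;

// Returns pattern names ordered so every import comes before the pattern that uses it;
// throws on a circular import instead of recursing until the stack overflows.
List<string> ResolveOrder(Dictionary<string, string[]> imports)
{
    var order = new List<string>();
    var done = new HashSet<string>();
    var inProgress = new List<string>();   // chain of patterns currently being resolved

    void Visit(string name)
    {
        if (done.Contains(name)) return;
        if (inProgress.Contains(name))
            throw new InvalidOperationException(
                "Circular import: " + string.Join(" -> ", inProgress) + " -> " + name);
        inProgress.Add(name);
        if (imports.TryGetValue(name, out var deps))
            foreach (string dep in deps) Visit(dep);
        inProgress.RemoveAt(inProgress.Count - 1);
        done.Add(name);
        order.Add(name);
    }

    foreach (string key in imports.Keys) Visit(key);
    return order;
}

var acyclic = new Dictionary<string, string[]>
{
    ["viewport_z"] = new[] { "viewport_x", "viewport_y" },
    ["viewport_x"] = Array.Empty<string>(),
    ["viewport_y"] = Array.Empty<string>(),
};
Console.WriteLine(string.Join(", ", ResolveOrder(acyclic)));

var cyclic = new Dictionary<string, string[]> { ["a"] = new[] { "b" }, ["b"] = new[] { "a" } };
try { ResolveOrder(cyclic); }
catch (InvalidOperationException ex) { Console.WriteLine(ex.Message); }
```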

Other features I'd like to add include the ability to automatically check for new or updated patterns. It'd also be nice to have a 1-click config generator: you give it a list of utilities and their config formats, and when a new version rolls around, all you have to do is launch the tool and it'll automatically generate their configuration files. The ultimate challenge: add a full-blown disassembler & code analyzer to survive most possible code changes.

[*] The actual numbers (I'm too lazy to use SLOC or whatever, so it's just the line numbering provided by VS):
Code:
Lines File
1469 CacheSections.cs
1187 DF.Base.xml (excludes cache section)
190 Lexers.cs
856 MatchingEngine.cs
475 PEUtils.cs
90 PluginGui.cs
490 Plugins.cs
3036 X86Matcher.cs
7793 Total
« Last Edit: September 19, 2008, 05:25:36 pm by Jifodus »

isitanos

  • Bay Watcher
  • Seasonal river flood nostalgic
Re: Automatic Address & Offset Finder (currently for Dtil): September 14, 2008
« Reply #1 on: September 16, 2008, 11:38:46 am »

Very interesting. I'm sure this is gonna help a lot with the updating of various tools.

I was wondering, though: how did you go about figuring out, say, the right memory address to designate a tile to be dug out? Did you use the disassembly to follow the flow of the program until you found the place where it was performing that action, and then find out which type of memory structure tiles were using, as well as some call pointing to the beginning of that structure in memory? I'd be interested in examples of how the disassembly looks in that case, because I know how to write some assembler, and I know a bit about how C++ stores objects in memory, but the rest of what you're doing baffles me.
« Last Edit: September 17, 2008, 02:36:49 pm by isitanos »

Jifodus

  • Bay Watcher
  • Resident Lurker
Re: Automatic Address & Offset Finder (currently for Dtil): September 14, 2008
« Reply #2 on: September 16, 2008, 08:00:39 pm »

I don't directly analyze the executable, since it's far too big, and not enough of it is used to make it worthwhile.

I find it much easier to set a memory-access breakpoint on the address of the structure field I'm interested in and watch which instructions access that memory. In particular, I use Cheat Engine's "Find out what accesses this address" option. That's not to say I never add addresses/offsets while doing static analysis of the code: if I happen to see an interesting one while I'm working, I'll add it (some of the creature offsets are a result of that).

I realized after the fact that most of the second section is highly complex and maybe only a couple of people would understand it. I should move the part where I explain adding new patterns to the wiki... less clutter in the post.

Baboonanza

  • Bay Watcher
Re: Automatic Address & Offset Finder (currently for Dtil): September 14, 2008
« Reply #3 on: September 17, 2008, 10:26:17 am »

Maybe I'm being moronic, but the post doesn't actually seem to say what the purpose of the tool actually is.

Is it designed to allow patterns to be created so that tools can automatically find the memory offsets based on pieces of assembly code that use them, even when a new version is released (as long as the pattern still holds)? If so, cool :)

Jifodus

  • Bay Watcher
  • Resident Lurker
Re: Automatic Address & Offset Finder (currently for Dtil): September 14, 2008
« Reply #4 on: September 17, 2008, 10:32:16 am »

Quote from: Baboonanza
Maybe I'm being moronic, but the post doesn't actually seem to say what the purpose of the tool actually is.

Oops, not you, I'm the moronic one. "Information 101, what the heck is it?" ;)
I did, however, properly add its point to the title.

Quote from: Baboonanza
Is it designed to allow patterns to be created so that tools can automatically find the memory offsets based on pieces of assembly code that use them, even when a new version is released (as long as the pattern still holds)? If so, cool :)

Exactly.

Mithaldu

  • Bay Watcher

Goddammit, how in the 9 hells did I miss this? Thank you for editing your wiki page and pinging my watcher, Jifodus!

Your tool isn't directly usable for me, but at the very least I just found addresses for buildings! (And interestingly, there are 3 vectors that point at the same addresses, all of which are buildings oO)

Actually, Jifodus, can you please have a look at the vectors at these addresses and add the one you think is most significant to the XML files?

015838A0
015838B0
015838F0

Oh, and these too:

0x015832C8
0x015832D8
0x015833B8

They're item vectors. :D
« Last Edit: October 08, 2008, 05:18:35 am by Mithaldu »

Jifodus

  • Bay Watcher
  • Resident Lurker

Wow! Buildings and items! Maybe more work can finally be done on analyzing them. I'll get to adding them. I've just looked at the buildings and for my fort, 015838A0 & 015838B0 were the same length and 015838F0 was shorter.

Probably when I upload the patterns for the buildings and items I'll also have these added. My head aches in anticipation of generating another 12 or so patterns (fortunately for Exponent's, I only have to add the hotkeys, the others were already there/there but didn't have the appropriate alias). :-\

I should probably also add a generator for lifevis... which means I have to download it.