Finally here: the release of well over 80 hours of coding time, hours more finding and generating the patterns, and nearly 7800* total lines of code. (And a couple of weeks' wait for Dtil to develop the ability to let me write the plugin.)
Introducing PatternsEngine
PatternsEngine is a multi-targeted plugin & library. It's aimed at end users, plugin writers, and utility writers. It's written in C# for .NET 2.0 and is Public Domain.
The whole purpose of this plugin is to allow patterns to be created so that tools can automatically find the memory offsets based on pieces of assembly code that use them, even when a new version is released (as long as the pattern still holds).
(Thanks to Baboonanza for pointing out what a moron I was, since I completely missed the point of this plugin/tool. Other than the title, that is.)
End Users
If you're an end user, which should be everyone who uses Dtil (and hopefully eventually Tweak), this is for you. Why should you use this? First, it takes the burden off the utility/plugin writers. Second, if you don't I'll kill you and maybe use your bones for my next construction.
For installing in Dtil:
How do you test to see if it's working? Easy.
- Exit Dtil
- Rename Memory.ini to DontUseMemory.ini
- Start Dtil (and DF if it isn't already running)
- Click Game>Attach to Process
- When it asks you if you want to automatically try to discover the addresses, click yes
- Wait for it to finish compiling the modules
- If you didn't get any errors, Congratulations! It worked!
If you did get errors, in <Dtil>\Plugins\Discovery you'll notice a file called Jifodus.Identification.log. Post here with the contents of that log.
- Now delete the generated Memory.ini and rename the previously renamed DontUseMemory.ini back to Memory.ini
As of September 19, 2008 you can now update configuration for Tweak, Dwarf Foreman, and 3Dwarf/Dwarvis. It's still easy, but it isn't as clean as the integration with Dtil.
- Download http://www.geocities.com/jifodus/discovery/discovery_standalone.zip (age: September 19, 2008)
- Extract to a folder
- Run Jifodus.PatternsEngine.Standalone.exe
- Click File>Open
- Navigate to the Dwarf Fortress folder and open up dwarfort.exe
- Ensure the information displayed in the "Status" tab is correct
- On the "Tools" tab, set up the paths for the tools whose configuration you want updated
- Click Update
- Wait for it to notify you of its completion
How likely is it to break? Well, that depends on whether or not I selected stable patterns. Odds are they will remain stable between bug fix versions. If not, when I get around to updating a pattern or adding a new one, you'll find the updated DF.Base.xml at http://www.geocities.com/jifodus/discovery/df.base.xml.zip (age: nonexistent). Of course, others can provide their own versions/changes.
Note: As of September 19, 2008, when you generate the configuration for Tweak, it will warn that the patterns enable_magma_forge and enable_magma_furnaces don't exist. This is to be expected, since Tweak can look up those two values for itself. I'll eventually add patterns for them; until then, they will remain missing.
Plugin Writers
You might be asking yourself, how can this be targeted at plugin writers? (When I say plugin writers, I'm of course talking to those who write plugins for Dtil, and eventually Tweak.) My answer is simple: you may never need to use the library directly, nor may you ever need to find custom addresses and offsets (or you might hire me or someone else who's knowledgeable about this subject to do the dirty work; I of course work pro bono, but I'd need free time and you must provide me with the addresses/offsets you need added), but you should at least know how it works and what causes it to fail.
The theory behind the library is that between versions there are fixed code patterns. By searching for these fixed code patterns, it becomes possible to extract the information directly from the Dwarf Fortress executable. Actually I lied: how fixed the code patterns are depends heavily on what the compiler's optimizer does.
Take this assembly code for example (not actually from DF):
mov ebx, viewport_x
mov ecx, viewport_y
mov dword_9FB29C, edx
mov edx, viewport_z
Now Toady decides to change the function that the above assembly code is in. The problem is that the assembly code now looks like this, since the optimizer decided to change some registers:
mov edx, viewport_x
mov ebx, viewport_y
mov dword_9FB29C, ecx
mov ecx, viewport_z
Oops, now the naïve way of pattern matching breaks. This is what Tweak's lookup does & what typical usage of 0x517A5D's hexsearch will do.
Enter: PatternsEngine
The trick with PatternsEngine is that the pattern matching knows how to understand x86 machine code. So all the pattern writer has to do is indicate that the registers edx, ecx, and ebx should be ignored by the matching engine.
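To make the difference concrete, here's a rough sketch of a byte-for-byte compare next to a compare that ignores the register bits. It is not how PatternsEngine is implemented (the real engine decodes instructions; this just masks fixed bit positions). The class name, the method names, and the addresses in the byte arrays are placeholders I picked for illustration.

```csharp
using System;

// Illustration only: a fixed bit mask, not the real matching engine.
public static class RegisterBlindMatch
{
    // Each sample instruction is: opcode, ModRM, 4 address bytes.
    // 0xC7 keeps the mode (top 2 bits) and base (low 3 bits) fields of the
    // ModRM byte and ignores the 3-bit register field in the middle.
    private static readonly byte[] InstructionMask = { 0xFF, 0xC7, 0xFF, 0xFF, 0xFF, 0xFF };

    public static readonly byte[] Before =
    {
        0x8B, 0x1D, 0xFC, 0x47, 0xD4, 0x00, // mov ebx, viewport_x
        0x8B, 0x0D, 0x70, 0x28, 0xD7, 0x00, // mov ecx, viewport_y
        0x89, 0x15, 0x9C, 0xB2, 0x9F, 0x00, // mov dword_9FB29C, edx
        0x8B, 0x15, 0x4C, 0x28, 0xD7, 0x00  // mov edx, viewport_z
    };

    public static readonly byte[] After =
    {
        0x8B, 0x15, 0xFC, 0x47, 0xD4, 0x00, // mov edx, viewport_x
        0x8B, 0x1D, 0x70, 0x28, 0xD7, 0x00, // mov ebx, viewport_y
        0x89, 0x0D, 0x9C, 0xB2, 0x9F, 0x00, // mov dword_9FB29C, ecx
        0x8B, 0x0D, 0x4C, 0x28, 0xD7, 0x00  // mov ecx, viewport_z
    };

    // The naive way: every byte must be identical.
    public static bool NaiveMatch(byte[] pattern, byte[] code)
    {
        if (pattern.Length != code.Length) return false;
        for (int i = 0; i < pattern.Length; i++)
            if (pattern[i] != code[i]) return false;
        return true;
    }

    // Assumes both arrays hold a whole number of 6-byte instructions.
    public static bool MatchIgnoringRegisters(byte[] pattern, byte[] code)
    {
        if (pattern.Length != code.Length) return false;
        if (pattern.Length % InstructionMask.Length != 0) return false;
        for (int i = 0; i < pattern.Length; i++)
        {
            byte mask = InstructionMask[i % InstructionMask.Length];
            if (((pattern[i] ^ code[i]) & mask) != 0) return false;
        }
        return true;
    }
}
```

NaiveMatch(Before, After) fails because the ModRM bytes differ, while MatchIgnoringRegisters(Before, After) succeeds. A real pattern would also treat the addresses as wildcards, since those move between versions too.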
However, PatternsEngine is not without its limitations. What if the code changes to this:
mov edx, viewport_x
mov ebx, viewport_y
mov ecx, viewport_z
Well, this is something that PatternsEngine can't handle in the current version.
There's actually one more point where PatternsEngine will break. You may have noticed that the code sample isn't actually from DF; it's based on this code (v0.28.181.40d):
mov eax, viewport_x
mov ecx, viewport_y
mov dword_9FB29C, edx
mov edx, viewport_z
That's right, I changed one register. That's because x86 machine code has a shorter way to write mov eax, viewport_x that takes one less byte (opcode A1, with no ModRM byte). This means that if the registers were to change like in the second example (with eax in place of ebx), the pattern again won't match, since the instructions are different at the machine level.
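Here's a small sketch of why eax is the odd one out. It encodes "mov reg, [address]" the way compilers typically emit it: the short A1 form for eax and the general 8B form for everything else. The class, constants, and method name are made up for this example; only the byte values come from the x86 encoding.

```csharp
using System;

public static class MovEncoding
{
    // Register numbers as they appear in x86 machine code.
    public const int Eax = 0, Ecx = 1, Edx = 2, Ebx = 3;

    // Encodes "mov reg, [address]" for a 32-bit absolute address.
    public static byte[] EncodeLoad(int reg, uint address)
    {
        // Addresses are stored little-endian (lowest byte first).
        byte[] addr =
        {
            (byte)(address & 0xFF),
            (byte)((address >> 8) & 0xFF),
            (byte)((address >> 16) & 0xFF),
            (byte)((address >> 24) & 0xFF)
        };

        // eax gets the short form: opcode A1, no ModRM byte.
        // Other registers use opcode 8B plus a ModRM byte with
        // mode 00, the register number, and base 101 (absolute address).
        byte[] head = reg == Eax
            ? new byte[] { 0xA1 }
            : new byte[] { 0x8B, (byte)((reg << 3) | 0x05) };

        byte[] result = new byte[head.Length + addr.Length];
        Array.Copy(head, result, head.Length);
        Array.Copy(addr, 0, result, head.Length, addr.Length);
        return result;
    }
}
```

EncodeLoad(MovEncoding.Eax, 0x00D447FC) gives the 5 bytes A1 FC 47 D4 00, while the same load into ebx gives the 6 bytes 8B 1D FC 47 D4 00. A pattern written against one form has a different opcode and length from the other, so ignoring the register alone can't save it.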
Now that the theory is over, here comes the practice (if you don't plan on ever generating patterns, you can stop here):
<pattern>
  <name alias="viewport_z" />
  <name alias="screen_z" />
  <export name="z" />
  <import name="viewport_x" as="x" />
  <import name="viewport_y" as="y" />
  <fragment section=".text">
    <!--
    .text:004B782F 36C A1 FC 47 D4 00       mov eax, viewport_x
    .text:004B7834 36C 8B 0D 70 28 D7 00    mov ecx, viewport_y
    .text:004B783A 36C 89 15 9C B2 9F 00    mov dword_9FB29C, edx
    .text:004B7840 36C 8B 15 4C 28 D7 00    mov edx, viewport_z
    -->
    <instruction opcode="A1" immediate="#x" />
    <instruction opcode="8B">
      <modrm register="$ecx0" base="[#y]" />
    </instruction>
    <instruction opcode="89">
      <modrm register="$edx0" base="[#dw9FB29C]" />
    </instruction>
    <instruction opcode="8B">
      <modrm register="$edx1" base="[#z]" />
    </instruction>
  </fragment>
  <check>
    A1 FC 47 D4 00
    8B 0D 70 28 D7 00
    89 15 9C B2 9F 00
    8B 15 4C 28 D7 00
  </check>
</pattern>
I'll break this down into its different tags:
- <pattern>
it's just a container for all the information of a single pattern
- <name>
each pattern can have multiple name tags, and you specify what the name is through the alias attribute
- <export>
you can export one of the pattern's named wildcard values through the name attribute
- <import>
this is the complement of <export>; it lets you add a dependency on another pattern identified by the name attribute, and you then set the name used in the pattern with the as attribute
- <fragment>
this has one parameter, section, which specifies what part of the executable it should search; normally it'll be the ".text" section since that's where the executable code is; sometimes you're looking for a string, in which case you'd use ".rdata" (which stands for read-only data)
- <instruction>
this is the workhorse of the pattern; you can define up to 3 opcode bytes (I call them octets); you can also add up to 3 prefix bytes (the order won't matter for these) and an immediate attribute which can be a wildcard or a base 16 integer (see below for details)
- <modrm>
it'll have either an extended opcode or a register attribute, in addition to a base attribute (see below)
- <check>
this lets you check to make sure your pattern actually works; typically I just copy the machine code (you can see it in the comment, right next to the disassembly). Don't underestimate its usefulness.
Size matters: it tells the matching engine how to process the instruction. Therefore when you specify any "immediate" value, you must be clear how big it's supposed to be.
For example, these hexadecimal values are all interpreted differently: 0x01, 0x0001, 0x00000001. The first is interpreted as a single byte, the second as a 2-byte word, and the third as a 4-byte double-word.
What about when it's supposed to be a wildcard? Then you prefix the name with one of '!', '@', or '#' to specify 1 byte, 2 bytes, or 4 bytes respectively. How do you remember the obtuse prefix scheme? By noticing that '!' is Shift+1, '@' is Shift+2, and '#' is Shift+3.
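As a quick sketch of the size rules, the helper below works out how many bytes an immediate token stands for. The class and method are hypothetical (they're not part of the library), and it assumes literals are written with a 0x prefix as in the examples above. It doesn't check that the digits are valid hex.

```csharp
using System;

public static class ImmediateSize
{
    // Returns the size in bytes implied by an immediate token:
    // a wildcard like "#x" or a literal like "0x0001".
    public static int InBytes(string token)
    {
        if (string.IsNullOrEmpty(token))
            throw new ArgumentException("empty immediate");

        // Wildcards: the prefix character alone decides the size.
        switch (token[0])
        {
            case '!': return 1; // Shift+1
            case '@': return 2; // Shift+2
            case '#': return 4; // Shift+3, same byte count as Shift+4 would be
        }

        // Literals: the number of hex digits decides the size.
        if (token.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
        {
            int digits = token.Length - 2;
            if (digits == 2 || digits == 4 || digits == 8)
                return digits / 2;
        }

        throw new ArgumentException("cannot tell the size of immediate: " + token);
    }
}
```

So InBytes("0x01") is 1, InBytes("0x0001") is 2, InBytes("0x00000001") is 4, and InBytes("#x") is 4. Leading zeros are significant, which is the whole point.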
The base attribute builds on the above numbering system and further (ab)uses it. All unknown registers are prefixed with a '$'. Those who have done assembly programming will feel right at home, since I tried to keep the syntax as close as possible to what you'd expect to write. Just remember to keep the different parts in the following order: [Base+Index*Scale+Address/Offset] (and if all you want is a register: Register).
Yes, you are expected to read machine code (to a certain point). Therefore reading the x86 instruction set manuals is highly recommended. (I'll add a download link later, since neither AMD nor Intel puts them in a convenient location.) In the meantime, here's a crash course in reading machine code. I'm going to assume you're using a disassembler that breaks up the instructions for you automatically and shows the machine code per instruction.
- prefix bytes, they can be in any order (in hex): 2E, 36, 3E, 26, 64, 65
- modrm is a 3 field byte: mmrr rbbb
mm - 2 bits represent the mode; 11 (or 3) means that the base field represents a register, the others indicate an addressing mode
rrr - 3 bits represent the register (or an opcode extension, if the instruction doesn't use a register)
bbb - 3 bits represent a register or, when in an addressing mode, the base register
- If bbb is 4 and it's in an addressing mode then there is an SIB byte. (If bbb is 5 and the mode is 00, there's no base register at all, just a 4-byte address, as in the sample pattern above.) However, I don't have to explain more since the disassembler shows you the approximate formatting you'll need anyway.
- If there's an Address/Offset then it'll follow the ModRM byte (or SIB byte if it has it), it should be easy to tell from the machine code upon inspection how many bytes the Address/Offset has
- The immediate value comes last, again by inspecting the machine code and comparing it to the disassembled instruction it should be pretty easy to tell how many bytes it has.
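If the mmrr rbbb layout is hard to picture, this is roughly what splitting a ModRM byte looks like in code. The struct and its member names are my own for this sketch; they just mirror the mm/rrr/bbb naming used in the list above.

```csharp
public struct ModRm
{
    public int Mode;     // mm:  top 2 bits
    public int Register; // rrr: middle 3 bits (or an opcode extension)
    public int Base;     // bbb: low 3 bits

    // Splits one ModRM byte into its three fields.
    public static ModRm Decode(byte value)
    {
        ModRm m = new ModRm();
        m.Mode = value >> 6;
        m.Register = (value >> 3) & 7;
        m.Base = value & 7;
        return m;
    }
}
```

For the sample pattern, ModRm.Decode(0x0D) gives mode 0, register 1 (ecx), base 5, and ModRm.Decode(0x15) gives mode 0, register 2 (edx), base 5. Only the register field differs between the two, which is exactly the part the pattern marks as unknown with '$'.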
Tip: Don't try to spend more than a few hours writing patterns, that is unless you like not having a brain (or don't have one to begin with, in which case it won't matter).
Utility Writers/Code Hackers (for Plugin Writers too!)
You can grab the source from:
http://www.geocities.com/jifodus/discovery/discovery_src.zip (age: September 19, 2008)
The code is public domain, so it's free to use, change, or however you want to abuse or reuse it.
This is easier than generating patterns. Hopefully I'll write a standalone executable so that utilities such as Dwarf Foreman (which is not a .NET application) can benefit from it with a simple CreateProcess() (or even easier: system()).
At the bare minimum you'll need to know this interface:
namespace PatternsEngine
{
    public interface PatternSource
    {
        string[] GetPatternNames();
        string[] Patterns { get; }
        string[] GetAlternateNames(string name);
        bool IsReady();
        bool Ready { get; }
        string ExecutableSource { get; set; }
        void SetExecutableSource(string file);
        void SetExecutableMemory(byte[] memory);
        bool ScanOnDemand { get; set; }
        bool IsAutoScanOnDemand();
        void SetAutoScanOnDemand(bool enable);
        ulong GetCachedValue(string name);
        ulong GetMatchedValue(string name);
        ulong GetValue(string name);
        void SetCacheValue(string name, ulong value);
    }
}
These exceptions are handy to know about when you use GetValue.
using System;

namespace PatternsEngine
{
    public class PatternMatcherException : Exception
    {
        public PatternMatcherException(string what) : base(what) { }
    }
    public class ValueNotFoundException : PatternMatcherException
    {
        public ValueNotFoundException(string what) : base(what) { }
    }
    public class ValueNotCachedException : ValueNotFoundException
    {
        public ValueNotCachedException(string what) : base(what) { }
    }
    public class NamedPatternNotFoundException : PatternMatcherException
    {
        public NamedPatternNotFoundException(string what) : base(what) { }
    }
}
Finally the class that drives the generic MatchingEngine (implementation of PatternSource):
namespace PatternsEngine
{
    public sealed partial class DFPatterns : MatchingEngine
    {
        public DFPatterns();
        public bool HasVersionName { get { return false; } }
        public string VersionName { get { return "version unavailable"; } }
    }
}
Astute coders will notice in the source that the class is partial. That's because HasVersionName and VersionName have alternate implementations; the implementation provided is a dummy. When I make a release, I'll compile it with the real versions that know how to process the index file in the DF data directory. At the moment the real implementation is kept secret, since it involves actual reverse engineered DF code, and the last time I talked to Toady about reverse engineered encryption/decryption code he requested that I not publish source to it. As such, I'm honoring that request unless permitted otherwise.
I'll make one guarantee about the PatternSource interface: it'll remain unchanged. If I do need to make an interface change, I'll just add a new interface (extended, branched, whatever).
PatternSource 101:
- Construct DFPatterns (requires no parameters, it does all the hard stuff)
- Call SetExecutableSource with the path to the executable, or call SetExecutableMemory if all you have access to is the raw memory. (I should comment that I've never actually used SetExecutableMemory, so I don't actually know if it works. If it doesn't then it's the way that PEUtils handles the VirtualAddresses.)
- Call GetValue with the names of the patterns you'd like to use. Note that it can throw a slew of exceptions: PatternMatcherException, ValueNotFoundException, NamedPatternNotFoundException, ArgumentException, InvalidOperationException and maybe a couple of others.
PatternMatcherException - is typically thrown when it can't understand a pattern.
ValueNotFoundException & NamedPatternNotFoundException - are, I hope, self-explanatory...
ArgumentException - may be thrown in certain circumstances when a pattern does some nasty stuff with wildcards
InvalidOperationException - is thrown by the pattern matching engine when it enters an invalid state (i.e. engine bug)
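Here's one way the GetValue call and its exceptions could be wrapped, as a sketch rather than the official way to do it. So that it compiles without the library, the exception classes are copied from the listing above, and IValueLookup and FakeSource are hypothetical stand-ins I made up; with the real library you'd call GetValue on a DFPatterns instance after SetExecutableSource.

```csharp
using System;

// Copied from the exception listing so this compiles on its own.
public class PatternMatcherException : Exception
{ public PatternMatcherException(string what) : base(what) { } }
public class ValueNotFoundException : PatternMatcherException
{ public ValueNotFoundException(string what) : base(what) { } }
public class NamedPatternNotFoundException : PatternMatcherException
{ public NamedPatternNotFoundException(string what) : base(what) { } }

// Hypothetical stand-in for the one PatternSource method used here.
public interface IValueLookup
{
    ulong GetValue(string name);
}

public static class SafeLookup
{
    // Returns true and the value on success; otherwise false and a
    // short description of what went wrong.
    public static bool TryGetValue(IValueLookup source, string name,
                                   out ulong value, out string problem)
    {
        value = 0;
        problem = null;
        try
        {
            value = source.GetValue(name);
            return true;
        }
        // The more specific exceptions must come before their base class.
        catch (NamedPatternNotFoundException e) { problem = "no such pattern: " + e.Message; }
        catch (ValueNotFoundException e) { problem = "pattern did not match: " + e.Message; }
        catch (PatternMatcherException e) { problem = "pattern not understood: " + e.Message; }
        catch (ArgumentException e) { problem = "bad wildcard use: " + e.Message; }
        catch (InvalidOperationException e) { problem = "engine bug: " + e.Message; }
        return false;
    }
}

// Hypothetical fake source that knows a single value, for trying it out.
public class FakeSource : IValueLookup
{
    public ulong GetValue(string name)
    {
        if (name == "viewport_z") return 0x00D7284C;
        throw new NamedPatternNotFoundException(name);
    }
}
```

Calling SafeLookup.TryGetValue(new FakeSource(), "viewport_z", out value, out problem) succeeds, and asking for a name it doesn't know returns false with a message instead of blowing up your plugin.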
The source project uses VS2008; however, it targets .NET 2.0 since I originally started writing it in VS2005 and still have the original project files.
I'm sort of working on a standardized GUI that can be used for a consistent interface across multiple plugins. It would probably also be used by any standalone application I write.
The TweakModule is there, but currently conditionally compiled out for 2 reasons:
First, Dtil uses GetTypes(), and since the module adds a dependency on Tweak's library, GetTypes() throws an exception because it can't instantiate a type.
Second, it's broken. Tweak loads it just fine; however, Tweak won't let my plugin run even though it doesn't demand any addresses. So it's pointless. I also never wrote the file handling code, since it isn't straightforward to automatically determine what version is running, though I can use the screen scraping technique that Dtil does.
You'll also notice in the source, through abuse of conditional compiling, that it can use a centralized config file. What's stopping me from using it? I don't have a way to update the XML file from multiple sources yet.
Also, there's more that I'm planning on doing to the pattern config class, like being able to use multiple patterns to hopefully add multi-fragment matching, add back-version pattern matching compatibility, and more! Also the code doesn't know how to deal with circular import dependencies (it decides it needs to crash with a stack overflow instead). It'd also be useful to have one pattern be able to export to other patterns (since a couple patterns are verbatim pattern duplicates of others).
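For the circular import problem, something along these lines could catch the loop up front instead of overflowing the stack. This is only a sketch of the general idea (a depth-first walk that remembers which patterns are still being resolved), not code from the library; the class name and the dictionary-of-imports shape are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ImportCycles
{
    // imports maps a pattern name to the names it imports.
    // Returns the chain of imports that ends by revisiting a name,
    // or null if there is no circular dependency.
    public static List<string> FindCycle(Dictionary<string, List<string>> imports)
    {
        // 1 = currently being resolved, 2 = fully checked
        Dictionary<string, int> state = new Dictionary<string, int>();
        List<string> path = new List<string>();
        foreach (string name in imports.Keys)
        {
            if (Visit(name, imports, state, path)) return path;
        }
        return null;
    }

    private static bool Visit(string name, Dictionary<string, List<string>> imports,
                              Dictionary<string, int> state, List<string> path)
    {
        int s;
        if (state.TryGetValue(name, out s))
        {
            if (s == 2) return false; // already known to be fine
            path.Add(name);           // still being resolved: we looped back
            return true;
        }

        state[name] = 1;
        path.Add(name);
        List<string> deps;
        if (imports.TryGetValue(name, out deps))
        {
            foreach (string dep in deps)
            {
                if (Visit(dep, imports, state, path)) return true;
            }
        }
        path.RemoveAt(path.Count - 1);
        state[name] = 2;
        return false;
    }
}
```

With imports a -> b, b -> c, c -> a, FindCycle returns a four-name chain whose last name repeats an earlier one; with a -> b and nothing else it returns null. The walk is still recursive, but it stops as soon as it sees a name twice, so a loop can't run it forever.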
Another feature I'd like to add is the ability to automatically check for new or updated patterns. It'd also be nice if there was a 1-click config generator: basically you give it a list of utilities and their config formats, then when the new version rolls around, all you have to do is launch the tool and it'll automatically generate the configuration files for them. The ultimate challenge: add a full blown disassembler & code analyzer to survive most possible code changes.
[*] The actual numbers (I'm too lazy to use SLOC or whatever, so it's just the line numbering provided by VS):
Lines  File
 1469  CacheSections.cs
 1187  DF.Base.xml (excludes cache section)
  190  Lexers.cs
  856  MatchingEngine.cs
  475  PEUtils.cs
   90  PluginGui.cs
  490  Plugins.cs
 3036  X86Matcher.cs
 7793  Total