I think a set of simple sounds that get more complex with the detail level is a good starting point.
To start, I can tell you what a sighted player sees, to give you an idea of what combination of solutions is needed.
By default, the primary screen in fortress mode has three vertical sections. The leftmost is the map, the center one is the menu, and the rightmost is an overview map that is pretty much useless. You can turn off that overview map without losing anything important. Some additional bits of information are located around the edges of the screen including the number of idle dwarves, the current frames-per-second performance, the current z-level, and the phase of the moon.
Additional screens are all-text, though some have a vertical scroll-bar along the edge. These should be easy to reformat with DFHack. Some of the status screens are divided into vertical columns, which might give a screen reader trouble. Some even have "tabs" along the top that bring up different sub-screens. These would be more challenging to fix with DFHack, but perhaps not insurmountable.
The main screen is quite cluttered, but I am confident that DFHack could scrape all of the textual data from the game engine and report it in a friendlier format.
The problem is the map part of the main screen. Each tile has one of 256 glyphs, one of sixteen foreground colors and one of sixteen background colors. The tile can also be flashing, meaning it alternates with another glyph with its own colors. The player can use hotkeys to get detailed information about a tile, but the combination of glyph and colors is supposed to tell the player what is going on. I don't see any way of encoding 65536 possible tile images times hundreds of onscreen tiles, but some simplification seems possible without losing the underlying simulation.
The people who make graphics packs for DF routinely alter the tiles and colors used to represent things in the game. There is no reason one could not set all unmined tiles to one specific glyph, all furniture to a different specific glyph, and so on. Maybe have three or four glyphs for creatures, depending on how much clutter one is willing to endure. A simple sound-based representation of that can be the first level of detail.
The foreground color can encode a second level of detail. Among unmined tiles, one color for all soils, one color for normal sedimentary stones, one color for all gemstones, etc.
The background color can encode a third level of detail, which should probably be reserved for state information like "injured creature" or "wet stone" or "designated for digging."
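To make the three levels concrete, here is a rough Python sketch of how a tile could be boiled down to a glyph class, a foreground class, and a background state, with the detail level deciding how many of those get turned into sound. Every category name and number here is made up for illustration; none of it comes from the game or from DFHack.

```python
from dataclasses import dataclass

# Hypothetical category tables; a real setup would pick its own.
GLYPH_CLASSES = {"unmined": 0, "furniture": 1, "creature": 2, "open": 3}
MATERIAL_CLASSES = {"none": 0, "soil": 1, "gem": 2, "sedimentary": 3}
STATE_FLAGS = {"normal": 0, "wet": 1, "designated_dig": 2, "injured": 3}


@dataclass(frozen=True)
class TileCode:
    glyph_class: int  # level 1: what kind of thing is here
    material: int     # level 2: carried by the foreground color
    state: int        # level 3: carried by the background color


def encode_tile(kind, material="none", state="normal"):
    """Collapse a tile description into the three simplified layers."""
    return TileCode(GLYPH_CLASSES[kind], MATERIAL_CLASSES[material], STATE_FLAGS[state])


def cues(code, detail_level):
    """Return the layers a sound engine would voice at this detail level."""
    if detail_level not in (1, 2, 3):
        raise ValueError("detail_level must be 1, 2, or 3")
    layers = [
        ("glyph", code.glyph_class),
        ("foreground", code.material),
        ("background", code.state),
    ]
    return layers[:detail_level]
```

At level 1 a wet gemstone wall and a dry soil wall would sound the same, since both are just "unmined"; the difference only shows up once the player asks for level 2 or 3.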
A keypress can be used to interrogate a tile for additional detail, much like the game's current look command.
This map would be difficult for a sighted person to interpret, but easier to convert into sound. I'm not sure what the current norm is in representing two-dimensional structures in sound, but I'm imagining a sonar ping from the cursor with some type of multi-channel sound system. If you want to go all out, folks have figured out how to make the game render multiple z levels, and it's theoretically possible to make that sonar ping three-dimensional.
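For what it's worth, here is one way the sonar ping could be worked out for plain two-channel stereo. This is only a sketch under my own assumptions: the echo delay grows with distance from the cursor, the left/right balance follows the horizontal offset using ordinary constant-power panning, and the volume fades linearly. The function name and the `seconds_per_tile` value are invented for the example.

```python
import math


def ping_params(dx, dy, radius, seconds_per_tile=0.05):
    """Stereo cue for a tile at offset (dx, dy) from the cursor.

    Returns None when the tile lies outside the ping radius.
    """
    if radius < 1:
        raise ValueError("radius must be at least 1")
    dist = math.hypot(dx, dy)
    if dist > radius:
        return None
    # Pan runs from -1.0 (hard left) to +1.0 (hard right).
    pan = max(-1.0, min(1.0, dx / radius))
    # Constant-power panning: map pan onto an angle between 0 and pi/2.
    theta = (pan + 1.0) * math.pi / 4.0
    # Linear fade that stays above zero at the edge of the radius.
    gain = 1.0 - dist / (radius + 1)
    return {
        "delay": dist * seconds_per_tile,
        "left": gain * math.cos(theta),
        "right": gain * math.sin(theta),
    }
```

A tile directly under the cursor comes out equally loud in both ears with no delay, while a tile at the right-hand edge of the radius arrives late, quiet, and almost entirely in the right channel. Vertical offset only affects delay and volume here, so up and down would still need some other cue, such as pitch.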
If we go with the sonar idea, there is no reason to limit the map to what appears on a monitor screen. If one can interpret sonar pings out to a radius of N tiles, have the game render a 2N+1 by 2N+1 block of tiles centered on the cursor. A hotkey can be assigned to a DFHack script that reports the current cursor coordinates when the player wants to get his bearings.
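The window around the cursor is easy to pin down. Counting the cursor's own tile, a radius of N gives 2N+1 tiles on a side, and the window has to be clipped where the cursor sits near the edge of the map. A small sketch, with made-up function names and the map size passed in as plain numbers:

```python
def window_bounds(cx, cy, n, map_w, map_h):
    """Inclusive corners of the radius-n window around (cx, cy), clipped to the map."""
    x0 = max(0, cx - n)
    y0 = max(0, cy - n)
    x1 = min(map_w - 1, cx + n)
    y1 = min(map_h - 1, cy + n)
    return x0, y0, x1, y1


def tiles_in_window(cx, cy, n, map_w, map_h):
    """List every (x, y) in the clipped window, row by row."""
    x0, y0, x1, y1 = window_bounds(cx, cy, n, map_w, map_h)
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
```

With the cursor in the middle of the map and N = 3, that is a 7 by 7 block of 49 tiles; with the cursor in a corner, the same radius covers only 16.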
All Dwarf Fortress input consists of keypresses, so it should be trivial to configure any input device to control Dwarf Fortress. The issue is that a good number of those commands depend on the cursor position.
By the way, other than the glyph assignments and cursor coordinates script, I personally have no idea how to build any of the things I mentioned.