This is a more specific case of what is sometimes described as "decluttering". In human factors research, it's well known that when people are under stress (such as fighter pilots who are being shot at), they need an interface that is aggressively streamlined to just the minimum information they need to do the job at hand.
It is sometimes useful to talk about the bandwidth of an interface. The human visual system is by far the highest-bandwidth interface a typical person has, easily capable of distinguishing two spatial dimensions (X: left-right, Y: up-down) and, depending on definition, one to three colorimetric dimensions, with partial/synthetic information on a third spatial dimension (depth) and a temporal dimension (the "optical flow" analysis built into our wetware is quite sophisticated).
Audio is usually considered the second-highest bandwidth; it has partial/synthetic information on two spatial dimensions (azimuth and elevation, loosely), intensity (which may or may not reflect distance), "color" (tone), and somewhat better temporal processing, including some fairly interesting waveform processing that allows us to perceive harmonics, beat frequencies, and the like.
So, Dwarf Fortress has a famously cluttered interface. On the one hand, we want to dramatically pare down the information it tries to present; on the other hand, we want to present the crucial information in as usable a fashion as possible.
For the first, as already noted, there are mods designed to reduce internal variety that doesn't directly affect gameplay. For example, instead of a bin containing 3 dog leather, 5 naked mole rat leather, 1 bat leather and so on, these mods simplify the raw files so there is only one sort of leather (or very few); the bin might simply say 26 leather, which would take a screen reader far less time to read off. These mods are usually designed for people on low-memory computers, but should be adaptable for your purposes.
There are also settings in the init files which will help, such as not randomly varying the tiles used for the ground; personally, I find that looks cluttered even with good eyesight and turn it off.
From the other perspective... it seems logical that the fast way to scroll around would need a tone-based preview. Perhaps a scale running from low-pitched to high-pitched tones, something like the following:
* Open space
* Water
* Grass, dirt, snow, and other natural floors
* Rough stone floor
* Smoothed stone floor
* Engraved stone floor
* Solid dirt
* Solid ordinary stone
* Solid flux
* Solid ore
* Building, construction, etc.
* Your dwarves
* Tame / trained animal
* Traders, guards, etc.
* Wild animals
* Hostile mundane forces
* Megabeasts, necromancers, forgotten beasts, and other special problems
* Magma
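One way to sketch the scale above in code is as an ordered list of categories, each assigned a pitch by its position. The base pitch (220 Hz) and the two-semitone spacing are assumptions chosen for illustration, not anything Dwarf Fortress or DFHack provides; the category names simply follow the list.

```python
# Tile categories from the list above, ordered from lowest to highest pitch.
TILE_CATEGORIES = [
    "open space",
    "water",
    "natural floor",  # grass, dirt, snow, ...
    "rough stone floor",
    "smoothed stone floor",
    "engraved stone floor",
    "solid dirt",
    "solid ordinary stone",
    "solid flux",
    "solid ore",
    "building or construction",
    "your dwarves",
    "tame or trained animal",
    "traders, guards, etc.",
    "wild animal",
    "hostile mundane forces",
    "megabeast or other special problem",
    "magma",
]

BASE_HZ = 220.0     # assumed pitch of the lowest category
STEP_SEMITONES = 2  # assumed spacing between neighbouring categories


def tone_for(category: str) -> float:
    """Return the preview pitch in Hz for a tile category."""
    index = TILE_CATEGORIES.index(category)
    # Equal temperament: each semitone multiplies frequency by 2 ** (1/12).
    return BASE_HZ * 2 ** (index * STEP_SEMITONES / 12)
```

With these assumed values the scale spans a little under three octaves; a narrower or wider spacing is a matter of taste and testing.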
Ideally the interface would allow you to quickly scroll around the map, listening to the tones, and then pause the cursor when you got where you were going; the tone would fade over a second or so, and then the screen reader would read the contents of the square to you. This would allow such activities as exploring the edge of a river or cliff audibly, and particularly aid in finding things that are significantly different from their surroundings.
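The pause-then-read behavior described above might be sketched as a small state tracker. Everything here is hypothetical: the class name, the one-second dwell time, and the linear fade are assumptions for illustration, and the caller is assumed to supply timestamps and to do the actual tone playback and speech.

```python
from dataclasses import dataclass
from typing import Optional

DWELL_SECONDS = 1.0  # assumed pause before speech ("a second or so")


@dataclass
class CursorPreview:
    """Tracks the cursor and decides when to switch from tone to speech."""

    description: str = ""
    last_move: float = 0.0
    spoken: bool = True  # nothing to announce until the cursor first moves

    def move(self, description: str, now: float) -> None:
        """Cursor entered a new tile: restart the dwell timer."""
        self.description = description
        self.last_move = now
        self.spoken = False

    def tone_volume(self, now: float) -> float:
        """Linear fade from 1.0 to 0.0 over the dwell period."""
        elapsed = now - self.last_move
        return max(0.0, 1.0 - elapsed / DWELL_SECONDS)

    def poll(self, now: float) -> Optional[str]:
        """Return the text to speak, once, after the cursor has rested."""
        if not self.spoken and now - self.last_move >= DWELL_SECONDS:
            self.spoken = True
            return self.description
        return None
```

While the cursor keeps moving, `poll` keeps returning `None` and only tones are heard; the description is handed to the speech engine once the cursor has been still for the full dwell time.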
So, instead of it having to read out loud:
Grass. Grass. Grass. Sandy Loam. Grass. Yak. Grass. Goblin Archer.
you might have:
do do do re do fa do ti
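As a rough sketch of how that row could become sound, the following turns each tile into a short sine-wave beep using only the Python standard library. The tile-to-syllable mapping is a hypothetical one that matches the example above, and the sample rate, beep length, and choice of middle C for "do" are likewise assumptions.

```python
import math
import struct

SAMPLE_RATE = 22050   # samples per second (assumed)
NOTE_SECONDS = 0.15   # length of each preview beep (assumed)
C4_HZ = 261.63        # "do" is taken to be middle C (assumed)

# Semitone offsets of the major-scale syllables above "do".
SOLFEGE = {"do": 0, "re": 2, "mi": 4, "fa": 5, "sol": 7, "la": 9, "ti": 11}

# Hypothetical mapping matching the example above.
TILE_TO_SYLLABLE = {
    "Grass": "do",
    "Sandy Loam": "re",
    "Yak": "fa",
    "Goblin Archer": "ti",
}


def beep(freq_hz: float) -> bytes:
    """Return one sine-wave beep as 16-bit mono PCM."""
    count = int(SAMPLE_RATE * NOTE_SECONDS)
    samples = (
        int(16000 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(count)
    )
    return struct.pack(f"<{count}h", *samples)


def preview(tiles):
    """Return the syllables for a row of tiles and the matching PCM audio."""
    syllables = [TILE_TO_SYLLABLE[t] for t in tiles]
    pcm = b"".join(beep(C4_HZ * 2 ** (SOLFEGE[s] / 12)) for s in syllables)
    return syllables, pcm


row = ["Grass", "Grass", "Grass", "Sandy Loam",
       "Grass", "Yak", "Grass", "Goblin Archer"]
syllables, pcm = preview(row)
print(" ".join(syllables))  # do do do re do fa do ti
```

The resulting bytes could be written out with the standard `wave` module (one channel, 2-byte samples, `SAMPLE_RATE` frames per second) to hear how quickly eight tiles go by compared with eight spoken words.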
If you were interested in a dual-tone system, either left vs. right ear or as two-tone chords, you could do some more sophisticated discrimination, possibly combined with octaves. Perhaps the left ear could play a tone based on the floor type, and the right ear a tone based on what the tile was occupied by (air, stone, goblin, barrel of booze, whatever); in the case of solid stone, they would be the same.
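The left-ear/right-ear idea can be sketched as interleaved stereo audio, with one sine wave per channel. This is only an illustration under assumed values: the function name, sample rate, duration, and the specific pitches are all made up for the example.

```python
import math
import struct

SAMPLE_RATE = 22050  # samples per second (assumed)


def stereo_tone(left_hz: float, right_hz: float, seconds: float = 0.15) -> bytes:
    """Return interleaved 16-bit stereo PCM: floor tone left, occupant tone right."""
    frames = bytearray()
    for n in range(int(SAMPLE_RATE * seconds)):
        t = n / SAMPLE_RATE
        left = int(16000 * math.sin(2 * math.pi * left_hz * t))
        right = int(16000 * math.sin(2 * math.pi * right_hz * t))
        frames += struct.pack("<hh", left, right)  # one frame = left, right
    return bytes(frames)


# Hypothetical pitches: a grass floor with a goblin standing on it...
goblin_on_grass = stereo_tone(left_hz=261.63, right_hz=493.88)
# ...and solid stone, where floor and occupant share one pitch.
solid_stone = stereo_tone(left_hz=329.63, right_hz=329.63)
```

When both ears get the same pitch, as with solid stone, the two channels are identical and the tone is heard in the center; any occupant that differs from the floor pulls the two ears apart.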
I'm not the right sort of expert to design the "audio browser" software, but I don't think there's any fundamental reason it couldn't be done, and DFHack may already include most of the software hooks that would be needed. Somewhat ironically, 3D viewers such as Stonesense might also be a useful starting point; the underlying task of taking the Dwarf Fortress screen info and presenting it in a different medium is similar. (You could even look at it as a 1D viewer.)