Uhm... really interesting topic...
I might have a suggestion, but I'm not a programmer and English is not my main language, so I apologize in advance for anything stupid I might say.
Using sounds to "read" the world seems the way to go, but reading every single cell would be problematic, to say the least. You would end up hearing only "grass, grass, goblin on grass, grass, grass".
But what if you could express what the program is reading not with "read text" but with simple sounds?
An "empty" cell (grass, soil, stone) could maybe be expressed with a low humming sound, varying in pitch and intensity to differentiate, say, stone from grass, while units would instead be expressed with normally voiced text.
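To make the idea a bit more concrete, here is a minimal sketch in Python of how terrain could map to a hum while units fall back to spoken text. Everything in it is an assumption made up for illustration: the names `Hum`, `TERRAIN_HUMS` and `describe_cell`, and the frequency and volume numbers. It only decides what should be played; it does not produce any audio.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Hum:
    frequency_hz: float  # pitch of the background hum
    volume: float        # 0.0 (silent) to 1.0 (full)


# Hypothetical values: each "empty" terrain gets its own pitch and intensity.
TERRAIN_HUMS = {
    "grass": Hum(110.0, 0.2),
    "soil": Hum(90.0, 0.2),
    "stone": Hum(70.0, 0.3),
}

# Fallback for terrain types that are not in the table.
DEFAULT_HUM = Hum(50.0, 0.1)


def describe_cell(terrain: str, unit: Optional[str] = None) -> Tuple[str, Union[str, Hum]]:
    """Return ("speak", name) if a unit is present, otherwise ("hum", Hum)."""
    if unit is not None:
        return ("speak", unit)
    return ("hum", TERRAIN_HUMS.get(terrain, DEFAULT_HUM))
```

With this, `describe_cell("grass")` gives a quiet hum, while `describe_cell("grass", "goblin")` asks for the word "goblin" to be spoken instead.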
Also, to avoid the "hunt that goblin cell by cell" thing, the reading program could use a reading grid that covers, let's say, a 3x3 or 5x5 area and express the position of interesting things with a "radar" system using surround sound (a good headset would be more than sufficient, or you could use a 5.1/7.1 system).
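A rough sketch of that "radar" scan in Python could look like the following. It is only an illustration under several assumptions: the world is a plain dictionary from `(x, y)` to the name of an interesting thing, `y` grows towards the south as on a screen, and the function names `scan` and `bearing_degrees` are invented here. The bearing it computes is what a surround or stereo engine would need in order to place the sound.

```python
import math
from typing import Dict, List, Tuple

World = Dict[Tuple[int, int], str]  # hypothetical: (x, y) -> interesting thing


def scan(world: World, cx: int, cy: int, radius: int = 1) -> List[Tuple[str, int, int]]:
    """List (thing, dx, dy) for each interesting cell around the pointer.

    radius=1 reads a 3x3 grid, radius=2 a 5x5 grid. The pointer's own
    cell is skipped, since that one would be read out by voice.
    """
    found = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            thing = world.get((cx + dx, cy + dy))
            if thing is not None:
                found.append((thing, dx, dy))
    return found


def bearing_degrees(dx: int, dy: int) -> float:
    """Compass bearing of an offset: 0 = north, 90 = east, 180 = south, 270 = west."""
    return math.degrees(math.atan2(dx, -dy)) % 360
```

For example, with a goblin one cell to the west of the pointer, `scan` reports it with an offset of `(-1, 0)`, and `bearing_degrees(-1, 0)` places the ping at 270 degrees, so it would be heard on the left.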
So I'm figuring...
Your pointer is at a coordinate. You hear a low ambient sound that tells you that you are surrounded by stone floors. Another, different sound coming from the north means there is a wall in that direction. A ping in the west indicates something interesting there. You move your pointer closer, the ping gets stronger, and when you are pointing directly at it, a voiced text tells you there is a dwarf farmer there. Different pings can indicate different things.
That way you could experience your surroundings without a voice reading out every single thing.
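The "ping gets stronger as you get closer" part could be sketched in Python like this. Again, this is just one possible way to do it: the names `ping_volume` and `feedback`, the default range of 5 cells, and the linear fall-off are all assumptions chosen for the example.

```python
from typing import Tuple, Union


def ping_volume(dx: int, dy: int, max_range: int = 5) -> float:
    """Louder when the target is near, silent beyond max_range.

    Distance is counted in pointer moves (diagonals count as one step).
    """
    distance = max(abs(dx), abs(dy))
    if distance > max_range:
        return 0.0
    return 1.0 - distance / (max_range + 1)


def feedback(pointer: Tuple[int, int], target: Tuple[int, int], name: str,
             max_range: int = 5) -> Tuple[str, Union[str, float]]:
    """Speak the name when pointing right at the target, otherwise ping."""
    dx = target[0] - pointer[0]
    dy = target[1] - pointer[1]
    if dx == 0 and dy == 0:
        return ("speak", name)
    return ("ping", ping_volume(dx, dy, max_range))
```

Moving the pointer from three cells away to one cell away raises the ping volume, and once the pointer sits on the target, `feedback` switches from a ping to the spoken name, for example "dwarf farmer".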