It's not .bmp. It's a RAW image in 24-bit color (8 bits per channel), stored interleaved (each pixel's bytes come in R-G-B order) rather than planar (3 separate planes of R, G, and B). R, G and B only get 1 byte each. Irfanview has this little option to skip n bytes when opening a RAW image, which is how I know interleaved images can create hue-shifted versions of themselves by manipulating that byte-skip value. I'll explain my entire process:
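To make the byte-skip trick concrete, here's a minimal sketch (Python + NumPy; the filename is hypothetical) of what a raw viewer effectively does with that option:

```python
import numpy as np

raw = np.fromfile("image.raw", dtype=np.uint8)  # flat byte stream: R G B R G B ...

def decode(skip: int) -> np.ndarray:
    """Drop the first `skip` bytes, then group the rest into 3-byte pixels."""
    usable = raw[skip:]
    usable = usable[: (usable.size // 3) * 3]  # trim to whole pixels
    return usable.reshape(-1, 3)               # one (R?, G?, B?) row per pixel

# skip=0 reads bytes as (R, G, B). skip=1 starts one byte late, so the
# slot labeled "R" now holds the original G, "G" holds B, and "B" holds
# the next pixel's R -- a pure channel rotation. skip=3 realigns, just
# one pixel over.
```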
Audacity lets you import raw data, and there are a lot of encodings you can use. I chose "unsigned 8-bit", since that's an exact 1:1 mapping of the underlying image; each color channel of a pixel is 1 byte, represented as 1 sample. Easy. If you shift the "audio" by 1 sample to the right, then it should be equivalent to using a byte-skip value of 1.
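The same sketch for the Audacity step (same hypothetical file; one detail worth flagging is that "silence" in unsigned 8-bit isn't byte 0 but the midpoint, 128):

```python
# "Unsigned 8-bit" import: each byte of the file is one audio sample,
# so the sample array and the image byte stream are the same data.
samples = np.fromfile("image.raw", dtype=np.uint8)

# Shifting the track right by one sample and filling the gap with
# silence amounts to prepending one byte (128 = zero amplitude).
shifted = np.insert(samples, 0, 128)
# On export, every subsequent byte sits one position over, so a raw
# viewer groups them into different (R, G, B) triples -- the same
# effect as changing the byte-skip by one.
```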
"Should" is a strong word though. Fortunately, I have tested this out; shifted the "audio" by 1 sample, added silence to the 1-sample gap, then exported it. (R, G, B) is now mapped to (G, B, R). 2-sample gap, and it's now (B, R, G). 3-sample, back to (R, G, B). My theoretical model for how this works may be a bit wrong, but the important thing is that it's all in the set {R, G, B}. Nowhere does it become CMY.
Now I'm left with the following: how did any of the programs in the process manage to hue-shift the resulting image so that RGB becomes CMY? Not Irfanview, not Audacity... then is it RTX Voice?
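For reference, here's what "becomes CMY" would have to mean numerically: CMY is the complement of RGB, so every byte's value would have to be inverted, not just moved around. A tiny sketch (made-up pixel value):

```python
import numpy as np

pixel = np.array([200, 30, 90], dtype=np.uint8)  # some (R, G, B)
complement = 255 - pixel                          # (55, 225, 165), the CMY-looking version
# No byte-skip can produce this: skipping only permutes which byte lands
# in which slot; it never changes any byte's value.
```

(One hedged observation: in unsigned 8-bit audio, silence is the midpoint 128, and flipping a waveform's polarity maps each byte x to roughly 256 - x, within one step of this exact complement. So whatever did this behaved like a polarity inversion somewhere in the chain.)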
How is that thing at all aware of the audio it's being fed? It's live noise-reduction software, and the neural net it relies on is trained on human voices, to my knowledge. All it knows is that audio gets piped into it, and it sends its noise-reduced result to an audio device of my choice. Then Audacity records that device, and that's how I get my output image.
Where in any of that can you end up with a CMY version of an image?