I'm making an emotion transformer for my final year thesis. If it works as expected, I'm pretty sure I could turn it into an actual app. Since you guys seem interested in this science stuff, I thought I'd post it here. Having an audience also gives me enough motivation and pressure to keep working on it.
Besides, I might need a little help with changing it into an application. Prototyping is easy enough, but making it into something most people can use takes a bit more effort.
Emotion transformer
Basically, the idea is to insert an emotion into a neutral speech signal. My suggested approach is to strip out the phoneme part (the vocal tract filter), get the pitch, modify pitch/loudness/speech rate/whatever, then reassemble it all into speech with a more extreme emotion. I have no idea how it'll turn out, but hopefully it'll work.
- LPC extraction (done): I managed to remove the filter and put it back in, so you get all the "ay", "ee", "ahh" sounds reassembled, even if the pitch is different.
- Pitch detection: This is a bit of a bitch. It's simple in theory, just find the fundamental frequency (the lowest one), but pitches are like ocean waves: you can't really tell which one is the big first wave.
- Pitch modification: Hell, this is simple enough in theory, but the mathematics makes my eyes cry, then bleed. I'm probably going to work it out on paper first, invent my own formula, and hope the results match up.
- Other stuff: Assuming I get the detection and modification, this part should be a breeze.
Gonna be prototyped in MATLAB for now. Not real time either.
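On the pitch detection point: one common way around the "which wave is first" problem is to stop picking peaks in the waveform and look at the autocorrelation of a short frame instead, taking the lag at which the frame best matches a shifted copy of itself. Below is a rough sketch of that idea, written in Python rather than MATLAB only so it is self-contained. The function name `estimate_pitch` and the 60 to 400 Hz search range are assumptions for illustration, not part of the thesis code.

```python
import math


def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Searches lags inside a plausible pitch range (assumed 60-400 Hz)
    and returns sample_rate / lag for the lag with the largest
    autocorrelation. Returns None if no positive correlation is found.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]              # remove any DC offset

    min_lag = int(sample_rate / fmax)          # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)

    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag == 0:
        return None
    return sample_rate / best_lag


# Example: a 50 ms test tone at 200 Hz with a second harmonic, 8 kHz sampling
sr = 8000
tone = [math.sin(2 * math.pi * 200 * i / sr)
        + 0.5 * math.sin(2 * math.pi * 400 * i / sr)
        for i in range(400)]
f0 = estimate_pitch(tone, sr)
```

This only works on voiced frames, and real speech will probably need extra care (octave errors, noisy frames), so treat it as a starting point rather than a finished detector.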
Speech synthesis app
Building on similar techniques, I could actually make some software that synthesizes speech at a chosen pitch. Pronunciation will be all messed up, but since everything is generated digitally from filters, it could be a really small app, less than 1 MB.
- Filter library: I'd have to record real-life phonemes and build up a whole list of filters, one for every phoneme anyone could pronounce.
- Simulate pitch: Just a simple algorithm that creates a pitch.. shouldn't take too long.
- Speech synthesis!: Combine the two together and you get... phonemes. Well, if you combine a few phonemes together you get some mess of a word.
- Emotion: The Microsoft Sam voice is so boring. Even if my robot can barely talk properly, I want it to sound angry or depressed while doing it.
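The "simulate pitch" and "speech synthesis" steps above can be sketched as a source-filter model: an impulse train at the chosen pitch, pushed through an all-pole filter standing in for one phoneme. The Python below is a minimal illustration under stated assumptions. The helper names are hypothetical, the filter is built from two-pole resonators rather than real LPC coefficients, and the formant values are rough textbook figures for an "ah" vowel with guessed bandwidths.

```python
import math


def impulse_train(f0, sample_rate, duration):
    """Crude voice source: one unit impulse per pitch period."""
    n = int(round(sample_rate * duration))
    period = int(round(sample_rate / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator_coeffs(freq, bandwidth, sample_rate):
    """Denominator coefficients [a1, a2] of a two-pole resonator."""
    r = math.exp(-math.pi * bandwidth / sample_rate)   # pole radius < 1
    theta = 2.0 * math.pi * freq / sample_rate          # pole angle
    return [-2.0 * r * math.cos(theta), r * r]


def all_pole_filter(source, a):
    """y[n] = x[n] - sum_k a[k] * y[n-1-k], the same form as an LPC
    synthesis filter."""
    out = []
    for n, x in enumerate(source):
        y = x
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                y -= coeff * out[n - 1 - k]
        out.append(y)
    return out


def synth_vowel(f0, formants, sample_rate=8000, duration=0.2):
    """Run the impulse train through one resonator per formant."""
    signal = impulse_train(f0, sample_rate, duration)
    for freq, bandwidth in formants:
        signal = all_pole_filter(
            signal, resonator_coeffs(freq, bandwidth, sample_rate))
    return signal


# Assumed, approximate formants (Hz, bandwidth Hz) for an "ah"-like sound
AH_FORMANTS = [(730.0, 80.0), (1090.0, 90.0)]
samples = synth_vowel(200.0, AH_FORMANTS)
```

Changing `f0` moves the pitch without touching the filter, which is the property the whole plan relies on. A real filter library would swap the resonators for coefficients extracted from recordings.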
Voice morpher
Hah, now moving another step forward, I could possibly even build that into a full voice morpher/transformer. Your voice in one end, a different one out the other end. It's basically going to be like the first one, but with a whole lot more crap coded in C++.
- Development of an LPC algorithm in C++: Linear prediction is painful.
- Development of correlation, filters, etc: More hurt.
- Efficiency: I'd probably be tempted to be sloppy, but hey, audio is a lot less data than images, so with some careful memory management the processing time should be low enough for real-time use. So much fun :3
- Connecting the inputs and outputs to stuff, opening file formats, etc.
- Avoiding a lawsuit and other legal stuff.
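For the LPC item in the list above, the usual textbook route is the autocorrelation method solved with the Levinson-Durbin recursion, which is short enough to port to C++ more or less line by line. Here is a hedged sketch in Python. The function names are hypothetical, and it assumes the sign convention where the prediction-error filter is A(z) = 1 + a1 z^-1 + ... + ap z^-p.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no normalisation)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values.

    Returns (a, err): a[0] is always 1.0, a[1..order] are the predictor
    coefficients of A(z), and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                      # silent or degenerate frame
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                 # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Check on a known filter: impulse response of 1 / (1 - 1.3 z^-1 + 0.6 z^-2)
h = []
for n in range(200):
    y = 1.0 if n == 0 else 0.0
    if n >= 1:
        y += 1.3 * h[n - 1]
    if n >= 2:
        y -= 0.6 * h[n - 2]
    h.append(y)

coeffs, error = levinson_durbin(autocorrelation(h, 2), 2)
```

On that test signal the recovered coefficients should land very close to the filter that generated it. For real speech you would typically window each frame first and use a higher order, but those choices are outside what this sketch covers.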