Topic: Quest to build a speech synthesis/voice transformer thingy (Read 1726 times)

Muz · « **on:** January 15, 2010, 11:28:07 am »

I'm making an emotion transformer for my final year thesis. If it works as expected, I'm pretty sure I could turn it into an actual app. Since you guys seem interested in this science stuff, I thought I'd just post it here. Having some audience also gives me enough motivation and pressure to keep working on this.

Besides, I might need a little help with changing it into an application. Prototyping is easy enough, but making it into something most people can use takes a bit more effort.

Emotion transformer
Basically, the goal is to insert an emotion into a neutral speech signal. My suggested approach is to take out the phoneme filter, get the pitch, modify pitch/loudness/speech rate/whatever, then reassemble it with a more extreme emotion. I have no idea how it'll turn out, but hopefully it'll work.
- LPC extraction (done): I managed to remove the filter and reinsert it, so you could get all the "ay", "ee", "ahh" reassembled, even if the pitch is different.
- Pitch detection: This is a bit of a bitch. It's simple in theory, just find the fundamental (lowest) frequency, but pitches are like ocean waves: you can't really tell which is the biggest first wave.
- Pitch modification: Hell, this is simple enough in theory, but the mathematics make my eyes cry, then bleed. I'm probably going to do it just in theory, invent my own formula, and hope they match up.
- Other stuff: Assuming I get the detection and modification, this part should be a breeze.
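
To make the pitch detection step concrete, here is a minimal Python sketch of one common approach, autocorrelation. It is not necessarily the method used in the thesis. The function name `detect_pitch`, the 60 to 400 Hz search range, and the frame handling are all assumptions made for illustration. The idea is that a periodic signal looks most like itself when shifted by exactly one pitch period, which sidesteps the "which wave is the first one" problem.

```python
import math

def detect_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame of samples.

    Picks the lag with the highest normalized autocorrelation inside
    an assumed pitch range. Returns None for a silent frame or when no
    lag correlates positively.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None

    lag_min = max(1, int(sample_rate / fmax))      # shortest period considered
    lag_max = min(int(sample_rate / fmin), n - 1)  # longest period considered

    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Similarity between the frame and itself shifted by `lag` samples.
        score = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None:
        return None
    return sample_rate / best_lag

# Example: a 125 Hz tone whose second harmonic is louder than the fundamental.
sr = 8000
frame = [0.5 * math.sin(2 * math.pi * 125 * n / sr)
         + 1.0 * math.sin(2 * math.pi * 250 * n / sr)
         + 0.7 * math.sin(2 * math.pi * 375 * n / sr)
         for n in range(800)]
print(detect_pitch(frame, sr))
```

On real speech this simple version can still lock onto half or double the true pitch, so it would likely need smoothing across frames and a voiced/unvoiced check.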

Gonna be prototyped in MATLAB for now. Not real time either.

Speech synthesis app
Building up from similar techniques, I could actually make some software that synthesizes speech at a certain pitch. Pronunciation is all messed up, but since everything is digital, it could be a really small app, less than 1 MB.
- Filter library: I'd have to find real life phonemes and get a whole list of filters for every phoneme anyone could pronounce.
- Simulate pitch: Just a simple algorithm that creates a pitch.. shouldn't take too long.
- Speech synthesis!: Combine the two together and you get... phonemes. Well, if you combine a few phonemes together you get some mess of a word.
- Emotion: The Microsoft Bob voice is so boring. Even if my robot's speech comes out garbled, I want it to sound angry or depressed while doing it.
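
The "simulate pitch, then run it through a phoneme filter" idea above can be sketched in a few lines of Python. This is a rough source-filter sketch under stated assumptions, not the planned implementation: the pulse train is a very crude stand-in for a glottal source, each filter is a simple two-pole resonator, and the formant values in the example (700 Hz and 1200 Hz for an "ah"-like sound) are illustrative guesses, not measured data. All function names are hypothetical.

```python
import math

def pulse_train(f0, sample_rate, n_samples):
    """Crude pitch source: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, freq, bandwidth, sample_rate):
    """Two-pole resonator: y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / sample_rate)   # pole radius
    b1 = 2.0 * r * math.cos(2.0 * math.pi * freq / sample_rate)
    b2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = s + b1 * y1 + b2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

def synth_vowel(f0, formants, sample_rate=8000, duration=0.25):
    """Pass a pulse train at pitch f0 through one resonator per formant.

    `formants` is a list of (center_frequency_hz, bandwidth_hz) pairs.
    The result is normalized to a peak amplitude of 1.
    """
    n = int(sample_rate * duration)
    signal = pulse_train(f0, sample_rate, n)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, sample_rate)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]

# Same "filter", two different pitches (formant values are illustrative only).
ah = [(700.0, 80.0), (1200.0, 90.0)]
low = synth_vowel(100.0, ah)
high = synth_vowel(200.0, ah)
```

Changing `f0` changes the pitch while the formant list stays fixed, which is the separation the filter library would rely on.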

Voice morpher
Hah, now moving another step forward, I could possibly even build that into a full voice morpher/transformer. Your voice in one end, different one out the other end. It's basically going to be like the first one, but a whole lot more crap coded into C++.
- Development of an LPC algorithm in C++: Linear prediction is painful.
- Development of correlation, filters, etc: More hurt.
- Efficiency: I'd probably be tempted to be sloppy, but hey, with some efficient use of memory management, processing time would drop lower than it would for images. Which means.. real-time use. So much fun :3
- Connecting the inputs and outputs to stuff, opening file formats, etc
- Avoiding a lawsuit and other legal stuff.
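
For the LPC item in the list above, the core of the autocorrelation method is small. Below is a hedged Python sketch (Python only for brevity; the loops should port to C++ fairly directly). It assumes the Levinson-Durbin recursion, and the function names `lpc`, `inverse_filter`, and `synthesis_filter` are made up for this example. A real implementation would also want windowing and frame-by-frame processing, which are left out here.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a list of samples."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion on the autocorrelation of x.

    Returns a[0..order] with a[0] == 1, so that the prediction error
    is e[n] = sum(a[k] * x[n - k] for k in 0..order).
    """
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)               # remaining prediction error
    return a

def inverse_filter(x, a):
    """Remove the filter (FIR): e[n] = sum(a[k] * x[n - k])."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """Reinsert the filter (all-pole): y[n] = e[n] - sum(a[k] * y[n - k], k >= 1)."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

# Round trip on the impulse response of a known two-pole filter.
true_a = [1.0, -1.2, 0.8]
x = synthesis_filter([1.0] + [0.0] * 299, true_a)
a = lpc(x, 2)
residual = inverse_filter(x, a)
rebuilt = synthesis_filter(residual, a)
```

Feeding `synthesis_filter` a different excitation (for example a pulse train at another pitch) instead of `residual` is the "same phoneme, different pitch" trick described in the first post.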

eerr · « **Reply #1 on:** January 15, 2010, 11:59:12 am »

What is your goal, exactly?

Armok · « **Reply #2 on:** January 15, 2010, 02:30:19 pm »

If this works, it's awesome and also something I could really use.
For a look at the closest thing to this that I can think of having already been done, google Praat, I think.

Muz · « **Reply #3 on:** January 16, 2010, 12:09:53 pm »

Yeah, Praat is pretty awesome. Praat's a bit better at doing the stuff, but it depends.

Quote from: eerr on January 15, 2010, 11:59:12 am

What is your goal, exactly?

To make a newbie-friendly speech thingy. For one thing, you could type in stuff, and it'd say it, in different voices and tones (albeit awkwardly).

Oh, and you can morph your voice into something more dwarven. Or a girl's voice. Not that you should XD

alfie275 · « **Reply #4 on:** January 16, 2010, 05:44:40 pm »

What would be cool is if you had a translator built into it.

Dwarf · « **Reply #5 on:** January 16, 2010, 05:56:26 pm »

Quote from: alfie275 on January 16, 2010, 05:44:40 pm

What would be cool is if you had a translator built into it.

That's a completely different cup of tea/can of worms, though.

bjlong · « **Reply #6 on:** January 16, 2010, 06:26:49 pm »

Can't you use an FFT for the pitch detection? Or would that just pick up the phoneme's features?

winner · « **Reply #7 on:** January 16, 2010, 06:32:51 pm »

Quote from: bjlong on January 16, 2010, 06:26:49 pm

Can't you use an FFT for the pitch detection? Or would that just pick up the phoneme's features?

You should be able to pick up or reconstruct the fundamental pretty easily.
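
To illustrate the FFT question, here is a small Python sketch. It assumes one particular technique, the harmonic product spectrum, which is not something anyone in the thread named. The strongest FFT peak can be a harmonic boosted by the phoneme's filter, so instead of taking the biggest peak it looks for the frequency whose multiples are all strong. The function names and the choice of 3 harmonics are assumptions made for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum for bins 0 .. n/2 - 1."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    return [abs(c) for c in fft(windowed)[: n // 2]]

def strongest_peak_pitch(frame, sample_rate, fmin=60.0):
    """Naive estimate: frequency of the single largest spectral bin."""
    n = len(frame)
    mag = magnitude_spectrum(frame)
    k_min = max(1, int(math.ceil(fmin * n / sample_rate)))
    best = max(range(k_min, len(mag)), key=lambda k: mag[k])
    return best * sample_rate / n

def hps_pitch(frame, sample_rate, n_harmonics=3, fmin=60.0):
    """Harmonic product spectrum: score each bin k by mag[k]*mag[2k]*..."""
    n = len(frame)
    mag = magnitude_spectrum(frame)
    k_min = max(1, int(math.ceil(fmin * n / sample_rate)))
    k_max = len(mag) // n_harmonics
    best_bin, best_val = k_min, -1.0
    for k in range(k_min, k_max):
        prod = 1.0
        for h in range(1, n_harmonics + 1):
            prod *= mag[k * h]
        if prod > best_val:
            best_bin, best_val = k, prod
    return best_bin * sample_rate / n

# 125 Hz tone with a weak fundamental and a strong second harmonic.
sr, n = 8000, 1024
weights = [0.2, 1.0, 0.8, 0.5]
frame = [sum(w * math.sin(2 * math.pi * 125 * (h + 1) * i / sr)
             for h, w in enumerate(weights)) for i in range(n)]
naive = strongest_peak_pitch(frame, sr)   # lands on the 250 Hz harmonic
hps = hps_pitch(frame, sr)                # lands on the 125 Hz fundamental
```

This only works when the fundamental still has some energy. If it is missing entirely, a time-domain method such as autocorrelation is probably a better bet.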

Bay 12 Games Forum
