All done! Thanks a lot to everyone who participated!
Idea:
To synthesize emotions into speech. I started with only anger because I got it working literally half a week before submission and only had time to clone one emotion. Why bother? Because synthesized speech sucks. You've all probably heard Stephen Hawking, or one of the voices that come with Windows. The idea is that adding some emotion would make it sound a lot better: more human, less robot.
Conclusion:
It worked. Sorta. Rated 2.5 out of 5 for anger, 2.4 out of 5 for quality. Whether that's bad depends on what you're doing with it. I'd compare it to, well, nice pixel art: you know what the picture's supposed to be, but it's not exactly photorealistic. There are a few artifacts in the speech, but that buzz actually sounds fine once you're used to it.
If you're pulling a prank, it works very well on someone who isn't expecting it (kinda like Photoshop). For speech synthesizers, it's great at making the output less boring: shift the pitch contour higher for a happier sound, lower for a sadder one.
Anger also has these spikes in the pitch and energy contours; that's about all there is to it. It's hard to simulate simply because those contours swing far more than the transformer can handle. Almost every other emotion has subtler differences, so it should work much better for those.
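If you want to picture what that contour fiddling looks like, here's a rough Python sketch (my own illustration, not the thesis code): the contour is assumed to be a frame-wise array of F0 values in Hz, and the scale factors and spike shape are made-up numbers.

```python
# Hypothetical sketch of shifting a pitch contour and adding "angry" spikes.
# f0 is assumed to be a frame-wise array of F0 values in Hz, 0.0 = unvoiced.
# Scale factors and spike shape are illustration values, not measured ones.
import numpy as np

def color_contour(f0, scale=1.0, spike_positions=(), spike_gain=1.4, spike_width=5):
    """Scale a pitch contour (happier > 1.0, sadder < 1.0) and optionally
    add short local spikes like the ones angry speech shows."""
    out = f0.astype(float).copy()
    voiced = out > 0                      # leave unvoiced frames untouched
    out[voiced] *= scale                  # global raise/lower of the contour
    for p in spike_positions:             # crude triangular spike around frame p
        lo, hi = max(0, p - spike_width), min(len(out), p + spike_width + 1)
        for i in range(lo, hi):
            if out[i] > 0:
                w = 1.0 - abs(i - p) / (spike_width + 1)
                out[i] *= 1.0 + (spike_gain - 1.0) * w
    return out

# e.g. "happier": color_contour(f0, scale=1.15)
#      "angrier": color_contour(f0, scale=1.05, spike_positions=[40, 120])
```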
It's also basically a functional pitch contour transformer, i.e. it can correct you if you're singing out of tune; sort of like Photoshop for voices in that sense. It can't really fix your voice if you just suck at singing, though, and if you're off key by around 50 Hz it starts to sound techno-ish. Then again, 50 Hz is a huge range to correct over, so you shouldn't be singing that badly in the first place.
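The singing-correction use is basically "snap every voiced frame to the nearest semitone and hand the result to the pitch modifier". A minimal sketch of that idea, again my own illustration rather than the actual correction rule from the thesis:

```python
# Snap each voiced frame of a pitch contour to the nearest equal-tempered
# semitone. This is an illustration of the idea, not the thesis method.
import numpy as np

def snap_to_semitones(f0, ref=440.0):
    out = f0.astype(float).copy()
    voiced = out > 0
    # distance from the reference pitch in (fractional) semitones
    semitones = 12.0 * np.log2(out[voiced] / ref)
    out[voiced] = ref * 2.0 ** (np.round(semitones) / 12.0)
    return out

# The snapped contour would then be fed to the pitch modifier as the target;
# big corrections (tens of Hz) are where the techno-ish effect shows up.
```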
Compared to what other people have done, well... it's the most successful emotional transformation so far, unless someone's put some top-secret research into something better.
Implementation:
Anyway, while the PhD students were throwing huge piles of statistics and hidden Markov models at it, basically trying to invert whatever knowledge they got from emotion detection, I took a much dumber game-designer approach: simply quantify emotions as a bunch of numbers.
So I split it into three variables: energy contour, duration modification, and pitch contour. I had a bunch of theories around these. One was to imitate the target emotion exactly, which didn't work out so well, because the voice simply won't go above a certain pitch.
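To give a rough idea of what "emotions as a bunch of numbers" means in practice, here's roughly the shape of it. The field names and values below are placeholders I made up for this post, not the measured ones from the thesis:

```python
# One target template per emotion, covering the three variables named above.
# All numbers are placeholders for illustration.
from dataclasses import dataclass

@dataclass
class EmotionTemplate:
    pitch_scale: float        # overall raise/lower of the pitch contour
    pitch_spikes: float       # how strongly local spikes are exaggerated
    energy_scale: float       # overall loudness (energy contour) change
    duration_factor: float    # < 1.0 = faster speech, > 1.0 = slower

ANGER = EmotionTemplate(pitch_scale=1.1, pitch_spikes=1.5,
                        energy_scale=1.3, duration_factor=0.9)
```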
The other theories were kinda meh and mostly proven wrong; one turned out to be kinda true. What held up is that people don't really notice a lot of the bad artifacts. I guess we're so used to horribly compressed music, video, and phone speech that it's fine to just mess the signal up a bit.
Technical stuff:
Well, I'm not sure how much to say here. I'm not going to give 100% of the details until the thesis is officially published by the uni, what with the patent possibilities and all.
The stuff I can say is common knowledge. It uses standard PSOLA (pitch-synchronous overlap-add, the usual pitch-modifying algorithm). In essence it's just a basic pitch modifier, with modifications to let it change duration too, even though that was theoretically a stupid thing to do. I think everyone was skeptical about that, lol.
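For anyone who hasn't met PSOLA: it cuts two-period, windowed grains around pitch marks and re-lays them with different spacing to change pitch (and, with a different time mapping, duration). Here's a toy sketch of that idea, assuming the pitch marks are already known; it leaves out basically every detail that matters and is not the thesis code.

```python
# Toy TD-PSOLA: windowed two-period grains are cut around analysis pitch
# marks and overlap-added at new spacing. Assumes pitch marks are given
# (at least two of them) and x is a 1-D numpy array.
import numpy as np

def psola(x, marks, pitch_factor=1.0, time_factor=1.0):
    marks = np.asarray(marks)
    periods = np.diff(marks)                       # local pitch periods (samples)
    out = np.zeros(int(len(x) * time_factor) + len(x) // 4)
    t_out = float(marks[0])                        # synthesis pitch-mark position
    while t_out < len(out) - max(periods):
        # find the analysis mark corresponding to this output time
        t_in = t_out / time_factor
        i = int(np.argmin(np.abs(marks[:-1] - t_in)))
        P = periods[i]
        # two-period Hann-windowed grain centred on the analysis mark
        start = max(marks[i] - P, 0)
        grain = x[start:marks[i] + P] * np.hanning(len(x[start:marks[i] + P]))
        # overlap-add the grain at the synthesis mark
        pos = max(int(t_out) - (marks[i] - start), 0)
        end = min(pos + len(grain), len(out))
        out[pos:end] += grain[:end - pos]
        # next synthesis mark: tighter spacing = higher pitch
        t_out += P / pitch_factor
    return out
```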
And uh... yeah. I don't think many of you really play around with this stuff, so a detailed technical explanation wouldn't help much. But if you've got questions, ask.
Why it shouldn't work:
I took a hell of a lot of shortcuts. If this were a mechanical thing, it'd be duct-taped all over the place. Surprisingly, it held together, and while I was asking my supervisor why it didn't work... it did. It worked so well that he asked me whether the synthesized speech was the original. I'm still scratching my head over why it works at all, but it does.
1. I never used any of the formulas suggested by the technical papers. I stared at them for like 4 months, went "screw this", and wrote some random code based on the pictures.
2. The PSOLA doesn't use interpolation. In English: the target pitch contour moves in big goddamn chunks instead of smoothly, and nobody noticed (there's a sketch of the difference after this list).
3. The pitch detector doesn't work reliably, and the system needs to know the current pitch before deciding what to change it to. It's like a plane flying and landing on autopilot while not being sure how high it is.
4. The pitch correction method is stupid. If someone screams across a range of 40 to 400 Hz, it just assumes an error and treats the whole range as screaming at 90 Hz. The "angry" speech shouldn't work at all; that's the first speech file, for those of you who heard it.
5. It mixes voiced, unvoiced, and silent speech together, which is epically stupid; they're very different things (in design, if not in theory). Some of you heard a big 'pop' in the middle of the second speech file. That seems to be the only noticeable one, though theoretically it should be popping all over the place.
6. There are like 20 pages written on how to do duration modification properly. My system uses a "choose it at random" approach (sketched below). Both work almost equally well, but my algorithm messes up epically once it stretches duration by more than about 1.5x.
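For point 2, here's roughly what "no interpolation" means, assuming target pitch values are only given at a few anchor frames (all numbers made up for illustration):

```python
# Stepped target contour (what the system does) vs. interpolated (what it
# arguably should do). Anchor frames and Hz values are made-up examples.
import numpy as np

anchor_frames = np.array([0, 50, 100, 150])
anchor_hz     = np.array([120.0, 180.0, 140.0, 160.0])
frames = np.arange(151)

# "big chunks": hold each anchor's value until the next anchor
stepped = anchor_hz[np.searchsorted(anchor_frames, frames, side="right") - 1]

# smooth version: linear interpolation between anchors
smooth = np.interp(frames, anchor_frames, anchor_hz)
```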
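And for point 6, a guess at what "choose it at random" duration modification looks like, operating on a list of PSOLA grains; again my own sketch, not the thesis implementation:

```python
# Stretch or shrink duration by randomly repeating or dropping grains.
# Hypothetical illustration of the "choose it at random" approach.
import random

def stretch_grains(grains, factor, seed=0):
    """Return roughly len(grains) * factor grains by randomly repeating
    (factor > 1) or dropping (factor < 1) them. Falls apart past ~1.5x."""
    rng = random.Random(seed)
    out = []
    for g in grains:
        out.append(g)
        if factor > 1.0 and rng.random() < factor - 1.0:
            out.append(g)                     # duplicate this grain
        elif factor < 1.0 and rng.random() < 1.0 - factor:
            out.pop()                         # drop it instead
    return out
```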
Anyway, all of this raises some big questions about why these shortcuts worked at all, and it accidentally opened up another branch of research into this stuff.