Game Development Community

voice to text/text to voice

by William Finlayson · in Technical Issues · 09/16/2001 (11:09 am) · 26 replies

Does anyone know where I can find a free library for converting text to voice, or voice to text? I've been searching for ages and haven't found anything. The quality doesn't matter as long as it's free.
#1
09/16/2001 (11:14 am)
That stuff (text to voice and back) would be very hard. I doubt there is such a library, and if there is, it probably won't be free.
#2
09/16/2001 (12:24 pm)
Microsoft has a free SDK that does both.
#3
09/16/2001 (12:45 pm)
Thanks for the pointer James, I've started the download, 68 megs!

I was going to look into providing voice to text and then text to voice for multiplayer games. Instead of sending voice directly (high bandwidth requirements), you speak, it's converted into text, and that text is sent to the other computers, where it is either read or converted back to voice. The bandwidth requirements are the same as for normal text chatting, but you don't need to type out the messages. It'd be good if it could work.
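To put rough numbers on the bandwidth argument, here is a minimal sketch comparing the payload of one message sent as raw audio versus as text. The sample rate, sample size, message, and speaking time are all assumptions chosen for illustration; real voice chat normally compresses audio, so the actual gap would be smaller.

```python
# Rough payload comparison for one chat message: raw voice vs. text.
# All figures below are illustrative assumptions, not measurements.

SAMPLE_RATE_HZ = 8000   # assumed telephone-quality audio
BYTES_PER_SAMPLE = 1    # assumed 8-bit mono samples


def raw_voice_bytes(seconds: float) -> int:
    """Bytes needed to send uncompressed audio of the given length."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def text_bytes(message: str) -> int:
    """Bytes needed to send the same message as UTF-8 text."""
    return len(message.encode("utf-8"))


message = "enemy spotted near the base"   # hypothetical chat message
voice = raw_voice_bytes(2.0)              # assume about two seconds to say it
text = text_bytes(message)
print(f"voice: {voice} bytes, text: {text} bytes")
```

Even with heavy audio compression, the text form stays in the tens of bytes per message, which is the point of the speak, send text, speak again idea.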
#4
09/16/2001 (1:04 pm)
You could look at the Tribes 2 code, because I know they had it.
#5
09/16/2001 (1:49 pm)
I've never played T2 before, could you describe how it was used?
#6
09/16/2001 (4:17 pm)
Tribes 2 does not have text to speech (or vice versa).
It has voice binds (audio files activated by key combinations that also send a text message; only people who have the sound file activated by the keybind actually hear the message) and built-in real-time voice chat (think Roger Wilco, only it's built in to the game).
#7
09/16/2001 (4:17 pm)
I am glad I was wrong. :)
#8
09/16/2001 (4:41 pm)
Cheers Paul. I've been searching all night, and I haven't found anything comparable to the Microsoft Speech SDK, not for free anyway. But it's going to be tomorrow before it's finished downloading and I can take a look at the docs.
#9
09/16/2001 (5:33 pm)
Hey, I thought that in T2 you could record what you want to say from a mic and the screen would put it up as text??? I could be wrong, but I could have sworn it was there...

Bryan
#10
09/17/2001 (2:11 pm)
I have the Speech SDK now, I've had a read of the docs, and it looks incredibly easy to add it to any application. The text-to-speech is intelligible and fast, but I've not been able to try the speech recognition - no mic :(

Judging from the docs and samples though, the capabilities for the speech recognition are immense. I would just like to find out if it is fast. I saw in the docs that there is a way to output the sound from TTS just to a buffer, so you could play it using the sound setup for your game and apply any effects and stuff. I'm just dying to get a mic now so I can start playing around with the SDK :)
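As a sketch of what "apply any effects" to a TTS buffer could look like, the snippet below applies a simple gain to a block of audio. It assumes the buffer holds 16-bit signed mono PCM samples in native byte order; that format and the function name `apply_gain` are assumptions for illustration and are not taken from the SDK docs.

```python
from array import array


def apply_gain(pcm_bytes: bytes, gain: float) -> bytes:
    """Scale 16-bit signed PCM samples by `gain`, clamping to the valid range."""
    samples = array("h")          # "h" = signed 16-bit integers
    samples.frombytes(pcm_bytes)
    scaled = array(
        "h",
        (max(-32768, min(32767, int(s * gain))) for s in samples),
    )
    return scaled.tobytes()


# Stand-in for a buffer produced by a TTS engine (hypothetical sample values).
tts_buffer = array("h", [1000, -1000, 30000]).tobytes()
louder = array("h")
louder.frombytes(apply_gain(tts_buffer, 2.0))
print(list(louder))
```

Once the speech is in a plain sample buffer like this, it can go through the same mixing and effects path as any other game sound.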
#11
09/17/2001 (2:50 pm)
I'm doing a bit of playing around with that SDK myself. Current development plans for my team won't be using it, but for a multiplayer title we might eventually develop, it would be very cool :)

One of my own major problems in multiplayer games has always been that I end up being so focused on the action that I completely forget about the text chat window. Text-to-speech blows that problem away.

My own team is using Jamagic to develop our engine and games (Jamagic is a great foundation, but we have to build and rebuild a lot of things to make it work for our projects... we're having a lot of fun working with it though :), but adding text-to-speech to the V12 would be worth a serious gold star on the record of the programmer that did it.
#12
09/17/2001 (8:01 pm)
I just wanted to add a nuance to all this.

I'm making a WWII-based multiplayer game and we have the same problem.
We have chosen to solve it in a different way.
It is much like the voice commands in Tribes 2, Counter-Strike, and almost any new MP game, but we have added an expansion to that.
It works like this: we record a number of keywords and sentences (plus the standard voice commands) and put them in a folder. The advanced user can then, via script, form his own sentences like:

There's a sniper in the church.

Which is formed by the following pieces:

There's
a sniper
in the
church

This allows for a lot of customized possibilities and will enhance teamwork.
This can later be upgraded with new and bigger soundpacks that can be downloaded.
The user then puts these in a popup menu and binds it to a key, much like the voice commands in T2.
The speech will also be output as text in the chat window.

We will also give the user an option to load specific menus to different maps, since not all maps might contain a church.
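The fragment scheme described above can be sketched as a lookup from fragment keys to a recorded file plus its chat text. The keys, file names, and the `build_message` helper are hypothetical, made up only to illustrate the idea; they are not from the actual game.

```python
# Hypothetical sound pack: fragment key -> (audio file, chat text).
SOUND_PACK = {
    "theres": ("theres.wav", "There's"),
    "a_sniper": ("a_sniper.wav", "a sniper"),
    "in_the": ("in_the.wav", "in the"),
    "church": ("church.wav", "church"),
}


def build_message(fragment_keys, pack=SOUND_PACK):
    """Return (audio files to play in order, text for the chat window)."""
    missing = [key for key in fragment_keys if key not in pack]
    if missing:
        # e.g. the current map's menu has no "church" fragment loaded
        raise KeyError(f"fragments not in sound pack: {missing}")
    files = [pack[key][0] for key in fragment_keys]
    text = " ".join(pack[key][1] for key in fragment_keys) + "."
    return files, text


files, text = build_message(["theres", "a_sniper", "in_the", "church"])
print(files)
print(text)
```

A per-map menu would then simply be a different `pack` passed to the same function.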

That's about it, just wanted to share.

// Clocks out.
#13
09/17/2001 (10:55 pm)
The main problem with that though, is the space that it will take up with all the recorded words, plus you still need to string them together to make sentences. With speech recognition, you can play on as usual while making a message. Also, with text to speech on the other end, any words can be said, not just ones that have been prerecorded, giving much more versatility. You don't even need to check that every player has the required voice packs.

When playing with multiple players, it is easier if messages from different players sound different. With TTS, different voices can be created easily just by changing a few variables, and then you can set a different voice for each player, so that you can differentiate between them without having to read off a name or whatever. Different voices can be provided by using the method you described, but it means another whole set of sounds.
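A minimal sketch of the one-voice-per-player idea: keep a small list of voice presets and hand them out in join order, so each player keeps the same voice for the whole session. The `pitch` and `rate` fields and their values are hypothetical stand-ins for whatever variables a given TTS engine actually exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    pitch: int   # hypothetical engine parameter
    rate: int    # hypothetical engine parameter


# Assumed presets; a real game would tune these by ear.
PRESETS = [
    VoiceSettings(pitch=-5, rate=0),
    VoiceSettings(pitch=0, rate=0),
    VoiceSettings(pitch=5, rate=1),
    VoiceSettings(pitch=10, rate=-1),
]


class VoiceAssigner:
    """Gives each player a stable voice, cycling through the presets."""

    def __init__(self, presets=PRESETS):
        self._presets = list(presets)
        self._assigned = {}

    def voice_for(self, player: str) -> VoiceSettings:
        if player not in self._assigned:
            index = len(self._assigned) % len(self._presets)
            self._assigned[player] = self._presets[index]
        return self._assigned[player]


assigner = VoiceAssigner()
print(assigner.voice_for("alice"))
print(assigner.voice_for("bob"))
```

With more players than presets the voices repeat, so a larger preset list (or generated values) would be needed for big servers.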

The main advantage of your method though, would be the fact that it would sound a lot better :) But I'm willing to put up with the artificial voices, as it gives all the capabilities of direct voice communication, but without the bandwidth hit.

Something else I was thinking about was the language barrier in multiplayer games. Translating the text sent between players wouldn't be too hard - you can find some pretty good text translators on the web now. You can get different TTS engines for different languages, so the flow would be: you say what you want to say, the SR turns it into a text string, and the text reaches the other player's computer. If they have a different language set up, the translator kicks in, making a text string in whatever language they want, and the TTS plays it in their language. Pretty nifty, but I'm not sure about feasibility. I've only just started to look into the Speech SDK, and I haven't ever really looked into translators yet, but it would be something fun to try.
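The receiving side of that flow can be sketched as two steps: translate if the languages differ, then hand the result to a TTS callback. Everything here is a placeholder: `PHRASEBOOK` is a toy stand-in for a real translator, and `speak` is a hypothetical callback standing in for a real TTS engine.

```python
# Toy stand-in for a real translation service (assumption for illustration).
PHRASEBOOK = {
    ("en", "de"): {"enemy ahead": "Feind voraus"},
}


def translate(text: str, source: str, target: str) -> str:
    """Translate via the toy phrasebook; fall back to the original text."""
    if source == target:
        return text
    return PHRASEBOOK.get((source, target), {}).get(text.lower(), text)


def deliver(text: str, sender_lang: str, receiver_lang: str, speak) -> str:
    """Translate an incoming chat string, then pass it to the TTS callback."""
    local_text = translate(text, sender_lang, receiver_lang)
    speak(local_text, receiver_lang)
    return local_text


spoken = []  # records what the fake TTS was asked to say
deliver("Enemy ahead", "en", "de", lambda t, lang: spoken.append((lang, t)))
print(spoken)
```

Only the short text string crosses the network; recognition happens on the sender's machine and translation plus speech on the receiver's.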
#14
09/17/2001 (11:26 pm)
The main reason that we do it like I described is that when you are under attack by the enemy you don't have the time to type messages.
That's what the voice commands are there for.
But they are only as good as the people behind the game make them.
In Counter-Strike you have around 20 different ones, and that might not be enough for the elite player.
The way we do it you can construct a sentence in peace time and bind it to a hot key, then when in combat all you have to do is hit a key (or a combo of keys) to warn your team of any danger.
An elite player can do this very fast.
100x faster than typing the message.

// Clocks out.
#15
09/18/2001 (3:29 am)
And..... faster still would be speech recognition combined with text to speech. You say it, the comp converts it to text which is then both seen and heard. After all, there are some battles that get so intense that even an elite player would need a third hand to hit a chat key-combo to speak without getting killed.

I just love the speech recognition and text to speech concept, so all my opinions are horribly biased *G* LOL

The first few titles my team has planned are all single player only. For those we have a small staff of voice talent. The multiplayer title we have planned will more than likely use text to speech and speech recognition. I'm working on a chat client in my spare time to sort-of benchmark the tech before we decide to use it.
#16
09/18/2001 (8:26 pm)
A question to those of you using the Microsoft speech SDK:

How flexible is it with regards to customisation of the voice?

I'm considering using text-to-speech for RPG-style conversations, giving the advantage that conversations can be generated, rather than pre-recorded. Of course, this requires a voice that can express emotion, and a range of voices to choose from.

The old Amiga had speech capabilities (built in!!) where you could adjust speed, pitch, and stress. (Stress was available by putting numbers in words, where a value like 4 was default. "Inconseq8uential" would sound natural, because the "ue" sound is normally stressed. "8Inconsequential" stresses the "in" part. etc.)
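To illustrate the digits-in-words idea as described here, the sketch below splits a marked-up word into the plain word and a list of (position, stress level) pairs. It only follows the description in this post; it is not a reconstruction of the Amiga's actual speech syntax, and `parse_stress` is a made-up name.

```python
def parse_stress(marked_word: str):
    """Split a word with embedded stress digits into (plain word, marks).

    Each mark is (index in the plain word where the stress applies, level).
    """
    plain = []
    marks = []
    for ch in marked_word:
        if ch.isdigit():
            # The digit stresses whatever sound follows it.
            marks.append((len(plain), int(ch)))
        else:
            plain.append(ch)
    return "".join(plain), marks


print(parse_stress("Inconseq8uential"))   # ('Inconsequential', [(8, 8)])
print(parse_stress("8Inconsequential"))   # ('Inconsequential', [(0, 8)])
```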

With a lot of work with stressing, it was possible to make robotic speech that sounded somewhat interesting, with enough variances to identify different characters. But it still sounded corny.

How much better is Microsoft's SDK?

Thank you for any information.

Grant
#17
09/19/2001 (5:59 am)
Dear Guys,
I've been watching this forum a lot lately and want to implement it in my game now....Can I have a link to the download page or the name of the SDK...??

Thanks,
Bryan
#18
09/19/2001 (2:20 pm)
Bryan,

www.microsoft.com/speech/

I'm just about to have a good look through the site myself; maybe I'll be able to answer some of the questions I recently posted... although I'd still like a non-biased opinion.

Have fun all...
Grant
#19
09/19/2001 (8:17 pm)
Sounds good.......has anyone put the SDK into their game yet, and how does it go?
#20
09/19/2001 (9:44 pm)
I'm downloading the SDK right now... but I'm not sure how well it will integrate with a performance-hungry game. The thing recommends 128 MB of RAM, which is a hefty chunk. I think I saw a more detailed breakdown of the memory requirements of each speech module somewhere, but I don't remember the details. It could be that speech would be second only to graphics in processing requirements for a game, so I don't know what that means for AI... :\

I'm just glad I'm not writing a performance-hungry game :)

Anyway, more news as it comes to hand... This message brought to you by...

Grant