Game Development Community

Torque networking, applied to VOIP

by Anthony Lovell · in Torque Game Engine · 02/04/2004 (9:18 am) · 92 replies

I have made some solid progress at adding VOIP into Torque (I can sample the microphone and compress it using Speex... code should work on Mac and Windows, and I could explore Unix). I intend to eventually share my work with the community when I have it ready, but now I need to learn much more about Torque -- primarily, its networking gestalt.

For my first implementation, I would like to just use Torque's unreliable UDP platformNet to send streaming voice packets through the server to only those other clients who are near enough to hear the speaker (a simple distance limit). Then, on each of those clients, I'd like them to hear the sound emanate from the position of the speaking player.
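The server-side filter described here reduces to a squared-distance test per listener. A minimal sketch under assumed names (Vec3 and listenersInRange are illustrative, not Torque API):

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the server-side relay filter: only clients whose
// avatars are within hearing range of the speaker receive the voice packet.
// Names here are illustrative, not Torque types.
struct Vec3 { float x, y, z; };

static float distSq(const Vec3 &a, const Vec3 &b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Return indices of listeners close enough to hear the speaker.
std::vector<int> listenersInRange(const Vec3 &speaker,
                                  const std::vector<Vec3> &listeners,
                                  float hearingRange)
{
    std::vector<int> out;
    float rangeSq = hearingRange * hearingRange; // compare squared distances, no sqrt needed
    for (int i = 0; i < (int)listeners.size(); ++i)
        if (distSq(speaker, listeners[i]) <= rangeSq)
            out.push_back(i);
    return out;
}
```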

I had been thinking that my means of coding this was to be a direct use of Net::sendto() but another idea occurs to me:

Is the proper way to do this to add the mouth of the Player as part of the data state that the network layer is supposed to disseminate to other clients, much like his orientation in the world? In such a model, a non-talking player would have no state to transmit to the net, but when my microphone layer hands me some compressed audio, I'd hand it to my local Player instance's mouth, and trust that the neat differencing logic could then realize that it needs to go out to the net. If this is the approach, would I have to take special measures to ensure you could hear people behind you (as they are out of camera scope -- I take it this is different than setting scopeAlways, as the distance is still pertinent)?

Also, on the receive side, would the key be to add an AudioEmitter (or a subclass of one) to the mouth of the speaker? Is there any existing high-level support for audio which is provided on a frame-by-frame basis, and not from a soundfile resource?

Thanks in advance for any counsel you can provide.

tone
#1
02/04/2004 (9:35 am)
FWIW, the projected initial interface for this is just to add 3 Console scripting functions.

voxInitialize() must be called before any others, and returns 0 on success or an error code

voxStartTalking() opens the mike and permits the local user to transmit to other avatars near him in the game world

voxStopTalking() closes the transmit.

Binding a key to start/stop talking on keyDown/keyUp should give you a nice proximity voice chat, I hope.
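The keyDown/keyUp binding amounts to a small push-to-talk state machine; a minimal sketch of the intended semantics (class and method names are illustrative, not the proposed vox* interface):

```cpp
#include <cassert>

// Minimal push-to-talk state machine of the kind the vox* functions imply.
// All names are illustrative; this is not the proposed Torque interface.
class PushToTalk {
public:
    bool initialized = false;
    bool talking = false;

    int initialize()            // like voxInitialize(): 0 on success
    {
        initialized = true;
        return 0;
    }
    bool keyDown()              // like binding voxStartTalking() to keyDown
    {
        if (!initialized || talking) return false;
        talking = true;         // open the mike, start transmitting
        return true;
    }
    bool keyUp()                // like binding voxStopTalking() to keyUp
    {
        if (!talking) return false;
        talking = false;        // flush and close the transmit
        return true;
    }
};
```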

Future features before I'd consider this truly complete:

1. add channel-based tuning, as is most commonly used to mimic use of game/team/squad radios
2. ability to mute obnoxious louts
3. ability to assign begin-xmit and end-xmit sound effects, to mimic radios
4. ability to have 3D objects that function as the audio source of tuned reception (e.g.: a radio)
5. (far future) facial animation of speakers for lip-syncing.

Not on the list:

1. voice-activated transmission. It just creates more problems than it solves, IMO, but someone else could probably hook it in.

tone
#2
02/04/2004 (9:42 am)
Anthony,

In the case of Speex on Linux, I managed to build it like a normal Torque lib (e.g. zlib, lpng, ljpeg), and it compiles fine on all the platforms GCC supports.

As for the microphone input, how are you getting the audio? Last time I checked, OpenAL didn't appear to have an interface for that =/

I myself experimented with sending Speex-encoded data through Torque's networking system (via unreliable network packets) and playing the sound on the other end with a StreamingSource. However, I must've gone wrong somewhere, since it was very buggy -- which is why I never got around to the microphone input :)
#3
02/04/2004 (11:08 am)
I am using PortAudio (www.portaudio.com) for input. Not a library, per se, but a code project from which each platform draws a subset of files to create a common interface for sound i/o. Naturally, I am only using the input side. The code you write around these drop-in elements looks like this in my case, and this is a nearly complete listing!

Working fine for me on OS X so far, and am going to try to make it go on XP next.

int
myPortAudioCallback(
                    void *inputBuffer,
                    void *outputBuffer,
                    unsigned long framesPerBuffer,
                    PaTimestamp outTime, void *userData )
{
    Vox *talk = (Vox *)userData;

    // since we never pass output through PortAudio, we should only be handling input
    if (inputBuffer == NULL) {
        Con::printf("portAudioCallBack has null input buffer");
        return 0;
    }

    talk->getVoxEncoder()->queueForEncode((short *)inputBuffer, framesPerBuffer * sizeof(short));

    return 0;
}


int
Vox::initInstance()
{
    mEncoder = new VoxEncoder(myVoxEncoderCallback);

    PaError paErr = Pa_Initialize();
    if( paErr != paNoError ) {
        Platform::AlertOK("Pa_Initialize failed", Pa_GetErrorText( paErr ));
        return -1;
    }

    paErr = Pa_OpenDefaultStream(
                    &mPAStream,
                    1,          // input channels (mono)
                    0,          // output channels
                    paInt16,    // 16-bit samples, matching the short * cast in the callback
                    8000,       // sample rate
                    mEncoder->getInputFrameSizeShorts(),  // frames per buffer
                    1,          // number of buffers, 0 == use minimum
                    myPortAudioCallback,  // our callback
                    this);      // userData to be handed to callback

    if( paErr != paNoError ) {
        Platform::AlertOK("Pa_OpenDefaultStream failed", Pa_GetErrorText( paErr ) );
        return -2;
    }

    return 0;
}


int
Vox::setIsTalking(bool b)
{
    PaError paErr;
    if (b == mIsTalking) {
        return 0;
    }

    if (b) {
        paErr = Pa_StartStream( mPAStream );
        if( paErr != paNoError ) {
            Platform::AlertOK( "Pa_StartStream failed", Pa_GetErrorText( paErr ) );
            return -1;
        }
    } else {
        // write the BitStream we are currently composing to the network
        flushVoxToNet();

        paErr = Pa_StopStream( mPAStream );
        if( paErr != paNoError ) {
            Platform::AlertOK( "Pa_StopStream failed", Pa_GetErrorText( paErr ) );
            return -2;
        }
    }
    mIsTalking = b;

    return 0;
}
#4
02/04/2004 (2:11 pm)
As I recall, Tribes2 implemented voice communication by means of unguaranteed events passed to the server, thence to the other clients. The clients could set who the voice should be sent to.
#5
02/04/2004 (2:42 pm)
Peer-to-peer voice comms (which is what it sounds like you are working on) are not a good idea, for a few reasons.

1. dialup users don't have the bandwidth
2. not all broadband users have the bandwidth. Some, like the Charter customers around here, have asymmetrical pipes (upload is capped way low and counts against download speed). Most can only get <8 kb upstream before the downstream degrades to a trickle.
3. every client has to do "server"-like calculations on who to send to and whatnot.
4. it is hackable; a griefer could hack the client to send to others regardless of whether they want to hear the dumbshit or not.
5. it would be way more overhead to synchronize all the mute/ignore settings instead of just letting the server decide who to send audio to and when.
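For what it's worth on point 2, the client-server numbers are modest: Speex narrowband typically runs somewhere around 8 to 15 kbit/s depending on quality settings (figures assumed from the codec's usual range), i.e. roughly 1 to 2 KB/s upstream per talker. A quick sanity check:

```cpp
#include <cassert>

// Rough per-talker bandwidth arithmetic. The bitrate figures fed in are
// assumptions about typical Speex narrowband output; actual output depends
// on the encoder's quality/complexity settings.
int bytesPerSecond(int bitsPerSecond) { return bitsPerSecond / 8; }

// payload bytes for one frame of the given duration at the given bitrate
int bytesPerFrame(int bitsPerSecond, int frameMs)
{
    return bitsPerSecond * frameMs / 1000 / 8;
}
```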
#6
02/04/2004 (2:59 pm)
I am proposing client-server first (the server reflects audio to those who should hear it), and if I have time from there I may go for a dynamic model similar to one I designed in the past.

I may look at the events, Ben.

tone
#7
03/13/2004 (7:30 am)
I am close to getting this working (or so it appears), but I need some pointers on how to handle the arriving speech events and stream their decoded audio out through a sound source located at the position of the speaking player.

Here are some topic areas where more information would assist me:

1. Do I dispatch within an event handler based on my NetEvent subclass's getClassName() or getClassId()? If the latter, what is the meaning of the U32 parameter passed to getClassId()?

2. How can I determine which player sent the event (i.e.: who is speaking)? Is that for me to encode within the data packed into my VoxEvent? If so, what means of reference (to the speaking person) is most convenient to employ, and what mapping function allows the receiving client to get a Player * from it?

3. Most importantly: do any of the sound output tools (ideally, a 3D positional audio source) support streaming audio? Which is best to use, and do I need to have a thread for each to keep its input buffer stuffed with sound data?
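On question 3: whichever output path turns out to be best, streaming playback generally needs a staging buffer between arriving packets and the audio device. A minimal ring buffer sketch of that staging layer (self-contained, not tied to any Torque audio class; a real feed thread would need locking):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal ring buffer of 16-bit samples -- the kind of staging buffer a
// streaming audio source needs between arriving packets and playback.
// Single-threaded sketch; a producer thread would require a mutex.
class SampleRing {
    std::vector<short> buf;
    size_t head = 0, tail = 0, count = 0;
public:
    explicit SampleRing(size_t capacity) : buf(capacity) {}

    size_t write(const short *src, size_t n)   // returns samples accepted
    {
        size_t wrote = 0;
        while (wrote < n && count < buf.size()) {
            buf[tail] = src[wrote++];
            tail = (tail + 1) % buf.size();
            ++count;
        }
        return wrote;
    }
    size_t read(short *dst, size_t n)          // returns samples delivered
    {
        size_t got = 0;
        while (got < n && count > 0) {
            dst[got++] = buf[head];
            head = (head + 1) % buf.size();
            --count;
        }
        return got;
    }
    size_t available() const { return count; }
};
```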

Thanks in advance for any help.

tone
#8
03/13/2004 (8:48 am)
You should check the process() method. Have you read the NetEvent docs? They should explain all this.

The Ogg Vorbis code should give you a good example of streaming audio.
#9
03/13/2004 (9:49 am)
I wouldn't bother with the position info... I doubt it'll make a lot of sense. Just get it streaming in as a simple stream and that'll do ya. For instance, what possible use is there in hearing someone talking off "in the distance"? Either you want to hear them, or not.
#10
03/13/2004 (11:21 am)
@Phil: It's so you can implement voice chat in a 'realistic' fashion. Take an RPG, for instance: players whose characters are close enough in-game to hear one another can converse; those whose characters are too far away cannot.

@Anthony: Cool beans, this would be a very cool thing if you could get it working.
#11
03/13/2004 (11:31 am)
Ok, let me try to digest this and see where I get.

Though Phil's advice might eliminate a small amount of complexity, Jeff has my goal exactly ... I will want to code up "radio" style chat in future but want first to code up what no one ever seems to deliver: chat modeled on sounds passing through air so that proximity helps you determine which are most likely directed at you (and hence demand the most attention), working toward eventually having the speaking avatar lip-sync so that again, a natural and proper mechanism can help identify the speaker (rather than glowing neon text).

I've looked at NetEvent a fair bit, but have been working from a Mac, and frankly it is very hard to navigate between the C++ and the header files (which are not in the Mac projects... would it hurt if I added them?). I see an mSourceId in the code, but it is not really used. It is just hard to divine whether that is a good means of identifying the speaker.

I'll look for process()... I wonder Which::process()?

tone
#12
03/13/2004 (12:04 pm)
Ahh gotcha ... MyEvent::process()
#13
03/13/2004 (12:54 pm)
This has always been an interesting feature to me, and I'm always on the lookout for info on this on the 'Net. I think a feature like this would add a lot to CRPGs, particularly CRPGs in the Neverwinter Nights model, where the aforementioned griefer problem is less salient because NWN-style CRPGs are often run invite-only.

Quote:Though Phil's advice might eliminate a small amount of complexity, Jeff has my goal exactly ... I will want to code up "radio" style chat in future but want first to code up what no one ever seems to deliver: chat modeled on sounds passing through air so that proximity helps you determine which are most likely directed at you (and hence demand the most attention),

I know what you mean about the "no one ever seems to deliver" bit. There are a number of features like this that would add so much to CRPGs that devs just don't seem interested in at all. I think it has a lot to do with the nature of the industry, but that's a whole other discussion. :)

Quote:working toward eventually having the speaking avatar lip-sync so that again, a natural and proper mechanism can help identify the speaker (rather than glowing neon text).

Wow, to me (a non-coder) that seems like quite a tall order, but I know exactly what you mean about trying to figure ways of identifying speakers without heavy-handed methods like highlighted text or glowing avatars, etc. I think one thing that would do a lot in this direction would be a good emote system. In other words, animated gestures as clues to who is speaking - head bobs, hand gestures (a lot of people 'talk with their hands' anyways), posture, etc.

edit: then again, pseudo-lip-synch wouldn't be hard, and it would go a long way towards identifying the speaker and it would add that extra bit of immersion.

This might not be strictly on-topic, but I've always thought a great way to make 'distance-smart' voice chat even more useful would be voice disguisers. I don't know how good they sound, but I know there are telephonic devices that allow people to disguise their voices so that women can sound like men and vice-versa, children can sound like adults, etc. It might go a long way towards identifying who is speaking when that 20 year old woman playing the 67 year old dwarf male actually sounds like an old male dwarf. This would be especially helpful to Dungeonmasters/Referees in the aforementioned NWN model, who have to roleplay as characters from a wide variety of races, genders, and cultures.

Anyhoo, as I said I'm not a coder so I can't really be of help to you Anthony, other than to again wish you good luck with this.
#14
03/14/2004 (4:20 pm)
I think the best approach would be to create an additional network layer (like the events or ghosts layer) to send the voice traffic; using events will give priority to the voice over the ghosts. In other words, the engine will prefer to send a voice packet rather than a ghost update, which is bad anyway. :)
I can help you to create this layer if you want.

Cya
#15
03/15/2004 (12:04 pm)
Well, I'm just trying to coerce the event relaying system for now. Not having much luck, sadly.

The network layer only seems to deliver the netevent to the node that sent it! The other client on the test network is not receiving it. Can someone point me to a filename where I can see where these arrive at the server (and where the choices of whom to relay them to are made)?

Lipsyncing is just an idea, mind you. I've used near-realtime lipsynching avatars in the past, and they require a lot of gnarly code -- essentially speech recognition to figure out the phonemes in the speech signal. Overall, I am more than willing to punt on such niceties in favor of visual indications, but I do regard this as a long-term goal for such things. TF2's prototype showed this at E3 some 4 years ago!

tone
#16
03/15/2004 (12:08 pm)
I should post my code, just in case:

void
VoxEvent::process(NetConnection *conn)
{
    NetConnection *ourClient = NetConnection::getLocalClientConnection();
    if (mSourceId == ourClient->getId()) {
        // this printf fires off on the client of the person talking!  nowhere else tho
        Con::printf("process VoxEvent from ourselves? srcId=%d numBytes=%d", mSourceId, mDataLengthInBytes);
        return;
    }

    SceneObject *talker = dynamic_cast<SceneObject*>(Sim::findObject(mSourceId));
    AssertFatal(talker != NULL, "Unable to cast source of VoxEvent to a SceneObject?");
    // TODO: actually add code here to decode and play the audio
    // through the location of the talker
}


and here is the point in my code from which I send out VoxEvents:

void
Vox::flushVoxToNet() {
    Mutex::lockMutex(mSendBufferMutex);

    if (mNumFramesInSendBuffer > 0) {
        NetConnection *conn = NetConnection::getServerConnection();
        VoxEvent *event = new VoxEvent(mSpeexSendBuffer, mNumBytesInSendBuffer);
        conn->postNetEvent(event);
        Con::printf("writing %d frames vox data to net", mNumFramesInSendBuffer);
    }

    mNumFramesInSendBuffer = mNumBytesInSendBuffer = 0;

    Mutex::unlockMutex(mSendBufferMutex);
}
#17
03/15/2004 (1:55 pm)
Anthony,

flushVoxToNet() only sends the voice data to the server. When you receive the event on the server, you should iterate through the ClientGroup, broadcasting the message to all the clients.

ex:

SimGroup* pClientGroup = Sim::getClientGroup();
for (SimGroup::iterator itr = pClientGroup->begin(); itr != pClientGroup->end(); itr++) {
    NetConnection* nc = static_cast<NetConnection*>(*itr);
    if (nc != NULL)
    {
        VoxEvent *event = new VoxEvent(mSpeexSendBuffer, mNumBytesInSendBuffer);
        nc->postNetEvent(event);
    }
}

The audio decompression and playing should only occur at the clients, the server should act only as a broadcaster.
#18
03/15/2004 (2:20 pm)
Thank you Marcelo .. do I need to run that within process() only if I am the server? Or will the client group have zero members otherwise? Many thanks!

tone
#19
03/15/2004 (2:35 pm)
You should do the following in the event's process():

If you are in a server, broadcast the incoming data to the clients; otherwise, decompress and play the data. There's a function to check whether the current object instance is on the server -- I think it is isServerObject(), but I'm not sure if that works on Events.
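Abstractly, that branch looks like the sketch below, with plain data structures standing in for Torque types. Whether isServerObject() is valid on a NetEvent is exactly the open question, so a plain flag stands in for the server test:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Shape of the process() dispatch described above, with plain data in place
// of Torque types. 'isServer' stands in for whatever server test turns out
// to be valid on a NetEvent; 'log' stands in for postNetEvent()/playback.
struct Node { bool isServer; std::vector<std::string> log; };

void processVox(Node &self, std::vector<Node *> &clients,
                const std::string &packet)
{
    if (self.isServer) {
        // server: relay only, never decode
        for (Node *c : clients)
            c->log.push_back(packet);          // stands in for postNetEvent()
    } else {
        // client: decode and play, never relay
        self.log.push_back("played:" + packet);
    }
}
```

The key property is that relaying and playing are mutually exclusive per node, which is what prevents a relay loop.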
#20
03/16/2004 (7:58 am)
I'm sad to report that I'm at a near-total loss here. None of the other NetEvents marked as sendable from client or server in the code tree perform logic like this (relaying to clients from within ::process()), and my code is going badly awry.

How are these other events relayed across the network?

tone

Here is my code at present... I think I am getting an infinite loop (probably because the server is in its own list of clients). Is there not an example in the code tree of an event that is sent by one client, relayed to some or all of the others by the server, and then processed by the various clients?

void
VoxEvent::process(NetConnection *conn)
{
    // Con::printf("process VoxEvent from %d  numBytes = %d", mSourceId, mDataLengthInBytes);

    // relay the event to any attached clients
    SimGroup* pClientGroup = Sim::getClientGroup();
    if (pClientGroup != NULL) {
        for (SimGroup::iterator itr = pClientGroup->begin(); itr != pClientGroup->end(); itr++) {
            NetConnection* nc = static_cast<NetConnection*>(*itr);
            if (nc != NULL) {
                VoxEvent *event = new VoxEvent(this);
                nc->postNetEvent(event);
            }
        }
    }
    // then handle it ourselves...

    // TODO: do not play own vox
    NetConnection *ourClient = NetConnection::getLocalClientConnection();
    if (mSourceId == ourClient->getId()) {
        Con::printf("process VoxEvent from ourselves? srcId=%d numBytes=%d", mSourceId, mDataLengthInBytes);
        return;
    }

    SceneObject *talker = dynamic_cast<SceneObject*>(Sim::findObject(mSourceId));

    // this fires off at some point
    AssertFatal(talker != NULL, "Unable to cast source of VoxEvent to a SceneObject?");

    Con::printf("got VoxEvent from srcId=%d numBytes=%d, name='%s'", mSourceId, mDataLengthInBytes, talker->getIdString());
}