Game Development Community

Bug in dnet.cc (with fix)

by Tim McClarren · in Torque Game Engine · 08/15/2007 (12:37 pm) · 5 replies

In Torque 1.4 and earlier, there is code in ConnectionProtocol::processRawPacket (at the bottom):

void ConnectionProtocol::processRawPacket(BitStream* pstream)
{
   ...
   if(mLastSeqRecvd != pkSequenceNumber && pkPacketType == DataPacket)
      handlePacket(pstream);

   mLastSeqRecvd = pkSequenceNumber;
}

If you look at "handlePacket":

void NetConnection::handlePacket(BitStream *bstream)
{
   ...

   if(mErrorBuffer[0])
      connectionError(mErrorBuffer);
}

Okay... let's find "connectionError":
void GameConnection::connectionError(const char *errorString)
{
   if(isConnectionToServer())
   {
      Con::printf("Connection error: %s.", errorString);
      Con::executef(this, 2, "onConnectionError", errorString);
   }
   else
   {
      Con::printf("Client %d packet error: %s.", getId(), errorString);
      setDisconnectReason("Packet Error.");
   }
   deleteObject();
}

See the problem?

(In general, I try to avoid putting code in objects that causes them to delete themselves... but...)

This should fix it, but it's probably not the best fix:
void ConnectionProtocol::processRawPacket(BitStream* pstream)
{
   ...
   if(mLastSeqRecvd != pkSequenceNumber)
   {
      mLastSeqRecvd = pkSequenceNumber;
      if(pkPacketType == DataPacket)
         handlePacket(pstream);
   }
}

#1
08/15/2007 (2:57 pm)
I'm trying to understand what the issue is.

Is it that the GameConnection is deleted upon error? I guess not, but I'm curious.
#2
08/15/2007 (3:06 pm)
It's the write to mLastSeqRecvd after the delete. So "this" is invalid and that means that write could be writing somewhere it shouldn't be. I'd say it's a good fix to move the write before the possibility of deletion.
#3
08/15/2007 (3:12 pm)
This could potentially cause a crash if a client were to receive an error. Because after the deleteObject call in GameConnection::connectionError the callstack will unwind and will eventually get back to ConnectionProtocol::processRawPacket where it will then be trying to assign a member in an object that no longer exists.
#4
08/15/2007 (3:53 pm)
It *does* cause a crash in a code-base that is threaded. Eventually. Under the right conditions.

The right conditions being: thread behavior allows a context switch during "NetConnection::handlePacket", after "this" is deleted, and the thread being switched to is able to execute a memory allocation for which the memory allocator then returns the just-deleted-memory. Now the thread executing "NetConnection::handlePacket" writes into memory that is occupied by a new, valid allocation holding something else, not itself.

It sounds rare, but our crash reporting code reported 25 crashes with an execution stack that matches! (second highest) :/
#5
08/16/2007 (2:24 pm)
This same issue is in TGE 1.5.2