TGE, NULL used as strings and Solaris...

by Vincent Cojot · in Torque Game Engine · 08/05/2003 (7:31 am) · 0 replies

Hi everyone,

I had been looking for this for a while now (had been getting seg faults on Solaris/Sparc in various places) and now I know why they are happening and how to fix them... :)

In short, that's because Solaris as an operating system is much less forgiving than other platforms. Check out the code below:

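#include <stdio.h>

int main( void )
{
    char *nullptr = NULL;

    /* Passing NULL for %s is undefined behaviour in C; the result depends on the platform. */
    printf( "nullptr is %s\n", nullptr );
    return 0;
}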

Here are the results on various platforms:

Borland and Microsoft C on the Intel (DOS):
"nullptr is (null)"

Borland C on Intel (Win32):
"nullptr is (null)"

GCC on Linux (intel):
"nullptr is (null)"

With GCC on Sun (Sparc) Linux:
"nullptr is (null)"

With GCC on the Sun Sparc platform with SunOS 5.x+, this prints: "Segmentation Fault (core dumped)"

The reason why this happens is described in greater detail in SUN's FAQ ID 3508.

In short, SUN says that code that dereferences a NULL pointer is often flawed and should seg fault. They also give a workaround specific to the Solaris/SPARC platform (link with /usr/lib/0@0.so.1 to override the default behaviour).

-This- made my day as it stopped the occasional crashes I had been getting with TGE on Solaris. I had begun fixing them inside TGE but there are just too many of them. For example, take compiledEval.cc. Here's what I had to change to prevent it from segfaulting inside vsnprintf on Solaris:
*** /usr/local/src/torque-solaris-20030728/engine/console/compiledEval.cc       Fri Apr  4 00:17:12 2003
--- engine/console/compiledEval.cc      Mon Jul 28 14:46:42 2003
***************
*** 578,584 ****
              //Con::printf("Adding object %s", currentNewObject->getName());
              if(currentNewObject->isProperlyAdded() == false && !currentNewObject->registerObject())
              {
!                Con::warnf(ConsoleLogEntry::General, "%s: Register object failed for object %s.", getFileLine(ip-2), currentNewObject->getName());
                 delete currentNewObject;
                 ip = failJump;
                 break;
--- 578,588 ----
              //Con::printf("Adding object %s", currentNewObject->getName());
              if(currentNewObject->isProperlyAdded() == false && !currentNewObject->registerObject())
              {
!                if(currentNewObject->getName()!= NULL) {
!                   Con::warnf(ConsoleLogEntry::General, "%s: Register object failed for object %s.", getFileLine(ip-2), currentNewObject->getName());
!                } else {
!                   Con::warnf(ConsoleLogEntry::General, "%s: Register object failed for unknown object(currentNewObject->getName() got NULL).", getFileLine(ip-2));
!                }
                 delete currentNewObject;
                 ip = failJump;
                 break;

Now this makes some sense: if your .mis file references a .dts file that doesn't exist, then testing (currentNewObject->isProperlyAdded() == false) is probably not a reliable way to check for its existence, since it only appears to work if the .dts is partially loaded (i.e. corrupted) and not when it is missing.

From my investigations, the Torque code has a lot of these spots which end up passing a NULL string to *printf, and they go unnoticed since the *printf implementations on most platforms print "(null)" instead of crashing.

So, rather than fixing them one by one at this time, and given that I don't want to go through the nightmare of writing a safe_vsnprintf() routine from scratch on Solaris, I will use the SUN workaround.. :)

That should make the dedicated server Solaris port -very- stable now.. :)

Vincent