TGE, NULL used as strings and Solaris...
by Vincent Cojot · in Torque Game Engine · 08/05/2003 (7:31 am) · 0 replies
Hi everyone,
I had been looking for this for a while now (had been getting seg faults on Solaris/Sparc in various places) and now I know why they are happening and how to fix them... :)
In short, that's because Solaris as an operating system is much less forgiving than other platforms. Check out the code below:
#include <stdio.h>
int main( void )
{
char *nullptr = NULL;
printf( "nullptr is %s\n", nullptr );
return 0;
}
Here are the results on various platforms:
Borland and Microsoft C on the Intel (DOS):
"nullptr is (null)"
Borland C on Intel (Win32):
"nullptr is (null)"
GCC on Linux (intel):
"nullptr is (null)"
With GCC on Sun (Sparc) Linux:
"nullptr is (null)"
With GCC on the Sun Sparc platform with SunOS 5.x+, this prints: "Segmentation Fault (core dumped)"
The reason why this happens is described in greater detail in SUN's FAQ ID 3508.
In short, SUN says that code that dereferences a NULL pointer is often flawed and should seg fault. They also give a workaround specific to the Solaris/SPARC platform (link with /usr/lib/0@0.so.1 to override the default behaviour).
-This- made my day as it stopped the occasional crashes I had been getting with TGE on Solaris. I had begun fixing them inside TGE, but there are just too many of them. For example, take compiledEval.cc. Here's what I had to change to prevent it from segfaulting in vsnprintf on Solaris:
*** /usr/local/src/torque-solaris-20030728/engine/console/compiledEval.cc Fri Apr 4 00:17:12 2003
--- engine/console/compiledEval.cc Mon Jul 28 14:46:42 2003
***************
*** 578,584 ****
//Con::printf("Adding object %s", currentNewObject->getName());
if(currentNewObject->isProperlyAdded() == false && !currentNewObject->registerObject())
{
! Con::warnf(ConsoleLogEntry::General, "%s: Register object failed for object %s.", getFileLine(ip-2), currentNewObject->getName());
delete currentNewObject;
ip = failJump;
break;
--- 578,588 ----
//Con::printf("Adding object %s", currentNewObject->getName());
if(currentNewObject->isProperlyAdded() == false && !currentNewObject->registerObject())
{
! if(currentNewObject->getName()!= NULL) {
! Con::warnf(ConsoleLogEntry::General, "%s: Register object failed for object %s.", getFileLine(ip-2), currentNewObject->getName());
! } else {
! Con::warnf(ConsoleLogEntry::General, "%s: Register object failed for unknown object(currentNewObject->getName() got NULL).", getFileLine(ip-2));
! }
delete currentNewObject;
ip = failJump;
break;
Now this makes some sense: if your .mis file references a .dts file that doesn't exist, then it's probably not OK to check for its existence by testing (currentNewObject->isProperlyAdded() == false), since that only appears to work if the .dts is partially loaded (i.e. corrupted) and not missing.
From my investigations, the Torque code has a lot of these spots which end up passing a NULL string to *printf, and they go unnoticed since the printf implementations on most platforms tolerate a NULL string and print "(null)" instead of crashing.
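For anyone who would rather patch individual call sites, here is a minimal sketch of a guard that could replace the if/else in the patch above. Both safe_str() and format_register_failure() are hypothetical names made up for this example (they are not part of TGE), and the second one only stands in for the Con::warnf call by formatting into a plain buffer with snprintf:

```c
#include <stdio.h>

/* Hypothetical helper (not part of TGE): return a printable placeholder
   when a string pointer is NULL, so the *printf family never receives
   NULL for a %s conversion. */
static const char *safe_str(const char *s)
{
    return s ? s : "(null)";
}

/* Hypothetical stand-in for the Con::warnf call in the patch: formats the
   warning into buf and returns whatever snprintf returns. */
static int format_register_failure(char *buf, size_t size,
                                   const char *file_line,
                                   const char *object_name)
{
    return snprintf(buf, size,
                    "%s: Register object failed for object %s.",
                    safe_str(file_line), safe_str(object_name));
}
```

Called with a NULL object name, this writes "(null)" in place of the name on every platform instead of relying on what the local C library does. It still has to be applied at each call site, which is exactly the one-by-one work described above.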
So, rather than fixing them one by one at this time, and given that I don't want to go through the nightmare of writing a safe_vsnprintf() routine from scratch on Solaris, I will use the SUN workaround.. :)
That should make the dedicated server Solaris port -very- stable now.. :)
Vincent