Linux dedicated profiler
by Stewart Haines · in Torque Game Engine · 07/25/2007 (12:28 am) · 11 replies
I need to use the console profiler (profilerEnable() / profilerDump()) to work out why I'm seeing jerky player motion when connected (from a Windows client) to a dedicated server on Linux. But I get a seg fault when I call profilerDump().
I'm working with a modified AFX-1.0.2 which is based on TGE 1.5.2, running Debian Sarge and a 2.6.18 kernel.
Using gdb reveals that the Torque server falls over in ProfilerInstance::printSortedRootData() in platform/profiler.cc, where the passed-in totalTime holds rubbish values and rootVector->mRoot is null. The seg fault is caused by attempting to access mRoot->mName.
Running a dedicated server built from the same source on Windows is fine. There are platform variations in the profiler code for gathering timing metrics.
The profiler code seems quite hairy, particularly because there's threading and assembler involved in adding up the processor time.
I'd like to know if anyone at GG or in the community has successfully used the console profiler on a dedicated Linux setup and can offer clues as to what might be going wrong for me.
Thanks, St.
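For illustration, here is a minimal sketch of a guard that would turn the crash described above into a diagnostic rather than a seg fault. The types `RootData` and `SortEntry` and the function `printSortedSafely` are hypothetical stand-ins, not the actual definitions in platform/profiler.cc, and the guard does not explain why mRoot ends up null in the first place.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-ins; the real types live in platform/profiler.cc.
struct RootData {
    const char *mName;
};

struct SortEntry {
    RootData *mRoot;   // null in the crash described above
    double    mTime;
};

// Prints every entry that has a usable root and returns how many
// entries were skipped because mRoot (or its name) was null.
int printSortedSafely(const std::vector<SortEntry> &entries, double totalTime)
{
    int skipped = 0;
    for (const SortEntry &e : entries) {
        if (e.mRoot == nullptr || e.mRoot->mName == nullptr) {
            ++skipped;      // this is where the null dereference would occur
            continue;
        }
        // Avoid dividing by a zero or negative total.
        double pct = totalTime > 0.0 ? 100.0 * e.mTime / totalTime : 0.0;
        std::printf("%7.3f%% %s\n", pct, e.mRoot->mName);
    }
    return skipped;
}
```

A non-zero return value would at least confirm that the sorted vector contains entries that were never linked to a root.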
#2
07/30/2007 (8:54 pm)
I was looking at this today. So far, the ProfilerRoot structures seem to be correctly linked together by the constructor as each PROFILE_START() is encountered. Simply commenting out the printing of the name field produces a mostly useless dump of bizarre times & call counts.
I'll post an update if/when I track it down.
@Gary: re: gprof
Yeah, I went down that path too. pthreads doesn't seem to play nicely with gprof. I can't say for sure, but it looks like the gmon.out contains data from an otherwise uninteresting thread. It's filled with initialization info, but none of the juicy bits that I'm trying to find. :)
#3
07/31/2007 (11:10 am)
Hm. Well, you could do this, or you could just comment out #define TORQUE_MULTITHREAD in torqueConfig.h, which would stop Torque from threading stuff.
Gary (-;
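As a rough illustration of that suggestion: the define is commented out in torqueConfig.h, and any code guarded by it then compiles the single-threaded path. The helper `builtMultithreaded` is hypothetical, not an engine function; it is only there to show the effect of the preprocessor switch.

```cpp
// torqueConfig.h (excerpt): comment the define out to build single-threaded.
// #define TORQUE_MULTITHREAD

// Hypothetical helper, not part of the engine: reports which way the
// build was configured.
bool builtMultithreaded()
{
#ifdef TORQUE_MULTITHREAD
    return true;
#else
    return false;
#endif
}
```

A full rebuild is needed after changing the define, since it affects every translation unit that includes torqueConfig.h.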
#4
07/31/2007 (10:28 pm)
@Stewart: I haven't nailed down what's wrong yet. There is a minor memory leak (growRoots() doesn't free mRootTable if it was non-NULL), but that's not The Bigger Problem.
Curiously, if one simply comments out the sorted dump in ProfilerInstance::dump() as below, the recursive dump still has plenty of good info:
// Now, generate our info and blast it out to the appropriate place
// (ie, console or file.)
// printFunc("Profiler Data Dump:");
// printSortedRootData(rootVector, totalTime);
@Gary:
Yeah, I found that same page after I posted. Great minds think alike. Here's a patch that worked for me on Linux (2 parts) that makes the DEBUG build of TGE gprof friendly:
Part 1:
*** engine/platformX86UNIX/x86UNIXThread.cc 25 Mar 2007 18:30:31 -0000 1.1.1.1
--- engine/platformX86UNIX/x86UNIXThread.cc 1 Aug 2007 05:18:17 -0000
***************
*** 11,16 ****
--- 11,17 ----
#include <SDL/SDL.h>
#include <SDL/SDL_thread.h>
+ #include <sys/time.h>
#include "pthread.h"
//--------------------------------------------------------------------------
*************** struct x86UNIXThreadData
*** 21,26 ****
--- 22,28 ----
Thread * mThread;
void * mSemaphore;
SDL_Thread *mTheThread;
+ struct itimerval mItimer;
x86UNIXThreadData()
{
*************** struct x86UNIXThreadData
*** 28,39 ****
--- 30,48 ----
mRunArg = 0;
mThread = 0;
mSemaphore = 0;
+
+ mItimer.it_interval.tv_sec = 0;
+ mItimer.it_interval.tv_usec = 0;
+ mItimer.it_value.tv_sec = 0;
+ mItimer.it_value.tv_usec = 0;
};
};
//--------------------------------------------------------------------------
Thread::Thread(ThreadRunFunction func, S32 arg, bool start_thread)
{
+ struct itimerval itimer;
+
x86UNIXThreadData * threadData = new x86UNIXThreadData();
threadData->mRunFunc = func;
threadData->mRunArg = arg;
*************** Thread::Thread(ThreadRunFunction func, S
*** 41,46 ****
--- 50,57 ----
threadData->mSemaphore = Semaphore::createSemaphore();
threadData->mTheThread = NULL;
+ getitimer(ITIMER_PROF, &threadData->mItimer);
+
mData = reinterpret_cast<void*>(threadData);
if (start_thread)
start();
*************** Thread::~Thread()
*** 58,63 ****
--- 69,75 ----
static int ThreadRunHandler(void * arg)
{
x86UNIXThreadData * threadData = reinterpret_cast<x86UNIXThreadData*>(arg);
+ setitimer(ITIMER_PROF, &threadData->mItimer, 0);
threadData->mThread->run(threadData->mRunArg);
Semaphore::releaseSemaphore(threadData->mSemaphore);
return 0;
Part 2:
*** mk/conf.UNIX.mk 25 Mar 2007 18:31:19 -0000 1.1.1.1
--- mk/conf.UNIX.mk 1 Aug 2007 05:19:28 -0000
*************** CFLAGS.GENERAL = -DUSE_FILE_REDIRECT
*** 43,56 ****
#-w -fno-exceptions -fno-check-new
CFLAGS.RELEASE = -O2 -finline-functions -fomit-frame-pointer
! CFLAGS.DEBUG = -g -DTORQUE_DEBUG # -DTORQUE_DISABLE_MEMORY_MANAGER
CFLAGS.DEBUGFAST = -O -g -finline-functions
ASMFLAGS = -f elf -dLINUX
LFLAGS.GENERAL =
LFLAGS.RELEASE =
! LFLAGS.DEBUG =
# the included vorbis libs are used by default. You
# can use your own libs if you want, just comment out the following
--- 43,58 ----
#-w -fno-exceptions -fno-check-new
CFLAGS.RELEASE = -O2 -finline-functions -fomit-frame-pointer
! #CFLAGS.DEBUG = -g -DTORQUE_DEBUG # -DTORQUE_DISABLE_MEMORY_MANAGER
! CFLAGS.DEBUG = -pg -g -DTORQUE_DEBUG # -DTORQUE_DISABLE_MEMORY_MANAGER
CFLAGS.DEBUGFAST = -O -g -finline-functions
ASMFLAGS = -f elf -dLINUX
LFLAGS.GENERAL =
LFLAGS.RELEASE =
! #LFLAGS.DEBUG =
! LFLAGS.DEBUG = -pg
# the included vorbis libs are used by default. You
# can use your own libs if you want, just comment out the following
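On the growRoots() leak mentioned at the top of this post, the usual pattern is to release the old table once the new one has been allocated. The sketch below uses a hypothetical `RootTable` struct with plain `new[]`/`delete[]`; the engine's real allocator, field types, and rehashing logic are assumptions and may differ.

```cpp
#include <cstddef>

// Hypothetical stand-in for the profiler's root hash table.
struct RootTable {
    void      **mRootTable = nullptr;
    std::size_t mSize      = 0;

    // Grow to newSize slots and release the previous array so that
    // repeated growth does not leak it.
    void grow(std::size_t newSize)
    {
        void **fresh = new void *[newSize]();   // zero-initialised slots
        // A real table would re-insert (rehash) the existing roots
        // into 'fresh' here, before the old array is released.
        delete[] mRootTable;                    // no-op while still null
        mRootTable = fresh;
        mSize      = newSize;
    }

    ~RootTable() { delete[] mRootTable; }
};
```

As noted in the post, this leak is small and is not what causes the crash in the sorted dump.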
#5
08/02/2007 (4:57 pm)
It may be incidental to the profiler issue that spawned this thread, but I wonder if there's any good reason for having multithreading enabled in a dedicated server?
#6
08/02/2007 (5:08 pm)
Oh, I forgot to mention the other somewhat obvious but low-tech choice. Run strace on the dedicated server, and just watch it for when text stops scrolling by. Ten bucks says there's something networkstupid happening.
Gary (-;
#7
12/05/2007 (10:33 am)
This is a couple of months old, but I ran into the same problem.
What about that howto? You lost me on that patch. Is it needed if that multithread-gprof program is used? It IS a neat little bugger.
#8
12/05/2007 (12:58 pm)
@AlexanderSLG: If you want to use gprof (as in "gprof TorqueDemo_DEBUG.bin") to see profiling data gathered in old-school POSIX style on a multi-threaded version of Torque, then yes, you need to apply those patches (or similar).
The alternative is to comment out #define TORQUE_MULTITHREAD in torqueConfig.h, rebuild, run, and view the profiling results with gprof. The downside of this method is that -- at least technically -- the binary won't be running the same way. It will be single-threaded rather than multi-threaded, and that difference may or may not be important in your specific circumstance.
#9
12/10/2007 (8:30 am)
@Andy: But the guy who wrote that thing at
http://sam.zoy.org/writings/programming/gprof.html says
Quote:
It wouldn't be too hard to put a call to setitimer in each function spawned by a thread, but I thought it would be more elegant to implement a wrapper for pthread_create.
Daniel Jonsson enhanced my code so that it could be used in a preload library without having to modify the program. It can also be very useful for libraries that spawn threads without warning, such as libSDL. The result code is shown below and can be downloaded (gprof-helper.c):
So, if this is true, why is the patch in x86UNIXThread.cc needed?
thanks
#10
12/10/2007 (7:13 pm)
@AlexanderSLG: Six of one, half a dozen of the other -- I can't claim the patch to the engine is the best general solution. The pthread wrapper function via gprof-helper.c may be the right thing for you. For me, it was better (and easier) to have that code present in the engine on a branch of my CVS tree.
Either way, the setitimer(ITIMER_PROF,...) call inside new pthreads is the key ingredient to making gprof-style profiling work when using pthreads.
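To make that key ingredient concrete, here is a minimal sketch of the idea outside the engine: capture the profiling timer in the parent with getitimer(), then re-arm it with setitimer() as the first thing the new thread does. The names `ProfiledStart`, `profiledEntry`, and `createProfiledThread` are hypothetical; this is neither the engine patch above nor gprof-helper.c, only the same technique in standalone form.

```cpp
#include <pthread.h>
#include <sys/time.h>

// Hypothetical bundle handed to the new thread.
struct ProfiledStart {
    void *(*mRunFunc)(void *);
    void  *mRunArg;
    struct itimerval mItimer;   // profiling timer captured in the parent
};

static void *profiledEntry(void *raw)
{
    ProfiledStart *start = static_cast<ProfiledStart *>(raw);
    // Re-arm the profiling timer inside the new thread before real work.
    setitimer(ITIMER_PROF, &start->mItimer, nullptr);
    void *result = start->mRunFunc(start->mRunArg);
    delete start;
    return result;
}

// Returns 0 on success, like pthread_create.
int createProfiledThread(pthread_t *thread, void *(*func)(void *), void *arg)
{
    ProfiledStart *start = new ProfiledStart;
    start->mRunFunc = func;
    start->mRunArg  = arg;
    getitimer(ITIMER_PROF, &start->mItimer);   // still in the parent thread

    int rc = pthread_create(thread, nullptr, profiledEntry, start);
    if (rc != 0)
        delete start;   // the thread never ran, so clean up here
    return rc;
}
```

When the binary is not built with -pg, the captured timer is all zeros and the setitimer() call simply leaves profiling disarmed, so the wrapper is harmless in normal builds.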
#11
12/11/2007 (8:20 am)
@Andy: Oh, OK. I misunderstood. I thought you were saying both were necessary. I applied your patch anyway; I just wanted to know whether it (or both) was necessary.
Thanks a lot
Torque Owner Gary "ChunkyKs" Briggs
Gary (-;