Console command table being clobbered
by Stephen Zepp · in Torque Game Engine · 11/07/2004 (4:36 am) · 11 replies
Has anyone had any problems in the past with console commands being indexed incorrectly in the console command table?
We've taken our debug executable and placed it in the 1.3 /example directory, with starter.fps as the game type. During execution, we get a SegFault while trying to load/compile defaultProfiles.cs. gdb shows that the console code is working with an "echo" script command, but 2 frames later somehow thinks it is an audio profile command. This appears to be generated when it looks for the code block to execute for "echo", but finds a completely incorrect code block.
What's weird about it is:
1) We've done absolutely nothing with any underlying console code.
2) The TorqueDemo.exe works fine in the same dir with no changes.
3) Our executable works in our client environment, with no changes.
Has anyone seen something like this in the past and can point me in the right direction for further troubleshooting?
#2
11/07/2004 (10:44 am)
Are you sure you're making your app recompile all the DSOs?
#3
11/08/2004 (9:48 am)
@Ben: Yes, I've manually deleted every single one I can find, multiple times, as part of the troubleshooting. Interesting that you bring that up, however, because it makes me wonder how difficult it will be, once a game is in production, to distribute a slightly modified executable (patch)...your question implies that .dso compilation is very executable dependent? If that is true, then I'm guessing when you release a new executable as part of a patch, you also release each and every .dso as well?
@Dan: We've thought about both of those as well, but as you well know they can be extremely difficult to find (since the crash occurs nowhere near the actual bug), so we're still looking.
One thing that occurred to me while I was moving this weekend is that since we have a dedicated server build and a client-only build, we -may- have an issue regarding the executable missing some code. However, we don't have many files at all in our DEDICATED-only make rule, and those are things like the ODBC back end and other server-side management stuff. Other than problems with memory block alignment, I can't see how this would affect us, since none of that code is called from any of the starter.fps stock scripts in any case, but you never know.
#4
11/08/2004 (4:10 pm)
Building separate client/server builds, if they don't have all the same console definitions, can have... er... dire consequences. There are requirements that this stuff match up from client to server.
#5
11/08/2004 (4:44 pm)
@Ben: Well, yes, that is definitely the case then. I know of at least 3 server-side-only classes we've added, all with substantial console methods and definitions. I'll reset our build rules for a standalone rule that will have -all- of the code and see how it goes.
Let me make sure I'm fully clear though: do you mean "dire consequences" if you try to have both client and server in the same instantiation (standalone), or do you mean dire consequences--always, even in true dedicated server mode?
For example, there is zero reason to have our ODBC classes in the client executable--do we need to put them in there anyway in our dedicated server-client environment?
FYI, this makes me think back as well to the same type of issue we had with our Mac build (which we never did get worked out, but we're still playing with it)--we discovered at least 1 file missing from our Mac makefile, and I'm betting that this may turn out to be related, and we just didn't catch it because we weren't very experienced with the Mac debugger.
#6
11/08/2004 (6:34 pm)
@Ben: I've been re-thinking this, and I'm no longer sure that, at least for a standalone build, the "client" and "server" portions CAN be out of synch in any way. If I do a make clean, and then a make, I use the "basic make rule" to build a single Windows application. To start up, I take the resulting executable, put it in the Torque/SDK/ directory, and run it.
The script is what causes the instantiation of a "server" within the same application instance, and it can only execute code that is in the same executable (other than scripts obviously), so the server instance comes from the same build rules as the client instance...and that means their console functions pretty much -have- to be in synch, doesn't it?
Of course, it is crashing, so there must be something wrong, but based on the logic I have above I'm really not convinced I took the proper meaning from your last post (that the problem resided in our dedicated server model, even when running in standalone mode).
#7
11/08/2004 (8:52 pm)
Well, one dire consequence is this - if you have network-accessible classes available on the server that aren't on the client, or vice versa, the class IDs used for network communication will be out of synch, resulting in very wacky bugs. This is similar to what can happen if you don't do a clean rebuild after modifying console macros.
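To make the class ID point concrete, here is a minimal sketch in C++. It assumes, purely for illustration, that each build numbers its networked classes by their position in a sorted list of registered names; this is not code taken from Torque, and `assignClassIds`, `mismatchedIds`, and any class names passed to them are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a class registry: each build registers the
// networked classes compiled into it, then numbers them by sorted name.
// This numbering scheme is an assumption made for the example.
std::map<std::string, int> assignClassIds(std::vector<std::string> classNames)
{
    std::sort(classNames.begin(), classNames.end());
    std::map<std::string, int> ids;
    for (std::size_t i = 0; i < classNames.size(); ++i)
        ids[classNames[i]] = static_cast<int>(i);
    return ids;
}

// Returns the names of classes present in both builds whose IDs differ.
std::vector<std::string> mismatchedIds(const std::map<std::string, int>& client,
                                       const std::map<std::string, int>& server)
{
    std::vector<std::string> out;
    for (const auto& entry : client) {
        auto it = server.find(entry.first);
        if (it != server.end() && it->second != entry.second)
            out.push_back(entry.first);
    }
    return out;
}
```

Under that assumption, a server build that registers one extra class shifts the ID of every class sorting after it, so a client built without that class would decode those IDs as the wrong classes.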
#8
11/08/2004 (8:57 pm)
Ok, that makes sense, and while it doesn't appear to be the root problem for this issue, it's definitely something to watch for. When you say network-accessible, you mean basically anything that's a descendant of NetObject?
I think I'm going to have to do a dump of the entire console command table and see if there are any (hopefully obvious) boundary issues--like "echo" is one off from the audio command being called, or something of that sort. I've been trying to chase through the code during the bootup process, but not much luck so far in finding anything.
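One way to look for that kind of boundary issue is to compare a dump from a working executable against a dump from the failing one. The sketch below assumes each dump has already been captured as an ordered list of command names; how the names are extracted from the engine is not shown, and `firstDivergence` and `shiftedByOne` are hypothetical helpers, not engine functions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first entry at which the two dumps differ,
// or -1 if they are identical. If one dump is a strict prefix of the
// other, the length of the shorter dump is returned.
int firstDivergence(const std::vector<std::string>& good,
                    const std::vector<std::string>& bad)
{
    const std::size_t n = std::min(good.size(), bad.size());
    for (std::size_t i = 0; i < n; ++i)
        if (good[i] != bad[i])
            return static_cast<int>(i);
    return good.size() == bad.size() ? -1 : static_cast<int>(n);
}

// True if `bad` looks like `good` with exactly one extra entry inserted at
// index `from`, i.e. every later entry is "one off" from where it should be.
bool shiftedByOne(const std::vector<std::string>& good,
                  const std::vector<std::string>& bad,
                  std::size_t from)
{
    if (bad.size() != good.size() + 1 || from > good.size())
        return false;
    for (std::size_t i = from; i < good.size(); ++i)
        if (bad[i + 1] != good[i])
            return false;
    return true;
}
```

If `firstDivergence` reports an index and `shiftedByOne` holds from that index, the extra entry at that position is the first thing to investigate.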
#9
11/09/2004 (5:28 am)
Bug fixed (at least this one). I have to be honest, I have NO idea why our executable crashed on this while the one compiled from stock 1.3 (installation kit) does not; however, the fix did work and now my executable works as well (to this point).
As it turns out, the control flow of starter.fps (and, it appears, racing as well) is slightly broken. During the load sequence, the script function initClient is called, part of which is
In file /starter.fps/client/init.cs:
exec("./ui/customProfiles.cs"); // override the base profiles if necessary.
However, we do not actually load the default profiles until a bit later:
In file common/client/canvas.cs, line 25:
exec("~/ui/defaultProfiles.cs");
I fixed this error by removing the defaultProfiles exec command from canvas.cs and moving it to just before the init.cs call of customProfiles. This is a short-term hack; I would suggest that the control flow of the example scripts provided in 1.3 be reviewed.
Thanks for all the help on this thread. Several points and issues were brought up regarding "down the line" errors and bugs that we hadn't considered...should save us some trouble in the long run for sure!
#10
11/09/2004 (5:33 am)
Correction to above: while it fixed that immediate error, my hack fix also created several other dependency issues for defaultProfiles, so I'm still working on it ;)
#11
11/10/2004 (2:50 am)
Finally figured out a solution, if not the root cause of the problem, but since it's an artifact of our development line and not TGE directly, I'm not too worried about it.
Basically, we moved around a lot of the startup sequencing, which happened to require loading defaultProfiles MUCH earlier in the startup sequence. Our code that required this change isn't as robust as it should be, and wasn't behaving well when the defaultProfiles weren't loaded ahead of time.
I do appreciate -all- the help from everyone, and this pushes us much farther along the way to being able to release the Terrain Manager updated to 1.3 as a "work in progress" resource in the near future!
Torque Owner Dan -
1 - A pointer in your altered code points off to a bad memory location which happens to be in the console code area.
2 - In the code you altered, you run off the end of your array into the memory used for the console commands.
In short, my guess is that the code you altered is stepping on the console code. Take a look at that for uninitialized vars, missing bounds checking on arrays, pointers passed to functions without enough memory allocated to them, and bad pointers.
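As a rough illustration of case 2, the C++ sketch below models "our array" and a neighbouring command table as two regions of one fixed buffer, so the overrun stays inside memory the program owns and is well defined. A real overrun is undefined behaviour and may corrupt anything. The `Arena` layout and both helper functions are hypothetical and are not taken from the engine.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// One contiguous arena standing in for process memory: the first 8 bytes are
// "our" scratch array, the next 8 bytes are a neighbouring command table.
struct Arena {
    std::array<unsigned char, 16> bytes{};
    static constexpr std::size_t kScratchSize = 8;
    static constexpr std::size_t kTableOffset = 8;
};

// Unchecked write: copies len bytes into the scratch region and trusts the
// caller. Anything past kScratchSize lands in the table region. The caller
// must keep len <= 16 so the copy stays inside the arena.
void writeUnchecked(Arena& a, const unsigned char* src, std::size_t len)
{
    std::memcpy(a.bytes.data(), src, len);
}

// Checked write: refuses any copy that would cross into the table region.
bool writeChecked(Arena& a, const unsigned char* src, std::size_t len)
{
    if (len > Arena::kScratchSize)
        return false;
    std::memcpy(a.bytes.data(), src, len);
    return true;
}
```

Writing 10 bytes with `writeUnchecked` silently overwrites the first two table bytes, and nothing fails until the table is next read, which is why the crash shows up nowhere near the actual bug. `writeChecked` rejects the same write up front.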