Increasing TorqueScript speed by reducing overhead
by Greg Beauchesne · 08/07/2009 (12:35 am) · 58 comments
EDIT 2009-08-11: The patch has been updated to fix a memory leak. Please un-apply your old patches and apply the new 1.2 version.
EDIT 2009-10-15: TGE users will need to make an additional change. Scroll down for details.
(Too long; didn't read; just give me the code!)
Introduction
Try this script. Just open up a Torque console and type it in:

for ($i = 0; $i < 1000000; $i++) { $j++; }

Now try this one:

for ($i = 0; $i < 1000000; $i++) { getWord("abcd efgh", 1); }

Finally, try this one:

function addJ() { $j++; }
for ($i = 0; $i < 1000000; $i++) { addJ(); }

Go on; try it. I'll wait. And if you're running a stock build of TGB and your machine is anything like mine, so will you. A lot. You don't need a stopwatch to tell you there's something seriously wrong here. But for the sake of the scientific method, I'll show some numbers. First, I've put together a small suite of tests for this resource:

function addJ() { $j++; }
function addThree(%a, %b, %c) { return %a + %b + %c; }
function swapWords( %words, %first, %second )
{
   %numWords = getWordCount(%words);
   if( ! ( ( 0 == %numWords) ||
           (%first < 0) ||
           (%first >= %numWords) ||
           (%second < 0) ||
           (%second >= %numWords) ||
           ( %first == %second) ) )
   {
      %tmp = getWord( %words , %first );
      %words = setWord( %words , %first , getWord( %words , %second ) );
      %words = setWord( %words , %second , %tmp );
   }
   return %words;
}
function tsSpeedTest()
{
   %before = getRealTime();
   for (%i = 0; %i < 1000000; %i++) { $j++; }
   %after = getRealTime();
   echo("Test #1: " @ (%after - %before) @ " ms");

   %before = getRealTime();
   for (%i = 0; %i < 1000000; %i++) { getWord("abcd efgh", 1); }
   %after = getRealTime();
   echo("Test #2: " @ (%after - %before) @ " ms");

   %before = getRealTime();
   for (%i = 0; %i < 1000000; %i++) { addJ(); }
   %after = getRealTime();
   echo("Test #3: " @ (%after - %before) @ " ms");

   %before = getRealTime();
   for (%i = 0; %i < 1000000; %i++) {
      %a = 1; %b = 2; %c = 3;
      %d = %a + %b + %c;
   }
   %after = getRealTime();
   echo("Test #4: " @ (%after - %before) @ " ms");

   %before = getRealTime();
   for (%i = 0; %i < 1000000; %i++) { addThree(1, 2, 3); }
   %after = getRealTime();
   echo("Test #5: " @ (%after - %before) @ " ms");

   %before = getRealTime();
   for (%i = 0; %i < 1000000; %i++) { swapWords("this is a test", 0, 3); }
   %after = getRealTime();
   echo("Test #6: " @ (%after - %before) @ " ms");

   %before = getRealTime();
   for (%i = 0; %i < 1000000; %i++) { swapWords("this is a test that is longer", 0, 3); }
   %after = getRealTime();
   echo("Test #7: " @ (%after - %before) @ " ms");
}

Test #1, #2, and #3 are the first three tests from above.

swapWords() is taken from Edward F. Maurina III's TSTK page, and is mainly used as an example of a function that does a little actual work instead of the other tests, which are more trivial.
I should point out that getRealTime() on Windows uses the Windows API function GetTickCount() under the hood, which has a granularity of about 17 milliseconds, so it's not the most accurate thing in the world, but these numbers are so drastically different that it's accurate enough:
[TGB-stock]
==>tsSpeedTest();
Test #1: 640 ms
Test #2: 1152 ms
Test #3: 19456 ms
Test #4: 1280 ms
...

OK, so a tight loop with a million iterations takes just over half a second (#1), and one with a single built-in function call takes just over 1 second (#2). That's decent enough. But then we get to #3. 20 seconds? Roughly a 2900% slowdown for the crime of using a scripted function? It gets even worse if your function has a few arguments, as evidenced by tests #5-#7:

...
Test #5: 63872 ms
Test #6: 93824 ms
Test #7: 94976 ms

Wow. That's harsh.
Just to be sure, I tried this on a different Torque project on a different computer and the results were not nearly as bad. I also tried it in the console of all the Torque demos available for download and got better results as well. What the heck is going on?
The Torque Memory Manager
Into the debugger we go. Most of the lines I'm stepping over look harmless enough: an addition here, a hash lookup there. But then I run into this:

consoleInternal.cc, line 483:
Dictionary *newFrame = new Dictionary(this);

Uh-oh. I've seen this problem before: a new hash table each time. And it brought its friends:

consoleInternal.cc, lines 297,301:
hashTable = new HashTableData;
...
hashTable->data = new Entry *[hashTable->size];

And what are those hash tables for? The variables in a particular stack frame. And each variable comes with yet another allocation:

consoleInternal.cc, line 249:
ret = new Entry(name);

Then it hit me: In the other Torque project, I had defined TORQUE_DISABLE_MEMORY_MANAGER a long time ago. Apparently the default with TGB is to have the Torque memory manager enabled, and it seems this wreaks havoc on TorqueScript speed. Recompiling and trying the scripts again gave me the following results:
[TGB-stock-TDMM]
==>tsSpeedTest();
Test #1: 512 ms
Test #2: 1152 ms
Test #3: 2816 ms
Test #4: 1152 ms
Test #5: 12416 ms
Test #6: 18560 ms
Test #7: 19328 ms

(Note: To calculate speed up percentage, I have used the formula speedUp = ((oldTime / newTime) - 1) * 100. So, for example, if the old time is 100 and the new time is 75, that's ((100 / 75) - 1) * 100 = 33% speed up)
No real difference in #1, #2, or #4 (although none was expected), but a 600% speed up over the previous results in #3, 410% in #5 and #6, and 390% in #7. It seems the Torque memory manager is not so efficient with lots of tiny little allocations and deallocations, which Torque does A) every time a script enters a new stack frame (a.k.a. calls a user function), and B) every time a new variable is declared (including parameters to a function). (A) is bad enough, and (B) just makes it worse, since local variables are "new" every time a function is run. We're getting nickel-and-dimed to death.
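For anyone who wants to re-check the percentages, the speed-up formula from the note can be written as a small helper. This is only a sketch; the function names speedUpPercent() and roundedSpeedUp() are made up for illustration and are not part of Torque or the patch.

```cpp
#include <cassert>
#include <cmath>

// Speed-up percentage as defined in the note:
// speedUp = ((oldTime / newTime) - 1) * 100
double speedUpPercent(double oldTimeMs, double newTimeMs)
{
    assert(newTimeMs > 0.0);
    return ((oldTimeMs / newTimeMs) - 1.0) * 100.0;
}

// Rounds the result to the nearest whole percent for display.
long roundedSpeedUp(double oldTimeMs, double newTimeMs)
{
    return std::lround(speedUpPercent(oldTimeMs, newTimeMs));
}
```

With the test #3 timings, roundedSpeedUp(19456, 2816) gives 591, which the text rounds to 600%.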
I encountered this problem in writing the first (scripted) version of my SCXML state machine, and it was a killer. Lots of little support functions meant damage to the frame rate even with only a few instances running, and every transition and state used a scripted callback as well.
Now, I hear that the memory manager brings good things to the table, and that it's suitable for production code. I've also heard it's been disabled in TGEA because it was too problematic. So I'm not entirely sure what to believe. But I'd rather err on the side of caution and try to get something that works both ways.
The easiest thing to do is just to add a bunch of preprocessor directives that disable the Torque memory manager only for the code in consoleInternal.cc. But the whole problem got me thinking about various other advanced memory management techniques -- would it be possible to speed things up further?
Free Lists
My first attempt at this was to create a simple free list implementation and then use it for the script stack frame allocations. With a few exceptions, most of the allocations are fixed-size, so this makes them suitable for free lists. The allocation of hashTable->data presents a little problem, because it's not a fixed-size array. Or is it? It's designed to grow if too many variables get defined, but the initial allocation is always the same:

consoleInternal.cc, lines 300-301:
hashTable->size = ST_INIT_SIZE;
hashTable->data = new Entry *[hashTable->size];

ST_INIT_SIZE is 15 by default, and the hash table doesn't grow until the number of variables reaches more than twice that. In practice, most scripted functions won't ever need 30 different variables, so as long as we have a way of keeping track of where we allocated our memory from, we can effectively consider this array to be fixed-size as well and yet still throw it back to the right memory pool when we're done with it.

My free list implementation for this was quick and dirty, but the main goal was first to make sure that it could work. The plus side is that allocations and deallocations become a constant-time operation and only involve changing a few pointers, whereas the minus side is that these structures are never released back to the heap unless you explicitly purge them (again, there are other implementations which may not have this problem; this is just how mine works).
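To make the trade-off concrete, here is a minimal sketch of a fixed-size free list. It is not the code from the patch: the FreeList class and its method names are hypothetical, and it uses malloc/free as the backing heap purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical fixed-size free list; not the code from the patch.
class FreeList
{
    // A released block is reused to hold the link to the next one.
    struct Node { Node* next; };

public:
    explicit FreeList(std::size_t blockSize)
        : mBlockSize(blockSize < sizeof(Node) ? sizeof(Node) : blockSize),
          mHead(nullptr) {}

    ~FreeList() { purge(); }

    FreeList(const FreeList&) = delete;
    FreeList& operator=(const FreeList&) = delete;

    // Constant time when a recycled block is available.
    void* allocate()
    {
        if (mHead) {
            Node* node = mHead;
            mHead = node->next;
            return node;
        }
        return std::malloc(mBlockSize);   // slow path: the list is empty
    }

    // Constant time: the block is pushed onto the list, not freed.
    void release(void* block)
    {
        Node* node = static_cast<Node*>(block);
        node->next = mHead;
        mHead = node;
    }

    // Blocks only go back to the heap when explicitly purged.
    void purge()
    {
        while (mHead) {
            Node* next = mHead->next;
            std::free(mHead);
            mHead = next;
        }
    }

    bool empty() const { return mHead == nullptr; }

private:
    std::size_t mBlockSize;
    Node* mHead;
};
```

Releasing a block and allocating again hands back the same pointer without touching the heap, which is where the speed comes from. The cost is that memory stays parked on the list until purge() is called.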
After putting in the code changes, I re-ran the benchmarks from the beginning:
[TGB-freelist-TDMM]
==>tsSpeedTest();
Test #1: 640 ms
Test #2: 1280 ms
Test #3: 1664 ms
Test #4: 1152 ms
Test #5: 11136 ms
Test #6: 16000 ms
Test #7: 16640 ms

That's a 70% speed up for #3 over the previous run. But #5 has barely changed, and it's only a 16% speed up for #6 and #7. And #5-7 are probably more likely in real-world situations. There's room for improvement there.
The Frame Allocator
Another thing that we can take advantage of with the script execution code is the fact that the access pattern is essentially stack-based, which makes sense, considering it's implementing script stack frames. Among the things found in compiledEval.cc are these lines:

compiledEval.cc, lines 473-474:
static S32 VAL_BUFFER_SIZE = 1024;
FrameTemp<char> valBuffer( VAL_BUFFER_SIZE );

OK, so the script interpreter already uses frame allocator memory here, which means it's safe to use the frame allocator. If you're not familiar with Torque's frame allocator, you can read this, but basically it's an allocation strategy for temporary memory in which allocating only needs to add to a single value, and mass deallocation can occur simply by decreasing this value. The catch is that memory basically has to be deallocated in the reverse order in which it was allocated. But again, with a few exceptions, that's almost a perfect fit for script execution.
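The idea can be sketched in a few lines. This is a hypothetical stand-in, not Torque's FrameAllocator: the DemoFrameAllocator class, its buffer handling, and the 16-byte rounding are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a frame allocator; Torque's real
// FrameAllocator has its own interface and implementation.
class DemoFrameAllocator
{
public:
    explicit DemoFrameAllocator(std::size_t capacity)
        : mBuffer(capacity), mWaterMark(0) {}

    // Allocation only bumps the watermark. Returns nullptr when full.
    void* alloc(std::size_t bytes)
    {
        const std::size_t aligned =
            (bytes + 15) & ~static_cast<std::size_t>(15);
        if (aligned > mBuffer.size() - mWaterMark)
            return nullptr;
        void* result = mBuffer.data() + mWaterMark;
        mWaterMark += aligned;
        return result;
    }

    std::size_t getWaterMark() const { return mWaterMark; }

    // Lowering the watermark frees everything allocated after it.
    void setWaterMark(std::size_t mark)
    {
        assert(mark <= mWaterMark);
        mWaterMark = mark;
    }

private:
    std::vector<unsigned char> mBuffer;
    std::size_t mWaterMark;
};
```

A caller saves getWaterMark() on entry to a scope, allocates freely, and passes the saved value to setWaterMark() on exit. Everything allocated in between is released in one step, which is why allocations have to be undone in reverse order.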
Now there are some different problems to deal with. As with the free list solution, the size of hashTable->data is variable. And as with that one, I chose to speed up the initial allocation and then fall back on the regular allocation for any subsequent enlargements.

The next problem is that not all Dictionary objects are suitable for the frame allocator. All global variables are kept in a Dictionary created at the beginning of the program, and new global variables can be created at any time. In order to keep everything working, Dictionary and supporting code had to be aware of whether or not it was allocated using the frame allocator and use the proper deallocation mechanisms when necessary. Therefore, for global variables, I chose not to use the frame allocator at all -- it acts just as before.

Yet another problem is that of stack frame references. In the script code, it is possible to push a reference to a previous stack frame. This is mainly done during the exec() and eval() functions to allow the evaluated script to have access to the local variables of the calling code, but what it means is that if new local variables are created in a stack frame that is not the topmost stack frame, they would be trashed by the frame allocator when the stack frame reference was popped, and we can't have that. So, again, we fall back to the regular allocator for any exceptions to our rules. It'll be a little slower, but at least the majority of the code will be faster.
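The fallback rule amounts to choosing the allocator when memory is requested and then asking who owns the pointer when it is released. Here is a rough sketch under assumed names: DemoArena, allocTable(), and freeTable() are hypothetical and only stand in for the frame allocator and the Dictionary code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical fixed arena, standing in for the frame allocator.
struct DemoArena
{
    alignas(16) unsigned char buffer[1024];
    std::size_t used = 0;

    void* alloc(std::size_t bytes)
    {
        const std::size_t aligned =
            (bytes + 15) & ~static_cast<std::size_t>(15);
        if (aligned > sizeof(buffer) - used)
            return nullptr;
        void* p = buffer + used;
        used += aligned;
        return p;
    }

    // True when the pointer lies inside the arena's buffer.
    bool owns(const void* p) const
    {
        const unsigned char* c = static_cast<const unsigned char*>(p);
        const unsigned char* begin = buffer;
        const unsigned char* end = buffer + sizeof(buffer);
        std::less<const unsigned char*> before;
        return !before(c, begin) && before(c, end);
    }
};

// Use the arena only when the caller says it is safe (for example,
// a local frame at the top of the stack); otherwise use the heap.
void* allocTable(DemoArena& arena, std::size_t bytes, bool arenaIsSafe)
{
    void* p = arenaIsSafe ? arena.alloc(bytes) : nullptr;
    return p ? p : std::malloc(bytes);
}

// Heap blocks are freed one by one; arena blocks are left for the
// watermark reset to reclaim in bulk.
void freeTable(const DemoArena& arena, void* p)
{
    if (!arena.owns(p))
        std::free(p);
}
```

The same ownership test covers all three exceptions: a table that has outgrown its initial size, the global Dictionary, and variables created through a stack frame reference would each take the heap path.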
With the frame allocator version, here are the results:
[TGB-frame-TDMM]
==>tsSpeedTest();
Test #1: 768 ms
Test #2: 1024 ms
Test #3: 1536 ms
Test #4: 1280 ms
Test #5: 9344 ms
Test #6: 14592 ms
Test #7: 15744 ms

Comparing against [TGB-stock-TDMM], we've got 83% speed up in #3, 33% in #5, 27% in #6, and 23% in #7.
Preallocation
I realized, however, that there's another thing I was missing here -- the allocation of storage for variable values. When Torque calls a scripted function, it also creates a copy of all the string arguments passed in. Since we have the frame allocator handy, I figured it was possible to alleviate that as well by allocating small strings with the frame allocator. By default these strings are 16 bytes, but this can be changed with a #define. 16 bytes is large enough to store the string representation of a 32-bit integer or float, and if the value size goes over, then we can fall back on the existing allocator. As before, the results:

[TGB-frame-prealloc-TDMM]
==>tsSpeedTest();
Test #1: 640 ms
Test #2: 1152 ms
Test #3: 1280 ms
Test #4: 1280 ms
Test #5: 7808 ms
Test #6: 12160 ms
Test #7: 13952 ms

Not too shabby. Comparing against [TGB-frame-TDMM], that's another 20% speed up for #5 and #6, and 13% for #7. Against [TGB-stock-TDMM], that's a grand total of 120% speed up for #3, 60% for #5, 53% for #6, and 39% for #7.
I see this last addition as benefitting any functions that make use of local variables (either as parameters or just internally). The more variables with small values that your function uses, the more you'll get out of it.
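Whether a value benefits comes down to whether it fits in the preallocated slot. The sketch below checks that for a few strings. The constant kSmallStringSize and both helper functions are hypothetical; they only mirror the 16-byte default described above and are not the patch's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <limits>

// Assumed to mirror the 16-byte default described in the text.
const std::size_t kSmallStringSize = 16;

// True when the value, plus its terminating NUL, fits in a small slot.
bool fitsSmallSlot(const char* value)
{
    return std::strlen(value) + 1 <= kSmallStringSize;
}

// Formats a 32-bit integer as text, the way a script value is stored,
// and reports whether the result fits in the small slot.
bool intFitsSmallSlot(std::int32_t value)
{
    char text[32];
    std::snprintf(text, sizeof(text), "%lld",
                  static_cast<long long>(value));
    return fitsSmallSlot(text);
}
```

Even the longest 32-bit integer, "-2147483648", needs only 12 bytes with its terminator. The test #6 argument "this is a test" needs 15 bytes and fits, while the test #7 argument "this is a test that is longer" does not and would go to the regular allocator.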
Lastly, I mentioned above that there may be reasons you want to use the Torque memory manager. So I tried turning it back on while keeping my code changes:
[TGB-frame-prealloc]
==>tsSpeedTest();
Test #1: 640 ms
Test #2: 1280 ms
Test #3: 1280 ms
Test #4: 1408 ms
Test #5: 7936 ms
Test #6: 11904 ms
Test #7: 16768 ms

There's some speed loss in #7, but overall the results are nearly the same as before, and a far sight better than running stock TGB with the memory manager enabled.
Conclusion
For faster script execution right off the bat, TORQUE_DISABLE_MEMORY_MANAGER will net you a big gain. I mention this mainly for TGB, because the memory manager is enabled there by default, and for non-pro TGB, it can't be changed. (Maybe TGB 1.7.5 could fix this?) Beyond that, replacing the memory allocator used by TorqueScript with the frame allocator where possible and preallocating space for small variable values gave anywhere from 40% to 120% speed up in the above tests. Depending on how much script you're executing, this could mean the difference between skipping a frame or not.

While this resource does not address the fundamental speed issue with TorqueScript (namely, over-reliance on strings), it can help cut out some of the fat in the current system. I haven't seen any of the new T3D stuff, so I don't know if this has already been addressed, but until all the new stuff is released, I'm hoping this will help.
After putting up with reading this, you'd probably like to have the code. I've created a diff patch for TGB Pro 1.7.4. It may or may not work for the other Torque variants.
Download the patch
EDIT 2009-08-11: The patch has been updated to fix a memory leak. Please un-apply your old patches and apply the new 1.2 version.
Drop it into the main Torque directory and then run the following from the command line:
patch -p0 < tsFrameAllocator-1.2.patch

(If you're running Windows and don't have the patch program, you can get it here.)
If you want to change the preallocation size, it is defined as SMALL_STRING_SIZE in consoleInternal.cc. Make sure to use increments of 16 bytes, since TorqueScript's string code rounds up to boundaries of 16 bytes anyway.
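To see why other sizes buy nothing, here is a sketch of rounding a byte count up to a 16-byte boundary. It assumes the string code's rounding behaves like this; roundUpTo16() is a made-up helper, not a function from the engine.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a byte count up to the next multiple of 16.
// Illustrative only; it assumes the rounding behaves like this.
constexpr std::size_t roundUpTo16(std::size_t bytes)
{
    return (bytes + 15) & ~static_cast<std::size_t>(15);
}
```

Under that assumption, a setting of 20 would behave like 32 anyway, so sizes such as 16, 32, or 48 are the natural choices.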
Please let me know how it works for you. (Do you get a noticeable improvement? Is it still stable? Any memory leaks? How does it work with the engines other than TGB? etc.)
EDIT 2009-10-15: TGE users will need to make an additional change.
If you are applying this patch to TGE (and maybe other engines as well), you will also need to make the following change in game/game.cc, function GameRenderWorld():
void GameRenderWorld()
{
   PROFILE_START(GameRenderWorld);
   FrameAllocator::setWaterMark(0);
   ...
   AssertFatal(FrameAllocator::getWaterMark() == 0, "Error, someone didn't reset the water mark on the frame allocator!");
   FrameAllocator::setWaterMark(0);
   PROFILE_END();
}

becomes

void GameRenderWorld()
{
   PROFILE_START(GameRenderWorld);
   U32 waterMark = FrameAllocator::getWaterMark();
   FrameAllocator::setWaterMark(waterMark);
   ...
   AssertFatal(FrameAllocator::getWaterMark() == waterMark, "Error, someone didn't reset the water mark on the frame allocator!");
   FrameAllocator::setWaterMark(waterMark);
   PROFILE_END();
}

The cause of the TGE problem is that the original code, FrameAllocator::setWaterMark(0), assumes that it has access to the full memory space of the frame allocator. If you are running TGEA or Torque 3D, you may need to search the code and make sure there are no other similar calls like this. If so, they should be replaced by calls that save and restore the frame allocator watermark as above.
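If there turn out to be many such call sites, one way to make the save-and-restore pattern harder to get wrong is a small scope guard. This is only a sketch: the demo namespace is a stand-in for the engine's frame allocator, and WaterMarkGuard and renderPass() are hypothetical names that do not exist in Torque.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in with the two calls used above; the real
// FrameAllocator lives in the engine.
namespace demo
{
    std::size_t gWaterMark = 0;
    std::size_t getWaterMark() { return gWaterMark; }
    void setWaterMark(std::size_t mark) { gWaterMark = mark; }
}

// Saves the watermark on construction and restores it on destruction,
// so memory in use by an outer scope (such as script frames) survives.
class WaterMarkGuard
{
public:
    WaterMarkGuard() : mSaved(demo::getWaterMark()) {}
    ~WaterMarkGuard() { demo::setWaterMark(mSaved); }

    WaterMarkGuard(const WaterMarkGuard&) = delete;
    WaterMarkGuard& operator=(const WaterMarkGuard&) = delete;

private:
    std::size_t mSaved;
};

// Simulates a render pass that uses temporary frame memory.
void renderPass()
{
    WaterMarkGuard guard;
    demo::setWaterMark(demo::getWaterMark() + 256);   // temporary use
}   // guard restores the saved watermark here
```

If the watermark is 64 when renderPass() is entered, it is 64 again on return, rather than being forced back to 0.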
#2
08/07/2009 (4:27 am)
Whoa, that's quite the performance increase! I'll give it a shot in TGEA 1.8.1 later today, and let you know how things turn out for me.
#3
08/07/2009 (1:29 pm)
Whoa those are significant improvements. Would love to hear one of the engineers from GG comment on this.
#4
08/07/2009 (2:05 pm)
Very interesting write up and results from your experiment. Definitely got to try this out.
#5
08/07/2009 (7:25 pm)
Hi Greg, this is a very interesting article and you did a really good job! I applied your patch to my TGB and I have done some tests with TORQUE_DISABLE_MEMORY_MANAGER definition also.

Test#, Standard, NoMemoryManager, Patched, Patched+NoMemoryManager
1, 281, 218, 266, 250 ms
2, 499, 436, 452, 436 ms
3, 6833, 952, 514, 576 ms
4, 499, 438, 514, 532 ms
5, 23291, 5022, 3792, 3884 ms
6, 33993, 7442, 5820, 5866 ms
7, 34632, 8128, 7892, 6458 ms
Finally, I chose to use FrameAllocator only without disabling Memory Manager (3rd column) because it already gives a very good boost to TS performance, at least in your test cases. I'll let you know if this modification acts on our real FPS.
Many thanks for sharing!
#6
08/08/2009 (4:24 pm)
Before on T3D beta 4 with Multiplayer fix release build and after.

          No-Debug-Before   Debug-Before   No-Debug-After   Debug-After
Test #1:  128 ms            320 ms         128 ms           384 ms
Test #2:  320 ms            960 ms         256 ms           960 ms
Test #3:  576 ms            3200 ms        320 ms           1088 ms
Test #4:  384 ms            960 ms         448 ms           960 ms
Test #5:  3392 ms           14272 ms       2560 ms          8576 ms
Test #6:  6464 ms           21568 ms       4224 ms          13696 ms
Test #7:  7360 ms           23552 ms       4992 ms          16320 ms
#7
08/09/2009 (12:52 am)
Glad to hear that it's working for everybody so far.

By the way, I thought I'd throw in this addendum about the SMALL_STRING_SIZE value, in case it wasn't clear.
If you do a lot of script work with the 2- or 3- component vector strings (such as those from t2dVectorAdd() or VectorAdd(), for example) then you probably want to increase this value to 32 or 48. It doesn't cost anything more (assuming you don't overrun the frame allocator memory) and you can squeeze out a bit more speed with it. I deliberately made the string argument in test #7 longer than 16 bytes to exercise this. Here's the difference for me between 16 (the default) and 32 for that test:
Test #7 without memory manager
SMALL_STRING_SIZE = 16: 14280 ms
SMALL_STRING_SIZE = 32: 13480 ms

Test #7 with memory manager
SMALL_STRING_SIZE = 16: 16968 ms
SMALL_STRING_SIZE = 32: 13312 ms
That's a 6% gain without the memory manager and 27% with the memory manager. Obviously the gain with the memory manager is going to be more because the memory manager was slower to begin with, but either way, you've made out. Now that I think about it, I'd probably recommend making it larger anyway. The default size of the frame allocator memory is 3 MB, and you pretty much have all of it to work with during script execution. It's going to be tough to reach that unless you're doing some deep recursion or some really heavy local variable %array[] work.
#8
08/09/2009 (6:17 pm)
When you have 100 AI units running amok, it's not that hard, believe me.
#9
08/09/2009 (6:34 pm)
I think a lot of people think this is awesome, and I bet some are wondering why this has to be implemented by a user. What are we not seeing? Is this going to break something else?
Quote: Would love to hear one of the engineers from GG comment on this
Any takers?
#10
08/10/2009 (4:18 pm)
After patching this into my TGE 1.5.2 project, the dedicated server which normally uses up to approximately 35 megabytes of memory, after running for about six hours was using approximately 198 megabytes of memory. I reverted back to the build before I implemented this resource, and it was back to using 35 megabytes. I don't know where it's happening, or if it happens on TGB the same way, but there's some kind of memory leak here.
#11
08/10/2009 (9:44 pm)
Jesse: You are absolutely right. I believe I have found the problem and have uploaded a new patch.

For those of you who have already applied the old patch, you can unapply it and download the new one. Or you can make the following changes:

In consoleInternal.cc, inside Dictionary::remove(), change

if (!isFrameAllocated()) {
   delete ent;
}

to this:

if (isFrameAllocated()) {
   ent->~Entry();
} else {
   delete ent;
}

In consoleInternal.cc, inside Dictionary::reset(), change

if (!FrameAllocator::owns(walk)) {
   delete walk;
}

to this:

if (FrameAllocator::owns(walk)) {
   walk->~Entry();
} else {
   delete walk;
}
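The reason the explicit destructor call matters can be sketched as follows. This is an illustration under assumptions, not the engine's code: DemoEntry, createInPlace(), and destroyInPlace() are hypothetical, and the sketch presumes the leak came from entries whose destructors release memory they own.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical entry type that owns heap memory through a member.
struct DemoEntry
{
    static int liveCount;
    std::string value;

    explicit DemoEntry(const char* v) : value(v) { ++liveCount; }
    ~DemoEntry() { --liveCount; }
};
int DemoEntry::liveCount = 0;

// Builds an entry in caller-provided storage, as an arena would.
DemoEntry* createInPlace(void* storage, const char* v)
{
    return new (storage) DemoEntry(v);
}

// The storage is not ours to delete, but the destructor must still
// run so that members such as the string release their own memory.
void destroyInPlace(DemoEntry* entry)
{
    entry->~DemoEntry();
}
```

Skipping destroyInPlace() would leave the storage reusable but would never release whatever the entry itself had allocated, which is the kind of slow growth a long-running server would show.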
#12
08/11/2009 (1:39 pm)
At first glance, these look like good optimizations. Some similar work was done for iTGB, but I believe the team there went for some more radical changes to the core of TorqueScript.
I'm not sure if the plan is to merge those changes back into T3D/T2D, but I'm fairly sure you'll see that or Greg's resource in T3D 1.1.
#13
09/09/2009 (3:31 pm)
We're using tons of scripting and this patch seems to be a godsend. I'm also thinking about ways to optimize Sim::FindObject(), which takes most of our frame time.
#14
09/13/2009 (12:30 am)
I seem to be getting crashes which are coming from Dictionary::cleanup. The error that I get in the OS X console is that a Non-aligned pointer is being freed. Any suggestions on how to go about fixing this? There's obviously only one variable that's being freed in Dictionary::cleanup.
#15
09/13/2009 (7:18 pm)
Hmm. Does the crash occur inside the delete[] operator, or does it occur when dereferencing hashTable->data? The only things hashTable->data is ever set to are "new Entry*[]" and pointers into the frame allocator memory (which should be caught by FrameAllocator::owns()).

Maybe there is a buffer overflow corrupting the hashTable->data pointer. This code doesn't contain memory guard fence regions, so there's not any room for catching/warning about errors there. What are the values of hashTable->size and hashTable->count when this crash occurs? For that matter, what is the value of hashTable->data?
Also, does this crash occur immediately on startup or can your game run for a while before it happens? Under normal circumstances, that particular section of cleanup() should only be executed when cleaning up a large number of local variables.
#16
09/15/2009 (3:30 pm)
It can run for a while before it occurs. It seems to happen most frequently when resizing the game resolution. I'll try and get more information out of a debug build.
#17
09/27/2009 (5:30 am)
I am seeing the same problem as Thomas in OS X except that it happens consistently at startup for me. Not fun. I can get right through the menus and loading screens but just before the switch to "in game" it's crashing.
The crash itself is at the check to see if the owner is this. The hashTable seems to be corrupted at this point and so my debugger is reporting both the size and the count as 0.
Not exactly sure where I'll get with this but I'll continue to dig.
#18
09/27/2009 (4:50 pm)
If it crashes right at that "if" statement, then it sounds like the hashTable pointer is corrupted rather than the hash table structure itself (i.e. you get a crash when trying to dereference "hashTable" and then your debugger just shows you 0 for everything). Does that sound correct?
One thing that would be helpful for me to know is if, at that point, FrameAllocator::owns(hashTable) is true or not. My guess would be yes, but if merely dereferencing hashTable crashes the program, then maybe not.
Here's some code that may help you, too. Basically, creation and deletion of Dictionary frames (and therefore hashTable pointers) should always be paired together -- there should never be a case where one hashTable is created and then a different one is destroyed. So we can add a stack that tracks the hashTable values, and if any value other than the one on top of the stack is about to be deleted, that indicates something went wrong.
So, in consoleInternal.cc, near the top, add:
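// Debug-only stack of live hashTable pointers, created on first use.
static Vector<void *> *htPtr = NULL;
static void pushHT(void * ht)
{
   // Catch a misaligned pointer as soon as it is handed out.
   AssertFatal(((U32)ht & 0x3) == 0, "Misaligned hashTable pointer");
   if (!htPtr) htPtr = new Vector<void *>();
   htPtr->push_back(ht);
}
static void popHT(void * ht)
{
   // The pointer being released must be the most recently pushed one.
   AssertFatal(htPtr && !htPtr->empty() && htPtr->last() == ht, "Popping off wrong hashTable pointer");
   htPtr->pop_back();
}
Then, as the last line in Dictionary::setState(), add: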
pushHT(hashTable);
In Dictionary::~Dictionary(), add the following as the first line:
popHT(hashTable);
And likewise in Dictionary::cleanup(), add the following as the first line:
popHT(hashTable);
You might be able to catch the error closer to its cause.
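If you want to play with the pairing check outside the engine first, a minimal standalone version might look like the following. PairingTracker is a made-up name, and it uses std::vector rather than Torque's Vector, so treat it as an illustration of the idea and not as part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative tracker (hypothetical name): records every table pointer that
// is handed out and checks that releases come back in reverse order.
class PairingTracker
{
public:
    void push(const void *p) { live_.push_back(p); }

    // Returns false when `p` is not the most recent outstanding pointer.
    // The stack is left untouched in that case so it can be inspected.
    bool pop(const void *p)
    {
        if (live_.empty() || live_.back() != p) return false;
        live_.pop_back();
        return true;
    }

    // Number of pointers pushed but not yet popped.
    std::size_t depth() const { return live_.size(); }

private:
    std::vector<const void *> live_;
};
```

A pop() that returns false plays the same role as the "Popping off wrong hashTable pointer" assert: the frame being torn down is not the one most recently set up.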
#19
09/27/2009 (5:42 pm)
Thanks for the help, Greg. I'm now getting the "Popping off wrong hashTable pointer" error and am able to properly debug the crash. I'll report back with anything of interest.
#20
09/27/2009 (8:46 pm)
Ok, so after cleaning up some things in the custom parts of our codebase (mainly switching from traditional char to FrameTemp) it's no longer crashing at the popHT in Dictionary::cleanup... it's back at the hashTable->owner == this test. The hashTable is corrupted: no count/size/data. Grr. I can't seem to put my finger on the exact problem I'm seeing.
Associate Konrad Kiss
Bitgap Games
I did some testing on Torque 3D, which has TORQUE_DISABLE_MEMORY_MANAGER set by default, so the memory manager is disabled. My results were:
I went ahead and applied your resource. Results:
I think the numbers speak for themselves. Thank you.