Compiling with SSE enabled
by Duncan Gray · in Torque Game Engine · 11/03/2006 (7:00 pm) · 18 replies
I was compiling my current game with VC6 and decided to try Express 2005 (VC8) to take advantage of the newer compiler options.
A benchmark test gave me a 65% speed increase on a Pentium 1.6GHz and a 300% speed increase on an AMD Athlon 64. It's very noticeable during scene lighting.
According to this data, not all CPUs have the SSE instruction set.
So what happens with those other CPUs? Will the game revert to the normal instruction set, or will it just crash and burn? It's a burden to release two versions of the game.
#2
11/05/2006 (8:53 pm)
Ok, I tried an SSE-compiled EXE on a Pentium III (which does not support SSE2) and it did not run.
I notice TGE has some asm code to support SSE and 3DNow and selects it if it detects an SSE-capable CPU, etc., but it still does not give the same performance as having the entire app compiled with SSE.
Seeing as most recent computers (less than 4 or 5 years old) should work with SSE, perhaps it's not a bad idea to limit your game to those computers to benefit from the huge performance boost of SSE.
Any opinions on that?
#3
11/05/2006 (10:22 pm)
Interesting. Not sure I know what compiler switches you are using, but I guess it's time to find out.
#4
11/05/2006 (10:23 pm)
Hi Duncan, I am glad you have been posting your results.
I was not aware of the option to compile the entire app to use SSE, nor would I have guessed that level of performance gain.
Todd
#5

11/06/2006 (2:43 am)
Well, then a picture might help, I guess.
I also enabled link-time code generation (/GL) through the Whole Program Optimization option in the Optimization settings under C/C++.

An idea I'm considering is to create a mini executable from the winCPUinfo.cc code that checks what CPU you have and then launches the game executable that best suits it.
My game executable is only 4 MB compressed, so having two in the install package won't kill anyone's download quota.
#7
11/06/2006 (3:32 am)
I didn't think SSE optimizations would yield such a high performance boost, really.
Differences in FPS:
Regular: ~333.3 fps
SSE: ~333.3 fps
SSE2: ~333.3 fps
Differences in filesize:
Regular: 2.8 MB
SSE: 2.8 MB
SSE2: 3.0 MB
So I'm not going to use it. Still cool that it works for you though.
#8
11/06/2006 (10:01 am)
Stephan, check your log files and compare the scene-lighting times. FPS might not be such a good indication when it's awesome to begin with.
#9
11/06/2006 (10:11 am)
Hmm, if it does improve relight times by that much, an SSE-enabled build might be interesting for in-development use. We have a .mis file which takes 10+ minutes to relight at full quality, and it's a PITA having to wait that long.
#10
11/06/2006 (10:23 am)
Also keep in mind that the improvements I mentioned were between VC6 and VC8 with all options on. I have not timed VC8 with SSE on and off to see the performance difference, but that would be a good exercise before embarking on some magic code to check CPU flags and launch the correct EXE.
#11
11/06/2006 (11:12 am)
Duncan, in the one screenshot you have SSE2 set. SSE2 works only on newer processors, so SSE might be a better choice. I don't know how it would affect the speed, though.
#12
11/06/2006 (11:15 am)
Lighting times are 16 (regular) vs 11 (SSE2). That's pretty cool.
#13
11/06/2006 (12:01 pm)
@ Stefan, that's roughly a 31% cut in lighting time. What CPU are you using, BTW?
@ David, I have not checked to that level of detail yet.
I'm still pondering all the implications. For example, SSE3 (Prescott) is also supposed to be a lot faster than SSE2, but my brief research shows that you need to check one of the CPU flag bits to see if the CPU can handle SSE, and SSE, SSE2 and SSE3 all seem to use the same CPU flag, so how do you know which one you have? I'm sure there is a way, but I have not got that far yet. I'll spend some time on it today.
#14
11/06/2006 (12:46 pm)
Pentium 4.
#15
11/06/2006 (3:20 pm)
For what it's worth, I tried a couple of tests of Starter.FPS with a single spawn point. Results are lighting time, then approximate FPS:
Stock 1.5 EXE: 40.703, 35 fps
Default settings: 46.703, 35 fps
SSE2: 51.453, 33 fps
SSE2 + /GL: 52.437, 34 fps
That makes about as much sense to me as something that doesn't make sense!
Edit: This is a Core 2 Duo 6600 at 2.4GHz using Visual Studio 2005 (not Express).
#16
11/06/2006 (4:05 pm)
That's weird... at least you got a 1 fps improvement, for what it's worth. CPU manufacturers spend a fortune putting this technology into their products, so I would expect it to be worth the bother. Let us know what CPU that was, in case it makes a difference.
I won't be able to dig further into this for a few days; I've got to solve a UNICODE conflict with a third-party DLL first.
#17
11/06/2006 (4:23 pm)
Quote:
CPU manufacturers spend a fortune on putting this technology into their products so I would expect it to be worth the bother.
That's not how it works. You need to actually support SSE instructions in your code, or else you won't see much improvement at all. That said, Torque does not contain a lot of SSE-specific code. Basically, there's some SSE-accelerated matrix math, but that's about it.
#18
11/06/2006 (4:55 pm)
I realize that you need to "support SSE instructions". The point is that the compiler seems to convert the other TGE math functions into SSE instructions. You yourself got roughly a 31% reduction in lighting time by using SSE2.
Torque Owner Duncan Gray