Video Game Optimization (VGO)

Name:Eric Preisz
Date Posted:Jul 19, 2007

Blog post
Video game optimization is one of my passions. I didn't realize it until about a year ago, but video game hardware and trends are likely to drive a major portion of the entire processing market. Chipmakers are changing their focus, due to the limitations of heat and the speed of light, from more horsepower to more cores. Remember the days when chipmaker marketing focused on the almighty GHz? Video games and multi-processing are tightly coupled today, and will be even more so in the future.

Has multi-threaded code ever been as important as it is now? Sure. It has been the single most important concept in retail game hardware for the past 7 years; it's just a bit obscure. Graphics cards are inherently multi-threaded. It's even more interesting to note that when using graphics cards, we take advantage of multi-threaded code and hardware without needing to write the syncing code associated with multi-threaded CPU programming.

There is an interesting race occurring right now (no, not a race condition). There are two approaches to multi-processing. In one corner, the CPU, a low-latency, low-throughput processor. In the other corner, the GPU, a high-latency, high-throughput processor. May the best silicon win. Judging from the past 5 years, the GPU will outpace the CPU in parallel processing performance growth for the next 5 years.

Maybe we shouldn't use the term graphics cards anymore. In my opinion, they should be called parallel processing cards. There is some anecdotal data out there to support this name. First, the evolution of GPGPU. Second, both ATI and NVIDIA now have programming languages for their cards so that programmers can access the hardware without needing a graphics API. Lastly, AMD proactively squashed GPU encroachment by purchasing ATI. Did someone feel threatened?

In concert with this evolution of processing power, I am proud to offer a preview of a new service to exploit this hardware. Venagen, a video game optimization service organization, is in the start-up phase and will begin ramping up this fall.

www.venagen.com

Also, for more information on video game optimization, please visit our knowledge base at:

www.vertexbuffer.com (edit: which has just recently been attacked by viagra spammers. edit: fixed!)


Your comments are welcome and appreciated!


Phil Carlisle   (Jul 19, 2007 at 14:37 GMT)
Hey Eric,

The other day I was thinking about breaking out a copy of nvPerfhud on TGEA and seeing how it profiled, then I wondered if you'd actually done any of that kind of stuff??

I'm interested to hear your take on the performance of TGEA right now. I get the feeling that it's mostly CPU bound and that somehow it's not as optimal as it could be. But I simply don't have time to really approach it in a rational way right now.

Did you ever do that kind of exercise with torque that you could share with the GG masses? It'd be interesting to see how you go about that kind of task with a familiar engine.

Eric Preisz   (Jul 19, 2007 at 15:00 GMT)
Phil,

I haven't run it in a while, and my best guess is that NVPerfHUD wouldn't help you as much as VTune or CodeAnalyst.

Let me make some guesses, based on some assumptions that I would have to validate. Also, it's hard to make a blanket statement since each app is different.

My guess is that the average TGEA game is CPU bound (also true for almost all DX9 engines). And chances are the performance issue lies in the drivers, due to the overhead of draw calls. Every frame, we have a "budget" of draw calls. This budget is a function of the CPU, not the GPU. If you are limited by the CPU, the GPU will starve, since it receives data and instructions from the CPU. When this occurs, we can suggest that the CPU is an over-utilized resource.

So if this is the problem, then we can fix it by creating fewer state changes. This job may lie in the hands of the artist who is creating a mesh with many different textures. Each texture will cause an additional draw call. Multiple shader changes also cause state changes, and these are even more expensive than texture state changes.
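To make the texture point concrete, here is a minimal sketch that estimates draw calls for a mesh. It assumes that geometry sharing the same shader and texture can be merged into one call, so each distinct (shader, texture) pair costs one draw call. The `Submesh` type and the file names are hypothetical and are not part of TGEA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submesh:
    # Hypothetical stand-in for a chunk of geometry; not a TGEA type.
    shader: str
    texture: str


def estimate_draw_calls(submeshes):
    """Assume one draw call per distinct (shader, texture) pair."""
    return len({(s.shader, s.texture) for s in submeshes})


# A mesh authored with several textures and two shaders.
mesh = [
    Submesh("diffuse", "wood.png"),
    Submesh("diffuse", "metal.png"),
    Submesh("diffuse", "wood.png"),
    Submesh("normal_map", "wood.png"),
]

# The same mesh after the artist packs everything into one texture
# and uses a single shader.
atlased_mesh = [Submesh("diffuse", "atlas.png") for _ in mesh]

print(estimate_draw_calls(mesh), estimate_draw_calls(atlased_mesh))
```

Under that assumption the first mesh needs three calls and the atlased one needs a single call, which is the kind of saving an artist can make without touching engine code.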

To determine if your game is CPU or GPU bound, you will need to get the utilization number back from the graphics card (PIX or NVPerfHUD will work with most modern card/driver combos). If the utilization number is less than 80%, then you are likely CPU bound. If your utilization number is greater than 90%, then you are likely GPU bound. In between, you are in processor purgatory, and speeding up either will probably help a bit.
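The rule of thumb above can be written down as a small helper. This is only a sketch: the function name and the returned labels are made up for illustration, and the GPU utilization percentage is assumed to come from whatever profiler you are using.

```python
def classify_bottleneck(gpu_utilization_percent):
    """Apply the 80% / 90% rule of thumb to a GPU utilization reading."""
    if not 0.0 <= gpu_utilization_percent <= 100.0:
        raise ValueError("utilization must be between 0 and 100")
    if gpu_utilization_percent < 80.0:
        return "likely CPU bound"
    if gpu_utilization_percent > 90.0:
        return "likely GPU bound"
    # Between 80% and 90% inclusive: speeding up either side may help a bit.
    return "in between"


for reading in (55.0, 85.0, 97.0):
    print(reading, classify_bottleneck(reading))
```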

Someday, when time permits, I will take some time to dig into TGEA and see if I can find any quick optimizations. Right now, I'm guessing (and guessing often doesn't work) that respective implementations of instancing for shader model 2.0 and shader model 3.0 would help reduce the driver work and reduce stress on the CPU.

I've done this with Ogre recently and found that they were GPU bound because they were not using compressed textures. I had assumed the drivers would be the problem again, but simply adding code (say 5 lines or so) gave me a 30% (on average) frame rate increase.
Edited on Jul 19, 2007 15:02 GMT

Phil Carlisle   (Jul 19, 2007 at 19:21 GMT)
Thanks Eric, I hope you get a chance to do this kind of thing, because I'm really quite interested in it in a general sense, but don't have enough time to do it myself and it feels like a job for you :) hahahaha.

BTW: Doesn't OGRE do shader LOD? Have you found that technique to be much use?

Actually, while we're at it, you don't have any solid figures on the performance of any top rated engines in this kind of field, do you? I'm thinking of something at, say, Battlefield 2 level (as it's a good match for TGEA in terms of scene rendering feature set).
Edited on Jul 19, 2007 19:22 GMT

Eric Preisz   (Jul 19, 2007 at 20:01 GMT)
Phil, I'm not sure I know what shader LOD is. I googled it but didn't see much. Post a link if you've got one. If shader LOD is a pixel technique I'm probably not a fan. Since we can think of the vertex stage as a "for" loop that contains a pixel "for" loop, I think that putting more stress on the pixel kernel would not help since we already have support for lod. If shader LOD is a vertex technique, I'm in favor, but that's obviously not going to do much for a texture issue. But you have sparked my interest.

From the performance side in Ogre, what I saw was a texture kernel that limited the performance of our entire pipeline on both ATI and NVIDIA cards. Maybe their implementation is blowing the texture cache? If their goal is to save texture bandwidth, they aren

Phil Carlisle   (Jul 19, 2007 at 20:56 GMT)
Eric, your post got cut.

Shader LOD is dropping a shader from a complex shader (say one with multiple passes to achieve a complex effect) to a simpler version based on distance or complexity.

For example, dropping a normal map shader down to a simple diffuse shader based on object projection size or distance from camera. Basically the same as geometry LOD, but making shaders fall back to dumber versions of themselves. Eventually falling back to a diffuse texture only, then a const colour only.
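A minimal sketch of that fallback chain, picking a shader by distance from the camera. The distance thresholds and shader names are invented for illustration; a real engine would tune them and might use projected size instead of distance.

```python
# (maximum distance, shader name), ordered from most to least expensive.
# Thresholds and names are hypothetical.
SHADER_LODS = [
    (20.0, "normal_map"),
    (60.0, "simple_diffuse"),
    (150.0, "diffuse_texture_only"),
]
FINAL_FALLBACK = "constant_colour"


def pick_shader(distance):
    """Return the first shader whose distance band contains the object."""
    for max_distance, shader_name in SHADER_LODS:
        if distance <= max_distance:
            return shader_name
    return FINAL_FALLBACK


for d in (5.0, 45.0, 100.0, 500.0):
    print(d, pick_shader(d))
```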

I can't remember where I read about it. It might be on the ATI or NVIDIA site. But it makes sense. I read about using similar LOD for bones too (basically having your bone animation structured so that you can drop non-important bones off the bone transform list easily and do animation LOD too).

Eric Preisz   (Jul 19, 2007 at 22:09 GMT)
Darn apostrophe.

Anyways, I understand what you are saying now so most of what I wrote is not important anyways.

Yes, lodding a shader can be a good idea. Shader optimization is something I'm still digging deeper into, but what I've read says that a dynamic shader branch will cost you around 6 units of overhead. If the work you are reducing is less than 6, then it's a waste. If it's more than 6, then you can potentially save.
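The break-even reasoning can be sketched as simple arithmetic. The figure of 6 is the rough overhead quoted above, not a measured value, and "units" are left abstract; the function names are made up for illustration.

```python
BRANCH_OVERHEAD = 6  # rough figure from the text above, in abstract units


def branch_net_saving(work_skipped, overhead=BRANCH_OVERHEAD):
    """Work avoided by the branch minus the cost of taking the branch."""
    return work_skipped - overhead


def branch_is_worthwhile(work_skipped, overhead=BRANCH_OVERHEAD):
    """A branch only pays off when it skips more work than it costs."""
    return branch_net_saving(work_skipped, overhead) > 0


for skipped in (4, 6, 20):
    print(skipped, branch_net_saving(skipped), branch_is_worthwhile(skipped))
```

Skipping 4 units loses 2, skipping exactly 6 breaks even, and skipping 20 saves 14, which is why the branch only makes sense around the expensive part of a shader.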

There are some distinctions in how a branch is compiled that muddy the waters, and those I haven't deeply explored yet.

NVIDIA and ATI both have shader perf tools for profiling, but since my students don't know shaders that well when they come into my class, I don't go that deep into it.

Shaders use MIMD and SIMD hardware for instruction-level parallelism. There are some really interesting optimizations you can do if you understand this specialized vector hardware. Swizzling and masking are two that stand out as something that works in a shader but boggles the mind of a CPU programmer.

The concept of SIMD is the same concept that dominates the Cell processor and SSE instructions. In fact, my students did a SIMD lab today.

To answer your other question about profiling other engines: no, not really. Without source, or the hook that enables NVPerfHUD, I'm a fish out of water. I've mostly optimized open source, TGE, and research apps that I created to isolate ideas.
Edited on Jul 19, 2007 22:10 GMT

Casey Weidner   (Jul 19, 2007 at 22:13 GMT)
Interesting concepts, Phil. I would love to read some white papers on this stuff. I could imagine that it would probably be a pain in the ass to add those sorts of things to TGEA, as I would think it would change the model formats, which may require a rewrite of the exporters.

Eric Preisz   (Jul 19, 2007 at 22:21 GMT)
@ Casey,

No, not difficult in TGEA. The great thing about shaders is that they are largely decoupled from the game engine. All you would need to do is send a constant holding the rendered object's distance and use that in your shader.

Although, if possible, try to move any branching that you are tempted to do in the pixel shader to the vertex shader. Unless you have great quick pixel rejection from sorting front to back, your pixel shader will be called more often than your vertex shader.
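A back-of-the-envelope sketch of why that matters. It assumes every vertex is shaded once and every screen pixel is shaded `overdraw` times; the vertex count, resolution and overdraw factor below are made-up numbers, not measurements from any engine.

```python
def shader_invocations(vertex_count, width, height, overdraw):
    """Rough per-frame invocation counts for the two shader stages."""
    vertex_runs = vertex_count
    pixel_runs = int(width * height * overdraw)
    return vertex_runs, pixel_runs


# Hypothetical scene: 50,000 vertices at 1280x720 with 2x overdraw.
vertex_runs, pixel_runs = shader_invocations(50_000, 1280, 720, 2.0)
print(vertex_runs, pixel_runs)
```

With these numbers a branch in the pixel shader runs roughly 37 times as often as the same branch in the vertex shader, so its overhead is paid that many more times per frame.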

Phil Carlisle   (Jul 20, 2007 at 08:33 GMT)
Actually, in TGEA with the procedural materials, it really wouldn't be that difficult. What you'd have to do is add some code to allow hints to the procedural material system to instruct it how to fall back. It currently falls back on less capable hardware, so it's not a huge stretch to imagine that it could also fall back to a less expensive material entirely.

Eric Preisz   (Jul 20, 2007 at 14:01 GMT)
Falling back to a different shader may or may not help you. If you are CPU bound, and you are switching shaders to fall back, you could be causing more CPU work and slowing down your game. If you are limited by either of your shaders, then it would help even though you are causing more render state changes. Since we are normally limited more by the CPU, I would be careful not to create more state changes. I'm not sure if TGEA is sorting states or not; it's been a while since I played with it. From what I've read, you should sort your scene by pixel shader, vertex shader, and then texture. After that, I don't know what states you should sort by.
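Here is a minimal sketch of that sort order and its effect on state changes. `DrawItem` and the state names are hypothetical, and counting one change per differing field between consecutive items is a simplification of what a real driver does.

```python
from collections import namedtuple

# Hypothetical render queue entry; field order matches the sort priority.
DrawItem = namedtuple("DrawItem", ["pixel_shader", "vertex_shader", "texture"])


def sort_by_state(items):
    """Sort by pixel shader, then vertex shader, then texture."""
    return sorted(items, key=lambda d: (d.pixel_shader, d.vertex_shader, d.texture))


def count_state_changes(items):
    """Count fields that differ between each pair of consecutive items."""
    changes = 0
    for previous, current in zip(items, items[1:]):
        changes += sum(a != b for a, b in zip(previous, current))
    return changes


queue = [
    DrawItem("ps_a", "vs_a", "t1"),
    DrawItem("ps_b", "vs_a", "t2"),
    DrawItem("ps_a", "vs_a", "t1"),
    DrawItem("ps_b", "vs_a", "t2"),
]

print(count_state_changes(queue), count_state_changes(sort_by_state(queue)))
```

In this toy queue the submission order costs six state changes and the sorted order costs two, with the same objects drawn either way.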
