Game Development Community

Perf numbers for T2D

by Jason Swearingen · in Torque Game Builder · 08/10/2005 (1:15 am) · 36 replies

I was curious about the performance of T2D, so I modified the Basic Tutorial into a little "stress" app.

Specifically, I wanted to know how many sprites (fxSceneObject2D objects) I could have on the screen at once.

For these tests, I placed the sprites randomly around the screen (x from -40 to 40, y from -30 to 30).
Collisions were not turned on for these sprites,
and no effects (such as flame) were added to them.
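A minimal sketch of the random placement described above, written in C++ (the engine's language) rather than the T2D.NET wrapper the test actually used. `SpritePos` and `scatterSprites` are hypothetical names; a real stress test would create a scene object at each position instead of just returning coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a sprite's position in T2D world units.
struct SpritePos {
    float x;
    float y;
};

// Scatter `count` positions uniformly over the test region:
// x in [-40, 40], y in [-30, 30] (world units, not pixels).
std::vector<SpritePos> scatterSprites(std::size_t count, unsigned seed) {
    std::mt19937 rng(seed);  // fixed seed makes a stress run repeatable
    std::uniform_real_distribution<float> xDist(-40.0f, 40.0f);
    std::uniform_real_distribution<float> yDist(-30.0f, 30.0f);

    std::vector<SpritePos> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Braced initializers are evaluated left to right, so x is drawn first.
        out.push_back({xDist(rng), yDist(rng)});
    }
    return out;
}
```

Using a fixed seed means two runs of the stress test place sprites identically, which makes before/after comparisons (for example TorqueScript vs. T2D.NET) fairer.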

With static sprites (using the enemy1 graphic, sized 50 x 25):
With anything under 300 sprites alive at a time, performance is pretty good on my system, but once it goes over that level, speed takes a very sharp nosedive to about 40% (and then stays roughly constant at that level).

With sprites set to rotate (using the enemy1 graphic, sized 50 x 25):
Perf goes down at about 300 live sprites.

With sprites set to rotate (using the enemy1 graphic, sized 1 x 1):
Perf goes down at about 200 live sprites.

With sprites not rotating (using the enemy1 graphic, sized 1 x 1):
Perf goes down at about 200 live sprites.
This seems weird; perhaps big overlapping graphics get some rendering savings that we don't get here.



This was done by adding about 20 sprites per second, and it took maybe a second or two for performance to drop from full speed to half.

System speed probably affects this, but I'm wondering whether others have taken the time to run these kinds of numbers on T2D and what they found.

PS: I'm using my T2D.NET wrapper. If someone has a perf test written in TorqueScript, I would be interested in porting it to .NET and comparing results to see how much my wrapper helps (or hurts) performance.


Anyway, something to discuss!
#1
08/10/2005 (2:42 am)
Another question: I've seen some tutorials from Melv and others that have a status bar at the top showing various memory/object stats, and bounding boxes drawn around the objects.

Is there a tutorial showing how to implement this easily?

thnx
#2
08/10/2005 (4:29 am)
Add this: t2dSceneGraph.setDebugOn( BIT(0) );

right after: new fxSceneGraph2D(t2dSceneGraph); // or whatever you named your scenegraph...
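For context, Torque's C++ headers provide a `BIT(x)` helper that yields a mask with only bit x set; the sketch below is modeled on that and assumes it expands to a single shifted 1. Several debug modes can then be OR-ed into one mask. The enum names are hypothetical, and whether `setDebugOn()` accepts a combined mask, and which bit enables which overlay, is something to verify in the T2D reference doc.

```cpp
#include <cassert>
#include <cstdint>

// Modeled on Torque's helper: a mask with only bit x set.
#define BIT(x) (1u << (x))

// Hypothetical names for scene-graph debug modes. Only BIT(0) comes from the
// reply above; check the reference doc for what each bit really enables.
enum DebugMode : std::uint32_t {
    kDebugMode0 = BIT(0),  // the mode suggested above
    kDebugMode1 = BIT(1),  // hypothetical second mode
};

// OR the selected modes together into a single mask.
constexpr std::uint32_t makeDebugMask(bool mode0, bool mode1) {
    return (mode0 ? static_cast<std::uint32_t>(kDebugMode0) : 0u) |
           (mode1 ? static_cast<std::uint32_t>(kDebugMode1) : 0u);
}
```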
#3
08/10/2005 (8:08 am)
There's a whole list of bits in the reference doc you can turn on for debugging information in a scenegraph.
#4
08/10/2005 (8:13 am)
This might help :)
#5
08/10/2005 (11:21 am)
Quote: sized 1 X 1

That might be your problem. The engine likes nice power-of-2 sized sprites: 2, 4, 8, 16, 32, 64, 128, 256, 512. I'm not sure it handles 1 correctly, even though 1 is a power of 2. I've also heard that sizes over 512 cause problems and that older cards don't like anything over 256.
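A small C++ sketch of the power-of-two check mentioned above, plus rounding a dimension up to the next power of two (for example, when padding artwork before it becomes a texture). Note that 1 is 2^0, so it passes the check; the 256/512 limits in the post are second-hand and are not encoded here. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when v is a power of two (1, 2, 4, ..., 512, ...).
constexpr bool isPowerOfTwo(std::uint32_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Smallest power of two >= v. Assumes 1 <= v <= 2^31.
std::uint32_t nextPowerOfTwo(std::uint32_t v) {
    assert(v >= 1 && v <= (1u << 31));
    --v;  // so exact powers of two map to themselves
    // Smear the highest set bit into every lower bit...
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v + 1;  // ...then step up to the next power of two
}
```

For example, `nextPowerOfTwo(300)` returns 512 and `nextPowerOfTwo(200)` returns 256, so a 300x200 image would be padded to 512x256.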

Sadly, I tried finding some links that discuss it in more depth, but my Google karma appears to be low today. Maybe someone else will have better luck.
#6
08/10/2005 (12:22 pm)
Good info guys, thanks!

@Phil, I was setting the size with fxsceneobject.SetSize(x,y), which I believe is not a pixel size but a size relative to the screen... I don't know exactly how big it is, but I'm sure it's a lot bigger than 1 pixel.

Though it may be interesting to use different-sized images for the sprites and see how that affects things...
#7
08/11/2005 (4:19 am)
About T2D performance: over the last three days I actually coded an add-on batch renderer. It batches all the rendering and does within-layer sorting to minimize state changes, moving away from the immediate-mode rendering that T2D currently uses.

I had hoped that rendering each layer with glDrawArrays() would help performance, but it did not on my configuration, which I conclude is fully CPU-limited (and not limited by anything else). I found that odd, surprising and disappointing at the same time. (Unfortunately, I didn't have a profiler.)

Maybe systems with weaker graphics cards than mine could benefit, though, since the number of GL commands issued and the number of state changes drop drastically (especially with the tilemaps). I sorted on primitive type (quads/lines), texture and blend mode, and then rendered as many quads/lines as possible in one go.
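A sketch of the sort-then-batch step described above, assuming each queued draw within one layer records its primitive type, texture and blend mode (the struct and the numeric codes are hypothetical, not T2D's). Packing the fields into one integer key makes sorting cheap, and each run of equal keys becomes one batch, i.e. one glDrawArrays() call in the scheme the post describes. Reordering like this is only safe where draw order inside the layer does not matter, e.g. sprites that don't overlap with blending.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One queued draw within a single layer (hypothetical layout).
struct DrawItem {
    std::uint8_t primitive;   // e.g. 0 = quad, 1 = line (hypothetical codes)
    std::uint32_t texture;    // GL texture name
    std::uint8_t blendMode;   // hypothetical blend-mode index
};

// Pack the fields so one integer compare orders by primitive,
// then texture, then blend mode (bits 40+, 8..39, 0..7).
std::uint64_t sortKey(const DrawItem& d) {
    return (std::uint64_t(d.primitive) << 40) |
           (std::uint64_t(d.texture) << 8) |
           std::uint64_t(d.blendMode);
}

// Sort one layer's items and return the number of batches, where a batch
// is a run of consecutive items sharing the same key.
std::size_t sortAndCountBatches(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return sortKey(a) < sortKey(b);
              });
    std::size_t batches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || sortKey(items[i]) != sortKey(items[i - 1])) {
            ++batches;  // state changes here: start a new batch
        }
    }
    return batches;
}
```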

Anyway, I will make some more attempts to optimize rendering, but that will need more changes to the main code, which I was able to avoid in this setup: I only needed to replace every glCommand() with glx->glCommand() and then flush a batch when needed.

Test system: AMD2700+ 512Mb / Radeon 9800Pro, game: EggStatic.

Does anyone else have thoughts on how to improve rendering performance for both low-end and high-end systems?
#8
08/11/2005 (8:00 am)
Would you be willing to release your diff, even if only privately in e-mail to me? I'm close to finishing a sample title and then I want to port T2D to TSE, which will require both the batching work you've just completed, plus a port to GFX. It sounds like you've done at least half, if not both parts. Anyway, my e-mail address is: jason_cahill@verizon.net.
#9
08/11/2005 (8:27 am)
Quote: I was setting the fxsceneobject.SetSize(x,y), which I believe is not actually a pixel size, but size relative to the screen... I don't know how big it is, but yeah a lot bigger than 1 pixel I'm sure.
The lower performance makes sense then, because T2D has to scale the original image down. This can be pretty intensive, especially going from 512x512 to 1x1. In practice, if you wanted 200 1x1 sprites, it would make sense to create an image that is already scaled.

I wonder whether T2D optimizes for the fact that they are all the same image scaled to the same size? Hmm... sounds like a quest.

Of course in practice that isn't the kind of optimization that would really gain you much.
#10
08/11/2005 (12:13 pm)
Quote: Which I found odd, surprising and disappointing at the same time.

It isn't surprising. Few 2D games render enough to make even a TNT2 Ultra break a sweat (potential video memory issues aside, of course). Rendering is almost never going to be your bottleneck unless you've got lots of particle effects and so forth going on.

Quote: I want to port T2D to TSE

Just curious: why do you want to port T2D to TSE? I could understand wanting to do some shader work in 2D games, but it would be far easier to add OpenGL shader support to T2D than to port the backend renderer to TSE. And the stock TSE shaders aren't going to work in a 2D game environment, so you'll have to write new ones anyway.

Quote: I wonder if T2D optimizes on the fact that they are all the same image being scaled to the same size?

It doesn't really work that way. T2D doesn't do any scaling; it's just sending triangles to the graphics card. It's the graphics card that's doing the scaling (and it's pretty fast at it. Indeed, it always does scaling, even if it doesn't have to).
#11
08/11/2005 (12:51 pm)
Quote: It doesn't really work that way. T2D doesn't do any scaling; it's just sending triangles to the graphics card. It's the graphics card that's doing the scaling (and it's pretty fast at it. Indeed, it always does scaling, even if it doesn't have to).

Ah yes! Good point. I always forget that even though it's 2D, it's hardware accelerated.

Is it really the texture overlap that is saving so much performance?
#12
08/11/2005 (1:01 pm)
I talked with some "real" game engine programmers (I don't consider myself close), and they said that a slowdown at about 400 sprites on screen is very bad.

I know that T2D is an early-adopter release, so I'm pretty sure performance will be addressed before release...

But it would be good if, by release time, there were some documentation on "how to optimize performance in your game" that gave some dos and don'ts...

Anyway, I'm glad I brought this subject up; there's a lot of great discussion around it :)
#13
08/11/2005 (8:56 pm)
Yeah the engine isn't fast, but hopefully it'll improve. It's not doing any sort of optimizing at all yet, so there's plenty of ways to make it faster. I'm sure Melv has some plans. :)
#14
08/11/2005 (10:10 pm)
From what I see, T2D would need a serious restructuring to do so. Iterating over objects and calling object->render(), while nice and clean, is clearly not that fast. For better performance, it seems the data would have to be organized in a way closer to how the objects are rendered.

If T2D were no longer EA, I would be happy to do so...

@Smaug: Could I get your email and ask you a question before I start another 3-day learning course in rendering? :)
#15
08/12/2005 (7:38 am)
So, stepping back and thinking about this... assuming you are using the standard screen size of 100x75, your first two tests are actually better than I would have expected. A 50x25 sprite covers 1/6 of the pixels on the screen (1/2 of the width * 1/3 of the height = 1/6 of the screen). 300 sprites that large means 50x overdraw per frame before any slowdown is noticeable. That's phenomenal!
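The overdraw arithmetic above, as a small C++ check. It assumes the 100 x 75 view the post assumes and ignores the part of each sprite that hangs off screen (with centers spread over -40..40 by -30..30, a 50x25 sprite can extend past the edges), so real overdraw is somewhat lower. The function names are hypothetical.

```cpp
#include <cassert>

// Fraction of the visible area one w x h sprite covers, in world units.
// Defaults assume the 100 x 75 view from the post.
constexpr double screenFraction(double w, double h,
                                double viewW = 100.0, double viewH = 75.0) {
    return (w / viewW) * (h / viewH);
}

// Average number of times each screen pixel is drawn when n such sprites are
// fully on screen ("screens" of overdraw). Off-screen clipping is ignored.
constexpr double overdraw(int n, double w, double h) {
    return n * screenFraction(w, h);
}
```

`overdraw(300, 50, 25)` comes out at about 50 screens, while `overdraw(300, 1, 1)` is only 0.04, which is why the 1x1 results point at per-sprite overhead rather than fill rate.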

Remember that two factors determine video performance: total number of polygons and total fill rate. It's pretty clear from the first test that fill rate is not an issue. For the second test, if you could make one minor change (make the image map for the 1x1 sprites use a 16x16 texture), I bet 200 would jump up quite a bit. But, as Edo points out, it appears that we are limited by a total polygon cap because of inefficiency in the way we render.

I can't speak to good programming patterns with OpenGL, but I can speak to DirectX: the DirectX documentation specifically states in its performance section that batching is critical. They recommend that you not call DrawPrimitive (the basic command for submitting polys) more than 300 times per frame. So, if you want lots and lots of polys, you need to send large quantities in a single call, using the same state and texture data.

As I said, I've been interested in moving to TSE because the thought of pixel and vertex shaders in 2D is really cool, not because it's practical. The first step in moving to GFX would be to pivot all render calls so that you basically set an ImageMap's texture once, batch every object that uses it into one draw call, and then move on.
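A sketch of the "set the texture once, then draw everything that uses it" pivot, assuming a hypothetical `SceneObject` that only knows its image-map name. The number of buckets is the number of draw calls per frame under this scheme, which is the figure the ~300 DrawPrimitive guideline quoted above would be measured against. This ignores layer order; a real renderer would have to bucket within each layer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal scene object: which image map it draws from, and where.
struct SceneObject {
    std::string imageMap;
    float x, y;
};

// Bucket objects by image map so each texture is bound once and all of its
// objects can be submitted in one call. The returned pointers refer into
// `objects`, so that vector must outlive the buckets.
std::map<std::string, std::vector<const SceneObject*>>
groupByImageMap(const std::vector<SceneObject>& objects) {
    std::map<std::string, std::vector<const SceneObject*>> buckets;
    for (const SceneObject& obj : objects) {
        buckets[obj.imageMap].push_back(&obj);
    }
    return buckets;
}
```

With this grouping, 300 enemies that all use enemyship1 collapse into a single bucket: one draw call per frame instead of 300.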

Anyway, my basic point is: 300 sprites per frame is not bad in your first and second cases. Your 3rd and 4th cases show that we could benefit from some optimization.
#16
08/12/2005 (12:35 pm)
Quote: assuming you are using the standard screen size of 100x75, your first two tests are actually better than I would have expected.

I want to hear back from him about his screen size. We should be talking in pixels, not % of screen or T2D standard units. Pixels are the units that graphics cards care about, and all tests should be done with a 1:1 T2D unit-to-pixel ratio.

Quote: I can't speak to good programming patterns with OpenGL, but I can speak to DirectX: the DirectX documentation specifically states in its performance section that batching is critical. They recommend that you not call DrawPrimitive (the basic command for submitting polys) more than 300 times per frame. So, if you want lots and lots of polys, then you need to send large quantities in a single call, using the same state and texture data.

I can speak to OpenGL patterns. While batching is nice and does provide a performance boost, the most important thing (in either API, really, though for slightly different reasons) is to reduce the number of state changes: matrix alterations, changing the texture (that one in particular), etc. In many ways it boils down to the same optimizations, except that in GL it's (relatively) OK to call glEnd-equivalent calls (whether actually glEnd, glDrawArrays or glDrawElements) thousands of times per frame. D3D has some inefficiencies in how it marshals driver calls that make its equivalent calls (DrawPrimitive) more expensive than they need to be. What matters in GL is that you, for example, make sure your tilemaps all come from the same texture image (or perhaps 2 images, but no more than that), or that you combine sprite imagemaps so that most, or all (depending on how big a texture you use), of your sprites come from the same texture. Once you do that, it's just a matter of going into the engine and sorting by texture where possible (obviously draw order and layers take higher priority).
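A sketch of "sort by texture where possible, but layers take priority": a stable sort with the layer as the primary key and the texture as the secondary key, plus a counter for texture changes in draw order. `Sprite` is hypothetical, and in this sketch lower layer numbers are drawn first; flip the comparison if your engine draws layers the other way. As with any within-layer reordering, it assumes sprites in the same layer don't depend on each other's draw order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical queued sprite: its layer (draw order) and texture.
struct Sprite {
    int layer;    // lower layers are drawn first in this sketch
    int texture;  // texture ID
};

// Count how many times the bound texture changes when drawing in this order.
std::size_t countTextureBinds(const std::vector<Sprite>& order) {
    std::size_t binds = 0;
    for (std::size_t i = 0; i < order.size(); ++i) {
        if (i == 0 || order[i].texture != order[i - 1].texture) {
            ++binds;
        }
    }
    return binds;
}

// Reorder by texture only inside each layer. Layer is the primary key, so
// layers keep their relative order; stable_sort keeps sprites with the same
// layer and texture in their original submission order.
void sortForFewerBinds(std::vector<Sprite>& sprites) {
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) {
                         if (a.layer != b.layer) return a.layer < b.layer;
                         return a.texture < b.texture;
                     });
}
```

For a draw order of layer 0 using textures 1, 2, 1 followed by layer 1 using textures 2, 1, 2, the texture changes drop from 6 to 4, and every layer-0 sprite is still drawn before every layer-1 sprite.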

But, once again, this all assumes that rendering is the bottleneck here. Always find out where the bottleneck is before repairing it.

Quote: As I said, I've been interested in going to TSE because the thought of pixel & vertex shaders in 2D is really cool, not because it's practical.

You don't need TSE to have vertex and fragment shaders in 2D. OpenGL is quite capable in these regards, and it's easy enough to add such functionality directly to T2D.
#17
08/12/2005 (1:30 pm)
@Smaug and all,

I was using the setup from the "Basic Tutorial", nothing different from what I mentioned.

800x600 resolution, windowed.
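Converting the test setup to pixels, as post #16 asks: this sketch assumes the Basic Tutorial camera covers 100 x 75 world units (the figure assumed in post #15) in the 800x600 window reported here. Under that assumption one unit is 8 pixels on each axis, so the "1 X 1" sprites were 8x8 pixels and a 50 x 25 sprite fills 400x200 = 80,000 pixels. The function names are hypothetical.

```cpp
#include <cassert>

// Pixels per world unit along one axis: window size in pixels divided by
// the number of world units the camera shows along that axis.
constexpr double pixelsPerUnit(double windowPx, double cameraUnits) {
    return windowPx / cameraUnits;
}

// Pixels covered by one w x h sprite (world units), assuming an 800x600
// window and a 100 x 75 camera by default.
constexpr double spritePixels(double w, double h,
                              double winW = 800.0, double winH = 600.0,
                              double camW = 100.0, double camH = 75.0) {
    return (w * pixelsPerUnit(winW, camW)) * (h * pixelsPerUnit(winH, camH));
}
```

By this estimate, 300 of the 50 x 25 sprites amount to about 24 million pixels of fill per frame, before subtracting the parts that fall off screen.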

My computer is actually fairly beefy, though my video card isn't BLING BLING.
3.4 GHz P4 HyperThreaded
1 GB RAM
1 HD, 113 GB free space
Video is an ATI FireGL V3100 (2D professional card, not much 3D acceleration) with 128 MB RAM. Dual monitors, both running at 1280x1024, one @ 60 Hz refresh (LCD), the other @ 100 Hz refresh.

...........

As for how I am doing rendering, I am effectively calling "create enemy" many times per scheduled loop. Here is my code (it's T2D.NET, but you'll get the gist of it):

/// <summary>
/// Creates a batch of new enemies, then schedules CreateEnemy to be called
/// again after (2000 / batch size) milliseconds.
/// </summary>
/// <param name="arguments">not used by this method</param>
/// <returns>always null</returns>
public string CreateEnemy(string[] arguments)
{
	// inputArguments[6] is the batch size (e.g. "5"); parse it once per call
	int batchSize = int.Parse(this.inputArguments[6]);

	for (int i = 0; i < batchSize; i++)
	{
		Enemy newEnemy = new Enemy(this.gameScene, Client.imageMapCache["T2D/client/images/enemyship1"]);

		this.numberEnemies++;
		// log progress every 10 enemies
		if (this.numberEnemies % 10 == 0)
		{
			Torque.T2D.ConsoleFunctions.Output.Echo("ENEMIES CREATED == " + this.numberEnemies.ToString());
		}
	}

	// schedule this same function again, so it keeps looping
	SimFunctions.Schedule((int)(2000 / (double)batchSize),
		0,
		new CallDotNet.FunctionHookDelegate(this.CreateEnemy));

	return null;
}
public int numberEnemies = 0;

If you are actually looking at that code: this.inputArguments[6] is the string "5". I increase that number to stress the engine faster.
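The spawn rate this code produces, worked out in C++: each call creates n enemies and reschedules itself after (int)(2000 / n) milliseconds, so the rate is n * 1000 / delay enemies per second, roughly n * n / 2. This ignores scheduling jitter and any slowdown once the engine is under load; the function names are hypothetical.

```cpp
#include <cassert>

// Delay passed to Schedule() in the C# code above: (int)(2000 / n) ms,
// where n is the batch size read from inputArguments[6].
constexpr int scheduleDelayMs(int n) {
    return static_cast<int>(2000.0 / n);
}

// Enemies created per second: n enemies every scheduleDelayMs(n) ms.
// Roughly n * n / 2; ignores scheduling jitter and slowdown under load.
constexpr double spawnRatePerSecond(int n) {
    return n * 1000.0 / scheduleDelayMs(n);
}
```

With "5" the delay is 400 ms and the rate is 12.5 enemies per second; with "6" the delay is 333 ms and the rate is about 18 per second.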
#18
08/12/2005 (2:08 pm)
@Jason - I don't suppose you have the built-in profiler results for your tests, do you?

Or, if you haven't run it or don't know about it:
1. Open the console and type profilerenable(true);
2. Close the console and do some stuff.
3. Open the console and type profilerdump();

The results are in your console log.

There only seems to be a basic set of profile points included in the T2D engine code, but it is easy enough to add your own to whatever code you want.

#include "platform/profiler.h"

Enclose the code you want to profile in PROFILE_START/PROFILE_END pairs, which can be nested.

PROFILE_START(NameOfCodeSectionBeingProfiledToDisplayInLog);

...
code to be profiled
...

PROFILE_END();
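A standalone sketch of what the start/end pairing does, written as a C++ RAII scope guard. This is a hypothetical illustration, not Torque's platform/profiler.h: a `ScopedProfile` records the time when it is constructed and adds the elapsed time and an invocation count to its section when it is destroyed, so nested scopes nest the same way nested PROFILE_START/PROFILE_END pairs do.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-section statistics; NOT Torque's profiler data.
struct SectionStats {
    double totalMs = 0.0;   // accumulated wall-clock time in this section
    long invocations = 0;   // how many times the section was entered
};

// Table of sections keyed by name (function-local static, created on first use).
std::map<std::string, SectionStats>& profileTable() {
    static std::map<std::string, SectionStats> table;
    return table;
}

// The constructor plays the role of PROFILE_START, the destructor of PROFILE_END.
class ScopedProfile {
public:
    explicit ScopedProfile(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}

    ~ScopedProfile() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        SectionStats& stats = profileTable()[name_];
        stats.totalMs +=
            std::chrono::duration<double, std::milli>(elapsed).count();
        ++stats.invocations;
    }

    ScopedProfile(const ScopedProfile&) = delete;
    ScopedProfile& operator=(const ScopedProfile&) = delete;

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

A scope guard cannot be left unbalanced by an early `return`, which is an easy mistake to make with explicit start/end calls.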

Below is the basic profiler dump from running the spaceshooter demo for a few minutes:

Ordered by stack trace total time -
% Time  % NSTime  Invoke #  Name
  0.814 -99.186        0 ROOT
100.000  75.765  3336922   MainLoop
 24.102   0.102    25603     ProcessTimeEvent
 15.617   0.046    25603       RenderFrame
 12.743  10.668    25603         CanvasRenderControls
  1.873   1.873   131504           DrawText
  0.123   0.123   308353           MemoryAlloc
  0.079   0.079   308353           MemoryFree
  0.001   0.000      878           MemoryRealloc
  0.000   0.000      617             MemoryFree
  0.000   0.000      617             MemoryAlloc
  2.654   2.654    25603         SwapBuffers
  0.174   0.174    25603         CanvasPreRender
  8.116   8.095    25603       ClientProcess
  0.014   0.014    61615         MemoryFree
  0.006   0.006    24722         MemoryAlloc
  0.001   0.000     1197         MemoryRealloc
  0.000   0.000     1192           MemoryFree
  0.000   0.000     1192           MemoryAlloc
  0.231   0.219    25603       SimAdvanceTime
  0.009   0.009    45198         MemoryAlloc
  0.002   0.002    11340         MemoryFree
  0.000   0.000      377         MemoryRealloc
  0.000   0.000      376           MemoryFree
  0.000   0.000      376           MemoryAlloc
  0.032   0.032    25603       ClientNetProcess
  0.003   0.003    25603       ServerProcess
  0.001   0.001    25603       ServerNetProcess
  0.134   0.126      625     ProcessInputEvent
  0.005   0.005    15208       MemoryAlloc
  0.002   0.002    10656       MemoryFree
  0.000   0.000        2     MemoryAlloc
  0.000   0.000        2     MemoryFree
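A reading aid for the dump, under the assumption (consistent with the numbers above) that "% NSTime" is the time spent in a section itself, i.e. its "% Time" minus the "% Time" of its direct children. For RenderFrame: 15.617 - (12.743 + 2.654 + 0.174) = 0.046, matching the dump. Because the dump rounds to three decimals, comparisons need a small tolerance. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Self time of a profiler node: its inclusive "% Time" minus the inclusive
// "% Time" of its direct children (assumed meaning of "% NSTime").
double selfTime(double inclusive, const std::vector<double>& childInclusive) {
    double children = 0.0;
    for (double c : childInclusive) {
        children += c;
    }
    return inclusive - children;
}

// The dump prints three decimals, so allow for rounding when comparing.
bool matchesDump(double computed, double printed, double tolerance = 0.002) {
    return std::fabs(computed - printed) <= tolerance;
}
```

By the same reading, MainLoop's 75.765% NSTime suggests that about three quarters of MainLoop's time falls outside any instrumented child section, which is why adding your own PROFILE_START/PROFILE_END pairs is what would show where that time goes.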
#19
08/12/2005 (9:43 pm)
Just another thing about my add-on batching and state minimizing. The drawback of this late-stage sorting is, of course, that CPU use went up and rendering speed went down. So on my system, rendering is CPU-bound and not (1) fill-rate or (2) geometry limited. Batching can only help in case (2)... and, as Smaug suggested, even a TNT2 probably never runs into that limitation with T2D, though maybe some integrated chipsets do.

So, at this point, batching and minimizing state changes seem useless without addressing CPU usage first. By the way, this seems to be very common with any modern GPU.

See, for example, the OpenGL FAQ entry 22.090, which confirmed exactly what my tests showed:
Quote: Are display lists or vertex arrays faster?

Which is faster varies from system to system.

If your application isn't geometry limited, you might not see a performance difference at all between display lists, vertex arrays, or even immediate mode.

Actually, Nvidia recommends that you can throw in all kinds of nice effects, such as shaders... as they will come for free as long as you are only CPU limited!
#20
08/12/2005 (10:37 pm)
@David, I was planning on using .NET performance counters for my perf work in the future, but I didn't know about Torque's built-in profiling functions.

If I had to choose, I'd put more faith in the built-in .NET tools, though the Torque profiler is a great way to get benchmark numbers that compare TorqueScript with my T2D.NET implementation.

I probably won't bother adding profiling to my stress setup, but I'll definitely add it to my implementation of the tutorial so I can compare render times.