Bottleneck - processor, graphics or network bandwidth?
by Mike Stoddart · in Technical Issues · 10/03/2002 (6:29 pm) · 5 replies
In a game like Torque, where exactly is the bottleneck when there are a lot of players connected:
- the server's processing time
- the client's graphical update time
- the network bandwidth
I always assumed that it was in the server's processor time, in that the server will eventually become bogged down in CPU cycles.
#2
10/03/2002 (8:54 pm)
Would it help much if another server was attached to share the processing power? For example, the two servers could share the calculations for objects, and inform each other (and the clients) of updates to the ones they control.
If that makes sense!
#3
10/03/2002 (9:30 pm)
Certainly... Of course, you'd have to do some fundamental tweaking to Torque's networking model to get such a thing to work ;)
Distributed systems may also be difficult to get running stably and reliably on a millisecond time scale; the amount of interaction needed between servers for a full-on physics simulation can be tremendous, so they'll need a very fast and wide connection.
You might want to ask yourself whether it would be better to buy a quad-Xeon server box than to re-engineer things to run on multiple boxes.
#4
10/03/2002 (9:57 pm)
Quote: Would it help much if another server was attached to share the processing power? For example the two servers shared the calculations for objects, and informed each other (and the clients) of updates to the ones they control?
If that makes sense!
Depends on what you mean by "share". If you mean creating a parallel processing system where each server calculates some of the work, with no duplication between them, then yes. But that is probably WAY more work than is worth undertaking.
Occam's Razor dictates that throwing hardware at the problem would be less work and less money (if, in your case, time == money). :)
The problem is that one server could not process some objects while another machine processes other objects if those objects were destined to interact, because both machines would have to know the complete current state of all the "interesting" objects in the simulation.
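To make that concrete, here is a minimal sketch (all names and numbers are invented, and the geometry is a 1-D toy world, not Torque's model): if objects are split between two servers by position but can still interact across the split, every object within interaction range of the boundary has to be mirrored on the other server.

```python
# Toy illustration: split 200 objects on a 100-unit line between two
# servers at x = 50, then count how many objects sit close enough to
# the boundary that their full state must be mirrored on the other side.
import random

random.seed(1)
INTERACT_RADIUS = 10.0  # assumed maximum range at which objects interact
BOUNDARY = 50.0

objects = [{"id": i, "x": random.uniform(0, 100)} for i in range(200)]
server_a = [o for o in objects if o["x"] < BOUNDARY]
server_b = [o for o in objects if o["x"] >= BOUNDARY]

# Any object within INTERACT_RADIUS of the boundary might interact with
# an object owned by the other server, so its state must be kept in sync
# on both machines every tick.
mirrored = [o for o in objects if abs(o["x"] - BOUNDARY) < INTERACT_RADIUS]

print(f"server A: {len(server_a)}, server B: {len(server_b)}, "
      f"mirrored on both: {len(mirrored)}")
```

With uniformly scattered objects, roughly a fifth of the world ends up duplicated on both machines in this toy setup, and that fraction only grows as interaction ranges grow.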
Most parallel systems I have worked on broke the work up into bits that did not rely on other machines and only aggregated the data at the end of the calculations.
Take the game simulation layer: it would be very difficult to parallelize. If you split it up into layers, it might benefit extremely large worlds where each terrain is on a different server, and when players approach a terrain boundary they are mirrored on the other server until they are no longer visible on the original. But what do you do about "far-reaching" visibility like radar?
More sophisticated parallelizing of the actual simulation (collision detection, particle generation, physics simulation, and so on) is so tightly coupled to TIME, and to the rest of the code, that parallelizing it would probably slow things down tremendously unless you had THOUSANDS of "interesting" objects; and at that point the internet as it stands would limit what you could send back to the clients in the form of simulation information in real time.
A good example: multi-threading on some operating systems is so inefficient in some cases that the overhead of creating the threads, managing the context switching, and so on actually SLOWS down the system dramatically in typical, non-dramatic cases.
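A quick way to see this for yourself is to time a batch of trivially small tasks run serially versus with one thread spawned per task. This is a rough sketch, not a benchmark, and the absolute timings will vary by machine; but when each unit of work is tiny, the thread creation and scheduling overhead reliably dwarfs the work itself:

```python
# Compare running 2000 trivial tasks serially vs. spawning one thread
# per task. The per-task work (one multiply) is far cheaper than the
# cost of creating and joining a thread.
import threading
import time

def tiny_task(results, i):
    results[i] = i * i  # deliberately tiny unit of work

N = 2000

results = [0] * N
start = time.perf_counter()
for i in range(N):
    tiny_task(results, i)
serial = time.perf_counter() - start

results = [0] * N
start = time.perf_counter()
threads = [threading.Thread(target=tiny_task, args=(results, i)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial: {serial:.4f}s  one-thread-per-task: {threaded:.4f}s")
```

The lesson generalizes: parallelism only pays when each parallel unit of work is large enough to amortize the coordination overhead.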
Then you have the network overhead of syncing the servers.
A good example is an application I wrote to render frames in parallel in Lightwave, to replace the functionality that NewTek removed when it released a newer version of ScreamerNet way back in the day.
It did things like quadrant rendering per machine, single scan line per machine, or arbitrary sub-image rendering per machine on a frame-by-frame basis. A master thread then composited all the sub-frame renders into the final frame. After I spent all that time creating this sophisticated tool, I found out why NewTek took the functionality out.
Unless the scene was contrived (and most were "real world" scenes, far from contrived), in almost every case a small part of the frame would take more time than all the rest of the frame combined. So I had a single processor working overtime on one small part while the rest of the system sat waiting. I refactored the master compositor to cache the results and combine them asynchronously. This improved things greatly, but after some testing I realized that all this work got me, in rare cases, a 10% performance increase on a single frame! Overall render times for a sequence were practically the same as simply having each of the 4 processors render a whole frame of its own. It was actually SLOWER when using full-frame pixel filters and the like, because multiple processors internally rendered duplicate information!
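The load-imbalance problem described above can be sketched in a few lines (the tile costs are invented, not measured from Lightwave): when a frame is split into tiles and rendered in parallel, the frame finishes only when the slowest tile does, so one hot spot caps the speedup no matter how many processors you add.

```python
# Sketch of load imbalance in parallel tile rendering: 4 tiles on
# 4 processors, but one tile contains a slow region of the frame.
tile_costs = [1.0, 1.0, 1.0, 7.0]  # seconds per tile; one "hot" tile

single_processor = sum(tile_costs)  # one machine renders every tile
four_processors = max(tile_costs)   # in parallel, you wait for the slowest tile
speedup = single_processor / four_processors

print(f"1 cpu: {single_processor:.1f}s, 4 cpus: {four_processors:.1f}s, "
      f"speedup: {speedup:.2f}x (ideal would be 4x)")
```

With these numbers the speedup is about 1.4x instead of the ideal 4x, which is exactly the "one processor working overtime while the rest wait" effect.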
This got me very interested in what I do today: distributed systems development and concurrent processing.
Get a good book on concurrent processing and read up on the processing overhead, the pitfalls, and most importantly the hidden development effort of concurrent processing, especially in a distributed environment.
#5
10/04/2002 (2:45 pm)
Sounds like you guys need a new tool. We may have exactly the kind of thing you are looking for: distributed architecture and low latency. We are looking to demo the product. Does anybody have a working engine we could use for a compelling demo, to prove the soundness of a truly massive architecture?
Torque Owner Jarrod Roberson
Since the server does not render much of what the client renders, it does less work there, but much more elsewhere, because it is mediating lots of data between all the clients.
Lots of render-expensive things can slow the client down without having any impact on the server.
Likewise, lots of non-render game-state data can put a heavy load on the server: each client only has to deal with that data for itself, while the server has to do that much work * the number of clients.
Network issues would not be as much of a problem if the server has a big enough pipe to handle all the clients plus some overhead.
Then you have hardware issues: a misconfigured or underpowered machine/connection on the server will affect everyone, while clients with underpowered machines/connections will not affect everyone, just themselves.
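The "work * number of clients" point can be put in back-of-the-envelope numbers (the update size and tick rate here are invented, not Torque's actual values): each client sends its own updates once, but a server relaying state must send every client's updates to every other client, so its outbound bandwidth grows roughly with the square of the client count.

```python
# Rough estimate of server outbound bandwidth for a naive relay server:
# each tick, every client receives an update about each other client.
UPDATE_BYTES = 40  # assumed size of one object-state update
TICK_RATE = 10     # assumed server ticks per second

def server_outbound_bytes_per_sec(clients):
    # n clients each receive (n - 1) updates per tick
    return clients * (clients - 1) * UPDATE_BYTES * TICK_RATE

for n in (8, 32, 128):
    print(f"{n:4d} clients -> {server_outbound_bytes_per_sec(n):>10,} bytes/sec")
```

Going from 8 to 128 clients is a 16x increase in players but roughly a 290x increase in outbound traffic under these assumptions, which is why the server's pipe (and the CPU feeding it) becomes the limit long before any single client does. Real engines cut this down with interest management and delta compression, but the quadratic shape is the starting point.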
I don't think there is one single metric that will determine the performance of a game in the wild.