Rexxenexx

Enable the arma2server.exe to use Cuda cores


Forgive me if I'm wrong, but first off, CUDA is meant for graphics rendering, not server-side calculations.

Secondly, it would be a waste of time, as most server boards with onboard graphics don't even have CUDA capabilities.


It's for any processing. Check out http://www.nvidia.com/object/cuda_home_new.html

Also it wouldn't be a waste of time since a majority of users don't own real rack style servers. Typically a user will have a dedicated graphics card. Nvidia owns most of that market, so much so that developers are including Cuda enhancements in new software.


If I understand correctly :rolleyes: the question is: for someone who runs the server via arma2server.exe and has an Nvidia graphics card, would the CUDA drivers help?

( or I did not understand correctly :( )

Who runs the server via arma2server.exe and has an Nvidia graphics card?

I do. I have two other computers with Nvidia cards other than the one I use now. One has 8800GTX the other a 9600GT. Both support Cuda.

Would the CUDA drivers help?

Yes. Quad or dual core Intel vs 300-500 or so Cuda cores. Now it's not as simple as 4 vs 500, but the power to offload some processes (er, a bunch of processes) from the CPU is there. Free! Weeeeeeeee!!!! Your mattress is FREE!!!!

Maybe there should be a poll of how many of us are using an old computer with an Nvidia card as a dedicated server. If arma2server.exe could offload some processes to the GPU, which is otherwise sitting unused, it would increase the FPS on the server. The situation being: a user at home has two computers. Both are normal, average, everyday computers. Both have Nvidia cards. The user runs arma2server.exe on one to serve the game, and plays the served game on the other. Is that clearer? If not, Google translate it.

More reading = http://en.wikipedia.org/wiki/CUDA <---Edgeamacashun

BTW it's 3:30AM here :yay:


Rexxenexx

Thanks for the detailed response! :thumbsup: :)

It's for any processing. Check out http://www.nvidia.com/object/cuda_home_new.html

Also it wouldn't be a waste of time since a majority of users don't own real rack style servers. Typically a user will have a dedicated graphics card. Nvidia owns most of that market, so much so that developers are including Cuda enhancements in new software.

I totally disagree with that.

I think most users use real rack style servers or we'd have crap performance on all missions.

Look at any of the popular or bigger servers, I doubt they are hosted on a second box somebody has. Most users don't have a good enough connection to host.


CUDA is a waste of time ...

it's a proprietary architecture from NVIDIA, specific to a single hardware vendor

in today's world, if you're thinking about 'additional' computing on the GPU, then OpenCL is the best way to go

http://www.khronos.org/opencl/

it's an open standard, widely adopted by hardware and software developers ...

anyway, use of GPGPU computing for a 'game server' is highly unlikely at the moment

as most 'usual dedicated servers' have GPUs without the needed computing power/features

thus the key direction for the dedicated server is multithreading, to utilize more CPUs and CPU cores

for clients or user-built servers with powerful GPUs, the power of GPGPU will be used e.g. for physics (like the open-source Bullet via OpenCL)


I'm sorry, but do I understand this right?

In both cases (CUDA and OpenCL) it's quite possible that I could run a client PC with one graphics card for ArmA (let's say ATI) and another card or cards (let's say Nvidia)

and configure the Nvidia cards to offload my CPU?


Yeah, in either case (CUDA or OpenCL) you could have three supported cards and almost totally free the CPU. Personal experience shows me that, no matter what, when AI position and handlers on the units (like first-aid or some other script) multiply/spawn, my CPU can't take it and the server's FPS drops. The bandwidth and RAM usage don't change much (looking at it with #monitor). So any form of affordable processing enhancement to cycle through code faster is basically what I'm looking for.

More info on OpenCL (pretty much same link above): http://www.nvidia.com/object/cuda_opencl_new.html


You quite misunderstood parallel computing. CUDA or any of these APIs is not a silver bullet which boosts anything easily. The real question is whether you have anything to boost at all: what does the game spend most of its time on, and can that be implemented on a massively parallel architecture?

Simulations usually have lots of parallel processes, but sometimes the parallelism is harder to exploit than it looks. One may consider e.g. putting path finding on the GPU, but is the increased communication cost worth it? In reality, GPU applications perform well only on quite specific problems (e.g. multiplying a large dense matrix with another), and sometimes even on strongly parallel problems the GPU is beaten by the CPU, especially when lots of program flow control is necessary (e.g. when the matrix is sparse).

Considering the fact that the A2 server currently uses about 4 CPU-active threads which can easily run on about 2 cores, there is plenty of unused processing power for ArmA2 even on current CPUs. I am happy if the devs increase the parallelism of the engine to just twice the amount, and really use 4 cores.

Even 2x is hard, and it is possible that it cannot be done without rewriting significant portions of the entire game. I doubt that asking them to put the architecture on 400 threads is realistic when they could manage only 4 at the moment.
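
To put rough numbers on the "4 threads vs 400" point, here is a small sketch using Amdahl's law. The 75% parallel fraction is an invented assumption for illustration only, not a measurement of the A2 server, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can be spread over workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed for illustration only: 75% of a server frame can run in parallel.
ASSUMED_PARALLEL_FRACTION = 0.75

for workers in (2, 4, 8, 400):
    speedup = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, workers)
    print(f"{workers:>3} workers: at most {speedup:.2f}x")
```

Under that assumption even 400 workers stay below 4x, because the serial quarter of the frame dominates. The real fraction would have to be profiled.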

You quite misunderstood parallel computing. CUDA or any of these APIs is not a silver bullet which boosts anything easily.

I never said it would be easy. I do understand parallel computing enough to know off loading processes to one or more Nvidia cards would benefit greatly on a large multiplayer server. Especially if there are lots of AI.

I am happy if the devs increase the parallelism...

That's all I'm suggesting. I'm sure they're already on it but it would be best if they push beyond what is available now. We do need more processing power when it comes to the AI on a dedicated server. I don't see much of a need for single player or local hosting, but the reality is in three years our quad cores are going to be jokes for a server. BIS needs the motivation to natively/dynamically support multicores/multithreading beyond 4 or even eight in the future server application.

Nonetheless, OpenCL is a great idea since it's supported by both AMD and Nvidia.


To a layman the AI (1 ai, 1 thread) and physics (1 object, 1 thread) sound exactly like tasks that could be easily processed in parallel with a crazy amount of threads. Like a thousand, instead of one.

The engine is already capable of off-loading AI and physics to clients... why not to other threads on the same machine?
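
A sketch of what "1 AI, 1 task" could look like, and where the catch is. Each agent's step is only independent if it reads a frozen snapshot of last tick's state rather than the live state other agents are writing to. Everything here (`Agent`, `step_agent`, `step_world`, the "walk toward the nearest other agent" rule) is made up for illustration and says nothing about how the real engine works. Also note that CPython threads won't speed up pure-Python math because of the GIL; the point is the structure, not the timing.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Agent:
    x: float
    y: float


def step_agent(index: int, snapshot: Sequence[Agent], speed: float = 1.0) -> Agent:
    """Compute one agent's next state from a read-only snapshot of the last tick."""
    me = snapshot[index]
    others = [a for i, a in enumerate(snapshot) if i != index]
    if not others:
        return me
    # Hypothetical rule: walk toward the nearest other agent.
    target = min(others, key=lambda a: (a.x - me.x) ** 2 + (a.y - me.y) ** 2)
    dx, dy = target.x - me.x, target.y - me.y
    dist = (dx * dx + dy * dy) ** 0.5
    if dist <= speed:
        return Agent(target.x, target.y)
    return Agent(me.x + speed * dx / dist, me.y + speed * dy / dist)


def step_world(agents: Sequence[Agent], workers: int = 4) -> List[Agent]:
    """Advance every agent one tick; tasks share nothing but the frozen snapshot."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: step_agent(i, agents), range(len(agents))))


world = (Agent(0.0, 0.0), Agent(10.0, 0.0), Agent(0.0, 3.0))
print(step_world(world))
```

The parallel result matches a plain sequential loop only because no task writes to shared state. Once agents need to react to each other within the same tick (collisions, shared targets), that independence goes away and the "easy" part gets hard.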

I never said it would be easy. I do understand parallel computing enough to know off loading processes to one or more Nvidia cards would benefit greatly on a large multiplayer server. Especially if there are lots of AI.
What makes you so sure it would help in this particular case?

I mean, sure, if done correctly there should be some difference, BUT would the difference be huge enough to make it worth the cost?

I am not convinced of that.

To me, the best way seems to be concentrating on CPU/threading optimizations (and maybe even program code optimizations in general), not CUDA or OCL.

The engine is already capable of off-loading AI and physics to clients... why not to other threads on the same machine?
What?


Yes. The difference would be massive! Imagine server power from your second-to-last computer. That's HUGE! Using a multicore/hyperthreading CPU alongside a GPU or three, processing massive amounts of player locations, physics, and AI. I'd say that's the future. Remember when everyone was saying OFP is an old engine and that's why it's so slow and lags on everything? The idea I originally brought up supports advancing the engine. "Would it be worth the cost?" YES, absolutely. All you two or three naysayers, keep it to yourself. Try to contribute an idea, not what is being done already. So far I have only heard one idea other than mine. Think outside the box. I don't want ArmAIII to have the same problems as OFP, ArmA, and ArmAII.

off loading processes to one or more Nvidia cards would benefit greatly on a large multiplayer server. Especially if there are lots of AI

You seem to assume there is some automatic way to move processes from the CPU to the GPU. The truth is that they are quite different at the hardware level, and the potential speedup depends entirely on the nature of the computation.

We can only guess, as we do not know what the actual bottleneck in the engine is. "AI" usually means pathfinding in current games, and the CPU hit of AI in A2 is definitely mostly pathfinding too, as no other complex process (like learning) takes place.

Here is a study on accelerating A* with a GPU:

http://portal.acm.org/citation.cfm?id=1413968

The authors claim a significant speedup but, as usual, the details are quite important here. For example, they used small graphs with lots of agents, and even with that:

G0 and G1 workloads are of relatively low agent count and GPU performance scale is either none or insignificant. Speedup is substantially more noticeable for tens to hundreds of thousands of agents

Which A2 has nothing close to. And what if A2 is using huge graphs with a small amount of agents? It is possible that it changes the nature of the computation in a way which makes the GPU worse.

Probably the devs are just laughing at this, as the pathfinding is already multi-threaded and is not a limiting factor. You can easily check with Process Explorer that the A2 server uses 4 CPU-consuming threads, only one of them being a bottleneck.
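
For anyone wondering what the computation in question looks like, here is a minimal textbook A* on a 4-connected grid. It is a sketch, not the engine's pathfinder (which none of us can see); the grid encoding and the `astar` name are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path on a grid of '.' (free) and '#' (wall) cells."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell: Cell) -> int:
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f, cost so far, cell)
    best_cost: Dict[Cell, int] = {start: 0}
    came_from: Dict[Cell, Cell] = {}

    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost[current]:
            continue  # stale queue entry; a cheaper route was found already
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            new_cost = cost + 1
            if nxt not in best_cost or new_cost < best_cost[nxt]:
                best_cost[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None  # goal unreachable


grid = [
    "..#.",
    "..#.",
    "....",
]
path = astar(grid, (0, 0), (0, 3))
print(path)
```

Most of the loop is priority-queue bookkeeping and data-dependent branching for a single search. That is the kind of control flow that tends to map poorly onto a GPU unless you run a very large number of searches at once, which matches the agent counts the study needed.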

What?

In multiplayer, the AI in a client-led squad is computed on the client, not on the server (this is evident from the AI staying responsive even when the connection to the server is completely lost). Also, the physics of vehicles that are local to the player or his AI subordinates (i.e., they are the driver) is computed on said client (also evident if you cut the connection).


OK you guys made up situations that wouldn't benefit by co-processing and then trashed what you brought up. Congrats.

On any dedicated server the enemy and their respective physics are all calculated by the server. As they multiply, the CPU is tasked too much, lowering the "FPS" when you are "#monitor"ing it. I still, and will forever, stick with this: Any added co-processing with the GPU would be beneficial on any dedicated server, even if it's just physics. BIS is already working on advancing CPU processes, so that's a given.

How about you guys who are fixated on not using the GPU start a separate thread with new ideas on how you think the server.exe can be sped up? That would be constructive.

OK you guys made up situations that wouldn't benefit by co-processing and then trashed what you brought up. Congrats.

On any dedicated server the enemy and their respective physics are all calculated by the server. As they multiply the CPU is tasked too much lowering the "FPS" when you are "#monitor"ing it. I still -and will forever- stick with> Any added co-processing with the GPU would be beneficial on any dedicated server, even if it's just physics. BIS is already working out advancing CPU processes so that's a given.

How about you guys who are fixated on not using the GPU start a separate thread with new ideas on how you think the server.exe can be sped up? That would be constructive.

It would only be beneficial to the tiny percent of junked together home dedicated servers.

Most servers you rent today don't even have a graphic card, for obvious cooling reasons and simply because they are useless for a headless machine. It is more cost/use effective to simply put better/more cpus inside than a gpu, which is a specialised kind of microprocessor.

It would only be beneficial to the tiny percent of junked together home dedicated servers.

Most servers you rent today don't even have a graphic card, for obvious cooling reasons and simply because they are useless for a headless machine. It is more cost/use effective to simply put better/more cpus inside than a gpu, which is a specialised kind of microprocessor.

Of course servers don't have GPUs before they have programs which utilize them. Duh... That's like someone saying in '96 that "oh, there's no point in making 3D accelerated games because no computers have 3D accelerators". The applications must come first before there is any point in having the hardware.

Many communities build their own extremely high-end servers and send them to a co-location centre, so they have full control and guaranteed performance of the machine. If you got a good performance increase from using a GPU, I'm sure many people would be happy to pop in their old 9800's, gtx2xx etc in them.

Edited by Pulverizer


For example, at Siggraph in 1995 I saw a demonstration of a military sim running off one computer connected to boxes about 1.5' x 2' x 1' in size each. There were four stacks about 6' high, two on each side of a roughly 100" screen, that were processing the information. If you increase the scale of the simulation and run out of power, all you have to do is buy another box and daisy-chain it. Each box cost $15,000. And this was in 1995!

Today we are talking, and some are trying to argue, that on a CONSUMER simulation we don't need any more processing power other than the CPU? Pure BS. Nvidia is powering ahead with easy ways for companies to utilize the GPU as a CPU to keep costs down. Even if the resyncing in serial processes is slower than just going through the CPU, it's worth it when you have 64 players playing Warfare in five or more towns at the same time. That's so worth it! We're talking about consumers here, folks. "Oh NO someone is fighting in another town!! OH NOOOOOOZE!!" Remember that? ;) That's what I want to avoid.

If I wanted to buy a multiple Xeon rack I'd just tell BIS F-it don't give us a dedicated server just pull an EA and SELL us control of a server via a monthly subscription through d-bag hosts.

I don't see any issue of wanting to stuff three video cards in a computer and having a dedicated server use the cores on the CPU and all three GPU's to run Warfare with 64 players and all I have to worry about is the bandwidth. I've got Fios so my bandwidth is good so long as the dedicated server is good.

OK you guys made up situations that wouldn't benefit by co-processing and then trashed what you brought up. Congrats.

Not really. You imagined a world where, I quote you, "Any added co-processing with the GPU would be beneficial on any dedicated server". This is simply false. If "beneficial" is understood as concerning the entire A2 project, you have to consider development time as well. What is more important, a new extension pack, or figuring out how to do GPU computing?

And as we discussed, plenty of examples exist where off-loading to the GPU makes the app slower. You do not know that A2's bottlenecks are not like that. Neither can you (or I) be sure about the opposite, although you seem to be confident that they are not, for some reason (just bought a new NVIDIA card? :) )...

What I am saying is that the engine is lacking in many other aspects, and it is ridiculous to ask the devs to spend months on an obscure GPU acceleration which probably gives some benefit on limited architectures, all that by using still evolving APIs. Even using up all CPU cores should come before GPU computing. There are still hundreds of bugs listed on dev-heaven. Fix at least half of them first.

I am actually running a dedicated server on a machine with an unused but powerful NVIDIA card in it; still, this is my honest opinion.


Don't try and put words in my mouth by oversimplifying my statements. Sure it takes time to develop but that's not what we are discussing.

Also, I'm not an Nvidia hugger, if that's what you are implying. Nvidia is the only manufacturer that unloads a ton of tools for developers to work/experiment with. Simple as that. I really couldn't care less about brands. The complaint I and others have is the server FPS under load. I don't see the harm in at least trying to improve the server app with a different method. It just seems like a waste to have the video card do nothing while it's fully programmable. I agree the latency is high for simple serving, but if the scenario is large enough to pin the CPU I still think there could be enough repeatable code to offset the latency.
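
A back-of-the-envelope way to frame "enough repeatable code to offset the latency". The model assumes CPU time is n * t_cpu and GPU time is a fixed round-trip overhead plus n * t_gpu. All numbers are invented placeholders, not measurements of any card or of the A2 server, and `break_even_items` is a hypothetical helper.

```python
import math
from typing import Optional


def break_even_items(
    overhead_us: float, cpu_us_per_item: float, gpu_us_per_item: float
) -> Optional[int]:
    """Smallest batch size at which offloading beats the CPU, or None if it never does."""
    saving_per_item = cpu_us_per_item - gpu_us_per_item
    if saving_per_item <= 0:
        return None  # the GPU is not faster per item, so no batch size pays off
    # Offload wins when overhead + n * gpu < n * cpu, i.e. n > overhead / saving.
    return math.floor(overhead_us / saving_per_item) + 1


# Placeholder numbers only: 200 us round trip, 2.0 us per item on CPU, 0.5 us on GPU.
print(break_even_items(200.0, 2.0, 0.5))
```

Below the break-even batch size the transfer overhead eats the gain; above it the offload starts to pay. Where a real server lands depends on figures that would have to be measured.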

What is more important, a new extension pack, or figuring out how to do GPU computing?

How to do GPU computing.

Edited by Rexxenexx


How about this. Using the GPU to do ragdoll physics? dedicated server (ragdoll calcs) >> position of tweens >> clients.

BTW:

The next generation CUDA architecture (codename: "Fermi"), which will be standard on NVIDIA's GeForce 400 Series (GF100) GPUs, released 2010-03-27, is designed from the ground up to natively support more programming languages such as C++. It is expected to have eight times the peak double-precision floating-point performance compared to Nvidia's previous-generation Tesla GPU. It also introduces several new features, including:

* up to 512 CUDA cores and 3.0 billion transistors

* NVIDIA Parallel DataCache technology

* NVIDIA GigaThread engine

* ECC memory support

* Native support for Visual Studio

So you should be able to do more simple stuff with the 400s irrespective of the latency compared to the CPU, for cromag reasons.

