k3lt

Low CPU utilization & Low FPS

... my best guess would be that everything is constantly loaded into buffers from memory rather than from arrays/vectors.

... yes, this is one of the problems the Arma engine still has to deal with.

Most of the used memory (multiple GB) is allocated dynamically (via a custom allocator) in very small pieces (below 8 kB; the most frequently allocated size is 24 bytes).

The cost of allocating such small memory objects is not itself the problem. The problem is that, over time, the individual allocations end up more and more scattered across the address space (instead of sitting in contiguous blocks), which results in very poor CPU cache locality.

This is a general problem with dynamically allocated memory, especially when the memory objects are very small, and it would probably make sense to rethink/rework this detrimental allocation pattern.
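
To make the scattering effect concrete, here is a rough simulation, not a description of Arma's allocator. `ToyHeap` is a made-up bump allocator with a LIFO free list, the 64-byte cache line is an assumed typical x86 value, and only the 24-byte object size comes from the post. It counts how many cache lines 1000 objects cover on a fresh heap versus a heap that has seen a lot of allocate/free churn.

```python
import random

CACHE_LINE = 64  # bytes per cache line; typical for x86, assumed here
OBJ_SIZE = 24    # the most frequent allocation size quoted above


class ToyHeap:
    """Hypothetical bump allocator with a LIFO free list (not Arma's allocator)."""

    def __init__(self):
        self.next_addr = 0
        self.free_list = []

    def alloc(self):
        # Reuse the most recently freed slot if there is one, else grow the heap.
        if self.free_list:
            return self.free_list.pop()
        addr = self.next_addr
        self.next_addr += OBJ_SIZE
        return addr

    def free(self, addr):
        self.free_list.append(addr)


def cache_lines_touched(addrs):
    """Number of distinct cache lines covered by the objects at these addresses."""
    lines = set()
    for addr in addrs:
        lines.add(addr // CACHE_LINE)                   # line holding the first byte
        lines.add((addr + OBJ_SIZE - 1) // CACHE_LINE)  # line holding the last byte
    return len(lines)


# Fresh heap: 1000 objects land back to back.
fresh = ToyHeap()
fresh_lines = cache_lines_touched([fresh.alloc() for _ in range(1000)])

# Churned heap: allocate many objects, free half of them in random order,
# then allocate 1000 objects, which fall into the scattered holes.
rng = random.Random(0)
churned = ToyHeap()
live = [churned.alloc() for _ in range(100_000)]
rng.shuffle(live)
for addr in live[:50_000]:
    churned.free(addr)
fragmented_lines = cache_lines_touched([churned.alloc() for _ in range(1000)])

print(fresh_lines, fragmented_lines)
```

The same 1000 objects need far more cache lines once they are spread over the holes, which is the locality loss described above. Real allocators are smarter than this toy, so treat the numbers as an illustration only.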

I agree.

I remember this issue from my Operating Systems course. If I remember correctly, memory can't be split up/resized and shuffled around once it has been allocated (at least in Linux, IIRC), so I think the ARMA engine itself needs to rethink when and how often it accesses memory. For example, not constantly updating stuff that the user can't see.

Having the source code would be super nice lol; I actually want to try and change some things in it now.

Edited by ruhtraeel

... and probably it would make sense to rethink/rework this detrimental allocation pattern.

Especially since they have been doing it this way since OFP.

What are you waiting for?

:EDITH:

Actually, everyone who knows that the engine/the game is broken and unoptimized is invited to apply there and show these Bohemian guys how to do it right.

I agree, someone definitely should, they are in dire need of someone with that skill.

Being able to work in an area such as video game design would be an incredible opportunity, as it was one of my primary reasons for choosing Computer Science as my degree.

Location becomes the biggest issue, however, as in my city I am currently limited to programming primarily oil and gas applications. The closest significant video game studios to me are BioWare in Edmonton and Relic Entertainment in Vancouver.

I do think that there is an abundance of talented programmers in Europe, though. Anyone ever notice that in almost every 3rd party mod of a game, the majority of characters have thick European accents? Either there is some intrinsic value of European languages that perfectly suits video game characters, or they have some amazingly talented programmers over there, judging from some of the mods that I have played.

Or both.

http://forums.bistudio.com/showthread.php?178133-Speeding-up-time-in-game-magically-makes-those-graphic-fps-issues-no-problem/page5

It turns out, ruhtraeel, that we have already hit Amdahl's law, judging from what Mike says on that page.

ArmA 3's primary thread loop becomes bogged down because it is waiting for the sub-loops (AI) to catch up, and vice versa, which forces the rest of the threads to throttle back.

According to a discussion I had with dwarden, ArmA 3 benefits the most not from a "total overclock" but from a single-core overclock.

This means that if you overclock one core, I hope ArmA 3 would simply put its primary thread on that fastest core, so in theory the rest of the game shouldn't slow down.

And memory addressing, what do you mean by that? The de-allocation / re-allocation of memory? Or hardware-related issues as well, such as the type of mapping the CPU cache uses (e.g. fully associative, set-associative or direct-mapped)? Or the motherboard-mounted DIMMs, which run exceptionally slowly and, thanks to the L1, L2 and L3 caches, are accessed only about 1% of the time? As far as I know that is what happens, going by a book I am reading: "Upgrading and Repairing PCs", 21st edition, by Scott Mueller.

Yes, I do say a lot of video games waste a lot of computing power, in the sense of the stream processors. Games rely on a lot of "real-time factors" such as raw clock speed, memory interface bit width, shader core count, compute engine divisions, TMUs and ROPs, memory clock, etc. But a large percentage of the CUDA cores / stream cores / unified shaders go to waste.

Not only this, but as explained by Mike, if the server is bogged down and the client is not, the frame rate on the client will still be limited. The ONLY real solution when coding missions for ArmA 3 is to get smart with AI management: avoid overloading the scene, and don't write functions that depend too much on other functions, as this opens more threads which then call on more threads, which only results in more bog-down. I mean, how many people are running Altis Life? *shudders* Functionception. Functions, as stated, run in a "non-scheduled environment", meaning the code is executed as fast as possible regardless of anything else that is running; scheduled means it is called on a stack basis rather than a best-effort basis.

Personally I think Bohemia does pretty damn well for what tools they've got.

And again, relating to your comment about Amdahl's law: we hit that long ago. Why do you think AMD decided to go for a heterogeneous core design for their new CPUs, hybrids between GPU and CPU? http://www.extremetech.com/computing/116561-the-death-of-cpu-scaling-from-one-core-to-many-and-why-were-still-stuck

That explains it well enough: games being written currently are simply unable to take FULL advantage of parallel processing on a multicore platform. BF3 was starting to catch up, and Bohemia might need to as well, so maybe you are right, ruhtraeel, that they do need to rethink their game engine. But here's a question: not many people are programming highly parallel applications, so who foots the bill? I remember when I was about 13 there was a buzz around CUDA technology and the ever-increasing count of stream processors. The only place I have seen that sort of computing come close to full parallel capability is Nvidia Tesla scientific computing systems, because those depend on money, and a lot of it: drug companies rely on such simulations, and so does chemistry. Games, on the other hand, are not able to use the 3468 cores provided by a Tesla GPU; they are more likely to use them if both the game and the system are configured for multi-core usage, e.g. a system running a dual-processor configuration with non-SLI-bridged video cards. SLI is a bottleneck through and through. So is Crossfire.

Edited by Polymath820

I have to say Crysis 3 is pretty damn awesome at multicore optimization. I think it's the only game that uses all 8 of my cores.

There are a few more but I don't play those (hence why I forgot their names).

'The (first level CPU) cache misses directly influence the performance of the application. Having the data in the cache when the processor needs it is one way to optimize performance of an application. Additionally, because cache size is small, it is desirable to fill the cache with data that will be used before it is evicted from the cache.'

Considering the above, you probably can't find the average cache hit/miss ratio for Arma just by reading a book :)

If you have an AMD CPU, you could use CodeXL (AMD's excellent, free profiler) to find out what range the cache misses are really in.

Greets,

Fred41

I have followed this thread for too long. You guys can't do anything to fix the game, nothing, so stop wasting your time trying. The engine is the problem, and they will never fix it because they can't. This problem has been known since Arma was first made. I deleted Arma 3 days ago; I can't play any map or mod except Wasteland and Altis/Stratis Life (it's funny how players use this game the wrong way, I mean buying Arma only for a mod). Sorry for my English, it is not my mother tongue. I hope you guys stop talking about Arma, because only on the day that somebody, or the developers, makes a real FPS improvement can you say the time you spent here wasn't wasted.

P.S. Nobody will give me back the money I spent on an unplayable game. Thanks ;) They made a fantastic game, but with one big problem: no FPS, no party. Why does DayZ work better than Arma? Arma has been out of beta for maybe a year, DayZ is a pre-alpha, and I get around 70-80 FPS in a pre-alpha game. And I'm not talking about a simple game, I'm talking about a very demanding game with a lot of stuff, bugs, etc.

So why can they make their game better while the Arma 3 team can't do the same? They stopped making patches; I mean, the last huge patch was on 30 April. Then they made another patch, if we can call it a patch, of 34 MB that "improves" MP. 34 MB of improvement? OK, OK. Then the best part, the karts! Because the game is perfect, they can use their time to make useless stuff (for me, it's my opinion; I know it's a different team, OK, OK). Since 30 April I have not seen another big patch.

Anyway, have a good day.

Edited by ziffa2

Bohemia provides us the tools to make excellent missions; whether we want to use them is our choice. Just a few examples:

ArmA 3's built-in garbage collection function BIS_fnc_GC, AI render-distance management, etc. People are expecting too much from the game in "giant AI fights". Better mission management and better design strategies are the key, as is avoiding things like functionception with over 800 lines of code per function.

On a side note, fred41, I found the data needed: while ArmA 3 is running, how often the cache is flushed appears to be fairly random, anywhere from 2 to 90 cache flushes per second. Windows comes with an extensive performance collection toolkit, "Performance Monitor".

http://imgur.com/QwOotaz

Edited by Polymath820
Added Cache Flush chart

My Rig: Click

Latest Benchmark-Results (GeForce v337.88 | Dev.-Build: v1.21.124.754):

Tools used: MSI-Afterburner v3.0.0.2384 Final | HWiNFO64 v4.39-2215 Beta

Low + Disabled:

============

Stratis = 135fps

CPU-Load: max = 88.6% | 36.9% | 55.7% | 78.7% | 46.7% | 78.6% | 61.1% | 86.9%

GPU(s)-Load: ~50-55%

Altis = 107fps

CPU-Load: max = 86.9% | 32.1% | 53.6% | 73.5% | 56.1% | 40.5% | 68.8% | 37.4%

GPU(s)-Load: ~45-50%

Low:

====

Stratis = 116fps

CPU-Load: max = 86.4% | 65.1% | 51.5% | 52.4% | 49.1% | 32.0% | 71.2% | 37.1%

GPU(s)-Load: ~50-55%

Altis = 99fps

CPU-Load: max = 89.3% | 44.9% | 55.2% | 44.7% | 47.5% | 34.2% | 60.3% | 42.0%

GPU(s)-Load: ~45-55%

Standard:

=======

Stratis = 86fps

CPU-Load: max = 84.4% | 50.9% | 67.3% | 50.4% | 53.1% | 44.5% | 52.4% | 45.7%

GPU(s)-Load: ~50-60%

Altis = 76fps

CPU-Load: max = 81.4% | 43.4% | 54.4% | 44.0% | 90.8% | 35.9% | 64.4% | 42.0%

GPU(s)-Load: ~45-55%

High:

====

Stratis = 64fps

CPU-Load: max = 80.8% | 41.3% | 53.1% | 70.0% | 45.4% | 38.8% | 68.2% | 37.4%

GPU(s)-Load: ~60-70%

Altis = 61fps

CPU-Load: max = 81.3% | 32.8% | 80.0% | 32.5% | 44.2% | 43.3% | 89.9% | 40.3%

GPU(s)-Load: ~50-70%

Very-High:

========

Stratis = 50fps

CPU-Load: max = 80.4% | 37.8% | 52.0% | 40.1% | 72.5% | 41.9% | 48.6% | 67.5%

GPU(s)-Load: ~60-70%

Altis = 46fps

CPU-Load: max = 89.5% | 38.4% | 63.8% | 63.5% | 47.0% | 35.4% | 82.7% | 45.1%

GPU(s)-Load: ~50-70%

Ultra:

====

Stratis = 39fps

CPU-Load: max = 79.3% | 37.4% | 54.5% | 44.9% | 47.9% | 36.3% | 94.0% | 41.7%

GPU(s)-Load: ~50-80%

Altis = 37fps

CPU-Load: max = 82.1% | 56.8% | 51.8% | 41.7% | 91.4% | 33.9% | 82.7% | 40.9%

GPU(s)-Load: ~40-75%

Maxed-Out:

=========

Stratis = 29fps

CPU-Load: max = 77.2% | 48.1% | 52.9% | 47.7% | 45.9% | 43.2% | 55.2% | 92.2%

GPU(s)-Load: ~40-70%

Altis = 13fps

CPU-Load: max = 79.4% | 35.6% | 66.2% | 75.5% | 41.2% | 36.8% | 78.9% | 33.0%

GPU(s)-Load: ~35-65%

The total CPU-Load / CPU-Usage never reached 50%.
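
A quick note on why high per-core maxima and a total below 50% do not contradict each other. The first list below is copied from the Stratis "Low + Disabled" line above; the two-core trace is made up purely for illustration (it is not a measurement) and assumes the busy thread hops between cores over time.

```python
# Per-core maximum loads for Stratis at "Low + Disabled", copied from the post above.
per_core_max = [88.6, 36.9, 55.7, 78.7, 46.7, 78.6, 61.1, 86.9]
mean_of_maxima = sum(per_core_max) / len(per_core_max)

# Made-up two-core trace (NOT measured): one busy thread hops between cores.
samples = [[90.0, 10.0], [10.0, 90.0]]
total_at_each_instant = [sum(s) / len(s) for s in samples]
toy_per_core_max = [max(core) for core in zip(*samples)]

print(round(mean_of_maxima, 2))
print(total_at_each_instant, toy_per_core_max)
```

The mean of the listed maxima is about 66.65%, but each maximum can occur at a different moment. In the toy trace both cores peak at 90% while the total load is 50% at every instant, so the per-core maxima only give an upper bound on the total.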

:)

Update:

======

Latest Benchmark-Results (GeForce v337.88 | Dev.-Build: v1.21.124.861):

Tools used: MSI-Afterburner v3.0.1 Beta | HWiNFO64 v4.39-2220 Beta

Changed some start-up parameters:

Old:

===

-cpuCount=4 -exThreads=7 -maxMem=2047 -maxVram=2047 -noLogs -noSplash -malloc=tbb4malloc_bi

New:

====

-cpuCount=4 -exThreads=7 -maxMem=15359 -maxVram=2047 -noLogs -noSplash -malloc=tbbmalloc -enableHT
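
For anyone who wants to see exactly what differs between the two lines, here is a small sketch that diffs them. `parse_params` is a throwaway helper written for this post, not anything shipped with the game, and it says nothing about what each parameter does.

```python
def parse_params(line):
    """Split a startup line into a {name: value} dict; bare flags map to True."""
    params = {}
    for token in line.split():
        name, _, value = token.lstrip("-").partition("=")
        params[name] = value if value else True
    return params


old = parse_params(
    "-cpuCount=4 -exThreads=7 -maxMem=2047 -maxVram=2047 "
    "-noLogs -noSplash -malloc=tbb4malloc_bi"
)
new = parse_params(
    "-cpuCount=4 -exThreads=7 -maxMem=15359 -maxVram=2047 "
    "-noLogs -noSplash -malloc=tbbmalloc -enableHT"
)

# Keep only the parameters whose value differs (None means "not present").
changed = {
    name: (old.get(name), new.get(name))
    for name in sorted(set(old) | set(new))
    if old.get(name) != new.get(name)
}
for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
```

Three things changed at once (-maxMem, -malloc and the added -enableHT), so a single before/after benchmark cannot tell which of them is responsible for a gain or a loss. Changing one parameter per run would separate them.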

I used Fred41's malloc, and together with the other changes above my FPS changed by between about -3.5% and +11.0%.

The biggest FPS gain was on the High settings; on Low + Disabled and on Maxed-Out I actually lost some FPS.

The max CPU-Load increased by ~5%, but the total CPU-Load still didn't reach more than 44.4%.

Maybe someone with a bit more time can play around with my changes a bit and can maybe conclude where the increase / decrease of fps comes from.

:)

Edited by TONSCHUH
typo

Maybe just a typo in the post but -enable HT should be -enableHT.

/KC

'-enableHT' is probably right, but in this case, I think that the '-cpuCount=4' parameter is working.

Otherwise, if I understand correctly, '-enableHT' will override '-cpuCount=4'.

As I have said elsewhere, I think it's now time to clean up the Arma3 Startup parameters Wiki page and remove all the Arma/Arma2 vs WindowsXP tweaks.

I hope some among BIS Dev and enlightened community members will help to bring some changes and improvements in this area.

I wouldn't expect the -cpuCount parameter to affect much; at most, I'd imagine that there would just be more cores waiting for memory accesses, unless your CPU is REALLY old.

Deallocation and reallocation, yes. Here's the top answer in the link I posted about how to optimize your game:

"Optimise your data layout! (This applies to more languages than just C++)

You can go pretty deep making this specifically tuned for your data, your processor, handling multi-core nicely, etc. But the basic concept is this:

When you are processing things in a tight loop, you want to make the data for each iteration as small as possible, and as close together as possible in memory. That means the ideal is an array or vector of objects (not pointers) that contain only the data necessary for the calculation.

This way, when the CPU fetches the data for the first iteration of your loop, the next several iterations worth of data will get loaded into the cache with it.

Really the CPU is fast and the compiler is good. There's not really much you can do with using fewer and faster instructions. Cache coherence is where it's at (that's a random article I Googled - it contains a good example of getting cache coherency for an algorithm that doesn't simply run through data linearly)."
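
The "array of objects, not pointers" point from the quoted answer can be checked even from Python with ctypes. This is only a sketch: `Particle` is a made-up struct, and the 64-byte cache line is an assumed typical x86 value.

```python
import ctypes

CACHE_LINE = 64  # bytes; typical x86 cache line size, assumed here


class Particle(ctypes.Structure):
    # Only the data the hot loop needs: three 4-byte floats = 12 bytes.
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float), ("z", ctypes.c_float)]


N = 8
packed = (Particle * N)()  # one contiguous block holding N objects by value
base = ctypes.addressof(packed)
offsets = [ctypes.addressof(packed[i]) - base for i in range(N)]

# Separately allocated objects reached through references: their addresses
# are wherever the allocator happened to put them.
scattered = [Particle() for _ in range(N)]
scattered_addrs = [ctypes.addressof(p) for p in scattered]

per_line = CACHE_LINE // ctypes.sizeof(Particle)
print(offsets)    # fixed 12-byte stride
print(per_line)   # whole objects that fit in one cache line
```

In the packed array every element sits exactly 12 bytes after the previous one, so fetching one element pulls several neighbours into the cache with it. The addresses in `scattered_addrs` carry no such guarantee.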

Imagine this: You have some refresh of an element every minuscule amount of time. Within this refresh, there are possibly multiple loops (for, while, etc). Within each of the loops, you follow a pointer to an object you created, which is stored somewhere in virtual memory (which gets translated into physical memory), not necessarily anywhere near the other things that are accessed around the same time.

This gets expensive FAST. This is what I was talking about regarding big O analysis: if you have like 5 nested loops in a piece of logic (O(n^5)), each of which accesses something in memory, and the entire piece of logic is part of a function that refreshes at a very high rate, these accesses are going to start to bottleneck everything else in the system.
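
To put rough numbers on the nested-loop case, here is a small counting sketch. The loop size of 10 and the 60 refreshes per second are made-up values for illustration, and each innermost iteration stands in for one pointer dereference.

```python
from itertools import product


def accesses_per_refresh(n, depth=5):
    """Count innermost-body executions of `depth` nested loops over range(n)."""
    count = 0
    for _ in product(range(n), repeat=depth):
        count += 1  # stands in for one memory access through a pointer
    return count


per_refresh = accesses_per_refresh(10)
per_second = per_refresh * 60  # assuming a hypothetical 60 refreshes per second
growth = accesses_per_refresh(10) // accesses_per_refresh(5)

print(per_refresh, per_second, growth)
```

With n = 10 that is already 100,000 accesses per refresh, and doubling n from 5 to 10 multiplies the work by 32. If most of those accesses miss the cache, the cost per access goes up on top of the count.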

EDIT: Sorry if I forgot to clarify some stuff. When you malloc something (whether it is an object that you need, a buffer for input, etc.), you get back a virtual address. This address is NOT the address in your physical memory (RAM); the virtual address needs to be translated into a physical address. This uses page tables: the translation procedure splits the virtual address into bit fields that index into the page tables, plus an offset within the page, and the page table entry says where that page sits in physical memory. If the entry says the page is not currently in physical memory (RAM), a page fault is raised, and the OS may then have to go looking on the hard drive for it.
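
Here is a toy version of that translation, heavily simplified. It uses a single-level table stored in a dict, whereas real x86-64 systems use multi-level tables plus a TLB; the table contents are invented, and only the 4 KiB page size is a real common default.

```python
PAGE_SIZE = 4096   # 4 KiB pages, the common x86 default
OFFSET_BITS = 12   # log2(PAGE_SIZE)


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the toy page table."""


# Toy single-level page table: virtual page number -> physical frame number.
page_table = {0x00010: 0x00A3, 0x00011: 0x0007}


def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS        # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)  # lower 12 bits stay the same
    if vpn not in page_table:
        raise PageFault(hex(vaddr))   # the OS would have to step in here
    return (page_table[vpn] << OFFSET_BITS) | offset


print(hex(translate(0x10ABC)))
print(hex(translate(0x11004)))
try:
    translate(0x12000)
except PageFault as fault:
    print("page fault at", fault)
```

Only the page number is translated; the offset inside the page is carried over unchanged. The unmapped address takes the slow path, which is the page fault case described above.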

Edited by ruhtraeel

Fred41's large pages malloc certainly seems to result in higher, more stable FPS; I am using it to good effect in Arma 2.

TBH, when someone mentioned a "malloc" fix, I was instantly intrigued; if it was what I was expecting, then it could truly lead to good FPS gains.

Good job Fred41; people like you help keep games going. I am going to look into this "large page malloc" more once I get off work.
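
Some back-of-the-envelope arithmetic on why large pages could help, assuming "large pages" here means the OS large-page feature (2 MiB pages on x86-64 instead of 4 KiB). I haven't looked at how Fred41's allocator works internally, and the 2 GiB working set is a made-up figure (the thread only says "multiple GB").

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def pages_needed(working_set_bytes, page_size):
    """Number of pages required to map the working set (ceiling division)."""
    return -(-working_set_bytes // page_size)


working_set = 2 * GIB  # hypothetical size; the thread only says "multiple GB"
small_pages = pages_needed(working_set, 4 * KIB)
large_pages = pages_needed(working_set, 2 * MIB)

print(small_pages, large_pages, small_pages // large_pages)
```

Each mapped page needs its own address translation, and the CPU only caches a limited number of those in its TLB. With 512 times fewer pages to map, scattered accesses should miss the TLB less often, though how much that shows up in FPS will depend on the system.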

Also, I forgot to clarify one more thing: Amdahl's law gives the absolute limit for the performance gain from parallelization. The limit comes from the parts that have to run serially, for example wherever a process/thread reaches a point of execution where a mutex (mutual exclusion) is needed to prevent race conditions, or else weird stuff like Heisenbugs starts appearing. I wouldn't say that we've hit the absolute limit of Amdahl's law yet, where every single thread has been optimized to the point where it can't be changed without messing up functionality; we've only hit individual conditions where a mutex is needed.
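
For reference, Amdahl's law in a few lines. The formula is the standard one; the 50% parallel fraction is an invented number for illustration, not a measurement of Arma.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup as the core count goes to infinity."""
    return 1.0 / (1.0 - parallel_fraction)


p = 0.5  # hypothetical: half of each frame's work can run in parallel
for cores in (2, 4, 8):
    print(cores, round(amdahl_speedup(p, cores), 3))
print("limit", amdahl_limit(p))
```

With half the work serial, 8 cores give less than a 1.8x speedup and no number of cores gets past 2x. Shrinking the serial part (or making the one core that runs it faster) is what moves the limit.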

Cache hit/miss is pretty hard to calculate, yes. It's one thing to sit in an Operating Systems course final and then do a replacement algorithm for a string of 8 numbers, but it's another thing to do it for an entire system...

Edited by ruhtraeel

http://forums.bistudio.com/showthread.php?163640-Arma3-and-the-LARGEADDRESSAWARE-flag-(memory-allocation-gt-2GB)

http://forums.bistudio.com/showthread.php?177454-a-simple-registry-tweak-for-increased-performance

these two combined have given me the highest, most stable FPS I've had in Arma (x64, 16GB RAM)

Thanks. If this works for me (aside from other things, such as the movement mod that allows jumping and stuff), I know what my next 3 days will be spent on.

I've also thought of an improved interaction system; if the ARMA engine provides a function that spits out the coordinates/direction of your crosshair, perhaps this could be used in accordance with player position to make a standard "e to interact" system.
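
A rough sketch of the geometry behind that idea, in plain Python rather than engine script. It does not use any ARMA function: `can_interact`, the 3 m reach and the 10 degree cone are all made-up names and values, and the player position and look direction are assumed to be available from somewhere.

```python
import math


def can_interact(player_pos, look_dir, target_pos, max_dist=3.0, max_angle_deg=10.0):
    """True if the target is within reach and close to the crosshair direction."""
    to_target = [t - p for t, p in zip(target_pos, player_pos)]
    dist = math.sqrt(sum(c * c for c in to_target))
    look_len = math.sqrt(sum(c * c for c in look_dir))
    if dist == 0.0:
        return True   # standing on the target
    if dist > max_dist or look_len == 0.0:
        return False  # out of reach, or no valid look direction
    # Cosine of the angle between the look direction and the direction to the target.
    cos_angle = sum(a * b for a, b in zip(look_dir, to_target)) / (look_len * dist)
    return cos_angle >= math.cos(math.radians(max_angle_deg))


print(can_interact((0, 0, 0), (0, 1, 0), (0, 2, 0)))  # straight ahead, in reach
print(can_interact((0, 0, 0), (0, 1, 0), (2, 0, 0)))  # off to the side
print(can_interact((0, 0, 0), (0, 1, 0), (0, 5, 0)))  # ahead but too far
```

The "e to interact" key would then just run this check against nearby objects and pick the closest one that passes.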

Edited by ruhtraeel

Oddly enough those tweaks did almost nothing for me. Game does feel smoother in general when using Large Pages but I saw zero FPS gain.

Did you try it like me, with the pre-made quality profiles (except Low + Disabled)?

No increase with the High settings profile at all?

I did a fresh install of my system not too long ago, so I didn't expect any big improvements at all, but I got the same results across multiple runs (+/- 1 fps).

:confused:

I'm sorry. But what are pre-made quality-profiles?

It's in the first Graphics tab when you open the options. When you click the field, a list opens with "VeryLow, Low, Medium, High, VeryHigh, Ultra".


"ArmA 3 benefits the most from not a "total overclock" but a singular core overclock. This means if you have the fastest core running ArmA 3's primary thread which I hope if you overclocked it ArmA 3 would just put the main thread on it. So in theory the rest of the game shouldn't slow down."

I have the software to do that very thing: overclock a single core. It's used by extreme overclockers (world records).

I did read a post a while back claiming this worked for increased FPS. After getting advice from an overclockers' forum and trying a slight OC, I was able to overclock a single core. I haven't tried overclocking further because I don't have the right cooling: at minimum I need water cooling, and a liquid nitrogen setup is recommended for 5 GHz and beyond. I have two AMD CPUs gathering dust, and once I get water cooling I'll give it a go; others might want to try now.

Using the OSD, I do see cores 4, 5, 6 and 7 not doing much; in fact I think they're asleep when playing Arma 2/3.

The other reason I haven't bothered is that I replaced my 6850s with an R9 290, and I find spending $500 to get playable FPS totally unacceptable. I was going to buy one anyway, but still.

WARNING: this will fry your CPU if you don't read the instructions and then read them again, and even then...

For those with an AMD CPU, just look for "PScheck" and the instructions on how to use it.

Edited by AussieBobby
