_blub

xtbbmalloc - A custom memory allocator for A3


Hey guys,

 

I made a custom memory allocator for Arma 3 and I want to share it with you. Originally I made it for testing purposes only, but maybe it can improve performance in some cases.

 

Features

  • Based on Intel's tbbmalloc 4.4 Update 5 (https://www.threadingbuildingblocks.org/)
  • Support for large pages
  • Includes some opt-in tweaks which can improve performance in some cases (the experimental stuff is not included anymore).
  • Interface for reading memory statistics and modifying parameters on the fly
  • Customizable via settings file
  • Source code included. Feel free to modify :)
  • Readme.txt for usage is included but there are also many answers in this thread.

 

Download (based on "tbb44_20160526oss", BattlEye compatible)

https://dl.dropboxusercontent.com/u/103425066/xtbbmalloc.zip

 

Download (based on "tbb2017_20161128oss", NOT BattlEye compatible yet...)

https://dl.dropboxusercontent.com/u/103425066/CMA.zip

 

Download 64-bit Version (based on "tbb2017_20161128oss", NOT BattlEye compatible yet...)

https://dl.dropboxusercontent.com/u/103425066/CMA_2016_12_14.zip

 

Edited by _blub
Added 64-bit version

thanks, I'm sure when some users find time they will do those benchmarks ;)


Tried this the other night after months and months of running Fred41's large page malloc.

 

I didn't do any proper tests, and this was on one of our group's regular mission nights, so comparisons are not possible. But it felt like there was a slight regression in fps, and I suspect that was due to me doing something incorrect with your malloc.

The .dll file is in the dll folder, and the .ini is in the arma3 root. When testing this I'm not using the patched arma3LP.exe created by his tools, I'm using the regular arma3.exe.

My .ini is as follows (your large page ini without the comments):

[Default]
DebugBreak = 0
UseLargePages = 1
ForceMaxWorkingSet = 0
SeLockMemoryPrivilege = 1
SeIncreaseWorkingSetPrivilege = 0
WritePages = 0
HoldMemory = 0
PreAllocBytes = 0
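
The settings above can be read back with a few lines of Python to see at a glance which options are switched on. This is only a sketch: the function name is made up, and treating every value as an integer is an assumption based on the listing above, not something taken from the readme.

```python
import configparser

# The settings exactly as posted above.
SAMPLE_INI = """\
[Default]
DebugBreak = 0
UseLargePages = 1
ForceMaxWorkingSet = 0
SeLockMemoryPrivilege = 1
SeIncreaseWorkingSetPrivilege = 0
WritePages = 0
HoldMemory = 0
PreAllocBytes = 0
"""


def load_settings(ini_text):
    """Return the [Default] section as a dict of option name -> int."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names spelled as in the file
    parser.read_string(ini_text)
    # Assumption: all values are integers, as in the listing above.
    return {name: int(value) for name, value in parser["Default"].items()}


settings = load_settings(SAMPLE_INI)
enabled = [name for name, value in settings.items() if value]
print("Enabled options:", ", ".join(enabled))
```

To check a real file, read its text first and pass it to `load_settings`.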

I have privilege set due to the prior use of Fred41s dll, but I double checked just to make sure.

 

I get no xtbbmalloc.txt in my arma3 folder, which is why I'm unsure whether it is actually being loaded properly. Do you have any pointers you can give me? :)


Hi guys,

 

It's not BattlEye safe, and I can't do anything about it.

You need to run Arma without battleye. I forgot to mention this, sorry!

 

@Bamse:

Please look into your .rpt file which is located at ("C:\Users\[YourUserName]\AppData\Local\Arma 3" or simply "%AppData%\..\Local\Arma 3")

There should be a line which starts with "Allocator:". You can see which allocator has been loaded. If it is not xtbbmalloc.dll let me know.
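
If you don't want to scroll through the whole .rpt, a small script can pull that line out. This is only a sketch in Python: the function name and the sample path are made up, and it assumes the line really begins with "Allocator:" as described above (if your .rpt puts a timestamp in front of each line, the check needs adjusting).

```python
def find_allocator(rpt_lines):
    """Return the text after 'Allocator:' from the first matching line, or None."""
    prefix = "Allocator:"
    for line in rpt_lines:
        text = line.strip()
        if text.startswith(prefix):
            return text[len(prefix):].strip()
    return None


# Illustrative input only; the path is hypothetical and a real .rpt is much longer.
sample = [
    "Some other log line",
    r"Allocator: C:\Games\Arma 3\dll\xtbbmalloc.dll",
]
print(find_allocator(sample))
```

For a real file, any iterable of lines works, for example an open file object: `find_allocator(open(path, errors="replace"))`.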

 

Also, I did not compile it with special instruction sets (e.g. SSE, AVX) enabled, to keep it more compatible. I will add some custom versions soon but I don't think they will provide much of a performance improvement.


To get it whitelisted by BattlEye, just let it get blocked once by BattlEye and then send an email with the filename to BattlEye to get it whitelisted.
Don't bother attaching the files/source code etc.

Important:
If BattlEye is doing maintenance or the BE servers are overloaded etc., it will block the dll by default, even though it's whitelisted.
It's the same method as for getting an extension whitelisted.
 


Thanks torndeco.

I will do that in the future if it turns out that this allocator can be useful :)


 


 

Is Fred41's dll still usable? I thought something happened with the last few game updates?


Hi, and thanks for the updated tbb4. I am using the armaLP.exe with your allocator. Is that wrong?

 

All tests were run with Arma as admin, using Benchmark Altis with my personal graphics settings, so they are not comparable to other benchmarks that use fixed graphics settings. I only did one run per setting!

 

UseLargePages = 1
SeLockMemoryPrivilege = 1

arma.exe     + xtbbmalloc - 65 fps

armaLP.exe + xtbbmalloc - 69 fps

 

above +

ForceMaxWorkingSet = 1
SeIncreaseWorkingSetPrivilege = 1

armaLP.exe + xtbbmalloc - 68 fps

 

above +

HoldMemory = 1:

armaLP.exe + xtbbmalloc - 68 fps

 

compared to fred's tbb:

 

armaLP.exe + tbbmalloc - 67 fps

 

armaLP.exe + tbb4malloc_bi - 61 fps

 

System:

 

Intel i7 6700K

Z170 board

AMD R290X graphics card

16GB Ram

 

Maybe the special editions with SSE or so can also squeeze out some extra frames :-)

 

Does anybody know exactly what the armaLP.exe is doing and why also xtbb gains from it?


@ineptaphid:

One of the last updates changed the interface for custom allocators. There are now 3 additional functions required for aligned allocations. Those are not included in Fred41's allocator.

 

@pestbeule:

I have no idea what armaLP.exe is :). Is it part of the performance build? I have never seen it before.


ArmaLP.exe is a patched version of the Arma executable that makes it large-page compatible (Fred provided a tool to recreate it after each update). I honestly wasn't sure if it's still required (somebody said not, and I suppose it's possible BIS re-enabled compatibility by default, as I think was once the case), but I figured it can't do any harm, so I continue to run the tool.

 

As I read pestbeule's results some small but useful increase is to be had - well done!

blub: I confirmed in the rpt that I get the correct dll loaded (as per "Allocator: E:\Steam\steamapps\common\Arma 3\dll\xtbbmalloc.dll" in my rpt).

 

I still wonder if it gets its magic done properly though. It almost seems like it doesn't allocate large pages for me. I say this mainly because textures load noticeably faster with your malloc compared to Fred41's, but once they are loaded, Fred's malloc gives me a bit more fps, according to the following:

I ran YAAB four times in a row: twice with the original arma3.exe (once with xtbb, once with Fred41's old malloc), then twice with the patched arma3LP.exe (once with xtbb, once with Fred's).

Here are my results, last run at the top and in descending order (pardon my spelling error in the first run): http://images.akamai.steamusercontent.com/ugc/257084746516393851/9BAB3961C785FF1A8C4F9D763FFDF1B4607D96E8/

 

So in short: arma3LP.exe doesn't do squat for me when running YAAB with any combination, but Fred41's malloc tends to add a few more of those precious fps-thingies :)


Hey guys,

 

First, about Fred41's tool (thanks defunkt for the link). It sets the registry key HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\Arma3.exe\UseLargePages to 1 to ask the application loader to allocate large pages for the arma3.exe image file.

There is not much documentation about this registry key. On MSDN it says about UseLargePages: "Load image using large pages if possible". I guess this is just a hint, not a rule. Also, I don't think this provides much of a performance improvement (if it works at all).
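
To see whether that value is set on your own machine, something like the following should work. It is a Windows-only sketch using Python's standard `winreg` module; the helper names are made up, and it assumes the value sits under the path quoted above with the executable name as the subkey.

```python
IFEO_PATH = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"


def ifeo_subkey(exe_name):
    """Build the per-executable subkey path below HKLM."""
    return IFEO_PATH + "\\" + exe_name


def read_use_large_pages(exe_name="Arma3.exe"):
    """Return the UseLargePages value for exe_name, or None if it is not set."""
    import winreg  # only available on Windows, so imported lazily

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ifeo_subkey(exe_name)) as key:
            value, _value_type = winreg.QueryValueEx(key, "UseLargePages")
            return value
    except FileNotFoundError:
        # Either the subkey or the value does not exist.
        return None


print(ifeo_subkey("Arma3.exe"))
```

On Windows, `read_use_large_pages()` returns 1 if the tool has set the value and `None` if the key or value is missing.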

 

@Bamse:

So xtbbmalloc.dll gets loaded but there is still no xtbbmalloc.txt in your Arma3 directory? How do you start the game and with which parameters?


That question got me digging a bit more. I use Arma 3 Sync to launch the game, and in A3S's installation directory I found xtbbmalloc.txt. The thought of checking there didn't even cross my mind, since nothing else has put files in there before. And indeed, it didn't use large pages, as I guesstimated :D

15:06:31:
Init
Config:
User is admin: Yes
UseLargePages: No
SeLockMemoryPrivilege: No
SeIncreaseWorkingSetPrivilege: No
ForceMaxWorkingSet: No
WritePages: No
HoldMemory: No
PreAllocBytes: 0.000
Ext0SetPrivileges: 0, Error: 0
Using normal pages

15:06:31:
Init done
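
The Yes/No lines of such a log can also be checked from a script. The following is only a sketch: the function name is made up, and it assumes the "Name: Yes" / "Name: No" layout shown in the log above.

```python
# The Yes/No part of the log above, used as sample input.
LOG_TEXT = """\
15:06:31:
Init
Config:
User is admin: Yes
UseLargePages: No
SeLockMemoryPrivilege: No
SeIncreaseWorkingSetPrivilege: No
ForceMaxWorkingSet: No
WritePages: No
HoldMemory: No
PreAllocBytes: 0.000
Using normal pages
"""


def parse_flags(text):
    """Collect every 'Name: Yes' / 'Name: No' line into a dict of booleans."""
    flags = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        value = value.strip()
        # Timestamps and numeric settings are skipped: their value is not Yes/No.
        if separator and value in ("Yes", "No"):
            flags[name.strip()] = (value == "Yes")
    return flags


flags = parse_flags(LOG_TEXT)
print("Large pages requested:", flags["UseLargePages"])
```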

I moved xtbbmalloc.ini to my A3S install directory and voila! Large Pages now in use. Running YAAB later on tonight :)

Thank you so much for the nudge in the correct direction!

 

EDIT: Running YAAB again, I'm getting +- 0.5 fps between Fred41's and xtbb, so it's well within standard deviation. Although I've been running Fred's malloc in an almost religious manner for many many many months, I'd rather run a malloc that is as good and is in active development with regular updates. Consider me a convert! :D


It sets the registry key HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\Arma3.exe\UseLargePages to 1 to ask the application loader to allocate large pages for the arma3.exe image file.

There is not much documentation about this registry key. On msdn it says about UseLargePages: "Load image using large pages if possible". I guess this is just a "hint". Not a rule. Also I dont think this should provide much performance improvements (if it works at all).

It's also supposed to clear (in the patched copy) the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag from the PE header. Not sure to what end, as I read it this stops the image from being relocatable. Perhaps this is intended to save on call indirection or address arithmetic.
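
For anyone who wants to check whether a given executable still has that flag set, here is a rough Python sketch. The function name is made up. The offsets follow the PE layout as I understand it: the PE header offset is stored at 0x3C, the optional header starts after the 4-byte signature and the 20-byte COFF header, DllCharacteristics sits at offset 70 inside the optional header, and the flag value is 0x0040.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040


def has_dynamic_base(image):
    """Return True if the DYNAMIC_BASE flag is set in the given PE image bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # e_lfanew: file offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    # DllCharacteristics is a 16-bit field at offset 70 (same for PE32 and PE32+).
    (dll_characteristics,) = struct.unpack_from("<H", image, optional_header + 70)
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
```

Usage would be along the lines of `has_dynamic_base(open("arma3.exe", "rb").read())`, once for the original and once for the patched copy.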


So if I'm not mistaken, the only differences between the 'non-public' latest Fred41 malloc and this XT alloc are:

- no ability to set region sizes (16-256MB)

- some never explained tweaks (I guess Torndeco or someone similar may know more details what was done there)

- some extra logging

(are the preallocatedbytes same as the MaxLargePagePrealloc in fred41's allocator (0-3200MB range) or different ?)

note:

- Arma 3 uses SSE2 too (all CPUs used for Arma 3, from the minimum spec up, support SSE2)

- for some time now the default TBB4 allocator has been compiled to utilize SSE2

your allocator should also be compiled to utilize SSE2 (Torndeco might have some more details on possible tweaks)

AVX and AVX2 should be optional, as not all CPUs support those ...

non-SSE/SSE1 builds at most for stability testing or for old CPUs (though I don't think anyone plays Arma 3 on a Pentium 2, Athlon XP or Duron)

also something to think about (as another option):

- there is also the combination of AWE (which allows 32-bit applications to access >4GB) + private allocations

yes, the AWE mechanism is still present on 64-bit platforms and could be useful for 32-bit applications (in specific cases)


@Bamse:

Good to see its working :)

 

@defunkt:

Yeah, I saw that "feature" too. You are right: if you clear that flag, the image cannot be relocated anymore. But there should not be a noticeable performance impact, so I am not sure what his idea was...

 

@dwarden:

Sorry, I don't know Fred41's non-public release, so I can't tell you all the differences.

I am not sure what you mean by "ability to set region sizes".

Also, I don't know what "MaxLargePagePrealloc" does. "preallocatedbytes" (PreAllocBytes) in xtbbmalloc just allocates X bytes when loading. The idea is that you combine it with "HoldMemory": you allocate some memory at load time and hold it. When the game later needs memory, it doesn't have to be allocated (which could take a little while) because it is already there. As I (hopefully) said in the readme file, this is experimental. I wanted to see how it affects performance, but I haven't done any tests yet, mostly because I have no idea how to do a good benchmark. YAAB is not bad, but it is still dynamic and a little different every time :)

 

The uploaded xtbbmalloc is compiled with SSE2 (which is the default). I just forgot about that. I can additionally compile it for AVX2, but I don't think it will produce much different code...

 

I know about AWE but I guess it needs to be supported by the game engine.

 

Just another little hint which I should have included in the first post: when you use large pages, make sure that you have lots of free memory and/or that your PC has not been running for a long time. Otherwise it can take a long time for your OS to find memory regions which are suitable for large pages :)


xtbbmalloc compiled for AVX2: https://dl.dropboxusercontent.com/u/103425066/xtbbmalloc%20AVX2.7z

No idea if AVX2 is even used, but make sure your CPU can handle it (Haswell or newer for Intel, no idea about AMD). I don't think you will notice any difference, but let the placebo effect do its job :)

 

 

 


it might also be good to put up an AVX build, as only the last several generations of CPUs support AVX2 ...

also, as the arch flag doesn't have 4.x and AMD supports AVX too, it covers a wider range of CPUs


xtbbmalloc compiled for AVX2: https://dl.dropboxusercontent.com/u/103425066/xtbbmalloc%20AVX2.7z

No idea if AVX2 is even used but make sure your CPU can handle it (Haswell or newer for Intel, no idea about AMD). I dont think you will notice any difference but let the placebo effect do his job :)

 

Thanks a lot for the special build. Did a quick re-test. As you said: no FPS increase at all. So I don't think an AVX1 build will do the trick...


Little update:

v1.0.2: https://dl.dropboxusercontent.com/u/103425066/xtbbmalloc/xtbbmalloc%201.0.2.7z

 

Added some statistics and a new experimental configuration: "LockPages" (Check readme)

Custom builds for SSE2, AVX, AVX2, with and without statistics enabled.

Also added a demo mission which shows how to query and display xtbbmalloc's statistics in-game.

 

I also did some YAAB runs. I think I did not see the same battle twice :P

So don't trust those results too much...

 

Used Dev build from 30.06.2016, no SSD

1) First run to allow Windows to cache some files: FPS not checked

2) tbbmalloc_bi (default allocator) @ 35.1 FPS

3) xtbb (+stats, LockPages) @ 36.5 FPS

4) xtbb (+stats, LargePages) @ 39.3 FPS

5) xtbb (+stats) @ 34.7 FPS

6) xtbb (no stats) @ 35.1 FPS

7) xtbb (no stats, LargePages) @ 37.0 FPS

8) xtbb (no stats, LockPages) @ 34.5 FPS

 

As you can see this benchmark is not really good for such tests :)
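
To put the numbers above into perspective, the following Python sketch prints each run's change relative to the default allocator (run 2). The FPS values are the ones listed above; the labels and the helper name are just for illustration. With a single run per configuration, the percentages say nothing about run-to-run noise.

```python
BASELINE_FPS = 35.1  # run 2: tbbmalloc_bi (default allocator)

# Runs 3 to 8 as listed above.
RESULTS = {
    "xtbb (+stats, LockPages)": 36.5,
    "xtbb (+stats, LargePages)": 39.3,
    "xtbb (+stats)": 34.7,
    "xtbb (no stats)": 35.1,
    "xtbb (no stats, LargePages)": 37.0,
    "xtbb (no stats, LockPages)": 34.5,
}


def relative_change(fps, baseline=BASELINE_FPS):
    """Percentage change of fps relative to the baseline."""
    return (fps - baseline) / baseline * 100.0


# Print the runs from fastest to slowest.
for name, fps in sorted(RESULTS.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:30s} {fps:5.1f} FPS  {relative_change(fps):+6.1f} %")
```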

 

Please let me know if you have ideas for some tweaks or if you want something to be included.


blub: yeah, that's what I meant in my previous post(s) about getting numbers that are as repeatable as possible. Arma AI is Arma AI and will do whatever it wants, at all times ... no matter how hard you yell at it :)

Running a few in a row still gets you a ballpark idea though, and all my results have been very comparable. Even though the FPS varies, the difference in percent is pretty darn close to identical, even if you do ten runs with your malloc and then ten runs with BI's.

And again, this on my computer and YMMV as always.

 

Trying the new version out tonight! :)
