GoldJohnKing

Gold's CMA: The first ever port of Microsoft's mimalloc and Epic's rpmalloc, and a new version of Intel's tbbmalloc


--- === Updated 1 (2022-05-05) === ---

 

Happy Labour Day!

 

Half a year has passed since the first version of Gold's CMA was published, and here's some news about it.

 

 

Microsoft's mimalloc

 

https://github.com/GoldJohnKing/mimalloc/releases

 

It has been updated to v2.0.6 to follow upstream, though Microsoft's changes in this release mainly focus on macOS-related functionality.

 

Besides, I have refactored many parts of the allocator to make it more robust and more efficient, and provided more variants for you to choose from.

 

mimalloc-v206.dll is for general use. If you do not know which one to choose, this is always the recommendation.

 

If you have a lot of RAM and always have more than 8 GB of contiguous free memory when you run Arma 3, you may try mimalloc-v206-lock-pages.dll. It locks several contiguous 1 GB large pages in physical memory and prevents them from being paged out to disk. However, based on various tests and players' feedback, this may only provide a minor performance gain while causing side effects in some circumstances. Personally, I would not recommend this variant, but it does perform better sometimes.

 

If you have a lot of RAM and do not want to use the "lock-pages" variant, or you frequently encounter the "STATUS_ACCESS_VIOLATION" crash, you may want to try mimalloc-v206-no-collect.dll. This variant ignores all memory flush/free requests from Arma 3 and keeps the memory blocks managed internally by mimalloc, which may greatly reduce the probability of the crash. However, this crash is usually caused by hardware issues, so always check whether your PC is overclocked too aggressively or simply too old.
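To make the difference between the general and "no-collect" variants concrete, here is a minimal sketch of how a flush request coming from the game might be handled under each policy. The names (`CollectPolicy`, `handle_flush_request`, `release_cached_memory`) are hypothetical and not taken from the actual port; in a real build this decision would sit inside whichever exported function Arma 3 calls to flush memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy switch; in practice the variants are separate DLL builds.
enum class CollectPolicy { Default, NoCollect };

// Counts how often cached memory was actually handed back (illustration only).
static std::size_t g_release_count = 0;

// Hypothetical stand-in for "return cached, unused memory to the OS".
void release_cached_memory() { ++g_release_count; }

// Hypothetical handler for a flush request from the game.
// Returns true if memory was actually released.
bool handle_flush_request(CollectPolicy policy) {
    if (policy == CollectPolicy::NoCollect) {
        // Ignore the request: blocks stay cached inside the allocator and are
        // reused later instead of being released and requested again.
        return false;
    }
    release_cached_memory();
    return true;
}
```

The trade-off is visible in the sketch: the "no-collect" path never gives memory back, which avoids churn but means the process keeps whatever it has grown to.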

 

If you have little RAM, I recommend mimalloc-v206-scheduled-collect.dll. This variant collects and flushes unused memory every 5 minutes, which may keep memory less fragmented and reduce memory usage. However, this is not without risks: as mentioned above, if you frequently encounter the "STATUS_ACCESS_VIOLATION" crash, you should fall back to the general variant or the "no-collect" variant.
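A scheduled collection like the one in the "scheduled-collect" variant can be sketched as a background thread that wakes up on a fixed interval and asks the allocator to clean up. This is an illustration under assumptions, not the variant's actual code: the class name is hypothetical, and in a mimalloc build the callback might wrap `mi_collect(true)` (mimalloc's function for eagerly freeing unused memory), though the port may do it differently.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Calls `collect` every `interval` on a background thread until destroyed.
class ScheduledCollector {
public:
    ScheduledCollector(std::chrono::milliseconds interval, std::function<void()> collect)
        : interval_(interval), collect_(std::move(collect)), worker_([this] { run(); }) {}

    ~ScheduledCollector() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // waits for any in-progress collection to finish
    }

    ScheduledCollector(const ScheduledCollector&) = delete;
    ScheduledCollector& operator=(const ScheduledCollector&) = delete;

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for returns true once stop_ is set, false when the interval elapses.
        while (!cv_.wait_for(lock, interval_, [this] { return stop_; })) {
            lock.unlock();
            collect_();  // run the collection without holding the lock
            lock.lock();
        }
    }

    std::chrono::milliseconds interval_;
    std::function<void()> collect_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

With the 5-minute interval from the post, it would be constructed as `ScheduledCollector collector(std::chrono::minutes(5), [] { /* collect here */ });`, and destroying it stops the thread cleanly.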

 

 

Epic's rpmalloc

 

https://github.com/GoldJohnKing/rpmalloc/releases

 

Yes! Here's a new memory allocator ported to the Arma series! It is mainly meant as a supplement to mimalloc and tbbmalloc, though.

 

rpmalloc is designed to be lock-free. It is lightweight and does not carry as much overhead as mimalloc or tbbmalloc.
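To give an idea of what "lock-free" means inside an allocator, below is a generic sketch of one common building block: a list that any thread can push freed blocks onto with compare-and-swap, which the owning thread then drains in a single atomic exchange. This illustrates the technique only; it is not rpmalloc's actual code, and the type names are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// A freed block reused as a list node (allocators typically store the
// "next" pointer inside the freed memory itself).
struct FreeBlock {
    FreeBlock* next;
};

// Lock-free multi-producer list: any thread may push a freed block, and the
// owning thread grabs the whole list at once with an atomic exchange.
// Taking the whole list (instead of popping single nodes) sidesteps the ABA problem.
class DeferredFreeList {
public:
    void push(FreeBlock* block) {
        FreeBlock* head = head_.load(std::memory_order_relaxed);
        do {
            block->next = head;  // on CAS failure, `head` is reloaded and we retry
        } while (!head_.compare_exchange_weak(head, block,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Detaches and returns every block pushed so far (most recent first).
    FreeBlock* take_all() {
        return head_.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<FreeBlock*> head_{nullptr};
};
```

No thread ever waits on a mutex here: a pusher that loses a race simply retries its compare-and-swap.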

 

 

Intel's tbbmalloc

 

https://github.com/GoldJohnKing/oneTBB/releases

 

Intel does not update tbbmalloc that often, and it's not as feature-rich as mimalloc, so this one is unchanged.

 

 

About Large Pages / Huge Pages / Lock pages

 

You first need to grant the "Lock pages in memory" privilege to your user account and reboot your system, then set arma3_x64.exe to always run with administrator rights.

 

Only then can the memory allocator utilize huge page support and bring the performance boost as expected.

 

You may refer to this article by Microsoft:

 

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/enable-the-lock-pages-in-memory-option-windows
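To illustrate why the privilege step matters, here is a minimal Windows-only sketch of how a program obtains large pages: it enables `SeLockMemoryPrivilege` in its process token (which only works once the account holds "Lock pages in memory") and then calls `VirtualAlloc` with `MEM_LARGE_PAGES`, rounding the size up to `GetLargePageMinimum()` (typically 2 MiB on x64). The helper names are hypothetical and not functions from mimalloc or tbbmalloc; a real allocator would enable the privilege once at startup rather than per allocation.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to a whole number of `page`-sized pages (page must be > 0).
constexpr std::size_t round_up_to_page(std::size_t size, std::size_t page) {
    return (size + page - 1) / page * page;
}

#ifdef _WIN32
#include <windows.h>

// Enables SeLockMemoryPrivilege in this process's token. Only succeeds if the
// account was granted "Lock pages in memory" (and has rebooted since the grant).
bool enable_lock_memory_privilege() {
    HANDLE token = nullptr;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token)) {
        return false;
    }
    TOKEN_PRIVILEGES tp{};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    const bool ok =
        LookupPrivilegeValue(nullptr, SE_LOCK_MEMORY_NAME, &tp.Privileges[0].Luid) &&
        AdjustTokenPrivileges(token, FALSE, &tp, 0, nullptr, nullptr) &&
        // AdjustTokenPrivileges "succeeds" with ERROR_NOT_ALL_ASSIGNED
        // when the account does not hold the privilege.
        GetLastError() == ERROR_SUCCESS;
    CloseHandle(token);
    return ok;
}

// Returns large-page-backed memory, or nullptr if large pages are unavailable.
// Release it with VirtualFree(ptr, 0, MEM_RELEASE).
void* alloc_large_pages(std::size_t size) {
    const std::size_t page = GetLargePageMinimum();  // 0 if unsupported
    if (page == 0 || !enable_lock_memory_privilege()) return nullptr;
    return VirtualAlloc(nullptr, round_up_to_page(size, page),
                        MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES, PAGE_READWRITE);
}
#else
// Large pages via VirtualAlloc are Windows-only; this stub keeps the sketch compiling elsewhere.
void* alloc_large_pages(std::size_t) { return nullptr; }
#endif
```

If the privilege was never granted, `alloc_large_pages` returns `nullptr` and an allocator would fall back to normal pages, which matches the observation that without it everything performs the same.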

 

 

About AVX, AVX2 and AVX512

 

Through various tests on different machines and by digging deeper into the memory allocator itself, it is now clear that the performance improvement mainly comes from huge page support.

 

The AVX, AVX2 and AVX-512 instruction sets make no difference at all, because the memory allocator does not use them.
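For readers wondering why vector instructions would not help: an allocator's hot path boils down to a little integer arithmetic to pick a size class, followed by popping a pointer off a free list. The toy sketch below (hypothetical power-of-two size classes, not code from mimalloc or tbbmalloc) shows that shape; there is little or no bulk data processing for AVX, AVX2 or AVX-512 to speed up.

```cpp
#include <cassert>
#include <cstddef>

// Toy size classes: powers of two starting at 16 bytes (hypothetical layout;
// real allocators use finer-grained classes and a bit-scan instruction,
// which is also a plain scalar integer operation).
constexpr std::size_t kMinShift = 4;  // 16 bytes

// Maps a request size to a size-class index with plain integer arithmetic.
std::size_t size_class(std::size_t size) {
    assert(size <= (std::size_t{1} << 40));  // toy limit, avoids shift overflow
    std::size_t cls = 0;
    std::size_t cap = std::size_t{1} << kMinShift;
    while (cap < size) {  // smallest power of two >= size
        cap <<= 1;
        ++cls;
    }
    return cls;
}

struct Block { Block* next; };

// Pops one block from a per-class free list: one load and one store.
void* pop(Block*& free_list) {
    Block* b = free_list;
    if (b != nullptr) free_list = b->next;
    return b;
}
```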

 

Yes, I am pretty sure now, AVX is mainly a placebo. Sad.

 

If you are familiar with AVX, please correct me, or tell me more about it.

 

 

About BattlEye

 

Some users have reported that certain versions of my memory allocators have been whitelisted by BattlEye, but I have not confirmed it yet.

 

If you can confirm which file(s) have been whitelisted by BattlEye, feel free to let me know!

 

 

--- === Original Contents Below (2021-12-25) === ---

 

Merry Xmas!

 

This is GoldJohnKing, a newbie C++ programmer, a possibly well-known Arma 3 server administrator, and a community developer in mainland China.

 

I recently dived into memory allocators at work, and came up with the idea of porting some well-known memory allocators to the Arma series.

 

And ofc I finally did it! 🤣

 

Here I present two new CMAs for both Arma 3 and Arma 2: Operation Arrowhead, with a potential 5%-10% performance gain over vanilla TBBv4, tested by myself and replicated by some players in the Arma 3 Discord.

 

These ports are open source and binaries are provided as well. I really hope you enjoy them!

 

Note: The binaries on the GitHub Releases pages may not always be up to date with the newest source code. If you know how to compile, I strongly recommend compiling them yourself.

 

 

Microsoft's mimalloc

 

The first one is Microsoft's mimalloc, which is currently used in Death Stranding, and I highly recommend trying it first!

 

The port is based on v2.0.3. The download link and source code are below:

https://github.com/GoldJohnKing/mimalloc/releases

 

As noted on the release page above, mimalloc uses (reserved) huge pages and may give a 5-15% performance boost over the vanilla TBB4 memory allocator, depending on your PC's specifications and the actual scenario.

 

mimalloc.dll is for players with a lot of free memory, as it locks pages to reduce memory reallocation and fragmentation.

 

If you have little RAM, need to run memory-hungry programs alongside Arma 3, or see conspicuous performance degradation with mimalloc.dll, use mimalloc_without_reserved_huge_pages.dll instead. It may handle memory consumption better and help prevent performance degradation over long sessions as well.

 

You can read Microsoft's repo to see what feature mimalloc provides.

 

And this should be the first ever port of Microsoft's mimalloc to the Arma series!

 

 

Intel's tbbmalloc

 

The second one is Intel's tbbmalloc, part of Intel's oneAPI Threading Building Blocks project. Arma 3 officially uses TBBv4 version 2017U3, which is old and has some potential bugs (for example, the well-known 0xC0000005 STATUS_ACCESS_VIOLATION).

 

Blud and some other people have ported one to Arma before, known as CMA and xarmalloc or something similar. Their source code is where I started: without their work, I wouldn't have known how to port one to Arma. Thanks for their work!

 

However, their version contains a lot of code for debugging and performance statistics, as well as some code that is not actually required, which can cause overhead in some situations. Besides, their version has been reported to leak memory and crash in long runs.

 

My version is based on v2021.5.0 and has been tested in continuous long-running sessions without conspicuous issues. The download link and source code are below:

https://github.com/GoldJohnKing/oneTBB/releases

 

As noted on the release page above, it only uses the OS's huge pages, which may give a 5-10% performance boost over the vanilla TBB4 memory allocator, depending on your PC's specifications and the actual scenario.

 

However, it is not as feature-rich as Microsoft's mimalloc, and it may not handle memory fragmentation over long runs as well as mimalloc does.

 

So I suggest you try out both and pick the one that suits you, especially if mimalloc gives you problems or you would like to find out for yourself.

 

 

Some notes on my findings from creating a CMA have been posted on the BI Wiki, see:

https://community.bistudio.com/wiki/Arma_3:_Custom_Memory_Allocator

 

 

Feedback and code reviews are very welcome, whether you are BI staff, a senior programmer, a community developer, a server admin or just a regular player.

 

And if you understand Chinese/Mandarin, you can watch my videos about it on Bilibili:

https://www.bilibili.com/video/BV1GZ4y1X752/

https://www.bilibili.com/video/BV1EL4y1n7f2/

https://www.bilibili.com/video/BV1ar4y1S7Jo/

https://www.bilibili.com/video/BV1EM4y1F7DA/

 

 

Thanks to the community! Thanks BI! Love ya all! 🤗

 

Edited by GoldJohnKing, 2022-05-05

---

 

Valken said:

Merry Christmas! Are these 64-bit aware? 

These are 64-bit only and use the AVX instruction set.


---
vengeance1 said:

ah...ok...a little ignorant here but what am I supposed to do with this dll file?

You may follow this video. 

 

---

Edit: more reliable results in another post below

I've just done a quick round of YAAB testing. FPS results are average of 3 runs.

  • 56.4 FPS  Intel tbbmalloc   (v2021.5.0, this thread)   
  • 54.9 FPS  Intel TBB 4  (BI default)
  • 54.2 FPS  Microsoft's mimalloc   (v2.0.3, this thread)
  • 53.6 FPS  JEMalloc
  • 51.8 FPS  System

My specs:

AMD Ryzen 9 5900X, 2x16GB DDR4 (16-16-16-36, 3600MHz, 2T), Nvidia 970

 

Edited by ceeeb

---
Replying to ceeeb:

Thanks for your test! I'm still working on mimalloc to make it perform the same as or better than tbb v2021.5.0!


---

FWIW, I'm now testing different startup parameters since this is the first time I've played Arma on this PC. With hyperthreading and CPU count left off/blank, mimalloc is fastest, then your TBB.

I think YAAB results are too variable to read much into the results of only 3 runs... I'll run a few more now

 

---
Replying to ceeeb:

Have you enabled the "Lock pages in memory" privilege for your Windows account in gpedit.msc? If not, tbbmalloc, mimalloc and the BI default would have the same performance.

 

---

Ok, I've done some more thorough tests using the best 3 allocators. Nice work! Results are from 10x rounds of YAAB. I also watched 'Don't Look Up' 🙂 

  • 58.9 FPS  Microsoft mimalloc (v2.0.3, this thread) locked pages (=baseline + 9.3%)
  • 58.7 FPS  Intel tbbmalloc  (v2021.5.0, this thread) locked pages (=baseline + 8.8%) 
  • 56.0 FPS  Microsoft mimalloc (v2.0.3, this thread) (=baseline + 3.9%)
  • 55.6 FPS  Intel TBB 4 (BI default) locked pages (=baseline +3.1%)
  • 54.4 FPS  Intel tbbmalloc  (v2021.5.0, this thread) (=baseline +1.0%)
  • 53.9 FPS  Intel TBB 4 (BI default) (=baseline)
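The percentages in this list are the relative change against the vanilla TBB 4 average, i.e. (FPS / 53.9 - 1) x 100. A small helper to recompute them is below. From the rounded averages it gives 9.3% for the top row, while a few rows come out 0.1 point different from the posted figures (for example 8.9% instead of 8.8% for tbbmalloc with locked pages), presumably because the posted percentages were calculated from unrounded averages.

```cpp
#include <cassert>
#include <cmath>

// Relative FPS change versus a baseline, in percent.
double gain_percent(double fps, double baseline_fps) {
    return (fps / baseline_fps - 1.0) * 100.0;
}

// Rounds to one decimal place, as in the list above.
double round1(double x) { return std::round(x * 10.0) / 10.0; }
```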

 

 


---

So I've been tinkering with ways to optimize the binaries during compilation in VS2022, mainly by compiling for different AVX versions depending on the capabilities of the PC of the friend I'm compiling for, and by adding the "/Ob3" option. However, when I tried to investigate possible performance gains from profile-guided optimization, it seems Arma 3 refuses to load the malloc and reverts to BI's TBB4 when I compile it with "/GENPROFILE" on the linker's command line. Can a BI dev shed some light on this? Is this a deliberate limitation of the game engine, for security reasons or something?

Btw, what are the expected gains from using AVX-512? I have a friend I play with who has a 12th gen i7 and I'd like to know what kind of performance he can expect if he enables AVX-512 in his BIOS.

---
Replying to Drift_91 (4/24/2022 at 10:41 PM):

Hi, I am pretty sure AVX does not bring any performance gain at all: the memory allocator does not utilize it.

 

The only reason Arma falls back to the default memory allocator, as far as I know, is that it failed to load the DLL or could not find the required exported functions (i.e. the Arma CMA API).

 

Besides checking your code, keep in mind that some compiler options can also cause DLL loading or function export issues. You may want to start from there.
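The fallback rule described above can be pictured as the check below. This is a hypothetical model of the behaviour, not Arma 3's actual loader code; on Windows the lookup would wrap `GetProcAddress` on the module returned by `LoadLibrary`, and the real list of required functions is on the BI Wiki page linked in the first post. One common pitfall worth checking: if the exported functions are not declared `extern "C"`, C++ name mangling changes their exported names and the lookup fails.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Resolves a named export, returning nullptr if it is missing
// (an abstract stand-in for GetProcAddress).
using ExportLookup = std::function<void*(const std::string&)>;

// Returns true only if the DLL loaded and every required export resolved.
// Any failure means the host falls back to its built-in allocator.
bool can_use_custom_allocator(bool dll_loaded,
                              const std::vector<std::string>& required,
                              const ExportLookup& lookup) {
    if (!dll_loaded) return false;
    for (const auto& name : required) {
        if (lookup(name) == nullptr) return false;  // one missing export is enough
    }
    return true;
}
```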

 

---
Replying to GoldJohnKing (5/5/2022 at 1:03 AM):

AVX (Advanced Vector Extensions) would not bring any performance gains to memory allocation and/or management. The instructions are, in general, designed to speed up SIMD (Single Instruction, Multiple Data) operations, such as performing matrix math on a large set of objects. Memory allocation and management do not match those usage characteristics.
