humvee28 10 Posted November 1, 2011 (edited) Tested only with Shadows off. Sorry for the misleading Text. Benchmark Post updated. Edited November 1, 2011 by Humvee28
On_Sabbatical 11 Posted November 1, 2011 I think I will use the Windows allocator; it's giving me the best performance and is stable. With TBB4 I sometimes get a few seconds of bad FPS before it returns to normal, and the overall performance is not convincing. Any suggestions or hints? I'm interested in using these allocators.
maddogx 13 Posted November 1, 2011 I had a go at playing the Harvest Red mission yesterday (Chernogorsk), using the latest beta and the TBB4 malloc... and was completely blown away. Performance is infinitely better than the last time I tried this particular mission (a couple of months ago), and I am finally able to increase model detail to high with negligible FPS loss. Framerates in the mission are now consistently above 30 (usually around 40-50), and even when they do drop to the low 30s it still feels smooth. Prior to these betas, my performance in this particular mission left a lot to be desired, with framerates often dropping into the 20s and feeling quite stuttery.
sickboy 13 Posted November 1, 2011 (edited) Sweet, as a general indication of performance improvement, but I think it's not a fair comparison to compare against results from months ago as far as the memory allocator is concerned. At least IMO it says nothing about the memory allocator, just that between build X (months ago) and build Y (current build) improvements have been made, which could also be system- or settings-related? I suppose for memory allocator benching it would be great to compare the various memory allocators with each other, and against the beta version from before the memory allocators were changed. Edited November 1, 2011 by Sickboy
suma 8 Posted November 1, 2011 Note: the allocators are now available with full source code in the Community Wiki, each with a corresponding license. Beware: the allocators other than TBB 4 have not been updated for quite some time and would perhaps benefit from being brought up to a more recent version. If anyone wants to do this, I recommend first comparing each version provided by us against the version it is based upon, to see what customizations and fixes we made in it, as it is possible they will need to be recreated in the recent version as well.
maddogx 13 Posted November 1, 2011 My post was really just meant as a subjective comment on the overall performance improvements I'm seeing, not comparing any memory allocators as such. :) I'll leave the scientific testing until someone defines a good process to follow (ideally automated). ;)
Guess Who 10 Posted November 1, 2011 It seems that ArmA benefits most from TBB4 under conditions where lots of memory is required. My experience with TBB4 in Chernarus Warfare is similar. The difference between TBB3 and TBB4 is 50 percent or more in favor of the latter. Similar scenarios in Takistan don't benefit that much. Memory usage in Takistan is usually around 1.2 GB, while in Chernarus it very quickly goes up to 1.5 GB and then further up to 1.8 GB. So the performance gain seems correlated with the memory requirement, which is not that surprising because we are talking about a memory allocator here. Stability is also pretty good; no crashes here for a while. Thumbs up! System is i7 920 | X58 Chipset | 6 GB 1600 MHz RAM | Samsung SSD | AMD 5870 GPU
sickboy 13 Posted November 1, 2011 "I'll leave the scientific testing until someone defines a good process to follow (ideally automated). ;)" What about kju's benchmark suite? That seems to be excellent and is already used for this purpose :D
maddogx 13 Posted November 1, 2011 "What about kju's benchmark suite, that seems to be excellent and already used for this purpose :D" I wasn't aware that such a thing existed, but I guess I'll go look for it and try it out when I get the chance. :)
humvee28 10 Posted November 1, 2011 Another influencing Factor for this could be the different Technologies of Memory Management. Remember that the old CPUs (like mine) have an external memory controller in the Northbridge, while the new CPUs (i-series) have it integrated on the die. :)
.kju 3241 Posted November 1, 2011 PvPscene Benchmark Suite
maddogx 13 Posted November 1, 2011 "PvPscene Benchmark Suite" Thanks, Kju! :) Will run some benchmarks when I get home.
sickboy 13 Posted November 1, 2011 (edited) "I wasn't aware that such a thing existed" Shortcomings of forum threads in general, and of thread starters (who can only edit the first post) not being thorough enough, I'd say :D (e.g. by including recommended benchmark methods etc.) Edited November 1, 2011 by Sickboy
DBGB 10 Posted November 1, 2011 (edited) I built tcmalloc_bi and got an out-of-memory error in Arma. Anyway, it was quite easy to create the DLL using Visual Studio 2010: the project built a DLL file with a size of 184 KB, larger than the TBB versions. Maybe I can set some optimization options in VS, though I'd have to look into that. Anyway, here's an excerpt of the arma2oa.RPT:

== E:\Games\Bohemia Interactive\Expansion\beta\arma2oa.exe
== "E:\Games\Bohemia Interactive\Expansion\beta\arma2oa.exe" -nosplash -skipintro -cpucount=12 "-mod=expansion\beta;expansion\beta\expansion -malloc=TCMalloc_bi
=====================================================================
Exe timestamp: 2011/10/31 16:31:28
Current time: 2011/11/01 17:21:03
Version 1.59.85889
Allocator: E:\Games\Bohemia Interactive\Expansion\beta\dll\tcmalloc_bi.dll
Item str_disp_server_control listed twice
Cannot register unknown string STR_VERY_LARGE
...
...
Virtual memory total 4095 MB (4294836224 B)
Virtual memory free 2951 MB (3095097344 B)
Physical memory free 16251 MB (17041092608 B)
Page file free 15880 MB (16652148736 B)
Process working set 719 MB (754245632 B)
Process page file used 755 MB (792195072 B)
Longest free VM region: 2146865152 B
VM busy 1217036288 B (reserved 342503424 B, committed 874532864 B, mapped 47562752 B), free 3077799936 B
Small mapped regions: 8, size 36864 B
ErrorMessage: Out of memory (requested 3 KB).
footprint 408420352 KB.
pages 16384 KB.
...

A lot of these errors are listed as well:

Link to 99c702d4 (Obj-224,206:724) not released
Link to 9966f292 (Obj-222,203:658) not released

I read in another post that tcmalloc had been used previously. My experience so far is that it's neck and neck between TBB v3 and v4... Will try to build some of the other allocators from the source given and test. (JE Malloc from VS 2010 generates a DLL of around 454 KB, located in the debug folder... maybe I have to set some VS options to strip it down... anyway, this is fun)

---------- Post added at 06:52 PM ---------- Previous post was at 05:54 PM ----------

I got this output from VS 2010:

1>cl : Command line error D8016: '/ZI' and '/GL' command-line options are incompatible

It looks like the compiler command (CL.EXE) gets these options from some of my default VS settings?!

/c /ZI /nologo /W3 /WX- /Od /Oy- /GL /D WIN32 /D _DEBUG /D _WINDOWS /D _USRDLL /D NEDMALLOC_BI_EXPORTS /D _WINDLL /D _UNICODE /D UNICODE /Gm /RTC1 /MTd /GS /arch:SSE /fp:fast /Zc:wchar_t /Zc:forScope /Fo"Debug\\" /Fd"Debug\vc100.pdb" /Gd /TP /analyze- /errorReport:prompt

/GL Enables whole program optimization
/ZI Includes debug information in a program database compatible with Edit and Continue

I found the /ZI option and changed it to /Zi (Generates complete debugging information). Now the project build completes, and the DLL size is 383 KB.

BTW: I was looking at the export section that BI already made (it can be found in all the sources given at http://community.bistudio.com/wiki/ArmA_2:_Custom_Memory_Allocator):

extern "C" {
DLL_EXPORT size_t __stdcall MemTotalReserved() {return nedmalloc::VirtualReserved;}
DLL_EXPORT size_t __stdcall MemTotalCommitted() {return nedmalloc::VirtualReserved;}
DLL_EXPORT size_t __stdcall MemFlushCache(size_t size) {size_t before = nedmalloc::VirtualReserved;nedalloc::nedmalloc_trim(0);return before-nedmalloc::VirtualReserved;}
DLL_EXPORT void __stdcall MemFlushCacheAll() {nedalloc::nedmalloc_trim(0);}
DLL_EXPORT size_t __stdcall MemSize(void *mem) {int isforeign;return nedalloc::nedblksize(&isforeign,mem);}
DLL_EXPORT void *__stdcall MemAlloc(size_t size) {return nedalloc::nedmalloc(size);}
DLL_EXPORT void __stdcall MemFree(void *mem) {nedalloc::nedfree(mem);}
// DLL_EXPORT __stdcall void *MemResize(void *mem, size_t size) {return moz_expand(mem,size);} // TODO: consider implementing expand?

This is a nice hint for those who want to roll their own implementation. Using BI's modified project sources, I'm pretty sure that, given some time, it would be possible to work out what to look for and possibly modify in a "3rd party" malloc implementation. So maybe the Hoard is coming in over the horizon... Edited November 1, 2011 by DBGB Added line -fixed some typos
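A minimal sketch of how the seven exports quoted above might be wrapped around an arbitrary allocator. This is not BI's code: std::malloc stands in for the third-party allocator, the size header and the byte counter are illustrative assumptions, and MEM_CALL is a hypothetical macro added only so the sketch also compiles outside Windows. MemTotalReserved and MemTotalCommitted return the same counter, mirroring the quoted exports, where both return the same value; what the engine actually expects from them is not spelled out in this thread.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

#ifdef _WIN32
#define DLL_EXPORT __declspec(dllexport)
#define MEM_CALL __stdcall
#else
// Assumption: empty fallbacks so the sketch builds on non-Windows compilers.
#define DLL_EXPORT
#define MEM_CALL
#endif

namespace {
// Header stored in front of every block so MemSize can report the size.
struct alignas(std::max_align_t) BlockHeader {
    std::size_t size;
};
// Payload bytes currently handed out (illustrative bookkeeping only).
std::atomic<std::size_t> g_bytesInUse{0};
}  // namespace

extern "C" {

DLL_EXPORT void* MEM_CALL MemAlloc(std::size_t size) {
    // Refuse requests that would overflow once the header is added.
    if (size > static_cast<std::size_t>(-1) - sizeof(BlockHeader)) return nullptr;
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (raw == nullptr) return nullptr;
    BlockHeader* header = new (raw) BlockHeader{size};
    g_bytesInUse += size;
    return header + 1;  // the payload starts right after the header
}

DLL_EXPORT void MEM_CALL MemFree(void* mem) {
    if (mem == nullptr) return;
    BlockHeader* header = static_cast<BlockHeader*>(mem) - 1;
    g_bytesInUse -= header->size;
    std::free(header);
}

DLL_EXPORT std::size_t MEM_CALL MemSize(void* mem) {
    if (mem == nullptr) return 0;
    return (static_cast<BlockHeader*>(mem) - 1)->size;
}

DLL_EXPORT std::size_t MEM_CALL MemTotalCommitted() { return g_bytesInUse.load(); }
DLL_EXPORT std::size_t MEM_CALL MemTotalReserved() { return g_bytesInUse.load(); }

// This sketch keeps no cache of its own, so there is nothing to release.
DLL_EXPORT std::size_t MEM_CALL MemFlushCache(std::size_t) { return 0; }
DLL_EXPORT void MEM_CALL MemFlushCacheAll() {}

}  // extern "C"
```

A real port would replace std::malloc/std::free and the header trick with the target allocator's own allocate, free and block-size calls, the way the nedmalloc version above uses nedmalloc, nedfree and nedblksize.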
maddogx 13 Posted November 1, 2011 (JE Malloc from VS 2010 generates a dll around 454 kb...located in debug folder....maybe I have to set some VS options to strip it down....anyway this is fun) You should be building a release version, not a debug build. :) Right-click on the project in VS and select "configuration manager", then switch to the release configuration. Then build.
DBGB 10 Posted November 1, 2011 (edited)
JEMalloc_bi DLL size reduced to 58 KB from 484 KB
TCMalloc_bi DLL only 36 KB, down from 184 KB
NedMalloc_bi is down from 383 KB to 80 KB
Thx ;-) (I have no clue whether the above is super optimized, or whether I can set some other options I don't know about yet.) I wonder whether the debug versions contained all kinds of debug symbols and other unused stuff that impacted the performance I saw when testing the different 'debug' builds.

BTW, license-wise: is it legal to distribute the above DLLs without the source (with or without the VS project files)? For instance, if somebody can't figure out how to build in VS or anywhere else, can I compile the DLL and send it or post it somewhere without risking a violation of the software license for the given source code (GPL vs. Boost license vs. etc.)?

---------- Post added at 10:57 PM ---------- Previous post was at 10:37 PM ----------

Haven't tested the release builds of my mallocs from the previous post... I got curious when I saw that BI had also provided the source code for TBB4. That code is obviously different from what's available here: http://threadingbuildingblocks.org/ver.php?fid=174 But it could provide some insight into how to modify/export the functions from other malloc implementations when adapting them to the interface described in BI's malloc wiki. I wonder if BI's implementation is from the latest code commit from threadingbuildingblocks.org, since there are differences. I guess the Intel web download link could be old; maybe there is a newer repository somewhere (Subversion/GitHub link please) :-) Well, I'm going to try to build from the TBB site. I recommend using something like Beyond Compare to adapt/modify the source when doing this on Windows, especially if you know almost nothing about programming...
Found this post, "How does TBB load balance between muti-cores", on a TBB forum: http://software.intel.com/en-us/forums/showthread.php?t=86049&o=a&s=lr It reads to me as though there's better NUMA awareness using the QuickThreading paradigm. Comparison link between TBB and QT: http://www.quickthreadprogramming.com/Comparative%20analysis%20between%20QuickThread%20and%20Intel%20Threading%20Building%20Blocks%20009.htm Edited November 1, 2011 by DBGB Added link ;-)
sickboy 13 Posted November 2, 2011 (edited) The memory allocator sources are now available on GitHub! https://github.com/sickboy/bis-memory_allocators Feel free to fork, share, create pull requests for applying improvements, etc. If there are improvements and the license permits it, BI could be interested in including the allocator with the game.

Information:
Git general: http://www.git-scm.com/
GitHub help: http://help.github.com/
GitHub Forking: http://help.github.com/fork-a-repo/
Dev-Heaven generic info on VCS/SCM: http://dev-heaven.net/projects/heaven/wiki/What_is_a_Version_Control_System

Recommended Windows Clients:
Git Extensions: http://code.google.com/p/gitextensions/
MsysGit (original command-line and GUI client): http://code.google.com/p/msysgit/
TortoiseGit (just like TortoiseSVN): http://code.google.com/p/tortoisegit/

Will add to BIKI shortly. Edited November 2, 2011 by Sickboy
DBGB 10 Posted November 2, 2011 I won't have time to look into messing around with any 'new' malloc implementations during the weekend. I'm traveling from tomorrow, but here is one hint regarding the game engine's interface. It looks like the interface specification from the BI wiki is taken directly from tbbmalloc.cpp (lines 216 to 223) in TBB3, which was mentioned as being used as the default memory allocator in the engine:

#ifdef _WIN32
#define DLL_EXPORT __declspec(dllexport)
extern "C" {
DLL_EXPORT size_t __stdcall MemTotalCommitted() {return scalable_footprint();}
DLL_EXPORT size_t __stdcall MemTotalReserved() {return scalable_footprint();}
DLL_EXPORT size_t __stdcall MemFlushCache(size_t size) {return scalable_trim(size);}
DLL_EXPORT void __stdcall MemFlushCacheAll() {scalable_trim((size_t)-1);}
DLL_EXPORT size_t __stdcall MemSize(void *mem) {return scalable_msize(mem);}
DLL_EXPORT void * __stdcall MemAlloc(size_t size) {return scalable_malloc(size);}
DLL_EXPORT void __stdcall MemFree(void *mem) {scalable_free(mem);}

So from my perspective it's necessary to figure out, for an export such as MemTotalCommitted(), what type is returned and what arguments (pointer/object/struct ref) the function accepts...

DLL_EXPORT size_t __stdcall MemTotalCommitted() {return scalable_footprint();}

This points to scalable_footprint(), which in turn looks like it's 'templated' and is really the (overloaded?) function internal_footprint, which is really MappedMemory. So at the moment it's a bit tricky for me to figure out how I can convert another malloc's function calls to the TBB3 interface. It seems I need to look up what TBB3 does, then figure out whether my "custom malloc" implementation has a single function, or several functions combined, that does the same thing, work out how to call that function (or those functions) and what kind of data they return, and maybe typecast the result into something that TBB3 accepts.

Nevertheless it's fun to look into. I have some colleagues at work who have given me some directions, although they didn't really understand my motivation for creating a 'custom' DLL. I hope the community will start to look into this as well... I definitely need to read up on TBB3 (is that pthreads?) tonight...
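One way to read the two flush exports in the excerpt above: MemFlushCache(size) asks the allocator to give back cached memory and reports how many bytes were released (the nedmalloc version quoted earlier in the thread returns reserved-before minus reserved-after), and MemFlushCacheAll passes (size_t)-1 to release everything. The sketch below illustrates that reading with a tiny cache of freed blocks. It is an assumption about the semantics, not TBB's or BI's implementation; BlockCache and all of its members are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

namespace sketch {

// A freed block that is kept around for reuse instead of being released.
struct CachedBlock {
    void* ptr;
    std::size_t size;
};

class BlockCache {
public:
    // Park a block (obtained from std::malloc) in the cache.
    void put(void* ptr, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        blocks_.push_back({ptr, size});
        cachedBytes_ += size;
    }

    // Release cached blocks until at least `wanted` bytes have been
    // returned to the system or the cache is empty.
    // Returns the number of bytes actually released.
    std::size_t flush(std::size_t wanted) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t released = 0;
        while (released < wanted && !blocks_.empty()) {
            CachedBlock block = blocks_.back();
            blocks_.pop_back();
            std::free(block.ptr);
            released += block.size;
        }
        cachedBytes_ -= released;
        return released;
    }

    // Same idea as passing (size_t)-1 in the excerpt: release everything.
    std::size_t flushAll() { return flush(static_cast<std::size_t>(-1)); }

    std::size_t cachedBytes() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cachedBytes_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<CachedBlock> blocks_;
    std::size_t cachedBytes_ = 0;
};

}  // namespace sketch
```

With something like this behind the exports, MemFlushCache(size) could simply forward to flush(size) and MemFlushCacheAll() to flushAll(); a third-party allocator that already has a trim or release call would be wrapped the same way.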
suma 8 Posted November 2, 2011 "It looks like the interface specification from the BI wiki is taken directly from tbbmalloc.cpp (lines 216 to 223) in TBB3, which was mentioned as being used as the default memory allocator in the engine" Such an interface does not exist in TBB; this is the part which was modified by us. Remember that the TBB3 sources are modified. If you want to check their original state, you need to download them directly from the TBB site (link also provided in the Community Wiki). As for scalable_footprint, internal_footprint and MappedMemory, those are also our additions.
DBGB 10 Posted November 2, 2011 Suma, you're right. Noob error: I hadn't downloaded the source files for TBB3 (tbb30_20110427oss_src.tgz), only tbb30_20110427oss_win. So I only searched in the src dir of BI's TBB3_source and missed that this dir was missing from tbb30_20110427oss_win... Now I can see the modifications... thx
sickboy 13 Posted November 2, 2011 Recreated the repo with proper history by using the correct source file versions, and applied the changes step by step.

Original -> BI diffs:
TBB3: https://github.com/sickboy/bis-memory_allocators/commit/90973dfe5af5ed500a154cba8dd421bd8004d402
TBB4: https://github.com/sickboy/bis-memory_allocators/commit/e94c51640bb398759f52d62df991085f9aae6612
TCMalloc: https://github.com/sickboy/bis-memory_allocators/commit/547f5e7fa673e9d96bcf4edf6ac18ed9b476e88f
NedMalloc: https://github.com/sickboy/bis-memory_allocators/commit/bdb2afe59b2985a067a21e932f055830f0eb1c7b
JEMalloc: awaiting specific version information
DBGB 10 Posted November 2, 2011 (edited) Academic Discussion: NUMA_aware_heap_memory_manager_article_final.pdf
Source code based on google-perftools-0.97 is provided on the PDF's second-to-last page, incl. a diff. The source code link is provided here as well: http://developer.amd.com/Assets/NUMA-aware%20TCMalloc.zip

Update: I posted a link to a comparison between TBB4 and Quickthreading in a previous post here: http://forums.bistudio.com/showpost.php?p=2049014&postcount=66 - Quickthreading is apparently building on what's described here: New NUMA Support with Windows Server 2008 R2 and Windows 7
Some MSDN example code is given here: Win7NumaSamples.zip
Edited November 2, 2011 by DBGB Added detail about quickthreading and msdn archive
humvee28 10 Posted November 3, 2011

Benchmark Results (E08, Beta 85876, no Mods)
Allocator    1st Run    2nd Run    Comments (for Run 1 of 2)
tbb4         36         36         smooth
tbb3         36         35         slight Object popping
Windows      35         35         more Object popping
-------------------------------------------------------------------------
Benchmark Results (E08, Beta 85889, No Mods)
Same Results as with 85876
-------------------------------------------------------------------------
Benchmark Results (E08, Beta 86055, No Mods)
Allocator    1st Run    2nd Run    Comments (for Run 1 of 2)
tbb4         43         43         smooth
tbb3         43         43         slight Object / LOD popping
jemalloc     42         42         same as above
nedmalloc    41         42         same as above
tcmalloc     42         X          Out of Memory Error in 2nd Run
Windows      42         42         more Object / LOD popping
-------------------------------------------------------------------------
Testing Environment:
Sys:
OS: Win7-64 Home Premium
CPU: C2Q Q9650 @ 4 GHz (FSB 445)
GPU: ASUS HD 5870 @ Stock Clocks
RAM: 4 GB Mushkin 996599 @ 890 MHz (FSB - RAM 1:1) @ 5-5-5-15
MB: DFI LP DK P45-T2RS
HDD: 2 x Seagate Barracuda 500GB @ RAID 0
PSU: Silverstone Strider Plus 750W
Drivers: all the latest (GPU, Chipset etc.)
Ingame Settings:
VD: 2000
Res: 1920 x 1080
others: all very high except VRAM (Default), AA (Normal), PP (low). Vsync on
Config: AToC = 0, GPU Rendered & Detected Frames = 1

Personal Conclusion: Dunno what's going on, but Beta 86055 gives me better Performance than 85876 and 85889. Nothing has been changed on my Sys. For the best overall Appearance I have to retest with the two "tbb" Mallocs. The new Mallocs don't perform very well here. Tcmalloc caused a CTD with an Out of Memory Error (RPT and .bidmp sent to Dwarden).
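To put figures like the ones above side by side, it can help to average the runs per allocator and express the difference against a baseline as a percentage. A minimal sketch follows; the function names are made up for illustration, and the only inputs used in the usage note are FPS values from the post above.

```cpp
#include <numeric>
#include <vector>

// Average FPS over several benchmark runs; returns 0 for an empty list.
double meanFps(const std::vector<double>& runs) {
    if (runs.empty()) return 0.0;
    return std::accumulate(runs.begin(), runs.end(), 0.0) /
           static_cast<double>(runs.size());
}

// Relative change of `candidate` against `baseline`, in percent.
// The caller is expected to pass a non-zero baseline.
double percentChange(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}
```

For example, meanFps({36, 35}) gives 35.5 for tbb3 on Beta 85876, and percentChange(36, 43) gives roughly 19.4 for tbb4 going from Beta 85876 to Beta 86055. Within a single beta the allocators differ by only 1 or 2 FPS, so more than two runs per allocator would be needed before reading much into the ranking.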
On_Sabbatical 11 Posted November 3, 2011 Tested these new allocators (Q6600 @ 3.65 GHz, GTX 285, 4 GB RAM, WD Caviar) on benchmark E08. My graphics settings, in order, are set to: (Normal, Very high, High, Disabled, Very low, Very low, High, Normal, Disabled)
Tbb4 = 59 FPS
Nedmalloc = 62 FPS
Jemalloc = 56 FPS
Tcmalloc = 55 FPS
Black Russian 10 Posted November 3, 2011 Maybe this is the wrong place, but how can I add the malloc parameter while using the Sixupdater?