fred41

Member
  • Content Count

    536
Everything posted by fred41

  1. @jiltedjock, thanks for your feedback and logs. Your logs look good, except that at the end there are some fallbacks to small pages. These are caused by fragmentation, but are not a big problem. Maybe we discussed this already in an earlier thread, but I still think that putting a swapfile on a RAM disk is not really a good idea. You have 16GB RAM, so there should normally be nearly no access to the swapfile. I would give the 4GB back to the OS memory pool. However, the easiest way to confirm this is to try it out :) BTW: There is a new "tbbmalloc for arma" version out.
  2. Memory allocation via a custom heap replacement happens at the lowest level and has the highest performance requirements. If BattlEye intercepted here, it would seriously impact arma's performance. So I think I can answer your question: no, BattlEye will not flag us. Let's hope that the improvements introduced with tbbmalloc will later find their way into the default allocator.
  3. ... that's interesting. Assuming that no other software, like a background virus scanner, is running on your system and causing this fast fragmentation, it seems that your fly-over benchmark touches a weak point in arma's cache design. Arma's data from the large .pbo files (~10GB) is cached in a so-called section object (a memory-mapped file, around 1.6GB) outside of arma's address space. To avoid too much IO traffic, this large cache contains the most frequently and most recently used data (textures, terrain and object vertex & triangle data, etc.). If you fly quickly over a large map like Altis with ultra settings, this cache has to be updated very frequently and is stressed very heavily. To keep this big cache up to date and to map small windows of it into arma's address space, a lot of memory mapping and transfer operations are necessary in such a scenario. Your OS memory manager constantly allocates and releases a lot of memory pages. This could be the cause of the very fast fragmentation observed on your system (TonyGrant seems to have the same symptoms). A possible solution: arma could use large pages for this large section object. This could be a much better approach, because it would avoid a lot of memory transfers. But it would only work if a system has enough physical RAM, because large pages are (currently) not pageable (swappable). Greets, Fred41
  4. @ramius, this is a very good question. I personally don't use a large RAMdisk to buffer all .pbo files anymore. I tried it a few months ago, but had to discover that it doesn't really accelerate arma if you already have a very fast SSD (SATA 3). This was a bit unexpected at first, because a good RAMdisk is up to 20x faster than a good SSD. So what is the reason for that strange behavior?

    The answer is that on modern systems with a lot of RAM and heavy usage of this RAM, a new bottleneck occurs. The problem is a small buffer on our CPU, the TLB (translation lookaside buffer). This buffer is needed for fast translation from virtual to physical memory addresses. The number of pages that this buffer/cache can hold is limited and much too small for the number of pages that memory-intensive applications need. The solution to this problem is so-called "large pages". For example, a 2MB large page is 512x larger than a small page (4k), and therefore large pages need only 1/512 of the entries in the TLB. This means that a memory-intensive application can access large pages faster than small pages, because the TLB can be used much more efficiently.

    Now we have to take into account that a very large RAMdisk uses a lot of small memory pages, and this results in a constantly overwritten TLB. Most of your memory accesses are significantly slower for that reason (up to 25%).

    But I noticed just now that I still haven't answered your question ;) I think yes, tbbmalloc will accelerate arma even if you use a large RAMdisk, but you have to try it out for your system/environment. Greets, Fred41

    ---------- Post added at 16:56 ---------- Previous post was at 16:51 ----------

    ... basically this, trying is mostly much faster than explaining :)
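The 512x figure from the large-page explanation above can be checked with a little arithmetic. This is only an illustrative sketch: the 1 GiB working set and the 64-entry TLB are assumed example values, not measurements from arma or from any specific CPU.

```python
# Illustrative arithmetic only. ASSUMED_TLB_ENTRIES and the working set size
# are hypothetical example values, not taken from the posts above.
SMALL_PAGE = 4 * 1024            # 4 KiB small page
LARGE_PAGE = 2 * 1024 * 1024     # 2 MiB large page
ASSUMED_TLB_ENTRIES = 64         # assumption, differs per CPU


def pages_needed(working_set_bytes, page_size):
    """Number of pages (one TLB entry each) needed to map a working set."""
    return -(-working_set_bytes // page_size)  # ceiling division


def tlb_reach(entries, page_size):
    """Bytes of memory that can be addressed without a TLB miss."""
    return entries * page_size


ratio = LARGE_PAGE // SMALL_PAGE                         # size ratio of the two page types
working_set = 1024 * 1024 * 1024                         # hypothetical 1 GiB working set
small_entries = pages_needed(working_set, SMALL_PAGE)    # entries needed with 4k pages
large_entries = pages_needed(working_set, LARGE_PAGE)    # entries needed with 2MB pages

print("size ratio:", ratio)
print("entries for 1 GiB (small / large):", small_entries, "/", large_entries)
print("TLB reach small pages:", tlb_reach(ASSUMED_TLB_ENTRIES, SMALL_PAGE) // 1024, "KiB")
print("TLB reach large pages:", tlb_reach(ASSUMED_TLB_ENTRIES, LARGE_PAGE) // (1024 * 1024), "MiB")
```

With the same number of TLB entries, the memory covered without a miss grows by the same factor as the page size, which is the effect described in the post.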
  5. @TonyGrant, the lines above mean your OS isn't able to find a contiguous (2MB) block of memory to allocate a large page region, and therefore tbbmalloc falls back to small pages (4k). This is not a big problem, but it means your RAM is fragmented and you don't get the full advantage of large page usage. Please read what I wrote to @frag85 (a few posts earlier) about memory fragmentation and how to avoid it. Thanks for posting your log file here.

    ---------- Post added at 16:06 ---------- Previous post was at 15:49 ----------

    @frag85, the first of your three logs looks good now. All large regions are allocated as large pages (LP). In your 2nd and 3rd logs there are allocations as small pages (SP) too, which means your system can't find enough large page regions and tbbmalloc is forced to use small pages instead (fragmentation). My system has 16GB and I use a RAMdisk (imdisk) with 2GB (only for temporary files, to protect my SSD). I have never observed fragmentation on my system so bad that tbbmalloc is forced to fall back. What exactly does your "Memorys Benchmark" do? I think it is possible that an OS-memory-intensive benchmark together with a large RAMdisk is causing this fast fragmentation, especially if you use ultra settings for textures, object details and view distances. BTW: there is a new build available at github, with improved logging output. Greets, Fred41
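Why a system with plenty of free RAM can still fail to provide a large page can be shown with a toy model. The sketch below is not how the OS or tbbmalloc works internally; it only assumes that a 2MB large page needs 512 contiguous, aligned, free 4k frames, and all names are made up for illustration.

```python
# Toy model of physical memory as a list of 4k frames (True = in use).
# Hypothetical illustration; real OS memory managers are far more complex.
FRAMES_PER_LARGE_PAGE = 512  # 2 MiB / 4 KiB


def free_large_page_slots(frame_in_use):
    """Count 2 MiB-aligned runs of 512 frames that are completely free."""
    slots = 0
    last_start = len(frame_in_use) - FRAMES_PER_LARGE_PAGE
    for start in range(0, last_start + 1, FRAMES_PER_LARGE_PAGE):
        if not any(frame_in_use[start:start + FRAMES_PER_LARGE_PAGE]):
            slots += 1
    return slots


total_frames = 8 * FRAMES_PER_LARGE_PAGE  # toy "RAM" of 16 MiB

# Case 1: half of the RAM is used, but packed together at the start.
compact = [i < total_frames // 2 for i in range(total_frames)]

# Case 2: only one 4k frame is used in each 2 MiB block (heavy fragmentation).
fragmented = [i % FRAMES_PER_LARGE_PAGE == 0 for i in range(total_frames)]

for name, frames in (("compact", compact), ("fragmented", fragmented)):
    free_share = 100.0 * frames.count(False) / len(frames)
    print(name, "free: %.1f%%" % free_share,
          "large page slots:", free_large_page_slots(frames))
```

In the fragmented case almost all memory is free, yet no large page can be allocated, so an allocator would have to fall back to small pages as seen in the logs.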
  6. ... thanks, this answer is correct :) There is only one exception: if you want to monitor an HC with ASM too (with verify signatures on), then signing makes sense, because an HC is a client too. But in this case you can sign ASM yourself without any trouble, because this signature is not shared with other clients.
  7. ... thanks for your feedback ...

    @frag85, your log shows the following: the "lock pages in memory" privilege is set correctly, but your OS is not able to find unfragmented 2MB blocks for large page allocation. Your RAM is probably heavily fragmented. Try rebooting your system, and check whether there is software running in the background which causes this heavy fragmentation.

    @NoPOW, your log looks excellent (much better than my logs). I think win 8.1 handles large page allocation very much faster than win 7 (I think I have to upgrade soon ^^)

    @Neodammerung, your log looks good, a typical pattern for a win7/64 system. Special thanks for the A2 log too, I hadn't tested it with A2 before.

    @TonyGrant, try moving the -malloc=tbbmalloc further to the left (first position, before all other parameters), hope it helps.

    Thanks again to everybody :)

    BTW: Flyover benchmarks like Helo's A3 Bench do not show the full potential of tbbmalloc, because they mainly stress the graphical subsystem (CPU, VRAM, PCIE bus, OS memory manager) and make very little use of tbbmalloc. Joining a full MP server should make much more use of tbbmalloc, but there the difference is very difficult to measure. I really wish I knew of (or could provide) a good all-round benchmark for arma (scripting, AI, event handling, network stack, etc.). Greets, Fred41
  8. UPDATE: The latest version of tbbmalloc for arma is available here: https://github.com/fred41/tbbmalloc_arma I added the whole source code, especially to make it available to BIS devs, but for interested community devs too. Please note and respect Intel's license-related requirements (GPL 2.0). For detailed install and usage instructions, please read the readme.md attached to this repo. I am still interested in feedback. The "malloc_PIDX.log" files (in the arma root dir) are especially helpful for investigating the large page allocation timings for different system settings. If you want to participate and can provide some feedback, please add information like: amount of RAM, CPU (clock), OS, arma build and running mission. Greets, Fred41
  9. this is what I stated: "especially because arma don't allow the custom memory allocator to allocate more then 1GB and try to enforce this limit by permanently flushing the cache, each frame."

    1. this is related to memory allocation via the custom allocator only, not to the overall memory allocation
    2. I stated that arma starts to flush the cache to enforce this limit of 1GB; further allocations are still possible, but this is not the point

    The point is that the efficiency of the large object cache is very bad because of these flush requests (MemFlushCache). Here is a short sequence from a logfile recorded one hour ago (stable 1.08):

    T: 11.3 | C: 181504k | A: 1289920 | M: 1047 | F: 1046 | S: 2109 | L: 1
    T: 11.5 | C: 181504k | A: 1310912 | M: 1060 | F: 1061 | S: 2141 | L: -1
    T: 11.4 | C: 181504k | A: 1289920 | M: 1047 | F: 1046 | S: 2109 | L: 1
    T: 11.5 | C: 181504k | A: 1310144 | M: 1051 | F: 1051 | S: 2122 | L: 0
    T: 11.4 | C: 181504k | A: 1308352 | M: 1050 | F: 1051 | S: 2121 | L: -1
    T: 11.6 | C: 181504k | A: 1317568 | M: 1066 | F: 1066 | S: 2148 | L: 0
    ..... .....
    T: 2.7 | C: 1001600k | A: 53656 | M: 267 | F: 867 | S: 1134 | L: -600
    T: 0.0 | C: 1001600k | A: 0 | M: 0 | F: 0 | S: 0 | L: 0
    T: 18.5 | C: 1001600k | A: 1773248 | M: 2100 | F: 1474 | S: 3578 | L: 626
    T: 2.2 | C: 1001600k | A: 42840 | M: 261 | F: 857 | S: 1118 | L: -596
    T: 0.0 | C: 1001600k | A: 0 | M: 0 | F: 0 | S: 0 | L: 0
    T: 20.6 | C: 1001600k | A: 1903056 | M: 2512 | F: 1760 | S: 4282 | L: 752
    T: 2.6 | C: 1001600k | A: 42872 | M: 262 | F: 858 | S: 1120 | L: -596
    T: 188.4 | C: 1001600k | A: 28324675 | M: 53744 | F: 31025 | S: 90351 | L: 22719
    T: 63.7 | C: 1001600k | A: 9542614 | M: 134042 | F: 98978 | S: 339354 | L: 35064
    MemFlushCache:1 requested:9129984 flushed:10035200
    T: 55.5 | C: 1001600k | A: 10658044 | M: 14319 | F: 14037 | S: 28339 | L: 282
    MemFlushCache:2 requested:20963328 flushed:21110784
    T: 66.5 | C: 1001600k | A: 14051253 | M: 19102 | F: 18084 | S: 37154 | L: 1018
    MemFlushCache:3 requested:27721728 flushed:9420800
    T: 54.1 | C: 1001600k | A: 4005417 | M: 7526 | F: 6240 | S: 13878 | L: 1286
    MemFlushCache:4 requested:27721728 flushed:2326528
    T: 45.8 | C: 1001600k | A: 3474971 | M: 5690 | F: 4738 | S: 10550 | L: 952
    MemFlushCache:5 requested:27975680 flushed:2408448
    T: 112.4 | C: 1001600k | A: 11400461 | M: 31393 | F: 22202 | S: 54342 | L: 9191
    MemFlushCache:6 requested:29958144 flushed:1982464
    T: 87.4 | C: 1001600k | A: 3638541 | M: 13696 | F: 12174 | S: 26060 | L: 1522
    MemFlushCache:7 requested:30216192 flushed:1630208
    T: 45.6 | C: 1001600k | A: 2726942 | M: 4786 | F: 4295 | S: 9154 | L: 491
    MemFlushCache:8 requested:30420992 flushed:1384448
    T: 45.8 | C: 1001600k | A: 2637127 | M: 5522 | F: 6723 | S: 12396 | L: -1201
    MemFlushCache:9 requested:31444992 flushed:1384448
    T: 38.7 | C: 1001600k | A: 3765313 | M: 5516 | F: 5751 | S: 11341 | L: -235
    MemFlushCache:10 requested:31625216 flushed:2539520
    T: 42.2 | C: 1001600k | A: 3588055 | M: 4391 | F: 5521 | S: 10151 | L: -1130
    MemFlushCache:11 requested:31821824 flushed:1966080
    T: 29.3 | C: 1001600k | A: 3043198 | M: 3708 | F: 3164 | S: 6921 | L: 544
    MemFlushCache:12 requested:31911936 flushed:2277376
    T: 33.5 | C: 1001600k | A: 2962536 | M: 6043 | F: 4120 | S: 10657 | L: 1923
    MemFlushCache:13 requested:34377728 flushed:1253376
    T: 35.1 | C: 1001600k | A: 5208487 | M: 5981 | F: 7620 | S: 13889 | L: -1639
    ..... ..... .....
    T: 13.8 | C: 1786864k | A: 1414268 | M: 1818 | F: 1818 | S: 3653 | L: 0
    MemFlushCache:414213 requested:297201664 flushed:1253376
    T: 13.6 | C: 1786864k | A: 1415948 | M: 1881 | F: 1881 | S: 3779 | L: 0
    MemFlushCache:414214 requested:297201664 flushed:1253376
    T: 13.8 | C: 1786864k | A: 1417860 | M: 1892 | F: 1890 | S: 3799 | L: 2
    MemFlushCache:414215 requested:297201664 flushed:1253376
    T: 13.1 | C: 1786864k | A: 1417132 | M: 1878 | F: 1879 | S: 3774 | L: -1
    MemFlushCache:414216 requested:297201664 flushed:1253376
    T: 14.4 | C: 1786864k | A: 1818332 | M: 5317 | F: 4151 | S: 9522 | L: 1166
    MemFlushCache:414217 requested:297799680 flushed:1327104
    T: 22.9 | C: 1786864k | A: 1956794 | M: 2682 | F: 2797 | S: 5495 | L: -115
    MemFlushCache:414218 requested:298930176 flushed:1769472
    T: 41.0 | C: 1786864k | A: 2311065 | M: 3164 | F: 3072 | S: 6231 | L: 92
    MemFlushCache:414219 requested:298934272 flushed:2023424
    T: 5.5 | C: 1786864k | A: 171952 | M: 2250 | F: 2607 | S: 4920 | L: -357
    MemFlushCache:414220 requested:298999808 flushed:139264

    I think you can see that the limit of ~1GB is triggering these flush attempts. (C: is the reported committed memory, T: is the time in ms since the last MemCommited request from arma) Greets, Fred41
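For anyone who wants to evaluate such a log instead of reading it by eye, here is a minimal parsing sketch. It only assumes the two record shapes visible in the excerpt above ("T: ... | C: ...k | ..." samples and "MemFlushCache:N requested:X flushed:Y" entries); the function name and dictionary keys are made up for illustration, and the meaning of the A, M, F, S and L columns is not assumed beyond what the excerpt shows.

```python
import re

# Patterns derived only from the record shapes visible in the posted excerpt.
SAMPLE_RE = re.compile(
    r"T:\s*(?P<T>[\d.]+)\s*\|\s*C:\s*(?P<C>\d+)k\s*\|\s*A:\s*(?P<A>\d+)\s*\|"
    r"\s*M:\s*(?P<M>\d+)\s*\|\s*F:\s*(?P<F>\d+)\s*\|\s*S:\s*(?P<S>\d+)\s*\|"
    r"\s*L:\s*(?P<L>-?\d+)"
)
FLUSH_RE = re.compile(
    r"MemFlushCache:(?P<n>\d+)\s+requested:(?P<requested>\d+)\s+flushed:(?P<flushed>\d+)"
)


def parse_log(text):
    """Return (samples, flushes) as lists of dicts; works on wrapped or flat text."""
    samples = [
        {key: (float(val) if key == "T" else int(val))
         for key, val in match.groupdict().items()}
        for match in SAMPLE_RE.finditer(text)
    ]
    flushes = [
        {key: int(val) for key, val in match.groupdict().items()}
        for match in FLUSH_RE.finditer(text)
    ]
    return samples, flushes


if __name__ == "__main__":
    excerpt = (
        "T: 63.7 | C: 1001600k | A: 9542614 | M: 134042 | F: 98978 | S: 339354 | L: 35064 "
        "MemFlushCache:1 requested:9129984 flushed:10035200 "
        "T: 55.5 | C: 1001600k | A: 10658044 | M: 14319 | F: 14037 | S: 28339 | L: 282"
    )
    samples, flushes = parse_log(excerpt)
    first_flush_c = samples[0]["C"] if flushes else None
    print("samples:", len(samples), "flush requests:", len(flushes))
    print("committed memory (k) around first flush:", first_flush_c)
```

A side observation from the posted lines: in every sample shown, the L value equals M minus F, which can be used as a sanity check on parsed data.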
  10. ... I doubt that; I think the BIS devs are very busy. But sometimes I too wish the priorities would shift more towards performance-related aspects.
  11. ... of course you may, but why me? :) This limit is probably an old one. To support a 32-bit OS with only 2GB of address space, this limitation makes sense. I think it is time to change this behavior, because meanwhile we have much more address space available (nearly 4GB for a 32-bit process on a 64-bit OS).
  12. ... I forgot to mention, this was a low-load situation, just 2 players ... (I really wish the engine were able to offload AI calculation to a second core :( ) In fact, the CPU load was too low here to cause Intel SpeedStep to clock up.
  13. ... yes, that is because script execution in the VM is limited to ~3ms per frame and therefore doesn't impact the FPS. I remember a BECTI server admin observed the same low-CPS behavior, and later we found that the reason was the power management on his server. Maybe you could try setting the power options profile to "High performance" and compare your CPS value.
  14. @frag85, thanks for sharing this. I am assuming you tested this now with large pages enabled ("lock pages in memory" privilege). The slightly higher memory allocation is a result of better caching of large memory objects and a higher allocation granularity. This results in much better memory performance, especially because arma doesn't allow the custom memory allocator to allocate more than 1GB and tries to enforce this limit by constantly flushing the cache, each frame. As expected, this behavior totally kills cache efficiency in the default allocator. tbbmalloc protects the cache by ignoring these strange flush attempts, which results in much higher efficiency if the allocation size via the custom allocator is above 1GB. The second advantage is the use of large pages, which results in ~25% higher access speed in the related areas and an overall performance increase of around 6%-9% (arma client). As I already stated in my last post, there is no relation between the custom memory allocator and the strange VRAM allocation pattern, and if you think about it, you will probably have to agree that your graph confirms that. BTW: Dwarden is currently testing tbbmalloc on the two CZ servers (stratis, samatra wasteland v9) and I am using the logfiles (thanks to dwarden) to fine-tune some parameters. I will release this allocator in the next few days for testing. Greets, Fred41
  15. @Tankbuster, thanks ... :) @kremator, this CPS value tells you how often FSM conditions are evaluated (per second). Because AI is mainly controlled by FSMs, a low CPS value means slowly reacting AI too. Additionally, while playing, you will notice that some actions are delayed more than normal if CPS is very low. There is a relation between the number of scripts running in the VM (scheduled environment) and the FSM condition evaluation frequency. A low CPS value is most likely caused by an overloaded VM (too many scripts spawned, execVM ...). Greets, Fred41
  16. ... thanks a lot for this tutorial :)
  17. ... what you most likely see here is a VRAM problem and not related to CPU memory allocation. A better memory allocator will probably not solve this problem. But you could try to lower your texture settings from ultra to very high. Hope it helps, Fred41
  18. UPDATE: 13.12.2013 three **customizable object counters** added (set interval and .sqf command in asm.ini). The counting interval and the .sqf command for each counter are configured in asm.ini. The graphical display for these three values is currently scaled to max=1000; the value field can show much higher values. For detailed usage instructions, please read the readme.md. Please report bugs/issues either here or in my github repo. BTW: You can not only count objects, but use any .sqf code which returns a number value.
  19. ... really 3x ? ... :D ------------------------------------------------------------------------------------------------------------- EDIT: woooow, what is that, unbelievable, just now got +100FPS and that is really not a joke. What crazy bug did you guys find and fix here?
  20. @BL1P, good to know I am not the only old one here -O^O- :D FPS is the average FPS of the last 16 cycles/frames, FPSmin is the minimum FPS of the last 16 cycles/frames. The internal commands used are diag_fps and diag_fpsmin, sampled each second. The very low FPSmin baseline on your server means that there are probably some periodic lags/stutters on this instance. This is not necessarily a problem on a server, but it makes sense to investigate it. @Revolutor, I answered your PM :)
  21. ... thanks for your report ... Perhaps you didn't update all files (see changelog and readme.md, hint: ASMdll.dll can only be overwritten while arma is not running)?
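The difference between the two values described above (average versus minimum over the last 16 frames) can be illustrated with a small sketch. This is not the engine's implementation of diag_fps or diag_fpsmin; the class name is hypothetical, and it assumes the average is computed as frames divided by total time and the minimum is taken from the slowest frame in the window.

```python
from collections import deque


class FrameStats:
    """Rolling FPS statistics over the last `window` frames (hypothetical sketch)."""

    def __init__(self, window=16):
        # Frame durations in seconds; old entries drop out automatically.
        self.durations = deque(maxlen=window)

    def add_frame(self, seconds):
        self.durations.append(seconds)

    def fps(self):
        """Average FPS: frames in the window divided by their total duration."""
        if not self.durations:
            return 0.0
        return len(self.durations) / sum(self.durations)

    def fps_min(self):
        """Minimum FPS: determined by the slowest frame in the window."""
        if not self.durations:
            return 0.0
        return 1.0 / max(self.durations)


if __name__ == "__main__":
    stats = FrameStats()
    for _ in range(15):
        stats.add_frame(0.02)   # fifteen smooth 20 ms frames
    stats.add_frame(0.1)        # one 100 ms stutter
    print("FPS: %.1f  FPSmin: %.1f" % (stats.fps(), stats.fps_min()))
```

A single stutter barely moves the average but pulls the minimum down sharply, which is why a low FPSmin baseline hints at periodic lags even when the FPS value looks fine.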
  22. fred41

    the houses and ruins are the mp problem ...

    ... very interesting find (let's hope vehicles and bodies are not handled this way too) ...
  23. FIX: 08.12.2013 fix: instance slot blocked, caused by arma server crash (full update required)
  24. minor update: object counter (OBJ) added to the log file output; it is only appended if object counting is activated via asm.ini. The order of values is now the following: TimeStamp|FPS|CPS|PL#|AIL|AIR|[OBJ] @.kju, this little update should help a bit to show the impact of object numbers on performance and to verify the efficiency of cleanup. I think a more detailed analysis of which objects are causing a possible accumulation should still be done via script in the mission (during mission development). Monitoring/logging the network traffic values would be very helpful, but is still very difficult to implement. I really hope that BIS will add some script commands to interface with these important engine values (without having to log in as admin in-game and to run ASM as admin).
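A log line in the order given above could be read back with a few lines of code. This is only a sketch under assumptions: it takes "|" as the literal field separator, treats the timestamp as an opaque string because its format is not stated here, and the sample line, function name and dictionary keys are hypothetical.

```python
def parse_asm_line(line):
    """Split one log line of the form TimeStamp|FPS|CPS|PL#|AIL|AIR|[OBJ].

    Assumption: fields are separated by '|' and OBJ is present only when
    object counting is activated. The timestamp is kept as plain text.
    """
    parts = line.strip().split("|")
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    return {
        "timestamp": parts[0],
        "fps": float(parts[1]),
        "cps": float(parts[2]),
        "pl": int(parts[3]),
        "ail": int(parts[4]),
        "air": int(parts[5]),
        # OBJ is optional, so it is None when the counter is switched off.
        "obj": int(parts[6]) if len(parts) == 7 else None,
    }


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real ASM log.
    with_obj = parse_asm_line("12:00:01|48.5|47.9|12|30|5|210")
    without_obj = parse_asm_line("12:00:02|48.1|47.5|12|30|5")
    print(with_obj["obj"], without_obj["obj"])
```

Keeping OBJ optional means the same reader works for logs written before and after object counting was switched on.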