fourjays

File limits on Linux arma3server?


Hi,

I have recently been compiling a new modpack for our Arma 3 group and have come across a strange issue on the Linux arma3server. I solved part of it, but am now stuck on a related issue that I have been unable to find a solution for.

 

It started with the server crashing when I was trying to boot up the new modpack, typically just after "InitSound". I removed all the mod folders and added them back in one by one, until the last folder (a collection of maps) caused the crash. I spent the next few hours going through the maps, trying to find a faulty mod, but came up empty. I then tried putting the maps back and removing another folder (CUP Terrains) and everything worked. Then I did likewise with another folder (Weapons). If I removed any folder, it worked. With them all, it didn't.

 

Through an element of chance, I changed the mod order, which caused it to proceed further before failing to start and crashing. In this case it produced a "pipe error" indicating that too many files were opened by the arma3server process. I ran ulimit -n 2048 (increasing it from 1024) and the server started with the full modpack. Success!
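For reference, this is roughly what that looked like (the launch line below is just an illustration, not my exact command or mod list):

ulimit -n              # shows the current soft limit (1024 by default here)
ulimit -n 2048         # raise it for this shell; the server process inherits it
./arma3server -config=server.cfg "-mod=<your mod folders>"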

 

I then tried to connect both myself and our headless client to the server. I got "Connection failed", while the headless client stopped just short of the lines where it would normally say "headlessclient connecting". Suspecting a faulty mod, I again disabled the mod folders and re-enabled them one by one, confirming that it booted and that both the HC and I could connect successfully each time. As before, it all worked until the last folder was added. As before, it all worked regardless of which folder I removed, as long as one was removed.

 

So I again dug into the Linux file limits, increasing the ulimit and adjusting the global limits in /etc/security/limits.conf and /etc/sysctl.conf. None of these have worked, and for lack of a better solution I have come to the conclusion that either arma3server imposes some kind of limit itself, or it utilises something with a limit set by an option I haven't yet changed.
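For completeness, these are the kinds of entries I mean (the numbers and the "steam" user are only examples, not necessarily the exact values I tried):

# /etc/security/limits.conf - per-user open-file limits for the user running arma3server
steam    soft    nofile    65536
steam    hard    nofile    65536

# /etc/sysctl.conf - system-wide ceiling on open files
fs.file-max = 2097152

followed by sysctl -p and logging the user out and back in so the new limits take effect.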

 

Following are some log files demonstrating the steps:

Can anyone give any ideas or insight on this please? Seems bizarre that there may be an uncontrollable limit being applied here and I hope I've just missed a setting or something.

Thanks


There is currently an issue with the Linux server reading bikeys.

 

The following is not my own finding, so please don't give me any credit for it.

BI have also been made aware of this recently and are working on a solution.

The reason they fail is multithreading issues _on the server_.

This has nothing to do with cpuCount or exThreads settings (set to 1 and 0 respectively during my investigations). The server, when it starts, does start multiple threads and I was able to trace all their OS calls during execution.

In short, BI are being hit by a "thundering herd" mitigation built into Linux. They spawn a large number of threads that attempt to read files in parallel. When the OS detects such a condition it returns EAGAIN, which means the operation needs to be attempted again on the next cycle.

There is a facility the application can use in this situation (FUTEX_REQUEUE), but the dedicated server on Linux isn't using it to requeue the requests.

The result is that, from the server's point of view, read attempts on certain key files fail; in reality they don't fail, the application is simply told to try again, but I haven't seen the second attempt materialise:

tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc5600, FUTEX_WAKE_PRIVATE, 1) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 lseek(807, 0, SEEK_CUR) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 read(807, "UK3CB_BAF_Vehicles_4_0\0\224\0\0\0\6\2\0\0\0"..., 4096) = 175
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc55d4, FUTEX_WAIT_PRIVATE, 47393, NULL) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc5600, FUTEX_WAKE_PRIVATE, 1) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 getcwd("/tmp/tmp.ztSvhDuYCu", 1024) = 20
tmp.ztSvhDuYCu.strace.25691:17:20:32 open("/tmp/tmp.ztSvhDuYCu/keys/UK3CB_BAF_Equipment_1_1.bikey", O_RDONLY) = 808
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc55d4, FUTEX_WAIT_PRIVATE, 47395, NULL) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc5600, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)

Normally, there should be another call down the line to read UK3CB_BAF_Equipment_1_1.bikey but there isn't.

Full context of this access follows:

First, one thread checks whether the file exists and is readable:

tmp.ztSvhDuYCu.strace.25673:17:20:32 stat64("keys/UK3CB_BAF_Equipment_1_1.bikey", {st_mode=S_IFREG|0644, st_size=176, ...}) = 0
tmp.ztSvhDuYCu.strace.25673:17:20:32 stat64("keys/UK3CB_BAF_Equipment_1_1.bikey", {st_mode=S_IFREG|0644, st_size=176, ...}) = 0

Then another (later spawned) thread attempts to read it in parallel with other files:

tmp.ztSvhDuYCu.strace.25691-17:20:32 read(807, "UK3CB_BAF_Vehicles_4_0\0\224\0\0\0\6\2\0\0\0"..., 4096) = 175
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc55d4, FUTEX_WAIT_PRIVATE, 47393, NULL) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc5600, FUTEX_WAKE_PRIVATE, 1) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 getcwd("/tmp/tmp.ztSvhDuYCu", 1024) = 20
tmp.ztSvhDuYCu.strace.25691:17:20:32 open("/tmp/tmp.ztSvhDuYCu/keys/UK3CB_BAF_Equipment_1_1.bikey", O_RDONLY) = 808
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc55d4, FUTEX_WAIT_PRIVATE, 47395, NULL) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc5600, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
tmp.ztSvhDuYCu.strace.25691-17:20:32 futex(0xf6cc5600, FUTEX_WAKE_PRIVATE, 1) = 0
tmp.ztSvhDuYCu.strace.25691-17:20:32 lseek(808, 0, SEEK_CUR) = 0

And it is told to try again (EAGAIN). There are no further attempts to access that file.

The current tests log time with 1-second resolution; I will run further ones with microsecond resolution to get more accurate results. The logs are on zeus:/tmp for reference. They can be re-created by setting $_COMMAND_DEBUG='1' in any of the .arma3 instance files.
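(For anyone who wants to reproduce this kind of trace, something along these lines gives one log per thread with microsecond timestamps; it's a generic strace invocation, not necessarily the exact one used above.)

strace -ff -tt -o arma3server.strace ./arma3server -config=server.cfg "-mod=<mods>"
# -ff  writes one output file per thread (arma3server.strace.<tid>)
# -tt  prints absolute timestamps with microsecond resolution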

I have observed this problem only on server setups with a large number of content keys in the keychain. So far, only the addon server with 50 (!) keys is triggering this behaviour. Others, with more modest keychains, don't.

This also explains why clients can sometimes be kicked in the heat of battle. Thread management is a Hard Problem, which is why it's delegated to specialised libraries like pthreads. It is possible that the dedicated server isn't interfacing with the library correctly, or isn't handling all the cases such as requeueing, but this is just wild speculation on my part. It could just be a simple bug.

 

 

You are currently pretty safe up to about 25 keys, after that it starts getting iffy.
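A quick way to see where you stand, assuming the keys sit in the server's keys/ directory:

ls keys/*.bikey | wc -l     # how many bikeys the server will try to read at startup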

 

Options:

1) Turn signature verification off for testing to prove if this is your issue

2) Resign some of the addons with your own key to reduce the number of bikeys you need to host on the server
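For option 1 that's the verifySignatures entry in server.cfg; for option 2 the DSUtils from Arma 3 Tools can create a key and resign the PBOs. Roughly like this (the key and file names are just examples):

// server.cfg - turn signature checking off, for testing only
verifySignatures = 0;

:: DSUtils (Windows) - create your own key, then resign an addon with it
DSCreateKey.exe mygroupkey
DSSignFile.exe mygroupkey.biprivatekey @somemod\addons\some_addon.pbo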


Everything must be lower case. I still don't understand why people name their addons like this: What_TheF_WHY.pbo

 

The headless client will crash if signature verification is on. BIS knows about it; not sure when they will start working on important things rather than making useless DLCs.


Thanks for the replies, but unfortunately neither applies in this case. We already have signature verification off, and whenever the mods are put on the server I run a small script that lowercases all of the files (the server doesn't even boot up unless I do this).
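For reference, the lowercasing script is essentially this (a sketch, not the exact script I use; adjust the path to your own mods folder):

#!/bin/bash
# rename every file and directory under the mods folder to lower case,
# deepest entries first so parent directories are renamed last
find /path/to/mods -depth | while read -r p; do
    lower="$(dirname "$p")/$(basename "$p" | tr '[:upper:]' '[:lower:]')"
    [ "$p" != "$lower" ] && mv "$p" "$lower"
done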


Your server log's last line is 2:21:05 InitSound ...

Maybe try to disable sound with -noSound.


Your server log's last line is 2:21:05 InitSound ...

Maybe try to disable sound with -noSound.

Already running with -noSound. Why it does InitSound with sound disabled is a mystery. :P But the issue behind this crash was solved by increasing the ulimit. The problem now is that no clients can connect unless I reduce the number of files, as if there is a secondary limit in play.


Have you had any luck in finding a solution?

 

At a glance, we seem to have run into the exact problem you had, except without the crashing. When the full preset is active, clients can neither see the server in the server browser nor connect. Removing any mods at random will allow both to happen.


Have you had any luck in finding a solution?

 

At a glance, we seem to have run into the exact problem you had, except without the crashing. When the full preset is active, clients can neither see the server in the server browser nor connect. Removing any mods at random will allow both to happen.

Not really. Unfortunately the only solution was to splash out on a license for Windows Server and use that (and it functions flawlessly with the same mods). I am certain Linux has a hidden file limit. :(


This continues to be an issue that plagues the Linux server. It is quite frustrating having to balance a modpack around a limit of ~730 PBOs (ACE+CUP+RHS is still under the limit, but that doesn't leave a lot of wiggle room). This issue will only affect more users as mods continue to add more PBOs (ACE/RHS), and Apex will no doubt add more still.

 

Here's something I already posted on the dedicated linux feedback thread that may be useful:

 

There seems to be a maximum number of PBOs the Linux server will work with. If you add too many mods, the server will no longer show up in the server listing. This limit seems to be around ~1024 PBOs (from all addon folders combined). It can be reproduced by having over ~730 PBOs from various mods in different addon folders, as the core game has around ~230 of its own. I have increased the maximum number of files a process can open, but it seems there is something in the server process that blocks it from using more than ~1024 files.
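If you want to check your own total against that ceiling, this counts every PBO under the server directory (run from the server root, assuming all mod folders live under it):

find . -iname '*.pbo' | wc -l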

 

One interesting thing to note: using cat /proc/sys/fs/file-nr and comparing the number of file handles allocated while the process was running against when it was no longer running, the difference was always 1024 allocated file handles for the Arma 3 server process. This number seems pretty static.
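The same can be checked per process by counting the entries under /proc while the server is up (assuming the binary is named arma3server):

cat /proc/sys/fs/file-nr                    # system-wide: allocated / free / maximum
ls /proc/$(pidof arma3server)/fd | wc -l    # descriptors held by the server right now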

 

 

So even if the user limits are increased, there is still a maximum of ~1024 file descriptors used by the Linux server process.


Yeah...

 

We had a rather large preset update and hit 948 PBOs without much thought, which, combined with the game's 229, put us over whatever limit there is, at 1177. We've had to remove a bunch of things to get it to show up again, and with the new RHS update that's another 28 things to remove elsewhere to make it work.


File descriptor limits can be set in /etc/security/limits.conf.


File descriptor limits can be set in /etc/security/limits.conf.

 

I appreciate the reply, though I have already increased these to >10,000, yet the Arma 3 server process still seems to use at most 1024 file descriptors.


I've got the same problem, even with:

Quote

cat /proc/2353/limits

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             4096                 15096                processes
Max open files            65536                65536                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15096                15096                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

 

The server doesn't accept any new network connections. I raised the limits, turned off SELinux, and simplified PAM as much as possible. I have a feeling there might be some kind of hidden limit in the kernel, set at compile time.
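For anyone wanting to try the same, the SELinux part is roughly:

getenforce            # Enforcing / Permissive / Disabled
sudo setenforce 0     # permissive until the next reboot; edit /etc/selinux/config to make it permanent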

 

I'm using

 

Quote

 uname -a
 

Linux CentOS-7-2 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

 

Default precompiled kernel.

 

The thing that confuses me is why there are so few reports about this issue.

