Incontinentia

Squeezing the absolute most performance out of a script: FSMs / CBA state machines vs loops and functions


As someone who has learned all he knows about scripting by fannying about in SQF for the past couple of months, I've got a question for the more advanced scripters out there.

 

I was intrigued by this statement in the ACE3 Medical AI notes:

 

Quote

Medical AI is based on a CBA state machine, which means there is next to no performance loss caused by it.

 

I've searched around and found a few nibbles of information here and there (and there's no documentation as yet that I've found on CBA about state machines) but nothing that really provides a decent comparison between an SQF script with a series of looped functions based on lazy evaluation of conditions, and an FSM that does the same. Is there a noticeable difference in performance or are FSMs just a different way of achieving a similar goal? 

 

And if there isn't much performance difference between the two, then what is the most efficient way of squeezing performance out of a complex script? Assuming your functions are as clean and fast as they can be, what are your best practices for making them run with minimal impact? 

 

I'm referring here to scripts with complex conditions which will run on a number of units or objects repeatedly (for reference, I'm trying to turn this into a full-blown undercover simulation and more efficiency means more responsiveness for more units). 


I can't speak for CBA, but in my experience, any code that runs has an overhead attached to it.

 

EHs, FSMs etc. have higher priority than something like spawn, but it will always cost some CPU to run code. You can't get something for nothing; there's no way around it.

 

Again, not sure what they mean by that statement, but if I made a simple FSM loop and put <(nearestObjects [player,[],20000]) select {typeOf _x == ""}> to run on every frame (or as close as the FSM would do to every frame), then of course there would be a massive performance overhead.

 

I think you've answered your own question anyway in that you are aware of making your code better, which is going to have less impact overall.  If it is getting to the stage where performance is getting bogged down in a complex script, then maybe you should re-evaluate what you are doing in the script and maybe try a different way (or chop bits out/simplify it).  Obv, that's not ideal but sometimes, it's just too much for the game.


Yup that makes sense. To be honest, the code I have runs absolutely fine - it's just a little less responsive than it could be and I'd like it to have so little performance impact that mission makers can use it without thinking or worrying about it. 

 

I've read a lot about optimising functions and scripts, but I am still a bit at a loss as to the best practice from a performance perfectionist point of view. I guess it doesn't matter as long as the code runs...


I am not trying to correct you (I am not an expert), but you asked for any optimization possibilities.

 

I always try to avoid looping over large arrays such as:

allVariables, allGroups, allMissionObjects, allMapMarkers, allUnits

For example, this part of your code:

{
	if (((side _x) == Civilian) && {!(_x getVariable ["isUndercover", false])}) then {
		if (_percentageTarget > (random 100)) then {
			private _prevGroup = group _x;

			[_x] joinSilent grpNull;
			[_x] joinSilent (group INC_rebelCommander);

			if ((count units _prevGroup) == 0) then {
				deleteGroup _prevGroup; // clean up empty groups
			};
		};
	};
} forEach allUnits;

You can filter the array first, so that the loop iterates over fewer elements:

 

{
	if (!(_x getVariable ["isUndercover", false])) then {
		if (_percentageTarget > (random 100)) then {
			private _prevGroup = group _x;

			[_x] joinSilent grpNull;
			[_x] joinSilent (group INC_rebelCommander);

			if ((count units _prevGroup) == 0) then {
				deleteGroup _prevGroup; // clean up empty groups
			};
		};
	};
} forEach (allUnits select {(side _x) == CIVILIAN});

or

 

{
	if (!(_x getVariable ["isUndercover", false])) then {
		if (_percentageTarget > (random 100)) then {
			private _prevGroup = group _x;

			[_x] joinSilent grpNull;
			[_x] joinSilent (group INC_rebelCommander);

			if ((count units _prevGroup) == 0) then {
				deleteGroup _prevGroup; // clean up empty groups
			};
		};
	};
} forEach (entities [["Civilian"], [], true, true]);

I think something like this can make a noticeable difference to the load.

 

The second thing, and maybe an even worse one, is iterating over config entries in a loop to find the classes that match a condition.

 


 

for "_i" from 0 to (count _cfgVehicles - 1) do {
    _entry = _cfgVehicles select _i;

    if (isclass _entry) then {
        if (
            (getText(_entry >> "faction") in _factions) &&
            {getNumber(_entry >> "scope") >= 2} &&
            {configname _entry isKindOf "Man"}
        ) then {
            _units pushback _entry;
        };
    };
};

There is a much quicker way to do this:

 

_units = [];
_faction = "BLU_F";
_cfgVehicles = configFile >> "CfgVehicles";

{
	if (
		(configName _x) isKindOf "Man" &&
		{getNumber (_cfgVehicles >> (configName _x) >> "scope") >= 2}
	) then {
		_units pushBack (configName _x);
	};
} forEach ("getText (_x >> 'faction') == _faction && {isClass _x}" configClasses _cfgVehicles);

That's just a quick overview.

 

 


I wouldn't bother checking each group to make sure it's empty before deleting it.

 

Because deleteGroup automatically ignores non-empty groups, I just run {deleteGroup _x} forEach allGroups;

On 21/01/2017 at 10:49 PM, davidoss said:

The second thing and maybe even worst is iterate config entries in a loop for classes that will match condition.

 


for "_i" from 0 to (count _cfgVehicles - 1) do {
    _entry = _cfgVehicles select _i;

    if (isclass _entry) then {
        if (
            (getText(_entry >> "faction") in _factions) &&
            {getNumber(_entry >> "scope") >= 2} &&
            {configname _entry isKindOf "Man"}
        ) then {
            _units pushback _entry;
        };
    };
};

There are much quicker way to do this by:

 


_units = []; 
_faction = "BLU_F";  
_cfgVehicles = configFile >> "CfgVehicles"; 
 
{ 
 if ( 
	 (configname _x) isKindOf "Man" &&
	 {getNumber (_cfgVehicles >> (configname _x) >> "scope") >= 2}
 ) then { 
   
  _units pushback (configName _x); 
 }; 
} forEach ("getText (_x >> 'faction') == _faction && {isClass _x}" configClasses _cfgVehicles);

That's for a quick view.

 

 

 

 

Interestingly, I compared the two solutions (had to alter yours slightly to work on multiple factions) and the original was 88ms versus 145ms - when I tried swapping it out there was visible stutter each time the function loaded. 

 

The alteration was: ("(getText (_x >> 'faction') in _factions) && {isClass _x}" configClasses _cfgVehicles);

 

Baffling! 


when it comes to large arrays, and i'm by no means a trained programmer, i started spreading the handling of each entry over the frames or whatever interval you want. of course this is only an option, if you don't need something instantly constantly but it really helps with stutter and in most cases i have it is fast enough. you basically handle one array entry per frame and thus keep the load even at all times, of course depending on the actual code. it's just a concept i've been playing around with and for my purposes it helped a lot with eliminating bottleneck situations.
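A minimal sketch of that one-entry-per-frame idea, assuming an unscheduled EachFrame mission event handler suits the task. TAG_queue, TAG_index and TAG_fnc_processUnit are hypothetical names, and the civilian filter is only a placeholder for whatever list you need to work through:

```sqf
// Hypothetical per-unit work; replace the body with your own checks.
TAG_fnc_processUnit = {
    params ["_unit"];
    _unit setVariable ["TAG_lastChecked", time];
};

TAG_queue = [];
TAG_index = 0;

// Runs once per frame and handles exactly one array entry each time.
TAG_handle = addMissionEventHandler ["EachFrame", {
    // Finished a full pass (or first run): rebuild the list and start over.
    if (TAG_index >= count TAG_queue) then {
        TAG_queue = allUnits select {side _x == civilian};
        TAG_index = 0;
    };
    if (TAG_queue isEqualTo []) exitWith {};

    private _unit = TAG_queue select TAG_index;
    TAG_index = TAG_index + 1;

    // Units can die or be deleted between passes, so check before working on them.
    if (alive _unit) then {
        [_unit] call TAG_fnc_processUnit;
    };
}];
```

With this layout the per-frame cost stays roughly constant and the time between two checks of the same unit grows with the size of the list. removeMissionEventHandler ["EachFrame", TAG_handle] stops it again.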

 

also another thing which has been brought to my attention by dedmen (not sure if it's the right name on the forums), is that count is the fastest way to cycle through an array. you just need to make sure you return true from inside it. other than that it works the same as foreach.

Quote

also another thing which has been brought to my attention by dedmen (not sure if it's the right name on the forums), is that count is the fastest way to cycle through an array. you just need to make sure you return true from inside it. other than that it works the same as foreach.

 

Really?

 

 ["
_units = [];
_factions = ['BLU_F','OPF_F','CIV_F','IND_F'];  
_cfgVehicles = configFile >> 'CfgVehicles';
 
{
 if (
     (configname _x) isKindOf 'Man' &&
     {isClass _x &&
     getNumber (_cfgVehicles >> (configname _x) >> 'scope') >= 2}

 ) then {
   
  _units pushback (configName _x);
 };true
} count (""(getText (_x >> 'faction')) in _factions"" configClasses _cfgVehicles);
 ",[],10] call BIS_fnc_codePerformance;

10/10 67.8 ms

 

 ["
_units = []; 
_factions = ['BLU_F','OPF_F','CIV_F','IND_F'];  
_cfgVehicles = configFile >> 'CfgVehicles'; 
 
{ 
 if ( 
	 (configname _x) isKindOf 'Man' &&
	 {isClass _x &&
	 getNumber (_cfgVehicles >> (configname _x) >> 'scope') >= 2} 

 ) then { 
   
  _units pushback (configName _x); 
 };
} forEach (""(getText (_x >> 'faction')) in _factions"" configClasses _cfgVehicles);
 ",[],10] call BIS_fnc_codePerformance;

 

10/10 67.4 ms

 

['
_units = []; 
_factions = ["BLU_F","OPF_F","CIV_F","IND_F"];  
_cfgVehicles = configFile >> "CfgVehicles"; 
 
for "_i" from 0 to (count _cfgVehicles - 1) do {
    _entry = _cfgVehicles select _i;

    if (isclass _entry) then {
        if (
            (getText(_entry >> "faction") in _factions) &&
            {getNumber(_entry >> "scope") >= 2} &&
            {configname _entry isKindOf "Man"}
        ) then {
            _units pushback _entry;
        };
    };
};
 ',[],10] call BIS_fnc_codePerformance;

 

10/10 64.7 ms

The last one is the quickest, but the differences are not significant, and it looks like using configClasses runs more smoothly than the for loop.

 

The condition code passed to configClasses should only be used for simple filter expressions and nothing more


hm. weird. i just compared your count and foreach examples and on average count is faster like i said. i used the debug console performance button though. does that make a difference?

 

but your for "_i" example is not 100% identical is it? you are not doing the "configname" in there several times (which might be bad idea overall). you need to compare near identical examples for conclusive data i think.

 

i'll do some more testing myself. now i'm curious again.


count is naturally faster than forEach because forEach also sets the _forEachIndex variable on every iteration. If you don't need that, it's just unnecessary load. Of course it's minuscule, but on very big arrays it adds up.

Also, the for "_i" from... loop will be slower than count because you have the select statement in there. Every statement in SQF costs time: the fewer statements, the less time you need. Because count/forEach has the _array select _index already built in and evaluates it inside the engine rather than in script, it is also naturally faster.

So the count/forEach debate is rather a matter of personal preference, as the difference is not that big. But I'd always prefer a count/forEach over a for "_i" with a select, just for readability.

Also, for your configClasses example you can put everything you have in that extra if statement into the configClasses filter, which would completely remove the need to iterate over that array and check everything.
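A rough sketch of what folding the checks into the filter could look like. The faction list is written out literally inside the filter string purely for illustration, and whether this ends up faster than the for loop is something to measure rather than assume:

```sqf
// Every check lives in the filter string, so no second loop with an if is needed.
private _filter = "getNumber (_x >> 'scope') >= 2 && {getText (_x >> 'faction') in ['BLU_F','OPF_F','CIV_F','IND_F']} && {configName _x isKindOf 'Man'}";

// configClasses returns the matching class configs; apply turns them into class names.
private _units = (_filter configClasses (configFile >> "CfgVehicles")) apply {configName _x};
```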

Your for "_i" example is faster because you are only iterating over the config Classes once. The call to configClasses with the filter actually iterates over every configEntry.. And then you are iterating again over the leftovers.

 

Also your performance tests aren't tailored to test the loop performance. All that other stuff besides the loop has a way bigger impact than the loop itself. And you also can't compare performance of loops if you are using different pieces of code inside different loops.

 

Try this instead:

 

test_units = [];
test_factions = ['BLU_F','OPF_F','CIV_F','IND_F'];
test_cfgVehicles = "true" configClasses (configFile >> "CfgVehicles"); // array of class configs, so count, forEach and select can all iterate it
test_code = {
    if  (isclass _this &&
		{getNumber(_this >> "scope") >= 2} &&
		{getText(_this >> "faction") in test_factions} &&
		{configname _this isKindOf "Man"}
	) then {
		test_units pushback (configName _this);
	};
};


["
	{
		_x call test_code;
		true
	} count test_cfgVehicles;
"] call BIS_fnc_codePerformance;

["
	{ 
		_x call test_code;
	} forEach test_cfgVehicles;
"] call BIS_fnc_codePerformance;


['
	for "_i" from 0 to (count test_cfgVehicles - 1) do {
		_entry = test_cfgVehicles select _i;
		_entry call test_code;
	};
'] call BIS_fnc_codePerformance;

 

 

My recommendation to get good code is to let other people look at your code. After I started on TFAR I spent a bunch of time chatting with the ACE/CBA guys. Also you need to do a bunch of profiling.

To understand what the CBA state machine does, you have to understand what a state machine does and what its benefits are.

You can make your code only as performant as the Engine allows. If that's still too slow you have to split your work up and do each part in a different Frame (If you are working in unscheduled env ofc).

You could use a Statemachine for this. After the first part is done it advances into the next Part and does that.

Also very beneficial is to cache some parts of the calculations. For example Config lookups. The Config will not change after Mission start. So there is absolutely no reason to check which CfgVehicles entries belong to certain Factions more than once. Do It once at Mission start and cache it in a Variable. Config is also especially important because config lookups are slow.
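A minimal sketch of that kind of config caching, assuming the result is stored in missionNamespace under a key built from the faction name. TAG_fnc_getFactionUnits and the TAG_factionUnits_ prefix are hypothetical names, not part of any existing mod:

```sqf
// Returns the class names of all public "Man" classes of a faction.
// The slow config scan only happens on the first call per faction.
TAG_fnc_getFactionUnits = {
    params ["_faction"];

    private _key = format ["TAG_factionUnits_%1", _faction];
    private _cached = missionNamespace getVariable _key;

    if (isNil "_cached") then {
        private _filter = format [
            "getNumber (_x >> 'scope') >= 2 && {getText (_x >> 'faction') == '%1'} && {configName _x isKindOf 'Man'}",
            _faction
        ];
        _cached = (_filter configClasses (configFile >> "CfgVehicles")) apply {configName _x};
        missionNamespace setVariable [_key, _cached];
    };

    _cached
};

// The first call scans the config, later calls only read the variable.
private _bluforUnits = ["BLU_F"] call TAG_fnc_getFactionUnits;
```

The function hands back the cached array itself, so a caller that wants to modify the result should copy it first with +_bluforUnits.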

Last week I fixed a Bug inside TFAR where your fps would bog down to <10fps. By adding Config caching I got it to only 30fps and by refining the logic a little bit more it's now running at a smooth 60fps. You don't even notice that it's running anymore.

The result of that calculation inside TFAR was also dependent on inventory content. So I cache the result and only recalculate it if the inventory has changed (CBA has a nice event handler for that). So I optimized the code down to a negligible performance impact, and then I cached it so it rarely gets called at all.

On 25.1.2017 at 10:49 PM, dedmen said:

To understand what CBA Statemachine does you have to understand what a Statemachine does and what it's benefits are.

You can make your code only as performant as the Engine allows. If that's still too slow you have to split your work up and do each part in a different Frame (If you are working in unscheduled env ofc).

You could use a Statemachine for this. After the first part is done it advances into the next Part and does that.

 

interesting. that is very similar to what i started doing with all my unit caching and AI related code. it's what i was hinting at in my other post concerning large arrays. spreading things over frames to keep the load even/not too big. i always thought it's supposed to be something with FSM (finite state machine). thx for the info.


Wow - loads of useful information there @dedmen, thank you. This particular script is only ran a handful of times at the very beginning of a mission but caching units in a variable has made that process much quicker. Is there anywhere other than here that you would recommend for flashing a cheeky bit of code to someone? I'm not in any units and can't really do MP and I guess that's where a lot of scripters get feedback...


Arma Discord is always a good place. There are always people ready to scream at you for writing bad code and tell you exactly what is wrong and why. But that's only really good for smaller code snippets. No one has time to look at multi-hundred-line files or whole frameworks.


Choose the method which gets the job done with the fewest evaluations over time ... simple concept :)

 

I prefer scheduled scripts for many things which don't really require evaluation every frame. The longer you spend talking to the CBA/ACE guys, the more they will tell you to evaluate things every frame, which I believe is good for mod performance but bad for system performance and CPU/FPS.
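For comparison, a minimal sketch of the scheduled approach, assuming a check every few seconds is responsive enough for the task. TAG_fnc_checkUnit, TAG_loopHandle and the 5 second interval are placeholders:

```sqf
// Hypothetical per-unit evaluation; replace the body with your own checks.
TAG_fnc_checkUnit = {
    params ["_unit"];
    _unit setVariable ["TAG_lastChecked", time];
};

// spawn runs the loop in the scheduled environment, where sleep is allowed.
TAG_loopHandle = [] spawn {
    while {true} do {
        {
            [_x] call TAG_fnc_checkUnit;
        } forEach (allUnits select {side _x == civilian});

        sleep 5; // evaluate every few seconds instead of every frame
    };
};
```

terminate TAG_loopHandle stops the loop. How promptly each pass actually runs still depends on what else the scheduler is busy with.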

2 hours ago, marceldev89 said:

There is some "documentation" you can find at https://github.com/CBATeam/CBA_A3/pull/389 :)

I spent ages looking for that! Thanks marceldev. 

 

10 hours ago, dedmen said:

Arma Discord is always a good place. There are always people ready to scream at you for writing Bad code and telling you what exactly is wrong and why. But thats only very good for smaller code snippets. No one has time to look at multi hundred line files or whole frameworks.

Haha thanks for the recommendation, I'll look forward to some screaming. 

 

3 hours ago, fn_Quiksilver said:

Choose the method which gets the job done with the fewest evaluations over time ... simple concept :)

 

I prefer scheduled scripts for many things which don't really require evaluation every frame. The longer you spend talking to the CBA/ACE guys, the more they will tell you to evaluate things every frame, which I believe is good for mod performance but bad for system performance and CPU/FPS.

Yeah that makes sense and was kind of what I figured... although mysterious concepts like state machines had me wondering if there was some magic out there to get max performance for both a mod and the game, which was starting to seem a bit unlikely. Now I've read the link above re CBA state machines, it seems like maybe there is. @baermitumlaut - sorry to grab you out of the blue here, but is there any chance you could elaborate on what the magic is behind the CBA state machine over a regular BIS FSM and why it can do the same with less performance impact? 


Hi @Incontinentia! Sorry, I've only seen you tagging me just now, I'm usually not active here.

 

The state machine system is very similar to a FSM in its basics, as it provides states and transitions and you can model a finite (or even an infinite!) state machine with it. However, it is a bit more flexible than what BI offers themselves and its full power can be seen if you feed it with a list of things - it can work with anything that supports setVariable (objects, groups and locations would be the most common ones I assume).

The magic here is that it checks the state and transitions of only one element of its list per frame in the unscheduled environment, so the same amount of code is executed every frame by the state machine, no matter the amount of AI (so absolutely no frame rate loss due to the amount of units in the state machine). This is a very efficient way to do controlled load balancing without relying on the scheduler or the scheduled environment in general, which would bog down your performance if you'd try to do a similar thing there (or even get slowed down itself due to other scripts blocking it).

 

So let's compare this with an FSM. Let's say you want to run an FSM for each AI unit you have, like we do with the ACE3 medical AI. What would happen is that each unit starts its own FSM, and we let Arma and its scheduler take care of what is executed when. This means that, depending on the current workload, a lot of the FSMs might get (partially) executed at once, or only a couple. Depending on the FSM itself, it could execute a lot of code and slow down other scripts and general performance, or, if the scheduler is filled with scripts to run, FSM performance will decay and our units will react considerably slower. The more AI, the more FSMs will run and the worse performance will get. Not ideal.

 

Now if you use a CBA state machine instead, only one unit will get processed per frame. So if we have 50 AI units, it would take 50 frames for a unit to get processed again, and if we have 100 units it would take 100 frames. At first glance this doesn't seem ideal either - the actual time between units being processed varies with framerate and unit count. However, does this matter? In most cases, no.

First off, the same would happen with an FSM, but far less predictably due to the scheduler. Here, more AI means slower reaction time, but in an easy-to-predict linear fashion. You could even run a separate state machine for each 50 units or similar if you really need faster reaction time. The state machine also doesn't use the scheduler, so scheduled script performance will be unaffected by the number of AI in the state machine.

Second, even 100 frames aren't an issue at all, that is 2 seconds with 50fps (the maximum framerate of a dedicated A3 server). If you really need to do time precise stuff (like animations), use CBA_fnc_waitAndExecute from within a state.

Another advantage is that our code within the state machine can be fairly small, because we can expand it over multiple states and transitions. If you take a look at what code runs in the medical AI you will see why it's fast.

 

By no means is the state machine system a solution to all performance problems. Your code will still have to be optimized, and you will still have to think about how to structure your code for what you want to achieve. The state machine system just provides a solid structure to build upon.

 

If you want to know how to build a state machine, check out the examples I included in CBA:

https://github.com/CBATeam/CBA_A3/blob/master/addons/statemachine/example.hpp

https://github.com/CBATeam/CBA_A3/blob/master/addons/statemachine/example.sqf
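To give a rough idea of the shape, here is a small sketch with two states and two transitions. It requires CBA_A3, and the argument order shown in the comments is an assumption that should be checked against the example.sqf linked above. The TAG_scared variable and the state names are made up for illustration:

```sqf
// The list is passed as code so it can be re-evaluated; one element is checked per frame.
// The second argument (assumed: skip null entries) guards against deleted units.
private _stateMachine = [{allUnits select {side _x == civilian}}, true] call CBA_statemachine_fnc_create;

// States (assumed order): [stateMachine, onState, onStateEntered, onStateLeaving, name]
// Inside the code blocks, _this is the list element currently being processed.
[_stateMachine, {}, {}, {}, "Calm"] call CBA_statemachine_fnc_addState;
[_stateMachine, {}, {_this setUnitPos "DOWN"}, {_this setUnitPos "AUTO"}, "Hiding"] call CBA_statemachine_fnc_addState;

// Transitions (assumed order): [stateMachine, fromState, toState, condition, onTransition, name]
[_stateMachine, "Calm", "Hiding", {_this getVariable ["TAG_scared", false]}, {}, "GetsScared"] call CBA_statemachine_fnc_addTransition;
[_stateMachine, "Hiding", "Calm", {!(_this getVariable ["TAG_scared", false])}, {}, "CalmsDown"] call CBA_statemachine_fnc_addTransition;
```

Each piece of code stays tiny because the logic is spread over states and transitions, which is a large part of why the per-frame cost stays low.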


Wow - thank you so much for such a detailed answer @baermitumlaut! That makes a lot of sense and I can imagine is extremely useful for lots of scripters out there. I've been wondering for a while about finding a reliable way to scale code execution without having to find a sweet spot between responsiveness and scheduler performance. Thanks again, really appreciate the response. 


Was just working on some performance testing today. Here's a short script I created today for SQF profiling - others might find useful.

 

Here's how to use it:

 

1. Include the code at the bottom of this post before the SQF code you want to profile.

2. In every method you want to profile, add calls to start/stop profiling. You can use the #define macros as a shortcut.

 

Let's say you want to profile this code:

 

TEST_Function_1 = {
    sleep 10;
    [] call TEST_Function_2;
    [] call TEST_Function_2;
};

TEST_Function_2 = {
    sleep 20;
};

 

You would modify the code to look like this:

 

TEST_Function_1 = {
    PROFILE_START("TEST_Function_1");
    sleep 10;
    [] call TEST_Function_2;
    [] call TEST_Function_2;
    PROFILE_STOP;
};

TEST_Function_2 = {
    PROFILE_START("TEST_Function_2");
    sleep 20;
    PROFILE_STOP;
};

 

Then, execute your code you want to profile. When you're done, execute: [] call DUDA_fnc_printProfiles;

 

This will then write the following info to your arma log:

 

The function name, the number of times it was executed, the total time spent executing the function, and the function's self time. Self time is the function's total time executing minus the time spent executing other functions inside your function.

 

For the samples above, you would get the following:

 

TEST_Function_1, Count: 1, Total Time: 50 seconds, Self Time: 10 Seconds

TEST_Function_2, Count: 2, Total Time: 40 seconds, Self Time: 40 Seconds

#define PROFILE_START(METHOD_NAME) [METHOD_NAME,diag_tickTime] call DUDA_fnc_profileMethodStart
#define PROFILE_STOP [diag_tickTime] call DUDA_fnc_profileMethodStop

DUDA_Method_Stack = [];
DUDA_Profiles = [];

DUDA_fnc_profileMethodStart = {
	params ["_methodName",["_time",diag_tickTime]];
	DUDA_Method_Stack pushBack [_methodName,_time];
	([_methodName] call DUDA_fnc_getProfile) params ["_profileIndex","_profile"];
	if(_profileIndex == -1) then {
		DUDA_Profiles pushBack [_methodName,0,0,0];
	};	
};

DUDA_fnc_profileMethodStop = {
	params [["_time",diag_tickTime]];
	private _stackElement = DUDA_Method_Stack deleteAt ((count DUDA_Method_Stack) - 1);
	_stackElement params ["_methodName","_startTime"];
	([_methodName] call DUDA_fnc_getProfile) params ["_profileIndex","_profile"];
	private _totalTime = (_time - _startTime);
	_profile params ["_pName","_pCount","_pTotal","_pSelf"];
	_profile set [1,_pCount + 1];
	_profile set [2,_pTotal + _totalTime];
	_profile set [3,_pSelf + _totalTime];
	DUDA_Profiles set [_profileIndex,_profile];
	if(count DUDA_Method_Stack > 0) then {
		private _caller = DUDA_Method_Stack select ((count DUDA_Method_Stack) - 1);
		_caller params ["_callerMethodName","_callerStartTime"];
		([_callerMethodName] call DUDA_fnc_getProfile) params ["_callerProfileIndex","_callerProfile"];
		_callerProfile params ["_cName","_cCount","_cTotal","_cSelf"];
		_callerProfile set [3,_cSelf - _totalTime];
		DUDA_Profiles set [_callerProfileIndex,_callerProfile];
	};
};

DUDA_fnc_getProfile = {
	params ["_profileName"];
	private _profile = [];
	private _profileIndex = -1;
	{
		if(_x select 0 == _profileName) exitWith {
			_profile = _x;
			_profileIndex = _forEachIndex;
		};
	} forEach DUDA_Profiles;
	[_profileIndex,_profile];
};

DUDA_fnc_printProfiles = {
	{
		diag_log str _x;
	} forEach DUDA_Profiles;
};

 


Duda, thank you very much for sharing that - looks extremely useful. This will presumably not work on spawned functions or will it? 

9 minutes ago, Incontinentia said:

Duda, thank you very much for sharing that - looks extremely useful. This will presumably not work on spawned functions or will it? 

 

Correct - would not work with spawned functions.

On 1/25/2017 at 11:36 AM, Incontinentia said:

 

 

Interestingly, I compared the two solutions (had to alter yours slightly to work on multiple factions) and the original was 88ms versus 145ms - when I tried swapping it out there was visible stutter each time the function loaded. 

 

The alteration was: ("(getText (_x >> 'faction') in _factions) && {isClass _x}" configClasses _cfgVehicles);

 

Baffling! 

 

I've found configClasses to be significantly slower than simply for-looping through config entries. I would test your solution using both before making a final decision.

Share this post


Link to post
Share on other sites

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now

×