Jump to content
Sign in to follow this  
Prodavec

Fast and reliable array copying after deleteing items?

Recommended Posts

That simple question.

I need fastest and reliable method to copy a big arrays.

I have a problem with that and/or need some explanation of the engine behaviour.

For example we have a code:

_someBigArray = [item1, item2, item3, ..., itemN]; // A bunch of elements

// do something with an array, for example
for "_i" from 0 to ((count _someBigArray)-1) step 1 do
{
   if (!alive (_someBigArray select _i)) then
   {
       // some code
       _someBigArray set [_i, objNull]; // mark items for delete with objNull
   };
};

_someBigArray = _someBigArray - [objNull]; // remove items, marked with objNull
_someOtherBigArray = _someBigArray; // Copy one array to another one

And you'll get problem on the system with very low FPS (1-5): the second array named _someOtherBigArray will consist objNull elements! But objNull's were deleted with binary subtraction! Low FPS is very important condition to get this results.

That is strange for me, because looks unreliable, like

_someBigArray = _someBigArray - [objNull];

was executed in separate thread and scheduler begins to execute next line while binary substraction was not completed.

I tried manual copying:

_someOtherBigArray = [];
for "_i" from 0 to ((count _someBigArray)-1) step 1 do
{
   _someOtherBigArray set [_i, _someBigArray select _i];
};

and it works fine, reliable and I never get objNulls in new array. But this method is not optimal because it loads scheduler with additional work.

What am I doing wrong and what should I do to correctly copy a big arrays in low-fps evironment?

PS

http://community.bistudio.com/wiki/Code_Optimisation#Removing_elements_from_an_array

ARRAYX set [0, objnull];
ARRAYX = ARRAYX - [objnull];

This code will be buggy on low FPS

ARRAYX set [0, objNull];
ARRAYX set [1, objNull];
ARRAYX set [2, objNull];
ARRAYX set [3, objNull];
ARRAYX set [23, objNull];
ARRAYX set [32, objNull];
ARRAYX set [55, objNull];
ARRAYX set [NN, objNull]; // NN - something other
ARRAYX = ARRAYX - [objNull];
ARRAYY = ARRAYX;

Edited by Prodavec

Share this post


Link to post
Share on other sites

According to what i read, what you did first is the fastest way. Still, you can easily test this using a simple function (found on the wiki)

 _fnc_dump = { player globalchat str _this; diag_log str _this; //copytoclipboard str _this; }; _t1 = diag_tickTime; // ... code to test (diag_tickTime - _t1) call _fnc_dump;

Why don't you set up a sleep just before copying your array?

Anyway, i guess nobody really considers trouble happening between one and five FPS, because you simply cannot play the game at those framerates.

Share this post


Link to post
Share on other sites

_someOtherBigArray = _someBigArray; // Copy one array to another one

Your assumption that this creates a copy is incorrect. Arrays are basically pointers, you just copy the reference.

On the other hand, - does create a new array. It would seem that you're getting a reference to the previous array rather than the new one that should have been created for some reason.

Considering you're after performance you should avoid copying the array if at all possible, which means using set instead of -.

To adapt your example with an actual copy of the array being created:

_someBigArray = [item1, item2, item3, ..., itemN]; // A bunch of elements
_someOtherBigArray = [];
_someOtherBigArray resize (count _someBigArray); //Pre-allocate the array

_n = 0;
for "_i" from 0 to ((count _someBigArray)-1) step 1 do
{
   if (alive (_someBigArray select _i)) then
   {
       _someOtherBigArray set [_n, _someBigArray select _i];
       _n = _n + 1;
   };
};

_someOtherBigArray resize _n; //Remove unused elements at the end

Also, depending on where this is used, you might be able to startLoadingScreen to mitigate the FPS issue.

Share this post


Link to post
Share on other sites

Simple way to copy the array:

_someOtherBigArray = [b]+[/b]_someBigArray; // Copy one array to another one

Notice the '+'.

Share this post


Link to post
Share on other sites
_someOtherBigArray = _someBigArray; // Copy one array to another one

Your assumption that this creates a copy is incorrect. Arrays are basically pointers, you just copy the reference.

What do you mean, i'm curious about that, as it's not the first time i read about this, and i'm not sure i understand that.

If I apply modifications to _someBigArray afterwards, they won't be seen in _someOtherBigArray, right? So why is this not considered a copy? (or more precisely, what does "reference" means here?)

Share this post


Link to post
Share on other sites
Why don't you set up a sleep just before copying your array?

Anyway, i guess nobody really considers trouble happening between one and five FPS, because you simply cannot play the game at those framerates.

I don't use sleep because:

1. It is not realiable. It may help, but because of scheduler architecture sometimes you have to increase delay time, sometime decrease. And there's no indication of process completion like _waitUntil {_copyingCompleted};

2. It is not possible to use sleep or waitUntil in non-scheduled environment by design. Function should be universal: may be called in non-scheduled and in scheduled environments.

because you simply cannot play the game at those framerates.

Well, code should be realiable in all possible situations. 0.5-5 FPS is like stress-test and it's not passed.

Your assumption that this creates a copy is incorrect. Arrays are basically pointers, you just copy the reference.

Hmm... you mean in SQF _someArray is just a pointer like in C/C++? What about wiki:

Removing elements from an array

When FIFO removing elements from an array, the set removal method works best, even if it makes a copy of the new array.

ARRAYX set [0, objnull];
ARRAYX = ARRAYX - [objnull];

It looks like the array is just variable, not a pointer. And I did simple test

_array1 = [1, 2, 3];
_array2 = _array1;
_array1 = [1, 1, 1];
hint str _array2;

and I see [1,2,3] not [1,1,1]. If it was the pointer then I should see changed _array1: [1,1,1].

_someBigArray = [item1, item2, item3, ..., itemN]; // A bunch of elements
_someOtherBigArray = [];
_someOtherBigArray resize (count _someBigArray); //Pre-allocate the array

_n = 0;
for "_i" from 0 to ((count _someBigArray)-1) step 1 do
{
   if (alive (_someBigArray select _i)) then
   {
       _someOtherBigArray set [_n, _someBigArray select _i];
       _n = _n + 1;
   };
};

_someOtherBigArray resize _n; //Remove unused elements at the end 

Why are you using RESIZE if you are using SET command? SET automaticaly resizes array if needed. Maybe because it resizes every iteration you have used RESIZE at the begining outside of the loop?

Well. This method works and I'm using it. But in this case you have to pass thru _someBigArray to copy all elements to another one. And this is not good when you've possible alternative variants like

_someOtherBigArray = _someBigArray;

or something else executed inside of engine, not a script layer.

I will try to use Binary Addition to compare two methods, with FOR + SET and =+.

Edited by Prodavec

Share this post


Link to post
Share on other sites

Lots of misunderstandings here. Arrays are references (think of them as pointers if you prefer that) - That however does not mean that all operations on them are in-place (on the same array).

Firstly if you want to copy an array check what I wrote.

Now let's discuss what actually happens here:

_someBigArray = _someBigArray - [objNull]; // remove items, marked with objNull
_someOtherBigArray = _someBigArray; // Copy one array to another one

First assume _someBigArray references an array ARR_A, I'll write this as _someBigArray ---> #ARR_A. When you do binary arithmetic on arrays a new array is created!!. Now what happens is:

_someBigArray = _someBigArray - [objNull]
----- Means something like this -----
#ARR_B = "a [b]new[/b] array with all elements in _someBigArray ---> #ARR_A, except those values that are objNull".
//That is ARR_B is an entirely new array. Now the assignment is done which basically changes the array that _someBigArray points to:
_someBigArray ---> #ARR_B
//Now assuming nothing else references #ARR_A it is probably garbage collected soon.

That was the first line, so now _someBigArray ---> #ARR_B, now to the second line:

_someOtherBigArray = _someBigArray;
------ Means something like this -----
_someOtherBigArray ---> #ARR_B

So both arrays reference the same data. If you do:

_someBigArray set [2, 99]; 
--- Means
#ARR_B set [2, 99];
--- If you later do:
_value = _someOtherBigArray select 2;
--- It means
_value = #ARR_B select 2;

Even if the variables have different name they still reference the same array.

Now for you example:

_array1 = [1, 2, 3];
_array2 = _array1;
_array1 = [1, 1, 1];

What happens here is that [1,2,3] is a new array, lets call it #ARR_R = [1,2,3].

Then _array1 ---> #ARR_R=[1,2,3].

Then _array2 ---> #ARR_R=[1,2,3].

Now for the last line it creates a new array, let's call it #ARR_S=[1,1,1].

Then _array1 ---> #ARR_S=[1,1,1].

But _array2 still references the old array, that is: _array2 ---> #ARR_R=[1,2,3].

That's why you don't see the change.

Edited by Muzzleflash

Share this post


Link to post
Share on other sites
Why are you using RESIZE if you are using SET command? SET automaticaly resizes array if needed. Maybe because it resizes every iteration you have used RESIZE at the begining outside of the loop?

Yes, that's the reason I used it. No point having it resize by one each iteration when we roughly know how large we need it to be.

Share this post


Link to post
Share on other sites

Muzzleflash

Thanks a lot! That info should be in wiki :)

Deadfast

I did few tests:

_msg = "BEGINING BENCHMARK...";
hint _msg;
player groupChat _msg;
diag_log _msg;

// Prepare (here is static part)
// -------------------
_array1 = [];                       // define _array1
_array2 = [];                       // define _array2
for "_i" from 0 to 10000 step 1 do
{
   _array1 set [_i, random 10];    // Fill _array1 with random data, total 10000 items
};
_size = count _array1;              // size of _array1
_array2 resize _size;               // make resize _array2 to prevent resizing in each iteration
_k = 0;                             // counter
// -------------------

_t1 = diag_tickTime;                
// The Test
// -------------------
for "_i" from 0 to 100 step 1 do
{
   // forEach speed test
   /*
   {

   } forEach _array1;
   */
   // RESULTS: 3.24219 / 3.13281 / 3.13281

   // forEach + SET with COUNT command
   /*
   {
       _array2 set [count _array2, _x];
   } forEach _array1;
   */
   // RESULTS
   // With no RESIZE: 66.6172 / 65.2266 / 65.3594
   // With RESIZE: 65.7344 / 66.5781 / 65.2344

   // forEach + SET with counter increased manualy
   /*
   {
       _array2 set [_k, _x];
       _k = _k + 1;
   } forEach _array1;
   */
   // RESULTS
   // With no RESIZE: 84.75 / 85.2188 / 84.4609
   // With RESIZE: 85.5703 / 84.7266 / 85.75

   // FOR speed test
   /*
   for "_n" from 0 to (_size-1) step 1 do
   {

   };
   */
   // RESULTS: 2.64063 / 2.63281 / 2.64063

   // FOR + SET with internal FOR's counter which increased automatically
   /*
   for "_n" from 0 to (_size-1) step 1 do
   {
       _array2 set [_n, _array1 select _n];
   };
   */
   // RESULTS
   // With no RESIZE: 53.5 / 53.0703 / 53.3594
   // With RESIZE: 53.6406 / 53.6641 / 53.3047

   // + Unary Operator (the fastest way!)
   /*
   _array2 =+ _array1;
   */
   // RESULTS: 0.710938 / 0.757813 / 0.734375
};
// -------------------
_t2 = diag_tickTime;

_msg = format ["EXECUTION TIME: %1", (_t2 - _t1)];
hint _msg;
player groupChat _msg;
diag_log _msg;

As you can see RESIZE doesn't influence on the perfomance in my test or influences very-very little (within an error).

So if I want to copy an array I have to use + unary operator:

_array2 =+ _array1;

instead of simple = or SET

If I will have to do some processing I'd prefer to use FOR "<counter>" instead of other methods, because it's fastest way.

Right?

That's why you don't see the change.

Got it!

That's why you don't see the change.

_array1 = [1, 2, 3];
_array2 = _array1;
_array1 set [0, 4];
hint str _array2;

Getting [4,2,3] as I expected.

But I still don't understand, you says:

That was the first line, so now _someBigArray ---> #ARR_B, now to the second line:

Now we have #ARR_B, it doesn't contain objNulls anymore.

I do:

_someOtherBigArray = _someBigArray;

So both arrays reference the same data. If you do:

because it's two pointers which refer on the same array.

But for some reason on extremely low FPS I'm getting something like that:

_someBigArray ---> #ARR_A: [1, 2, 3, objNull, 5, objNull, 7, 8, 9, objNull];
_someBigArray = _someBigArray - [objNull]; // remove objNulls
_someBigArray ---> #ARR_B: [1, 2, 3, 5, 7, 8, 9];
_someOtherBigArray = _someBigArray;

and theoreticaly we should get second pointer to the same array:

_someOtherBigArray ---> #ARR_B: [1, 2, 3, 5, 7, 8, 9];

_someBigArray and _someOtherBigArray refers to #ARR_B. It's easy to understand and it works until we have enough FPS.

Actually I'm getting this when FPS 0.5-2:

_someOtherBigArray ---> #ARR_C: [1, 2, 3, objNull, 5, 7, 8, 9];

WTF? It looks like it's hybrid array of A and B but MUST be #ARR_B with no objNull inside!

I currently cant create new reference because of this:

_someBigArray = _someBigArray - [objNull]; // #ARR_A to #ARR_B but with no objNulls
_someOtherBigArray = _someBigArray; // new pointer to #ARR_B

_someOtherBigArray will consist objNulls. Hm?

Found the roots of the problem. Topic may be closed. Thanks Muzzleflash for explaining how references work.

while {true} do
{
   _someBigArray = [...];
   for .... do
   {
       //...
       _someBigArray = _someBigArray - [objNull];
   };
   _someOtherBigArray = _someBigArray;
   sleep N;
};

_someOtherBigArray will get [objNulls] inside of it at the next iteration of WHILE. Goddamit. :)

Edited by Prodavec

Share this post


Link to post
Share on other sites

Nice testing you did there. The reason +array is so much faster is probably that it is an engine command, whilst all the manual looping is done in sqf.

Right?

Yeah think you got everything right now.

Also a neat trick to use with forEach loops that saves you from having to use a for loop with counter (or manual counting for that matter), or count constantly:

// _numbers is an array with numbers. You for some weird reason want to create another array that doubles these:
_doubleNumbers = [];
{
  _doubleNumbers set [[b]_forEachIndex[/b], _x * 2];
} forEach _numbers;

One warning though: _forEachIndex only works in Arrowhead or Combined Ops (not in vanilla Arma II).

Share this post


Link to post
Share on other sites

Understand. I see perfomances difference betweent forEach and FOR loop and I guess it is better to use FOR loop if it's possible even counter is not used.

But forEach loop has advances (just tested): you'll never get Zero Divisor message:

{
   // a bunch of code here and we're running low-end machine with low FPS
   // _x is unit
   // some one was killed and allUnits array has been decreased... we still working inside of loop with no problems
} forEach allUnits;

for "_i" from 0 to ((count allUnits)-1) step 1 do
{
   _unit = allUnits select _i;
   // some one was killed and we get decreased allUnits, but FOR doesn't "know" about this. Loop condition doesn't evaluate second time.
   // we getting Error Zero Divisor
};

Got some perfomance results:

   // forEach with _forEachIndex instead of COUNT command
   {
       _array2 set [_forEachIndex, _x];
   } forEach _array1;
   // RESULTS
   // With no RESIZE: 56.6719 / 56.3047 / 56.2109
   // With RESIZE: 57.6094 / 57.0625 / 56.7656

Edited by Prodavec

Share this post


Link to post
Share on other sites
But forEach loop has advances (just tested): you'll never get Zero Divisor message:

You could do it like this to prevent this issue:

_allUnits = allUnits;
for "_i" from 0 to ((count allUnits)-1) step 1 do
{
   _unit = _allUnits select _i;
};

Understand. I see perfomances difference betweent forEach and FOR loop and I guess it is better to use FOR loop if it's possible even counter is not used.

I always use forEach whenever I can. I find that forEach is more readable. Your performance test seems to indicate that a for loop is about 5% faster than a forEach loop. However, even on 1 million iterations that is only 3ms difference. The maximum I've ever been on is 5 digit, say 50000 elements, then for loop is only 0.15ms faster. I'll rather have the more shorter more readable forEach loop that is 5% slower and where I don't forget the (count X -1) part - than the 0.15ms. But then again I don't have to live with 2fps :).

Share this post


Link to post
Share on other sites

Well, you're right. I guess for loop is needed for maximum possible perfomance when you're working with large array (or you need a lot of iterations).

You could do it like this to prevent this issue:

Oh... this is additional expression to execute.

But then again I don't have to live with 2fps

I'm very aware of perfomance, because... you know ArmA is lagging most of the time. A lot of people still playing with 20-40 FPS (just check last videos with A3 demonstrating on VERY powerfull PC. Even developers play with lags!), that is not so much especialy in CQB. Or they need to reduce video settings to get 60-100 FPS but graphics becames crappy. I very often hear: "I got new rig with 32 GB RAM, 690 GTX in SLI and i7 3570 processor but ArmA, the game of the 2009 year still lags. WTF?!?!? *going mad*" lol.

So I have to make ideal code :)

Share this post


Link to post
Share on other sites

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
Sign in to follow this  

×