OPTIMIZATION WITH B4A
A case where optimization is required:
One of the most time-consuming task in a RTS game is to compute paths for all units because of the number of units and the many computations and memory accesses needed to find a path. If you want to run your game at a fast pace, and thus stay under 16ms past in the Render event of libGDX, you have only 16/100 = 0.16 ms to compute each path for 100 units. In a real situation where you have to check collisions, select targets, animate the moves and shootings, react to user actions, etc., you have in fact less than 0.1 ms left for your path computations (and my current game can have up to 200 units on screen...). Choosing the proper pathfinding algorithm and writing it as optimized as possible becomes very important. Fortunately, many developers and researchers worked on this problem and published their algorithms or source codes on Internet. So it's very easy to find a good implementation of A* for Java, for example, and not very difficult to find better algorithms than A* for specific cases. But the interest and performance of all this material for your case is not always easy to evaluate, and you can read articles claiming a 50% improvement over a well-known algorithm while your implementation of their pseudocode tends to prove the contrary. Unless you ask for code to the authors, you have no idea of the chosen implementation of both algorithms and that can lead to wrong conclusions (including on the authors side). It's very common to see only pseudocode in research works because most researchers are convinced that an unoptimized good algorithm will always beat an optimized bad algorithm and that the implementation in a given language is not the most important. That's only partially true as an american team proved it. The difference between the various implementations in C of the same pseudocode can be huge (up to 20x faster in some cases). Is it the same in Java? That gave me an idea for a quiz that I submitted to the fellow members of this forum. Now here's the tutorial explaining my solution and how I proceeded to optimize the code. I added a few general tips about optimization and faster variants of a few math functions at the end.
An unit blocked in a maze:
Let's consider an unit moving along a pre-computed path. At a time, it will be blocked by other units sharing the same path or by temporary obstacles like closed gates or open bridges. You have two solutions: either the blocked unit stops moving and waits until the path is clear or the unit tries to find another way. For the latter, you're in trouble if re-computing a new path is too costly (e.g. the fastest implementation of the A* algorithm is too slow for a RTS game with 100 units under Android) but blocking the unit can lead easily to a deadlock situation or create stupid behaviors so it's better to find a solution to keep moving. Many solutions to this problem exist, but I'll present only the one that I used in my game because it is simple and does not require multithreading operations.
The idea is to consider only the area where the unit is blocked and try to bypass this area. Typically, a path is made of waypoints. The unit moves from a waypoint to another by computing intermediate moves with a given speed and angle (this is done by the Steering Behaviors library in my game). In other words, my problem is to find an alternate path from the current location of the unit to the waypoint located after the blocking point and, as the other units and obstacles create a kind of maze, it can be resumed to a maze solving in a small grid, with only one ending (the waypoint to reach) and multiple paths.
The DFS algorithm:
This maze has peculiarities that prevent from using some maze solving algorithms or make them highly inefficient: the starting and ending positions are not on the edges, there are 0 to n paths, loops are possible. The Trémaux's algorithm can solve it anyway under its modern form: the Depth-First Search algorithm. Here's this algorithm in its DFS recursive variant, converted from the Java code published on Wikipedia:
B4X:
Dim Visited(SizeX, SizeY) As Boolean
Dim CorrectPath(SizeX, SizeY) As Boolean 'The solution to the maze
Sub SolveMaze As Boolean
'Initializes the arrays
For X = 0 To SizeX - 1
For Y = 0 To SizeY - 1
Visited(X, Y) = False
CorrectPath(X, Y) = False
Next
Next
'Solves the maze (the True values in CorrectPath indicate the path to the end)
Return Recursive_DFS(StartX, StartY)
End Sub
Sub Recursive_DFS(X As Int, Y As Int) As Boolean
'If you reached the end
If X = DestX AND Y = DestY Then
Visited(X, Y) = True
Return True
End If
'If you are on a wall or already were here
If Blocked(X, Y) OR Visited(X, Y) Then Return False
'Marks as visited
Visited(X, Y) = True
If X <> 0 Then 'Checks if not on left edge
If Recursive_DFS(X - 1, Y) Then 'Recalls function one to the left
CorrectPath(X, Y) = True 'Sets that path value to True
Return True
End If
End If
If X <> SizeX - 1 Then 'Checks if not on right edge
If Recursive_DFS(X + 1, Y) Then 'Recalls function one to the right
CorrectPath(X, Y) = True
Return True
End If
End If
If Y <> 0 Then 'Checks if not on bottom edge
If Recursive_DFS(X, Y - 1) Then 'Recalls function one up
CorrectPath(X, Y) = True
Return True
End If
End If
If Y <> SizeY - 1 Then 'Checks if not on top edge
If Recursive_DFS(X, Y + 1) Then 'Recalls function one down
CorrectPath(X, Y) = True
Return True
End If
End If
Return False
End Sub
StartX, StartY and DestX, DestY define the starting point and the destination.
First optimization:
I can remove the CorrectPath array as I don't need to store the path. Why? Because I need an answer only to these two questions: is there an alternate path? If no, the unit is really blocked. If yes, what's the next move to engage the unit on this path? As the obstacles position can change after each computation, it's useless to store a path. Only the first move has some interest and there's a trick to know where to do this first step without ambiguity: taking the destination as the starting point. The DFS algorithm stops as soon as it reaches the end so there's only one path reaching the end, i.e. the current location of the unit as the search direction is reversed.
I can also write differently the first lines because my destination is not supposed to be blocked (it's where my unit comes from).
My new code:
B4X:
Dim Visited(SizeX, SizeY) As Boolean
Sub SolveMaze2 As Boolean
'Initializes the Visited array
For X = 0 To SizeX - 1
For Y = 0 To SizeY - 1
Visited(X, Y) = False
Next
Next
'Is there a path to the end ?
Return Recursive_SolveMaze2(StartX, StartY)
End Sub
Sub Recursive_SolveMaze2(X As Int, Y As Int) As Boolean
'If you are on a wall or already were here
If Blocked(X, Y) OR Visited(X, Y) Then Return False
'Marks as visited
Visited(X, Y) = True
'If you reached the end
If X = DestX AND Y = DestY Then Return True
'Checks if not on left edge
If X <> 0 Then
If Recursive_SolveMaze2(X - 1, Y) Then 'Recalls function one to the left
Return True
End If
End If
'Checks if not on right edge
If X <> SizeX - 1 Then
If Recursive_SolveMaze2(X + 1, Y) Then 'Recalls function one to the right
Return True
End If
End If
'Checks if not on bottom edge
If Y <> 0 Then
If Recursive_SolveMaze2(X, Y - 1) Then 'Recalls function one up
Return True
End If
End If
'Checks if not on top edge
If Y <> SizeY - 1 Then
If Recursive_SolveMaze2(X, Y + 1) Then 'Recalls function one down
Return True
End If
End If
Return False
End Sub
Using the benchmark provided in the quiz thread, I get (the first number is the sum of all timings):
SolveMaze = 28.75 ms / Worst = 0.214 ms
SolveMaze2 = 22.787 ms / Worst = 0.175 ms
The removal of CorrectPath improved the performance, which is not surprising.SolveMaze2 = 22.787 ms / Worst = 0.175 ms
The worst timing is very important to consider as it says how much time has been spent in the worst case. If I multiply this value by the number of units, I have an idea of the maximum number of milliseconds required in my Render event to compute all path changes. That being said, it is very unlikely that all units are blocked at the same time so this value can probably be halved.
Second optimization:
An idea to improve my code is to check whether the iterative version is faster than the recursive one. You can read often that iterative variants are faster because of the memory allocations and deallocations implied by the recursive calls (each function is saved onto the stack with its data). In B4A, unless you use my DataCollection library, you have only the List object at your disposal to store the extra data required by the iterative version. Anyway, it's not the main reason of the very bad results that I get. I am so far from the performance of the recursive variant that I really doubt that the iterative version could be used even after a heavy optimization. Moreover it is less readable than the recursive code so I abandon completely the idea.
That being said, the iterative variant is your only option if you want to get rid of the StackOverflow error which is the main drawback of recursive functions. In my game, the grid is too small to raise the error on most devices and I set a limit in the final code to stop after a certain recursion depth so I'm not concerned.
Third optimization:
At the language level, it is interesting to see if avoiding nested IF by combining them on a single line (thanks to the short-circuit evaluation) could bring an improvement. So I just replace lines like:
B4X:
If X <> 0 Then
If Recursive_SolveMaze2(X - 1, Y) Then 'Recalls function one to the left
B4X:
If X <> 0 AND Recursive_SolveMaze3(X - 1, Y) Then
SolveMaze2 = 22.787 ms / Worst = 0.175 ms
SolveMaze3 = 22.774 ms / Worst = 0.17 ms
The slight difference is not significant at all.SolveMaze3 = 22.774 ms / Worst = 0.17 ms
Fourth optimization:
Another idea is to move the first condition to avoid a recursive call for nothing when the next cell to visit contains an obstacle or has already been visited. I am expecting a lot from this change. I remove:
B4X:
'If you are on a wall or already were here
If Blocked(X, Y) OR Visited(X, Y) Then Return False
B4X:
If X <> 0 AND Not(Blocked(X - 1, Y) OR Visited(X - 1, Y)) AND Recursive_SolveMaze4(X - 1, Y) Then
The result is very surprising:
SolveMaze2 = 22.787 ms / Worst = 0.175 ms
SolveMaze4 = 28.409 ms / Worst = 0.196 ms
It's a lot slower !SolveMaze4 = 28.409 ms / Worst = 0.196 ms
In fact, and I thank Cimperia to have drawn it to my attention, using the NOT operator is a big mistake as it is very slow.
So I change all NOT:
B4X:
If X <> 0 AND Blocked(X - 1, Y) = False AND Visited(X - 1, Y) = False AND Recursive_SolveMaze5(X - 1, Y) Then
SolveMaze2 = 22.787 ms / Worst = 0.175 ms
SolveMaze4 = 28.409 ms / Worst = 0.196 ms
SolveMaze5 = 15.593 ms / Worst = 0.137 ms
SolveMaze4 = 28.409 ms / Worst = 0.196 ms
SolveMaze5 = 15.593 ms / Worst = 0.137 ms
Fifth optimization:
As you can notice, computations of the new X and Y are repeated three times by condition, so I create a variable to store the computation result:
B4X:
Sub Recursive_SolveMaze6(X As Int, Y As Int) As Boolean
'Marks as visited
Visited(X, Y) = True
'If you reached the end
If X = DestX AND Y = DestY Then Return True
'Checks if not on left edge and not blocked or already were here
If X <> 0 Then
Dim NewX As Int = X - 1
If Blocked(NewX, Y) = False AND Visited(NewX, Y) = False AND Recursive_SolveMaze6(NewX, Y) Then
Return True
End If
End If
'Checks if not on right edge and not blocked or already were here
If X <> SizeX - 1 Then
Dim NewX As Int = X + 1
If Blocked(NewX, Y) = False AND Visited(NewX, Y) = False AND Recursive_SolveMaze6(NewX, Y) Then
Return True
End If
End If
'Checks if not on bottom edge and not blocked or already were here
If Y <> 0 Then
Dim NewY As Int = Y - 1
If Blocked(X, NewY) = False AND Visited(X, NewY) = False AND Recursive_SolveMaze6(X, NewY) Then
Return True
End If
End If
'Checks if not on top edge and not blocked or already were here
If Y <> SizeY - 1 Then
Dim NewY As Int = Y + 1
If Blocked(X, NewY) = False AND Visited(X, NewY) = False AND Recursive_SolveMaze6(X, NewY) Then
Return True
End If
End If
Return False
End Sub
SolveMaze5 = 15.593 ms / Worst = 0.137 ms
SolveMaze6 = 15.308 ms / Worst = 0.131 ms
Not very significant. Adding 1 or subtracting 1 is so fast that there's no real advantage to store the computation. However, I favor this way of writing code as it minimizes the risk of error so I keep it.SolveMaze6 = 15.308 ms / Worst = 0.131 ms
Since NewX and NewY are temporary variables (they are not used after a recursive call), it is safe to make them globals but local variables are known to be faster. A quick test confirms it:
SolveMaze6 = 15.308 ms / Worst = 0.131 ms
SolveMaze7 = 16.163 ms / Worst = 0.146 ms
At this point, I run out of ideas to improve the code at the language level. It looks like I created the fastest variant of the DFS algorithm in B4A. Really ?SolveMaze7 = 16.163 ms / Worst = 0.146 ms
...
Last edited: