Positioning algorithm for AGV autonomous driving platform based on artificial neural networks

This paper presents an artificial intelligence algorithm responsible for the autonomy of a platform. The proposed algorithm allows the platform to move from an initial position to a set one without human intervention and with understanding and response to the dynamic environment. The implementation of such a task is possible by using a combination of a camera identifying the environment with a laser LIDAR sensor and a vision system. The signals from the sensors are analysed through convolutional neural networks. Based on AI inference, the platform makes decisions, including determining the optimal path for itself. A transfer learning method will be used to teach the neural network. This article presents the results of learning the applied neural algorithm.


Introduction
The everyday life faced by people with mobility impairments is a real challenge for them. They have problems getting from A to B. Of course, they are helped by people such as carers or nurses. However, they are also people who sooner or later get tired and who have their own daily needs. Electric wheelchairs are another way of helping people with disabilities to get around. These take the strain off the arm muscles of people who have mobility problems or complete leg inertia. However, the biggest problem is the movement of people who are not even able to steer the previously mentioned electric wheelchair. An AGV autonomous driving platform equipped with a neural positioning algorithm comes to the aid of people with disabilities.
An autonomous AGV equipped with a positioning algorithm is an innovative solution that can make everyday life easier for people with mobility impairments. The AGV is connected to the cloud, where data on the location of a given place is stored, as well as who, when, how and why it was prepared. An algorithm controls the platform in such a way that it moves the disabled person between two points, choosing the fastest possible route.
The issue of AGV control is the subject of extensive university research [1]- [3]. These are based on image analysis networks [4]- [6], position analysis networks [7], [8], and fuzzy logic [9]- [13]. This paper describes the study of a platform autonomy support algorithm using a reinforcement learning technique in combination with the use of artificial neural networks. The aim of this method is to train an agent to perform a task in an uncertain environment. The agent receives observations from the LIDAR sensor and a reward for its actions from the environment and in response sends actions (called actions) to the agent. The reward is a measure of the success of the action with respect to the achievement of the task goal, defined by the reward function. The agent's task is to accumulate the reward, i.e. to learn how to interact with the environment in the most beneficial way.

Environment model
The mobile platform for which the algorithm is developed is a four-wheeled mobile robot equipped with a two-dimensional LIDAR environment scanner. Simulink software was used to simulate the operation of the robot and the sensors. An environment model (FIG1) was created containing the following factors: kinematics model, LIDAR sensor operation simulator, collision signalling, adding noise to LIDAR readings and reward function. The mobile robot uses kinematics based on a differential mechanism. The modelled environment was then used in a further scheme to link it to the agent's actions. It uses the output of the environment, these are observations, collision information and reward value. The agent uses this data both during learning and during simulation. For the model created, the actions are the linear and angular velocity preset and the observations are the readings from the LIDAR scanner. These are normalised and given a range of -1 to 1, where the first value represents clockwise rotation at maximum speed or backward travel at maximum speed, and the second value represents counterclockwise rotation at maximum speed or forward travel at maximum speed.

Reward function
A very important element of reinforcement learning is the reward function. It is based on this function that the agent decides which actions to perform and for which it receives the greatest reward. It was used, among others, to ask the final location of our robot. For this purpose, in the course of research work, two reward functions were designed, which allow to reach two different locations on the map. This is defined as a three-dimensional function that has the coordinates of the robot on the and axes, while the reward value is defined on the axis. Each function has one maximum for the coordinates of the robot's final destination. It was also decided that the reward function should only take negative values, this prevents the reward from accumulating while the robot stands still. The designed functions are presented in three-dimensional space (Fig. 2, Fig. 3) and in two-dimensional space superimposed on a room map (Fig. 4, Fig. 5).   The higher the intensity of the colours in the graph, the higher the value of the reward function, which makes the robot more likely to head towards higher values of the reward function. With the help of the reward function it is possible to define the final destination that the robot should go to. As can be seen in Fig. 4, the most intense colours are in the room at the bottom right of the map and this is where the robot will head. Fig. 5 shows the reward function for which the robot will be directed to the room located in the upper right corner of the map.

Training algorithm/result
Once the environment model has been created, and the reward function designed, the process of teaching the agent can proceed. This is a time-consuming process, as each episode involves simulating the agent in the Simulink environment. It also requires a lot of computing power. Learning is completed when the average reward is reached. Fig. 6 shows the progress of agent learning after three hours. The DDPG agent was used because it is often proposed as the first choice in environments with continuous action spaces and observation. The actor and critic functions are deep neural networks. DDPG agent is an actor-critic, model-free algorithm based on the deterministic policy gradient. inadequately developed reward function. The platform instead of avoiding obstacles collided with a wall. In the end we managed to train the agent so that the platform drives correctly. The simulation of the trained algorithm is presented in Fig. 7. As can be seen, the platform spontaneously selects a path to the destination, avoiding on its way an obstacle in the form of a pillar. During the training of the agent in this case the reward function was used, directing the platform to a room located in the lower right corner of the map.

Conclusions
The result of the work is a properly working algorithm that enables the mobile platform to move autonomously to a given destination. It allows for the optimal trajectory to be determined, avoiding obstacles encountered on the way. Ultimately, the algorithm will be implemented in a real AGV robot. The results of work on the hardware implementation of the created algorithm and field tests will be the subject of future publications.
The use of reinforcement learning in system control tasks works very well and significantly facilitates the implementation of such solutions. It also allows to obtain better behaviour of the robot in an unknown environment.
During the work, the learning time of the algorithm, despite the use of parallel learning, proved to be extremely problematic. This meant that even the smallest changes to the structure of the reward, actor or critic functions, or a change in the learning parameters required re-training and waiting for hours for results. This significantly affected the development time of the algorithm.