Advisor: Madeira, Charles Andryê Galvão
Author: Alves, Luiz Paulo de Carvalho
Date accessioned: 2022-11-22
Date available: 2022-11-22
Date issued: 2022-07-21
Citation: ALVES, Luiz Paulo de Carvalho. Action Branching em redes de Aprendizado por Reforço profundo para reduzir dimensionalidade de espaço de ações discreto. Orientador: Charles Andryê Galvão Madeira. 2022. 78 f. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) - Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2022.
URI: https://repositorio.ufrn.br/handle/123456789/49829

Abstract: Action Branching is a type of deep neural network architecture that uses multiple outputs, each representing one dimension of the action space, in order to reduce the final size of the network's output. The Action Branching architecture was originally proposed for problems with continuous action spaces, specifically continuous control problems in virtual environments. Its authors claim, however, that it could also be applied to problems with discrete action spaces, provided those problems have high action dimensionality. One such high-dimensional discrete problem is choosing a position (x, y) in a two-dimensional environment, since the number of possible positions grows quadratically with the maximum values of x and y. This position-selection problem arises concretely in contexts such as real-time strategy games, where thousands of positions are available for carrying out actions at every moment of the game. The objective of this work is to use an Action Branching architecture to reduce the output dimensionality of deep neural networks with discrete action spaces, and to evaluate the effect of this architectural modification on the training and learning of the networks. The discrete action spaces used in this work represent positions in a two-dimensional environment; that is, an action or a combination of actions represents the choice of a specific spatial position. Two virtual environments were used to train and evaluate the networks. The first is StarCraft II, a real-time strategy video game published by Blizzard Entertainment. The second is Clickgame, an environment developed by the author to allow experiments in a simpler setting. As part of this work, the author developed two Deep Reinforcement Learning algorithms with an Action Branching architecture: DDQNmo and BDQKeras. These algorithms were then evaluated in several experiments on StarCraft II minigames and in the Clickgame environment. The experiments show that, in the proposed environments, the implemented algorithms train networks that are smaller, faster, and more efficient than those produced by traditional algorithms. However, scalability and stability problems were identified when these algorithms were applied to more complex problems, indicating the need for future investigation and improvement.

Keywords: Deep Learning; Reinforcement Learning; Action Branching; StarCraft II
Title: Action Branching em redes de Aprendizado por Reforço profundo para reduzir dimensionalidade de espaço de ações discreto
Title (English): Action Branching in Deep Reinforcement Learning networks to reduce dimensionality of discrete action spaces
Type: Bachelor thesis
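
The abstract describes the core mechanism: instead of one joint output over all (x, y) positions, the network emits one output branch per action dimension. A minimal Keras sketch of that idea follows, assuming a 64x64 position grid; the layer sizes, branch names (q_x, q_y), and overall layout are illustrative assumptions, not the thesis's DDQNmo or BDQKeras implementations.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

GRID = 64  # assumed side length of the 2D position grid

# Shared trunk that encodes the observation.
state_in = tf.keras.Input(shape=(GRID, GRID, 1), name="screen")
h = layers.Conv2D(16, 5, activation="relu")(state_in)
h = layers.Flatten()(h)
h = layers.Dense(256, activation="relu")(h)

# A flat head would need GRID * GRID = 4096 Q-values;
# two branches need only GRID + GRID = 128.
q_x = layers.Dense(GRID, name="q_x")(h)  # Q-values over the x coordinate
q_y = layers.Dense(GRID, name="q_y")(h)  # Q-values over the y coordinate
model = tf.keras.Model(state_in, [q_x, q_y])

# Greedy (x, y) action: argmax taken independently in each branch,
# then combined into one spatial position.
state = np.zeros((1, GRID, GRID, 1), dtype="float32")
qx, qy = model(state)
action = (int(np.argmax(qx)), int(np.argmax(qy)))
print(action)
```

In a BDQ-style training loop, a temporal-difference loss would be applied per branch; the sketch only illustrates the output-size reduction from quadratic to linear that the abstract describes.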