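Aware compression of deep neural network models based on pruning followed by quantization (original title: Compressão consciente de modelos de redes neurais profundas baseada em poda seguida de quantização)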

Advisor: Fernandes, Marcelo Augusto Costa
Author: Goldbarg, Mateus Arnaud Santos de Sousa
Dates: 2024-05-07; 2024-05-07; 2024-02-20
Citation: GOLDBARG, Mateus Arnaud Santos de Sousa. Compressão consciente de modelos de redes neurais profundas baseada em poda seguida de quantização. Advisor: Dr. Marcelo Augusto Costa Fernandes. 2024. 61 pp. Dissertation (Master's in Electrical and Computer Engineering) - Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2024.
URI: https://repositorio.ufrn.br/handle/123456789/58271

Abstract: Deep learning techniques, particularly deep neural networks (DNNs), have been applied successfully to many problems. However, these algorithms demand significant computational effort because of the large number of parameters and mathematical operations involved, which can be problematic for applications with limited computational resources, low-latency requirements, or low power budgets. This work therefore proposes a new training strategy for aware compression of DNN models based on pruning, quantization, and pruning followed by quantization, capable of reducing processing time and memory footprint. The compression strategy was applied in two domains. In the first, automatic modulation classification, the model size was reduced by a factor of 13 while accuracy remained only 1.8% below that of the uncompressed model. In the second, the same technique was applied to an image classification model to validate its performance in microservices environments. The compressed model was approximately 7.6 times smaller, with accuracy comparable to that of the uncompressed model. In this environment the technique proved effective in reducing inference time, memory consumption, and CPU usage. It also contributed to the efficiency of the system and enhanced its scalability.

Access: Open access
Keywords: Deep learning; Aware quantization; Scalability; Microservices; Automatic modulation classification
Type: masterThesis
Subject area: CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA