Advisor: Santana Júnior, Orivaldo Vieira de
Author: Lima, Anderson Costa
Dates: 2024-09-11; 2024-09-11; 2024-07-25
Citation: LIMA, Anderson Costa. Educational data mining and machine learning for the analysis and prevention of school dropout in an undergraduate program. Advisor: Dr. Orivaldo Vieira de Santana Júnior. 2024. 110 pp. Dissertation (Professional Master's in Science, Technology and Innovation) - School of Sciences and Technology, Federal University of Rio Grande do Norte, Natal, 2024.
URI: https://repositorio.ufrn.br/handle/123456789/60118
Abstract: Universities face the challenge of turning large volumes of student data into actionable insights that improve academic management and reduce dropout rates in higher education. Educational Data Mining (EDM) combined with Machine Learning (ML) is a promising approach for identifying the factors that influence academic performance. This research aims to develop a method for uncovering the key characteristics related to dropout in the interdisciplinary Science and Technology (C&T) program at the Federal University of Rio Grande do Norte (UFRN), focusing on students enrolled between 2014 and 2023. A literature review identified suitable ML algorithms for a hybrid approach that combines Random Forest (classification) and Self-Organizing Maps (clustering), with SHapley Additive exPlanations (SHAP) for explainability analysis. The process followed an adapted Knowledge Discovery in Databases (KDD) workflow with five stages: data collection, preprocessing, feature mapping, training and testing, and explainability analysis. As a result, a predictive model based on Random Forest was developed. It achieved an initial accuracy of 93% in identifying at-risk students and, subsequently, 91% and 89% on previously unseen data, demonstrating consistency and generalization capability. The research revealed that dropout is influenced by various factors, including curricular, socioeconomic, and demographic aspects.
Analysis with Self-Organizing Maps produced a feature map illustrating the relationship between attributes and students' educational status. Combining this map with SHAP provided comprehensive insight into how each attribute influences the model's predictions, highlighting the importance of variables such as academic performance, age at enrollment, hometown, and socioeconomic status. Finally, a Minimum Viable Product (MVP) was developed as a proof of concept to present the prediction results and the explainability of the findings, with descriptive and predictive analyses of the patterns that affect student retention.
Access: Open Access
Keywords: School dropout; Predictive analysis; Random Forest; Self-organizing Maps; SHapley Additive exPlanations
Title: Educational data mining and machine learning for the analysis and prevention of school dropout in an undergraduate program
Type: masterThesis
Subject: CNPQ::OUTROS::CIENCIAS
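
The abstract relies on SHAP to attribute a model's prediction to individual student attributes. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values by enumerating feature coalitions for a toy risk score. Everything in it is an assumption made for illustration: the feature names, the `toy_risk` function (a stand-in for a trained Random Forest), the student and baseline values, and the use of a single baseline record for "absent" features. The SHAP library itself uses faster approximations and is not called here.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; not taken from the dissertation's dataset.
FEATURES = ["grade_average", "age_at_enrollment", "family_income"]


def toy_risk(student):
    """Toy dropout-risk score; an illustrative stand-in for a trained classifier."""
    risk = 0.2
    if student["grade_average"] < 5.0:
        risk += 0.5
    if student["age_at_enrollment"] > 25:
        risk += 0.1
    # Interaction term: low grades combined with low income add extra risk.
    if student["grade_average"] < 5.0 and student["family_income"] < 2.0:
        risk += 0.2
    return risk


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)
    values = {}
    for feature in FEATURES:
        others = [g for g in FEATURES if g != feature]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                without_f = {
                    g: instance[g] if g in coalition else baseline[g]
                    for g in FEATURES
                }
                with_f = dict(without_f)
                with_f[feature] = instance[feature]
                total += weight * (model(with_f) - model(without_f))
        values[feature] = total
    return values


# Hypothetical student and reference ("baseline") records.
student = {"grade_average": 4.0, "age_at_enrollment": 28, "family_income": 1.5}
baseline = {"grade_average": 7.0, "age_at_enrollment": 19, "family_income": 4.0}

phi = shapley_values(toy_risk, student, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The per-feature contributions add up to the gap between the student's score and the baseline score, which is the property that makes this kind of attribution useful for explaining individual predictions. Exact enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.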