Advisor: Araújo, Daniel Sabino Amorim de
Author: Batista, Jonathan Jalles Silva
Dates: 2025-04-08; 2025-04-08; 2024-12-17
Citation: BATISTA, Jonathan Jalles Silva. Recomendação de produtos financeiros utilizando aprendizado de máquina. Advisor: Dr. Daniel Sabino Amorim de Araújo. 2024. 72 leaves. Dissertation (Professional Master's in Information Technology) - Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2024.
URI: https://repositorio.ufrn.br/handle/123456789/63425
Abstract: Recommendation systems play a crucial role across various sectors, including finance, by offering personalized suggestions to customers based on their past preferences. In the financial and credit industries, such systems have the potential to change how institutions engage with customers, particularly through personalized recommendations for financial products such as investments, insurance, and loans. This study analyzed 208,570 customer records from three types of financial insurance services to develop a solution capable of supporting marketing strategies for offering these services to the clients of a fintech. Clustering with k-means, tested with two to six clusters, revealed significant customer segmentation patterns. While the two-cluster configuration achieved the highest Silhouette Score (0.4169), the four-cluster approach provided more informative segmentation for strategic purposes. For predictive modeling, after initial tests and randomized hyperparameter search with 5-fold and 10-fold cross-validation, XGBoost and LightGBM achieved 82.0% recall and 80.5% F1-score. LightGBM was selected for final evaluation on validation data due to cost-benefit considerations. When applied to the validation set, whose insurance distribution differed significantly from that of the training data due to covariate shift, the model’s performance dropped to 43.1% recall and 39.0% F1-score.
The model performed best with Insurance C (63.7% recall and 60.0% F1-score) but struggled with Insurance A (45.8% recall and 26.7% F1-score) and Insurance B (2.6% recall and 4.8% F1-score). When trained on 80% of the combined training and validation dataset and validated on the remaining 20%, LightGBM showed substantial improvements for Insurances A and C, achieving recall scores of 83.3% and 81.6% and F1-scores of 83.0% and 77.6%, respectively. In all cases, the models struggled with Insurance B. Although the model’s performance on the combined dataset was significantly better, covariate shift poses a notable challenge in developing the solution for the purposes of this study.
Rights: Open Access
Keywords: Insurance; Fintech; Clustering; Predictive modeling
Title: Recomendação de produtos financeiros utilizando aprendizado de máquina
Alternative title: Financial products recommendation using machine learning
Type: masterThesis
Subject area: CNPQ::ENGENHARIAS
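Illustrative note: the abstract reports recall and F1-score separately for Insurances A, B, and C. The sketch below shows one way such per-class figures can be computed from true and predicted labels. It assumes a single-label setup in which each customer record maps to exactly one insurance, which the abstract does not state. The function name `per_class_recall_f1` and the toy labels are hypothetical and are not taken from the dissertation.

```python
from collections import Counter


def per_class_recall_f1(y_true, y_pred):
    """Return {label: (recall, f1)} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # the true class was missed
            fp[p] += 1  # the predicted class was wrongly assigned
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted_pos = tp[label] + fp[label]
        actual_pos = tp[label] + fn[label]
        precision = tp[label] / predicted_pos if predicted_pos else 0.0
        recall = tp[label] / actual_pos if actual_pos else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (recall, f1)
    return scores


# Toy labels invented for illustration only (not data from the study).
y_true = ["A", "A", "B", "B", "C", "C", "C", "C"]
y_pred = ["A", "C", "A", "C", "C", "C", "C", "A"]
scores = per_class_recall_f1(y_true, y_pred)
for label, (recall, f1) in scores.items():
    print(f"Insurance {label}: recall={recall:.3f}, F1={f1:.3f}")
```

In this toy run no record is ever predicted as class B, so its recall and F1-score are both zero. A class can score this poorly even when the other classes are predicted reasonably well, which is why per-class figures are worth reporting alongside any aggregate.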