Advisor: Silva, Ivanovitch Medeiros Dantas da
Author: Andrade, Matheus Gomes Diniz
Date accessioned: 2025-01-21
Date available: 2025-01-21
Date issued: 2025-01-16
Citation: ANDRADE, Matheus Gomes Diniz. Metodologia orientada a grandes modelos de linguagens para extração de conhecimento em textos acadêmicos. 2025. 63 leaves. Undergraduate final project (Bachelor's in Computer Engineering), Departamento de Engenharia da Computação, Universidade Federal do Rio Grande do Norte, Natal, 2025.
URI: https://repositorio.ufrn.br/handle/123456789/61490
Abstract: The exponential growth of scientific production in recent decades, driven by the widespread availability of online journals and the need to advance academic careers, has produced a large volume of academic publications. As a result, efficient access to academic information has become a significant challenge. Traditional academic information retrieval systems rely heavily on author-provided keywords, which can introduce bias and limit the diversity and relevance of results. In this context, this work proposes a methodology based on Large Language Models (LLMs) to improve information retrieval in academic databases. The methodology combines Retrieval-Augmented Generation (RAG) with semantic analysis of natural language queries. Built with tools such as LangChain and ChromaDB, it supports metadata-based filters and aligns searches with the Sustainable Development Goals (SDGs). The pipeline also applies preprocessing and vector storage techniques to keep data indexing and retrieval efficient. The results of the case study indicate that the methodology enables personalized queries and precise responses: the system performed complex searches that combined multiple criteria such as category, publication year, author, advisor, and SDGs. The approach also proved cost-efficient, with a maximum cost of US$0.001385 across the test queries.
License: Attribution 3.0 Brazil (http://creativecommons.org/licenses/by/3.0/br/)
Keywords: Information Retrieval; Large Language Models; Vector Databases; Sustainable Development Goals; Natural Language Processing
Title: Metodologia orientada a grandes modelos de linguagens para extração de conhecimento em textos acadêmicos
Alternative title: Methodology oriented towards large language models for knowledge extraction in academic texts
Type: Bachelor's thesis
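The abstract describes retrieval that combines semantic similarity with metadata filters (category, publication year, author, advisor, SDG). Below is a minimal, self-contained Python sketch of that general idea, not the author's implementation. The `Document` schema, the toy term-count `embed` function (a stand-in for a real embedding model), and all sample records are hypothetical, and no LLM is called: `build_prompt` only shows how retrieved passages could be placed into a RAG prompt. In ChromaDB, the filtering step would typically be expressed through the `where` argument of a collection query rather than a Python function.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical indexed record: text to embed plus filterable metadata."""
    text: str
    category: str
    year: int
    author: str
    advisor: str
    sdgs: set = field(default_factory=set)


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase term counts, standing in for a real model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def matches(doc, category=None, year_min=None, year_max=None,
            author=None, advisor=None, sdg=None):
    """Return True if the document satisfies every metadata criterion that was given."""
    if category is not None and doc.category != category:
        return False
    if year_min is not None and doc.year < year_min:
        return False
    if year_max is not None and doc.year > year_max:
        return False
    if author is not None and doc.author != author:
        return False
    if advisor is not None and doc.advisor != advisor:
        return False
    if sdg is not None and sdg not in doc.sdgs:
        return False
    return True


def retrieve(docs, query, k=3, **filters):
    """Apply metadata filters first, then rank the survivors by similarity to the query."""
    q = embed(query)
    candidates = [d for d in docs if matches(d, **filters)]
    return sorted(candidates, key=lambda d: cosine(q, embed(d.text)), reverse=True)[:k]


def build_prompt(query, hits):
    """Assemble a RAG-style prompt: retrieved passages as context, then the question."""
    context = "\n".join(f"- ({d.year}, {d.author}) {d.text}" for d in hits)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Hypothetical sample records, for illustration only.
docs = [
    Document("Solar energy forecasting with neural networks", "Engineering", 2023, "Author A", "Advisor X", {7}),
    Document("Water quality monitoring using IoT sensors", "Engineering", 2021, "Author B", "Advisor Y", {6}),
    Document("Neural networks for clean energy grid balancing", "Engineering", 2019, "Author C", "Advisor X", {7}),
]

# Combined criteria: published in 2020 or later AND tagged with SDG 7.
hits = retrieve(docs, "neural networks for energy", k=2, year_min=2020, sdg=7)
print([d.author for d in hits])  # → ['Author A']
print(build_prompt("neural networks for energy", hits))
```

Filtering before ranking restricts the similarity computation to records that already satisfy the hard constraints, so a semantically close but out-of-range record (here, the 2019 one) can never crowd out a valid match.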