Advisor: Silva, Ivanovitch
Author: Silva, Larissa Kelmer de Menezes
Date accessioned: 2025-07-14
Date available: 2025-07-14
Date issued: 2025-07-07
Citation: SILVA, Larissa Kelmer de Menezes. Avaliação experimental do uso de agentes baseados em LLMs como assistentes de pesquisa científica. 2025. 124 f. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2025.
URI: https://repositorio.ufrn.br/handle/123456789/64314

Abstract: This study presents an experimental evaluation of Agent Laboratory, a tool built on agents powered by Large Language Models (LLMs) and designed to support scientific research tasks in machine learning. Four experiments were conducted across two thematic domains, Edge AI and Smart Cities, each explored under two settings: a constrained one with detailed instructions and an open-ended one emphasizing creativity. Each experiment was structured into three phases: literature review, research planning, and execution with result interpretation. The agent's output was evaluated with a human-scored rubric grounded in established benchmarks such as LEADER, MT-Bench, and AgentEval, covering six dimensions: scientific relevance, originality, clarity, practical feasibility, fidelity to the literature, and thematic synthesis. The goal of this research is to assess the viability of LLM-based agents as assistants in machine learning research workflows and to identify their strengths and limitations in automating this process. The results indicate that the tool performs well in generating coherent and original research proposals, especially in guided scenarios. However, critical limitations emerged during implementation, including unjustified substitution of models and datasets, failure to meet planned metrics, and a lack of justification for technical choices. Literature reviews were often shallow and lacked transparency, while final reports frequently omitted key sections or overstated findings. Despite these issues, the tool showed potential for automating early-stage components of the ML research workflow, particularly idea generation and structural planning. This work contributes a reproducible, benchmark-based methodology for evaluating LLM-guided research agents and underscores the ongoing need for human oversight to ensure scientific rigor and reliability.

Language: pt-BR
License: Attribution-ShareAlike 3.0 Brazil (http://creativecommons.org/licenses/by-sa/3.0/br/)
Keywords: AI agents; language models; automated research; LLM evaluation; Edge AI; smart cities
Title (pt-BR): Avaliação experimental do uso de agentes baseados em LLMs como assistentes de pesquisa científica
Title (en): Experimental evaluation of the use of LLM-based agents as scientific research assistants
Type: bachelorThesis
Subject areas: OUTROS; ENGENHARIAS