Sakamoto, Tetsu
Costa, Priscila Caroline de Sousa
2023-03-22
2023-03-22
2022-12-15
COSTA, Priscila Caroline de Sousa. Identificação de homólogos remotos utilizando ferramentas de alinhamento estrutural de proteínas e aprendizado de máquina. Advisor: Tetsu Sakamoto. 2022. 50 f. Dissertation (Master's in Bioinformatics) - Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2022.
https://repositorio.ufrn.br/handle/123456789/51918
Proteomics studies have revealed a large number of proteins and shown their importance for the study of life. However, a high percentage of these proteins have not yet been functionally annotated, and characterizing them is essential for advances in health and biotechnology. The function of a protein is determined by its conformation and three-dimensional structure; structural data therefore help define function. Many diverse proteins now have their sequences characterized, but obtaining their structural data remains a methodological bottleneck. The recent development of the AlphaFold program, which accurately predicts the three-dimensional structure of proteins from their amino acid sequences, can overcome this bottleneck. The aim of this project is thus to evaluate the impact of such structure prediction tools on protein functional annotation. In this work, we seek to contribute to the functional description of protein domains of unknown function (DUFs). To this end, the predicted three-dimensional structures of these domains were submitted to computational tools that search for other structures sharing structural similarity. The present study demonstrates that many domains can benefit from this analysis.
In addition, we generated a classification model using the SVM method to identify whether two proteins that share structural similarity are remote homologues, that is, whether they are derived from a common ancestor. The model proved effective, with a ROC AUC of 0.9191 and a standard deviation of 0.0099. This classifier will be used to analyze the similarity results and suggest functions for these domains. In this way, it would be possible to identify structural similarity between proteins that share low sequence similarity.
Open Access
Remote homolog
DUF
AlphaFold
Protein structural similarity
FATCAT
Identificação de homólogos remotos utilizando ferramentas de alinhamento estrutural de proteínas e aprendizado de máquina
Identification of remote homologous using protein structural alignment tools and machine learning
masterThesis
CNPQ::CIENCIAS BIOLOGICAS