Use this identifier to cite or link to this item: https://repositorio.ufrn.br/handle/123456789/27235
Title: On the impact of the pangenome and annotation discrepancies while building protein sequence databases for bacteria proteogenomics
Author(s): Machado, Karla C. T.
Fortuin, Suereta
Tomazella, Gisele Guicardi
Fonseca, Andre F.
Warren, Robin Mark
Wiker, Harald G.
Souza, Sandro José de
Souza, Gustavo Antonio de
Keywords: databases; proteomics; proteogenomics; mass spectrometry; pangenome
Document date: 20-Jun-2019
Abstract: In proteomics, peptide information within mass spectrometry (MS) data from a sample of a specific organism is routinely matched against a protein sequence database that best represents that organism. However, if the species or strain in the sample is unknown or genetically poorly characterized, it becomes challenging to choose a database that can represent the sample. Building customized protein sequence databases that merge multiple strains of a given species has become a strategy to overcome such restrictions. However, as more genetic information becomes publicly available and interesting genetic features, such as the existence of pan- and core genes within a species, are revealed, we questioned how efficient such merging strategies are at reporting relevant information. To test this, we constructed databases containing conserved and unique sequences for 10 different species. Features that are relevant for probability-based protein identification in proteomics were then monitored. As expected, an increase in database complexity correlates with pangenomic complexity. However, Mycobacterium tuberculosis and Bordetella pertussis generated very complex databases despite having low pangenomic complexity. We further tested database performance using MS data from eight clinical strains of M. tuberculosis and from two published datasets from Staphylococcus aureus. We show that with an approach in which database size is controlled by removing repeated identical tryptic sequences across strains/species, computational time can be drastically reduced as database complexity increases.
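The abstract does not spell out the database-building pipeline, so the Python sketch below only illustrates the two ideas it names: separating peptides conserved across all strains from strain-unique ones, and storing each identical tryptic sequence once instead of concatenating every strain's proteome. It is not the authors' implementation. It assumes a simplified trypsin rule (cleave after K or R, but not before P), no missed cleavages, and illustrative length limits; the helper names `tryptic_peptides` and `summarize_peptides` and the toy sequences are hypothetical.

```python
import re
from collections import defaultdict

# Simplified trypsin rule (assumption): cleave C-terminal to K or R,
# except when the next residue is P. Splitting on a zero-width match needs Python 3.7+.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein, min_len=6, max_len=40):
    """Hypothetical helper: fully cleaved tryptic peptides (no missed cleavages)."""
    return {p for p in TRYPSIN_SITE.split(protein) if min_len <= len(p) <= max_len}


def summarize_peptides(strains, min_len=6):
    """Hypothetical helper.

    strains maps a strain name to a list of protein sequences. Returns the
    peptides shared by every strain (core), those found in a single strain
    (unique), and the non-redundant set in which each identical tryptic
    sequence is stored once, however many strains carry it.
    """
    seen_in = defaultdict(set)  # peptide -> names of strains containing it
    for strain, proteins in strains.items():
        for protein in proteins:
            for pep in tryptic_peptides(protein, min_len=min_len):
                seen_in[pep].add(strain)

    n_strains = len(strains)
    return {
        "core": {p for p, s in seen_in.items() if len(s) == n_strains},
        "unique": {p for p, s in seen_in.items() if len(s) == 1},
        "nonredundant": set(seen_in),
        # Entries if each strain's distinct peptides were concatenated without cross-strain deduplication
        "concatenated_size": sum(len(s) for s in seen_in.values()),
    }


if __name__ == "__main__":
    toy = {
        "strain_A": ["AAAAKCCCCRDDDDK"],
        "strain_B": ["AAAAKCCCCR", "GGGGKPHHHHK"],  # K followed by P is not cleaved
        "strain_C": ["EEEEKAAAAK"],
    }
    result = summarize_peptides(toy, min_len=4)
    print(sorted(result["core"]))      # ['AAAAK']
    print(sorted(result["unique"]))    # ['DDDDK', 'EEEEK', 'GGGGKPHHHHK']
    print(len(result["nonredundant"]), result["concatenated_size"])  # 5 8
```

In the toy data, concatenating the three strains yields 8 peptide entries, while keeping each identical sequence once yields 5. This is the kind of size reduction the abstract links to shorter search times as the number of merged strains grows.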
URI: https://repositorio.ufrn.br/jspui/handle/123456789/27235
Appears in collections: ICe - Articles published in journals

Files associated with this item:
File: SandroSouza_ICe_2019_On the impact of the pangenome and annotation.pdf
Description: SandroSouza_ICe_2019_On the impact of the pangenome and annotation
Size: 1.58 MB
Format: Adobe PDF


Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.