Coelho et al. BMC Medical Genomics           (2020) 13:30 
https://doi.org/10.1186/s12920-020-0694-1
SOFTWARE Open Access
neoANT-HILL: an integrated tool for identification of potential neoantigens
Ana Carolina M. F. Coelho1, André L. Fonseca1, Danilo L. Martins1, Paulo B. R. Lins1, Lucas M. da Cunha1,2 and
Sandro J. de Souza1,3,4*
Abstract
Background: Cancer neoantigens have attracted great interest in immunotherapy due to their capacity to elicit antitumoral responses. These molecules arise from somatic mutations in cancer cells, which result in alterations to the original protein. Neoantigen identification remains a challenging task, largely because of a high rate of false positives.
Results: We have developed an efficient and automated pipeline for the identification of potential neoantigens. neoANT-HILL integrates several immunogenomic analyses to improve neoantigen detection from Next-Generation Sequencing (NGS) data. The pipeline is distributed as a pre-built Docker image, so minimal computational background is required for download and setup. neoANT-HILL was applied to The Cancer Genome Atlas (TCGA) melanoma dataset and found several putative neoantigens, including ones derived from the recurrent RAC1:P29S and SERPINB3:E250K mutations. neoANT-HILL was also used to identify potential neoantigens in RNA-Seq data with high sensitivity and specificity.
Conclusion: neoANT-HILL is a user-friendly tool with a graphical interface that performs neoantigen prediction efficiently. neoANT-HILL can process multiple samples, provides several binding predictors, enables quantification of tumor-infiltrating immune cells, and considers RNA-Seq data for identifying potential neoantigens. The software is available on GitHub at https://github.com/neoanthill/neoANT-HILL.
Keywords: Neoantigens, Cancer, Immunogenomic analyses
* Correspondence: sandro@imd.ufrn.br
1Bioinformatics Multidisciplinary Environment (BioME), Institute Metropolis Digital, Federal University of Rio Grande do Norte, UFRN, Natal, Brazil
3Brain Institute, Federal University of Rio Grande do Norte, UFRN, Natal, Brazil
Full list of author information is available at the end of the article
© The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background
Recent studies have demonstrated that T cells can recognize tumor-specific antigens bound to human leukocyte antigen (HLA) molecules at the surface of tumor cells [1, 2]. During tumor progression, somatic mutations accumulating in the tumor genome can affect protein-coding genes and result in mutated peptides [1]. These mutated peptides are present in malignant cells but not in normal cells. They may therefore act as neoantigens and trigger T-cell responses, because T cells that recognize them have not been eliminated in the thymus (central tolerance) [3–5]. As a result, neoantigens appear to be ideal targets and attract great interest for cancer immunotherapy strategies, including therapeutic vaccines and engineered T cells [1, 6].

In the last few years, advances in next-generation sequencing have provided an accessible way to generate patient-specific data, which allows rapid and comprehensive prediction of tumor neoantigens [7]. Several approaches have been developed, such as pVAC-Seq [8], MuPeXI [9], TIminer [10] and TSNAD [11], which predict potential neoantigens produced by non-synonymous mutations. However, none of these tools uses tumor transcriptome sequencing data (RNA-seq) to identify somatic mutations. Moreover, only one of them quantifies the fraction of tumor-infiltrating immune cell types (Supplementary: Table S1).

Here we present neoANT-HILL, a versatile tool with a graphical user interface (GUI) designed to identify potential neoantigens arising from cancer somatic mutations. neoANT-HILL integrates complementary features to prioritize mutant peptides based on predicted binding affinity and mRNA expression level (Fig. 1). We used datasets from the GEUVADIS
RNA sequencing project [12] to demonstrate that RNA-seq is also a potential source for mutation detection. Finally, we applied our pipeline to a large melanoma cohort from The Cancer Genome Atlas [13]. This demonstrates its utility in predicting and suggesting potential neoantigens that could be used in personalized immunotherapy.

Fig. 1 Overall workflow of neoANT-HILL. neoANT-HILL was designed to analyze NGS data, such as genome (WGS or WES) and transcriptome (RNA-Seq) data. It takes distinct data types as input, including raw and pre-aligned RNA-Seq sequences as well as variant call files (VCF) from genome or transcriptome data (dotted lines indicate that the VCF must be created beforehand by the user). The blue boxes represent the transcriptome analyses. These are carried out using data in either BAM format (variant calling) or FASTQ format (expression, HLA typing and tumor-infiltrating immune cells). neoANT-HILL can perform gene expression quantification (Kallisto), variant calling (GATK4 | Mutect2), HLA typing (Optitype), and tumor-infiltrating immune cell quantification (quanTIseq). The gene expression quantification is used as input to identify molecular signatures associated with immune cell diversity in the tumor samples. The gray boxes represent steps common to genome and transcriptome data. neoANT-HILL uses the variant calling data to reconstruct the protein sequences, using the NCBI RefSeq database as reference. The VCF files can be generated either by our pipeline or by external somatic variant-calling software. Next, reconstructed proteins are submitted to neoepitope binding prediction using HLA alleles from the Optitype results or defined by the user. Finally, all steps and results are shown in a user-friendly interface

Implementation
neoANT-HILL requires a variant list for potential neoantigen prediction. Our pipeline can handle a VCF file (single- or multi-sample) for genome data. It can also handle tumor transcriptome sequencing data (RNA-seq), in which somatic mutations are called following GATK best practices [14, 15] with Mutect2 [16] in tumor-only mode. However, the user must first align the RNA-seq data to the reference genome (BAM). The size of the corresponding RNA-Seq BAM files can be a limiting factor in the analysis. Since neoANT-HILL runs locally, the user must ensure that enough disk space and memory are available for the program to run properly. In the current implementation, neoANT-HILL supports VCF files generated using human genome version GRCh37. Variants are annotated with
snpEff [17] to identify non-synonymous mutations (missense, frameshift and in-frame).

Once the VCF files have been annotated, the resulting altered amino acid sequences are inferred from the NCBI Reference Sequence database (RefSeq) [18]. For frameshift mutations, the altered amino acid sequence is inferred by translating the resulting cDNA sequence. Altered epitopes (neoepitopes) are represented as 21-mer sequences in which the altered residue is at the center. If the mutation is at the beginning or at the end of the transcript, the neoepitope sequence is built by taking the 20 following or preceding amino acids, respectively. The neoepitope sequence and its corresponding wild-type sequence are stored in a FASTA file. Non-overlapping neoepitopes can be derived from frameshift mutations.

A list of HLA haplotypes is also required. If the user does not provide it, neoANT-HILL uses the Optitype algorithm [19] to infer class I HLA alleles from RNA-Seq data. The next step is binding affinity prediction between the predicted neoepitopes and HLA molecules. This step can be run on single or multiple samples, with parallelization and user-configured parameters. The corresponding wild-type sequences are also submitted at this stage. This allows calculation of the fold change between wild-type and neoepitope binding scores, known as the differential agretopicity index (DAI), as proposed by [20]. This additional neoantigen quality metric contributes to better prediction of neoantigens that can elicit an antitumor response [21].

neoANT-HILL employs seven binding prediction algorithms from the Immune Epitope Database (IEDB) [22] for HLA class I: NetMHC (v. 4.0) [23, 24], NetMHCpan (v. 4.0) [25], NetMHCcons [26], NetMHCstabpan [27], PickPocket [28], SMM [29] and SMMPMBEC [30]. It also employs the MHCflurry algorithm [31] for HLA class I. The user can specify the neoepitope lengths used for binding prediction, and each neoepitope sequence is scanned with a sliding window. Our pipeline also employs four IEDB algorithms for HLA class II binding affinity prediction: NetMHCIIpan (v. 3.1) [32], NN-align [33], SMM-align [34], and Sturniolo [35].

Moreover, when raw RNA-seq reads (FASTQ) are available, neoANT-HILL can quantify the expression levels of genes carrying a potential neoantigen. Our pipeline uses the Kallisto algorithm [36], and the output is reported in transcripts per million (TPM). Potential neoantigens arising from genes with an expression level below 1 TPM are excluded. In addition, neoANT-HILL can quantitatively estimate, via deconvolution, the relative fractions of tumor-infiltrating immune cell types using quanTIseq [37].

Our software is distributed as a pre-built Docker image. The required dependencies are packaged in it, which simplifies installation and avoids possible incompatibilities between versions. As previously described, several analyses are supported, and each one relies on different tools. Several Python scripts were implemented to fully automate the execution of these individual tools and to integrate their data.

Results
neoANT-HILL provides a user-friendly graphical interface (Fig. 2) implemented with the Flask framework. The interface comprises three main sections: (i) Home (Fig. 2a), (ii) Processing (Fig. 2b), and (iii) Results (Fig. 2c). neoANT-HILL stores the outputs in sample-specific folders. Our pipeline provides a table of ranked predicted neoantigens with HLA alleles, variant information, binding prediction scores (neoepitope and wild-type) and binding affinity classification. When the user selects optional analyses, their outputs are stored in separate tabs. Gene expression is provided as a list of genes with their corresponding RNA expression levels, and it is used to filter the neoantigen candidates.

Fig. 2 Screenshots of the neoANT-HILL interface. a Processing tab for submitting genome or transcriptome data. b Processing tab for selecting the parameters of neoepitope binding affinity prediction. On this tab, the user can define all parameters through selection boxes, ranging from the MHC class and corresponding prediction methods to parallelization settings. The length and HLA allele parameters allow multiple selections, although multiple selections may increase processing time. c Binding prediction results tab, showing an interactive table that reports all predicted neoepitopes and information about each prediction. The interactive table includes several columns, such as the donor gene, HLA allele, mutation type, reference (Ref_Peptide) and altered (Alt_peptide) peptide sequences, reference (Ref_IC50) and altered (Alt_IC50) binding affinity scores, binding affinity category (High, Moderate, Low, and Non-binding) and differential agretopicity index (DAI)

Table 1 Top 15 potential shared neoantigens in the TCGA-SKCM cohort. Recurrent mutations observed in the TCGA-SKCM cohort. The amino acid (AA) residue changes caused by somatic mutations are highlighted in the (neo)epitope sequences. Frequency is the number of samples showing the corresponding mutation.

Gene      AA change  Neoepitope   HLA haplotype  Frequency
RAC1      P29S       FSGEYIPTV    HLA-A*02:01    17/466
KLHDC7A   E635K      HTATVRAKK    HLA-A*11:01    12/466
INMT      S212F      YMVGKREFFCV  HLA-A*02:01    9/466
CDH6      S524L      FLFSLAPEAA   HLA-A*02:01    8/466
ZBED2     E157K      GTMALWASQRK  HLA-A*11:01    8/466
CRNKL1    S128F      LQVPLPVPRF   HLA-A*15:01    7/466
IL37      S202L      FLFQPVCKA    HLA-A*02:01    7/466
SERPINB3  E250K      LSMIVLLPNK   HLA-A*11:01    6/466
DNAJC5B   E22K       STTGEALYK    HLA-A*11:01    6/466
MYO7B     E512K      MSIISLLDK    HLA-A*11:01    6/466
MORC1     E878K      IQNTYMVQYK   HLA-A*11:01    6/466
SCN7A     S445F      IEMKKRSPIF   HLA-A*15:01    6/466
PSG9      E404K      KISKSMTVK    HLA-A*11:01    6/466
RAC1      P29L       FLGEYIPTV    HLA-A*02:01    5/466
NUTF2     Q20K       SSFIQHYYK    HLA-A*11:01    5/466

Variant identification on RNA-Seq
We evaluated the utility of RNA-seq for identifying frameshifts, indels and point mutations using samples (n = 15) from the GEUVADIS RNA sequencing project.
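This benchmark reduces to two simple steps. First, discard RNA-seq calls with a read depth (DP) below 10 or fewer than five supporting reads. Second, measure the fraction of the remaining calls that are confirmed by the genome genotypes of the same individual. The Python sketch below illustrates only this arithmetic and is not neoANT-HILL's code. Several choices in it are assumptions for illustration:

- the `VariantCall` record is hypothetical;
- calls are matched by (chromosome, position, reference, alternate allele);
- "supported by at least five reads" is read as five reads carrying the alternate allele;
- calls are assumed to be already restricted to coding regions.

In a VCF, these counts would typically come from the DP and AD fields.

```python
from typing import NamedTuple, Sequence, Set, Tuple


# Hypothetical record for one RNA-seq variant call (illustration only).
class VariantCall(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # DP: total read depth at the site
    alt_reads: int  # assumption: reads carrying the alternate allele


VariantKey = Tuple[str, int, str, str]


def passes_filter(call: VariantCall, min_depth: int = 10, min_alt_reads: int = 5) -> bool:
    """Keep calls with DP >= min_depth that are supported by >= min_alt_reads reads."""
    return call.depth >= min_depth and call.alt_reads >= min_alt_reads


def concordance(rna_calls: Sequence[VariantCall], genome_variants: Set[VariantKey]) -> float:
    """Fraction of filtered RNA-seq calls also present in the genome genotypes."""
    kept = [c for c in rna_calls if passes_filter(c)]
    if not kept:
        return 0.0
    confirmed = sum((c.chrom, c.pos, c.ref, c.alt) in genome_variants for c in kept)
    return confirmed / len(kept)


def mean_concordance(per_sample: Sequence[float]) -> float:
    """Average per-sample concordance across the benchmark samples."""
    if not per_sample:
        raise ValueError("no samples given")
    return sum(per_sample) / len(per_sample)


# Toy example for a single sample.
calls = [
    VariantCall("1", 1000, "A", "G", depth=25, alt_reads=12),  # kept, confirmed
    VariantCall("1", 2000, "C", "T", depth=8, alt_reads=6),    # dropped: DP < 10
    VariantCall("2", 3000, "G", "A", depth=30, alt_reads=4),   # dropped: < 5 reads
    VariantCall("3", 4000, "T", "C", depth=15, alt_reads=7),   # kept, not confirmed
]
genome = {("1", 1000, "A", "G")}
print(concordance(calls, genome))  # 0.5
```

Computing concordance per sample and then averaging mirrors the way the benchmark reports an average value over the 15 samples. Pooling all calls before dividing would instead weight samples by their number of calls.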

Although these samples are not derived from tumor cells, the goal of this analysis was to benchmark how efficiently our pipeline detects somatic mutations from RNA-Seq data. We limited our analysis to variants with read depth (DP) ≥ 10 and supported by at least five reads. The called variants were then compared with the corresponding genotypes of the same individuals provided by the 1000 Genomes Project Consortium (1KG) [38]. On average, 71% of the variants in coding regions detected by RNA-seq were confirmed by genome sequencing (concordant calls) (Supplementary Table S2). Variants in genes that are not expressed cannot be detected by RNA-seq, and RNA editing sites could partially explain the discordant calls. Furthermore, some of the discrepancies may be due to low coverage in the genome sequence, which would produce false negatives in the genome calls. Calling variants from RNA-Seq data has been shown to be more challenging. Nevertheless, it is an interesting alternative to genome sequencing, because many tumor RNA-seq samples do not have matched normal data [39, 40].

Fig. 3 Distribution of recurrent missense mutations that generated high-affinity neoantigens. The y-axis shows peptide coverage based on the number of epitope binding predictions in each region. The coverage was calculated by increasing the overall frequency of each amino acid by one, including non-high-affinity regions. The allele classification is shown as colored lines. The x-axis shows the protein length and also contains information about conserved domains for each protein. a The P29S mutation in the RAC1 gene generated recurrent mutant peptides with strong affinity to HLA-A*02:01. P29L generated peptides with strong affinity to HLA-A*02:01 or HLA-A*11:01, depending on peptide length. b The E250K mutation in the SERPINB3 gene generates a recurrent potential neoantigen that binds to HLA-A*11:01

Use case
We applied our pipeline to a large melanoma cohort (SKCM, n = 466) from TCGA to demonstrate its utility in identifying potential neoantigens. We found approximately 198,000 instances of predicted neoantigens binding to HLA class I. This large number of mutant peptides is due to: i) the large cohort size, ii) the high mutational burden of melanoma and iii) the large set of HLA alleles used to run the binding prediction. These neoepitopes were classified as strong (IC50 under 50 nM), intermediate (IC50 between 50 nM and 250 nM) or weak binders (IC50 over 250 nM and
under 500 nM) (Supplementary Table S3). We limited our analyses to high binding affinity candidates to reduce potential false positives.

We observed that the majority of strong-binder mutant peptides are private and unique, which is likely linked to high intratumor genetic diversity. However, we observed that recurrent mutations may generate recurrent mutant peptides (Table 1). These recurrent neoantigens are of interest because they could be used as a vaccine for more than one patient. Figure 3 shows potential neoepitopes arising from recurrent mutations. The potential neoantigen FSGEYIPTV was predicted to form a complex with the HLA-A*02:01 allele and was shared among 17 samples (3.65%). It was generated by the P29S mutation in the RAC1 gene (Fig. 3a). RAC1 P29S has been described as a candidate biomarker for treatment with anti-PD-1 or anti-PD-L1 antibodies [41]. Another mutation in the same gene (P29L) formed a recurrent potential neoantigen (FLGEYIPTV), which was found in 5 samples (1.07%). As another example, we highlight the potential shared neoantigen LSMIVLLPNK, related to the E250K mutation in the SERPINB3 gene (Fig. 3b). It was found in 6 samples (1.29%) and was predicted to form a complex with the HLA-A*11:01 allele. Mutations in SERPINB3 have also been related to response to immunotherapy [42].

Conclusion
We present neoANT-HILL, a fully integrated, efficient and user-friendly software tool for predicting and screening potential neoantigens. We have shown that neoANT-HILL can predict neoantigen candidates, which can be targets for immunotherapies and predictive biomarkers for immune responses. Our pipeline is available through a user-friendly graphical interface, which enables its use by people without advanced programming skills. Furthermore, neoANT-HILL offers several binding prediction algorithms for both HLA classes and can process multiple samples in a single run. Unlike the majority of existing tools, our pipeline enables the quantification of tumor-infiltrating immune cells and considers RNA-Seq data for variant identification. The source code is available at https://github.com/neoanthill/neoANT-HILL.

Availability and requirements
Project name: neoANT-HILL.
Project home page: https://github.com/neoanthill/neoANT-HILL
Operating system(s): Unix-based operating system, Mac OS, Windows.
Programming language: Python 2.7.
Other requirements: Docker.
License: Apache License 2.0.
Any restrictions to use by non-academics: None.

Supplementary information
Supplementary information accompanies this paper at https://doi.org/10.1186/s12920-020-0694-1.
Additional file 1.
Additional file 2.
Additional file 3.

Abbreviations
DAI: Differential Agretopicity Index; DP: Read Depth; GUI: Graphical User Interface; HLA: Human Leukocyte Antigens; NGS: Next Generation Sequencing; RNA-Seq: RNA-Sequencing; SKCM: Skin Cutaneous Melanoma; TCGA: The Cancer Genome Atlas; TPM: Transcripts per Million

Acknowledgments
Not applicable.

Authors' contributions
ACMFC, DLM and PBRL designed and carried out the implementation of the computational pipeline. DLM led debugging efforts. LMC contributed to the design of the computational pipeline. ACMFC and ALF analyzed the data. SJS supervised the project. ACMFC and ALF discussed the results and commented on the manuscript in consultation with SJS. SJS reviewed and edited the manuscript. All authors read and approved the final manuscript.

Funding
This work was supported by a CAPES grant (23038.004629/2014–19). ACMFC, DLM, ALF, LMC and PBRL were supported by CAPES. The funding body had no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.

Availability of data and materials
The RNA-Seq dataset from the Geuvadis RNA sequencing project analyzed during the current study is available in the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under accession number E-GEUV-1. The corresponding genotyping data (Phase I) for each sample are available from the 1000 Genomes Project and were downloaded from the FTP site hosted at the EBI: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/ (
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA12812/exome_alignment/NA12812.mapped.SOLID.bfast.CEU.exome.20110411.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA12749/exome_alignment/NA12749.mapped.illumina.mosaik.CEU.exome.20110521.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA20510/exome_alignment/NA20510.mapped.SOLID.bfast.TSI.exome.20110521.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA19119/exome_alignment/NA19119.mapped.illumina.mosaik.YRI.exome.20110411.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA19204/exome_alignment/NA19204.mapped.illumina.mosaik.YRI.exome.20110411.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA18498/exome_alignment/NA18498.mapped.illumina.mosaik.YRI.exome.20110411.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA12489/exome_alignment/NA12489.mapped.SOLID.bfast.CEU.exome.20110411.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA20752/exome_alignment/NA20752.mapped.illumina.mosaik.TSI.exome.20110521.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA18517/exome_alignment/NA18517.mapped.illumina.mosaik.YRI.exome.20110521.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA11992/exome_alignment/NA11992.mapped.SOLID.bfast.CEU.exome.20110411.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA19144/exome_alignment/NA19144.mapped.illumina.mosaik.YRI.exome.20110411.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA20759/exome_alignment/NA20759.mapped.illumina.mosaik.TSI.exome.20110521.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA19137/exome_alignment/NA19137.mapped.illumina.mosaik.YRI.exome.20110411.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA19257/exome_alignment/NA19257.mapped.illumina.mosaik.YRI.exome.20110521.bam,
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/NA12006/exome_alignment/NA12006.mapped.SOLID.bfast.CEU.exome.20110411.bam).
The melanoma TCGA mutation data were downloaded from the cBio datahub (https://github.com/cBioPortal/datahub/blob/master/public/skcm_tcga/data_mutations_extended.txt).

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Author details
1Bioinformatics Multidisciplinary Environment (BioME), Institute Metropolis Digital, Federal University of Rio Grande do Norte, UFRN, Natal, Brazil. 2PhD Program in Bioinformatics, UFRN, Natal, Brazil. 3Brain Institute, Federal University of Rio Grande do Norte, UFRN, Natal, Brazil. 4Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu, China.

Received: 2 October 2019 Accepted: 11 February 2020

References
1. Efremova M, Finotello F, Rieder D, Trajanoski Z. Neoantigens generated by individual mutations and their role in cancer immunity and immunotherapy. Front Immunol. 2017. https://doi.org/10.3389/fimmu.2017.01679.
2. Kato T, Matsuda T, Ikeda Y, Park JH, Leisegang M, Yoshimura S, Hikichi T, Harada M, Zewde M, Sato S, Hasegawa K, Kiyotani K, Nakamura Y. Effective screening of T cells recognizing neoantigens and construction of T-cell receptor-engineered T cells. Oncotarget. 2018. https://doi.org/10.18632/oncotarget.24232.
3. Snyder A, Makarov V, Merghoub T, Yuan J, Zaretsky JM, Desrichard A, Walsh LA, Postow MA, Wong P, Ho TS, Hollmann TJ, Bruggeman C, Kannan K, Li Y, Elipenahli C, Liu C, Harbison CT, Wang L, Ribas A, Wolchok JD, Chan TA. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med. 2014. https://doi.org/10.1056/nejmoa1406498.
4. Bailey P, Chang DK, Forget M, Lucas FAS, Alvarez HA, Haymaker C, Chattopadhyay C, Kim S, Ekmekcioglu S, Grimm EA, Biankin AV, Hwu P, Maitra A, Roszik J. Exploiting the neoantigen landscape for immunotherapy of pancreatic ductal adenocarcinoma. Sci Rep. 2016. https://doi.org/10.1038/srep35848.
5. Riaz N, Morris L, Havel JJ, Makarov V, Desrichard A, Chan TA. The role of neoantigens in response to immune checkpoint blockade. Int Immunol. 2016. https://doi.org/10.1093/intimm/dxw019.
6. Lu Y, Robbins PF. Cancer immunotherapy targeting neoantigens. Semin Immunol. 2016;28(1):22–7.
7. Liu XS, Mardis ER. Applications of Immunogenomics to Cancer. Cell. 2017;168(4):600–12.
8. Hundal J, Carreno BM, Petti AA, Linette GP, Griffith OL, Mardis ER, Griffith M. pVAC-Seq: A genome-guided in silico approach to identifying tumor neoantigens. Genome Med. 2016;8(1).
9. Bjerregaard A, Nielsen M, Hadrup SR, Szallasi Z, Eklund AC. MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol Immunother. 2017;66(9):1123–30.
10. Tappeiner E, Finotello F, Charoentong P, Mayer C, Rieder DE, Trajanoski Z. TIminer: NGS data mining pipeline for cancer immunology and immunotherapy. Bioinformatics. 2017;33(19):3140–1.
11. Zhou Z, Lyu X, Wu J, Yang X, Wu S, Zhou J, Gu X, Su Z, Chen S. TSNAD: an integrated software for cancer somatic mutation and tumour-specific neoantigen detection. R Soc Open Sci. 2017;4(4):170050.
12. Lappalainen T, Sammeth M, Friedländer MR, PAC ‘t H, Monlong J, Rivas MA, Gonzàlez-Porta M, Kurbatova N, Griebel T, Ferreira PG, Barann M, Wieland T, Greger L, van Iterson M, Almlöf J, Ribeca P, Pulyakhina I, Esser D, Giger T, Tikhonov A, Sultan M, Bertier G, DG MA, Lek M, Lizano E, HPJ B, Padioleau I, Schwarzmayr T, Karlberg O, Ongen H, Kilpinen H, Beltran S, Gut M, Kahlem K, Amstislavskiy V, Stegle O, Pirinen M, Montgomery SB, Donelly P, MI MC, Flicek P, Strom TM, Consortium TG, Lehrach H, Schreiber S, Sudbrak R, Carracedo Á, Antonarakis SE, Häsler R, Syvänen AC, van Ommen GJ, Brazma A, Meitinger T, Rosenstiel P, Guigó R, Gut IG, Estivill X, Dermitzakis ET. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501(7468):506.
13. Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. The Cancer genome atlas pan-Cancer analysis project. Nat Genet. 2013;45(10):1113–20.
14. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491–8.
15. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, MA DP. From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Curr Protoc Bioinformatics. 2013;43:11.10.1–11.10.33.
16. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, Gabriel S, Meyerson M, Lander ES, Getz G. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31(3):213–9.
17. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6(2):80–92.
18. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, KM MG, Murphy MR, O'Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, Dicuccio M, Kitts P, Murphy TD, Pruitt KD. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2015;44(D1):D733–45.
19. Szolek A, Schubert B, Mohr C, Sturm M, Feldhahn M, Kohlbacher O. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics. 2014;30(23):3310–6.
20. Duan F, Duitama J, Sahar AIS, Cory MA, Steven AC, Arpita PP, Tatiana B, David M, John S, Alessandro S, Brian MB, Ion IM, Pramod KS. Genomic and bioinformatic profiling of mutational neoepitopes reveals rules to predict anticancer immunogenicity. J Exp Med. 2014;211(11):2231–48.
21. Richman LP, Vonderheide RH, Aj R. Neoantigen dissimilarity to the self-proteome predicts immunogenicity and response to immune checkpoint blockade. Cell Syst. 2019;9:375–82.
22. Vita R, Mahajan S, Overton JA, Dhanda SK, Martini S, Cantrell JR, Wheeler DK, Sette A, Peters B. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res. 2019;47:D339–D343.
23. Andreatta M, Nielsen M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics. 2015;32(4):511–7.
24. Nielsen M, Lundegaard C, Worning P, Lauemøller SL, Lamberth K, Buus S, Brunak S, Lund O. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 2003;12(5):1007–17.
25. Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J Immunol. 2017;199(9):3360–8.
26. Karosiene E, Lundegaard C, Lund O, Nielsen M. NetMHCcons: a consensus method for the major histocompatibility complex class I predictions. Immunogenetics. 2011;64(3):177–86.
27. Rasmussen M, Fenoy E, Harndahl M, Kristensen AB, Nielsen IK, Nielsen M, Buus S. Pan-specific prediction of peptide–MHC class I complex stability, a correlate of T cell immunogenicity. J Immunol. 2016;197(4):1517–24.
28. Zhang H, Lund O, Nielsen M. The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding. Bioinformatics. 2009;25(10):1293–9.
29. Peters B, Sette A. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics. 2005;6(1):132.
30. Kim Y, Sidney J, Pinilla C, Sette A, Peters B. Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior. BMC Bioinformatics. 2009;10(1):394.
31. O'Donnell TJ, Rubinsteyn A, Bonsack M, Riemer AB, Laserson U, Hammerbacher J. MHCflurry: Open-Source Class I MHC Binding Affinity Prediction. Cell Systems. 2018;7(1):129–132.e4.
32. Karosiene E, Rasmussen M, Blicher T, Lund O, Buus S, Nielsen M. NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes, HLA-DR, HLA-DP and HLA-DQ. Immunogenetics. 2013;65(10):711–24.
33. Nielsen M, Lund O. NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics. 2009;10(1):296.
34. Nielsen M, Lundegaard C, Lund O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics. 2007;8(1):238.
35. Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U, Braxenthaler M, Gallazzi F, Protti MP, Sinigaglia F, Hammer J. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol. 1999;17(6):555–61.
36. Bray N, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525–7.
37. Finotello F, Mayer C, Plattner C, Laschober G, Rieder D, Hackl H, Krogsdam A, Loncova Z, Posch W, Wilflingseder D, Sopper S, Ijsselsteijn M, Brouwer TP, Johnson D, Xu Y, Wang Y, Sanders ME, Estrada MV, Ericsson-Gonzalez P, Charoentong P, Balko J, NFDCC d M, Trajanoski Z. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med. 2019;11(1):34.
38. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68–74.
39. Piskol R, Ramaswami G, Li J. Reliable identification of genomic variants from RNA-Seq data. Am J Hum Genet. 2013;93(4):641–51.
40. Coudray A, Battenhouse AM, Bucher P, Iyer VR. Detection and benchmarking of somatic mutations in cancer genomes using RNA-seq data. PeerJ. 2018;6:e5362.
41. Vu HL, Rosenbaum S, Purwin TJ, Davies MA, Aplin AE. RAC1 P29S regulates PD-L1 expression in melanoma. Pigment Cell Melanoma Res. 2015;28(5):590–8.
42. Riaz N, Havel JJ, Kendall SM, Makarov V, Walsh LA, Desrichard A, Weinhold N, Chan TA. Recurrent SERPINB3 and SERPINB4 mutations in patients who respond to anti-CTLA4 immunotherapy. Nat Genet. 2016;48(11):1327–9.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.