Studying Evolutionary Patterns in Cancer Risk Genes with Computational Tools to Create Sequence Alignments
Conference Year
January 2021
Abstract
Background
An important aspect of cancer research is identifying inherited genetic variants that increase cancer susceptibility, then acting on that knowledge to improve cancer outcomes. The pathogenicity of a genetic variant can be predicted computationally by aligning the protein sequences of multiple species (protein multiple sequence alignments, PMSAs). Patterns of evolutionary conservation help indicate which variants are important for human disease.
Objective
Use computational tools to assess PMSA quality and investigate evolutionary patterns that help distinguish pathogenic from benign variants. We hypothesize that current alignment tools cannot produce good alignments at large scale without manual intervention.
Methods
We used several computational tools: the National Center for Biotechnology Information (NCBI) BLAST tool to gather protein sequences from multiple species, NCBI's ClinVar database to identify variants, Clustal Omega to create PMSAs, and PHYLIP Protpars to determine evolutionary variation. We chose 32 genes associated with hereditary cancers, counted pathogenic variants, and measured substitutions per site (a measure of conservation), PMSA gaps, and insertions.
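As an illustration of how gaps and insertions in a PMSA might be tallied, the following is a minimal sketch and not the pipeline used in this study. It assumes the alignment is already loaded as a dictionary of equal-length aligned strings, treats the human sequence as the reference, and uses toy sequences. The function names, the "minor" label for runs shorter than 5 residues, and the cutoff of more than 100 residues for "large" are assumptions made for the example; only the 5-100 amino acid range for "small" comes from the abstract.

```python
import re

# Size range the abstract calls "small"; the other labels are assumptions.
SMALL_MIN, SMALL_MAX = 5, 100


def gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' in one aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def classify(length):
    """Label a run by size (hypothetical labels outside the 5-100 range)."""
    if length < SMALL_MIN:
        return "minor"
    if length <= SMALL_MAX:
        return "small"
    return "large"


def summarize(alignment, reference):
    """Tally insertions and gaps relative to the reference sequence.

    Insertions: runs of '-' in the reference row (other species carry
    extra residues there), reported in alignment column coordinates.
    Gaps: runs of '-' in another species at columns where the reference
    has a residue, reported in ungapped reference coordinates (0-based).
    """
    ref = alignment[reference]
    insertions = [(s, n, classify(n)) for s, n in gap_runs(ref)]
    gaps = {}
    for name, seq in alignment.items():
        if name == reference:
            continue
        # Keep only the columns where the reference has a residue.
        projected = "".join(c for c, r in zip(seq, ref) if r != "-")
        gaps[name] = [(s, n, classify(n)) for s, n in gap_runs(projected)]
    return {"insertions": insertions, "gaps": gaps}


# Toy alignment (invented sequences, for illustration only).
alignment = {
    "human":     "MKTAAAAAA------LLG",
    "mouse":     "MKT------PPPPPPLLG",
    "zebrafish": "MKTAAAAAAPPPPPPLLG",
}
summary = summarize(alignment, "human")
print(summary)
```

In this toy case the human row has one 6-column run of dashes (an insertion in the other species) and the mouse row has one 6-residue gap relative to human. Counts like these only flag candidate regions; deciding whether a run reflects real biology or an alignment error still requires manual inspection.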
Results
Of the 32 genes, 94% had small gaps (5-100 amino acids), 100% had small insertions, and 87.5% had large gaps and/or insertions. Most alignments needed significant manual adjustment.
Conclusions
Although existing automated programs are very helpful for building PMSAs, the process remains labor-intensive and cannot be fully automated.
Primary Faculty Mentor Name
Marc Greenblatt
Graduate Student Mentors
Alexander Karabachev
Faculty/Staff Collaborators
Marc Greenblatt (Research Mentor), Alexander Karabachev (Medical Student Mentor)
Status
Undergraduate
Student College
College of Arts and Sciences
Second Student College
Patrick Leahy Honors College
Program/Major
Biological Science
Primary Research Category
Health Sciences
Secondary Research Category
Biological Sciences