Studying Evolutionary Patterns in Cancer Risk Genes with Computational Tools to Create Sequence Alignments

Conference Year

January 2021

Abstract

Background

An important aspect of cancer research is identifying inherited genetic variants that increase cancer susceptibility, then acting on that knowledge to improve cancer outcomes. The pathogenicity of genetic variants can be predicted computationally by aligning protein sequences from multiple species (protein multiple sequence alignments, PMSAs). Patterns of evolutionary conservation in these alignments help indicate which variants are important for human disease.

Objective

Use computational tools to assess PMSA quality and to investigate evolutionary patterns that help distinguish pathogenic from benign variants. We hypothesize that current alignment tools cannot be fully automated to produce good alignments at large scale.

Methods

We used several computational tools: the National Center for Biotechnology Information (NCBI) BLAST tool to gather sequences from multiple species, the NCBI ClinVar database to identify variants, Clustal Omega to create PMSAs, and PHYLIP ProtPars to determine evolutionary variation. We chose 32 genes associated with hereditary cancers, counted pathogenic variants, and measured substitutions per site (a measure of conservation), PMSA gaps, and insertions.

Results

Of the genes studied, 94% had small gaps (5-100 amino acids), 100% had small insertions, and 87.5% had large gaps and/or insertions. Most alignments need significant manual adjustment.
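
As an illustration of how gap sizes like those reported above could be tallied, the following is a minimal Python sketch, not the procedure used in this study. It assumes each row of a PMSA is available as a string with '-' marking gap positions, takes the 5-100 amino acid range for small gaps from the text, and treats any run longer than 100 as large, which is an assumption. The function names and the example row are hypothetical.

```python
import re
from collections import Counter

# Size range for "small" gaps, taken from the abstract (5-100 amino acids).
SMALL_MIN, SMALL_MAX = 5, 100  # runs longer than SMALL_MAX are assumed "large"


def gap_runs(aligned_seq):
    """Return the length of each consecutive run of '-' in one alignment row."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_seq)]


def classify_gaps(aligned_seq):
    """Count gap runs in one alignment row by size class."""
    counts = Counter()
    for n in gap_runs(aligned_seq):
        if n < SMALL_MIN:
            counts["below_threshold"] += 1
        elif n <= SMALL_MAX:
            counts["small"] += 1
        else:
            counts["large"] += 1
    return dict(counts)


# Hypothetical alignment row with gap runs of 7, 2, and 120 positions.
example_row = "MKT" + "-" * 7 + "LLV" + "--" + "AQR" + "-" * 120 + "GHW"
print(gap_runs(example_row))
print(classify_gaps(example_row))
```

A tally like this only flags where an alignment may need attention; deciding whether a gap reflects real evolutionary change or an alignment error still requires manual review.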

Conclusions

While existing automated programs are very helpful for creating PMSAs, the process remains labor-intensive and cannot be fully automated.

Primary Faculty Mentor Name

Marc Greenblatt

Graduate Student Mentors

Alexander Karabachev

Faculty/Staff Collaborators

Marc Greenblatt (Research Mentor), Alexander Karabachev (Medical Student Mentor)

Status

Undergraduate

Student College

College of Arts and Sciences

Second Student College

Patrick Leahy Honors College

Program/Major

Biological Science

Primary Research Category

Health Sciences

Secondary Research Category

Biological Sciences

Abstract only.
