Date of Completion


Document Type

Honors College Thesis


Microbiology and Molecular Genetics

First Advisor

Indra Neil Sarkar


Translational Bioinformatics, Comparative Genomics, Computational Biology, Bioinformatics, Alzheimer Disease, Phylogenomics


The characterization of complex diseases remains a great challenge for biomedical researchers due to the myriad interactions of genetic and environmental factors. Adaptation of phylogenomic techniques to increasingly available genomic data provides an evolutionary perspective that may elucidate important unknown features of complex diseases. Here an automated method is presented that leverages publicly available genomic data and phylogenomic techniques. The approach is tested with nine genes implicated in the development of Alzheimer Disease, a complex neurodegenerative syndrome.

The developed technique updates a previously described Perl script called “ASAP” and is implemented as a suite of Ruby scripts entitled “ASAP2.” It first compiles a list of sequence-similarity-based orthologues using PSI-BLAST and a recursive NCBI BLAST+ search strategy, then constructs maximum parsimony phylogenetic trees for each set of nucleotide and protein sequences, and finally calculates phylogenetic metrics (partitioned Bremer support values, combined branch scores, and Robinson-Foulds distance) to provide an empirical assessment of evolutionary conservation within a given genetic network.
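To illustrate one of the metrics named above, the following is a minimal Ruby sketch of a Robinson-Foulds distance between two trees. It is not taken from ASAP2. It assumes rooted trees written as nested arrays with string leaves, and it counts the clades found in one tree but not the other (the rooted, clade-based variant of the metric). The function names and the taxon names are hypothetical.

```ruby
require 'set'

# Illustrative sketch only (not ASAP2 code). A tree is a nested array whose
# leaves are taxon-name strings, e.g. [["human", "chimp"], "mouse"].

# Returns the sorted leaf names under `node` and records the leaf set of
# every internal node (a clade) in `clades`.
def collect_clades(node, clades)
  return [node] if node.is_a?(String)

  leaves = node.flat_map { |child| collect_clades(child, clades) }.sort
  clades << leaves
  leaves
end

# All clades except the root clade, which every tree on the same taxa shares.
def nontrivial_clades(tree)
  clades = Set.new
  all_leaves = collect_clades(tree, clades)
  clades.delete(all_leaves)
  clades
end

# Rooted Robinson-Foulds distance: the number of clades present in exactly
# one of the two trees.
def rf_distance(tree_a, tree_b)
  a = nontrivial_clades(tree_a)
  b = nontrivial_clades(tree_b)
  ((a - b) | (b - a)).size
end

# Hypothetical gene trees over the same four taxa.
gene_tree_1 = [[["human", "chimp"], "mouse"], "zebrafish"]
gene_tree_2 = [[["human", "mouse"], "chimp"], "zebrafish"]

puts rf_distance(gene_tree_1, gene_tree_2)
```

A distance of zero means the two trees contain the same clades. Larger values indicate greater topological disagreement between the two gene trees.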

This study demonstrates the potential for using automated simultaneous phylogenetic analysis to uncover previously unknown relationships among disease-associated genes that may not have been apparent using traditional, single-gene methods. Furthermore, the results provide the first integrated evolutionary history of an Alzheimer Disease gene network and identify potentially important co-evolutionary clustering around components of oxidative stress pathways.


The described software (ASAP2) is publicly accessible at the following GitHub repository:


Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

figure_legends-20140502.docx (84 kB)
Figure legends

figure1_asap2_workflow.eps (5822 kB)
Figure 1: ASAP2 Workflow

figure2_AD_Nucleotide_Trees.eps (10493 kB)
Figure 2: AD Nucleotide Trees

figure3_AD_Protein_Trees.eps (10335 kB)
Figure 3: AD Protein Trees

figure4_nuc_pbsup_annotated.eps (2803 kB)
Figure 4: Nucleotide PBS Tree

figure5_prot_pbsup_annotated.eps (2792 kB)
Figure 5: Protein PBS Tree

figure6a_rf_nuc_pairs.eps (2576 kB)
Figure 6a: AD RF Nucleotide Pairs

figure6b_rf_prot_pairs.eps (1843 kB)
Figure 6b: AD RF Protein Pairs

figure7a_mtdna_prot.eps (1836 kB)
Figure 7a: mtDNA RF Protein Pairs

figure7b_mtdna_nuc.eps (1972 kB)
Figure 7b: mtDNA RF Nucleotide Pairs

figure7c_both_prot.eps (2244 kB)
Figure 7c: AD+mtDNA RF Protein Pairs

figure7d_both_nuc.eps (2343 kB)
Figure 7d: AD+mtDNA RF Nucleotide Pairs