Degree Name

Master of Science (MS)


Complex Systems and Data Science

First Advisor

Laurent Hébert-Dufresne

Second Advisor

E. Ross Colgate


Diarrhea remains a leading cause of childhood morbidity and mortality, especially in children under 5 years of age. Enteric pathogen co-infection has been found to cause increased diarrheal severity, and some pathogens may work in tandem to enhance infection. Identifying patterns of pathogen co-infection and their impact on diarrhea is important for informing intervention strategies that improve child health. In this thesis, I use data from the MAL-ED and PROVIDE birth cohort studies in Bangladesh to build bipartite networks that link stool samples to the pathogens detected in them. I randomly rewire these networks while preserving each node's number of connections (its degree), which randomizes the data while controlling for pathogen prevalence and for the distribution of pathogens per sample. Among the top 12 pathogen pairs ranked by their discordance from this null ensemble, bacteria-bacteria pairs often appear more frequently than expected, while bacteria-virus pairs appear less often than expected at random. Campylobacter species and E. coli groups (ETEC and EPEC) co-infect much more often than expected and in a non-random manner, while rotavirus-associated co-infections vary with infant vaccination status. Individuals who experience these co-infections have, on average, more days of diarrhea than those who had neither pathogen, only one of the two, or both but not concurrently.

To incorporate information from the environment, I use a joint species distribution model (JSDM) that captures both potential biotic relationships between pathogens and their shared response to the environment. I fit two separate models, partitioned by the participant's age at stool collection. I select a subset of environmental metrics from the full multi-country MAL-ED data set using forward stepwise regression, with 10-fold cross-validation for model selection.
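The null-model step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's code: it assumes the bipartite network is stored as a list of (sample, pathogen) edges, builds the null ensemble with degree-preserving double-edge swaps (one common way to sample a bipartite configuration model without duplicate edges), and uses a z-score as a stand-in for the discordance measure, whose exact definition is not given here. The function names, swap count, and ensemble size are hypothetical choices.

```python
import itertools
import random
from collections import Counter


def cooccurrence_counts(edges):
    """Count, for each pathogen pair, the number of samples containing both."""
    by_sample = {}
    for sample, pathogen in edges:
        by_sample.setdefault(sample, set()).add(pathogen)
    counts = Counter()
    for pathogens in by_sample.values():
        for pair in itertools.combinations(sorted(pathogens), 2):
            counts[pair] += 1
    return counts


def rewire(edges, n_swaps, rng):
    """Degree-preserving double-edge swaps: (s1,p1),(s2,p2) -> (s1,p2),(s2,p1).

    Every sample keeps its number of pathogens and every pathogen keeps its
    prevalence; swaps that would duplicate an existing edge are rejected.
    """
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (s1, p1), (s2, p2) = edges[i], edges[j]
        if s1 == s2 or p1 == p2:
            continue  # swap would leave the network unchanged
        new1, new2 = (s1, p2), (s2, p1)
        if new1 in edge_set or new2 in edge_set:
            continue  # swap would create a duplicate sample-pathogen edge
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def null_zscores(edges, n_null=200, swaps_per_edge=10, seed=0):
    """Z-score of each observed pair count against the rewired null ensemble."""
    rng = random.Random(seed)
    observed = cooccurrence_counts(edges)
    pathogens = sorted({p for _, p in edges})
    pairs = list(itertools.combinations(pathogens, 2))
    null = {pair: [] for pair in pairs}
    for _ in range(n_null):
        counts = cooccurrence_counts(rewire(edges, swaps_per_edge * len(edges), rng))
        for pair in pairs:
            null[pair].append(counts[pair])
    z = {}
    for pair in pairs:
        vals = null[pair]
        mean = sum(vals) / len(vals)
        sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
        z[pair] = (observed[pair] - mean) / sd if sd > 0 else 0.0
    return z
```

Sorting the output of `null_zscores` by absolute value gives a ranking of pairs analogous to the top-12 list described above: positive scores mark pairs that co-occur more often than their prevalences alone would suggest, negative scores mark pairs that co-occur less often.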
I use a Gibbs sampler to generate posterior distributions of the regression coefficients, which describe each pathogen's response to the environment, and of the residual correlations, which represent potential biotic relationships. I find that correlations due to the environment are stronger than the residual correlations between pathogens. For some pathogen pairs, the posterior predictive distribution approximates their co-occurrence well using environmental information alone, while other pathogens require both environmental predictors and biotic interactions. The work presented here is a two-step pipeline: first, using the configuration model to identify pathogens that co-occur more or less often than expected from their prevalence alone; second, characterizing those co-occurrences by their response to the environment. Altogether, this work identifies pathogen pairs that might become targets for clinical or environmental interventions, and contributes to our understanding of the network-level and ecological mechanisms and underlying structure of enteric pathogen co-infection.
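As an illustration of how a Gibbs sampler yields posterior draws of a pathogen's environmental regression coefficients, the sketch below implements the Albert-Chib data-augmentation sampler for a single binary response with a probit link. It is a simplified, hypothetical building block: it assumes a probit formulation and a Gaussian prior with an arbitrary variance, and it omits the residual correlation matrix that a multivariate JSDM would also sample. The function name `probit_gibbs` and all data below are illustrative only.

```python
import numpy as np
from scipy.stats import truncnorm


def probit_gibbs(X, y, n_iter=2000, burn_in=500, prior_var=10.0, seed=0):
    """Albert-Chib Gibbs sampler for Bayesian probit regression.

    X: (n, p) design matrix of environmental covariates (include a column of
       ones for an intercept). y: (n,) presence/absence of one pathogen.
    Assumed prior: beta ~ N(0, prior_var * I).
    Returns posterior draws of beta with shape (n_iter - burn_in, p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # The conditional posterior covariance of beta is the same every iteration.
    V = np.linalg.inv(X.T @ X + np.eye(p) / prior_var)
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = []
    for it in range(n_iter):
        mu = X @ beta
        # Latent z_i ~ N(mu_i, 1), truncated to z > 0 if y_i = 1 and z <= 0 otherwise.
        # truncnorm expects bounds standardized as (bound - loc) / scale.
        lower = np.where(y == 1, -mu, -np.inf)
        upper = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(lower, upper, loc=mu, scale=1.0, random_state=rng)
        # Conjugate Gaussian update: beta | z ~ N(V X'z, V).
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        if it >= burn_in:
            draws.append(beta.copy())
    return np.array(draws)
```

With synthetic data, the posterior mean of the draws should land near the coefficients used to simulate presence/absence; in a real JSDM, the analogous draws for all pathogens, together with the sampled residual correlations, would feed the posterior predictive comparisons described above.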



Number of Pages

153 p.

Included in

Microbiology Commons