Date of Completion


Document Type

Honors College Thesis



Thesis Type

Honors College

First Advisor

Jean Gabriel Young

Second Advisor

Bernard Cole


Bayesian, logistic regression, normal prior, double exponential prior, regularized horseshoe prior, FIDDLE, C. diff.


Healthcare-associated Clostridioides difficile (C. diff.) infections are among the most common healthcare-associated infections in the U.S., leading to thousands of deaths per year. Machine learning algorithms have shown some ability to predict who is most vulnerable to C. diff. infection using electronic health records obtained soon after admission, but their predictive capability remains insufficient. We extracted data from the electronic medical records in the MIMIC-III Clinical Database, which contains data from the Beth Israel Deaconess Medical Center collected between 2001 and 2012, resulting in very large predictor matrices. We aimed to predict which patients would receive a positive test for C. diff. using a Bayesian logistic regression model. We examined three priors (normal, double exponential, and regularized horseshoe) to understand how prior choice influenced predictive capability and the size of the coefficients. We used cross-validation to test the predictive capability of each prior and compared the resulting models using ROC and PR curves. Our results show that, of the three priors, the regularized horseshoe prior achieves the highest prediction accuracy.
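As a rough illustration of the three priors named in the abstract, the sketch below writes out their log densities alongside the Bernoulli log-likelihood of a logistic regression. It is not the thesis's implementation, which is not shown in this record. The function names, the default scales, and the `slab_scale` parameter are assumptions made for illustration, and the regularized horseshoe is shown only conditionally on given local scales `lam` and a global scale `tau` (the full hierarchical prior would also place priors on those scales).

```python
import numpy as np


def log_normal_prior(beta, scale=1.0):
    """Log density of independent N(0, scale^2) priors on each coefficient."""
    beta = np.asarray(beta, dtype=float)
    scale = np.asarray(scale, dtype=float)
    return float(np.sum(-0.5 * (beta / scale) ** 2
                        - np.log(scale) - 0.5 * np.log(2.0 * np.pi)))


def log_double_exponential_prior(beta, scale=1.0):
    """Log density of independent double exponential (Laplace) priors."""
    beta = np.asarray(beta, dtype=float)
    return float(np.sum(-np.abs(beta) / scale - np.log(2.0 * scale)))


def log_regularized_horseshoe_prior(beta, lam, tau, slab_scale=2.0):
    """Log density of beta given local scales lam and global scale tau.

    The slab (slab_scale, an assumed value) caps the effective scale of
    large coefficients, while small lam * tau shrinks a coefficient to zero.
    """
    lam = np.asarray(lam, dtype=float)
    lam_tilde_sq = (slab_scale ** 2 * lam ** 2) / (slab_scale ** 2 + tau ** 2 * lam ** 2)
    return log_normal_prior(beta, scale=tau * np.sqrt(lam_tilde_sq))


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of a logistic regression with logits X @ beta."""
    logits = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    # log(1 + exp(logits)) computed stably with logaddexp
    return float(np.sum(np.asarray(y) * logits - np.logaddexp(0.0, logits)))


# Tiny synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)
beta = np.zeros(3)

# Unnormalized log posterior under the normal prior: likelihood plus prior.
log_post_normal = log_likelihood(beta, X, y) + log_normal_prior(beta)
```

Swapping `log_normal_prior` for either of the other two functions gives the corresponding log posterior, which is the only place the three models differ.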

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.