Date of Completion

2022

Document Type

Honors College Thesis

Department

Data Science and Complex Systems

Thesis Type

Honors College

First Advisor

James P. Bagrow

Second Advisor

Laurent Hébert-Dufresne

Keywords

Open Science, Traceability, Open Access, Software Engineering, Bipartite Network, Co-occurrence, GitHub, Citation Network, Code Artifacts, Reproducibility

Abstract

Reproducibility, the means by which published results are validated or refuted, is a foundation of science and a key principle of open science. Because the current open science paradigm is relatively new, its reproducibility, citation, and attribution practices warrant inspection. We extract over 60,000 links to GitHub repository code artifacts from paper texts in the Semantic Scholar Open Research Corpus. We examine these artifacts and infer that a majority of them involve a repository created directly by an author of the paper in which the link was found. We describe several properties of this set of links, including the degree distribution of linked papers, the frequency of links over time, and the bidirectionality of links between repositories and papers. We examine the co-occurrence of citations to papers and their associated repositories through the underlying network structure. Finally, we attempt to identify missing or deleted traces to code artifacts.
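
As a rough illustration of the link-extraction step described in the abstract, the following minimal Python sketch pulls GitHub repository references out of paper text and counts how many papers link to each repository (the repository side of the paper-repository bipartite network). The regular expression, the function names `extract_repos` and `repo_degrees`, and the input format are assumptions made for this example; they are not taken from the thesis, whose actual pipeline may differ.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not the thesis's own): matches
# github.com/<owner>/<repo> anywhere in the text, case-insensitively.
GITHUB_REPO = re.compile(
    r"github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9_.-]+)", re.IGNORECASE
)


def extract_repos(text):
    """Return the set of lowercased 'owner/repo' strings linked in text."""
    repos = set()
    for owner, repo in GITHUB_REPO.findall(text):
        repo = repo.rstrip(".")  # drop a sentence-ending period
        if repo.endswith(".git"):
            repo = repo[:-4]  # drop a clone-URL suffix
        if repo:
            repos.add(f"{owner}/{repo}".lower())
    return repos


def repo_degrees(papers):
    """Count, for each repository, how many papers link to it.

    papers: dict mapping a paper identifier to that paper's full text
    (an assumed input format for this sketch).
    """
    degrees = Counter()
    for text in papers.values():
        # Each paper contributes at most one edge per repository.
        degrees.update(extract_repos(text))
    return degrees
```

Because each paper's repositories are collected as a set, repeated mentions within one paper count as a single edge, so the counter's values are repository degrees rather than raw mention counts.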

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License
