Date of Completion
Honors College Thesis
Data Science and Complex Systems
James P. Bagrow
Open Science, Traceability, Open Access, Software Engineering, Bipartite Network, Co-occurrence, GitHub, Citation Network, Code Artifacts, Reproducibility
Reproducibility, the means by which published results are validated or refuted, is a foundation of published science and a key principle of open science. The relative novelty of the current open science paradigm demands inspection of its reproducibility and of its citation and attribution practices. We extract over 60,000 links to GitHub repository code artifacts from paper texts in the Semantic Scholar Open Research Corpus. We examine these artifacts and extrapolate that a majority of them involve a repository created directly by an author of the paper in which the link was found. We describe several qualities of this set of links, including the degree distribution of linked papers, the frequency of links over time, and the bidirectionality of links, that is, whether a linked repository also points back to the paper. We examine the co-occurrence of citations to papers and their associated repositories through the underlying network structure. Finally, we attempt to elucidate the presence of missing or deleted traces to code artifacts.
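As an illustration only, the link extraction and degree distribution mentioned in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the method used in the thesis: the regular expression, the normalization rules, and the function names (`extract_repos`, `build_edges`, `repo_degree_distribution`) are all hypothetical, and the input is assumed to be a mapping from paper identifiers to plain text.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not the thesis's): github.com/<owner>/<repo>
GITHUB_REPO = re.compile(
    r"(?:https?://)?(?:www\.)?github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)",
    re.IGNORECASE,
)


def extract_repos(text):
    """Return the set of normalized 'owner/repo' strings found in text."""
    repos = set()
    for owner, repo in GITHUB_REPO.findall(text):
        repo = repo.rstrip(".")  # drop trailing sentence punctuation
        if repo.lower().endswith(".git"):
            repo = repo[:-4]  # treat clone URLs as the same repository
        if repo:
            repos.add(f"{owner}/{repo}".lower())
    return repos


def build_edges(papers):
    """Build bipartite (paper_id, repo) edges from a dict of paper_id -> text."""
    return {
        (paper_id, repo)
        for paper_id, text in papers.items()
        for repo in extract_repos(text)
    }


def repo_degree_distribution(edges):
    """Map degree k to the number of repositories linked from exactly k papers."""
    degree = Counter(repo for _, repo in edges)
    return Counter(degree.values())


if __name__ == "__main__":
    # Toy input, invented for illustration; not data from the corpus.
    papers = {
        "p1": "Code is available at https://github.com/Alice/tool.",
        "p2": "We build on github.com/alice/tool.git and https://github.com/bob/lib here.",
        "p3": "This paper links to no repository.",
    }
    edges = build_edges(papers)
    print(sorted(edges))
    print(dict(repo_degree_distribution(edges)))
```

Lowercasing and stripping the `.git` suffix are design choices made so that differently written links to one repository collapse into a single node; a real pipeline would likely need more careful handling of URLs broken across lines or pointing to files inside a repository.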
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
Friedrichsen, Alex P.; Bagrow, James P. Ph.D; and Hébert-Dufresne, Laurent Ph.D, "Exploring code dissemination in open science: traces of GitHub projects in the literature" (2022). UVM Honors College Senior Theses. 464.
Available for download on Sunday, May 19, 2024