Date of Completion
2022
Document Type
Honors College Thesis
Department
Data Science and Complex Systems
Thesis Type
Honors College
First Advisor
James P. Bagrow
Second Advisor
Laurent Hébert-Dufresne
Keywords
Open Science, Traceability, Open Access, Software Engineering, Bipartite Network, Co-occurrence, GitHub, Citation Network, Code Artifacts, Reproducibility
Abstract
Reproducibility is the foundation on which published results are validated or refuted, and it is a key principle of open science. Because the current open science paradigm is relatively new, its reproducibility and its citation and attribution practices warrant inspection. We extract over 60,000 links to GitHub repository code artifacts from paper texts in the Semantic Scholar Open Research Corpus. Examining these artifacts, we extrapolate that a majority of them point to a repository created directly by an author of the paper in which the link appears. We describe several properties of this set of links, including the degree distribution of linked papers, the frequency of links over time, and the bidirectionality of the link, that is, whether the repository also links back to the paper. We examine the co-occurrence of citations to papers and to their associated repositories through the underlying network structure. Finally, we attempt to shed light on missing or deleted traces of code artifacts.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
Recommended Citation
Friedrichsen, Alex P.; Bagrow, James P. Ph.D.; and Hébert-Dufresne, Laurent Ph.D., "Exploring code dissemination in open science: traces of GitHub projects in the literature" (2022). UVM Patrick Leahy Honors College Senior Theses. 464.
https://scholarworks.uvm.edu/hcoltheses/464