The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories

Conference Year

January 2022

Abstract

Communication surrounding open-source projects largely occurs outside the repository. Historically, large communities used collections of mailing lists to discuss their projects. Because software development and communication happen on different channels, studying open-source projects is complicated. Here, we combine and standardize the mailing lists of the Python community, as well as those of the Golang, Angular, and Node.js communities. We focus on the CPython repository and merge the technical layer (file collaborations between GitHub accounts) with the social layer (emails), identifying 33% of GitHub contributors in the mailing-list data. We then explore correlations between social messaging and the structure of the collaboration network.
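
The merging step described above can be illustrated with a minimal sketch. The abstract does not state how accounts were linked to email senders, so exact matching on normalized email addresses is an assumption made here for illustration only, as are the function names and the toy data, all of which are hypothetical. The sketch links GitHub logins to mailing-list senders and projects file edits into a weighted collaboration network.

```python
from collections import defaultdict
from itertools import combinations


def normalize(address):
    """Lowercase and trim an address so trivial variants compare equal."""
    return address.strip().lower()


def match_contributors(commit_authors, mailing_list_senders):
    """Return logins with at least one commit email seen as a list sender.

    commit_authors: dict mapping login -> set of commit email addresses
    mailing_list_senders: iterable of sender email addresses
    Assumption: identity is resolved by exact email match after normalization.
    """
    senders = {normalize(a) for a in mailing_list_senders}
    return {
        login
        for login, emails in commit_authors.items()
        if any(normalize(e) in senders for e in emails)
    }


def collaboration_edges(file_touches):
    """Project file -> logins into weighted co-editing edges between logins."""
    weights = defaultdict(int)
    for logins in file_touches.values():
        # Each pair of accounts that edited the same file gains one unit of weight.
        for a, b in combinations(sorted(logins), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy data, not taken from the OCEAN data set.
commit_authors = {
    "alice": {"Alice@example.org"},
    "bob": {"bob@example.org"},
    "carol": {"carol@example.org"},
}
senders = ["alice@example.org ", "dave@example.org"]
file_touches = {
    "Lib/os.py": {"alice", "bob"},
    "Lib/re.py": {"alice", "bob", "carol"},
}

matched = match_contributors(commit_authors, senders)
print(f"matched {len(matched)} of {len(commit_authors)} contributors")
print(collaboration_edges(file_touches))
```

Exact email matching is the simplest possible linkage and will miss contributors who use different addresses on each channel; name-based or fuzzy matching would be a natural extension, at the cost of false positives.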

Primary Faculty Mentor Name

Laurent Hébert-Dufresne

Faculty/Staff Collaborators

Jean-Gabriel Young, James Bagrow

Status

Graduate

Student College

College of Engineering and Mathematical Sciences

Program/Major

Computer Science

Primary Research Category

Arts & Humanities

Abstract only.

