Societal Computing Thesis Proposal
- Remote Access - Zoom
- Virtual Presentation - ET
- HUILIAN SOPHIE QIU
- Ph.D. Student
- Ph.D. Program in Societal Computing
- Institute for Software Research, Carnegie Mellon University
Enhancing Diversity and Inclusion in Open-Source Software Communities
Open-source software (OSS) today is ubiquitous and indispensable, supporting applications in virtually every domain. Sustaining this digital infrastructure is therefore of utmost societal importance. One of the significant challenges to OSS sustainability is its low gender diversity. The OSS community is well known to be heavily skewed toward men, and studies have found that an environment with low gender diversity is non-inclusive to people who are not men. Women are one of the under-represented groups, making up at most 10% of the OSS population. Several studies have demonstrated that women face more discrimination; for example, in some ecosystems, women have lower code acceptance rates, experience longer code review delays, and face doubts about their skills and abilities. The low diversity and non-inclusive culture can lead to three major challenges. First, it limits the contributor pool, which harms OSS sustainability because OSS projects need a constant supply of effort for development and maintenance. Second, it impedes project success because evidence shows that more gender-diverse teams are more productive and perform better. Third, it affects gender representation and equity, so that not all contributors can enjoy the benefits of participating in OSS, such as finding a job.
While studies show the presence of discrimination, relatively little is known about why it happens and what might be an effective intervention. This thesis includes a series of mixed-methods empirical studies that aim to explain the low representation of women, among other minority groups. Because OSS development is a socio-technical activity, I use theories from the social sciences and humanities, such as sociology, economics, and linguistics, to derive hypotheses and to explain and contextualize results. Using the results from these studies as a foundation, I also go one step further by designing, prototyping, testing, and piloting interventions.
This thesis consists of four studies that address the diversity and inclusion problem at different stages of a typical OSS contributor's career trajectory. The first two studies focus on bringing more women into OSS. While there are many all-female code camps for young female students, the first study focused on a code camp designed for adult women who wanted to get a taste of programming; the camp taught them computer programming and introduced them to OSS. We evaluated its effect on participants' perceived programming ability and their willingness to continue studying computer programming. The second study aimed to better understand how a newcomer could choose more suitable first projects. Informed by signaling theory, borrowed from economics and biology, we studied how signals, i.e., visible cues, on GitHub can be used to determine whether a project is suitable for new contributors.
Once newcomers become contributors, the more pressing problem is retention. Building on social capital theory, we built a survival model to test the role of social capital in an OSS contributor's prolonged participation. Since communication is essential to online collaboration, we also study contributor retention using text analytics. Motivated by linguistic theories, such as the politeness framework, I propose to build machine learning models to detect negative interactions during the code review process. Finally, I propose interventions that can enhance diversity and inclusion in OSS.
Bogdan Vasilescu (Chair)
Emerson Murphy-Hill (Google Research)
Zoom Participation. See announcement.