Analyzing a Data Science Online Practitioner Community: Trends and Implications for Data Science Project Management

Research output: Chapter in Book/Entry/Poem › Conference contribution

1 Scopus citation

Abstract

The overarching goal of this research was to understand what the Reddit data science online community discussed before, during, and after COVID-19. We used a publicly available Reddit API to harvest first-level post data from the r/datascience subreddit. We then performed manual annotation to explore the taxonomy of trends and themes discussed by practitioners in the Reddit data science community. Next, we augmented the manually annotated data using a BERT model with topic modeling. The key discussion themes, in order of frequency, were: Education, Jobs, Methods (of data science), Hardware and data collection, Data visualization, and Quality. The Quality theme includes discussions of bias, transparency, and fairness. Because Quality was the least frequent theme, a key finding was that there were very few discussions of data science project quality, especially of efforts to minimize the risk of machine learning bias. As discussions of bias are not yet common, data science teams should proactively identify and address questions and concerns that might arise in their projects, in particular by increasing the team's focus on potential bias and fairness.
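The data-collection step can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes first-level posts are read from Reddit's public listing JSON, where each submission is a child of kind `t3` carrying a `created_utc` Unix timestamp, and it buckets posts into before, during, and after COVID-19 windows. The boundaries `PANDEMIC_START` and `PANDEMIC_END`, the helper names, and the sample listing are all hypothetical; the abstract does not state how the periods were defined.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical period boundaries (assumption, not taken from the paper).
PANDEMIC_START = datetime(2020, 3, 1, tzinfo=timezone.utc)
PANDEMIC_END = datetime(2021, 12, 31, tzinfo=timezone.utc)


def covid_period(created_utc: float) -> str:
    """Map a post's Unix timestamp to 'before', 'during', or 'after'."""
    ts = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    if ts < PANDEMIC_START:
        return "before"
    if ts <= PANDEMIC_END:
        return "during"
    return "after"


def first_level_posts(listing: dict) -> list:
    """Keep only submissions (kind 't3') from a Reddit listing payload."""
    return [
        child["data"]
        for child in listing.get("data", {}).get("children", [])
        if child.get("kind") == "t3"
    ]


def posts_per_period(listing: dict) -> Counter:
    """Count first-level posts falling in each hypothetical COVID-19 period."""
    return Counter(covid_period(p["created_utc"]) for p in first_level_posts(listing))


# Hypothetical sample listing: three submissions and one comment ('t1').
SAMPLE_LISTING = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "a", "created_utc": 1546300800.0}},  # 2019-01-01
            {"kind": "t3", "data": {"title": "b", "created_utc": 1609459200.0}},  # 2021-01-01
            {"kind": "t1", "data": {"created_utc": 1609459200.0}},  # comment, skipped
            {"kind": "t3", "data": {"title": "c", "created_utc": 1672531200.0}},  # 2023-01-01
        ]
    },
}

if __name__ == "__main__":
    print(posts_per_period(SAMPLE_LISTING))
```

Filtering on `kind == "t3"` keeps the sketch to first-level posts, matching the abstract's scope; comments would need a separate request per thread.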

Original language: English (US)
Title of host publication: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Editors: Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2673-2681
Number of pages: 9
ISBN (Electronic): 9781665480451
State: Published - 2022
Event: 2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration: Dec 17 2022 to Dec 20 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022

Conference

Conference: 2022 IEEE International Conference on Big Data, Big Data 2022
Country/Territory: Japan
City: Osaka
Period: 12/17/22 to 12/20/22

Keywords

  • CoP
  • Data Science
  • Online Communities
  • Project Management
  • community of practice

ASJC Scopus subject areas

  • Modeling and Simulation
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Control and Optimization

