The Origin and Value of Disagreement Among Data Labelers: A Case Study of Individual Differences in Hate Speech Annotation

Yisi Sang, Jeffrey Stanton

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Human-annotated data is the cornerstone of today’s artificial intelligence efforts, yet data labeling processes can be complicated and expensive, especially when human labelers disagree with each other. The current work practice is to use majority-voted labels to overrule the disagreement. However, in subjective data labeling tasks such as hate speech annotation, disagreement among individual labelers can be difficult to resolve. In this paper, we explored why such disagreements occur using a mixed-method approach – including interviews with experts, concept mapping exercises, and self-report items – to develop a multidimensional scale that captures how annotators label a hate speech corpus. We tested this scale with 170 annotators in a hate speech annotation task. Results showed that our scale can reveal facets of individual differences among annotators (e.g., age and personality) and how these facets relate to an annotator’s final label decision for an instance. We suggest that this work contributes to the understanding of how humans annotate data. The proposed scale can potentially improve the value of the currently discarded minority-vote labels.

Original language: English (US)
Title of host publication: Information for a Better World
Subtitle of host publication: Shaping the Global Future - 17th International Conference, iConference 2022, Proceedings
Editors: Malte Smits
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 425-444
Number of pages: 20
ISBN (Print): 9783030969561
State: Published - 2022
Event: 17th International Conference on Information for a Better World: Shaping the Global Future, iConference 2022 - Virtual, Online
Duration: Feb 28 2022 to Mar 4 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13192 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Information for a Better World: Shaping the Global Future, iConference 2022
City: Virtual, Online
Period: 2/28/22 to 3/4/22

Keywords

  • Content moderation
  • Data labeler
  • Disagreement
  • Hate speech
  • Label
  • Multidimensional scale

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
