HClaimE: A tool for identifying health claims in health news headlines

Shi Yuan, Bei Yu

Research output: Contribution to journalArticle

3 Scopus citations

Abstract

This study tackles the problem of extracting health claims from health research news headlines, in order to carry out veracity check. A health claim can be formally defined as a triplet consisting of an independent variable (IV – namely, what is being manipulated), a dependent variable (DV – namely, what is being measured), and the relation between the two. In this study, we develop HClaimE, an information extraction tool for identifying health claims in news headlines. Unlike the existing open information extraction (OpenIE) systems that rely on verbs as relation indicators, HClaimE focuses on finding relations between nouns, and draws on the linguistic characteristics of news headlines. HClaimE uses a Naïve Bayes classifier that combines syntactic and lexical features for identifying IV and DV nouns, and recognizes relations between IV and DV through a rule-based method. We conducted an evaluation on a set of health news headlines from ScienceDaily.com, and the results show that HClaimE outperforms current OpenIE systems: the F-measures for identifying headlines without health claims is 0.60 and that for extracting IV-relation-DV is 0.69. Our study shows that nouns can provide more clues than verbs for identifying health claims in news headlines. Furthermore, it also shows that dependency relations and bag-of-words can distinguish IV-DV noun pairs from other noun pairs. In practice, HClaimE can be used as a helpful tool to identifying health claims in news headlines, which can then be further compared against authoritative health claims for veracity. Given the linguistic similarity between health claims and other causal claims, e.g., impacts of pollution on the environment, HClaimE may also be applicable for extracting claims in other domains.

Original languageEnglish (US)
Pages (from-to)1220-1233
Number of pages14
JournalInformation Processing and Management
Volume56
Issue number4
DOIs
StatePublished - Jul 2019

Keywords

  • Health claim
  • Information extraction
  • Information quality
  • Lexical feature
  • Natural language processing
  • Syntactic feature

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences

Fingerprint Dive into the research topics of 'HClaimE: A tool for identifying health claims in health news headlines'. Together they form a unique fingerprint.

  • Cite this