Abstract
This study tackles the problem of extracting health claims from health research news headlines, in order to carry out veracity check. A health claim can be formally defined as a triplet consisting of an independent variable (IV – namely, what is being manipulated), a dependent variable (DV – namely, what is being measured), and the relation between the two. In this study, we develop HClaimE, an information extraction tool for identifying health claims in news headlines. Unlike the existing open information extraction (OpenIE) systems that rely on verbs as relation indicators, HClaimE focuses on finding relations between nouns, and draws on the linguistic characteristics of news headlines. HClaimE uses a Naïve Bayes classifier that combines syntactic and lexical features for identifying IV and DV nouns, and recognizes relations between IV and DV through a rule-based method. We conducted an evaluation on a set of health news headlines from ScienceDaily.com, and the results show that HClaimE outperforms current OpenIE systems: the F-measures for identifying headlines without health claims is 0.60 and that for extracting IV-relation-DV is 0.69. Our study shows that nouns can provide more clues than verbs for identifying health claims in news headlines. Furthermore, it also shows that dependency relations and bag-of-words can distinguish IV-DV noun pairs from other noun pairs. In practice, HClaimE can be used as a helpful tool to identifying health claims in news headlines, which can then be further compared against authoritative health claims for veracity. Given the linguistic similarity between health claims and other causal claims, e.g., impacts of pollution on the environment, HClaimE may also be applicable for extracting claims in other domains.
Original language | English (US) |
---|---|
Pages (from-to) | 1220-1233 |
Number of pages | 14 |
Journal | Information Processing and Management |
Volume | 56 |
Issue number | 4 |
DOIs | |
State | Published - Jul 2019 |
Keywords
- Health claim
- Information extraction
- Information quality
- Lexical feature
- Natural language processing
- Syntactic feature
ASJC Scopus subject areas
- Information Systems
- Media Technology
- Computer Science Applications
- Management Science and Operations Research
- Library and Information Sciences