Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts

Yingya Li, Bei Yu

Research output: Chapter in Book/Entry/Poem (Conference contribution)

Abstract

Segmenting scientific abstracts and full texts based on their rhetorical function is an essential task in text classification. Small rhetorical segments can be useful for fine-grained literature search, summarization, and comparison. Current efforts have focused on segmenting documents into general sections such as introduction, method, and conclusion, and much less on the roles of individual sentences within those segments. For example, not all sentences in the conclusion section describe research findings. In this work, we developed rule-based and machine learning methods and compared their performance in identifying the finding sentences in conclusion subsections of biomedical abstracts. To develop and evaluate the methods, 1100 conclusion subsections from studies with observational and randomized clinical trial designs, covering five common health topics, were sampled from PubMed. The rule-based method and the bag-of-words based machine learning method both achieved high accuracy. The better performance of the simple rule-based approach shows that although advanced machine learning approaches can capture the main patterns, human experts may still outperform them on such a specialized task.
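
To make the two families of methods named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. This record does not list the paper's rules or features, so the cue patterns, function names, and example sentences below are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical cue patterns for non-finding sentences (e.g. calls for more
# research or recommendations). The paper's actual rules are not given here.
NON_FINDING_CUES = [
    r"\bfurther (research|studies|study|investigation)\b",
    r"\b(is|are) (needed|warranted|required)\b",
    r"\bshould be\b",
]


def is_finding_rule_based(sentence: str) -> bool:
    """Label a sentence as a finding unless it matches a non-finding cue."""
    text = sentence.lower()
    return not any(re.search(pattern, text) for pattern in NON_FINDING_CUES)


def bag_of_words(sentence: str) -> Counter:
    """Return unordered token counts, the kind of input a bag-of-words
    classifier would consume (word order is discarded)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


# Invented example sentences, not taken from the paper's data set.
conclusion = [
    "Statin use was associated with a lower risk of stroke.",
    "Further studies are needed to confirm these results.",
]
labels = [is_finding_rule_based(s) for s in conclusion]
features = [bag_of_words(s) for s in conclusion]
```

In this sketch the rule-based labeler encodes expert knowledge directly as patterns, whereas the bag-of-words counts would be passed to a trained classifier that has to learn such cues from labeled sentences.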

Original language: English (US)
Title of host publication: Information in Contemporary Society - 14th International Conference, iConference 2019, Proceedings
Editors: Michelle H. Martin, Natalie Greene Taylor, Bonnie Nardi, Caitlin Christian-Lamb
Publisher: Springer Verlag
Pages: 679-689
Number of pages: 11
ISBN (Print): 9783030157418
State: Published - 2019
Event: 14th International Conference on Information in Contemporary Society, iConference 2019 - Washington, United States
Duration: Mar 31 2019 - Apr 3 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11420 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Information in Contemporary Society, iConference 2019
Country/Territory: United States
City: Washington
Period: 3/31/19 - 4/3/19

Keywords

  • Biomedicine
  • Machine learning
  • Rule-based approach
  • Text classification

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
