Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts

Yingya Li, Bei Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Segmenting scientific abstracts and full text by rhetorical function is an essential task in text classification. Small rhetorical segments can be useful for fine-grained literature search, summarization, and comparison. Current efforts have focused on segmenting documents into general sections such as introduction, method, and conclusion, and much less on the roles of individual sentences within those segments. For example, not all sentences in a conclusion section describe research findings. In this work, we developed rule-based and machine learning methods and compared their performance in identifying the finding sentences in conclusion subsections of biomedical abstracts. To develop and evaluate the methods, 1,100 conclusion subsections from studies with observational and randomized clinical trial designs, covering five common health topics, were sampled from PubMed. The rule-based method and the bag-of-words machine learning method both achieved high accuracy. The better performance of the simple rule-based approach shows that although advanced machine learning approaches can capture the main patterns, human experts may still outperform them on such a specialized task.
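The abstract does not list the actual rules or features, so the following is only a minimal sketch of the two kinds of approach it names: a cue-phrase rule for labeling a sentence as a finding, and a bag-of-words representation of the sort a machine learning classifier could consume. Every pattern, function name, and the example text are hypothetical illustrations, not the authors' method or data.

```python
import re
from collections import Counter

# Hypothetical cue patterns (assumptions for illustration only);
# the rules used in the paper are not given in the abstract.
NON_FINDING_CUES = [
    r"\bfurther (research|studies|study|work)\b",
    r"\b(is|are) (needed|warranted|required)\b",
    r"\bshould be\b",
]
FINDING_CUES = [
    r"\b(was|were|is|are) (significantly )?associated with\b",
    r"\b(reduced|increased|improved|decreased|lowered)\b",
    r"\b(did|does) not (reduce|increase|improve|differ)\b",
]


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_finding(sentence):
    """Rule-based label: non-finding cues veto, finding cues confirm."""
    s = sentence.lower()
    if any(re.search(p, s) for p in NON_FINDING_CUES):
        return False
    return any(re.search(p, s) for p in FINDING_CUES)


def bag_of_words(sentence):
    """Unordered lowercase token counts, usable as classifier features."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


# Invented example text, not taken from the PubMed sample.
conclusion = (
    "Statin use was associated with a lower risk of stroke. "
    "Further research is needed to confirm these results."
)
labels = [(s, is_finding(s)) for s in split_sentences(conclusion)]
```

In this sketch the non-finding cues are checked first, so a sentence calling for further research is rejected even if it also contains a result-like verb. A learned classifier would instead weigh the token counts from `bag_of_words` against labeled examples.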

Original language: English (US)
Title of host publication: Information in Contemporary Society - 14th International Conference, iConference 2019, Proceedings
Editors: Michelle H. Martin, Natalie Greene Taylor, Bonnie Nardi, Caitlin Christian-Lamb
Publisher: Springer Verlag
Pages: 679-689
Number of pages: 11
ISBN (Print): 9783030157418
DOI: https://doi.org/10.1007/978-3-030-15742-5_64
State: Published - Jan 1 2019
Event: 14th International Conference on Information in Contemporary Society, iConference 2019 - Washington, United States
Duration: Mar 31 2019 to Apr 3 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11420 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Information in Contemporary Society, iConference 2019
Country: United States
City: Washington
Period: 3/31/19 to 4/3/19

Fingerprint

Learning systems
Machine Learning
Randomized Clinical Trial
Health
Text Classification
Summarization
High Accuracy
Covering
Evaluate

Keywords

  • Biomedicine
  • Machine learning
  • Rule-based approach
  • Text classification

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Li, Y., & Yu, B. (2019). Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts. In M. H. Martin, N. G. Taylor, B. Nardi, & C. Christian-Lamb (Eds.), Information in Contemporary Society - 14th International Conference, iConference 2019, Proceedings (pp. 679-689). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11420 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-15742-5_64

@inproceedings{05ddccb1286646f8a961454f3ea95e5d,
title = "Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts",
abstract = "Segmenting scientific abstracts and full text by rhetorical function is an essential task in text classification. Small rhetorical segments can be useful for fine-grained literature search, summarization, and comparison. Current efforts have focused on segmenting documents into general sections such as introduction, method, and conclusion, and much less on the roles of individual sentences within those segments. For example, not all sentences in a conclusion section describe research findings. In this work, we developed rule-based and machine learning methods and compared their performance in identifying the finding sentences in conclusion subsections of biomedical abstracts. To develop and evaluate the methods, 1,100 conclusion subsections from studies with observational and randomized clinical trial designs, covering five common health topics, were sampled from PubMed. The rule-based method and the bag-of-words machine learning method both achieved high accuracy. The better performance of the simple rule-based approach shows that although advanced machine learning approaches can capture the main patterns, human experts may still outperform them on such a specialized task.",
keywords = "Biomedicine, Machine learning, Rule-based approach, Text classification",
author = "Yingya Li and Bei Yu",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-15742-5_64",
language = "English (US)",
isbn = "9783030157418",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "679--689",
editor = "Martin, {Michelle H.} and Taylor, {Natalie Greene} and Bonnie Nardi and Caitlin Christian-Lamb",
booktitle = "Information in Contemporary Society - 14th International Conference, iConference 2019, Proceedings",

}

TY - GEN

T1 - Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts

AU - Li, Yingya

AU - Yu, Bei

PY - 2019/1/1

Y1 - 2019/1/1

AB - Segmenting scientific abstracts and full text by rhetorical function is an essential task in text classification. Small rhetorical segments can be useful for fine-grained literature search, summarization, and comparison. Current efforts have focused on segmenting documents into general sections such as introduction, method, and conclusion, and much less on the roles of individual sentences within those segments. For example, not all sentences in a conclusion section describe research findings. In this work, we developed rule-based and machine learning methods and compared their performance in identifying the finding sentences in conclusion subsections of biomedical abstracts. To develop and evaluate the methods, 1,100 conclusion subsections from studies with observational and randomized clinical trial designs, covering five common health topics, were sampled from PubMed. The rule-based method and the bag-of-words machine learning method both achieved high accuracy. The better performance of the simple rule-based approach shows that although advanced machine learning approaches can capture the main patterns, human experts may still outperform them on such a specialized task.

KW - Biomedicine

KW - Machine learning

KW - Rule-based approach

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=85064043651&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064043651&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-15742-5_64

DO - 10.1007/978-3-030-15742-5_64

M3 - Conference contribution

AN - SCOPUS:85064043651

SN - 9783030157418

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 679

EP - 689

BT - Information in Contemporary Society - 14th International Conference, iConference 2019, Proceedings

A2 - Martin, Michelle H.

A2 - Taylor, Natalie Greene

A2 - Nardi, Bonnie

A2 - Christian-Lamb, Caitlin

PB - Springer Verlag

ER -