TY - GEN
T1 - Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts
AU - Li, Yingya
AU - Yu, Bei
N1 - Publisher Copyright: © 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Segmenting scientific abstracts and full-text based on their rhetorical function is an essential task in text classification. Small rhetorical segments can be useful for fine-grained literature search, summarization, and comparison. Current effort has been focusing on segmenting documents into general sections such as introduction, method, and conclusion, and much less on the roles of individual sentences within the segments. For example, not all sentences in the conclusion section are describing research findings. In this work, we developed rule-based and machine learning methods and compared their performance in identifying the finding sentences in conclusion subsections of biomedical abstracts. 1100 conclusion subsections with observational and randomized clinical trials study designs covering five common health topics were sampled from PubMed to develop and evaluate the methods. The rule-based method and the bag-of-words based machine learning method both achieved high accuracy. The better performance by the simple rule-based approach shows that although advanced machine learning approaches could capture the main patterns, human expert may still outperform on such a specialized task.
KW - Biomedicine
KW - Machine learning
KW - Rule-based approach
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85064043651&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064043651&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-15742-5_64
DO - 10.1007/978-3-030-15742-5_64
M3 - Conference contribution
AN - SCOPUS:85064043651
SN - 9783030157418
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 679
EP - 689
BT - Information in Contemporary Society - 14th International Conference, iConference 2019, Proceedings
A2 - Martin, Michelle H.
A2 - Taylor, Natalie Greene
A2 - Nardi, Bonnie
A2 - Christian-Lamb, Caitlin
PB - Springer Verlag
T2 - 14th International Conference on Information in Contemporary Society, iConference 2019
Y2 - 31 March 2019 through 3 April 2019
ER -