MetaExtract: An NLP system to automatically assign metadata

Ozgur Yilmazel, Christina M. Finneran, Elizabeth D. Liddy

Research output: Chapter in Book/Entry/Poem > Conference contribution

36 Scopus citations

Abstract

We have developed MetaExtract, a system to automatically assign Dublin Core + GEM metadata using extraction techniques from our natural language processing research. MetaExtract comprises three distinct processes: the eQuery and HTML-based Extraction modules and a Keyword Generator module. We conducted a Web-based survey to have users evaluate each metadata element's quality. Only two of the elements, Title and Keyword, were shown to be significantly different, with the manual quality slightly higher. The remaining elements for which we had enough data to test were shown not to be significantly different: Description, Grade, Duration, Essential Resources, Pedagogy-Teaching Method, and Pedagogy-Group.

Original language: English (US)
Title of host publication: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004
Editors: H. Chen, M. Christel, E.P. Lim
Pages: 241-242
Number of pages: 2
State: Published - 2004
Event: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004 - Tucson, AZ, United States
Duration: Jun 7 2004 to Jun 11 2004

Publication series

Name: Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004

Other

Other: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004
Country/Territory: United States
City: Tucson, AZ
Period: 6/7/04 to 6/11/04

Keywords

  • Design
  • Measurement

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
