Improved document representation for classification tasks for the intelligence community

Ozgur Yilmazel, Svetlana Symonenko, Niranjan Balasubramanian, Elizabeth D. Liddy

Research output: Contribution to conference › Paper

1 Scopus citation

Abstract

Research within a larger, multi-faceted risk assessment project for the Intelligence Community (IC) combines Natural Language Processing (NLP) and Machine Learning techniques to detect potentially malicious shifts in the semantic content of information either accessed or produced by insiders within an organization. Our hypothesis is that the use of fewer, more discriminative linguistic features can outperform the traditional bag-of-words (BOW) representation in classification tasks. Experiments using the standard Support Vector Machine algorithm and the LibSVM algorithm compared the BOW representation and two NLP representations. Classification results on NLP-based document representation vectors achieved greater precision and recall using forty-nine times fewer features than the BOW representation. The NLP-based representations improved classification performance by producing a lower dimensional but more linearly separable feature space that modeled the problem domain more accurately. Results demonstrate that document representation using sophisticated NLP-extracted features improved text classification effectiveness and efficiency with the SVM and LibSVM algorithms.
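The contrast between a bag-of-words vector and a smaller, feature-based vector can be illustrated with a minimal Python sketch. The toy documents, the `CONCEPTS` mapping, and the function names are hypothetical and chosen only for illustration; they are not the paper's data or feature set, and no classifier is trained here.

```python
from collections import Counter

# Toy documents; illustrative only, not data from the paper.
DOCS = [
    "the analyst accessed reports on regional elections",
    "the analyst requested reports on missile shipments",
]

# Hypothetical mapping from surface words to coarse concept labels,
# standing in for NLP-extracted features (an assumption, not the
# feature set used in the paper).
CONCEPTS = {
    "elections": "POLITICS",
    "missile": "WEAPONS",
    "shipments": "WEAPONS",
}


def bow_vectors(docs):
    """Return (vocabulary, vectors) with one count per distinct word."""
    vocab = sorted({w for d in docs for w in d.split()})
    vectors = []
    for d in docs:
        counts = Counter(d.split())
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def concept_vectors(docs, concepts):
    """Return (features, vectors) with one count per concept label."""
    features = sorted(set(concepts.values()))
    vectors = []
    for d in docs:
        counts = Counter(concepts[w] for w in d.split() if w in concepts)
        vectors.append([counts[f] for f in features])
    return features, vectors


bow_vocab, bow = bow_vectors(DOCS)
feat_names, feats = concept_vectors(DOCS, CONCEPTS)

# Dimensionality of each representation for the same two documents.
print(len(bow_vocab), len(feat_names))  # prints: 10 2
```

In this toy case the two documents share most of their bag-of-words dimensions, while the two concept dimensions separate them directly, which is the kind of effect the abstract attributes to a lower-dimensional feature space.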

Original language: English (US)
Pages: 76-82
Number of pages: 7
State: Published - Dec 1 2005
Event: 2005 AAAI Spring Symposium - Stanford, CA, United States
Duration: Mar 21 2005 to Mar 23 2005

Other

Other: 2005 AAAI Spring Symposium
Country: United States
City: Stanford, CA
Period: 3/21/05 to 3/23/05

ASJC Scopus subject areas

  • Engineering(all)

  • Cite this

    Yilmazel, O., Symonenko, S., Balasubramanian, N., & Liddy, E. D. (2005). Improved document representation for classification tasks for the intelligence community. 76-82. Paper presented at 2005 AAAI Spring Symposium, Stanford, CA, United States.