Free-text retrieval is less effective than it might be because of its dependence on notions that evolved with controlled vocabulary representation and searching. The structure and nature of the discourse-level features of natural language text types are not incorporated. To address this problem, an exploratory study was conducted to determine whether information abstracts reporting on empirical work possess a predictable discourse-level structure and whether there are lexical clues that reveal this structure. The study had three phases. Phase I used four tasks to delineate the structure of empirical abstracts based on the internalized notions of 12 expert abstractors. Phase II consisted of a linguistic analysis of 276 empirical abstracts that suggested a linguistic model of an empirical abstract. Phase III tested this model with a two-stage validation procedure using 68 abstracts and four abstractors. Results indicate that expert abstractors do possess an internalized structure of empirical abstracts, whose components and relations were confirmed repeatedly over the four tasks. Substantively the same structure revealed by the experts was manifested in the sample of abstracts, with a relatively small set of recurring lexical clues revealing the presence and nature of the text components. Abstractors validated the linguistic model at an average level of 86%. Results strongly support the presence of a detectable structure in the text type of empirical abstracts. Such a structure may be of use in a variety of text-based information processing systems. The techniques developed for analyzing natural language texts to provide more useful representations of their semantic content offer potential for application to other types of natural language texts.
ASJC Scopus subject areas
- Information Systems
- Media Technology
- Computer Science Applications
- Management Science and Operations Research
- Library and Information Sciences