This study investigated two hypotheses concerning the use of anaphors in information retrieval. The first hypothesis, that anaphors tend to refer to integral rather than peripheral concepts, was well supported: subject experts examined two samples of documents, one in psychology and the other in computer science, and judged the centrality of the phrases that were referred to anaphorically. The second hypothesis, that different term weighting schemes are affected differently by anaphoric resolution, was also well supported. It was found that schemes that incorporate document length into their calculations produce much smaller increases in term weights for terms occurring in anaphoric resolutions than schemes that do not consider document length. Further analysis revealed that increases in query term occurrences after anaphor resolution do not help to discriminate between relevant and nonrelevant documents. It is concluded that although anaphoric resolution has potential for better representing the "aboutness" of a document, it may be more valuable for representing the relationships among concepts in the document than for representing the concepts themselves.
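
The document-length finding can be illustrated with a minimal sketch. It is not the study's method: the abstract does not name the weighting schemes that were compared, so the two functions below (`raw_tf` and `length_normalized_tf`) are hypothetical stand-ins, and the counts are invented for illustration. The sketch also assumes that resolving anaphors lengthens the document, because each anaphor is replaced by its full referent phrase.

```python
def raw_tf(count, doc_len):
    """Hypothetical scheme that ignores document length: weight = raw count."""
    return float(count)


def length_normalized_tf(count, doc_len):
    """Hypothetical scheme that divides the count by the document length."""
    return count / doc_len


def relative_increase(weight_fn, before, after):
    """Fractional change in a term's weight between two (count, doc_len) states."""
    w_before = weight_fn(*before)
    w_after = weight_fn(*after)
    return (w_after - w_before) / w_before


# Invented numbers: a term occurs 4 times in a 200-token document.
# Assumed effect of resolution: 2 more occurrences of this term, and
# 20 more tokens overall (referent phrases substituted for anaphors).
before = (4, 200)
after = (6, 220)

raw_gain = relative_increase(raw_tf, before, after)
norm_gain = relative_increase(length_normalized_tf, before, after)

print(f"raw tf increase:               {raw_gain:.1%}")
print(f"length-normalized tf increase: {norm_gain:.1%}")
```

Under these assumed numbers the raw count grows by half, while the length-normalized weight grows by a little over a third, because the added referent phrases enlarge the denominator as well as the numerator.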
ASJC Scopus subject areas
- Information Systems
- Media Technology
- Computer Science Applications
- Management Science and Operations Research
- Library and Information Sciences