Abstract
In this paper, we describe our experience with collecting and creating an annotated corpus of multi-party online conversations in a chat-room environment. This effort is part of a larger project to develop computational models of social phenomena such as agenda control, influence, and leadership in online interactions. Such models will help capture the dialogue dynamics that are essential for developing, among other things, realistic human-machine dialogue systems, including autonomous virtual chat agents. We describe the data collection method used and the characteristics of the initial dataset of English chat. We have devised a multi-tiered collection process in which the subjects start from simple, free-flowing conversations and progress towards more complex and structured interactions. We report on the first two stages of this process, which were recently completed. The third, large-scale collection effort is currently being conducted. All English dialogue has been annotated at four levels: communication links, dialogue acts, local topics, and meso-topics.
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 |
Publisher | European Language Resources Association (ELRA) |
Pages | 2007-2013 |
Number of pages | 7 |
ISBN (Electronic) | 2951740867, 9782951740860 |
State | Published - Jan 1 2010 |
Externally published | Yes |
Event | 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta; Duration: May 17 2010 → May 23 2010 |
Other
Other | 7th International Conference on Language Resources and Evaluation, LREC 2010 |
---|---|
Country/Territory | Malta |
City | Valletta |
Period | 5/17/10 → 5/23/10 |
ASJC Scopus subject areas
- Education
- Library and Information Sciences
- Linguistics and Language
- Language and Linguistics