MPC: A multi-party chat corpus for modeling social phenomena in discourse

Samira Shaikh, Tomek Strzalkowski, Aaron Broadwell, Jennifer Stromer-Galley, Sarah Taylor, Nick Webb

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

In this paper, we describe our experience collecting and creating an annotated corpus of multi-party online conversations in a chat-room environment. This effort is part of a larger project to develop computational models of social phenomena such as agenda control, influence, and leadership in online interactions. Such models will help capture the dialogue dynamics that are essential for developing, among other things, realistic human-machine dialogue systems, including autonomous virtual chat agents. We describe the data collection method used and the characteristics of the initial dataset of English chat. We have devised a multi-tiered collection process in which subjects start with simple, free-flowing conversations and progress toward more complex and structured interactions. We report on the first two stages of this process, which were recently completed; the third, large-scale collection effort is currently under way. All English dialogue has been annotated at four levels: communication links, dialogue acts, local topics, and meso-topics.
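The abstract names four annotation levels applied to every English chat turn. One way to picture a single annotated turn is the record sketched below. This is a minimal sketch under stated assumptions: the `ChatTurn` class, its field names, the encoding of communication links as ids of earlier turns, the string labels for dialogue acts and topics, and the helper `turns_by_meso_topic` are all hypothetical illustrations, not the corpus's actual file format or annotation scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChatTurn:
    """One chat-room utterance carrying the four annotation levels from the abstract.

    Hypothetical layout: field names and value formats are illustrative
    assumptions, not the MPC corpus's real format.
    """
    turn_id: int
    speaker: str
    text: str
    # Communication links: ids of earlier turns this turn addresses (assumed encoding).
    links_to: List[int] = field(default_factory=list)
    # Dialogue act as a free-form string label (assumed; the real tag set is not shown here).
    dialogue_act: Optional[str] = None
    # Local topic: short-range topic of this turn (assumed string label).
    local_topic: Optional[str] = None
    # Meso-topic: longer-range topic spanning many turns (assumed string label).
    meso_topic: Optional[str] = None


def turns_by_meso_topic(turns: List[ChatTurn]) -> Dict[str, List[int]]:
    """Group turn ids by meso-topic label, skipping turns without one."""
    groups: Dict[str, List[int]] = {}
    for t in turns:
        if t.meso_topic is not None:
            groups.setdefault(t.meso_topic, []).append(t.turn_id)
    return groups


if __name__ == "__main__":
    # Invented toy dialogue for illustration only; this is not corpus data.
    turns = [
        ChatTurn(1, "A", "Should we pick a movie for Friday?",
                 dialogue_act="question", local_topic="movie choice",
                 meso_topic="weekend plans"),
        ChatTurn(2, "B", "Sure, something funny.", links_to=[1],
                 dialogue_act="answer", local_topic="movie choice",
                 meso_topic="weekend plans"),
        ChatTurn(3, "C", "Anyone finish the reading?", dialogue_act="question"),
    ]
    print(turns_by_meso_topic(turns))  # {'weekend plans': [1, 2]}
```

Keeping the four levels as separate optional fields on each turn lets a partially annotated turn still be represented. It also lets links and topics be queried independently, as the grouping helper does for meso-topics.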

Original language: English (US)
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Publisher: European Language Resources Association (ELRA)
Pages: 2007-2013
Number of pages: 7
ISBN (Electronic): 2951740867, 9782951740860
State: Published - Jan 1 2010
Externally published: Yes
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: May 17 2010 to May 23 2010

Other

Other: 7th International Conference on Language Resources and Evaluation, LREC 2010
Country: Malta
City: Valletta
Period: 5/17/10 to 5/23/10

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics


Cite this

    Shaikh, S., Strzalkowski, T., Broadwell, A., Stromer-Galley, J., Taylor, S., & Webb, N. (2010). MPC: A multi-party chat corpus for modeling social phenomena in discourse. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 2007-2013). European Language Resources Association (ELRA).