Extending the MPC corpus to Chinese and Urdu - A multiparty multi-lingual chat corpus for modeling social phenomena in language

Ting Liu, Samira Shaikh, Tomek Strzalkowski, Aaron Broadwell, Jennifer Stromer-Galley, Sarah Taylor, Umit Boz, Xiaoai Ren, Jingsi Wu

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

4 Scopus citations

Abstract

In this paper, we report our efforts in building a multi-lingual multi-party online chat corpus (MMPC) in order to develop a firm understanding of a set of social constructs such as agenda control, influence, and leadership, as well as to computationally model such constructs in online interactions. These automated models will help capture the dialogue dynamics that are essential for developing, among other things, realistic human-machine dialogue systems, including autonomous virtual chat agents. We first introduce our experiment design and data collection method in Chinese and Urdu, and then report on the current stage of our data collection. We annotated the collected corpus on four levels: communication links, dialogue acts, local topics, and meso-topics. Results from the analyses of annotated data in the different languages indicate some interesting phenomena, which are reported in this paper.

Original language: English (US)
Title of host publication: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
Editors: Mehmet Ugur Dogan, Joseph Mariani, Asuncion Moreno, Sara Goggi, Khalid Choukri, Nicoletta Calzolari, Jan Odijk, Thierry Declerck, Bente Maegaard, Stelios Piperidis, Helene Mazo, Olivier Hamon
Publisher: European Language Resources Association (ELRA)
Pages: 2868-2873
Number of pages: 6
ISBN (Electronic): 9782951740877
State: Published - 2012
Externally published: Yes
Event: 8th International Conference on Language Resources and Evaluation, LREC 2012 - Istanbul, Turkey
Duration: May 21 2012 to May 27 2012

Publication series

Name: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012

Other

Other: 8th International Conference on Language Resources and Evaluation, LREC 2012
Country: Turkey
City: Istanbul
Period: 5/21/12 to 5/27/12

Keywords

  • Annotation
  • Multi-lingual Multi-party online-chat
  • Post-session survey
  • Social phenomena

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Library and Information Sciences

Cite this

Liu, T., Shaikh, S., Strzalkowski, T., Broadwell, A., Stromer-Galley, J., Taylor, S., Boz, U., Ren, X., & Wu, J. (2012). Extending the MPC corpus to Chinese and Urdu - A multiparty multi-lingual chat corpus for modeling social phenomena in language. In M. U. Dogan, J. Mariani, A. Moreno, S. Goggi, K. Choukri, N. Calzolari, J. Odijk, T. Declerck, B. Maegaard, S. Piperidis, H. Mazo, & O. Hamon (Eds.), Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012 (pp. 2868-2873). (Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012). European Language Resources Association (ELRA).