TY - GEN
T1 - Extending the MPC corpus to Chinese and Urdu - A multiparty multi-lingual chat corpus for modeling social phenomena in language
AU - Liu, Ting
AU - Shaikh, Samira
AU - Strzalkowski, Tomek
AU - Broadwell, Aaron
AU - Stromer-Galley, Jennifer
AU - Taylor, Sarah
AU - Boz, Umit
AU - Ren, Xiaoai
AU - Wu, Jingsi
N1 - Funding Information:
This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the U.S. Army Research Lab. All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI or the U.S. Government.
PY - 2012
Y1 - 2012
AB - In this paper, we report our efforts in building a multi-lingual multi-party online chat corpus (MMPC) in order to develop a firm understanding in a set of social constructs such as agenda control, influence, and leadership as well as to computationally model such constructs in online interactions. These automated models will help capture the dialogue dynamics that are essential for developing, among others, realistic human-machine dialogue systems, including autonomous virtual chat agents. In this paper, we first introduce our experiment design and data collection method in Chinese and Urdu, and then report on the current stage of our data collection. We annotated the collected corpus on four levels: communication links, dialogue acts, local topics, and meso-topics. Results from the analyses of annotated data on different languages indicate some interesting phenomena, which are reported in this paper.
KW - Annotation
KW - Multi-lingual Multi-party online-chat
KW - Post-session survey
KW - Social phenomena
UR - http://www.scopus.com/inward/record.url?scp=84874810239&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874810239&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84874810239
T3 - Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
SP - 2868
EP - 2873
BT - Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
A2 - Dogan, Mehmet Ugur
A2 - Mariani, Joseph
A2 - Moreno, Asuncion
A2 - Goggi, Sara
A2 - Choukri, Khalid
A2 - Calzolari, Nicoletta
A2 - Odijk, Jan
A2 - Declerck, Thierry
A2 - Maegaard, Bente
A2 - Piperidis, Stelios
A2 - Mazo, Helene
A2 - Hamon, Olivier
PB - European Language Resources Association (ELRA)
T2 - 8th International Conference on Language Resources and Evaluation, LREC 2012
Y2 - 21 May 2012 through 27 May 2012
ER -