TY - JOUR
T1 - The good, the bad, and the ugly
T2 - uncovering novel research opportunities in social media mining
AU - Liu, Huan
AU - Morstatter, Fred
AU - Tang, Jiliang
AU - Zafarani, Reza
N1 - Funding Information:
This material is based upon works supported by, or in part by, the U.S. Army Research Office under Grant Number #025071, National Science Foundation under Grant Number (IIS-1217466) and Office of Naval Research under Grant Numbers (N000141410095, N00014-16-1-2257). The authors are grateful to the former and current members of ASU DMML laboratory and collaborators in these projects.
Publisher Copyright:
© 2016, Springer International Publishing Switzerland.
PY - 2016/11/1
Y1 - 2016/11/1
N2 - Big data is ubiquitous and can only become bigger, which challenges traditional data mining and machine learning methods. Social media is a new source of data that is significantly different from conventional ones. Social media data are mostly user-generated, and are big, linked, and heterogeneous. We present the good, the bad and the ugly associated with the multi-faceted social media data and exemplify the importance of some original problems with real-world examples. We discuss bias in social media data, evaluation dilemma, data reduction, inferring invisible information, and big-data paradox. We illuminate new opportunities of developing novel algorithms and tools for data science. In our endeavor of employing the good to tame the bad with the help of the ugly, we deepen the understanding of ever growing and continuously evolving data and create innovative solutions with interdisciplinary and collaborative research of data science.
AB - Big data is ubiquitous and can only become bigger, which challenges traditional data mining and machine learning methods. Social media is a new source of data that is significantly different from conventional ones. Social media data are mostly user-generated, and are big, linked, and heterogeneous. We present the good, the bad and the ugly associated with the multi-faceted social media data and exemplify the importance of some original problems with real-world examples. We discuss bias in social media data, evaluation dilemma, data reduction, inferring invisible information, and big-data paradox. We illuminate new opportunities of developing novel algorithms and tools for data science. In our endeavor of employing the good to tame the bad with the help of the ugly, we deepen the understanding of ever growing and continuously evolving data and create innovative solutions with interdisciplinary and collaborative research of data science.
KW - Big-data paradox
KW - Data analytics
KW - Data mining
KW - Evaluation
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=85026368303&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026368303&partnerID=8YFLogxK
U2 - 10.1007/s41060-016-0023-0
DO - 10.1007/s41060-016-0023-0
M3 - Article
AN - SCOPUS:85026368303
SN - 2364-415X
VL - 1
SP - 137
EP - 143
JO - International Journal of Data Science and Analytics
JF - International Journal of Data Science and Analytics
IS - 3-4
ER -