TY - GEN
T1 - On data collection, graph construction, and sampling in Twitter
AU - Wendt, Jeremy D.
AU - Wells, Randy
AU - Field, Richard V.
AU - Soundarajan, Sucheta
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/11/21
Y1 - 2016/11/21
N2 - We present a detailed study on data collection, graph construction, and sampling in Twitter. We observe that sampling on semantic graphs (i.e., graphs with multiple edge types) presents fundamentally distinct challenges from sampling on traditional graphs. The purpose of our work is to present new challenges and initial solutions for sampling semantic graphs. Novel elements of our work include the following: (1) We provide a thorough discussion of problems encountered with naïve breadth-first search on semantic graphs. We argue that common sampling methods such as breadth-first search face specific challenges on semantic graphs that are not encountered on graphs with homogeneous edge types. (2) We present two competing methods for creating semantic graphs from data collects, corresponding to the interactions between sampling of different edge types. (3) We discuss new metrics specific to graphs with multiple edge types, and discuss the effect of the sampling method on these metrics. (4) We discuss issues and potential solutions pertaining to sampling semantic graphs.
UR - http://www.scopus.com/inward/record.url?scp=85006741699&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006741699&partnerID=8YFLogxK
U2 - 10.1109/ASONAM.2016.7752360
DO - 10.1109/ASONAM.2016.7752360
M3 - Conference contribution
AN - SCOPUS:85006741699
T3 - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
SP - 985
EP - 992
BT - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
A2 - Kumar, Ravi
A2 - Caverlee, James
A2 - Tong, Hanghang
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
Y2 - 18 August 2016 through 21 August 2016
ER -