On data collection, graph construction, and sampling in Twitter

Jeremy D. Wendt, Randy Wells, Richard V. Field, Sucheta Soundarajan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

We present a detailed study on data collection, graph construction, and sampling in Twitter. We observe that sampling on semantic graphs (i.e., graphs with multiple edge types) presents fundamentally distinct challenges from sampling on traditional graphs. The purpose of our work is to present new challenges and initial solutions for sampling semantic graphs. Novel elements of our work include the following: (1) We provide a thorough discussion of problems encountered with naïve breadth-first search on semantic graphs. We argue that common sampling methods such as breadth-first search face specific challenges on semantic graphs that are not encountered on graphs with homogeneous edge types. (2) We present two competing methods for creating semantic graphs from data collects, corresponding to the interactions between sampling of different edge types. (3) We discuss new metrics specific to graphs with multiple edge types, and discuss the effect of the sampling method on these metrics. (4) We discuss issues and potential solutions pertaining to sampling semantic graphs.
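As a rough illustration of why edge types matter for breadth-first sampling, the sketch below runs a budgeted breadth-first search over a tiny graph whose edges carry a type label. Everything in it is an assumption made for illustration: the function names (`bfs_sample`, `edge_type_counts`), the edge types ("follows", "mentions"), and the toy data do not come from the paper, and this is not the authors' collection or graph-construction procedure. It only shows that the same seed and node budget can yield different samples, with different per-type edge counts, depending on which edge types the search is allowed to traverse.

```python
from collections import Counter, deque


def bfs_sample(edges, seed, max_nodes, edge_types=None):
    """Breadth-first sample of at most max_nodes nodes, starting at seed.

    edges: iterable of (source, target, edge_type) tuples (directed).
    edge_types: set of edge types the search may traverse; None means all.
    Returns the sampled nodes in the order they were discovered.
    """
    # Build an adjacency list restricted to the allowed edge types.
    adjacency = {}
    for source, target, edge_type in edges:
        if edge_types is None or edge_type in edge_types:
            adjacency.setdefault(source, []).append(target)

    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if len(visited) >= max_nodes:
                break  # node budget exhausted
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


def edge_type_counts(edges, nodes):
    """Count edges of each type whose endpoints both lie in the sample."""
    node_set = set(nodes)
    return Counter(
        edge_type
        for source, target, edge_type in edges
        if source in node_set and target in node_set
    )


# Hypothetical toy data: two edge types over five accounts.
toy_edges = [
    ("a", "b", "follows"),
    ("a", "c", "mentions"),
    ("b", "d", "follows"),
    ("c", "e", "mentions"),
    ("d", "e", "follows"),
]

all_types = bfs_sample(toy_edges, "a", max_nodes=4)
follows_only = bfs_sample(toy_edges, "a", max_nodes=4, edge_types={"follows"})

print(all_types, dict(edge_type_counts(toy_edges, all_types)))
print(follows_only, dict(edge_type_counts(toy_edges, follows_only)))
```

With the same seed and budget, the unrestricted search and the "follows"-only search reach different node sets, so any per-edge-type statistic computed on the sample depends on how the traversal treated the edge types.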

Original language: English (US)
Title of host publication: Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 985-992
Number of pages: 8
ISBN (Electronic): 9781509028467
DOIs: https://doi.org/10.1109/ASONAM.2016.7752360
State: Published - Nov 21 2016
Event: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016 - San Francisco, United States
Duration: Aug 18 2016 to Aug 21 2016

Other

Other: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
Country: United States
City: San Francisco
Period: 8/18/16 to 8/21/16

Fingerprint

twitter
semantics
Sampling
Semantics
interaction

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Sociology and Political Science
  • Communication

Cite this

Wendt, J. D., Wells, R., Field, R. V., & Soundarajan, S. (2016). On data collection, graph construction, and sampling in Twitter. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016 (pp. 985-992). [7752360] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASONAM.2016.7752360

@inproceedings{20e571b79728437bb08ca4267f198bea,
title = "On data collection, graph construction, and sampling in Twitter",
abstract = "We present a detailed study on data collection, graph construction, and sampling in Twitter. We observe that sampling on semantic graphs (i.e., graphs with multiple edge types) presents fundamentally distinct challenges from sampling on traditional graphs. The purpose of our work is to present new challenges and initial solutions for sampling semantic graphs. Novel elements of our work include the following: (1) We provide a thorough discussion of problems encountered with na{\"i}ve breadth-first search on semantic graphs. We argue that common sampling methods such as breadth-first search face specific challenges on semantic graphs that are not encountered on graphs with homogeneous edge types. (2) We present two competing methods for creating semantic graphs from data collects, corresponding to the interactions between sampling of different edge types. (3) We discuss new metrics specific to graphs with multiple edge types, and discuss the effect of the sampling method on these metrics. (4) We discuss issues and potential solutions pertaining to sampling semantic graphs.",
author = "Wendt, {Jeremy D.} and Randy Wells and Field, {Richard V.} and Sucheta Soundarajan",
year = "2016",
month = "11",
day = "21",
doi = "10.1109/ASONAM.2016.7752360",
language = "English (US)",
pages = "985--992",
booktitle = "Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - On data collection, graph construction, and sampling in Twitter

AU - Wendt, Jeremy D.

AU - Wells, Randy

AU - Field, Richard V.

AU - Soundarajan, Sucheta

PY - 2016/11/21

Y1 - 2016/11/21

N2 - We present a detailed study on data collection, graph construction, and sampling in Twitter. We observe that sampling on semantic graphs (i.e., graphs with multiple edge types) presents fundamentally distinct challenges from sampling on traditional graphs. The purpose of our work is to present new challenges and initial solutions for sampling semantic graphs. Novel elements of our work include the following: (1) We provide a thorough discussion of problems encountered with naïve breadth-first search on semantic graphs. We argue that common sampling methods such as breadth-first search face specific challenges on semantic graphs that are not encountered on graphs with homogeneous edge types. (2) We present two competing methods for creating semantic graphs from data collects, corresponding to the interactions between sampling of different edge types. (3) We discuss new metrics specific to graphs with multiple edge types, and discuss the effect of the sampling method on these metrics. (4) We discuss issues and potential solutions pertaining to sampling semantic graphs.

UR - http://www.scopus.com/inward/record.url?scp=85006741699&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85006741699&partnerID=8YFLogxK

U2 - 10.1109/ASONAM.2016.7752360

DO - 10.1109/ASONAM.2016.7752360

M3 - Conference contribution

AN - SCOPUS:85006741699

SP - 985

EP - 992

BT - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -