Predicted max degree sampling: Sampling in directed networks to maximize node coverage through crawling

Ricky Laishram, Katchaguy Areekijseree, Sucheta Soundarajan

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Sampling through crawling is an important research topic in social network analysis. However there is very little existing work on sampling through crawling in directed networks. In this paper we present a new method of sampling a directed network, with the objective of maximizing the node coverage. Our proposed method, Predicted Max Degree (PMD) Sampling, works by predicting which k open nodes are most likely to have the highest number of unobserved neighbors in a particular iteration. These nodes are queried, and the whole process repeats until all the available budget has been used up. We compared PMD against three baseline algorithms with three networks, and saw large improvements vs. baseline sampling algorithms: With a budget of 2000, PMD found 15%, 87.4% and 170.2% more nodes than the closest baseline algorithm in the wiki-Votes, soc-Slashdot and webGoogle networks respectively.

Original languageEnglish (US)
Title of host publicationProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages4008-4010
Number of pages3
ISBN (Electronic)9781467390040
DOIs
StatePublished - Feb 2 2017
Event4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: Dec 5 2016Dec 8 2016

Other

Other4th IEEE International Conference on Big Data, Big Data 2016
CountryUnited States
CityWashington
Period12/5/1612/8/16

Fingerprint

Sampling
Electric network analysis

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture

Cite this

Laishram, R., Areekijseree, K., & Soundarajan, S. (2017). Predicted max degree sampling: Sampling in directed networks to maximize node coverage through crawling. In Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 4008-4010). [7841092] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/BigData.2016.7841092

Predicted max degree sampling : Sampling in directed networks to maximize node coverage through crawling. / Laishram, Ricky; Areekijseree, Katchaguy; Soundarajan, Sucheta.

Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016. Institute of Electrical and Electronics Engineers Inc., 2017. p. 4008-4010 7841092.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Laishram, R, Areekijseree, K & Soundarajan, S 2017, Predicted max degree sampling: Sampling in directed networks to maximize node coverage through crawling. in Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016., 7841092, Institute of Electrical and Electronics Engineers Inc., pp. 4008-4010, 4th IEEE International Conference on Big Data, Big Data 2016, Washington, United States, 12/5/16. https://doi.org/10.1109/BigData.2016.7841092
Laishram R, Areekijseree K, Soundarajan S. Predicted max degree sampling: Sampling in directed networks to maximize node coverage through crawling. In Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016. Institute of Electrical and Electronics Engineers Inc. 2017. p. 4008-4010. 7841092 https://doi.org/10.1109/BigData.2016.7841092
Laishram, Ricky ; Areekijseree, Katchaguy ; Soundarajan, Sucheta. / Predicted max degree sampling : Sampling in directed networks to maximize node coverage through crawling. Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 4008-4010
@inproceedings{c884b9e6cbbe48188bb261a50bec91d6,
title = "Predicted max degree sampling: Sampling in directed networks to maximize node coverage through crawling",
abstract = "Sampling through crawling is an important research topic in social network analysis. However there is very little existing work on sampling through crawling in directed networks. In this paper we present a new method of sampling a directed network, with the objective of maximizing the node coverage. Our proposed method, Predicted Max Degree (PMD) Sampling, works by predicting which k open nodes are most likely to have the highest number of unobserved neighbors in a particular iteration. These nodes are queried, and the whole process repeats until all the available budget has been used up. We compared PMD against three baseline algorithms with three networks, and saw large improvements vs. baseline sampling algorithms: With a budget of 2000, PMD found 15{\%}, 87.4{\%} and 170.2{\%} more nodes than the closest baseline algorithm in the wiki-Votes, soc-Slashdot and webGoogle networks respectively.",
author = "Ricky Laishram and Katchaguy Areekijseree and Sucheta Soundarajan",
year = "2017",
month = "2",
day = "2",
doi = "10.1109/BigData.2016.7841092",
language = "English (US)",
pages = "4008--4010",
booktitle = "Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Predicted max degree sampling

T2 - Sampling in directed networks to maximize node coverage through crawling

AU - Laishram, Ricky

AU - Areekijseree, Katchaguy

AU - Soundarajan, Sucheta

PY - 2017/2/2

Y1 - 2017/2/2

N2 - Sampling through crawling is an important research topic in social network analysis. However there is very little existing work on sampling through crawling in directed networks. In this paper we present a new method of sampling a directed network, with the objective of maximizing the node coverage. Our proposed method, Predicted Max Degree (PMD) Sampling, works by predicting which k open nodes are most likely to have the highest number of unobserved neighbors in a particular iteration. These nodes are queried, and the whole process repeats until all the available budget has been used up. We compared PMD against three baseline algorithms with three networks, and saw large improvements vs. baseline sampling algorithms: With a budget of 2000, PMD found 15%, 87.4% and 170.2% more nodes than the closest baseline algorithm in the wiki-Votes, soc-Slashdot and webGoogle networks respectively.

AB - Sampling through crawling is an important research topic in social network analysis. However there is very little existing work on sampling through crawling in directed networks. In this paper we present a new method of sampling a directed network, with the objective of maximizing the node coverage. Our proposed method, Predicted Max Degree (PMD) Sampling, works by predicting which k open nodes are most likely to have the highest number of unobserved neighbors in a particular iteration. These nodes are queried, and the whole process repeats until all the available budget has been used up. We compared PMD against three baseline algorithms with three networks, and saw large improvements vs. baseline sampling algorithms: With a budget of 2000, PMD found 15%, 87.4% and 170.2% more nodes than the closest baseline algorithm in the wiki-Votes, soc-Slashdot and webGoogle networks respectively.

UR - http://www.scopus.com/inward/record.url?scp=85015202385&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015202385&partnerID=8YFLogxK

U2 - 10.1109/BigData.2016.7841092

DO - 10.1109/BigData.2016.7841092

M3 - Conference contribution

AN - SCOPUS:85015202385

SP - 4008

EP - 4010

BT - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -