Predicted max degree sampling: Sampling in directed networks to maximize node coverage through crawling

Ricky Laishram, Katchaguy Areekijseree, Sucheta Soundarajan

Research output: Chapter in Book/Entry/PoemConference contribution

6 Scopus citations

Abstract

When studying large-scale networks, including the Web and computer networks, it is first necessary to collect appropriate data. There is a large body of literature on web crawling and network sampling in general; however, this work typically assumes that a query on a node either reveals all edges incident to that node (in the case of an undirected graph) or all edges outgoing from that node (e.g., in the case of a web crawler). There is relatively little work on sampling directed networks where incoming and outgoing edges may be obtained with separate queries. This type of sampling is relevant to networks like Twitter, in which the 'Friends' and 'Follower' connections are reciprocal relationships (i.e., if A is a follower of B, then B is a friend of A). In this paper we present Predicted Max Degree Sampling (PMD), a new method of sampling a directed network, with the objective of maximizing the total number of nodes observed. We consider two types of crawling, corresponding to the scenarios in which there is or is not a limit on the number of nodes returned by a query. We compared PMD against three baseline algorithms with three real networks, and saw large improvements vs. baseline sampling algorithms: With a budget of 2000, PMD found 15% to 170.2% more nodes than the closest baseline algorithm for the different datasets considered.

Original languageEnglish (US)
Title of host publication2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2017
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages940-945
Number of pages6
ISBN (Electronic)9781538627846
DOIs
StatePublished - Nov 20 2017
Event2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2017 - Atlanta, United States
Duration: May 1 2017May 4 2017

Publication series

Name2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2017

Other

Other2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2017
Country/TerritoryUnited States
CityAtlanta
Period5/1/175/4/17

ASJC Scopus subject areas

  • Hardware and Architecture
  • Control and Optimization
  • Artificial Intelligence
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Predicted max degree sampling: Sampling in directed networks to maximize node coverage through crawling'. Together they form a unique fingerprint.

Cite this