Guidelines for online network crawling: A study of data collection approaches and network properties

Katchaguy Areekijseree, Ricky Laishram, Sucheta Soundarajan

Research output: Chapter in Book/Entry/Poem › Conference contribution

7 Scopus citations


Over the past two decades, online social networks have attracted a great deal of attention from researchers. However, before one can gain insight into the properties or structure of a network, one must first collect appropriate data. Data collection poses several challenges, such as API or bandwidth limits, which require the data collector to carefully consider which queries to make. Many online network crawling methods have been proposed, but it is not always clear which method should be used for a given network. In this paper, we perform a detailed, hypothesis-driven analysis of several online crawling algorithms, ranging from classical crawling methods to modern, state-of-the-art algorithms, with respect to the task of collecting as much data (nodes or edges) as possible given a fixed query budget. We show that the performance of these methods depends strongly on the network structure. We identify three relevant network characteristics: community separation, average community size, and average node degree. We present experiments on both real and synthetic networks, and provide guidelines to researchers regarding selection of an appropriate sampling method.

Original language: English (US)
Title of host publication: WebSci 2018 - Proceedings of the 10th ACM Conference on Web Science
Publisher: Association for Computing Machinery, Inc
Number of pages: 10
ISBN (Electronic): 9781450355636
State: Published - May 15 2018
Event: 10th ACM Conference on Web Science, WebSci 2018 - Amsterdam, Netherlands
Duration: May 27 2018 to May 30 2018




Keywords

  • Complex networks
  • Experiments
  • Network crawling
  • Network sampling
  • Online sampling algorithm

ASJC Scopus subject areas

  • Computer Networks and Communications

