Abstract
Over the past two decades, online social networks have attracted a great deal of attention from researchers. However, before one can gain insight into the properties or structure of a network, one must first collect appropriate data. Data collection poses several challenges, such as API or bandwidth limits, which require the data collector to carefully consider which queries to make. Many online network crawling methods have been proposed, but it is not always clear which method should be used for a given network. In this paper, we perform a detailed, hypothesis-driven analysis of several online crawling algorithms, ranging from classical crawling methods to modern, state-of-the-art algorithms, with respect to the task of collecting as much data (nodes or edges) as possible given a fixed query budget. We show that the performance of these methods depends strongly on the network structure. We identify three relevant network characteristics: community separation, average community size, and average node degree. We present experiments on both real and synthetic networks, and provide guidelines to researchers regarding selection of an appropriate sampling method.
Original language | English (US) |
---|---|
Title of host publication | WebSci 2018 - Proceedings of the 10th ACM Conference on Web Science |
Publisher | Association for Computing Machinery, Inc |
Pages | 57-66 |
Number of pages | 10 |
ISBN (Electronic) | 9781450355636 |
State | Published - May 15 2018 |
Event | 10th ACM Conference on Web Science, WebSci 2018 - Amsterdam, Netherlands (Duration: May 27 2018 → May 30 2018) |
Other
Other | 10th ACM Conference on Web Science, WebSci 2018 |
---|---|
Country/Territory | Netherlands |
City | Amsterdam |
Period | 5/27/18 → 5/30/18 |
Keywords
- Complex networks
- Experiments
- Network crawling
- Network sampling
- Online sampling algorithm
ASJC Scopus subject areas
- Computer Networks and Communications