Abstract
In this work, we propose Max-Node sampling, a novel sampling algorithm for data collection. The goal of Max-Node is to maximize the number of nodes observed in the sample, given a budget constraint. Max-Node is based on the intuition that networks contain many densely connected regions (i.e., communities) that may be only weakly connected to one another; to maximize the number of nodes observed, it is therefore critical to transition between communities. The algorithm has two key phases: Expansion and Densification. The Expansion phase aims to transition to unobserved regions, while the Densification phase aims to collect as many nodes as possible from the current community. We conduct experiments on several real networks and show an improvement of up to 40% over the baselines.
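The abstract names the two phases but does not give their selection rules. The following is therefore only a minimal sketch of a budgeted two-phase crawler, not the authors' Max-Node algorithm. It assumes that querying a node costs one unit of budget and reveals all of its neighbors. It also assumes that a frontier node's number of already-queried neighbors (`hits`) is a usable proxy for "inside the current community." The function name `two_phase_crawl`, the `min_new` switching threshold, and both heuristics are hypothetical choices made for illustration. Densification queries the frontier node most tied to the explored region. Expansion jumps to the frontier node least tied to it.

```python
from typing import Dict, List, Set

def two_phase_crawl(graph: Dict[int, List[int]], seed: int,
                    budget: int, min_new: int = 1) -> Set[int]:
    """Hypothetical two-phase crawler (illustrative, not Max-Node itself).

    Each query costs 1 budget unit and reveals the queried node's neighbors.
    Returns the set of observed nodes.
    """
    observed: Set[int] = {seed}
    queried: Set[int] = set()
    hits: Dict[int, int] = {}  # frontier node -> number of queried neighbors
    current = seed
    for _ in range(budget):
        queried.add(current)
        hits.pop(current, None)  # current is no longer on the frontier
        new = 0
        for nbr in graph.get(current, []):
            if nbr not in observed:
                observed.add(nbr)
                new += 1
            if nbr not in queried:
                hits[nbr] = hits.get(nbr, 0) + 1
        if not hits:  # nothing left to query
            break
        if new >= min_new:
            # Densification (assumed rule): keep mining the current region by
            # querying the frontier node with the most queried neighbors.
            current = max(hits, key=lambda v: (hits[v], -v))
        else:
            # Expansion (assumed rule): the last query found too few new nodes,
            # so jump to the frontier node least connected to what we explored.
            current = min(hits, key=lambda v: (hits[v], v))
    return observed

# Two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the single bridge 3-4.
BRIDGED_CLIQUES: Dict[int, List[int]] = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}

if __name__ == "__main__":
    print(sorted(two_phase_crawl(BRIDGED_CLIQUES, seed=0, budget=5)))
```

On the toy graph, a budget of 5 queries is enough to cross the bridge and observe all eight nodes. A budget of 1 observes only the seed's clique. The real algorithm's phase-switching and node-selection criteria may differ substantially from these placeholders.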
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 3944-3946 |
Number of pages | 3 |
ISBN (Electronic) | 9781467390040 |
DOIs | |
State | Published - Feb 2 2017 |
Event | 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States Duration: Dec 5 2016 → Dec 8 2016 |
Other
Other | 4th IEEE International Conference on Big Data, Big Data 2016 |
---|---|
Country/Territory | United States |
City | Washington |
Period | 12/5/16 → 12/8/16 |
Keywords
- Algorithms
- Complex Network
- Data Collection
- Data Crawling
- Large Graph
- Network Sampling
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Hardware and Architecture