TY - GEN
T1 - Predicted max degree sampling
T2 - 2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2017
AU - Laishram, Ricky
AU - Areekijseree, Katchaguy
AU - Soundarajan, Sucheta
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/20
Y1 - 2017/11/20
N2 - When studying large-scale networks, including the Web and computer networks, it is first necessary to collect appropriate data. There is a large body of literature on web crawling and network sampling in general; however, this work typically assumes that a query on a node either reveals all edges incident to that node (in the case of an undirected graph) or all edges outgoing from that node (e.g., in the case of a web crawler). There is relatively little work on sampling directed networks where incoming and outgoing edges may be obtained with separate queries. This type of sampling is relevant to networks like Twitter, in which the 'Friends' and 'Follower' connections are reciprocal relationships (i.e., if A is a follower of B, then B is a friend of A). In this paper we present Predicted Max Degree Sampling (PMD), a new method of sampling a directed network, with the objective of maximizing the total number of nodes observed. We consider two types of crawling, corresponding to the scenarios in which there is or is not a limit on the number of nodes returned by a query. We compared PMD against three baseline algorithms with three real networks, and saw large improvements vs. baseline sampling algorithms: With a budget of 2000, PMD found 15% to 170.2% more nodes than the closest baseline algorithm for the different datasets considered.
AB - When studying large-scale networks, including the Web and computer networks, it is first necessary to collect appropriate data. There is a large body of literature on web crawling and network sampling in general; however, this work typically assumes that a query on a node either reveals all edges incident to that node (in the case of an undirected graph) or all edges outgoing from that node (e.g., in the case of a web crawler). There is relatively little work on sampling directed networks where incoming and outgoing edges may be obtained with separate queries. This type of sampling is relevant to networks like Twitter, in which the 'Friends' and 'Follower' connections are reciprocal relationships (i.e., if A is a follower of B, then B is a friend of A). In this paper we present Predicted Max Degree Sampling (PMD), a new method of sampling a directed network, with the objective of maximizing the total number of nodes observed. We consider two types of crawling, corresponding to the scenarios in which there is or is not a limit on the number of nodes returned by a query. We compared PMD against three baseline algorithms with three real networks, and saw large improvements vs. baseline sampling algorithms: With a budget of 2000, PMD found 15% to 170.2% more nodes than the closest baseline algorithm for the different datasets considered.
UR - http://www.scopus.com/inward/record.url?scp=85041291557&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041291557&partnerID=8YFLogxK
U2 - 10.1109/INFCOMW.2017.8116502
DO - 10.1109/INFCOMW.2017.8116502
M3 - Conference contribution
AN - SCOPUS:85041291557
T3 - 2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2017
SP - 940
EP - 945
BT - 2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 May 2017 through 4 May 2017
ER -