TY - GEN

T1 - Measuring the sampling robustness of complex networks

AU - Areekijseree, Katchaguy

AU - Soundarajan, Sucheta

PY - 2019/8/27

Y1 - 2019/8/27

N2 - When studying a network, it is often of interest to understand the robustness of that network to noise. Network robustness has been studied in a variety of contexts, examining network properties such as the number of connected components and the lengths of shortest paths. In this work, we present a new network robustness measure, which we refer to as ‘sampling robustness’. The goal of the sampling robustness measure is to quantify the extent to which a network sample collected from a graph with errors is a good representation of a network sample collected from that same graph, but without errors. These errors may be introduced by humans or by the system (e.g., mistakes from the respondents or a bug in an API program), and may affect the performance of a data collection algorithm and the quality of the obtained sample. Thus, when data analysts analyze the sampled network, they may wish to know whether such errors will affect future analysis results. We demonstrate that sampling robustness is dependent on a few easily computed properties of the network: the leading eigenvalue, average node degree and clustering coefficient. In addition, we introduce regression models for estimating sampling robustness given an obtained sample. Our models estimate sampling robustness with MSE < 0.0015 and achieve an R-squared of up to 75%.

UR - http://www.scopus.com/inward/record.url?scp=85078826314&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85078826314&partnerID=8YFLogxK

U2 - 10.1145/3341161.3342873

DO - 10.1145/3341161.3342873

M3 - Conference contribution

AN - SCOPUS:85078826314

T3 - Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2019

SP - 294

EP - 301

BT - Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2019

A2 - Spezzano, Francesca

A2 - Chen, Wei

A2 - Xiao, Xiaokui

PB - Association for Computing Machinery, Inc

T2 - 11th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2019

Y2 - 27 August 2019 through 30 August 2019

ER -