TY - GEN
T1 - Unsupervised nonparametric anomaly detection
T2 - 2014 52nd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2014
AU - Zou, Shaofeng
AU - Liang, Yingbin
AU - Poor, H. Vincent
AU - Shi, Xinghua
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/1/30
Y1 - 2014/1/30
N2 - An anomaly detection problem is investigated, in which s out of n sequences are anomalous and need to be detected. Each sequence consists of m independent and identically distributed (i.i.d.) samples drawn either from a nominal distribution p or from an anomalous distribution q that is distinct from p. Neither p nor q is known a priori. Two scenarios respectively with s known and unknown are studied. Distribution-free tests are constructed based on the metric of the maximum mean discrepancy (MMD). It is shown that if the value of s is known, as n goes to infinity, the number m of samples in each sequence should be of order O(log n) or larger to guarantee that the constructed test is exponentially consistent. On the other hand, if the value of s is unknown, the number m of samples in each sequence should be of the order strictly greater than O(log n) to guarantee the constructed test is consistent. The computational complexity of all tests are shown to be polynomial. Numerical results are provided to confirm the theoretic characterization of the performance. Further numerical results on both synthetic data sets and real data sets demonstrate that the MMD-based tests outperform or perform as well as other approaches.
AB - An anomaly detection problem is investigated, in which s out of n sequences are anomalous and need to be detected. Each sequence consists of m independent and identically distributed (i.i.d.) samples drawn either from a nominal distribution p or from an anomalous distribution q that is distinct from p. Neither p nor q is known a priori. Two scenarios respectively with s known and unknown are studied. Distribution-free tests are constructed based on the metric of the maximum mean discrepancy (MMD). It is shown that if the value of s is known, as n goes to infinity, the number m of samples in each sequence should be of order O(log n) or larger to guarantee that the constructed test is exponentially consistent. On the other hand, if the value of s is unknown, the number m of samples in each sequence should be of the order strictly greater than O(log n) to guarantee the constructed test is consistent. The computational complexity of all tests are shown to be polynomial. Numerical results are provided to confirm the theoretic characterization of the performance. Further numerical results on both synthetic data sets and real data sets demonstrate that the MMD-based tests outperform or perform as well as other approaches.
UR - http://www.scopus.com/inward/record.url?scp=84946685199&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946685199&partnerID=8YFLogxK
U2 - 10.1109/ALLERTON.2014.7028541
DO - 10.1109/ALLERTON.2014.7028541
M3 - Conference contribution
AN - SCOPUS:84946685199
T3 - 2014 52nd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2014
SP - 836
EP - 841
BT - 2014 52nd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2014
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 September 2014 through 3 October 2014
ER -