TY - JOUR
T1 - Determining Spurious Correlation between Two Variables with Common Elements
T2 - Event Area-Weighted Suspended Sediment Yield and Event Mean Runoff Depth
AU - Gao, Peng
AU - Zhang, Lianjun
N1 - Publisher Copyright:
© 2016, Copyright 2016 by American Association of Geographers Initial submission.
PY - 2016/4/2
Y1 - 2016/4/2
AB - Spurious correlation is a classic statistical pitfall pervasive in many disciplines, including geography. Although methods of calculating the spurious correlation between two variables that share a common element in the form of a sum, ratio, or product have long been available, conflicting assertions about whether spurious correlation should be treated or ignored remain prevalent. In this study, we examined this well-known but intriguing issue using data representing two nonindependent variables, event area-weighted suspended sediment yield (SSYe) and event mean runoff depth (h). By transferring the correlation between SSYe and h to that between suspended sediment transport rate (Qs) and water discharge (Q), we developed a new method of determining whether Qs is truly correlated with Q. The method involves calculating coefficients of spurious correlation ((Formula presented.)) and the associated “pure” spurious correlation ((Formula presented.)), a hypothesis test, and regression between (Formula presented.) and (Formula presented.). Our analysis showed that (1) a true correlation exists between SSYe and h and (2) the spurious correlation is strongly related to the variability of the variables. We then proposed a general rule stating that the apparent spurious correlation between two variables can be ignored if the two have a true causal relation. Finally, we distinguished spurious correlation from spurious inference.
KW - sediment transport
KW - spurious correlation
KW - spurious inference
UR - http://www.scopus.com/inward/record.url?scp=84959234424&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959234424&partnerID=8YFLogxK
U2 - 10.1080/00330124.2015.1065548
DO - 10.1080/00330124.2015.1065548
M3 - Article
AN - SCOPUS:84959234424
SN - 0033-0124
VL - 68
SP - 261
EP - 270
JO - Professional Geographer
JF - Professional Geographer
IS - 2
ER -