TY - JOUR
T1 - Leverage and Cook distance in regression with geostatistical data
T2 - methodology, simulation, and applications related to geographical information
AU - Giraldo, Ramón
AU - Leiva, Víctor
AU - Christakos, George
N1 - Funding Information:
The authors thank the Editors and three Reviewers for their constructive comments on an earlier version of this manuscript, which led to an improved version. The research was partially funded by FONDECYT, project grant number 1200525 (V. Leiva), from the National Agency for Research and Development (ANID) of the Chilean government under the Ministry of Science, Technology, Knowledge, and Innovation.
Publisher Copyright:
© 2022 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2022
Y1 - 2022
N2 - Regression is often conducted assuming independent model errors. The detection of atypical values in regression (leverage and influential points) assumes independent errors. However, such independence could be unrealistic in geostatistics. In this article, we propose a methodology based on least squares and geostatistics to identify such values in spatial regression. Our procedure uses the hat matrix to detect leverage points. A modified Cook distance is employed to confirm whether these points are influential. The methodology is evaluated with stationary and non-stationary geostatistical data. We apply this methodology to real georeferenced data related to depth, dissolved oxygen, and temperature. First, an autoregressive model is fitted to depth data. Second, a regression between oxygen and temperature is estimated. In both models, spatial correlation is assumed to determine the parameters, leverage, and influential observations. Our methodology can be used in regression with geographical information to avoid misinterpreted results. Not considering this information may under- or over-estimate geographical indicators, such as the mean depth, which can affect the circulation of water masses or dissolved oxygen variability. Our results reveal that including spatial dependence to identify high leverage points is relevant and must be considered in any geostatistical analysis.
KW - Geostatistics
KW - global influence
KW - hat matrix
KW - linear regression
KW - Monte Carlo simulation
KW - spatial correlation
UR - http://www.scopus.com/inward/record.url?scp=85141206387&partnerID=8YFLogxK
U2 - 10.1080/13658816.2022.2131790
DO - 10.1080/13658816.2022.2131790
M3 - Article
AN - SCOPUS:85141206387
JO - International Journal of Geographical Information Science
JF - International Journal of Geographical Information Science
SN - 1365-8816
ER -