TY - JOUR
T1 - Leverage and Cook distance in regression with geostatistical data
T2 - methodology, simulation, and applications related to geographical information
AU - Giraldo, Ramón
AU - Leiva, Víctor
AU - Christakos, George
N1 - Publisher Copyright: © 2022 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2023
Y1 - 2023
N2 - Regression is often conducted assuming independent model errors. The detection of atypical values in regression (leverage and influential points) assumes independent errors. However, such independence could be unrealistic in geostatistics. In this article, we propose a methodology based on least squares and geostatistics to identify such values in spatial regression. Our procedure uses the hat matrix to detect leverage points. A modified Cook distance is employed to confirm whether these points are influential. The methodology is evaluated with stationary and non-stationary geostatistical data. We apply this methodology to real georeferenced data related to depth, dissolved oxygen, and temperature. First, an autoregressive model is fitted to depth data. Second, a regression between oxygen and temperature is estimated. In both models, spatial correlation is assumed to determine the parameters, leverage, and influential observations. Our methodology can be used in regression with geographical information to avoid misinterpreted results. Not considering this information may under- or over-estimate geographical indicators, such as the mean depth, which can affect the circulation of water masses or dissolved oxygen variability. Our results reveal that including spatial dependence to identify high leverage points is relevant and must be considered in any geostatistical analysis.
KW - Geostatistics
KW - Monte Carlo simulation
KW - global influence
KW - hat matrix
KW - linear regression
KW - spatial correlation
UR - http://www.scopus.com/inward/record.url?scp=85141206387&partnerID=8YFLogxK
U2 - 10.1080/13658816.2022.2131790
DO - 10.1080/13658816.2022.2131790
M3 - Article
AN - SCOPUS:85141206387
SN - 1365-8816
VL - 37
SP - 607
EP - 633
JO - International Journal of Geographical Information Science
JF - International Journal of Geographical Information Science
IS - 3
ER -