Leverage and Cook distance in regression with geostatistical data: methodology, simulation, and applications related to geographical information

Ramón Giraldo, Víctor Leiva, George Christakos

Research output: Contribution to journal › Article › peer-review

Abstract

Regression is often conducted assuming independent model errors, and the detection of atypical values in regression (leverage and influential points) relies on the same assumption. However, such independence can be unrealistic in geostatistics. In this article, we propose a methodology based on least squares and geostatistics to identify such values in spatial regression. Our procedure uses the hat matrix to detect leverage points, and a modified Cook distance is employed to confirm whether these points are influential. The methodology is evaluated with stationary and non-stationary geostatistical data. We apply this methodology to real georeferenced data related to depth, dissolved oxygen, and temperature. First, an autoregressive model is fitted to depth data. Second, a regression between oxygen and temperature is estimated. In both models, spatial correlation is assumed when determining the parameters, leverage, and influential observations. Our methodology can be used in regression with geographical information to avoid misinterpreting results. Ignoring this information may lead to under- or overestimation of geographical indicators, such as the mean depth, which can affect the circulation of water masses or dissolved oxygen variability. Our results reveal that including spatial dependence when identifying high leverage points is relevant and must be considered in any geostatistical analysis.
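To make the idea of leverage and Cook distance under spatial correlation concrete, the following is a minimal sketch and not the authors' procedure. It assumes a known exponential covariance model, whitens the regression with a Cholesky factor, and then applies the classical hat matrix and the standard (unmodified) Cook distance to the whitened model. The function names `exponential_cov` and `gls_leverage_cook`, the covariance parameters, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def exponential_cov(coords, sill=1.0, range_par=0.3):
    """Exponential covariance C(d) = sill * exp(-d / range_par) (assumed model)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sill * np.exp(-d / range_par)

def gls_leverage_cook(X, y, Sigma):
    """Leverage and classical Cook distance computed on the whitened model.

    Whitening with Sigma = L L^T turns generalized least squares into
    ordinary least squares on (L^{-1} X, L^{-1} y).
    """
    n, p = X.shape
    L = np.linalg.cholesky(Sigma)
    Xw = np.linalg.solve(L, X)           # whitened design matrix
    yw = np.linalg.solve(L, y)           # whitened response
    XtX_inv = np.linalg.inv(Xw.T @ Xw)
    H = Xw @ XtX_inv @ Xw.T              # hat matrix of the whitened model
    h = np.diag(H)                       # leverages
    beta = XtX_inv @ Xw.T @ yw           # GLS coefficient estimate
    resid = yw - Xw @ beta
    s2 = resid @ resid / (n - p)         # residual variance estimate
    cook = resid**2 / (p * s2) * h / (1.0 - h) ** 2
    return beta, h, cook

# Simulated example: 50 random sites on the unit square (illustrative only).
rng = np.random.default_rng(0)
n = 50
coords = rng.uniform(size=(n, 2))
Sigma = exponential_cov(coords)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
errors = np.linalg.cholesky(Sigma) @ rng.normal(size=n)  # spatially correlated errors
y = X @ np.array([1.0, 2.0]) + errors

beta, h, cook = gls_leverage_cook(X, y, Sigma)
# Common rule of thumb: flag leverages above 2p/n for closer inspection.
flagged = np.where(h > 2 * X.shape[1] / n)[0]
```

In this sketch the leverages refer to the whitened observations, which are linear combinations of the original ones and depend on the choice of Cholesky factor, so this is only one simple convention for carrying the hat matrix over to correlated errors. In practice the covariance parameters would also have to be estimated, for example from a fitted variogram, rather than assumed known.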

Original language: English
Journal: International Journal of Geographical Information Science
State: Accepted/In press - 2022
Externally published: Yes

Keywords

  • Geostatistics
  • global influence
  • hat matrix
  • linear regression
  • Monte Carlo simulation
  • spatial correlation
