PLS1-MD: A partial least squares regression algorithm for solving missing data problems

Víctor González, Ramón Giraldo, Víctor Leiva

Research output: Contribution to journal › Article › Peer-reviewed


In this article, we propose a methodology that modifies the partial least squares (PLS) regression algorithm. Certain steps of the algorithm are adjusted to address the estimation problem in multiple linear regression when there are missing data (MD). The modified algorithm, called PLS1-MD, is based on the available data principle: it allows multiple regression analysis even when values are missing in the response or in some of the explanatory variables, without the need for imputation. PLS1-MD can be applied under multicollinearity (the explanatory variables are correlated, producing linear combinations among the columns of the design matrix) and under high dimensionality (the number of individuals is smaller than the number of variables). The PLS1-MD algorithm ensures orthogonality, orthonormality of the coefficient vector, and optimality at each stage. The procedure is illustrated using the Cornell and Yarn datasets, which are widely used in the context of PLS1 regression. For this purpose, 10% of the data are randomly deleted and labeled as MD. The results indicate that the estimates obtained with the PLS1-MD algorithm are very similar to those generated by applying PLS1 to the complete set of observations, with no MD. Because the new algorithm does not impute missing values, it preserves the properties of centrality and orthogonality. We compare the results of our approach with those obtained using the R libraries pls and plsdepot. In the scenario with no MD, the results are the same. In the presence of MD, the pls library cannot be used, and only plsdepot handles the case of MD in the explanatory variables.
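The abstract describes PLS1-MD only in outline. To make the available data principle concrete, the following sketch shows it inside a NIPALS-style PLS1 loop written in Python with NumPy: every inner product is summed only over entries that are observed in both vectors, so no value is ever imputed. This is a hypothetical illustration, not the published PLS1-MD algorithm; the function name `pls1_available_data`, its update rules (the classical NIPALS treatment of missing cells) and the synthetic data are assumptions, and the Cornell and Yarn datasets are not used.

```python
import numpy as np


def pls1_available_data(X, y, n_components):
    """Hypothetical helper: NIPALS-style PLS1 that skips np.nan entries.

    Each inner product is summed only over entries observed in both factors
    (available data idea). Illustrative sketch, NOT the published PLS1-MD code.
    Returns (coefficients, intercept) on the original, uncentered scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    M = (~np.isnan(X)).astype(float)      # 1.0 where x_ij is observed
    m_y = (~np.isnan(y)).astype(float)    # 1.0 where y_i is observed
    if not M.any(axis=1).all():
        raise ValueError("every row of X needs at least one observed value")

    # Center with means of the observed values only; nothing is imputed.
    x_mean = np.nanmean(X, axis=0)
    y_mean = np.nanmean(y)
    E = np.where(M > 0, X - x_mean, 0.0)  # zeros keep missing cells out of sums
    f = np.where(m_y > 0, y - y_mean, 0.0)

    p = X.shape[1]
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    c = np.zeros(n_components)
    for h in range(n_components):
        # Weights: w_j = sum_i x_ij y_i / sum_i y_i^2 over rows with both observed.
        w = (E.T @ f) / (M.T @ f**2)
        w /= np.linalg.norm(w)
        # Scores: t_i = sum_j x_ij w_j / sum_j w_j^2 over observed j in row i.
        t = (E @ w) / (M @ w**2)
        # Loadings and y-coefficient, again over observed entries only.
        p_h = (E.T @ t) / (M.T @ t**2)
        c_h = (f @ t) / (m_y @ t**2)
        # Deflate, then re-zero the missing cells so they never enter sums.
        E = (E - np.outer(t, p_h)) * M
        f = (f - c_h * t) * m_y
        W[:, h], P[:, h], c[h] = w, p_h, c_h

    B = W @ np.linalg.solve(P.T @ W, c)   # coefficients for centered data
    return B, y_mean - x_mean @ B


# Synthetic example (assumed data, not the Cornell or Yarn datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=20)

# Complete data with all 4 components: PLS1 coincides with least squares.
B_full, b0_full = pls1_available_data(X, y, n_components=4)

# Randomly delete about 10% of the cells of X, plus two responses.
X_md = X.copy()
mask = rng.random(X.shape) < 0.10
mask[mask.all(axis=1), 0] = False  # keep at least one value per row
X_md[mask] = np.nan
y_md = y.copy()
y_md[[3, 11]] = np.nan
B_md, b0_md = pls1_available_data(X_md, y_md, n_components=2)
```

With complete data the loop reduces to ordinary NIPALS PLS1, and with as many components as variables its coefficients match least squares, which gives a simple sanity check. With missing cells, the scores from this sketch need not be exactly orthogonal; the abstract states that PLS1-MD is designed to ensure orthogonality.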

Original language: English
Article number: 104876
Journal: Chemometrics and Intelligent Laboratory Systems
Status: Published, 15 Sep 2023

