Sample selection bias arises when a variable of interest, or response, is correlated with a latent variable. This problem occurs when some of the observations of the response variable are censored. The Heckman sample selection model assumes that the response and latent variables follow a bivariate normal distribution and models them jointly. This assumption has recently been relaxed through more flexible models based on the Student-t distribution, which has appealing statistical properties. In this article, we introduce an extension of the Heckman sample selection model to the wide class of symmetric distributions. In this new class of sample selection models, covariates are used to model the dispersion and correlation parameters, which accounts for heteroscedasticity and sample selection bias, respectively. We derive mathematical and statistical properties of the proposed model and estimate its parameters by maximum likelihood. The bivariate Heckman-Student-t model, a special member of the family of symmetric Heckman models, is analyzed. Monte Carlo simulations are performed to assess the statistical behavior of the estimation method. Two real data sets are analyzed to illustrate our results.
- Heckman models
- Multivariate elliptically contoured distributions
- Symmetric distributions
- Varying correlation and dispersion
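For reference, the sketch below gives the log-likelihood of the classical Heckman model under bivariate normality. The symmetric models described above generalize this baseline. It is a minimal illustration and not the authors' implementation. It assumes the standard setup: a selection indicator u_i = 1{w_i'γ + ε₁ᵢ > 0}; an outcome y_i = x_i'β + ε₂ᵢ that is observed only when u_i = 1; and (ε₁, ε₂) bivariate normal with Var(ε₁) = 1, Var(ε₂) = σ², and correlation ρ. The function names (`heckman_normal_loglik`, `norm_logcdf`, and so on) are hypothetical. Only Python's standard library is used.

```python
import math
from typing import Optional, Sequence


def norm_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z: float) -> float:
    """Log of the standard normal CDF, using Phi(z) = 0.5 * erfc(-z / sqrt(2))."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def heckman_normal_loglik(
    beta: Sequence[float],
    gamma: Sequence[float],
    sigma: float,
    rho: float,
    X: Sequence[Sequence[float]],
    W: Sequence[Sequence[float]],
    y: Sequence[Optional[float]],
    selected: Sequence[bool],
) -> float:
    """Log-likelihood of the classical (bivariate normal) Heckman model.

    Hypothetical helper for illustration: the symmetric Heckman models of
    the article replace the normal density/CDF and let sigma and rho
    depend on covariates, which is not shown here.
    """
    if sigma <= 0.0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma > 0 and -1 < rho < 1")
    total = 0.0
    for x_i, w_i, y_i, u_i in zip(X, W, y, selected):
        eta = dot(w_i, gamma)  # selection index w_i' gamma
        if not u_i:
            # Outcome not observed: contributes log P(u_i = 0) = log Phi(-eta)
            total += norm_logcdf(-eta)
        else:
            # Outcome observed: log f(y_i) + log P(u_i = 1 | y_i)
            z = (y_i - dot(x_i, beta)) / sigma  # standardized residual
            total += norm_logpdf(z) - math.log(sigma)
            total += norm_logcdf((eta + rho * z) / math.sqrt(1.0 - rho * rho))
    return total
```

When ρ = 0, the log-likelihood splits into a probit part for selection and a normal regression part for the observed outcomes. This is why a nonzero correlation is what signals selection bias. Maximum likelihood estimates would be obtained by maximizing this function over (β, γ, σ, ρ), for example with a generic numerical optimizer.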