TY - JOUR
T1 - Real-time crash prediction in an urban expressway using disaggregated data
AU - Basso, Franco
AU - Basso, Leonardo J.
AU - Bravo, Francisco
AU - Pezoa, Raul
N1 - Publisher Copyright: © 2017 Elsevier Ltd
PY - 2018/1
Y1 - 2018/1
AB - We develop accident prediction models for a stretch of the urban expressway Autopista Central in Santiago, Chile, using disaggregate data captured by free-flow toll gates with Automatic Vehicle Identification (AVI) which, besides their low failure rate, have the advantage of providing disaggregated data per type of vehicle. The process includes a random forest procedure to identify the strongest precursors of accidents, and the calibration/estimation of two classification models, namely, Support Vector Machine and Logistic regression. We find that, for this stretch of the highway, vehicle composition does not play a first-order role. Our best model accurately predicts 67.89% of the accidents with a low false positive rate of 20.94%. These results are among the best in the literature even though, and as opposed to previous efforts, (i) we do not use only one partition of the data set for calibration and validation but conduct 300 repetitions of randomly selected partitions; (ii) our models are validated on the original unbalanced data set (where accidents are quite rare events), rather than on artificially balanced data.
KW - Automatic vehicle identification
KW - Logistic regression
KW - Real-time crash prediction
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=85035138390&partnerID=8YFLogxK
U2 - 10.1016/j.trc.2017.11.014
DO - 10.1016/j.trc.2017.11.014
M3 - Article
AN - SCOPUS:85035138390
SN - 0968-090X
VL - 86
SP - 202
EP - 219
JO - Transportation Research Part C: Emerging Technologies
JF - Transportation Research Part C: Emerging Technologies
ER -