Subsampling the concurrent AdaBoost algorithm: An efficient approach for large datasets

Héctor Allende-Cid, Diego Acuña, Héctor Allende

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In this work we propose a subsampled version of the Concurrent AdaBoost algorithm to handle large datasets efficiently. The proposal builds on a concurrent computing approach that improves the estimation of the distribution weights in the algorithm and thereby its generalization capacity. In each round, we train several weak hypotheses in parallel and use their weighted ensemble to update the distribution weights for the following boosting rounds. Instead of creating resamples equal in size to the original dataset, we subsample the dataset to speed up the training phase. We validate the proposal with different resample sizes on 3 datasets, obtaining promising results: the resample size does not considerably affect the performance of the algorithm, while the execution time improves greatly.
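The round structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the decision-stump weak learner, the way each parallel hypothesis is weighted (by its weighted error on the full data), and the names `subsampled_concurrent_adaboost`, `frac`, and `workers` are all assumptions made for illustration. Threads are used only to keep the example self-contained; in CPython a process pool would be needed for a real speed-up.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def fit_stump(X, y, idx):
    """Best (feature, threshold, polarity) stump on the rows listed in idx."""
    best = (len(idx) + 1, 0, 0.0, 1)
    for f in range(len(X[0])):
        vals = sorted({X[i][f] for i in idx})
        # Candidate thresholds: below the minimum, then midpoints between values.
        thresholds = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
        for thr in thresholds:
            for pol in (1, -1):
                errs = sum(1 for i in idx if (pol if X[i][f] >= thr else -pol) != y[i])
                if errs < best[0]:
                    best = (errs, f, thr, pol)
    return best[1:]


def stump_predict(stump, x):
    f, thr, pol = stump
    return pol if x[f] >= thr else -pol


def weighted_vote(members, x):
    """Sign of the alpha-weighted vote over (alpha, stump) pairs."""
    s = sum(a * stump_predict(h, x) for a, h in members)
    return 1 if s >= 0 else -1


def alpha_from_error(err, eps=1e-10):
    """AdaBoost coefficient, with the error clipped away from 0 and 1."""
    err = min(max(err, eps), 1 - eps)
    return 0.5 * math.log((1 - err) / err)


def subsampled_concurrent_adaboost(X, y, rounds=10, workers=4, frac=0.2, seed=0):
    """Labels in y must be -1 or +1. frac is the resample size relative to len(X)."""
    rng = random.Random(seed)
    n = len(X)
    m = max(1, int(frac * n))          # subsample size (assumed parameterisation)
    D = [1.0 / n] * n                  # distribution weights
    ensemble = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            # One weighted subsample of size m per parallel weak learner.
            samples = [rng.choices(range(n), weights=D, k=m) for _ in range(workers)]
            stumps = list(pool.map(lambda idx: fit_stump(X, y, idx), samples))
            # Assumption: weight each stump by its D-weighted error on the full data.
            members = []
            for h in stumps:
                err = sum(D[i] for i in range(n) if stump_predict(h, X[i]) != y[i])
                members.append((alpha_from_error(err), h))
            # The weighted ensemble of the round drives the weight update.
            preds = [weighted_vote(members, X[i]) for i in range(n)]
            err_t = sum(D[i] for i in range(n) if preds[i] != y[i])
            a_t = alpha_from_error(err_t)
            ensemble.append((a_t, members))
            D = [D[i] * math.exp(-a_t * y[i] * preds[i]) for i in range(n)]
            z = sum(D)
            D = [d / z for d in D]
    return ensemble


def predict(ensemble, x):
    s = sum(a * weighted_vote(members, x) for a, members in ensemble)
    return 1 if s >= 0 else -1
```

In this sketch, lowering `frac` shrinks every resample and therefore the cost of fitting each weak hypothesis, which is the trade-off the abstract reports on; the update of the distribution weights itself still touches the full dataset.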

Original language: English
Title of host publication: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications - 21st Iberoamerican Congress, CIARP 2016, Proceedings
Editors: Cesar Beltran-Castanon, Fazel Famili, Ingela Nystrom
Publisher: Springer Verlag
Pages: 318-325
Number of pages: 8
ISBN (Print): 9783319522760
DOIs
State: Published - 2017
Externally published: Yes
Event: 21st Iberoamerican Congress on Pattern Recognition, CIARP 2016 - Lima, Peru
Duration: 8 Nov 2016 to 11 Nov 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10125 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st Iberoamerican Congress on Pattern Recognition, CIARP 2016
Country/Territory: Peru
City: Lima
Period: 8/11/16 to 11/11/16

Keywords

  • Classification
  • Concurrent AdaBoost
  • Large data sets classification
  • Machine learning
  • Subsampling
