Multi-Label Text Classification with Multi-Variate Bernoulli Model and Label Dependent Representation

Research output: Contribution to journal › Article › peer-review

Abstract

Assigning natural language texts to one or more predefined categories based on their content is an important component of, and a growing need in, many information organization and management tasks. Automatic text classification is the task of assigning documents to a predefined set of classes by means of a computational method or model. Text representation for classification has traditionally relied on the vector space model because of its simplicity and good performance. Multi-label automatic text classification, in turn, has typically been addressed either by transforming the problem so that binary techniques can be applied or by adapting binary algorithms to work with multiple labels. The objective of this article is to evaluate a term-weighting factor in the Boolean model for text representation in multi-label classification, combining two approaches: problem transformation and model adaptation. The term-weighting factor and the combination of approaches were tested on four textual datasets used in the specialized literature and compared with alternative techniques using three evaluation measures. The results show improvements of more than 10% in classifier performance, attributed to our proposal, in all the cases analyzed.
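As an illustration of the general setting described in the abstract, the following minimal sketch combines a Boolean (term present or absent) representation with a multi-variate Bernoulli naive Bayes classifier, applied once per label through the binary relevance form of problem transformation. It is not the authors' method: the abstract does not specify the term-weighting factor or the label-dependent representation, so neither is reproduced here. All names (`tokenize`, `BernoulliNB`, `fit_binary_relevance`, `predict_labels`) and the toy data are hypothetical.

```python
import math


def tokenize(text):
    """Boolean representation: the set of terms present in the text."""
    return set(text.lower().split())


class BernoulliNB:
    """Binary multi-variate Bernoulli naive Bayes over term presence/absence."""

    def fit(self, docs, y, vocab):
        self.vocab = vocab
        self.log_prior = {}
        self.p = {}
        for c in (0, 1):
            members = [d for d, t in zip(docs, y) if t == c]
            n = len(members)
            # Laplace smoothing on both the class prior and the term probabilities.
            self.log_prior[c] = math.log((n + 1) / (len(docs) + 2))
            self.p[c] = {
                w: (sum(w in d for d in members) + 1) / (n + 2) for w in vocab
            }
        return self

    def predict(self, doc):
        scores = {}
        for c in (0, 1):
            s = self.log_prior[c]
            for w in self.vocab:
                pw = self.p[c][w]
                # Absent terms contribute too, which is what distinguishes
                # the Bernoulli model from the multinomial one.
                s += math.log(pw if w in doc else 1.0 - pw)
            scores[c] = s
        return int(scores[1] > scores[0])


def fit_binary_relevance(texts, label_sets):
    """Problem transformation: train one binary classifier per label."""
    docs = [tokenize(t) for t in texts]
    vocab = sorted(set().union(*docs))
    labels = sorted(set().union(*label_sets))
    return {
        lab: BernoulliNB().fit(docs, [int(lab in ls) for ls in label_sets], vocab)
        for lab in labels
    }


def predict_labels(models, text):
    """Return every label whose binary classifier accepts the text."""
    doc = tokenize(text)
    return {lab for lab, model in models.items() if model.predict(doc)}


# Toy data, invented for illustration only.
texts = [
    "goal match team",
    "election vote party",
    "team election stadium vote",
    "match goal stadium",
]
label_sets = [{"sports"}, {"politics"}, {"sports", "politics"}, {"sports"}]

models = fit_binary_relevance(texts, label_sets)
print(predict_labels(models, "vote party election"))
print(predict_labels(models, "goal match team"))
```

A term-weighting factor such as the one evaluated in the article would replace or rescale the per-term contributions inside `predict`; where exactly it enters is left open here because the abstract does not say.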

Original language: English
Pages (from-to): 549-567
Number of pages: 19
Journal: Revista Signos
Volume: 53
Issue number: 104
State: Published - 2020
Externally published: Yes

Keywords

  • Multi-label
  • problem transformation
  • term weighting
  • text classification
  • text representation
