Abstract
This paper presents a methodological proposal for the automatic induction of a multilingual taxonomy of discourse markers, which in English correspond to units such as however, therefore and by the way. First, a method is proposed to separate these units from the rest of the vocabulary by means of an information measure; a second method then groups them using a parallel corpus. Finally, the resulting categorization serves as the basis for extracting and classifying new units. In addition to the method, the paper reports first results: a database that currently contains more than 2,600 units.
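The abstract does not say how the parallel corpus is used to group the units. As a purely illustrative sketch (not the authors' method), assume each Spanish marker has already been paired with the set of English expressions it is aligned with in the corpus. Markers can then be grouped by how much their translation sets overlap, measured with Jaccard similarity and linked by single linkage above a threshold. The functions `jaccard` and `group_markers`, the 0.3 threshold and the toy `markers` data are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two translation sets: |A & B| / |A | B| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_markers(translations: dict[str, set[str]],
                  threshold: float = 0.3) -> list[list[str]]:
    """Hypothetical single-linkage grouping: two markers end up in the same
    group if a chain of pairs with Jaccard similarity >= threshold connects them."""
    parent = {m: m for m in translations}  # union-find forest

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of markers whose translation sets overlap enough.
    for a, b in combinations(translations, 2):
        if jaccard(translations[a], translations[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect the members of each root into a group.
    groups: dict[str, list[str]] = {}
    for m in translations:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Invented toy data: Spanish markers mapped to aligned English expressions.
markers = {
    "sin embargo": {"however", "nevertheless", "yet"},
    "no obstante": {"however", "nevertheless"},
    "por lo tanto": {"therefore", "thus", "so"},
    "por consiguiente": {"therefore", "consequently", "thus"},
    "por cierto": {"by the way", "incidentally"},
}

print(group_markers(markers))
# [['no obstante', 'sin embargo'], ['por cierto'], ['por consiguiente', 'por lo tanto']]
```

Single linkage is chosen here only because it needs no preset number of groups. Any real system would also have to handle alignment noise and markers that take several functions.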
Translated title of the contribution | Automatic induction of a multilingual taxonomy of discourse markers: first results in Spanish, English, French, German and Catalan |
---|---|
Original language | Spanish |
Pages (from-to) | 127-138 |
Number of pages | 12 |
Journal | Procesamiento de Lenguaje Natural |
Volume | 67 |
State | Published - Sep 2021 |