This paper presents a methodological proposal for the automatic induction of a multilingual taxonomy of discourse markers, which in English correspond to units such as "however", "therefore" and "by the way". First, a method is proposed to separate such units from the rest of the vocabulary using a measure of information, followed by a method to group them using a parallel corpus. This categorization is then used as the basis for extracting and classifying new units. In addition to the method, the paper describes the first results: a database that currently exceeds 2,600 units.
|Translated title of the contribution||Automatic induction of a multilingual taxonomy of discourse markers: first results in Spanish, English, French, German and Catalan|
|Number of pages||12|
|Journal||Procesamiento de Lenguaje Natural|
|State||Published - Sep 2021|