A proposal for the inductive categorisation of parenthetical discourse markers in Spanish using parallel corpora

Hernán Robledo, Rogelio Nazar

Research output: Contribution to journal, Article, peer-reviewed

Abstract

We propose a method for the automatic induction of categories of Spanish discourse markers using parallel corpora, based on a quantitative and empirical approach that minimises explicit linguistic knowledge. We conducted the analysis using a large Spanish-English parallel corpus. First, we used this corpus to obtain a list of parenthetical discourse markers in each language. Then, we used it as a “semantic mirror”, inspecting the English equivalences and assessing which Spanish discourse markers fulfil a similar function in discourse and vice versa. The result of this procedure is an emerging categorisation of discourse markers. The main contribution is to offer empirical evidence for the adequacy of existing manually-compiled taxonomies and the potential for discovery of new, previously unaccounted-for categories. In this article we focus on units pertaining to the Spanish language but, since the method is purely quantitative, it is possible to apply it to different languages as well.
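The “semantic mirror” idea can be illustrated with a toy sketch. This is not the authors' implementation: the comma-based marker heuristic, the function names `initial_marker` and `mirror_groups`, and the example sentence pairs are all assumptions made up for illustration. The sketch takes sentence-aligned Spanish-English pairs, treats a short sentence-initial segment before the first comma as a candidate parenthetical marker, and groups Spanish markers that share the same most frequent English equivalent.

```python
from collections import Counter, defaultdict


def initial_marker(sentence, max_words=4):
    """Return the short segment before the first comma, lowercased, or None.

    Hypothetical heuristic: a sentence-initial segment of at most
    `max_words` words followed by a comma is a candidate marker.
    """
    head, sep, _ = sentence.partition(",")
    words = head.split()
    if sep and 0 < len(words) <= max_words:
        return head.strip().lower()
    return None


def mirror_groups(pairs):
    """Group Spanish markers by their most frequent English counterpart."""
    counts = defaultdict(Counter)
    for es, en in pairs:
        m_es, m_en = initial_marker(es), initial_marker(en)
        if m_es and m_en:
            counts[m_es][m_en] += 1

    groups = defaultdict(set)
    for m_es, en_counts in counts.items():
        top_en, _ = en_counts.most_common(1)[0]
        groups[top_en].add(m_es)
    return dict(groups)


# Invented example pairs, standing in for a real aligned corpus.
pairs = [
    ("Sin embargo, no estamos de acuerdo.", "However, we do not agree."),
    ("No obstante, hay dudas.", "However, there are doubts."),
    ("Por lo tanto, votaremos a favor.", "Therefore, we will vote in favour."),
    ("Por consiguiente, se aprueba.", "Therefore, it is approved."),
    ("Sin embargo, el texto es claro.", "Nevertheless, the text is clear."),
    ("Sin embargo, queda trabajo.", "However, work remains."),
]

groups = mirror_groups(pairs)
for english, spanish in sorted(groups.items()):
    print(english, "<-", sorted(spanish))
```

On these invented pairs, the contrastive markers end up under one English pivot and the consecutive markers under another, which is the kind of emergent grouping the abstract describes. A realistic version would need far more data and a less naive way of detecting markers and choosing equivalents.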

Original language: English
Pages (from-to): 500-527
Number of pages: 28
Journal: International Journal of Corpus Linguistics
Volume: 28
Issue number: 4
Status: Published - 20 Jul 2023
