A proposal for the inductive categorisation of parenthetical discourse markers in Spanish using parallel corpora

Hernán Robledo, Rogelio Nazar

Research output: Contribution to journal, Article, peer-review

Abstract

We propose a method for the automatic induction of categories of Spanish discourse markers using parallel corpora. The method follows a quantitative and empirical approach that minimises explicit linguistic knowledge. We conducted the analysis using a large Spanish-English parallel corpus. First, we used this corpus to obtain a list of parenthetical discourse markers in each language. Then, we used it as a “semantic mirror”: we inspected the English equivalences to assess which Spanish discourse markers fulfil a similar function in discourse, and vice versa. The result of this procedure is an emerging categorisation of discourse markers. The main contribution is to offer empirical evidence for the adequacy of existing manually compiled taxonomies and to show the potential for discovering new, previously unaccounted-for categories. In this article we focus on units of the Spanish language but, since the method is purely quantitative, it can be applied to other languages as well.
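The “semantic mirror” idea can be illustrated with a minimal sketch. It assumes that each Spanish marker is represented by counts of the English markers it is aligned with, and that markers with similar translation profiles are grouped together. The toy counts, the cosine measure, the 0.5 threshold and the single-linkage merging (`cosine`, `cluster`, `ALIGNMENTS`) are all hypothetical choices made for illustration. They are not taken from the article and should not be read as the authors' actual procedure.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data (invented, NOT corpus results): for each Spanish
# marker, how often it was aligned with each English marker.
ALIGNMENTS = {
    "sin embargo": Counter({"however": 8, "nevertheless": 3}),
    "no obstante": Counter({"however": 5, "nevertheless": 4}),
    "por ejemplo": Counter({"for example": 9, "for instance": 4}),
    "verbigracia": Counter({"for instance": 2, "for example": 1}),
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse translation-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(profiles: dict, threshold: float = 0.5) -> list:
    """Single-linkage grouping: repeatedly merge two groups if any pair of
    markers across them has a similarity at or above the threshold."""
    groups = [{m} for m in profiles]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(cosine(profiles[a], profiles[b]) >= threshold
                       for a in groups[i] for b in groups[j]):
                    groups[i] |= groups.pop(j)  # fold group j into group i
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the group list changed
    return groups


if __name__ == "__main__":
    for group in sorted(sorted(g) for g in cluster(ALIGNMENTS)):
        print(group)
```

With these toy counts the sketch groups “sin embargo” with “no obstante” and “por ejemplo” with “verbigracia”, because each pair shares its English equivalents while the two pairs share none. A real application would need aligned corpus data and a principled choice of similarity measure and clustering method.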

Original language: English
Pages (from-to): 500-527
Number of pages: 28
Journal: International Journal of Corpus Linguistics
Volume: 28
Issue number: 4
State: Published, 20 Jul 2023

Keywords

  • Spanish
  • clustering
  • discourse markers
  • inductive methods
  • parallel corpus
