Formalización de reglas para la detección del plural en castellano en el caso de unidades no diccionarizadas

Translated title of the contribution: Formalization of rules for the detection of plurals in Spanish in the case of out-of-vocabulary units

Rogelio Nazar, Amparo Galdames

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This paper presents a formalization of rules on plural formation in Spanish to be used in the processing of specialized terminology, since terms are frequently not found in general-language dictionaries and therefore cannot be lemmatized or POS-tagged. The absence of terms from general dictionaries has negative effects on tasks such as terminology extraction, particularly in the case of morphologically rich languages. We attack the problem by cascading through multiple transfer rules, regular expressions and lexical acquisition from large corpora. Results show a significant reduction of the error rate of two POS-taggers: TreeTagger and UDPipe. We offer an open-source implementation which works as a post-process, cleaning up after the tagger.
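The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rule list, the function names `singular_candidates` and `lemmatize_plural`, and the toy vocabulary are assumptions made here for illustration only. Each regular expression proposes a singular candidate, and a candidate is accepted only if it is attested in a vocabulary assumed to have been collected from a large corpus.

```python
import re

# Hypothetical, deliberately small rule set (NOT the rules from the paper).
# Each pair is (pattern on the plural form, replacement yielding a singular).
RULES = [
    (re.compile(r"ces$"), "z"),                # luces -> luz
    (re.compile(r"iones$"), "ión"),            # funciones -> función
    (re.compile(r"([aeiouáéíóú])s$"), r"\1"),  # casas -> casa
    (re.compile(r"([dlnrjy])es$"), r"\1"),     # árboles -> árbol
]


def singular_candidates(word):
    """Return every singular form proposed by the rules, in rule order."""
    w = word.lower()
    return [pattern.sub(repl, w) for pattern, repl in RULES if pattern.search(w)]


def lemmatize_plural(word, corpus_vocab):
    """Return the first candidate attested in corpus_vocab.

    If no candidate is attested, the lowercased word is returned unchanged,
    which covers invariant forms such as 'virus'.
    """
    for candidate in singular_candidates(word):
        if candidate in corpus_vocab:
            return candidate
    return word.lower()


# Toy vocabulary standing in for word forms acquired from a corpus.
vocab = {"luz", "función", "enlace", "árbol", "casa", "virus"}

for form in ["luces", "enlaces", "funciones", "árboles", "casas", "virus"]:
    print(form, "->", lemmatize_plural(form, vocab))
```

Checking candidates against attested forms is what separates cases such as "luces" (singular "luz") from "enlaces" (singular "enlace"), which the first rule alone would get wrong. In a post-processing setting like the one the abstract describes, a function of this kind would be applied only to tokens for which the tagger produced no lemma.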

Original language: Spanish
Pages (from-to): 17-32
Number of pages: 16
Journal: Linguamatica
Volume: 11
Issue number: 2
State: Published - 2019

Keywords

  • Lemmatization
  • Out-of-vocabulary units
  • Part-of-speech tagging
  • Rules for plural in Spanish
