TY - JOUR
T1 - Formalización de reglas para la detección del plural en castellano en el caso de unidades no diccionarizadas
AU - Nazar, Rogelio Antonio
AU - Galdames, Amparo
N1 - Publisher Copyright:
© 2019 Universidade do Minho. All rights reserved.
PY - 2019
Y1 - 2019
N2 - This paper presents a formalization of rules on plural formation in Spanish to be used in the processing of specialized terminology, as it is frequently the case that terms are not found in dictionaries of general language and therefore cannot be lemmatized or POS-tagged. The absence of terms in general dictionaries has negative effects in tasks such as terminology extraction, particularly in the case of morphologically rich languages. We attack the problem by cascading through multiple transfer rules, regular expressions and lexical acquisition from large corpora. Results show a significant reduction of the error rate of two POS-taggers: TreeTagger and UDPipe. We offer an open-source implementation which works as a post-process, cleaning up after the tagger.
AB - This paper presents a formalization of rules on plural formation in Spanish to be used in the processing of specialized terminology, as it is frequently the case that terms are not found in dictionaries of general language and therefore cannot be lemmatized or POS-tagged. The absence of terms in general dictionaries has negative effects in tasks such as terminology extraction, particularly in the case of morphologically rich languages. We attack the problem by cascading through multiple transfer rules, regular expressions and lexical acquisition from large corpora. Results show a significant reduction of the error rate of two POS-taggers: TreeTagger and UDPipe. We offer an open-source implementation which works as a post-process, cleaning up after the tagger.
KW - Lemmatization
KW - Out-of-vocabulary units
KW - Part-of-speech tagging
KW - Rules for plural in Spanish
UR - http://www.scopus.com/inward/record.url?scp=85079594362&partnerID=8YFLogxK
U2 - 10.21814/lm.11.2.285
DO - 10.21814/lm.11.2.285
M3 - Article
AN - SCOPUS:85079594362
SN - 1647-0818
VL - 11
SP - 17
EP - 32
JO - Linguamatica
JF - Linguamatica
IS - 2
ER -