TY - JOUR
T1 - Detección automática de chilenismos verbales a partir de reglas morfosintácticas. Resultados preliminares
AU - Koza, Walter A.
AU - Faccio, Pedro Alfaro
AU - Gamboa, Ricardo Martínez
N1 - Publisher Copyright:
© 2015 Sociedad Española para el Procesamiento del Lenguaje Natural.
PY - 2015/3/1
Y1 - 2015/3/1
N2 - This paper describes the work carried out to build an automatic extractor of verbal chilenismos based on natural language rules. To this end, lexical, morphological, and syntactic features were formalized for subsequent computational implementation. First, verbal chilenismos were classified into four types, according to the usage recorded in dictionaries and to their syntactic features: pure, pure-clitic, of sense, and of sense-clitic. Second, syntactic rules were established for automatic recognition. Smorph and the Post Smorph Module, both of which operate on natural language rules, were used for the computational work. The method was tested on a corpus of 5194 tweets produced in Chile, obtaining 85.54% precision, 96.16% coverage, and an F-measure of 90.53%. The results show that the method is suitable for this kind of task; nevertheless, some limitations and errors were detected, and new, more specific rules are needed both for recognition and for filtering out wrongly tagged items. This research was funded by FONDECYT project 11130469.
AB - This paper describes the work carried out to build an automatic extractor of verbal chilenismos based on natural language rules. To this end, lexical, morphological, and syntactic features were formalized for subsequent computational implementation. First, verbal chilenismos were classified into four types, according to the usage recorded in dictionaries and to their syntactic features: pure, pure-clitic, of sense, and of sense-clitic. Second, syntactic rules were established for automatic recognition. Smorph and the Post Smorph Module, both of which operate on natural language rules, were used for the computational work. The method was tested on a corpus of 5194 tweets produced in Chile, obtaining 85.54% precision, 96.16% coverage, and an F-measure of 90.53%. The results show that the method is suitable for this kind of task; nevertheless, some limitations and errors were detected, and new, more specific rules are needed both for recognition and for filtering out wrongly tagged items. This research was funded by FONDECYT project 11130469.
KW - Automatic detection
KW - MPS
KW - Morphosyntactic rules
KW - Smorph
KW - Verbal chilenismo
UR - http://www.scopus.com/inward/record.url?scp=84925664611&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84925664611
SN - 1135-5948
VL - 54
SP - 69
EP - 76
JO - Procesamiento de Lenguaje Natural
JF - Procesamiento de Lenguaje Natural
ER -