In this paper, the tasks made for obtaining an automatic extractor for verbal chilenismos using natural language rules are described. With this objective, a formalization of lexical, morphological and syntactic features was made, for a subsequent computational implementation. Firstly, verbal chilenismos were classified in four kinds, according to the use registered in the dictionaries and syntactic features: pure, pure-clitic, of sense, and of senseclitic. Secondly, syntactic rules were established for the automatic recognition. Smorph and Post Smorph Module were used in the computational work, both use natural language rules. The method was tested in a corpus composed by 5194 tweets produced in Chile, obtaining 85.54% of precision, 96.16% of coverage, and 90.53% of F-measure. The results show that this method is able for this kind of work, all the same, some limitations and mistakes were detected and more specific and new rules are necessary for the recognition task and for filtering wrong tagged. This research was founded by FONDECYT 11130469 project.
|Translated title of the contribution||Automatic detection of verbal chilenismos using morphosyntactic rules. First results|
|Number of pages||8|
|Journal||Procesamiento de Lenguaje Natural|
|State||Published - 1 Mar 2015|