A taxonomy of Spanish nouns, a statistical algorithm to generate it and its implementation in open source code

Rogelio Nazar, Irene Renau Araque

Research output: Chapter in book/report/conference proceeding; Conference contribution; Peer-reviewed

4 Citations (Scopus)

Abstract

In this paper we describe our work in progress on the automatic development of a taxonomy of Spanish nouns, offer the Perl implementation we have so far, and discuss the problems that still need to be addressed. We designed a statistically based taxonomy induction algorithm that combines several strategies, none of which involves explicit linguistic knowledge. Although all of them are quantitative, the strategies we present differ in nature. Some are based on the computation of distributional similarity coefficients, which identify pairs of sibling words or co-hyponyms, while others are based on asymmetric co-occurrence and identify pairs of parent-child words or hypernym-hyponym relations. A decision-making process is then applied to combine the results of the previous steps and, finally, to connect lexical units to a basic structure containing the most general categories of the language. We evaluate the quality of the taxonomy both manually and using Spanish WordNet as a gold standard. We estimate an average of 89.07% precision and 25.49% recall when considering only the results that the algorithm presents with a high degree of certainty, or 77.86% precision and 33.72% recall when considering all results.
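The two families of strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation. The toy contexts, the choice of the Jaccard coefficient as the similarity measure, the conditional-proportion measure of asymmetry, and all function names are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: each entry is the set of lemmas seen in one context.
CONTEXTS = [
    {"perro", "animal", "ladrar", "amo"},
    {"gato", "animal", "dormir", "amo"},
    {"perro", "animal", "correr"},
    {"gato", "animal", "maullar"},
    {"animal", "planta", "especie"},
    {"perro", "gato", "amo"},
]


def context_index(contexts):
    """Map each word to the set of context ids in which it occurs."""
    index = defaultdict(set)
    for i, ctx in enumerate(contexts):
        for word in ctx:
            index[word].add(i)
    return index


def neighbours(word, contexts):
    """Return every word that shares at least one context with `word`."""
    out = set()
    for ctx in contexts:
        if word in ctx:
            out |= ctx - {word}
    return out


def jaccard(a, b, contexts):
    """Symmetric similarity of the co-occurrence profiles of a and b.

    A high value suggests sibling words (co-hyponym candidates).
    Jaccard is an assumed stand-in for the paper's coefficients.
    """
    na = neighbours(a, contexts) - {b}
    nb = neighbours(b, contexts) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def asymmetry(child, parent, index):
    """Return (P(parent | child), P(child | parent)) as context-count ratios.

    If the first value is clearly larger than the second, `parent` is a
    hypernym candidate for `child` (assumed heuristic).
    """
    both = len(index[child] & index[parent])
    return both / len(index[child]), both / len(index[parent])


if __name__ == "__main__":
    idx = context_index(CONTEXTS)
    print("sibling score perro/gato:", jaccard("perro", "gato", CONTEXTS))
    print("asymmetry perro -> animal:", asymmetry("perro", "animal", idx))
```

In this toy data, "perro" and "gato" share part of their co-occurrence profile, while "animal" appears in most contexts of "perro" but not the reverse. That is the kind of symmetric versus asymmetric signal the abstract describes. A real system would still need the decision step that combines both signals.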

Original language: English
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Editors: Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
Publisher: European Language Resources Association (ELRA)
Pages: 1485-1492
Number of pages: 8
ISBN (electronic): 9782951740891
Publication status: Published - 1 Jan 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 23 May 2016 to 28 May 2016

Publication series

Name: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

Conference

Conference: 10th International Conference on Language Resources and Evaluation, LREC 2016
Country: Slovenia
City: Portoroz
Period: 23/05/16 to 28/05/16
