We present a methodology for deriving hypernymy relations from a lexicographic corpus, based on co-occurrence statistics between headwords and the words in their definitions. The work is part of a more extensive project devoted to the creation of a general-purpose Spanish ontology of nouns and its application to the study of predicate-argument structures. The idea of the present proposal is to extract these semantic relations using a statistical technique that allows diverse lexicographic resources to be combined. We find that the hypernyms of a word are frequently used in its definitions and, similarly, that its hyponyms are usually the words whose definitions frequently mention it. This creates a statistical association between words that allows for a taxonomic structuring of a vocabulary. Preliminary results show precision figures of 71.57% for hypernyms and 67.97% for hyponyms.
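The co-occurrence idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the association statistic, so this sketch assumes a plain count of how many definitions of a headword mention a word, with a hypothetical `min_count` threshold. The function names, the stopword list, and the English toy entries are all invented for illustration.

```python
from collections import Counter, defaultdict

def hypernym_candidates(definitions, stopwords=frozenset(), min_count=2):
    """Rank words that recur across the definitions of each headword.

    `definitions` is an iterable of (headword, definition_text) pairs,
    possibly pooled from several dictionaries. Raw counting with a
    threshold is an assumption of this sketch, not the paper's statistic.
    """
    counts = defaultdict(Counter)
    for headword, text in definitions:
        headword = headword.lower()
        # Count each word at most once per definition.
        tokens = {t for t in text.lower().split()
                  if t not in stopwords and t != headword}
        counts[headword].update(tokens)
    return {hw: [w for w, c in ctr.most_common() if c >= min_count]
            for hw, ctr in counts.items()}

def hyponym_candidates(hypernyms):
    """Invert the mapping: a word's hyponym candidates are the headwords
    whose definitions frequently mention it."""
    inverse = defaultdict(list)
    for headword, words in hypernyms.items():
        for word in words:
            inverse[word].append(headword)
    return dict(inverse)

# Hypothetical toy entries, as if taken from two different dictionaries.
toy_definitions = [
    ("dog", "domestic animal that barks"),
    ("dog", "animal of the canine family"),
    ("cat", "small domestic animal"),
    ("cat", "feline animal kept as a pet"),
]
toy_stopwords = frozenset({"that", "of", "the", "as", "a"})

hypernyms = hypernym_candidates(toy_definitions, toy_stopwords)
hyponyms = hyponym_candidates(hypernyms)
```

On this toy input, "animal" is the only word that appears in more than one definition of each headword, so it is proposed as a hypernym of both "dog" and "cat", and the inverted mapping lists both as hyponym candidates of "animal". A realistic setup would need lemmatization and a proper association measure in place of the raw threshold.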
| Number of pages | 8 |
| Journal | Procesamiento de Lenguaje Natural |
| State | Published - 1 Sep 2012 |
- Co-occurrence statistics
- Computational lexicography
- Hypernymy relations
- Taxonomy extraction