Abstract
We present a methodology that uses co-occurrence statistics between headwords and the words in their definitions to derive hypernymy relations from a lexicographic corpus. The work is part of a larger project devoted to creating a general-purpose Spanish ontology of nouns and applying it to the study of predicate-argument structures. The proposal extracts these semantic relations with a statistical technique that makes it possible to combine diverse lexicographic resources. We find that the hypernyms of a word are frequently used in its definitions and, conversely, that its hyponyms are usually the words whose definitions frequently mention it. This creates a statistical association between words that allows a vocabulary to be structured taxonomically. Preliminary results show precision of 71.57% for hypernyms and 67.97% for hyponyms.
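The core idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the statistic used, so raw definition counts stand in for it here. The toy dictionary, the stopword list, and the function names `hypernym_candidates` and `hyponym_candidates` are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Toy lexicographic corpus (hypothetical data): each headword maps to
# definitions drawn from different dictionaries.
DEFINITIONS = {
    "dog": ["domestic animal that barks", "animal kept as a pet"],
    "cat": ["small domestic animal that purrs", "feline animal kept as a pet"],
    "poodle": ["dog with curly hair", "breed of dog"],
}

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "as", "of", "that", "with", "the"}


def hypernym_candidates(definitions, stopwords=STOPWORDS):
    """For each headword, count in how many of its definitions each word occurs."""
    counts = {}
    for headword, defs in definitions.items():
        counter = Counter()
        for definition in defs:
            # Count a word at most once per definition, so the score reflects
            # agreement across sources rather than repetition within one.
            words = set(definition.lower().split()) - stopwords - {headword}
            counter.update(words)
        counts[headword] = counter
    return counts


def hyponym_candidates(counts):
    """Invert the counts: for each word, the headwords whose definitions mention it."""
    inverse = defaultdict(Counter)
    for headword, counter in counts.items():
        for word, n in counter.items():
            inverse[word][headword] = n
    return inverse


counts = hypernym_candidates(DEFINITIONS)
hyponyms = hyponym_candidates(counts)

for headword, counter in counts.items():
    print(headword, "-> top hypernym candidate:", counter.most_common(1)[0])
print("hyponym candidates of 'animal':", sorted(hyponyms["animal"]))
```

Counting each word once per definition rewards words that recur across independent sources, which is one simple way to combine several dictionaries. A real system would likely replace raw counts with an association measure and add lemmatization.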
Original language | English |
---|---|
Pages (from-to) | 83-90 |
Number of pages | 8 |
Journal | Procesamiento de Lenguaje Natural |
Volume | 49 |
State | Published - Sep 2012 |
Externally published | Yes |
Keywords
- Co-occurrence statistics
- Computational lexicography
- Hypernymy relations
- Ontologies
- Taxonomy extraction