Co-occurrence graphs applied to taxonomy extraction in scientific and technical corpora

Rogelio Antonio Nazar, Jorge Vivaldi, Leo Wanner

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Word co-occurrence graphs have been used in computational linguistics mainly for word sense disambiguation and induction, but until very recently, not for the extraction of hypernymy relations, where the methodology most often applied is the use of lexico-syntactic patterns. In this paper, we show that it is possible to use word co-occurrence statistics to extract IS-A relations between entities in scientific and technical corpora. We exploit the fact that word co-occurrence often has a direction, that is, a term might co-occur with another, but this is very often not true the other way round. This means that one can represent co-occurrence as a directed graph and this graph resembles a taxonomy. In this paper we present an experiment with texts randomly extracted from the Spanish Wikipedia, but our findings suggest that this co-occurrence behavior is a macroscopic and intrinsic property of argumentative discourse in general.
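The directional idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `directed_cooccurrence`, the threshold `min_cond`, the conditional-frequency criterion, and the toy contexts are all assumptions chosen for illustration. An edge a -> b is drawn when b appears in most of the contexts that contain a, while a appears in a smaller share of the contexts that contain b.

```python
from collections import Counter
from itertools import combinations


def directed_cooccurrence(docs, min_cond=0.6):
    """Return edges (a, b) where b occurs in most contexts of a, but not vice versa.

    Hypothetical criterion for illustration only; the paper's actual
    statistic is not specified in the abstract.
    """
    df = Counter()  # number of contexts containing each term
    co = Counter()  # number of contexts containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc))  # sorted so each pair has one canonical order
        df.update(terms)
        co.update(combinations(terms, 2))

    edges = set()
    for (a, b), n in co.items():
        p_b_given_a = n / df[a]  # share of a's contexts that also contain b
        p_a_given_b = n / df[b]  # share of b's contexts that also contain a
        if p_b_given_a >= min_cond and p_b_given_a > p_a_given_b:
            edges.add((a, b))
        elif p_a_given_b >= min_cond and p_a_given_b > p_b_given_a:
            edges.add((b, a))
    return edges


# Toy contexts (invented): each specific term always appears with "drug",
# but "drug" appears with any one of them only part of the time.
docs = [
    ["aspirin", "drug"],
    ["aspirin", "drug"],
    ["ibuprofen", "drug"],
    ["morphine", "drug"],
]
edges = directed_cooccurrence(docs)
print(sorted(edges))
```

On this toy input every edge points from the more specific term to "drug", which is the taxonomy-like shape the abstract refers to. Perfectly symmetric pairs produce no edge under this criterion.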

Original language: English
Pages (from-to): 67-74
Number of pages: 8
Journal: Procesamiento de Lenguaje Natural
Volume: 49
State: Published - Sep 2012
Externally published: Yes

Keywords

  • Co-occurrence statistics
  • Distributional semantics
  • Ontology learning
  • Quantitative linguistics
  • Taxonomy extraction
