Development of an automatic extractor of medical term candidates with linguistic techniques for Spanish

Walter Koza, María José Ojeda, Mirian Muñoz, Sofía Koza, Eduin Yepes

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

An automatic method for extracting term candidates from the medical field by applying linguistic techniques is presented. Semantic, morphological and syntactic rules were used to develop this term extractor. In the first phase, detection was performed by applying a standard dictionary. This dictionary was loaded into the analyzer software, which assigned the tag 'TC' ('Term Candidate') to the words that could be considered terms. Morphological and syntactic rules were then used to try to deduce the part of speech of the words not contained in the dictionary (WNCD). Afterwards, the nominal phrases that included WNCD were gathered and extracted as term candidates of the field. The software used in this project was Smorph and the Post Smorph Module (MPS), both of which work on groups, and Xerox's Xfst. Smorph performs the morphological analysis of character strings and assigns morphological and POS tags to each occurrence according to the features given. MPS, in turn, takes the output of Smorph as its input and, using recomposition, decomposition and correspondence rules established by the user, analyzes the headword string that results from the morphological analysis. Xfst is a finite-state tool that works on character strings, assigning previously defined categories to them so that expressions can then be analyzed automatically. The method was tested on a section of the corpus of clinical cases collected by Burdiles (CCCM - 2009) comprising 217,258 words. The results were evaluated in terms of precision and recall under expert guidance.
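The pipeline described in the abstract (dictionary lookup, rule-based part-of-speech guessing for WNCD, collection of nominal phrases containing a WNCD, and evaluation by precision and recall) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use Smorph, MPS or Xfst. The toy dictionary, the suffix rules, the noun-plus-adjectives phrase pattern and every function name below are assumptions made for illustration only, and the 'TC' tagging step is not modeled.

```python
import re

# Hypothetical toy general-language dictionary (word -> POS tag).
# The dictionary actually used in the paper is not given in the abstract.
GENERAL_DICTIONARY = {
    "el": "DET", "la": "DET", "un": "DET", "una": "DET",
    "paciente": "N", "dolor": "N", "presenta": "V",
    "con": "PREP", "de": "PREP", "y": "CONJ",
    "agudo": "ADJ", "aguda": "ADJ",
}

# Hypothetical suffix rules used to guess the POS of words that are
# not in the dictionary (WNCD). These stand in for the paper's
# morphological rules, which the abstract does not list.
SUFFIX_RULES = [
    ("itis", "N"),
    ("osis", "N"),
    ("oma", "N"),
    ("ico", "ADJ"),
    ("ica", "ADJ"),
]


def tag_token(token):
    """Return (token, pos, is_wncd) for a single token."""
    word = token.lower()
    if word in GENERAL_DICTIONARY:
        return (token, GENERAL_DICTIONARY[word], False)
    for suffix, pos in SUFFIX_RULES:
        if word.endswith(suffix):
            return (token, pos, True)
    return (token, "UNK", True)


def extract_candidates(text):
    """Collect noun + adjective* phrases that contain at least one WNCD."""
    tagged = [tag_token(t) for t in re.findall(r"\w+", text)]
    candidates = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] == "N":
            j = i + 1
            # Spanish adjectives usually follow the noun they modify.
            while j < len(tagged) and tagged[j][1] == "ADJ":
                j += 1
            phrase = tagged[i:j]
            if any(is_wncd for _, _, is_wncd in phrase):
                candidates.append(" ".join(tok.lower() for tok, _, _ in phrase))
            i = j
        else:
            i += 1
    return candidates


def precision_recall(extracted, gold):
    """Compare extracted candidates against an expert-validated term set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "El paciente presenta apendicitis aguda y dolor gástrico"
    found = extract_candidates(sentence)
    print(found)
    print(precision_recall(found, {"apendicitis aguda", "dolor gástrico", "fiebre"}))
```

In this sketch "paciente" is discarded because it is fully covered by the dictionary, whereas a phrase is kept as soon as one of its words is a WNCD. A real system would need a much larger dictionary and richer syntactic patterns than the single noun-plus-adjectives rule assumed here.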

Original language: English
Title of host publication: 2013 2nd International Conference on Informatics and Applications, ICIA 2013
Publisher: IEEE Computer Society
Pages: 53-58
Number of pages: 6
ISBN (Print): 9781467352550
State: Published - 2013
Externally published: Yes
Event: 2013 2nd International Conference on Informatics and Applications, ICIA 2013 - Lodz, Poland
Duration: 23 Sep 2013 to 25 Sep 2013

Publication series

Name: 2013 2nd International Conference on Informatics and Applications, ICIA 2013

Conference

Conference: 2013 2nd International Conference on Informatics and Applications, ICIA 2013
Country/Territory: Poland
City: Lodz
Period: 23/09/13 to 25/09/13

Keywords

  • automatic extraction
  • linguistic information
  • medical terminology
