TY - GEN
T1 - Formalizing Predicates for Discovery Under the Lexicon Grammar Framework
AU - Jacobsen, Javiera
AU - Koza, Walter
AU - Muñoz, Mirian
AU - Saiz, Francisca
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - This text proposes a method for the automatic analysis of predicates for discovery (PDs) in Spanish. A PD is a predicative unit that projects an argument structure (AS) whose meaning alludes to ‘something that is found by someone (or something) somewhere’ (e.g., ‘encontrar’ and ‘hallar’, both meaning ‘to find’). Such analysis is useful in fields such as medicine, since it makes it possible to automatically identify findings of interest (diseases, test results, etc.) in large text corpora. The present work is based on Lexicon Grammar (LG), which formalizes predicates according to the nature of their arguments (object classes) and their transformational possibilities. The methodology has three stages: (i) manual identification of PDs in a gynecology and obstetrics corpus; (ii) construction of an LG table for each PD, in which object classes are categorized and possible transformations are listed; and (iii) computational modeling. For this last stage, electronic dictionaries and computer-generated grammars were built in NooJ. The algorithm, which automatically detects and generates ASs from PDs (325 grammatical sentences), was evaluated against an annotated corpus of 1,000 manually annotated sentences randomly extracted from a 5-million-word corpus. The results were 98% accuracy, 88% coverage, and a 92% F-measure.
KW - Automatic analyses
KW - Lexicon grammar
KW - Predicates for discovery
UR - http://www.scopus.com/inward/record.url?scp=85123283081&partnerID=8YFLogxK
DO - 10.1007/978-3-030-92861-2_6
M3 - Conference contribution
AN - SCOPUS:85123283081
SN - 9783030928605
T3 - Communications in Computer and Information Science
SP - 62
EP - 71
BT - Formalizing Natural Languages
A2 - Bigey, Magali
A2 - Richeton, Annabel
A2 - Silberztein, Max
A2 - Thomas, Izabella
PB - Springer Science and Business Media Deutschland GmbH
T2 - 15th International Conference, NooJ 2021
Y2 - 9 June 2021 through 11 June 2021
ER -