TY - JOUR
T1 - Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis
AU - Magna, Andres Alejandro Ramos
AU - Allende Cid, Héctor Gabriel
AU - Taramasco, Carla
AU - Becerra, Carlos
AU - Figueroa, Rosa L.
N1 - Publisher Copyright: © 2013 IEEE. Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020
Y1 - 2020
N2 - Currently, one of the main challenges for information systems in healthcare is focused on support for health professionals regarding disease classifications. This work presents an innovative method for a recommendation system for the diagnosis of breast cancer using patient medical histories. In this proposal, techniques of natural language processing (NLP) were implemented on real datasets: one comprised 160,560 medical histories of anonymous patients from a hospital in Chile for the following categories: breast cancer, cysts and nodules, other cancer, breast cancer surgeries and other diagnoses; and the other dataset was obtained from the MIMIC III dataset. With the application of word-embedding techniques, such as word2vec's skip-gram and BERT, and machine learning techniques, a recommendation system as a tool to support the physician's decision-making was implemented. The obtained results demonstrate that using word embeddings can define a good-quality recommendation system. The results of 20 experiments with 5-fold cross-validation for anamnesis written in Spanish yielded an F1 of 0.980 ± 0.0014 on the classification of 'cancer' versus 'not cancer' and 0.986 ± 0.0014 for 'breast cancer' versus 'other cancer'. Similar results were obtained with the MIMIC III dataset.
AB - Currently, one of the main challenges for information systems in healthcare is focused on support for health professionals regarding disease classifications. This work presents an innovative method for a recommendation system for the diagnosis of breast cancer using patient medical histories. In this proposal, techniques of natural language processing (NLP) were implemented on real datasets: one comprised 160,560 medical histories of anonymous patients from a hospital in Chile for the following categories: breast cancer, cysts and nodules, other cancer, breast cancer surgeries and other diagnoses; and the other dataset was obtained from the MIMIC III dataset. With the application of word-embedding techniques, such as word2vec's skip-gram and BERT, and machine learning techniques, a recommendation system as a tool to support the physician's decision-making was implemented. The obtained results demonstrate that using word embeddings can define a good-quality recommendation system. The results of 20 experiments with 5-fold cross-validation for anamnesis written in Spanish yielded an F1 of 0.980 ± 0.0014 on the classification of 'cancer' versus 'not cancer' and 0.986 ± 0.0014 for 'breast cancer' versus 'other cancer'. Similar results were obtained with the MIMIC III dataset.
KW - anamnesis
KW - deep learning
KW - machine learning
KW - Natural language processing (NLP)
KW - recommendation system
UR - http://www.scopus.com/inward/record.url?scp=85086989625&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3000075
DO - 10.1109/ACCESS.2020.3000075
M3 - Article
AN - SCOPUS:85086989625
VL - 8
SP - 106198
EP - 106213
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
M1 - 9108225
ER -