Currently, one of the main challenges for information systems in healthcare is focused on support for health professionals regarding disease classifications. This work presents an innovative method for a recommendation system for the diagnosis of breast cancer using patient medical histories. In this proposal, techniques of natural language processing (NLP) were implemented on real datasets: one comprised 160, 560 medical histories of anonymous patients from a hospital in Chile for the following categories: breast cancer, cysts and nodules, other cancer, breast cancer surgeries and other diagnoses; and the other dataset was obtained from the MIMIC III dataset. With the application of word-embedding techniques, such as word2vec's skip-gram and BERT, and machine learning techniques, a recommendation system as a tool to support the physician's decision-making was implemented. The obtained results demonstrate that using word embeddings can define a good-quality recommendation system. The results of 20 experiments with 5-fold cross-validation for anamnesis written in Spanish yielded an F1 of 0.980 ± 0.0014 on the classification of 'cancer' versus 'not cancer' and 0.986 ± 0.0014 for 'breast cancer' versus 'other cancer'. Similar results were obtained with the MIMIC III dataset.
- Natural language processing (NLP)
- deep learning
- machine learning
- recommendation system