TY - GEN
T1 - A new information theory based clustering fusion method for multi-view representations of text documents
AU - Zamora, Juan
AU - Sublime, Jérémie
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020. Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020
Y1 - 2020
N2 - Multi-view clustering is a complex problem that consists of extracting partitions from multiple representations of the same objects. In text mining and natural language processing, such views may take the form of word frequencies, topic-based representations, and many other encodings produced by various vector space model algorithms. In this paper, we propose a clustering fusion algorithm that takes clustering results obtained from multiple vector space models of the given documents and merges them into a single partition. Our fusion method relies on an information theory model based on Kolmogorov complexity that was previously used for collaborative clustering applications. We apply our algorithm to several text corpora frequently used in the literature, with results that we find to be very satisfying.
KW - Corpus analysis
KW - Information theory
KW - Multi-view clustering
UR - http://www.scopus.com/inward/record.url?scp=85088527814&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-49570-1_11
DO - 10.1007/978-3-030-49570-1_11
M3 - Conference contribution
AN - SCOPUS:85088527814
SN - 9783030495695
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 156
EP - 167
BT - Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings
A2 - Meiselwitz, Gabriele
PB - Springer
T2 - 12th International Conference on Social Computing and Social Media, SCSM 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020
Y2 - 19 July 2020 through 24 July 2020
ER -