A new information theory based clustering fusion method for multi-view representations of text documents

Juan Zamora, Jérémie Sublime

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

Multi-view clustering is a complex problem that consists of extracting partitions from multiple representations of the same objects. In text mining and natural language processing, such views may take the form of word frequencies, topic-based representations, and many other encodings produced by various vector space model algorithms. In this paper, we propose a clustering fusion algorithm that takes clustering results obtained from multiple vector space models of a given set of documents and merges them into a single partition. Our fusion method relies on an information-theoretic model based on Kolmogorov complexity that was previously used for collaborative clustering applications. We apply our algorithm to several text corpora frequently used in the literature, with results that we find to be very satisfying.
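To make the idea of merging per-view clusterings into a single partition concrete, here is a minimal sketch. It is not the Kolmogorov-complexity-based method the abstract describes; it swaps in a much simpler majority-vote co-association rule purely for illustration. The function name `fuse_partitions`, the `threshold` parameter, and the example labels are all hypothetical.

```python
from itertools import combinations


def fuse_partitions(partitions, threshold=0.5):
    """Merge several label lists over the same documents into one partition.

    Illustrative stand-in only: two documents are linked when more than
    `threshold` of the input partitions put them in the same cluster, and
    fused clusters are the connected components of those links.
    """
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same documents")

    # Union-find structure over document indices.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        # Count the views that co-cluster documents i and j.
        agree = sum(p[i] == p[j] for p in partitions)
        if agree / len(partitions) > threshold:
            parent[find(j)] = find(i)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels, seen = [], {}
    for i in range(n):
        root = find(i)
        labels.append(seen.setdefault(root, len(seen)))
    return labels


# Hypothetical cluster labels for six documents under three views.
# Label ids are arbitrary within each view; only co-membership matters.
views = [
    [0, 0, 0, 1, 1, 1],  # e.g. a word-frequency view
    [1, 1, 0, 0, 0, 0],  # e.g. a topic-based view
    [2, 2, 2, 0, 0, 1],  # e.g. another vector space model
]
print(fuse_partitions(views))  # [0, 0, 0, 1, 1, 1]
```

Because the rule only looks at co-membership, it does not require cluster ids to match across views, which is the basic difficulty any fusion method has to handle. The method in the paper replaces this simple vote with an information-theoretic criterion.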

Original language: English
Title of host publication: Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings
Editors: Gabriele Meiselwitz
Publisher: Springer
Pages: 156-167
Number of pages: 12
ISBN (Print): 9783030495695
State: Published - 2020
Event: 12th International Conference on Social Computing and Social Media, SCSM 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020 - Copenhagen, Denmark
Duration: 19 Jul 2020 - 24 Jul 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12194 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Social Computing and Social Media, SCSM 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020
Country/Territory: Denmark
City: Copenhagen
Period: 19/07/20 - 24/07/20

Keywords

  • Corpus analysis
  • Information theory
  • Multi-view clustering
