Recent advances in high-dimensional clustering for text data

Research output: Chapter in Book/Report/Conference proceeding · Chapter · peer-review

1 Scopus citation

Abstract

Clustering has become an important tool for data scientists because it supports exploratory data analysis and the summarization of large amounts of data. For text data specifically, clustering faces additional challenges that arise from the high-dimensional space in which the data is represented. Furthermore, despite the important contributions already made, scalability remains a major challenge: the whole-data-in-memory approach is no longer viable in real scenarios where data is collected in massive volumes. This chapter reviews recent contributions to high-dimensional text data clustering, with particular emphasis on scalability issues and on the impact of the curse of dimensionality on distance-based clustering methods.

Original language: English
Title of host publication: Studies in Fuzziness and Soft Computing
Publisher: Springer Verlag
Pages: 323-337
Number of pages: 15
State: Published - 1 Oct 2017
Externally published: Yes

Publication series

Name: Studies in Fuzziness and Soft Computing
Volume: 349
ISSN (Print): 1434-9922
