This is an overview page with metadata about this research paper. The full article is available from the publisher.
Probabilistic latent semantic indexing
3,922 citations · 1 author · 1999
Abstract
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
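The latent class model the abstract describes can be illustrated with a short sketch. The following is a minimal example, not the paper's implementation. It assumes the symmetric aspect-model formulation P(d, w) = Σ_z P(z) P(d|z) P(w|z) over a document-term count matrix n(d, w). The function name `plsa_em`, its parameters and the toy corpus are hypothetical. It also uses plain, untempered EM rather than the generalized EM variant the abstract refers to. The E-step computes the posterior P(z|d, w). The M-step then re-estimates P(z), P(d|z) and P(w|z) from counts weighted by that posterior.

```python
import numpy as np


def plsa_em(counts, n_topics, n_iter=100, seed=0):
    """Hypothetical helper: fit P(d, w) = sum_z P(z) P(d|z) P(w|z) with plain (untempered) EM.

    counts: (n_docs, n_words) array of term counts n(d, w).
    Returns P(z), P(d|z), P(w|z) and the log-likelihood recorded at each iteration.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random positive initialization; each conditional distribution is normalized.
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((n_topics, n_docs))
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    observed = counts > 0
    log_liks = []
    for _ in range(n_iter):
        # E-step: joint[z, d, w] = P(z) P(d|z) P(w|z); normalizing over z gives P(z|d, w).
        joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        p_dw = joint.sum(axis=0)
        posterior = joint / np.maximum(p_dw, 1e-300)

        # Log-likelihood sum n(d, w) log P(d, w) of the current parameters (observed pairs only).
        log_liks.append(float(np.sum(counts[observed] * np.log(p_dw[observed]))))

        # M-step: re-estimate every distribution from expected counts n(d, w) P(z|d, w).
        expected = counts[None, :, :] * posterior
        totals = np.maximum(expected.sum(axis=(1, 2)), 1e-300)
        p_w_z = expected.sum(axis=1) / totals[:, None]
        p_d_z = expected.sum(axis=2) / totals[:, None]
        p_z = totals / counts.sum()

    return p_z, p_d_z, p_w_z, log_liks


# Hypothetical toy corpus: 4 documents over 6 terms with two loosely separated vocabularies.
counts = np.array([
    [5, 4, 3, 0, 0, 0],
    [4, 5, 2, 0, 1, 0],
    [0, 0, 1, 5, 4, 3],
    [0, 1, 0, 4, 5, 4],
])
p_z, p_d_z, p_w_z, log_liks = plsa_em(counts, n_topics=2, n_iter=50)

# P(z|d) is proportional to P(z) P(d|z); each column is one document's class distribution.
p_z_given_d = p_z[:, None] * p_d_z
p_z_given_d /= p_z_given_d.sum(axis=0, keepdims=True)
print(np.round(p_z_given_d, 3))
```

Because an EM iteration cannot decrease the likelihood, the recorded log-likelihoods should be non-decreasing. The per-document distributions P(z|d) give a low-dimensional representation of each document in terms of latent classes. The dense (topics × documents × terms) array keeps the sketch short, but it would not scale to a realistic collection.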
Similar papers
Visualizing Data using t-SNE
2008 · 35,663 citations
Data mining: concepts and techniques
2012 · 28,852 citations
Silhouettes: A graphical aid to the interpretation and validation of cluster analysis
1987 · 20,036 citations
A density-based algorithm for discovering clusters in large spatial Databases with Noise
1996 · 19,116 citations
The WEKA data mining software
2009 · 17,791 citations