
Choosing the number of Gaussian clusters

Title Choosing the number of Gaussian clusters
Description

Choosing the number of clusters remains a practical challenge. We have previously discussed the problems that arise even for k-means (in "Stop using the elbow criterion for k-means and how to choose the number of clusters instead").

But k-means is a very simple case. Choosing the number of Gaussians for Gaussian mixture modeling is a closely related challenge, where some of the methods discussed there (e.g., BIC) can be used in a similar way.
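To illustrate how BIC can be applied to Gaussian mixtures, here is a minimal sketch in plain Java. It assumes full covariance matrices and the convention BIC = p ln(n) - 2 ln(L), where lower values are better. The class `GmmBic`, its method names, and the log-likelihood values in `main` are hypothetical and purely illustrative; none of this is ELKI API or measured data.

```java
/** Minimal BIC sketch for Gaussian mixture models (illustrative, not ELKI API). */
final class GmmBic {
  private GmmBic() {}

  /** Free parameters of a k-component, d-dimensional GMM with full covariance matrices. */
  static long freeParameters(int k, int d) {
    long weights = k - 1;                          // mixing weights sum to 1
    long means = (long) k * d;                     // one mean vector per component
    long covariances = (long) k * d * (d + 1) / 2; // one symmetric d x d matrix per component
    return weights + means + covariances;
  }

  /** BIC = p * ln(n) - 2 * ln(L); lower is better in this convention. */
  static double bic(double logLikelihood, int k, int d, int n) {
    return freeParameters(k, d) * Math.log(n) - 2.0 * logLikelihood;
  }

  /** Index of the smallest value in the array. */
  static int argMin(double[] values) {
    int best = 0;
    for (int i = 1; i < values.length; i++) {
      if (values[i] < values[best]) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Made-up log-likelihoods for k = 1..4, assuming n = 500 points in d = 2.
    double[] logLik = {-2100.0, -1850.0, -1835.0, -1830.0};
    double[] scores = new double[logLik.length];
    for (int i = 0; i < logLik.length; i++) {
      scores[i] = bic(logLik[i], i + 1, 2, 500);
      System.out.println("k=" + (i + 1) + " BIC=" + scores[i]);
    }
    System.out.println("selected k=" + (argMin(scores) + 1));
  }
}
```

The log-likelihood always improves as k grows, so the penalty term p ln(n) is what prevents the criterion from simply selecting the largest model. Restricted covariance models (e.g., diagonal or spherical) have fewer free parameters and would need a different count.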

Qualification
  • Good knowledge of statistics (as these measures are statistical)
  • Strong programming skills (complex Java framework to work with)
Proposal

Your task is to add functionality to ELKI (a Java data mining framework) that helps choose the number of clusters for Gaussian mixture modeling.

The methods for choosing the number of clusters from the following papers are to be reproduced:

  • Banfield, Jeffrey D., and Adrian E. Raftery. "Model-based Gaussian and non-Gaussian clustering." Biometrics (1993): 803-821.
  • Feng, Yu, and Greg Hamerly. "PG-means: learning the number of clusters in data." Advances in neural information processing systems 19 (2006).
Thesis type Bachelor's thesis
Second Tutor Schubert, Erich
Professor Schubert, Erich
Status Reserved