MSc Defences Fall/Winter 2024

See the list of MSc defences at DIKU this winter. The list will be updated continuously.

Information about the thesis, supervisor, location of the defence, etc. can be found on the respective events below.

Physics

 

Name of student(s)  

Adrián Avelino Sousa-Poza

Study Programme  

Physics

Title  

A Novel Gaussian Mixture Model Approach for a Deep Generative Decoder using Expectation-Maximisation

Abstract  

Gaussian Mixture Models (GMMs) are robust clustering methods and efficient estimators for density-based distributions. They are commonly optimised using the Expectation-Maximisation (EM) algorithm. In deep learning frameworks, gradient-based optimisers like Adam are often employed for convenience, even though they have potential inefficiencies. This thesis investigates the performance differences between EM and Adam optimisers for GMMs. It explores the integration of an EM-based GMM into the Deep Generative Decoder (DGD) framework implemented in PyTorch.

We first cover the theoretical background (section 2): machine learning (section 2.1), deep neural networks (section 2.2), optimisation algorithms (section 2.3), Gaussian mixture models (section 2.4) and the Deep Generative Decoder (section 2.5).

Building upon the theory, we introduce an EM-based GMM implemented in PyTorch and demonstrate its functionality across various data sizes, feature dimensions, cluster counts and initialisations. On multivariate normal datasets, EM converges in fewer iterations and reaches lower log-likelihoods and better clustering metrics than Adam, without prior hyperparameter tuning.

Building upon these findings, we integrate the EM-based GMM into the DGD framework and evaluate the model on image datasets (MNIST and Fashion-MNIST) and a bulk gene expression dataset derived from the Genotype-Tissue Expression (GTEx) project and The Cancer Genome Atlas (TCGA). The integrated model effectively reconstructs images and captures essential features in a low-dimensional latent space, forming clusters corresponding to different classes. When applied to the gene expression data, the model achieves proper reconstruction but struggles to achieve distinct cluster separation between different tissue types.

The thesis shows that EM is a suitable optimisation method for GMMs within a deep generative decoder framework. Nevertheless, more work needs to be done to address the limitations of creating well-separated clusters by improving the architecture of the DGD, adjusting the hyperparameters of the optimisers, and incorporating more informative and representative priors.

Supervisor(s)  

Anders Krogh

External examiner(s)  

Jesper Ferkinghoff-Borg

Date and time  

19.11.2024 14:30-16:00

Room  

Panum, Seminar-rum 33.4.D