MSc Defences Fall/Winter 2024

See the list of MSc defences at DIKU this winter. The list will be updated continuously.

Information about the thesis, supervisor, location of the defence, etc. can be found on the respective events below.

Bioinformatics

 

Name of student(s)  

Runda Xie

Study Programme  

Bioinformatics

Title  

In Silico Perturbation: A Deep Generative Approach to Gene Expression Prediction

Abstract  

Accurately simulating cellular responses to perturbations is a core goal in computational biology. There have been attempts to predict gene perturbations using generative models, showing significant potential. Here, we test an in silico perturbation method based on a deep generative model, the Deep Generative Decoder, to predict gene perturbations using parameterized low-dimensional representations for high-dimensional single-cell gene expression data. By applying gene-specific gradients and updating representations, we demonstrate that this method successfully simulates perturbation-induced changes in gene expression. The model effectively forecasts regulatory directions of differentially expressed genes, achieving strong Pearson correlation coefficients. Overall, this perturbation method provides a simple framework and shows considerable potential for simulating perturbation effects.

Supervisor(s)  

Anders Krogh and Viktoria Schuster

External examiner(s)  

Ole Lund

Date and time  

11.12.2024 10:30-11:30

Room  

Panum, Seminar-rum 33.4.D

 

 

Name of student(s)  

Zhen Han

Study Programme  

Bioinformatics

Title  

Comparative Analysis of Deep Learning Models for Human Trait and Phenotype Prediction from Genotype Data

Abstract  

This project presents a comprehensive comparative analysis of deep learning models for predicting human traits and phenotypes from genotype data using a human origins dataset. The study evaluates three main models: HyenaDNA, Nucleotide Transformer, and EIR, as well as various data transformations and sampling strategies. The main findings show that, contrary to traditional methods used in natural language and DNA sequence processing, fixed position sampling consistently outperforms random sampling for genetic data analysis. Moderate length sequences generally produce the best results for Transformer-based models, but longer sequences lead to increased performance fluctuations and decreased performance. Among Transformer-based models, HyenaDNA shows superior performance when processing longer sequences, with greater stability and efficiency. The study also shows that the GLN model significantly outperforms Transformer-based methods on direct genotype data, with the best results achieved when combined with haplotype data. Notably, the performance of the GLN model continues to improve with increasing input length, indicating its excellent scalability to larger genetic datasets. The study highlights the importance of specialized methods for genetic data analysis, as traditional sequence processing methods may not be the best choice in this area.

Keywords: deep learning, genetic data analysis, ancestry prediction, HyenaDNA, GLN model, Transformer model, sampling strategy

Supervisor(s)  

Anders Krogh

External examiner(s)  

Ole Lund

Date and time  

11.12.2024 9:00-10:00

Room  

Panum, Seminar-rum 33.4.D

 

Computer Science

 

Name of student(s)  

Athanasios Soulis

Study Programme  

Computer Science

Title  

Reciprocal Social Touch mediated through Ultrasound Haptics in Virtual Reality

Abstract  

Mediated social touch enables meaningful, affective social interaction from a distance. Based on the literature regarding mediated social touch using ultrasound haptic feedback, the experience of being a receiver and initiator of touch has been studied, but never when an individual assumes both roles simultaneously. This thesis describes an empirical study designed to explore real-time reciprocal social touch mediated through congruent visuotactile feedback using ultrasound haptics in Virtual Reality (VR). In this study, I attempt to answer whether the social aspect of conveying emotion through touch is supported in this mediated interaction. I investigate how the emotional aspect of this touch evolves over the reciprocal communication and whether the type of relationship between participants affects the emotional evolution or perception of the touch. Results show that this medium can convey emotions ranging from neutral to exciting and from pleasant to unpleasant with varying degrees of accuracy, falling short in replicating the full spectrum of physical touch. The study confirms that touch perception evolves during a continuous mediated interaction, with no evidence suggesting that the type of relationship affects this interaction. However, the participants of the study indicated that the ultrasound haptic feedback was not sufficiently expressive for social interaction, leading them to rely primarily on visual feedback. I discuss the limitations of this setup and suggest directions for future exploration in the area of reciprocal social touch mediated through ultrasound haptics in VR.

Supervisor(s)  

Joanna Bergström

External examiner(s)  

Henrique Galvan Debarba

Date and time  

04.12.2024 15:00-16:00

Room  

Meeting room 0-11, Sigurdsgade 41

 

 

Name of student(s)  

Qianxi YANG

Study Programme  

Computer Science

Title  

How can AI be designed to generate images on marginalized groups (such as autism) in a chatbot to facilitate user self-reflection?

Abstract  

The development of artificial intelligence (AI) has significantly transformed both the professional and personal spheres. Specifically, the advent of generative AI has gained notable public attention for its potential to revolutionize digital content creation, such as images, music, and text.

Generative models, trained on extensive datasets, tend to generate new content that mirrors the examples they were taught. Since these datasets primarily comprise internet data, which includes both accurate and inaccurate information as well as societal and cultural stereotypes and biases behind a cloak of objectivity (Esser et al., 2020), these models have replicated existing biases in the world (Weise and Metz, 2023). Meanwhile, as public awareness continues to evolve, biases in AI, particularly those related to gender, disabilities, and race (Mack et al., 2024), have gained attention. However, less obvious biases, such as those concerning neurodiversity, still largely go unrecognized by the public and are subtly perpetuated (Brandsen et al., 2024b).

In this paper, we will address stereotypes and biases in generative AI, focusing on text-to-image models where stereotypes and biases are more obvious. We focus on how these stereotypes and biases manifest in autism and propose methods to enhance the diversity of the generated images. We will also introduce a new perspective on how AI should respond to marginalized groups and facilitate user self-reflection within a chatbot.

Supervisor(s)  

Pernille Bjørn, Karl-Emil Kjær Bilstrup and Kellie Dunn

External examiner(s)  

Claus Witfelt

Date and time  

05.12.2024

Room  

Online

 

Physics

 

Name of student(s)  

Adrián Avelino Sousa-Poza

Study Programme  

Physics

Title  

A Novel Gaussian Mixture Model Approach for a Deep Generative Decoder using Expectation-Maximisation

Abstract  

Gaussian Mixture Models (GMMs) are robust clustering methods and efficient estimators for density-based distributions. They are commonly optimised using the Expectation-Maximisation (EM) algorithm. In deep learning frameworks, gradient-based optimisers like Adam are often employed for convenience, even though they have potential inefficiencies. This thesis investigates the performance differences between EM and Adam optimisers for GMMs. It explores the integration of an EM-based GMM into the Deep Generative Decoder (DGD) framework implemented in PyTorch.

We first cover the theoretical background (see section 2) of machine learning (see section 2.1), deep neural networks (section 2.2), optimisation algorithms (section 2.3), Gaussian mixture models (section 2.4) and a Deep Generative Decoder (section 2.5).

Building upon the theory, we introduce an EM-based GMM implemented in PyTorch and demonstrate its functionality across various data sizes, feature dimensions, cluster counts and initialisations. EM converges with fewer iterations on multivariate normal datasets and reaches lower log-likelihoods and better clustering metrics than Adam without previously tuning hyperparameters.

Building upon these findings, we integrate the EM-based GMM into the DGD framework and evaluate the model on image datasets (MNIST and Fashion-MNIST) and a bulk gene expression dataset derived from the Genotype-Tissue Expression (GTEx) project and The Cancer Genome Atlas (TCGA). The integrated model effectively reconstructs images and captures essential features in a low-dimensional latent space, forming clusters corresponding to different classes. Proper reconstruction is achieved when applied to the gene expression data, but the model struggles to achieve distinct cluster separation between different tissue types.

The thesis shows that EM is a suitable optimisation method for GMMs within a deep generative decoder framework. Nevertheless, more work needs to be done to address the limitations of creating well-separated clusters by improving the architecture of the DGD, adjusting the hyperparameters of the optimisers, and incorporating more informative and representative priors.

Supervisor(s)  

Anders Krogh

External examiner(s)  

Jesper Ferkinghoff-Borg

Date and time  

19.11.2024 14:30-16:00

Room  

Panum, Seminar-rum 33.4.D