Datalogisk Institut

MSc Defences Fall/Winter 2024

See the list of MSc defences at DIKU this winter. The list will be updated continuously.

Information about the thesis, supervisor, location of the defence, etc. can be found on the respective events below.

Bioinformatics

11 December: Runda Xie

Name of student(s)	Runda Xie
Study Programme	Bioinformatics
Title	In Silico Perturbation: A Deep Generative Approach to Gene Expression Prediction
Abstract	Accurately simulating cellular responses to perturbations is a core goal in computational biology. There have been attempts to predict gene perturbations using generative models, showing significant potential. Here, we test an in silico perturbation method based on the deep generative model, Deep Generative Decoder, to predict gene perturbations using parameterized low-dimensional representations for high-dimensional single-cell gene expression data. By applying gene-specific gradients and updating representations, we demonstrate that this method successfully simulates perturbation-induced changes in gene expression. The model effectively forecasts regulatory directions of differentially expressed genes, achieving strong Pearson correlation coefficients. Overall, this perturbation method provides a simple framework and shows considerable potential for simulating perturbation effects.
Supervisor(s)	Anders Krogh and Viktoria Schuster
External examiner(s)	Ole Lund
Date and time	11.12.2024 10:30-11:30
Room	Panum, Seminar-rum 33.4.D

11 December: Zhen Han

Name of student(s)	Zhen Han
Study Programme	Bioinformatics
Title	Comparative Analysis of Deep Learning Models for Human Trait and Phenotype Prediction from Genotype Data
Abstract	This project presents a comprehensive comparative analysis of deep learning models for predicting human traits and phenotypes from genotype data using a human origins dataset. The study evaluates three main models: HyenaDNA, Nucleotide Transformer, and EIR, as well as various data transformations and sampling strategies. The main findings show that, contrary to traditional methods used in natural language and DNA sequence processing, fixed position sampling consistently outperforms random sampling for genetic data analysis. Moderate length sequences generally produce the best results for Transformer-based models, but longer sequences lead to increased performance fluctuations and decreased performance. Among Transformer-based models, HyenaDNA shows superior performance when processing longer sequences, with greater stability and efficiency. The study also shows that the GLN model significantly outperforms Transformer-based methods on direct genotype data, with the best results achieved when combined with haplotype data. Notably, the performance of the GLN model continues to improve with increasing input length, indicating its excellent scalability to larger genetic datasets. The study highlights the importance of specialized methods for genetic data analysis, as traditional sequence processing methods may not be the best choice in this area. Keywords: deep learning, genetic data analysis, ancestry prediction, hyena DNA, GLN model, Transformer model, sampling strategy
Supervisor(s)	Anders Krogh
External examiner(s)	Ole Lund
Date and time	11.12.2024 09:00-10:00
Room	Panum, Seminar-rum 33.4.D

Computer Science

4 December: Athanasios Soulis

Name of student(s)	Athanasios Soulis
Study Programme	Computer Science
Title	Reciprocal Social Touch mediated through Ultrasound Haptics in Virtual Reality
Abstract	Mediated social touch enables meaningful, affective social interaction from a distance. Based on the literature regarding mediated social touch using ultrasound haptic feedback, the experience of being a receiver and initiator of touch has been studied, but never when an individual assumes both roles simultaneously. This thesis describes an empirical study designed to explore real-time reciprocal social touch mediated through congruent visuotactile feedback using ultrasound haptics in Virtual Reality (VR). In this study, I attempt to answer whether the social aspect of conveying emotion through touch is supported in this mediated interaction. I investigate how the emotional aspect of this touch evolves over the reciprocal communication and whether the type of relationship between participants affects the emotional evolution or perception of the touch. Results show that this medium can convey emotions ranging from neutral to exciting and from pleasant to unpleasant with varying degrees of accuracy, falling short in replicating the full spectrum of physical touch. The study confirms that touch perception evolves during a continuous mediated interaction, with no evidence suggesting that the type of relationship affects this interaction. However, the participants of the study indicated that the ultrasound haptic feedback was not sufficiently expressive for social interaction, leading them to rely primarily on visual feedback. I discuss the limitations of this setup and suggest directions for future exploration in the area of reciprocal social touch mediated through ultrasound haptics in VR.
Supervisor(s)	Joanna Bergström
External examiner(s)	Henrique Galvan Debarba
Date and time	04.12.2024 15:00-16:00
Room	Meeting room 0-11, Sigurdsgade 41

5 December: Qianxi Yang

Name of student(s)	Qianxi Yang
Study Programme	Computer Science
Title	How can AI be designed to generate images on marginalized groups (such as autism) in a chatbot to facilitate user self-reflection?
Abstract	The development of artificial intelligence (AI) has significantly evolved both the professional and personal spheres. Specifically, the advent of generative AI has notably gained public attention for its talent to revolutionize digital content creation, such as images, music, and text. Generative models, trained on extensive datasets, tend to generate new content that mirrors the examples they were taught. Since these datasets primarily comprise internet data, which includes both accurate and inaccurate information as well as societal and cultural stereotypes and biases behind a cloak of objectivity (Esser et al., 2020), these models have replicated existing biases in the world (Weise and Metz, 2023). Meanwhile, as public awareness continues to evolve, biases in AI, particularly those related to gender, disabilities, and race (Mack et al., 2024), have gained attention. However, less obvious biases, such as those concerning neurodiversity, still largely go unrecognized by the public and are subtly perpetuated (Brandsen et al., 2024b). In this paper, we will address stereotypes and biases in generative AI, focusing on text-to-image models where stereotypes and biases are more obvious. We focus on how these stereotypes and biases manifest in autism and propose methods to enhance the diversity of the generated images. We will also introduce a new perspective on how AI should respond to marginalized groups and facilitate user self-reflection within a chatbot.
Supervisor(s)	Pernille Bjørn, Karl-Emil Kjær Bilstrup and Kellie Dunn
External examiner(s)	Claus Witfelt
Date and time	05.12.2024
Room	Online

Physics

19 November: Adrián Avelino Sousa-Poza

Name of student(s)	Adrián Avelino Sousa-Poza
Study Programme	Physics
Title	A Novel Gaussian Mixture Model Approach for a Deep Generative Decoder using Expectation-Maximisation
Abstract	Gaussian Mixture Models (GMMs) are robust clustering and efficient estimators for density-based distributions. They are commonly optimised using the Expectation-Maximisation (EM) algorithm. In deep learning frameworks, gradient-based optimisers like Adam are often employed for convenience, even though they have potential inefficiencies. This thesis investigates the performance differences between EM and Adam optimisers for GMMs. It explores the integration of an EM-based GMM into the Deep Generative Decoder (DGD) framework implemented in PyTorch. We first cover the theoretical background (see section 2) of machine learning (see section 2.1), deep neural networks (section 2.2), optimisation algorithms (section 2.3), Gaussian mixture models (section 2.4) and a Deep Generative Decoder (section 2.5). Building upon the theory, we introduce an EM-based GMM implemented in PyTorch and demonstrate its functionality across various data sizes, feature dimensions, cluster counts and initialisations. EM converges with fewer iterations on multivariate normal datasets and reaches lower log-likelihoods and better clustering metrics than Adam without previously tuning hyperparameters. Building upon these findings, we integrate the EM-based GMM into the DGD framework and evaluate the model on image datasets (MNIST and Fashion-MNIST) and a bulk gene expression dataset derived from the Genotype-Tissue Expression (GTEx) project and The Cancer Genome Atlas (TCGA). The integrated model effectively reconstructs images and captures essential features in a low-dimensional latent space, forming clusters corresponding to different classes. Proper reconstruction is achieved when applied to the gene expression data, but the model struggles to achieve distinct cluster separation between different tissue types. The thesis shows that EM is a suitable optimisation method for GMMs within a deep generative decoder framework. Nevertheless, more work needs to be done to address the limitations of creating well-separated clusters by improving the architecture of the DGD, adjusting the hyperparameters of the optimisers, and incorporating more informative and representative priors.
Supervisor(s)	Anders Krogh
External examiner(s)	Jesper Ferkinghoff-Borg
Date and time	19.11.2024 14:30-16:00
Room	Panum, Seminar-rum 33.4.D