MSc Defences Summer 2022
See the list of MSc defences at DIKU this summer. The list will be updated continuously.
Information about the thesis, supervisor, location of the defence, etc., can be found on the respective events below.
Study Programme
Computer Science
Title
Generating Generators - A CLS-based Pipeline for Unity
Abstract
This project aims to provide a prototyping pipeline for game development focused on procedural generation, inspiring game and level designers to develop game ideas with a foundation in procedural generation. Furthermore, the prototyping tool provides a high-level user interface that allows game and level designers to generate prototypes without having to consult programmers, who must otherwise spend time on prototypes that may or may not be used. The pipeline uses the CLS framework by Bessai et al. [1] for code generation and features Unity integration.
Supervisor
Boris Düdder
External examiner
Mads Rosendahl
Time and place
10:00 at Sigurdsgade 41, room 2.03
Study Programme
Computer Science
Title
Measuring the Trustworthiness of Decentralized Identity Systems
Abstract
For decentralized identity systems to be adopted, they must be trustworthy. In this paper we examine decentralized identity systems with respect to trustworthiness. We focus on three components in particular that require trust: (1) blockchains, (2) identity issuers and (3) zero-knowledge proofs used for selective disclosure of credentials. For blockchains, we specifically wish to measure the trustworthiness of node providers that many applications depend upon. We propose an online algorithm to measure the trustworthiness of nodes inspired by SourceRank that builds on the assumption that agreement between different sources is a good indicator of trust. For identity issuers, we measure their trustworthiness based on their individual trust attributes. We argue that without perfect knowledge, it is not possible to know which or how much each individual trust attribute contributes to the total trustworthiness of an issuer, and therefore some subjective assumptions are needed. We suggest a subjective assumption that is often used for identification in the real world today and show how this subjective assumption can be used to quantify the trustworthiness of identity issuers. Lastly, we discuss the implications of using zero-knowledge proofs in an identity context for both provers and verifiers. We argue that they can allow for stronger privacy guarantees and we relate this discussion to previous research on privacy.
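As a toy illustration of the agreement assumption above (this is not the thesis's actual algorithm; the names and update rule are hypothetical), a node's trust score can be nudged towards 1 when its answer agrees with the trust-weighted consensus, and towards 0 otherwise:

    from collections import defaultdict

    def update_trust(trust, answers, lr=0.1):
        """trust: node -> score in [0, 1]; answers: node -> reported value."""
        # Weight each distinct answer by the total trust of the nodes reporting it.
        weight = defaultdict(float)
        for node, value in answers.items():
            weight[value] += trust.get(node, 0.5)
        consensus = max(weight, key=weight.get)
        # Move each node's score towards 1 on agreement, towards 0 on disagreement.
        for node, value in answers.items():
            target = 1.0 if value == consensus else 0.0
            trust[node] = (1 - lr) * trust.get(node, 0.5) + lr * target
        return consensus

    trust = {"node_a": 0.5, "node_b": 0.5, "node_c": 0.5}
    update_trust(trust, {"node_a": "0xabc", "node_b": "0xabc", "node_c": "0xdef"})
    print(trust)  # node_c, which disagreed, now scores below the two agreeing nodes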
Supervisor
Boris Düdder
External examiner
Mads Rosendahl
Time and place
10.06.2022 11:00
Sigurdsgade 41, room 2.03
Study Programme
Physics
Title
Reducing the Need for General Anesthesia in Children Undergoing Neuroimaging by Preparation and Motion Correction
Abstract
In Magnetic Resonance Imaging (MRI) of the human brain, the most common cause of artifact generation is head motion. This problem is more severe when scanning children because, contrary to adult patients, who generally have better control over their movements, pediatric patients have a harder time staying still. The size of the scanner combined with the loud noises makes this situation even worse. As a result, doctors usually resort to administering General Anesthesia (GA) to children to get a diagnosable image. This, however, entails many health risks and financial costs because of the drug-induced sedation and the additional personnel needed.
In this project we developed a method to avoid those problems and still get good images. Instead of sedating the children, we trained them to become familiar with the whole experience of getting inside a massive scanner, so that they would be calmer during the actual scan, while at the same time they had the opportunity to watch a movie and be distracted from the loud noises. We also used a motion correction device which would, up to a certain degree, correct the motion artifacts in the image. This technique was applied to approximately half of the children, chosen randomly, in an attempt to check whether it was the training or the motion correction that was more effective. The protocols consisted of T1- and T2-weighted, 2D- or 3D-encoded sequences and diffusion weighted imaging, depending on the structures and parts of the brain that we wanted to focus on.
As part of our evaluation we sought a reliable way to assess the quality of the images by implementing image quality metrics that can produce a result without a second image serving as ground truth, since only one image is available in real clinical scans. Additionally, we studied how accurately some metrics based on external motion measurements predict image quality. These methods were first tested on data acquired from healthy volunteers and afterwards applied to the data from the pediatric patients. All these metrics were compared with quality scores based on the Likert scale given by doctors.
The final results of the thesis indicate that the image quality metrics showed a weak correlation with the scores provided by the doctors, while those depending on motion measurements turned out to be more reliable. Furthermore, we observed a significant decrease in motion during acquisition after the training compared to before, meaning that our initial assumption was correct: having the children undergo training before being scanned can remove the need for sedation.
Supervisor(s)
Melanie Ganz Benjaminsen
Heloisa Nunes Bordallo
External examiners
Tim Bjørn Dyrby
Time and place
13.06.2022 10:00
Rigshospitalet (NRU, section 8057, conference room, Inge Lehmanns Vej 6-8, 2100 København)
Study Programme
Computer Science
Title
Distance Oracles: Exploring the Pattern Approach for Exact Distances in Planar Graphs
Abstract
We give a survey on the literature on distance oracles spanning exact and approximate distance reporting in the setting of both general and planar graphs.
Given an undirected, unweighted planar graph, Mozes, Fredslund-Hansen and Wulff-Nilsen gave the first truly sub-quadratic size oracle answering exact distance queries in constant time. We give a thorough description of this oracle, contributing an analysis of quadratic preprocessing time and a new analysis of the space requirement, achieving their stated result while correcting the inadequacies of the original paper.
A key element in achieving this result lies in bounding the number of distinct discrete derivatives w.r.t. the distance between a source vertex and vertices on a cycle, commonly referred to as patterns. A cubic upper bound has been proven; however, it is hypothesized that a quadratic upper bound holds. We give a quadratic lower bound and strengthen the quadratic hypothesis through experimentation. In addition, we give a new analysis predicting the performance of the oracle above assuming the quadratic hypothesis is true, yielding stronger guarantees than previously known.
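To make the notion of a pattern concrete, here is a minimal sketch (ours, not the thesis code) that computes the pattern of a source vertex with respect to consecutive vertices on a cycle as the discrete derivative of BFS distances; in an unweighted graph each entry lies in {-1, 0, 1}:

    import networkx as nx

    def pattern(G, source, cycle):
        # BFS distances from the source to every vertex.
        dist = nx.single_source_shortest_path_length(G, source)
        # Discrete derivative of the distance along consecutive cycle vertices.
        return [dist[cycle[i + 1]] - dist[cycle[i]] for i in range(len(cycle) - 1)]

    G = nx.cycle_graph(6)                  # vertices 0..5 arranged in a cycle
    print(pattern(G, 0, [1, 2, 3, 4, 5]))  # [1, 1, -1, -1]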
A second key element is to decompose the input graph into r-divisions, for which we give a much simplified scheme based on tree regions. Given an n-vertex planar non-triangulated graph G and a parameter 0 < r ≤ n, our scheme decomposes G into Θ(n/r) connected regions of Θ(r) vertices, each with just one hole, in linear time. This matches the complex decomposition scheme by Mozes, Klein and Sommer except for a region-boundary guarantee, which is not required for some applications, including this oracle.
Finally we show an implementation of the oracle using our simplified decomposition scheme. Through experimentation we thus provide evidence of a truly sub-quadratic distance oracle answering queries in constant time, straightforward to implement and without expensive constants invalidating its applicability.
Supervisor(s)
Christian Wulff-Nilsen
External examiner(s)
Jesper Larsen
Time and place
13.06.2022 09:30
Lille UP1 at DIKU, Universitetsparken 1
Study Programme
Computer Science
Title
Multi-task learning on images of beetles using deep neural networks
Abstract
Deep learning image classification is usually performed at only one classification level, and explicit hierarchical information about the objects being classified is rarely leveraged. We investigate ways of leveraging such hierarchical information, concretely in the context of classification of images of rove beetles. We propose a novel multi-task classifier, the Fusion model. The Fusion model is characterized by 1) a number of tracks corresponding to the hierarchical levels used for classification and 2) a method by which information can be shared across said tracks. The Fusion model thereby allows flexible parameter sharing between different classification tasks. We also propose an extension to the Fusion model, which incorporates a reconstructive task. We hypothesise that including a reconstructive task will benefit classification performance and vice versa. However, we cannot report conclusive evidence for or against the utility of these proposed models. Experiments indicate that there may be fundamental issues with the dataset of rove beetles. In particular, we observe in our analysis that independently trained models consistently misclassify specific samples from the rove beetle dataset, which we confirm to be mislabelled samples. In future work, we wish to investigate to what extent samples from the rove beetle dataset may be mislabelled. All our code is publicly available at https://github.com/pants007/msc-thesis-code.
Supervisor(s)
Kim Steenstrup Pedersen
External examiner(s)
Rasmus Reinhold Paulsen
Time and place
13.06.2022 10:30 - 12:15
Zoo Museum, Universitetsparken 15, lille mødelokale
Study Programme
IT and Cognition
Title
Recommender system for Pleaz, a video streaming service
Abstract
The goal of this thesis is to describe the process of designing a recommendation system tailored to the needs of Pleaz and its users. Pleaz is a video streaming service that offers videos of physical exercise to its clients. Since the Pleaz platform (pleaz.io) is still in a relatively early stage of development, no recommendation system had been implemented. Furthermore, there were no robust data logging systems that would provide data for such a system. This meant that our first task was to implement said logging systems in the existing web platform. With the collected user interaction data, we implemented several recommendation system models based on popular approaches: collaborative filtering, content-based recommendations, and Top-K. We based the collaborative filtering models on widely recognized methods such as KNN and SVD. Content-based recommendations were based on a similarity matrix built from available video tags and descriptions obtained from the website. The recommendations were then generated from a user profile created for each user based on their past interactions. In order to evaluate the systems' precision, we used popular measures such as hit rate, mean reciprocal rank, coverage, diversity, RMSE, MAE, and efficiency. We looked at recommendation overlaps between various models. We repeated the process of training and evaluating the models for two different signals (based on different user interaction data) in order to see which signal in combination with which model would score best.
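As a flavour of the evaluation, the following minimal sketch (not the thesis code) computes two of the measures mentioned above, hit rate and mean reciprocal rank (MRR), for top-K recommendation lists with a single withheld relevant item per user:

    def hit_rate_and_mrr(recommendations, held_out, k=10):
        """recommendations: user -> ranked list of item ids;
           held_out: user -> the single withheld relevant item."""
        hits, rr = 0, 0.0
        for user, item in held_out.items():
            topk = recommendations.get(user, [])[:k]
            if item in topk:
                hits += 1
                rr += 1.0 / (topk.index(item) + 1)  # reciprocal rank of the hit
        n = len(held_out)
        return hits / n, rr / n

    recs = {"u1": ["v3", "v7", "v1"], "u2": ["v2", "v9"]}
    print(hit_rate_and_mrr(recs, {"u1": "v7", "u2": "v5"}, k=3))  # (0.5, 0.25)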
Supervisor
Maria Maistro
External examiner
Luca Maria Aiello
Time and place
14.06.2022 09:00
Room 3-0-25 at DIKU, Universitetsparken 1, 2100 CPH Ø
Study Programme
Physics
Title
Analysis of dynamic three-dimensional images for sperm motion studies
Abstract
Negative trends in fertility and sperm quality have been reported, with evidence to suspect that endocrine disrupters impact sperm motion. In particular, they can provoke alternations between progressive and hyperactivated motility, lowering the chances of fertilisation. We monitored boar sperm cells to determine their motion descriptors and correlate them to the motility regime. Our raw data were dynamic three-dimensional images of boar spermatozoa obtained by light sheet fluorescence microscopy. We describe the methodology followed to go from images, where the only information is pixel intensities, to physical measures. We defined a centroid and basis for the spermatozoon's head. Then, we computed motion descriptors such as the angular rotation and speed by following the head's movement through time. We found that the head does not have an intrinsic rotation while swimming, but it shows symmetric bending about its three coordinate axes. The imaging technique and segmentation enlarged the apparent head, resulting in average head dimensions five times the value given in the literature and high uncertainty in speed determination. Therefore, we could not confidently determine the motility regime of the cells in our sample, but we saw that they move ballistically even without a concentration gradient to guide them. In conclusion, this thesis conceived a methodology to measure the sperm cell's speed, rotation, and motion regime from 3D videos. At the same time, the imaging technique fulfilled the expectations of being an effective tool for studying sperm dynamics.
Supervisor
Jon Sporring
External examiner
TBD
Time and place
14.06.2022 time: TBD
Lille UP1, Universitetsparken 1, 2100 København Ø
Study Programme
Computer Science
Title
Meaning Representation of Legal Documents
Abstract
Abstract Meaning Representation (AMR) has been shown to be useful for many downstream tasks such as Common Sense Reasoning and Information Extraction. In this work, I explore the use of AMR for legal and logical reasoning. Specifically, I propose four model architectures that utilize semantic encoding in the form of AMR. The AMR graphs are generated using an AMR parser. The presented architectures use graph linearization in combination with pre-trained language models (PLMs). The PLMs are Legal-BERT, which was pre-trained on legal text, and AMRBART, which was pre-trained on AMR graphs.
Additionally, I propose an architecture that combines AMR and textual input by concatenating text and AMR embeddings extracted from PLMs. The results show that the AMR models are not able to outperform text-only baseline models. Although the performance could not be improved, the AMR models were shown to solve different instances than the text models. The error analysis further revealed that AMR parsing quality is the biggest problem, especially for inputs with multiple sentences. Furthermore, a theoretical analysis of how logical relations are represented in AMR showed that AMR may help capture logical relations in some logical statements while not being useful for others.
Supervisor(s)
Daniel Hershcovich
External examiner(s)
Christian Hardmeier
Time and place
14.06.2022 14:00
Online
Study Programme
Computer Science
Title
Multilingual Information Retrieval with word embeddings
Abstract
Multilingual Information Retrieval (MLIR) is the task of searching and ranking relevant documents from a collection written in various languages, given the user's query. With easy access to multilingual resources, the problem has many use cases, e.g., searching for patents, medical treatments, foreign news, or culture. On top of the challenges of searching a monolingual collection, MLIR faces the challenge of translation and of creating a ranking that is not unjustifiably biased towards any language. A solution to these problems is offered by multilingual word embeddings, which have proved to achieve high performance in many language understanding tasks. In this thesis, I explore multilingual BERT word embeddings in the document ranking task for multilingual IR. Additionally, I investigate the impact of embedding space alignment to remove language-specific characteristics and allow better document comparison and, as a result, a better final ranking.
The findings show that zero-shot training works well in the MLIR setting, even if the pre-trained models are developed using only English data, and significantly improves performance compared to the baseline. Additionally, the vector space alignment improves the language distribution over the top-ranked documents.
Supervisor(s)
Maria Maistro
External examiner(s)
Luca Maria Aiello
Time and place
14.06.2022 10:30
3-0-25 at DIKU, Universitetsparken 1
Study Programme
Computer Science
Title
Visual growth patterns in Metarhizium
Abstract
Deep learning models are used for an increasing variety of purposes. With capabilities such as semantic segmentation, they are ideal for tasks in which objects need to be localized and measured. Deep learning models are, however, restricted by the requirement of large datasets for training.
The Department of Plant and Environmental Sciences at the University of Copenhagen (PLEN) has acquired the Reshape Imaging System (RIS) which is a machine capable of imaging petri dishes at set intervals. This creates large amounts of image data which can then be used to develop such deep learning models for analysis. To train models, true labels are needed. These can be created in an intelligent manner employing active learning.
In this thesis, the possibility of applying deep learning models to growth analysis of the fungus Metarhizium is investigated. Two experiments were conducted on the RIS, where Metarhizium was inoculated on petri dishes at the center and at multiple locations, respectively. A pipeline for model development for such experiments is suggested. U-net models were developed for semantic segmentation and produced good results in both experiments. The first U-net produced segmentations on a test set with an average Dice score of 0.9647. For the second experiment, Monte Carlo Dropout was applied as a way to increase the efficiency of the manual annotation procedure. This experiment shows that models can be trained with fewer annotations when the unlabelled data is selected intelligently for manual annotation. This segmentation task was harder in nature, as the fungi were located in differing areas of the petri dishes. The final U-net segmentation model achieved an average Dice score of 0.8471 on a test set. This lower score can largely be explained by the nature of the Dice score: at the start of the growth phase, when the fungi first appear, the model is usually off by a few frames, producing segmentations with a Dice score of 0 in these instances and dragging the average down.
In conclusion, deep learning models are capable of measuring the growth of Metarhizium via image data from the RIS.
The production of manual annotations for training data can be made more efficient with Monte Carlo Dropout.
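For reference, the Dice score used above is 2|A ∩ B| / (|A| + |B|) for binary masks A and B; the small sketch below (illustrative, not the thesis code) shows how a single early frame with no predicted growth scores 0 and drags the average down:

    import numpy as np

    def dice(pred, target):
        """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
        intersection = np.logical_and(pred, target).sum()
        total = pred.sum() + target.sum()
        return 2.0 * intersection / total if total > 0 else 1.0

    empty = np.zeros((64, 64), dtype=bool)     # model predicts no growth yet
    tiny = empty.copy()
    tiny[30:32, 30:32] = True                  # but a small colony is present
    print(dice(empty, tiny))                   # 0.0 — one missed frame scores zero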
Supervisor
Jon Sporring
External examiner
Rasmus Reinhold Paulsen
Time and place
Small UP1 at DIKU
Study Programme
Computer Science
Title
Visual assessment in Virtual Reality 2D contra 3D
Abstract
Objective: Virtual Reality (VR) as a technology has seen growing use in human-centered design. The technology relies on visual processing of a 3D environment that fills the user's entire field of view. Visual processing can be measured by applying an assessment procedure to it. Visual processing assessment is a tool from neuropsychology that can help us demonstrate how users perceive VR as its dimensionality changes. Method: We applied an assessment procedure derived from the theory of visual attention (TVA) [Bundesen (1990)] to expand upon the study of [Foerster et al. (2016)], checking whether adding a dimensional component to their experimental procedure would be beneficial and comparable to the two-dimensional trials. We used the game engine Unity to develop our application for the commercially available head-mounted display Oculus Quest 2. Using TVA, we measured three components to account for this change: the threshold of conscious perception, the capacity of visual working memory, and the speed of visual processing, as well as their predicted graph on the TVA model. Results: We found our 3D tests were comparable to the 2D ones, via the TVA assessment data of all components and their predicted graphing. Furthermore, the data is applicable as an extension of [Foerster et al. (2016)], where all component data sits within their data bracket and thus supplements their findings. Conclusion: We thus demonstrated that a VR application with three-dimensional depth perception is in fact a proficient visual processing assessment tool that works just as effectively as 2D, which extends the papers [Foerster et al. (2016)] and [Foerster et al. (2019)]. However, 3D does not add further benefits beyond supplying depth perception to the VR environment, which may be more comfortable for users.
Supervisor
Johanna Bergström
External examiner
Henrique Galvan Debarba
Time and place
15.06.2022 14:00
Sigurdsgade 41, DIKU
Study Programme
Health and Informatics
Title
Guiding Principles for Creating Ground Truth Data to use in Machine Learning
Abstract
N/A
Supervisor(s)
Finn Kensing and Hubert Dariusz Zajac
External examiner
Keld Bødtker
Time and place
15.06.2022 11:00
UP1 room 2-0-06, Universitetsparken 1, 2100 CPH Ø
Study Programme
Bioinformatics
Title
Tissue isolation in spatial epigenomics data using image processing
Abstract
Epigenomic heterogeneity is a crucial part of cell differentiation, development and pathological changes of many diseases. With the emergence of spatial epigenomics techniques, it becomes possible to characterize epigenomic markers across tissue domains. This characterization might provide new insight into the interaction between spatial context and epigenomic states. However, few spatial domain analysis tools are available for spatial epigenomics data. Here, I expanded the usage of Vesalius, an in silico anatomization tool designed for transcriptomics data, to spatial epigenomics data. Developed Vesalius successfully detected tissue architectures in mouse embryo and brain.
Supervisor
Anders Krogh
External examiner
Jes Frellsen
Time and place
15.06.2022 14:00
Meeting Room 3 (3rd floor), BRIC
Study Programme
IT and Cognition
Title
Investigating the Utility of Different Ways to Share User Experiences
Abstract
Research in positive psychology shows that experiential consumption contributes more to people's happiness than material consumption. However, the possession of technologies does not clearly fall into either the experiential or the material category. Technologies are considered material goods that afford new life experiences. Research on experiential consumption shows that one of the greatest values of experiences is encouraging social connection. In this work, I assumed that sharing user experiences with other people makes interaction with technologies more meaningful and unique and situates UX closer to experiences in the experiential-material paradigm. To investigate this question, I conducted five mini-studies. First, I identified different ways in which people could share their user experiences. I have shown that there are several ways to share experiences with technology: (1) interact with technology while physically being with other people, (2) experience technology synchronously while being in different locations, and (3) talk about the experience with technology after it happened. Then I explored the benefits that each type of sharing brings to the user and the effect of sharing on the lasting meaning of the user experience. I found that for each of the sharing types, sharing experiences with others fills the needs for relatedness and self-actualization that are essential to well-being. Interestingly, experiences created together with other people tend to be inherently more positive and contribute more to long-term importance than those that we have alone or those we talk about. Notably, people perceive the hedonic attributes of products in shared experiences as higher, even though technologies often lack features that afford sharing. Overall, being more unique and unrepeatable, shared experiences seem to be closer to real-life experiences that contribute to the personal life story. In contrast, private and more general use of technology seems to be more similar to material product ownership. These findings expand our understanding of positive UX and open up new possibilities for experience design. It is not very intuitive that experiences have more utility than technologies or other material possessions. Experiences are fleeting and cannot be reused. However, psychology shows that they live in our memory, form our identity and bring happiness. Thus, based on my findings that sharing makes user experiences more meaningful and beneficial, the inclusion of others in the experience of using technology can transform the mere possession of technologies into a life-like experience and boost happiness.
Supervisor
Kasper Hornbæk
External examiner
Henrique Galvan Debarba
Time and place
15.06.2022 07:30
Location TBD
Study Programme
Computer Science
Title
Improving the User Experiences in VR by Affecting Time Perception With High Social Density
Abstract
The term user experience describes user perceptions and responses that result from the use of a product. Losing track of time is a commonly reported experience and an indication of a good user experience (UX).
Therefore, there might be potential relationships between user experience and time perception.
The ideas behind this work arise from three elements. 1. This work assumes that better user experiences relate to the phenomenon of time underestimation or faster self-perceived time speed; 2. A previous study showed that participants underestimated the duration of video clips with high social density and overestimated the duration of video clips with low social density in virtual reality (VR) [114]; 3. The effects of social facilitation/inhibition caused by the presence of others influence individual behavior and performance [13], and the effects are stronger when more people are present [56, 64]. Thus, this work uses social density as an independent variable and asks participants to estimate seven minutes while playing a bubble-hitting task in VR. In this work, different social density means a different number of virtual agents in the scenarios.
This work presents hypotheses in three aspects: 1. High social density will lead to time underestimation or faster self-sensed time speed; 2. High social density will lead to better user experiences, which could be revealed by higher scores on UX-related questions; 3. High social density will lead to stronger social facilitation/inhibition effects, which will make the easy task (the bubble-hitting task) easier and the hard task (the timing task) harder.
The results of this work go against the hypothesis of shorter time estimates under high social density, but they support most of the UX-related hypotheses and almost all of the hypotheses associated with social facilitation/inhibition effects. Specifically, the results show that participants spent more time in the high social density condition (mean 437.38 seconds) than in the low social density condition (mean 403.25 seconds) before they thought they had played for seven minutes. However, there was a trend for participants to have a faster self-sensed time speed under high social density conditions. Furthermore, participants had significantly higher scores on involvement and overall satisfaction, while they had a significantly lower score on the sense of mastery under conditions of high social density. According to environmental psychology studies, for a timing task such as the one in this work, people are expected to spend more time in a comfortable environment before they feel it is time to stop [38]. The longer time estimates and faster self-sensed time under high social density in this work imply that people may find quiet VR environments with high social density more comfortable than the same environments with low social density. The results also provide guidelines for future VR application designers: they can improve users' involvement and overall satisfaction by increasing the number of virtual agents in the virtual environment. In addition, the results indicate a new way to adjust the self-sensed difficulty in applications such as games without adjusting the objective difficulty, namely by increasing the social density of the scenarios. Limitations and future work are discussed in the last part of this report. In all, high social density in a quiet VR environment leads to significantly higher involvement and overall satisfaction, while significantly reducing the sense of mastery. These effects may be caused by social facilitation/inhibition effects. Future VR applications should consider increasing the number of virtual agents in environments to improve user experiences and self-sensed difficulty.
Supervisor
Kasper Hornbæk
External examiner
Henrique Galvan Debarba
Time and place
9:30
Place: TBD
Study Programme
Mathematics
Title
Sigma-Modules: An Algebraic Language for Invertible Database Queries
Abstract
This thesis develops a domain-specific language for the manipulation of database relations and queries. This language is built specifically to give rise to an algebraic structure that we will call sigma-modules, an extension of ordinary modules.
We explore how free modules are used to represent database relations, with linear maps between them representing queries. The central construction in our study is the tensor product of modules, which allows us to represent some, but not all, linear maps. In fact, notable linear maps such as the identity are missing from this space,
an omission that sigma-modules seek to address. After the construction of our language, we study some of its most important algebraic properties, and finally provide examples of how the most common database operations are translated into this language.
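For context, the omission of the identity map is an instance of a standard fact about free modules (sketched here under the assumption that the thesis works over free R-modules): the canonical map

    \[
      \Phi : N \otimes_R M^{*} \longrightarrow \operatorname{Hom}_R(M, N),
      \qquad \Phi(n \otimes \varphi)(m) = \varphi(m)\, n,
    \]

has an image consisting only of finite-rank maps, so when M is free of infinite rank the identity map \(\operatorname{id}_M\) lies outside the image.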
Supervisor
Fritz Henglein
External examiner
Rasmus Ejlers Møgelberg
Time and place
16.06.2022 15:00 - 16:00
PLTC meeting room (772-01-0-S29)
Study Programme
Computer Science
Title
Probing Pre-trained Language Models for Cross-Cultural Differences in Values
Abstract
The ubiquity of Language Models (LMs) in Natural Language Processing research and in the daily lives of people around the world calls for a more thorough understanding of their embedded biases and their behaviour in different contexts. Further, the language we use often reflects our social, cultural and political values, and since LMs are trained on web text, they pick up on these values. In order to investigate and better understand this phenomenon, I present the first study, to the best of my knowledge, to surface bias towards values across different cultures picked up by pre-trained LMs. I develop a framework for probing three widely used multilingual LMs, namely mBERT, XLM and XLM-R, by creating probes in 13 languages based on questions asked in two wide-ranging value surveys. The surveys were developed by social scientists and psychologists in order to analyse values across different cultures, and there is a large body of work in disciplines within the social sciences using results from the surveys to gain insights and develop theories. Through my experiments, I find that each of the three models captures differences in values across cultures. There are, however, discrepancies in the values picked up by the LMs across different cultures. They vary in range and have low alignment with each other in terms of predicted value scores. I also find that the predicted scores for values across cultures do not correlate with survey data filled in by people from those cultures. I present a discussion of the implications of this misalignment when using models in a cross-cultural context and suggest ways of bridging the gap.
Keywords: probing, language models, values, cross-cultural
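As a generic illustration of probing a masked LM for value-laden completions (this is not the thesis's probing framework; the prompt is hypothetical), candidate tokens can be scored with a fill-mask pipeline:

    from transformers import pipeline

    # mBERT, one of the three multilingual models studied above.
    unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")
    for result in unmasker("Obeying authority is [MASK]."):
        print(result["token_str"], round(result["score"], 3))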
Supervisor
Isabelle Augenstein
External examiner
Zeljko Agic
Time and place
12:50 - 14:10
UP1-1-1-N116B, DIKU, Universitetsparken 1, 2100 CPH Ø
Study Programme
Computer Science
Title
Evaluating Parsers with Chinese Challenge Set
Abstract
In dependency parsing and abstract meaning representation (AMR) parsing, there may be several grammatically correct ways to parse a sentence. Considering the meaning of the sentence can help disambiguation, as there may be only one semantically correct way to parse. The ability to disambiguate is important for parsers, since many downstream tasks, including machine translation, require clear and unambiguous results. To assess parsers in such situations, we propose a Chinese challenge set focusing on four sentence structures that involve ambiguities resolved by selectional preferences and commonsense reasoning. We investigated some well-performing dependency parsers from the CoNLL 2018 shared task (Zeman et al., 2018) in depth and evaluated some of them on our challenge set, showing that each parser is biased differently, and that Stanza performs best while UDPipe1 performs worst. We also find that tokenization is a major cause of degraded parsing performance for Chinese parsers. We further evaluated these parsers on general Chinese, English and German test sets; the results show that the relative performance between parsers is consistent across languages, and that Chinese is the most confusing language for all the parsers. We also inspected some AMR parsers and evaluated HanLP (He and Choi, 2021) on our dataset. The results show the parser has a strong bias in finding the token that a quantifier phrase modifies, and that the parser has some ability to distinguish sentences with syntactic ambiguity in other conditions.
Supervisor(s)
Daniel Hershcovich
External examiner(s)
Zeljko Agic
Time and place
14.06.2022 10:00
Online
Study Programme
Computer Science
Title
Uncertainty and exaggerations of scientific findings in social media
Abstract
Misinformation is a major element of today's news cycle and social media. Therefore, we examine exaggeration and uncertainty in social media data and press releases that are related to scientific studies. We also study the tools that can be used for this, namely the MT-PET model and a certainty estimator. We create an extended MT-PET model to handle an uncertainty task. Even though we are able to find relevant data for our model to use, we find hardly any appropriately labelled datasets that we can use. To solve this problem we use active learning to further improve upon the viable data samples, and we also improve the models' verbalizers using PETAL. We demonstrate with this that the extended MT-PET model is capable of capturing elements of both uncertainty and exaggeration. By analyzing the collected social media data, we observe that some correlation between high certainty and exaggeration exists. Finally, we also observe that exaggerated claims on social media tend to gain more traction.
Supervisor(s)
Isabelle Augenstein
External examiner(s)
Zeljko Agic
Time and place
16.06.2022 14:40 - 16:30
Room UP1-1-1-N116B, Universitetsparken 1, 2100 CPH Ø
Study Programme
Physics
Title
Segmentation of micro-CT scans of biomaterials and steps towards their joint reconstruction and segmentation
Abstract
The delivery of vaccines through oral administration is becoming popular, due to the low cost, the convenient distribution and the avoidance of contact between the health provider and the patient. Toxoids are used as vaccines; after vaccination, an immunological response is formed against the molecular markers of the toxoid without resulting in toxin-induced illness. Such a preparation is also known as an anatoxin. In this study, we worked with Diphtheria anatoxin (DT) and Tetanus anatoxin (TT). Due to the harsh environment of the gastrointestinal tract, the immunological response is difficult to achieve from the anatoxins. A promising adjuvant, i.e., toxoid carrier, for the oral delivery of vaccines is the Santa Barbara Amorphous-15 (SBA-15). The key to SBA-15's effectiveness is the determination of the anatoxin agglomeration in the carrier, which constitutes the central interest of this work. X-ray micro-CT data were acquired to determine how different morphologies of SBA-15 affect the incorporation. The analysis of the data is performed using Analytical Reconstruction Techniques (ANRT) and Iterative Segmentation Techniques (IST). The thesis is separated into two parts. The first half is centered on a series of implementations for computing solutions to discretized inverse problems, with Computed Tomography (CT) as the area of application. Iterative Reconstruction Techniques (IRT), derived from the Kaczmarz iterative solver, are studied. IRT are responsible for the restoration and reconstruction of the obtained signals, and often provide better results than ANRT. Segmentation techniques, which are the cornerstone of the analysis of the samples, are discussed subsequently. The chosen IST are built from spatial regularisation and K-means clustering. The second part focuses on the IST that are employed to extract information about the agglomeration of the anatoxins inside the hierarchical structure of SBA-15. Three different morphologies of SBA-15 were examined. For this case, only ANRT is considered as a means of restoration and reconstruction. The benchmark that shaped this study is the detection of an unexpected inclusion in the samples, resulting from the calcination procedure used during the samples' preparation. Segmentation techniques drawn from existing libraries are not adaptive to the samples' special traits, i.e., they could not successfully detect and classify this inclusion. Therefore, a segmentation technique was built specifically to deal with this characteristic. For the study of both reconstruction and segmentation, proximal mappings and spatial regularization techniques are considered. These methods enable the visualization of the anatoxins, and have implications for the immunological response. The first important remark of the analysis is that the distribution of the anatoxins in the samples is not uniform. The anatoxin is found agglomerated in locations with a high concentration of Phosphate Buffered Saline (PBS), a buffer solution commonly used in biological research. The quantification of the samples revealed a significantly higher percentage of the inclusion in one of the three examined carriers, denoted SBA-15a. The agglomeration ratios proved to be slightly greater in the DT than in the TT samples. Among the three carriers, SBA-15a, SBA-15b and SBA-15c, the first presents the highest loading capacity in terms of mass of anatoxin incorporated, but also the highest agglomeration percentages compared to the others. This sample is followed by SBA-15b and SBA-15c.
Supervisors
François Bernard Lauze
Heloísa Nunes Bordallo
External examiner
Rajmund Mokso
Time and place
17.06.2022 10:00
Auditorium 10, HCØ, Universitetsparken 5, 2100 København Ø.
Study Programme
Bioinformatics
Title
Should we treat articles equally? Introducing article weights in text mining of the scientific literature
Abstract
Text mining plays a significant part in relation discovery and evaluation of named entities; it is an evidence channel that can reveal new potential interactions between proteins for network databases like STRING. Traditionally, text mining tools treat the articles in the corpus equally, even though papers vary in quality, influence, and field of study. In this thesis, I propose several methods of weighting the articles and apply them to the co-occurrence calculation of the entities in the corpus. I generated journal-based, article-based, and similarity-based weighting schemes and benchmarked them on the KEGG pathway dataset. The results show that the influence of an article (impact factors and citations) does not correlate with the discovery of true-positive interactions between protein pairs. I also found that the similarity between articles and those referenced by reliable public databases such as KEGG is a good indicator of the quality of an article; using it as the weighting scheme can facilitate the discovery of interactions.
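The weighting idea can be sketched as follows (a minimal illustration, not the thesis pipeline): instead of each article contributing 1 to a protein pair's co-occurrence score, it contributes its weight:

    from collections import defaultdict
    from itertools import combinations

    def weighted_cooccurrence(articles):
        """articles: iterable of (weight, set_of_entities) pairs."""
        score = defaultdict(float)
        for weight, entities in articles:
            for a, b in combinations(sorted(entities), 2):
                score[(a, b)] += weight   # unweighted counting would add 1 here
        return score

    corpus = [(1.0, {"TP53", "MDM2"}), (0.3, {"TP53", "MDM2", "EGFR"})]
    print(weighted_cooccurrence(corpus)[("MDM2", "TP53")])  # 1.3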
Supervisor
Anders Krogh
External examiner
Federico De Masi
Time and place
17.06.2022 14:30
Panum (room tbd)
Study Programme
Mathematics & Computer Science
Title
Analysis of Liquidity Provision within Uniswap V2
Abstract
This project comprises an introduction to Decentralized Finance, its infrastructure, and Decentralized Exchanges, as well as an analysis of the various aspects that impact gains for liquidity providers of the largest Constant Product Market Maker (CPMM): Uniswap V2. A digital twin of the CPMM has been built in order to undertake simulations and analysis. Data collected directly from Uniswap has been used to set the initial parameters of simulations to recreate trading trends currently occurring in the market. Experiments are conducted varying the number of interacting liquidity providers, their initial holdings and the tokens tradable. Moreover, the impact of different trading fees on liquidity providers' returns has been investigated. Results show that providing liquidity to pools strategically proves to be a profitable strategy that yields positive annual returns of up to 1%. Experiments also indicate that the presence of other active providers influences returns to different extents and in different ways. Smaller liquidity providers are seen to benefit from the presence of wealthier providers, while bigger liquidity providers are negatively impacted by active providers with similar portfolios. As expected, gains are directly correlated with pools' trading volumes, as these are the source of profit for liquidity providers. This can be seen in the strategies adopted by providers.
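For background, Uniswap V2 pools price trades by the constant-product rule x · y = k with a 0.3% fee; the sketch below mirrors the router's getAmountOut formula (the thesis's digital twin and simulation details are not reproduced here):

    def get_amount_out(amount_in, reserve_in, reserve_out, fee_bps=30):
        # 0.3% fee: only 99.7% of the input amount earns output tokens.
        amount_in_with_fee = amount_in * (10_000 - fee_bps)
        numerator = amount_in_with_fee * reserve_out
        denominator = reserve_in * 10_000 + amount_in_with_fee
        return numerator // denominator

    # Swap 1 token X into a pool holding 100 X and 200,000 Y (integer units).
    print(get_amount_out(1, 100, 200_000))  # 1974 — vs. 2000 with no fee/slippage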
Supervisor(s)
Omri Ross
External examiner(s)
Patrick Bahr
Time and place
17.06.2022 11:00
UP1-2-0-04
Study Programme
Computer Science
Title
Efficient generation of map graphs
Abstract
This thesis assumes a basic level of knowledge in graph theory and, optionally, GIS (Geographic Information Systems), as most of the project works with map data.
The project works with real-world data typically used in GIS software, where the goal is to generate networks based on the paths and their associated data (such as speed limits, capacity, length, etc.). QGIS is an open-source project for spatial data visualization and processing. I draw comparisons to one of the processing tools it provides, which generates the shortest path between two points by creating a network from the paths. The main problem is converting the data to obtain a reasonable network that operates as expected in most if not all cases; QGIS falls short on some datasets, and I cover these cases and the improvements to be made in this project.
The methods I use to generate such a network are range-searching data structures, as they can significantly reduce the problem complexity by only looking at neighbourhoods of paths. The reason for looking at a neighbourhood of paths is that a path is only expected to connect to other proximate paths that could be considered to intersect.
The solution should also strive towards creating a network such that it can solve general graph problems, such as, but not limited to, shortest path or fastest route (ETA - Estimated Time of Arrival). The solution is also implemented as a prototype as a means to test practical differences between QGIS's solution for creating networks and my own.
The project did succeed at providing a better solution for generating networks than QGIS for the chosen datasets. It provided better practical run-times in the prototype, but not theoretical ones, as there might be hidden costs in the library used to assist with porting QGIS's solution to the prototype. The proposed solution also shows better quality (connections between paths where they are expected) and provides the functionality for shortest path, fastest route and more.
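A minimal sketch of the range-searching idea (the tolerance and data are illustrative, not taken from the thesis): index path endpoints in a k-d tree and only test connections between endpoints within a snap tolerance:

    import numpy as np
    from scipy.spatial import cKDTree

    endpoints = np.array([[0.0, 0.0], [1.0, 0.0], [1.00001, 0.0], [5.0, 5.0]])
    tree = cKDTree(endpoints)

    # All pairs of endpoints closer than the tolerance are candidate junctions.
    for i, j in sorted(tree.query_pairs(r=1e-3)):
        print(f"snap endpoint {i} to endpoint {j}")   # prints: snap endpoint 1 to endpoint 2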
Supervisor(s)
Pawel Winter
External examiner(s)
Jesper Larsen
Time and place
17.06.2022 10:00
A102 at HCØ, Universitetsparken 5
Study Programme
Computer Science
Title
Detecting the different types of relationships between lifestyle factors and diseases with deep learning
Abstract
Background: Many natural language processing techniques have been widely applied in the biomedical area to explore the associations between genes and diseases. Lifestyle factors have also received increasing attention from the scientific community in the past few years; however, there is still a lack of systematic research to discover their connections with diseases.
Aim of the project: This project aims to create a deep-learning-based model that will be able to detect and distinguish between the various types of associations that lifestyle factors can have with diseases. It will then be possible to apply it to the entire scientific literature to predict possible pairs of lifestyle factors and diseases and to examine the nature of their associations.
Results: 400 abstracts were sampled from the biomedical literature and the relationships between diseases and lifestyle factors were annotated in them. In total, 2031 relationships belonging to eight different types were annotated. This dataset of associations was used to fine-tune a BioBERT model. The average F-score of the best-trained model for the relation extraction task was 60.86%.
Conclusions: In this project a corpus with high-quality annotations of lifestyle factor-disease associations was created for the first time. This corpus was in turn used to fine-tune a deep learning model that can detect different types of associations between lifestyle factors and diseases. Due to the complexity of detecting several types of associations, more high-quality annotated data are needed to develop better models.
Supervisor(s)
Anders Krogh, Lars Juhl Jensen, Katerina Nastou and Esmaeil Nourani
External examiner(s)
Federico De Masi
Time and place
17.06.2022 13:00
Panum (Room: tbd)
Study Programme
Health and Informatics
Title
Capacity utilization at the Department of Allergology, Dermatology and Venereology at Gentofte Hospital
Abstract
Purpose: Personnel capacity utilization, i.e., the need to use the clinical staff at hospitals efficiently, constitutes a growing problem. The Department of Allergology, Dermatology and Venereology at Gentofte Hospital has struggled with personnel capacity utilization for many years, and this has motivated the department to seek a solution. The department has asked for a better data-driven planning tool that will solve its challenges with personnel capacity utilization and thus uphold its treatment rate and the patient's 30-day right to inquiry. The purpose of this qualitative study is to explore the issues the department is facing in regard to its personnel capacity utilization, and furthermore to explore which solutions and techniques from the literature can be implemented to solve this problem.
Method: This study was split into two phases: the data collection phase, in which we used the data collection method Contextual Inquiry, and the data analysis phase, in which we used the data analysis methods Affinity Diagram, Sequence Diagram, Flow Diagram and Diagnostic Map.
Findings: This study presents 9 overall findings related to personnel capacity utilization. The most significant findings were that the department is at varying times faced with problems related to demand and supply, which has a significant impact on its ability to uphold the patient's 30-day right to inquiry. Furthermore, the department does not have a fixed number of doctors employed, and the retention of these doctors is important in order to maintain the patient's 30-day right to inquiry. Non-observance of the patient's 30-day right to inquiry has organizational and economic consequences for the department, and it has health consequences for the patient.
Conclusion: We propose, based on the literature found, that the department needs to change its communication patterns regarding developing the report and using Sundhedsplatformen, and that data must be made more accessible to the local report developer.
Supervisor
Tariq Osman Andersen
External examiner
Mikael Brasholt Skov
Time and place
20.06.2022 12:30 - 14:30
Sigurdsgade 41, room 2-03
Study Programme
Computer Science
Title
Measuring Convexity in Polygons
Abstract
Computational geometry has long been an established field of computer science with applications in many different areas. Often the problems have both numerical and combinatorial solutions each with their own advantages and drawbacks.
A problem in this category is finding a measure of convexity for polygons. This is challenging on many levels, as the definition of the problem can take many forms. The measure can be defined by the ratio of reflex vertices, the largest convex polygon contained, the probability that two points in the polygon can see each other, or any other quantity that is bounded by one.
We focus on the probability of two uniformly sampled points in the polygon being able to see each other, known as the Beer index. By definition this measure lends itself well to approximation by numerical methods, but different combinatorial solutions to the problem also exist. We give an overview of different sampling-based algorithms to approximate the index, and of the different combinatorial solutions, each with their own tradeoffs. These are weighed against each other for implementation purposes, and a select few are implemented.
We explain the implementation details of the various methods, and discuss issues faced when going from theory to practice. The performance of the implementations in practice is weighed against their expected performance, given by theoretical bounds on the sampling algorithms.
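A minimal Monte Carlo sketch of the Beer index (an illustration using rejection sampling, not the thesis implementation): sample point pairs uniformly in the polygon and count the pairs whose connecting segment stays inside:

    import random
    from shapely.geometry import LineString, Point, Polygon

    def sample_point(poly):
        minx, miny, maxx, maxy = poly.bounds
        while True:                      # rejection sampling from the bounding box
            p = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
            if poly.contains(p):
                return p

    def beer_index(poly, samples=10_000):
        hits = 0
        for _ in range(samples):
            a, b = sample_point(poly), sample_point(poly)
            if poly.covers(LineString([a, b])):   # segment fully inside: visible
                hits += 1
        return hits / samples

    L_shape = Polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)])
    print(beer_index(L_shape, samples=2_000))     # well below 1 for this polygon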
Supervisor(s)
Mikkel Abrahamsen
External examiner(s)
Rüdiger Riko Jacob
Time and place
20.06.2022 11:00
Exercise room 1-0-34 at DIKU, Universitetsparken 1
Study Programme
IT and Cognition
Title
Current state of the art in super-resolution and light enhancement on real-world data
Abstract
Given a low-resolution, low-quality input image, super-resolution is the process of producing a high-resolution, high-quality output image. The input image is usually degraded by different factors, for example downscaling, random noise, or blurring resulting from the imaging sensor, lens, or motion. In addition, low-light enhancement is the process of producing an image with normal lighting, given an input image taken in poor lighting conditions. In order to combine these procedures in an end-to-end setup, one requires a dataset with low-resolution, low-light images that are paired with high-resolution, normal-light images. The synthesized data that is usually generated, following a degradation process, is not necessarily identical to that of real-world scenarios. Recently, the availability of real-world low-light datasets such as RELLISUR, published at NeurIPS 2021, has made it possible to develop more realistic end-to-end low-light enhancement and super-resolution algorithms based on real data. In this work, this dataset is used to train and evaluate an existing state-of-the-art model, namely EDSR. In addition, a comparison is performed to test the claim that end-to-end training yields better results than sequential processing for joint low-light enhancement and super-resolution. To handle low-light enhancement separately, the state-of-the-art Zero-DCE approach is chosen. Traditional methods are also used, such as bilinear and bicubic interpolation for upscaling, and histogram equalization for low-light enhancement.
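For reference, the traditional baselines mentioned above can be run in a few lines with OpenCV (a sketch; the file name and the x4 scale factor are hypothetical):

    import cv2

    img = cv2.imread("lowlight_lowres.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file

    enhanced = cv2.equalizeHist(img)          # low-light baseline: spread intensities
    h, w = enhanced.shape
    upscaled = cv2.resize(enhanced, (4 * w, 4 * h),
                          interpolation=cv2.INTER_CUBIC)  # bicubic x4 upscaling

    cv2.imwrite("enhanced_x4.png", upscaled)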
Supervisor
Søren Ingvor Olsen
External examiner
Lee Herluf Lund Lassen
Time and place
21.06.2022 (time: TBD)
Location TBD
Study Programme
Computer Science
Title
Reverse mode automatic differentiation of histograms in Futhark
Abstract
This thesis presents a method of differentiating the parallel Futhark construct reduce_by_index by integrating a new rewrite rule into the compiler pipeline, transforming the original code into a program that computes the adjoints of its inputs using reverse mode automatic differentiation. The thesis provides the relevant background on automatic differentiation and its forward and reverse modes, along with the Futhark language itself. It covers the main rewrite rule for reverse mode and how it is used to derive the reverse mode rewrite rule for reduce_by_index. Lastly, we cover an implementation of the derived rule and evaluate its correctness and performance.
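To give the flavour of the rewrite rule (a Python illustration of the simplest, addition-based case only, not the compiler transformation itself): for a histogram hist[bins[i]] += xs[i], reverse mode propagates each bucket's adjoint back to the inputs that fell into it:

    import numpy as np

    def hist_fwd(k, bins, xs):
        hist = np.zeros(k)
        np.add.at(hist, bins, xs)   # hist[bins[i]] += xs[i], handling repeated bins
        return hist

    def hist_rev(bins, hist_adj):
        return hist_adj[bins]       # each input inherits its bucket's adjoint

    bins = np.array([0, 2, 2, 1])
    xs = np.array([1.0, 2.0, 3.0, 4.0])
    hist_adj = np.array([10.0, 20.0, 30.0])   # seed adjoint of the output
    print(hist_fwd(3, bins, xs))              # [1. 4. 5.]
    print(hist_rev(bins, hist_adj))           # [10. 30. 30. 20.]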
Supervisor(s)
Cosmin Eugen Oancea
External examiner(s)
Mads Rosendahl
Time and place
Time TBD
PLTC meeting room, HCØ, Universitetsparken 5, 2100 CPH Ø
Study Programme
Computer Science
Title
Application-level data integrity enforcement in event-driven microservices
Abstract
Microservices are a new architectural approach for creating software systems as a collection of self-contained, loosely connected modules with their own data stores. Although separate services are designed and maintained independently, they frequently rely on one another to offer particular system features, such as data correlation. Asynchronous events combined with queue messages are a popular inter-service communication strategy. Event-driven microservices produce and broadcast events to other parties that need to operate on such data or react to such external changes. Decentralized data ownership across microservices has, as a consequence, changed how consistency is handled. Strong consistency guarantees are typically replaced by the eventual consistency model to protect the flexibility and availability of microservice systems. This brings additional obstacles, because application developers must integrate a considerable amount of state management logic and validations at the application layer. Application-level logic makes it harder to reason about the consistency of the application's state management and its influence on the overall application state. To address these issues, we begin by examining prominent open-source microservice repositories for common patterns and deficiencies in application data management that result in anomalies. Based on the trends discovered in the applications, we conducted a benchmark measurement of their performance and suggest three kinds of data constraints that developers may use to reason about data integrity. We apply constraint enforcement at the application level and investigate the implications for system performance. The evaluation demonstrates that adopting the constraints eliminates the application's data integrity violations at an acceptable cost to overall application performance.
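A generic sketch of application-level constraint enforcement in an event handler (hypothetical domain and names, not taken from the thesis): validate an invariant before applying state derived from another service's event:

    class InsufficientStock(Exception):
        pass

    stock = {"sku-1": 5}   # this service's own data store

    def on_order_created(event):
        """Enforce the invariant stock >= 0 instead of assuming it eventually holds."""
        sku, qty = event["sku"], event["quantity"]
        if stock.get(sku, 0) < qty:
            raise InsufficientStock(sku)      # reject, or trigger a compensating event
        stock[sku] -= qty

    on_order_created({"sku": "sku-1", "quantity": 3})
    print(stock)   # {'sku-1': 2}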
Supervisor(s)
Prof. Yongluan Zhou and Rodrigo Laigner
External examiner(s)
Philippe Bonnet
Time and place
21.06.2022 10:00 - 11:00
Study Programme
Computer Science
Title
Reverse Automatic Differentiation in Futhark
Abstract
In this thesis, we describe our work with reverse automatic differentiation (AD) in the data-parallel programming language Futhark. The main motivation is to extend and optimise reverse AD in Futhark's compiler to permit more expressive programs to be differentiated automatically and efficiently.
Futhark's AD consists of a set of rewrite rules that are used to transform a program into its differentiated counterpart. We present reverse mode AD rewrite rules for the operations reduce-by-index and scan. Reduce-by-index, also known as multi-reduce, has a generic case and multiple special cases, the latter of which are loosely described by Schenck et al. 2022. We formulate and present a rewrite rule for the generic case and present specific rewrite rules for the special cases as Futhark pseudo-code.
Likewise, we examine the reverse AD rewrite rules for scan presented by Schenck et al. 2022, one of which we have simplified with a performance benefit. The existing AD implementation is modified to work when the scan operates on tuples. We have extended the generic case with specialised rewrite rules for scan operators whose Jacobian matches specific patterns.
We have implemented the presented rewrite rules for both reduce-by-index and scan in Futhark's compiler. The performance of differentiated programs is evaluated experimentally and compared to the performance of the primal program and of the program differentiated with forward AD instead. In many cases, this demonstrates reasonable reverse AD overheads and competitive performance against Futhark's established forward AD implementation.
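As a flavour of the scan rules (a sketch of the simplest special case only, an inclusive plus-scan; the thesis treats general operators): for y = cumsum(x) the adjoint is a reversed cumulative sum of the output adjoint, x̄[i] = Σ_{j ≥ i} ȳ[j]:

    import numpy as np

    def scan_fwd(xs):
        return np.cumsum(xs)                  # inclusive plus-scan

    def scan_rev(ys_adj):
        # Reverse, accumulate, reverse back: x_adj[i] = sum_{j >= i} y_adj[j].
        return np.cumsum(ys_adj[::-1])[::-1]

    xs = np.array([1.0, 2.0, 3.0])
    ys_adj = np.array([0.1, 0.2, 0.3])
    print(scan_fwd(xs))      # [1. 3. 6.]
    print(scan_rev(ys_adj))  # [0.6 0.5 0.3]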
Supervisor
Cosmin Eugen Oancea
External examiner
Mads Rosendahl
Time and place
Time TBD
PLTC meeting room, HCØ, Universitetsparken 5
Study Programme
Computer Science
Title
Texture-based analysis of hippocampus for the prediction of Alzheimer's Disease using 2D Convolutional Neural Networks
Abstract
Alzheimer's Dementia (AD) is a progressive, neurodegenerative disease, and as of today it has no cure. However, an early diagnosis of the disease can accelerate drug trials and improve patients' lives. Brain changes start many years before the onset of symptoms, especially in the hippocampal area. These changes can be studied through MRI scans, and therefore many biomarkers have been established in this area for the diagnosis of AD: volume, shape and texture. All of these biomarkers have successfully been studied using traditional Machine Learning (ML) tools that rely on manual feature engineering. In the case of texture analysis, a filter bank needs to be defined to extract the statistical patterns in the hippocampus.
In this study, we perform a texture-based analysis of the hippocampus with 2D Convolutional Neural Networks (CNNs) on T1-weighted MRI scans from the ADNI database. CNNs learn filters of increasing levels of abstraction to extract a set of features from each input; based on these features, binary classification is performed. Using off-the-shelf networks (ResNet50, EfficientNetB0 and MobileNetV2) pre-trained on ImageNet, we build AD classifiers for different orientations of the bilateral hippocampus and fuse their scores. To preprocess the data and segment the hippocampus we use two different pipelines: FreeSurfer and FAST-AID Brain. To alleviate the problem of data scarcity we use data augmentation techniques.
The results show benefits in using pre-trained networks and the importance of proper regularisation when training Deep Learning (DL) models using small datasets. Lightweight models with smaller capacity (EfficientNetB0 and MobileNetV2 networks) outperformed those with higher complexity (ResNet50). Furthermore, data augmentation gave consistently better results for the lightweight networks. We also showed the importance of image registration to help convergence and generalisation.
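As a rough sketch of the transfer-learning setup described above, here is one of the named pre-trained backbones with a fresh binary AD-vs-control head in PyTorch. The hyperparameters, loss choice and training loop are illustrative assumptions, not the thesis's code:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained backbone with a fresh binary head (AD vs. control).
backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")
backbone.classifier[1] = nn.Linear(backbone.last_channel, 1)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def train_step(images, labels):
    """One step on a batch of 2D hippocampus slices (N, 3, H, W).
    Per-orientation classifiers trained like this could then have
    their sigmoid scores averaged, in the spirit of score fusion."""
    optimizer.zero_grad()
    logits = backbone(images).squeeze(1)
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```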
Supervisor(s)
Mads Nielsen and Mostafa Mehdipour Ghazi
External examiner
Rasmus Reinhold Paulsen
Time and place
10:00
Vibenhuset, Lyngbyvej 2, 2100 København Ø
Study Programme
Computer Science
Title
Sketching and Privacy for Anonymised Histograms
Abstract
In the realm of big data, many statistical aggregates can be infeasible to compute due to space constraints. One approach to mitigate this problem is to work in a data streaming model where items of the large dataset arrive in a sequential manner. Algorithms working in this paradigm produce a compressed view of the dataset, often called a sketch, which can be used to approximate the wanted aggregates.
Such a data stream can be seen as a multiset of tokens, meaning each token has a count of how many times it occurs. The aggregate of interest is the anonymised histogram, which is the multiset of frequencies occurring in the data stream. In this thesis we propose an algorithm for estimating the anonymised histogram.
The algorithm consists of a bottom-k sketch augmented with counters and an unbiased estimator of the number of distinct tokens that have frequency i in the stream. The estimator is analysed using two different approaches. In both cases we provide probabilistic bounds on the estimator being within multiplicative factors of the target, in terms of an error parameter ε, and the results are compared. In addition, we show a lower bound on the space usage, which implies that the algorithm is optimal in some settings. The algorithm has been implemented, and experiments have been made to showcase its performance in various settings and to validate some theoretical findings.
The anonymised histogram also appears in sensitive contexts, such as the degree distribution of social networks or password-frequency lists, which motivates the need to make the sketch differentially private. To analyse this aspect, we also introduce differential privacy for anonymised histograms and show how a sketch can be made differentially private. We then propose a potential method for estimating the histogram given a private sketch.
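A simplified sketch of the kind of structure described above: a bottom-k sample (the k tokens with smallest hash values) with exact counters, plus a naive estimator of the number of distinct tokens with frequency i. The thesis's estimator and its probabilistic analysis differ; this is only to fix intuition:

```python
import hashlib

class BottomKSketch:
    """Keep the k tokens with the smallest hash values and count their
    occurrences. Since hashes are deterministic and the k-th smallest
    hash only decreases, evicted tokens never re-enter, so counts of
    sampled tokens are exact."""
    def __init__(self, k):
        self.k = k
        self.items = {}  # token -> [hash, count]

    def _h(self, token):
        d = hashlib.sha256(str(token).encode()).digest()
        return int.from_bytes(d[:8], "big") / 2**64  # uniform in [0, 1)

    def update(self, token):
        if token in self.items:
            self.items[token][1] += 1
            return
        h = self._h(token)
        if len(self.items) < self.k:
            self.items[token] = [h, 1]
        else:
            worst = max(self.items, key=lambda t: self.items[t][0])
            if h < self.items[worst][0]:
                del self.items[worst]
                self.items[token] = [h, 1]

    def estimate_count_of_frequency(self, i):
        """Naive estimate of #{distinct tokens with frequency i}:
        scale the in-sample count by n/k, with the number of distinct
        tokens n estimated as (k - 1) / (k-th smallest hash)."""
        in_sample = sum(1 for _, c in self.items.values() if c == i)
        if len(self.items) < self.k:
            return in_sample  # everything seen is in the sketch
        kth_hash = max(h for h, _ in self.items.values())
        n_distinct = (self.k - 1) / kth_hash
        return in_sample * n_distinct / self.k
```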
Supervisor
Rasmus Pagh
External examiner
Martin Aumüller
Time and place
10:45 at øv-3-0-25, DIKU (UP1)
Study Programme
Computer Science
Title
Simultaneous Synthesis and Segmentation of Brain MRI Modalities
Abstract
Automatic segmentation of brain tumors such as low-grade and high-grade gliomas from magnetic resonance imaging (MRI) requires the acquisition of multiple sequences, including T1-weighted, post-contrast T1-weighted and fluid-attenuated inversion recovery (FLAIR). However, scans acquired in hospital settings most often do not contain all modalities, for practical and logistical reasons, which limits the quality of the obtained segmentation maps. Given the limited data, existing imputation methods for generating missing modalities treat imputation as a preprocessing step and separate the process of synthesis from the process of segmentation. In this paper, we propose a method for simultaneous, end-to-end optimization of the synthesis network and the segmentation network, which produces realistic post-contrast T1-weighted images from T1-weighted and FLAIR images, and improves the segmentation of brain tumors from T1-weighted and FLAIR images only, with a mean per-class Dice increase of 4.8%.
Keywords: Gliomas, MRI Modalities, Deep Learning, Brain Tumor
Segmentation, Simultaneous Synthesis and Segmentation
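A minimal sketch of the end-to-end coupling described above: a single optimizer spans both networks, and the synthesized post-contrast T1 is fed into the segmenter, so segmentation gradients also shape the synthesis. The placeholder single-layer "networks", loss choices and weighting are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

synth_net = nn.Conv3d(2, 1, kernel_size=3, padding=1)  # placeholder
seg_net = nn.Conv3d(3, 4, kernel_size=3, padding=1)    # placeholder
opt = torch.optim.Adam(list(synth_net.parameters()) +
                       list(seg_net.parameters()), lr=1e-4)

def joint_step(t1, flair, t1ce_target, seg_target, lam=1.0):
    """One joint step: synthesis loss + segmentation loss, with
    gradients flowing through both networks end to end."""
    opt.zero_grad()
    t1ce_hat = synth_net(torch.cat([t1, flair], dim=1))
    logits = seg_net(torch.cat([t1, flair, t1ce_hat], dim=1))
    loss = nn.functional.l1_loss(t1ce_hat, t1ce_target) \
         + lam * nn.functional.cross_entropy(logits, seg_target)
    loss.backward()
    opt.step()
    return loss.item()
```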
Supervisor
Mads Nielsen
External examiner
Rasmus Reinhold Paulsen
Time and place
11:00 at Vibenhuset, Lyngbyvej 2
Study Programme
Computer Science
Title
Short-term prediction in patient-centered care
Abstract
Background: Healthcare is one of the most data-intensive industries in the world, and with the rise of AI there is great potential to use these data to move from prevention to prediction. Furthermore, AI tools are used in self-tracking technologies from which people generate data. These data enable a more patient-centered self-care environment, which not only empowers patients but also makes them more engaged with their disease.
Objective: This thesis examines how a possible temporary risk assessment may affect patients with an ICD (Implantable Cardioverter Defibrillator), and whether such an assessment can be considered valuable enough to the patient before a doctor has validated it. We also investigate patients' confidence in such technologies.
Methods: In an exploratory study we interviewed 4 patients with an implanted ICD over 2 interview rounds. The aim of the first round was to understand the patients' knowledge about their disease, AI, and the data they produce. The aim of the second round was to clarify patients' content preferences between two different mock-ups. We used various qualitative methodological tools to collect and analyse the empirical data.
Conclusion: The analysis revealed a large heterogeneity among the four patients. It has been shown how AI can be used to make many tasks in healthcare more efficient, but also how some patient-centred care is neglected by AI-based tools. Validation from a doctor was presented as decisive when disagreement between doctor and AI arose. It can therefore be concluded that a temporary risk assessment is not considered valuable enough to stand alone before the physician has reviewed the algorithm's assessment.
Supervisor(s)
Stina Matthiesen
External examiner(s)
Keld Bødtker
Time and place
21.06.2022 14:10 - 16:10
Sigurdsgade 41, 2200 Kbh N
Study Programme
Health and Informatics
Title
The impact of Sundhedsplatformen on nurses' perceptions of medication errors
Abstract
Background: In parallel with the growing development and use of electronic health records (EHR) and other forms of technology in healthcare, there is increasing global awareness of how these systems can help improve patient safety and reduce medication errors. While there has been a great deal of focus on determining how EHR systems can both reduce and introduce new types of errors in the medication process, there is little awareness of whether and how EHR systems impact the patient safety culture among hospital frontline staff, including nurses.
Purpose: This project focuses on the way in which nurses' perceptions of medication errors are affected and changed by the use of technologies in an emergency department in the Capital Region of Denmark.
Research design: A qualitative study has been conducted using individual interviews, focus group interviews, and field observations. Data analysis has been done through abductive Grounded Theory. The theoretical framework for the project is Actor-network theory (ANT) and Edgar Schein's model of organizational culture.
Results: The project data are based on five individual interviews, two focus group interviews and field notes. The analysis has contextualized the nurses' tasks in the medicine module in Sundhedsplatformen, MDA and Rover, and highlights four key results: 1) It is assumed that dispensed medicine has been administered despite a lack of documentation. 2) Delegated prescribing rights camouflage medication errors. 3) Distinctive prescriptions increase nurses' awareness of medication errors. 4) Challenges with Rover mean that the alternative, i.e. managing patient records from a computer, is relatively easier.
Conclusion: Application patterns of technologies are created locally, where nurses adapt e.g. the medicine module in SP and Rover to local conditions. These patterns are created on the basis of the nurses' core values. Technologies in hospitals have not influenced nurses' perceptions of medication errors and patient safety. Technologies have increased awareness of certain situations where errors may occur but at the same time, new areas have emerged where new errors may occur. Hospitals are complex organisations characterized by constant change due to unforeseen patient care pathways and external conditions. These conditions are not taken into consideration in the structure of the systems. Therefore, continuous evaluation of the application patterns of technologies in different practices and their impact on patient safety is necessary.
Supervisor
Henriette Mabeck
External examiner
Yutaka Yoshinaka
Time and place
22.06.2022 09:00
Room øv-2-0-25 at the Biocenter
Study Programme
Statistics
Title
Generalising Stein Variational Gradient Descent Through Rényi's alpha-divergence
Abstract
Stein Variational Gradient Descent (SVGD; Liu and Wang, 2016) is a particle transport algorithm based on the KL divergence between two densities. We propose a wider class of algorithms based on Rényi's α-divergence, a family of divergences that includes the KL divergence as a limiting case. We go on to show that there exist cases in which SVGD results in a higher KL divergence between the target density and a kernel density estimate than an algorithm with similar parameters based on Rényi's α-divergence.
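For reference, the two standard definitions involved, in textbook form (the thesis's precise notation may differ):

```latex
% Rényi's alpha-divergence; the KL divergence is the limit alpha -> 1:
D_\alpha(p \,\|\, q) = \frac{1}{\alpha - 1}
    \log \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx,
\qquad
\lim_{\alpha \to 1} D_\alpha(p \,\|\, q) = \mathrm{KL}(p \,\|\, q).

% SVGD (Liu and Wang, 2016) transports particles along the optimal
% perturbation direction in the RKHS of a kernel k:
\phi^{*}(x) = \mathbb{E}_{x' \sim q}
    \left[ k(x', x)\, \nabla_{x'} \log p(x') + \nabla_{x'} k(x', x) \right].
```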
Supervisor
Thomas Wim Hamelryck
External examiner
Lars Nørvang Andersen
Time and place
23.06.2022 13:00
TBD
Study Programme
Computer Science
Title
Futhark language server and VS Code extension development
Abstract
This thesis describes the implementation of a language server and a VS Code extension for the Futhark programming language. Futhark is a young programming language, and there are few tools that help developers write code, so we developed a language server and a ready-to-use VS Code extension that provides basic language support. We also discuss a workaround for error recovery and some design choices of the language server. The Futhark language server has been published as part of Futhark since its 0.21.9 release, and the VS Code extension is available on the marketplace with around 50 downloads.
Supervisor
Troels Henriksen
External examiner
Magnus Madsen
Time and place
TBD
Study Programme
Computer Science
Title
Touch The Nature Through Ultrasound: Using mid-air haptic patterns to simulate touching natural elements in VR
Abstract
Haptic feedback enables users to feel their surroundings in virtual reality (VR). Devices such as the STRATOS Explore by Ultrahaptics emit ultrasound waves that users can feel in mid-air. In this paper we investigate whether ultrasound wave sensations can be used as haptic feedback for touching natural elements in VR. To do this, we created eight different sensations and tested their resemblance to the natural elements water, fire, soil, and air with 16 participants in an interview study.
Based on the participants' ratings, we chose the best (i.e. congruent) and worst (i.e. incongruent) representation for each element. To check whether the design of the sensations has a positive impact on the level of enjoyment and immersion in VR, we conducted a second user study with a new group of 12 participants, presenting the chosen feedback sensations, as well as no haptic feedback, in virtual environments corresponding to the congruent feedback. The results show a statistically significant difference between the three types of feedback. Congruent ultrasound feedback obtains a higher level of immersion and enjoyment than both incongruent ultrasound feedback and no feedback. Furthermore, incongruent ultrasound feedback gives higher immersion and enjoyment than no feedback.
Key words: Ultrasound • Haptic feedback • Virtual reality (VR)
Supervisor
Hasti Seifi
External examiner
Timothy Robert Merritt
Time and place
23.06.2022 17:40 - 18:40
Sigurdsgade 41
Study Programme
Computer Science
Title
Embedding programmable touch sensations in digital text
Abstract
Reading is an inevitable part of our everyday tasks. Whether on a computer, an e-book, or our mobile devices, we are constantly reading, often without being conscious of it. But is there more information we can obtain than from just plain text? Illustrations that accompany text have been doing that for decades, representing part of the information as an image. However, there might be ways of retrieving information from a text other than through sight. What if we could incorporate another sense to enrich the reading experience? This report presents a software toolkit that provides tactile stimuli whilst reading. These stimuli are called haptic effects and represent meta-information about the text that is being read. We conducted an exploratory user study where participants tested our implementation of the first version of the software toolkit, which incorporates haptic sensations into text. Our results show that, although participants agreed this technology might not fit light reading for adults, it has great potential for educational purposes.
Supervisor
Hasti Seifi
External examiner
Timothy Robert Merritt
Time and place
23.06.2022 16:40-17:30
Sigurdsgade 41
Study Programme
Mathematics
Title
High Order FEM Deformable Body Simulator
Abstract
Deformable body simulation is an important branch of physics-based simulation and is widely used in virtual reality, animation, games and other related industries. The finite element method (FEM) is one of the main methods for realising deformable body simulation. However, since FEM requires numerous complex numerical calculations, its computational cost can be too high to meet requirements for visual quality and simulation speed. We therefore focus on improvements to both the numerical methods of the simulation and the FEM itself, in order to build a more accurate and faster deformable body simulator. On the numerical side, we introduce several linear solvers, including the conjugate gradient and conjugate residual methods, and propose several preconditioners to accelerate them, such as the Jacobi and multigrid preconditioners. Experiments show that, because the non-convexity of the FEM energy can make the linear systems indefinite, the conjugate residual method generally performs better than conjugate gradient on non-positive-definite matrices. Moreover, with a preconditioner, the relative error decreases noticeably faster over the iterations than without one. For the FEM itself, we propose high-order FEM to improve the accuracy of the simulation by increasing the order of the interpolation basis functions. We give a general algorithm for computing the shape functions of the reference tetrahedral element in isoparametric space using the Vandermonde matrix, and then build a structure for computing the force and Hessian matrices under high-order FEM. Finally, since the deformation gradient is no longer a constant matrix in a high-order finite element, we employ two numerical quadrature methods, Gauss quadrature and Hammer quadrature, to calculate the force and Hessian matrices. Experiments show that high-order FEM simulation is significantly better than the linear case and can obtain more accurate results with fewer iteration steps.
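As a small illustration of one solver/preconditioner pairing from the family discussed above, here is a textbook Jacobi-preconditioned conjugate gradient in NumPy. It assumes a symmetric positive definite system and is not the thesis's implementation, which additionally covers conjugate residual and multigrid preconditioning for the indefinite case:

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-8, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for A x = b,
    with A symmetric positive definite."""
    M_inv = 1.0 / np.diag(A)          # Jacobi preconditioner: M = diag(A)
    x = np.zeros_like(b)
    r = b - A @ x                     # residual
    z = M_inv * r                     # preconditioned residual
    p = z.copy()                      # search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```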
Supervisor
Kenny Erleben
External examiner
Søren Peder Madsen
Time and place
24.06.2022 08:00
Kenny Erleben's office.
Study Programme
IT and Cognition
Title
Belief Bias in Machine Reading Comprehension
Abstract
Belief bias is humans' reluctance to accept inferences and premises that violate commonly shared beliefs about the world. Current machine reading comprehension (MRC) models suffer from an analogous bias: their performance degrades significantly when predicting on contexts that are not aligned with commonly shared beliefs, which poses a serious challenge to applying MRC models in real applications. This thesis demonstrates that most MRC models are sensitive to belief bias by exploring the difference between the performance of the models on the Auto-Locke dataset and on common question answering datasets. The roles of adversarial training, multi-dataset joint training, masking, and resampling in mitigating belief bias are also explored. Adversarial training significantly improves the model's performance on our re-biased dataset and the Auto-Locke dataset, but also greatly degrades its performance on the original dataset. Multi-dataset joint training compensates for the shortcomings of adversarial training and successfully reduces the belief bias of the model while maintaining its robustness and generalizability. Our masking strategy not only effectively mitigates belief bias but also outperforms the baseline model on both the original dataset and the Auto-Locke dataset. The resampling method shows the effect of the distribution of question types in the training set on model performance. A comparative analysis shows that the re-biased dataset is as reliable as the Auto-Locke dataset for measuring belief bias. The analysis of all results reveals a strong positive correlation between word coverage and model performance on the Auto-Locke dataset. This thesis thus provides a comprehensive exploration of belief bias in machine reading comprehension models and experiments with several methods that successfully mitigate it.
Supervisor
Anders Søgaard
External examiner
Zeljko Agic
Time and place
24.06.2022 09:15
Location TBD
Study Programme
Computer Science
Title
Continuous Collision Detection in Softbody Simulators
Abstract
The Department of Computer Science at the University of Copenhagen has an ongoing physics simulator project named LibRainbow. The simulator is written in Python and is intended to simulate 3D scenes of both rigid bodies and soft bodies. The simulator uses discrete collision detection, which is efficient but has multiple shortcomings with regard to accuracy. The goal of this project is to alter the soft-body portion of LibRainbow to use continuous collision detection, in order to avoid tunnelling. In this project I used Pybind11 to write a wrapper for a C++ library called Tight-Inclusion, which allows for very efficient estimates of the time of impact for collisions in the simulator. The k-DOP bounding volumes used in the narrow phase are extended to include the candidate positions. The contact determination is changed to use the current collisions instead of penetrations. Together, this allows LibRainbow to run with a continuous collision detection scheme, though it must be said that the simulator still runs excessively slowly and a lot of optimization work remains.
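To illustrate the difference between discrete checks and time-of-impact queries, here is a toy vertex-versus-plane TOI in pure Python. It is a stand-in for the vertex-face and edge-edge queries that Tight-Inclusion answers, not the wrapped library's API:

```python
import numpy as np

def toi_vertex_plane(x0, x1, n, d, eps=1e-9):
    """Earliest time of impact in [0, 1] of a vertex moving linearly
    from x0 to x1 against the plane n.x = d, or None if no contact."""
    s0 = np.dot(n, x0) - d          # signed distance at t = 0
    s1 = np.dot(n, x1) - d          # signed distance at t = 1
    if s0 >= 0 and s1 >= 0:
        return None                  # stays on the positive side
    if s0 < 0:
        return 0.0                   # already penetrating
    return s0 / max(s0 - s1, eps)    # linear root: crossing time

# A discrete check at t = 1 misses the contact entirely if the vertex
# passes through the plane and comes back out within one step
# (tunnelling); the TOI query reports the crossing regardless.
```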
Supervisor(s)
Kenny Erleben
External examiner(s)
Sarah Maria Niebe Abel
Time and place
24.06.2022 13:00
DIKU, Kenny Erleben's office, Universitetsparken 1
Study Programme
Physics
Title
Brain Volumetric Changes due to COVID-19: A Retrospective Study on a Danish Cohort
Abstract
Since the start of the COVID-19 pandemic, evidence for a neurological manifestation of the disease has been accumulating. Recent studies have shown structural changes in the brain following an infection, including regional brain atrophy. The core focus of this thesis is to validate the recent findings on a Danish cohort consisting of 24,261 patients who had clinical magnetic resonance imaging (MRI) scans of the brain.
Two studies are considered in this work: a study comparing longitudinal changes in cases vs. matched controls, and a cross-sectional study testing for differences in regional brain volumes between cases and controls. In both studies, patient and scan selection, multiple matching procedures, and deep learning-based segmentation of brain MRI scans were performed.
In the cross-sectional study, significant differences in ventricular volume, a measure of brain atrophy, were observed when comparing 113 patients who tested positive for SARS-CoV-2 to matched controls. Limiting the analysis to 20 severe cases showed a similar effect. Moreover, a significant increase in ventricular volume was also shown in 77 cases who had brain scans in the first three months after infection, but was not measured when analyzing scans of 40 cases performed after this period.
Supervisor
Mads Nielsen
External examiner
Sune Nørhøj Jespersen
Time and place
27.06.2022 13:00
Vibenhuset, Lyngbyvej 2, 2100 Ø
Study Programme
Computer Science
Title
Extension of Declarative Choreographies and Its Application in Distributed Systems
Abstract
Choreographic programming has been proposed as a good way to design safe distributed systems. The DCR choreography model, based on the notation of DCR graphs, is declarative. The safety of the underlying DCR graphs is the basis for guaranteeing the safety of a DCR choreography, and the criteria for projectability also depend on the relations in the DCR graph. We extend DCR choreographies to include time and data constraints. We also implement tools to handle the extended DCR choreographies, from design to distribution of the end-points in a real distributed system. To bridge the gap between theory and practice, we use conformance checking to determine the difference between theoretical and practical semantics. We design algorithms to coordinate end-points in asynchronous systems and make the composition of end-points isomorphic to the choreography.
Keywords: DCR Choreography, Liveness, Conformance Check, Distributed Systems
Supervisor(s)
Thomas Troels Hildebrandt and Hugo Andrés López
External examiner(s)
Alceste Scalas
Time and place
27.06.2022 13:30
room 2.08 at Sigurdsgade 41.
Study Programme
Computer Science
Title
Phylogenetic Dimensionality Reduction in Non-linear Geometries
Abstract
This project analyses how dimensionality reduction can be performed on highly covariate biological traits. The geometry of the spaces in which these traits reside is often highly non-linear, and thus an approach beyond classical dimensionality-reduction algorithms is required. The goal of such algorithms is to minimise the distorting effects of the implicit linear approximations performed on the surface of curved geometries. We first investigate the theory of non-linear geometries, as well as the continuous-trait models that model evolution as a stochastic process. Following this theory, we propose two new algorithms. The first is a simple adaptation of a previously known algorithm of Polly et al. (2013) to curved domains. The second is similar to ordinary PCA, except that it considers small increments of a possibly non-linear Brownian motion process, whilst implicitly removing the effects of biological covariance. We apply both of our new methods, as well as previously known methods, to a range of synthetic datasets for visual comparison. Finally, we apply our method to a dataset of bird traits, AVONET (Tobias et al., 2022), and conclude that it provides an estimate similar to the algorithm of Polly et al. (2013), but with additional flexibility built in.
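For context, here is a Euclidean sketch of phylogenetically corrected PCA in the spirit of standard pPCA: a GLS estimate of the root mean and a trait covariance corrected by the phylogenetic covariance C, here in flat space. The thesis's contribution is generalising this kind of construction to curved geometries; this sketch is not its algorithm:

```python
import numpy as np

def phylogenetic_pca(X, C):
    """Phylogenetic PCA in flat trait space.
    X: (n taxa, p traits); C: (n, n) phylogenetic covariance
    induced by shared ancestry (e.g. shared branch lengths)."""
    n = X.shape[0]
    Cinv = np.linalg.inv(C)
    ones = np.ones((n, 1))
    # GLS estimate of the ancestral (root) mean.
    a = np.linalg.solve(ones.T @ Cinv @ ones, ones.T @ Cinv @ X)
    R = X - ones @ a
    # Evolutionary covariance of traits, then its eigendecomposition.
    P = R.T @ Cinv @ R / (n - 1)
    eigval, eigvec = np.linalg.eigh(P)
    order = np.argsort(eigval)[::-1]
    return eigval[order], eigvec[:, order], R @ eigvec[:, order]
```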
Supervisor(s)
Stefan Horst Sommer
External examiner(s)
Jes Frellsen
Time and place
27.06.2022
Small UP1
Study Programme
Computer Science
Title
Applications of Transformer Networks in Reinforcement Learning
Abstract
In recent years, increasing use of Transformer networks has brought great performance improvements to state-of-the-art models within several Machine Learning subfields, as these networks are able to encode long-term dependencies. Tangentially, Reinforcement Learning (RL) research has moved toward environments that require the incorporation of long-term memory for agents to excel, due to the lack of Markovian rewards and state transitions. Such environments have commonly been addressed with simpler recurrent architectures, whilst applications of Transformers remain relatively understudied. We seek to study the applicability of Transformer networks within RL and to what extent they can encode long-term dependencies to improve performance in these environments. As such, we introduce common baseline models with and without memory mechanisms, as well as different Transformer variants. These models are evaluated on reactive RL tasks, short-term memory versions of classic tasks, and the Numpad environment as the primary task requiring long-term memory.
We show that models with simple memory mechanisms are able to perform well in reactive, short-term, and long-term memory environments. Secondly, we show that Transformer-based models can be difficult to implement correctly and fine-tune for optimal performance. We conclude, based on our experiments in the RL setting, that other memory mechanisms remain superior in terms of learning speed and performance, given the same amount of training time. We further discuss improvements to our implementations, with the potential for decreased learning time and increased model performance.
Supervisor(s)
Stefan Horst Sommer
External examiner(s)
Jes Frellsen
Time and place
27.06.2022 13:30
Small UP1
Study Programme
IT and Cognition
Title
Prediction of medical outcome based on 3D serial data
Abstract
Knee osteoarthritis is a major issue for both society and individuals. In order to prevent the disease or develop a treatment for it, better prediction models are needed. Segmentation methods for magnetic resonance imaging (MRI) have improved over the past years. Additionally, the Osteoarthritis Initiative [41] has provided a large-scale open-source dataset. This opens the possibility of predicting total knee replacement surgery from MR images with deep learning, which is what we do in this thesis. Additionally, the progression of knee osteoarthritis might reveal which knees will undergo a total knee replacement surgery within a year. To investigate this, we consider the effect of adding a temporal dimension. We found that using the temporal information was key, and that MRI has a large prediction potential. In the end, the best-performing model was not a neural network but a linear filter, and it reached an AUC score of 0.89 on unseen data, rivalling the state of the art.
Supervisor
Erik Bjørnager Dam
External examiner
Dan Witzner Hansen
Time and place
28.06.2022 10:30
DIKU, room 2-0-06, Universitetsparken 1, 2100 CPH Ø
Study Programme
Computer Science
Title
Applying the method of Expander Decomposition to Fully-Dynamic Graph-Theoretic Problems
Abstract
We describe in great detail the (α, φ)-Expander Hierarchy described by Saranurak et al. in [1], along with some of its applications. We also show that the Expander Hierarchy provides useful heuristics for the Vertex Labeled Distance Oracle problem, which have potential to be turned into actual algorithms. Lastly, we use our understanding of the algorithm behind the Expander Hierarchy to obtain a framework that allows the user to solve certain problems more quickly at the expense of an approximation factor.
Supervisor(s)
Jacob Holm
External examiner(s)
Eva Rotenberg
Time and place
TBD
Study Programme
Computer Science
Title
Reducing Synchronous GPU Memory Transfers
Abstract
We present a series of dataflow-dependent program transformations that reduce memory transfers between a GPU and its host, and show how the problem of minimising memory transfers to the host amounts to finding minimum vertex cuts in a series of data dependency graphs. We provide a specialised algorithm to solve these minimisation problems, based on the Ford-Fulkerson max-flow algorithm, and detail techniques to model conditional execution and loops in a pure functional programming language.
We present our work in the context of the array programming language Futhark, in whose compiler we have implemented our techniques. Empirical evaluation of 27 benchmark programs on four GPUs shows mean speedups of 117–158%, heavily skewed by significant improvements to a few programs.
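As a generic sketch of the reduction named above, here is a minimum s-t vertex cut computed via the standard node-splitting reduction to max-flow, using networkx. This is illustrative only; the thesis implements a specialised Ford-Fulkerson-based algorithm inside the Futhark compiler, not this generic version:

```python
import networkx as nx

def min_vertex_cut(G, s, t):
    """Minimum s-t vertex cut by node splitting: each vertex v becomes
    v_in -> v_out with unit capacity, original edges get infinite
    capacity, so max-flow cuts vertices rather than edges."""
    H = nx.DiGraph()
    for v in G.nodes:
        H.add_edge((v, "in"), (v, "out"),
                   capacity=1 if v not in (s, t) else float("inf"))
    for u, v in G.edges:
        H.add_edge((u, "out"), (v, "in"), capacity=float("inf"))
    cut_value, (S, T) = nx.minimum_cut(H, (s, "in"), (t, "out"))
    # Vertices whose in/out halves are separated form the cut.
    return [v for v in G.nodes if (v, "in") in S and (v, "out") in T]

# Two disjoint paths a->b->c and a->d->c require cutting both b and d:
G = nx.DiGraph([("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")])
print(min_vertex_cut(G, "a", "c"))  # ['b', 'd']
```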
Supervisor
Troels Henriksen
External examiner
Patrick Bahr
Time and place
04.07.2022 14:00
PLTC-mødelokalet
Study Programme
Computer Science
Title
Total Resource Accounting in a Circular Economy
Abstract
In recent years, people have become increasingly aware of the issues related to the way humans have treated the environment. More and more people want to do better, and so the demand for "green" products increases. But how can one guarantee that such products are as environmentally conscious as they claim? With the help of software, digital twins of resources can be tracked and traced, and the supply chain modelled digitally, allowing for documentation of the complete history of resources, both natural and synthetic.
This report presents the design and implementation of an application that does just that. The application is based on existing models for algebraic resource accounting and discrete modelling methods. The application presented is a prototype, but it does allow for modelling any supply chain in a circular economy.
Supervisor
Fritz Henglein
External examiner
TBD
Time and place
TBD
Study Programme
Computer Science
Title
Leveraging cutting planes reasoning for faster Pseudo-Boolean solving and optimization
Abstract
In modern Boolean Satisfiability (SAT) and Mixed Integer Programming (MIP) solvers, presolving is a commonly employed procedure between the encoding and solving phases, applying multiple preprocessing techniques to reduce the instance size or improve its "strength". However, this powerful tool has not yet been widely used in Pseudo-Boolean solving and optimization, which are closely related to SAT and MIP. In this report, we study categories of SAT/MIP preprocessing techniques and adapt or lift them to the Pseudo-Boolean setting. The experiments show that a considerable number of instances can be solved faster by a Pseudo-Boolean solver after presolving, although presolving does not yet help solve more instances within a given time limit.
Supervisor
Jakob Nordström
External examiner
TBD
Time and place
TBD