PhD defence by Lys Sanz Moreta

Title

Deep probabilistic programming applied to proteins

Summary

This thesis covers several concepts from structural bioinformatics, molecular evolution, and probabilistic programming. It presents new methods for protein superposition, protein structure prediction, and ancestral sequence resurrection.

The first manuscript addresses protein superposition by presenting Theseus-PP. This new method uses a Bayesian approach instead of the maximum likelihood method implemented in the original Theseus, which makes it possible to place relevant priors over the model's parameters. The superposition model is intended to serve as a new type of loss function for protein structure inference.

The second manuscript extends Theseus-PP into Theseus-HMC. This method uses Hamiltonian Monte Carlo inference, specifically the No-U-Turn Sampler (NUTS), to quantify uncertainty over the parameters of the superposition problem.

The third manuscript adapts the generative Deep Markov Model (DMM) to the prediction of protein fragment libraries. Deep Markov Models extend classical Hidden Markov Models by using amortized inference and by parameterizing the emission and transition probabilities with gated neural networks (such as recurrent neural networks), which preserves long-range dependencies across the sequences. This new variant of the DMM uses Bayesian inference to compute uncertainty over the fragment predictions.

The last manuscript proposes a new approach to Ancestral Protein Resurrection that avoids the assumption of factorized evolution and instead models sequence evolution with a tree-structured Ornstein–Uhlenbeck latent process.

Assessment Committee

  • Professor, Fritz Henglein, Department of Computer Science, University of Copenhagen
  • Associate Professor, Jes Frellsen, Technical University of Denmark
  • Associate Professor, Ingemar Andre, Lund University

Academic Supervisor

Thomas Wim Hamelryck, PLTC Section, University of Copenhagen.

Moderator

Associate Professor, Wouter Boomsma, University of Copenhagen

 

For a digital copy of the thesis, please go to https://di.ku.dk/english/research/phd/.