Deep Probabilistic Programming Applied Protein Superposition: Protein Structure Prediction and Ancestral Sequence Resurrection

Research output: Book/Report › Ph.D. thesis

This thesis covers concepts from structural bioinformatics, molecular evolution, and probabilistic programming. It presents new methods for protein superposition, protein structure prediction, and ancestral sequence resurrection.
The first manuscript addresses protein superposition by presenting Theseus-PP [1]. This new method replaces the maximum likelihood approach implemented in the original Theseus [2] with a Bayesian one, which allows relevant priors to be placed over the model’s parameters. The superposition model is envisioned as a new type of loss function that will assist protein structure inference.
The second manuscript extends Theseus-PP into Theseus-HMC [3]. This method uses Hamiltonian Monte Carlo inference, specifically the No-U-Turn Sampler [4], to compute the uncertainty over the parameters of the superposition problem.
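To make the Hamiltonian Monte Carlo idea above concrete, here is a minimal sketch of plain HMC with a fixed number of leapfrog steps on a one-dimensional standard normal target. It is not the No-U-Turn Sampler, which additionally chooses the trajectory length automatically, and it is not the Theseus-HMC model. The names `leapfrog` and `hmc_sample`, the step size, and the toy target are all assumptions made for illustration.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme (unit mass)."""
    p = p - 0.5 * step * grad_u(q)          # initial half step for momentum
    for i in range(n_steps):
        q = q + step * p                    # full step for position
        if i < n_steps - 1:
            p = p - step * grad_u(q)        # full step for momentum
    p = p - 0.5 * step * grad_u(q)          # final half step for momentum
    return q, p


def hmc_sample(u, grad_u, q0, n_samples, step=0.2, n_steps=20, seed=0):
    """Plain HMC: u is the negative log density (up to a constant)."""
    rng = random.Random(seed)
    q, samples = q0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)             # resample the momentum
        q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Metropolis correction: accept with probability min(1, exp(h_old - h_new))
        if math.log(1.0 - rng.random()) < h_old - h_new:
            q = q_new
        samples.append(q)
    return samples


def u(q):
    """Toy target (assumption): standard normal, U(q) = q^2 / 2."""
    return 0.5 * q * q


def grad_u(q):
    return q


samples = hmc_sample(u, grad_u, q0=3.0, n_samples=2000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The spread of `samples` is what provides the uncertainty estimate: each parameter gets a set of posterior draws, where a maximum likelihood fit would return a single point estimate.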
The third manuscript adapts the generative Deep Markov Model (DMM) [5] to the prediction of protein fragment libraries [6]. Deep Markov Models extend classical Hidden Markov Models by using amortized inference and gated neural networks (such as recurrent neural networks [7]) for the emission and transition probabilities, in order to preserve long-range dependencies across the sequences. This new variant of the DMM uses Bayesian inference to compute the uncertainty over the fragment predictions.
The last manuscript proposes a novel approach to Ancestral Protein Resurrection that moves beyond factorized evolution and models sequence evolution with a tree-structured Ornstein–Uhlenbeck latent process [8].
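As an illustration of what a tree-structured Ornstein–Uhlenbeck process means, the sketch below samples a scalar latent value at every node of a small tree, with each child drawn from the standard OU transition density conditioned on its parent and the branch length. This is a toy sketch, not the thesis model: the function names, the parameters `theta`, `mu`, and `sigma`, and the example tree are assumptions for illustration.

```python
import math
import random


def ou_transition(x_parent, t, theta=1.0, mu=0.0, sigma=1.0):
    """Mean and variance of an OU process after time t, started at x_parent."""
    decay = math.exp(-theta * t)
    mean = mu + (x_parent - mu) * decay
    var = sigma ** 2 / (2.0 * theta) * (1.0 - decay ** 2)
    return mean, var


def sample_tree(tree, root, x_root, rng, **ou_params):
    """Sample one latent value per node.

    tree maps each parent to a list of (child, branch_length) pairs.
    """
    values = {root: x_root}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, t in tree.get(parent, []):
            mean, var = ou_transition(values[parent], t, **ou_params)
            values[child] = rng.gauss(mean, math.sqrt(var))
            stack.append(child)
    return values


# Hypothetical toy phylogeny with branch lengths.
tree = {
    "root": [("anc", 0.5), ("leaf_C", 1.0)],
    "anc": [("leaf_A", 0.2), ("leaf_B", 0.3)],
}
values = sample_tree(tree, "root", x_root=0.0, rng=random.Random(0))
```

Short branches keep a child close to its parent, while long branches let it relax toward the stationary distribution with mean `mu` and variance `sigma**2 / (2 * theta)`. Nodes that share ancestors are therefore correlated.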
Original language: English
Publisher: Department of Computer Science, Faculty of Science, University of Copenhagen
Number of pages: 138
Publication status: Published - 2022

ID: 310388138