MSc thesis defence by Venelin Banov


Predicting structural features of proteins using convolutional neural networks


The three-dimensional structure of a protein is strongly related to its function and can be inferred from its amino acid sequence with the help of the amino acid frequencies at each position. Predicting it has therefore become crucial to solving many biological problems, yet it remains a challenging task. Over the last few years, convolutional neural networks (CNNs) have become an active area of research for predicting the structural features of proteins and have proven useful for modeling secondary structure labels. Compared with the work on secondary structure, much less has been done on directly predicting the φ and ψ angles.

In this work, a convolutional neural network model is proposed that achieves 81% Q3 accuracy for secondary structure prediction. In addition, another CNN was developed to predict the continuous, real-valued φ and ψ angles. Two CNN architectures are investigated that employ different loss functions: one uses the absolute difference between the predicted and true values directly, and the other decomposes the angles into their sine and cosine components. The latter proved more effective, achieving ten-fold cross-validated mean absolute errors of 22.7° and 24.3° for φ and ψ, respectively.
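
The difference between the two loss formulations can be illustrated with a small sketch. It is not the implementation used in the thesis: the function names are hypothetical, NumPy stands in for whatever framework was actually used, and the exact form of the sine/cosine loss (here a mean absolute error over both components) is an assumption. The point it shows is that angles are periodic, so a direct absolute difference penalises a prediction of -179° against a true value of 179° as if it were almost a full turn away, while the sine/cosine representation treats the two as neighbours.

```python
import numpy as np


def encode_angles(deg):
    """Map angles in degrees to (sin, cos) pairs along a new last axis."""
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    return np.stack([np.sin(rad), np.cos(rad)], axis=-1)


def decode_angles(sin_cos):
    """Recover angles in degrees from (sin, cos) pairs via arctan2."""
    sin_cos = np.asarray(sin_cos, dtype=float)
    return np.rad2deg(np.arctan2(sin_cos[..., 0], sin_cos[..., 1]))


def direct_abs_error(pred_deg, true_deg):
    """Absolute difference in degrees, ignoring periodicity."""
    return np.abs(np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float))


def angular_abs_error(pred_deg, true_deg):
    """Absolute difference in degrees along the shorter arc of the circle."""
    diff = direct_abs_error(pred_deg, true_deg) % 360.0
    return np.minimum(diff, 360.0 - diff)


def sincos_loss(pred_sin_cos, true_deg):
    """Assumed loss: mean absolute error on the sine and cosine components."""
    return np.mean(np.abs(np.asarray(pred_sin_cos, dtype=float) - encode_angles(true_deg)))


# A prediction just across the +/-180 degree boundary from the true value.
true_deg = np.array([179.0])
pred_deg = np.array([-179.0])

naive = direct_abs_error(pred_deg, true_deg)        # large, despite a near-perfect prediction
periodic = angular_abs_error(pred_deg, true_deg)    # small, as expected on a circle
loss = sincos_loss(encode_angles(pred_deg), true_deg)
```

In a setup like this, the network would output two values per angle, and `decode_angles` would convert them back to degrees before a periodic mean absolute error is reported.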

Supervisor: Wouter Boomsma

External examiner: René Thomsen