Ab Initio Protein Structure Prediction Using Bezier Curve Representation – Københavns Universitet


Master's thesis defence by Rasmus Fonseca

Protein Structure Prediction (PSP) is the problem of predicting the three-dimensional native structure of a protein from its amino acid sequence alone. Unlike threading, ab initio PSP attempts this without assuming that the target protein folds the same way as any other protein.

This thesis proposes a novel representation of protein structure. Its purpose is to reduce the PSP search space and thereby enable ab initio search heuristics to generate good-quality solutions for relatively large proteins (more than 120 amino acids) in a limited amount of time (less than a day). It does so by using Bezier curves to represent segments of helices, strands and coils.
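The abstract does not state the degree of the curves or how residues are placed along them. As a minimal sketch, assuming each segment is described by a few 3D control points and residue positions are sampled at evenly spaced parameter values, a Bezier curve B(t) = sum over i of C(n, i) t^i (1 - t)^(n - i) P_i can be evaluated as below. The helper names `bezier_point` and `sample_segment` are hypothetical; the point is that many residues are then governed by only a handful of control points, which is how such a representation can shrink the search space.

```python
from math import comb


def bezier_point(control, t):
    """Evaluate a Bezier curve of degree len(control) - 1 at parameter t in [0, 1]
    using the Bernstein polynomial form. `control` is a list of (x, y, z) tuples."""
    n = len(control) - 1
    x = y = z = 0.0
    for i, (px, py, pz) in enumerate(control):
        # Bernstein basis weight for control point i
        b = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += b * px
        y += b * py
        z += b * pz
    return (x, y, z)


def sample_segment(control, n_residues):
    """Hypothetical helper: place n_residues points on the curve at evenly spaced
    parameter values. Evenly spaced t does not in general give evenly spaced
    arc length, so a real model would likely need to correct for this."""
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    if n_residues == 1:
        return [bezier_point(control, 0.0)]
    return [bezier_point(control, k / (n_residues - 1)) for k in range(n_residues)]
```

For example, with four collinear control points `[(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]`, `bezier_point(ctrl, 0.5)` lands on the midpoint `(1.5, 0.0, 0.0)`, and the curve always passes through the first and last control points.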

Six search heuristics are adapted to this representation and compared. Two of them (randomized Hill Climbing and Monte Carlo) are simple heuristics that are often used for protein structure prediction.
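The two simple heuristics differ mainly in their acceptance rule. The following is a generic sketch, not the thesis's implementation: `energy` and `perturb` are hypothetical placeholders for an energy function and a move on the representation (for instance, nudging a control point). With `temperature == 0` it behaves as randomized hill climbing; with `temperature > 0` it uses the Metropolis criterion of Monte Carlo search.

```python
import math
import random


def local_search(x0, energy, perturb, steps, temperature=0.0, rng=None):
    """Minimise `energy` starting from x0 and return the best (state, energy) seen.

    temperature == 0: randomized hill climbing (only moves that do not raise
                      the energy are accepted).
    temperature > 0:  Metropolis Monte Carlo (uphill moves are accepted with
                      probability exp(-dE / temperature))."""
    rng = rng or random.Random()
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        y = perturb(x, rng)          # propose a random neighbouring state
        ey = energy(y)
        de = ey - e
        accept = de <= 0 or (
            temperature > 0 and rng.random() < math.exp(-de / temperature)
        )
        if accept:
            x, e = y, ey
            if e < best_e:           # remember the best state ever visited
                best_x, best_e = x, e
    return best_x, best_e
```

As a toy usage, minimising `(x - 3) ** 2` with `perturb = lambda x, rng: x + rng.uniform(-0.5, 0.5)` converges close to `x = 3`. Tracking the best state separately matters for Monte Carlo, since its current state can move uphill.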

Two heuristics (Variable Neighborhood Search and Iterated Local Search) have not previously been applied to protein structure prediction, but their ability to escape local minima is expected to make them useful. Finally, the two population-based heuristics (Bee Colony Optimization and Firefly Algorithm) have both been applied to protein structure prediction before, in some form, but in only a few published works.
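To illustrate how such heuristics escape local minima, here is a minimal skeleton of Iterated Local Search. It is a generic sketch, not the thesis's version: `local_search` and `kick` are hypothetical callables supplied by the user. Each iteration perturbs the current local minimum strongly enough to leave its basin, re-runs local search, and keeps the result only if its energy is lower. Variable Neighborhood Search follows a related pattern but systematically changes the size or type of the perturbation.

```python
import random


def iterated_local_search(x0, energy, local_search, kick, iterations, rng=None):
    """Generic ILS skeleton.

    local_search(x, rng) -> a local minimum reached from x
    kick(x, rng)         -> a perturbed copy of x, large enough to leave its basin
    Acceptance criterion: keep the new local minimum only if its energy is
    strictly lower than the current one."""
    rng = rng or random.Random()
    x = local_search(x0, rng)
    e = energy(x)
    for _ in range(iterations):
        y = local_search(kick(x, rng), rng)   # escape, then re-optimise
        ey = energy(y)
        if ey < e:
            x, e = y, ey
    return x, e
```

On a toy integer landscape such as `f(x) = (x % 10) + abs(x - 52) / 100`, where greedy descent over the neighbours `x - 1` and `x + 1` gets stuck at every multiple of 10, random kicks of up to 15 steps let the search hop between basins until it reaches the global minimum at `x = 50`.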

The Bezier curve representation and the heuristics are used to predict structures for three proteins from the two latest CASP experiments. The quality of the overall fold is measured using GDT_10. By this measure, the predictions in this thesis would be ranked second, third and seventh for the three proteins, even without refining the best structures found.
Considering the potential improvements from a better energy function and longer computation time, the Bezier curve representation seems very promising.
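GDT-style scores measure what fraction of residues in a model lie within a distance cutoff of their native positions. The abstract does not define the cutoffs behind GDT_10, so the sketch below takes them as a parameter; the defaults 1, 2, 4 and 8 Å are those of the standard GDT_TS. It is a simplification under a stated assumption: the real measure searches over many superpositions to maximise each fraction, whereas `fraction_within` and `gdt_like` (hypothetical names) assume the model has already been superimposed on the native structure.

```python
import math


def fraction_within(model, native, cutoff):
    """Fraction of residues whose model C-alpha lies within `cutoff` Angstrom of
    the native C-alpha, for ONE fixed, already-computed superposition."""
    if len(model) != len(native):
        raise ValueError("model and native must have the same number of residues")
    close = sum(1 for a, b in zip(model, native) if math.dist(a, b) <= cutoff)
    return close / len(native)


def gdt_like(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average of fraction_within over several cutoffs, expressed in percent.
    Simplification: no search over alternative superpositions."""
    return 100.0 * sum(fraction_within(model, native, c) for c in cutoffs) / len(cutoffs)
```

For four residues displaced by 0.5, 1.5, 3 and 9 Å, the fractions at 1, 2, 4 and 8 Å are 0.25, 0.5, 0.75 and 0.75, giving a score of 56.25.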

The comparison indicated that Bee Colony Optimization and randomized Hill Climbing generated more structures than the other heuristics, and generating many structures increased the probability of finding near-native ones. The structures generated by the remaining heuristics were of very similar quality, which suggests that the good results can be attributed to the novel representation.
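Why throughput helps can be seen under a simplifying assumption that the abstract does not make: suppose each generated structure is, independently, near-native with some small probability p (a hypothetical quantity, not a reported result). The chance that at least one of n structures is near-native is then

```latex
P(\text{at least one near-native among } n)
  = 1 - (1 - p)^{n}
  \approx 1 - e^{-np} \qquad (p \ll 1)
```

For example, with p = 0.001, generating 1000 structures gives roughly 0.63, while 3000 structures give roughly 0.95. Real samples from a search heuristic are correlated, so this is only an idealised bound on the effect.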


Supervisor: Pawel Winter

External examiner: Jesper Larsen, DTU

Language: Danish