Protein Structure Prediction using String Kernels

Date of Submission: March 3, 2006
Report Number: 06-005
Abstract: 
With recent advances in large-scale sequencing technologies, the volume of protein sequence information has grown exponentially. Our ability to produce sequence information currently far outpaces the rate at which we can produce structural and functional information. Consequently, researchers increasingly rely on computational techniques to extract useful information from the known structures contained in large databases, though such approaches remain incomplete. Unraveling the relationship between pure sequence information and three-dimensional structure therefore remains one of the great fundamental problems in molecular biology. In this report we present several ways in which researchers try to characterize the structural, functional, and evolutionary nature of proteins. Specifically, we focus on three common prediction problems: secondary structure prediction, remote homology detection, and fold prediction. We describe a class of methods that employ large margin classifiers with novel kernel functions to solve these problems, and we support them with a thorough evaluation study.