Building Multiclass Classifiers for Remote Homology Detection and Fold Recognition

Date of Submission: April 5, 2006
Report Number: 06-013
Abstract: 

Motivation: Protein remote homology prediction and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently among the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems, and they have not been extensively used to solve the general multiclass remote homology prediction and fold recognition problems.

Methods: We developed a number of methods for building SVM-based multiclass classification schemes in the context of protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes.
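
As an illustration of the second-level idea only, the sketch below assumes each binary classifier emits one real-valued score per target class and learns a per-class scale and shift so that the scores become comparable before the highest one is chosen. The perceptron-style update, the function names (predict_max, fit_scaling), and the toy scores are all assumptions made for this example; they are not taken from the report and are not a description of the authors' schemes.

```python
from typing import List, Sequence, Tuple


def predict_max(scores: Sequence[float],
                weights: Sequence[float],
                biases: Sequence[float]) -> int:
    """Return the index of the class with the largest adjusted score w_k * s_k + b_k."""
    adjusted = [w * s + b for s, w, b in zip(scores, weights, biases)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


def fit_scaling(score_rows: Sequence[Sequence[float]],
                labels: Sequence[int],
                epochs: int = 20,
                lr: float = 0.1) -> Tuple[List[float], List[float]]:
    """Hypothetical second-level learner: one scale and one shift per binary classifier.

    Uses a multiclass perceptron update: on a mistake, the true class's
    parameters are increased and the wrongly predicted class's are decreased.
    """
    k = len(score_rows[0])
    weights = [1.0] * k   # start from plain "pick the highest score"
    biases = [0.0] * k
    for _ in range(epochs):
        for scores, y in zip(score_rows, labels):
            pred = predict_max(scores, weights, biases)
            if pred != y:
                weights[y] += lr * scores[y]
                biases[y] += lr
                weights[pred] -= lr * scores[pred]
                biases[pred] -= lr
    return weights, biases


# Toy, made-up scores: classifier 0 is systematically inflated, so picking the
# raw maximum always predicts class 0.
score_rows = [[2.0, 1.0], [3.0, 0.5], [2.0, 1.2], [3.5, 0.2]]
labels = [1, 0, 1, 0]

raw = [predict_max(s, [1.0, 1.0], [0.0, 0.0]) for s in score_rows]
weights, biases = fit_scaling(score_rows, labels)
combined = [predict_max(s, weights, biases) for s in score_rows]
print("raw argmax:     ", raw)
print("after combining:", combined)
```

In a sketch of this kind, scores from classifiers built for ancestral SCOP levels could be supplied as additional inputs to the second-level learner; that extension is not shown here.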

Results: We performed a comprehensive study analyzing the different approaches using four different datasets. Our results show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems, and that the schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend to qualitatively improve the prediction results.

Website: http://bioinfo.cs.umn.edu/supplements/mc-fold/

Keywords: fold recognition, remote homology, multiclass, hierarchical, structured learning, support vector machines.