Affinity-based Structure-Activity-Relationship Models: Improving Structure-Activity-Relationship Models by Incorporating Activity Information from Related Targets

Date of Submission: May 29, 2009
Structure-activity-relationship (SAR) models are used to inform and guide the iterative optimization of chemical leads, and they play a fundamental role in modern drug discovery. In this paper we present a new class of methods for building SAR models, referred to as affinity-based, that utilize activity information from different targets. These methods first identify a set of targets that are related to the target under consideration and then employ various machine-learning techniques that use activity information from these related targets to build the desired SAR model. We developed different methods for identifying the set of related targets, which take into account either the primary sequence of the targets or the structure of their ligands, and we also developed different machine-learning techniques derived from the principles of semi-supervised learning, multi-task learning, and classifier ensembles. A comprehensive evaluation of these methods shows that they lead to considerable improvements over standard SAR models that are based only on the ligands of the target under consideration. On a set of 117 protein targets obtained from PubChem, the affinity-based methods achieve an ROC score that is on average 7.0% to 7.2% higher than that achieved by the standard SAR models. Moreover, on a set of targets belonging to six protein families, the affinity-based methods outperform chemogenomics-based approaches by 4.33%.
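
The two-step idea described above (find related targets, then combine their activity information) can be illustrated with a minimal sketch. It is not the implementation evaluated in the report: it assumes ligands are represented as sets of "on" bits of some binary fingerprint, it uses a ligand-based target similarity built on the Tanimoto coefficient, and it substitutes a toy similarity-weighted vote for the per-target classifiers. Every name here (`tanimoto`, `target_similarity`, `related_targets`, `score`, `ensemble_score`) is hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a ligand is the set of "on" bits of a binary fingerprint,
# paired with a label (1 = active, 0 = inactive) for one target.
Fingerprint = FrozenSet[int]
Ligands = List[Tuple[Fingerprint, int]]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def target_similarity(x: Ligands, y: Ligands) -> float:
    """Ligand-based target similarity (illustrative choice): for each active
    of x, take its best Tanimoto match among the actives of y, then average."""
    ax = [fp for fp, label in x if label == 1]
    ay = [fp for fp, label in y if label == 1]
    if not ax or not ay:
        return 0.0
    return sum(max(tanimoto(p, q) for q in ay) for p in ax) / len(ax)


def related_targets(name: str, data: Dict[str, Ligands], k: int) -> List[Tuple[str, float]]:
    """Step 1: return the k targets most similar to `name`, with their similarity."""
    scored = [(target_similarity(data[name], data[other]), other)
              for other in data if other != name]
    scored.sort(reverse=True)
    return [(other, sim) for sim, other in scored[:k]]


def score(train: Ligands, query: Fingerprint) -> float:
    """Toy per-target model: similarity-weighted vote in [-1, 1].
    A real SAR model would be a trained classifier instead."""
    num = sum(tanimoto(fp, query) * (1 if label == 1 else -1) for fp, label in train)
    den = sum(tanimoto(fp, query) for fp, _ in train)
    return num / den if den else 0.0


def ensemble_score(name: str, data: Dict[str, Ligands], query: Fingerprint, k: int = 2) -> float:
    """Step 2 (classifier-ensemble flavour): combine the target's own score
    with the scores of related targets, weighted by target similarity."""
    total, weight = score(data[name], query), 1.0
    for other, sim in related_targets(name, data, k):
        total += sim * score(data[other], query)
        weight += sim
    return total / weight
```

In this sketch the target's own model always has weight 1, and a related target contributes in proportion to its similarity, so an unrelated target (similarity 0) has no influence. A sequence-based similarity or a multi-task learner could be swapped in at the corresponding step without changing the overall shape.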