Remote Protein Homology Detection Using Physicochemical Properties

Óscar Bedoya

Abstract


A new method for remote protein homology detection, called CDA (Characteristic Distribution Analysis), is presented. CDA represents each protein by the distributions of the physicochemical properties of its amino acids. Given the training sequences of a SCOP (Structural Classification Of Proteins) family, a characteristic distribution is obtained by averaging the distributions of its proteins. The hypothesis of this research is that each protein family F has a characteristic distribution that separates its sequences from the rest of the proteins in a dataset. A set of 72 physicochemical properties was selected to create different characteristic distributions for the same family, and each characteristic distribution is used as a classifier. Finally, a Naive Bayes classifier is trained to combine the outputs of the individual classifiers and reach a better decision. We found that, for each family, a particular subset of physicochemical properties discriminates its sequences better than the others. CDA achieves a True Positive (TP) rate of 0.793, a False Positive (FP) rate of 0.005, and a Receiver Operating Characteristic (ROC) area of 0.918. CDA outperforms some current strategies, such as SVM-PCD and SVM-RQA.
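The averaging step described above can be illustrated with a minimal sketch. The abstract does not say how a distribution is computed or how a sequence is compared with a characteristic distribution, so everything here is an assumption made for illustration: a single property (the Kyte-Doolittle hydropathy scale), a normalized histogram over hypothetical bin edges, and Euclidean distance as the score. The function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: one property, histogram binning, and Euclidean
# distance are assumptions, not the procedure reported in the paper.

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical bin edges covering the range of the scale.
BIN_EDGES = [-4.5, -2.0, 0.0, 2.0, 4.5]


def property_distribution(seq, scale=KD, edges=BIN_EDGES):
    """Normalized histogram of property values over the residues of seq."""
    counts = [0] * (len(edges) - 1)
    total = 0
    for aa in seq:
        value = scale.get(aa)
        if value is None:  # skip non-standard residues such as X
            continue
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Half-open bins, except the last one, which includes its upper edge.
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                counts[i] += 1
                break
        total += 1
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [c / total for c in counts]


def characteristic_distribution(family_seqs):
    """Bin-wise average of the distributions of a family's training sequences."""
    dists = [property_distribution(s) for s in family_seqs]
    return [sum(column) / len(dists) for column in zip(*dists)]


def distance(p, q):
    """Euclidean distance between two distributions with the same bins."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Toy family of hydrophobic sequences (made-up data, not from SCOP).
family = ["AILVAILV", "LLVVAAII"]
characteristic = characteristic_distribution(family)
near = distance(property_distribution("VVLLIIAA"), characteristic)
far = distance(property_distribution("RKDERKDE"), characteristic)
```

In this toy setting, a sequence with a composition similar to the family lies closer to the characteristic distribution than a strongly polar one. Under the same assumptions, repeating the computation with a different property scale would give one classifier per property, whose outputs a Naive Bayes model could then combine.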


Keywords


Remote Homology Detection, Physicochemical Properties, SCOP Family.


UNIVERSIDAD EIA
