Abstract

Amyloids are proteins that form aberrant intramolecular contact sites characteristic of fibrils instead of their functional structure. Recent studies show that short segments of amino acids alone can be responsible for amyloidogenic properties. Here we propose an original machine learning method for classifying biological sequences, based on discovering a segment with a discriminative pattern of correlations between sequence elements. The pattern is defined by the locations of correlated pairs of elements within a window. The algorithm first identifies the most relevant segment in each positive training instance. Classification is then based on the maximal differences between the correlation matrix of the relevant segments in the positive training sequences and the matrix obtained from the negative training segments. The method was applied to the recognition of amyloidogenic fragments in amino acid sequences. It was trained on available datasets of hexapeptides labeled for amyloidogenicity, using 5- or 6-residue sliding windows. Depending on the choice of training and testing datasets, the area under the receiver operating characteristic curve (AUC ROC) reached up to 0.80 for experimental datasets and up to 0.95 for computationally generated (3D profile) datasets. The method reveals the characteristic correlation pattern of the data. Moreover, it finds the segments with the strongest classification pattern, even in long training sequences. Applied to the recognition of amyloidogenic segments, the method showed good potential for a variety of classification problems in bioinformatics.
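The idea of scoring a sliding window by position-specific residue pairs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`sliding_windows`, `pair_frequencies`, `score_segment`, `best_segment`), the use of simple co-occurrence frequencies in place of a correlation matrix, the summed frequency difference used as a score, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def sliding_windows(seq, width=6):
    """Return every contiguous segment of the given width."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def pair_frequencies(segments):
    """Relative frequency of each (pos_i, pos_j, residue_i, residue_j) pair.

    Assumption: a plain co-occurrence frequency stands in for the
    correlation measure described in the abstract.
    """
    counts = Counter()
    for seg in segments:
        for i, j in combinations(range(len(seg)), 2):
            counts[(i, j, seg[i], seg[j])] += 1
    n = len(segments)
    return {key: c / n for key, c in counts.items()} if n else {}


def score_segment(seg, pos_freq, neg_freq):
    """Sum of positive-minus-negative pair frequencies found in the segment."""
    total = 0.0
    for i, j in combinations(range(len(seg)), 2):
        key = (i, j, seg[i], seg[j])
        total += pos_freq.get(key, 0.0) - neg_freq.get(key, 0.0)
    return total


def best_segment(seq, pos_freq, neg_freq, width=6):
    """Pick the window of a longer sequence with the highest score."""
    windows = sliding_windows(seq, width)
    return max(windows, key=lambda w: score_segment(w, pos_freq, neg_freq))


# Toy, made-up training segments (not real amyloidogenicity data).
pos_freq = pair_frequencies(["VVV"])
neg_freq = pair_frequencies(["PPP"])
top = best_segment("PPPVVV", pos_freq, neg_freq, width=3)
```

Here `top` is the window of the toy sequence whose residue pairs look most like the positive set and least like the negative set. A real application would replace the frequency difference with the chosen correlation statistic and evaluate the scores with a ROC analysis.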
