Abstract

Self-organizing map (SOM)-based motif mining, despite being a promising approach, mostly fails to offer a consistent interpretation of clusters whose nodes contain a mixture of signal and noise. The main reason for this shortcoming lies in the similarity metrics used for data assignment. These metrics are designed around the biological interpretation of the domain and do not account for the noise that inevitably mixes into the clusters. This limits the explicability of the majority of clusters, which are presumably noise dominated, and degrades the overall clarity of the system in motif discovery. This paper aims to improve the explicability of the learning process by introducing a composite similarity function (CSF) designed specifically for the k-mer-to-cluster similarity measure. The CSF takes into account both the degree to which a cluster exhibits motif properties and the noise embedded in it. The proposed motif-finding algorithm builds on our previous work, robust elicitation algorithms for discovering (READ) [1], and is termed READ(csf), short for READ DNA motifs using CSFs. READ(csf) performs slightly better than READ and shows notable improvements in F-measure over the SOM-based tools SOMBRERO and SOMEA on the test data sets. A real data set containing multiple motifs is used to explore the potential of READ(csf) for more challenging biological data-mining tasks. Visual comparisons with verified logos extracted from the JASPAR database demonstrate that the algorithm is promising for discovering multiple motifs simultaneously.
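The abstract does not define the CSF itself. The sketch below is therefore only a hypothetical stand-in that illustrates the general idea of a k-mer-to-cluster similarity that accounts for noise. It combines a position-specific log-odds score of the k-mer under the cluster's frequency matrix with a cluster-level weight derived from information content, so that a noise-dominated cluster (near-uniform columns) contributes little similarity. The function names, the pseudocount, the uniform background, and the multiplicative combination are all assumptions, not the authors' method.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumption: k-mers are uppercase and contain only A, C, G, T


def position_frequency_matrix(kmers, pseudocount=0.5):
    """Column-wise base probabilities for a cluster of equal-length k-mers."""
    k = len(kmers[0])
    total = len(kmers) + 4 * pseudocount
    pfm = []
    for i in range(k):
        counts = Counter(kmer[i] for kmer in kmers)
        pfm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pfm


def information_content(pfm):
    """Mean per-column information content in bits (0 = uniform noise, <2 with pseudocounts)."""
    ic = 0.0
    for col in pfm:
        # IC = 2 - entropy; pseudocounts keep every p > 0, so log2 is defined.
        ic += 2.0 + sum(p * math.log2(p) for p in col.values())
    return ic / len(pfm)


def composite_similarity(kmer, pfm, background=None):
    """Hypothetical composite score: mean per-base log-odds of the k-mer under the
    cluster model, scaled by how motif-like (vs. noise-dominated) the cluster is."""
    background = background or {b: 0.25 for b in BASES}
    log_odds = sum(math.log2(col[base] / background[base])
                   for col, base in zip(pfm, kmer))
    motif_weight = information_content(pfm) / 2.0  # rescaled to [0, 1]
    return motif_weight * log_odds / len(kmer)


conserved = position_frequency_matrix(["ACGT"] * 10)
noise = position_frequency_matrix(["AAAA", "CCCC", "GGGG", "TTTT"])
print(composite_similarity("ACGT", conserved))  # positive: strong match to a motif-like cluster
print(composite_similarity("ACGT", noise))      # 0.0: the cluster carries no motif signal
```

Multiplying by a motif-strength weight is only one way to fold noise into the score. An additive penalty or a learned mixture would be equally plausible readings of "composite", and the paper's actual formulation may differ.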
