Abstract

A self-organizing map (SOM) is an artificial neural network algorithm that can learn from training data consisting of objects expressed as vectors and perform non-hierarchical clustering, grouping input vectors into discrete clusters so that vectors assigned to the same cluster share similar numeric or alphanumeric features. SOM has been used widely in transcriptomics to identify co-expressed genes as candidates for co-regulated genes. I envision SOM to have great potential in characterizing heterogeneous sequence motifs, and I aim to illustrate this potential through a parallel presentation of SOM applied to a set of numerical vectors and a set of equal-length sequence motifs. While there are numerous biological applications of SOM involving numerical vectors, few studies have used SOM for heterogeneous sequence motif characterization. This paper is intended to encourage (1) researchers to study SOM in this new domain and (2) computer programmers to develop user-friendly motif-characterization SOM tools for biologists.

Highlights

  • A self-organizing map or SOM [1] is a grid of artificial neurons that learn patterns from training data and use the learned patterns to perform non-hierarchical clustering, representing input vectors as discrete clusters, with vectors in the same cluster sharing similar features

  • While SOM has almost always been presented as a non-hierarchical clustering method for numerical vectors (e.g., [1] and pp. 231–250 of [8]), it can in principle be adapted to any set of objects for which a pairwise distance between two objects can be computed (see the sketch after this list)

  • Several studies have demonstrated the value of using SOM to characterize sequence motifs [17,18,19,20,21,22], but their efforts do not seem to be sufficiently appreciated by biologists
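
A minimal sketch (not from the paper) of the idea in the second highlight: SOM training only needs a pairwise distance, so the same procedure can be driven by a Euclidean distance for numerical vectors or, for example, by a mismatch count for equal-length sequence motifs. The function names below are illustrative and not taken from any cited software.

```python
import math

def euclidean_distance(x, y):
    """Distance between two numerical vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mismatch_distance(motif_a, motif_b):
    """Number of mismatched positions between two equal-length sequence motifs."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must be of equal length")
    return sum(a != b for a, b in zip(motif_a, motif_b))

print(euclidean_distance([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # ~1.118
print(mismatch_distance("TATAAT", "TATGAT"))                 # 1
```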

Summary

Introduction

A self-organizing map or SOM [1] is a grid of artificial neurons that learn patterns from training data and use the learned patterns to perform non-hierarchical clustering, representing input vectors as discrete clusters, with vectors in the same cluster sharing similar features. SOM involves setting up a grid of artificial neurons; initializing them either with random values or with values from routine multidimensional scaling methods such as PCA; computing a distance (or similarity) between an input vector and each neuron to identify the winning neuron (the one with the shortest distance or greatest similarity to the input vector); revising the features of the winning neuron and its neighbors as a learning process; and continuing with other input vectors until the process converges (i.e., when the vector values of the neurons no longer change). Such a trained SOM can be used to classify input vectors that are not in the training data. Readers will find it easy to understand SOM with sequence motifs as input.
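
The sketch below illustrates the training procedure just described for numerical input vectors: random initialization of a grid of neurons, identification of the winning node by Euclidean distance, and a Gaussian-neighborhood update of the winner and its neighbors with a decaying learning rate and radius. It is a generic SOM sketch under assumed parameters (grid size, decay schedules, function names), not the paper's specific procedure.

```python
import numpy as np

def train_som(data, rows=5, cols=5, epochs=100, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols SOM on `data` (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # Random initialization; PCA-based initialization is another option.
    weights = rng.random((rows, cols, dim))
    # Grid coordinates, used to measure neighborhood distance between nodes.
    grid = np.array([[(r, c) for c in range(cols)] for r in range(rows)], dtype=float)

    for epoch in range(epochs):
        # Learning rate and neighborhood radius decay over time.
        lr = lr0 * np.exp(-epoch / epochs)
        sigma = sigma0 * np.exp(-epoch / epochs)
        for x in data:
            # Winning node: smallest Euclidean distance to the input vector.
            dists = np.linalg.norm(weights - x, axis=2)
            win = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood around the winning node on the grid.
            grid_dist2 = np.sum((grid - np.array(win, dtype=float)) ** 2, axis=2)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Move the winner and its neighbors toward the input vector.
            weights += lr * h[..., None] * (x - weights)
    return weights

# Example: train on 200 random 4-dimensional vectors, then classify a new
# vector (not in the training data) by its nearest node in the trained grid.
som = train_som(np.random.default_rng(1).random((200, 4)))
new_vec = np.array([0.2, 0.8, 0.5, 0.1])
winner = np.unravel_index(np.argmin(np.linalg.norm(som - new_vec, axis=2)), som.shape[:2])
print(winner)
```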

Distance or Similarity between Two Vectors
Distance for Homologous Input Sequences
Distance for Non-Homologous Sequences
Training Data
SOM Grid Size and Initialization
Update SOM
Identify the Winning Node
Learning by Revising the Winning Node and Its Neighbors
The Fit of SOM to Input Data
Software Implementing SOM with PWM
Conclusions