Abstract

We have been applying Machine Learning to Metabolomics, Drug-Drug Interaction Discovery and Human Gait Recognition [1-5], profiling large data sets. These applications include extracting vital information about 1) plant metabolomics that can improve the environment and food quality, 2) in vitro neuronal network behavior patterns under various drugs, which can be used to characterize drugs for brain diseases, and 3) human gait patterns that can reveal various diseases. We have had considerable success using unsupervised clustering techniques to analyze genetic and metabolomic data, including the analysis of drought resistance in wheat [4] and of microbial metagenomes [5]. We introduced two methods, Near Unsupervised Learning (NUL) and Sub-sample Error Graphs (SEGs) [5], to analyze large amounts of data. The Self Organizing Map (SOM), a widely used Unsupervised Neural Network, has been used as a data-mining tool because it maps high dimensional data onto a two dimensional feature map. The map is expected to be topology preserving, which allows users to identify clusters visually from their proximity on the map. The Growing Self Organizing Map (GSOM) further allows the map size to be determined by the algorithm, which relies on a user-set parameter called the Spread Factor (SF) [6]. The wide availability of affordable GPUs allows faster comparison of SOMs with different map sizes, taking away some of the advantages claimed for GSOM. A further development of GSOM, the Dynamic SOM Tree, exploits the possibility of varying SF to obtain multiple GSOMs, ranging from a small number of compact clusters to a large number of sparse clusters [7]. NUL methods can be applied to GSOM using a small number of labels, which must be available for every class. This is, however, not realistic in applications where the number of classes cannot be explicitly known.
We propose a new method called Deep Near Unsupervised Learning (D-NUL), in which the Dynamic SOM Tree is used instead of GSOM and the number of classes is not assumed to be known. Implementing Dynamic SOM Tree methods with varying SF on GPUs will make computation with D-NUL feasible for many problems in big data analytics.
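
As an illustration of the SOM mapping described in the abstract, the following is a minimal sketch of a plain fixed-size SOM in NumPy. It is not the authors' GSOM, Dynamic SOM Tree, NUL or D-NUL implementation; the function names (`train_som`, `map_to_grid`), the grid size, the decay schedules and the synthetic two-cluster data are all assumptions chosen only to show how high dimensional samples are assigned positions on a two dimensional map.

```python
import numpy as np

def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every node, used for neighbourhood distances on the map.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate (assumed schedule)
            sigma = sigma0 * (1.0 - frac) + 0.5    # shrinking neighbourhood radius, never below 0.5
            # Best-matching unit: the node whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighbourhood around the BMU, measured on the 2-D grid.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours towards the sample.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def map_to_grid(data, weights):
    """Return the (row, col) of the best-matching unit for each sample."""
    d = np.linalg.norm(weights[None, :, :, :] - data[:, None, None, :], axis=-1)
    flat = d.reshape(len(data), -1).argmin(axis=1)
    return np.stack(np.unravel_index(flat, weights.shape[:2]), axis=1)

# Synthetic 10-dimensional data with two well-separated clusters (illustrative only).
rng = np.random.default_rng(1)
cluster_a = rng.normal(0.1, 0.02, size=(100, 10))
cluster_b = rng.normal(0.9, 0.02, size=(100, 10))
data = np.vstack([cluster_a, cluster_b])

weights = train_som(data)
coords = map_to_grid(data, weights)  # one 2-D map position per 10-D sample
```

Plotting `coords` coloured by cluster would give the kind of two dimensional view from which clusters are read off by proximity. A growing variant would additionally decide when to add nodes, which this fixed-grid sketch does not attempt.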
