Abstract
BackgroundClustering is one of the most common techniques in data analysis and seeks to group together data points that are similar in some measure. Although there are many computer programs available for performing clustering, a single web resource that provides several state-of-the-art clustering methods, interactive visualizations and evaluation of clustering results is lacking.MethodsClusterEnG (acronym for Clustering Engine for Genomics) provides a web interface for clustering data and interactive visualizations including 3D views, data selection and zoom features. Eighteen clustering validation measures are also presented to aid the user in selecting a suitable algorithm for their dataset. ClusterEnG also aims at educating the user about the similarities and differences between various clustering algorithms and provides tutorials that demonstrate potential pitfalls of each algorithm.ConclusionsThe web resource will be particularly useful to scientists who are not conversant with computing but want to understand the structure of their data in an intuitive manner. The validation measures facilitate the process of choosing a suitable clustering algorithm among the available options. ClusterEnG is part of a bigger project called KnowEnG (Knowledge Engine for Genomics) and is available at http://education.knoweng.org/clustereng.
Highlights
Clustering is one of the most common techniques in data analysis and seeks to group together data points that are similar in some measure
Clustering is one of the most powerful and widely used analysis techniques for discovering structure in large datasets by grouping data points that are similar according to some measure
As part of the KnowEnG BD2K Center, we have developed a web-based resource called ClusterEnG for clustering large datasets with efficient parallel algorithms and software containerization
Summary
Clustering is one of the most common techniques in data analysis and seeks to group together data points that are similar in some measure. ClusterEnG (acronym for Clustering Engine for Genomics) provides a web interface for clustering data and interactive visualizations including 3D views, data selection and zoom features. Clustering is one of the most powerful and widely used analysis techniques for discovering structure in large datasets by grouping data points that are similar according to some measure. Several programming languages such as R (R Core Team , 2015) and Python. As part of the KnowEnG BD2K Center, we have developed a web-based resource called ClusterEnG (acronym for Clustering Engine for Genomics) for clustering large datasets with efficient parallel algorithms and software containerization
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.