Graphs from Features: Tree-Based Graph Layout for Feature Analysis

Rosane Minghim, Guilherme P Telles, Ivar V Belizario, Liz Huancapaza, Erasmo Artur

doi:10.3390/a13110302

Abstract

Feature analysis has become a critical task in data analysis and visualization. Graph structures are flexible in terms of representation and may encode important information on features, but they are challenging to lay out in a way that is adequate for analysis tasks. In this study, we propose and develop similarity-based graph layouts with the purpose of locating relevant patterns in sets of features, thus supporting feature analysis and selection. We apply a tree layout in the first step of the strategy, to accomplish node placement and overview based on feature similarity. By drawing the remainder of the graph edges on demand, further groupings and relationships among features are revealed. We evaluate those groups and relationships in terms of their effectiveness in exploring feature sets for data analysis. Correlation of features with a target categorical attribute and feature ranking are added to support the task. Multidimensional projections are employed to plot the dataset based on selected attributes to reveal the effectiveness of the feature set. Our results show that the tree-graph layout framework allows for a number of observations that are important in user-centric feature selection and that are not easy to make with other available tools. They provide a way of finding relevant and irrelevant features, spurious sets of noisy features, groups of similar features, and opposite features, all of which are essential tasks in different scenarios of data analysis. Case studies in application areas centered on documents, images and sound data demonstrate the ability of the framework to quickly reach a satisfactory compact representation from a larger feature set.

Highlights

Many data analysis tasks are performed on datasets where each data item has a set of features that define it.
We propose a visual analysis framework for a graph representation of the attributes in a dataset, with features represented as vertices and feature similarity as edges.
Two measures of relevance are currently available in Graphs from Features (GFF): the Pearson correlation and the Extra Trees Classifier (ETC) [44].
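
The ideas in these highlights can be illustrated with a minimal sketch, which is not the GFF implementation. It assumes that feature similarity is measured by the absolute Pearson correlation between feature columns, that an edge is drawn when that value reaches a hypothetical `threshold`, and that the categorical target has been encoded numerically. The function names and the toy data are illustrative only. The ETC relevance measure is not shown because it requires an external library.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Constant column: treated as uncorrelated (a convention chosen here).
        return 0.0
    return cov / sqrt(var_x * var_y)


def feature_graph(columns, threshold=0.5):
    """Return edges (f, g, r) for feature pairs with |r| >= threshold.

    Features are the vertices; `threshold` is a hypothetical cut-off.
    """
    edges = []
    for f, g in combinations(sorted(columns), 2):
        r = pearson(columns[f], columns[g])
        if abs(r) >= threshold:
            edges.append((f, g, r))
    return edges


def relevance(columns, target):
    """Absolute Pearson correlation of each feature with an encoded target."""
    return {name: abs(pearson(values, target)) for name, values in columns.items()}


# Hypothetical toy data: four samples, three features, binary target.
columns = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.0],    # a scaled copy of f1
    "f3": [1.0, -1.0, 1.0, -1.0],  # alternates independently of the target
}
target = [0, 0, 1, 1]

edges = feature_graph(columns)
scores = relevance(columns, target)
```

With this toy data, only `f1` and `f2` are connected, and `f3` receives a relevance of zero, which is the kind of similar-feature group and irrelevant feature that the framework aims to expose visually.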

Summary

Introduction

Many data analysis tasks are performed on datasets where each data item (also referred to as sample, example or instance) has a set of features (also referred to as variables or attributes) that define it. Exploratory tasks that are often performed during data analysis, like data classification, clustering, [...]

Algorithms 2020, 13, 302; doi:10.3390/a13110302; www.mdpi.com/journal/algorithms

Two main strategies have been proposed for dimensionality reduction of datasets: feature selection and feature transformation [2]. Feature selection techniques discard features in the quest for a small subset of features that preserves relationships among data. Feature transformation (also referred to as feature extraction) techniques build a new, smaller feature space from the original features.
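
The difference between the two strategies can be sketched in a few lines. This is an illustrative example under stated assumptions, not a method from the paper: the function names, the chosen column indices and the weight matrix are all hypothetical, and a fixed linear combination stands in for whatever transformation technique is actually used.

```python
def select_features(rows, keep):
    """Feature selection: keep a subset of the original columns, unchanged."""
    return [[row[i] for i in keep] for row in rows]


def transform_features(rows, weights):
    """Feature transformation: each new feature is a weighted sum of the originals."""
    return [
        [sum(w * v for w, v in zip(weight_row, row)) for weight_row in weights]
        for row in rows
    ]


# Hypothetical dataset: two samples described by four features.
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 0.0, 1.0],
]

# Selection: the two retained features are original columns 0 and 3.
selected = select_features(rows, keep=[0, 3])

# Transformation: a hypothetical 2x4 weight matrix that averages feature pairs.
weights = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
transformed = transform_features(rows, weights)
```

Both results have two features per sample, but the selected features keep their original meaning, whereas the transformed features are new quantities derived from several originals.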

Methods

Results

Discussion

Conclusion