Abstract

Traditional classification tasks learn to assign samples to given classes based solely on sample features. This paradigm is evolving to include other sources of information, such as known relations between samples. Here, we show that, even if additional relational information is not available in the dataset, one can improve classification by constructing geometric graphs from the features themselves and using them within a Graph Convolutional Network (GCN). The improvement in classification accuracy is maximized by graphs that capture sample similarity with relatively low edge density. We show that such feature-derived graphs increase the alignment of the data to the ground truth while improving class separation. We also demonstrate that the graphs can be made more efficient using spectral sparsification, which reduces the number of edges while still improving classification performance. We illustrate our findings using synthetic and real-world datasets from various scientific domains.
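The abstract describes feeding a feature-derived graph into a GCN but does not specify the architecture. As a minimal NumPy sketch, assuming the common two-layer propagation rule of Kipf and Welling, here is how features and a kNN graph built from those same features flow through a GCN. The helper names `knn_adjacency`, `normalized_adjacency`, and `gcn_forward` are hypothetical, the data are random, and the weights are untrained, so the output only illustrates the data flow, not the authors' results.

```python
import numpy as np

def knn_adjacency(X: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 k-nearest-neighbour adjacency built from feature vectors."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # exclude self-matches
    nn = np.argsort(D, axis=1)[:, :k]            # k closest samples per row
    A = np.zeros((n, n))
    A[np.arange(n)[:, None], nn] = 1.0
    return np.maximum(A, A.T)                    # keep edge if either end chose it

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Renormalised propagation matrix D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_forward(X, A_hat, W1, W2):
    """Two-layer GCN: row-wise softmax(A_hat ReLU(A_hat X W1) W2)."""
    H = np.maximum(A_hat @ X @ W1, 0.0)          # aggregate neighbours, then ReLU
    logits = A_hat @ H @ W2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    expz = np.exp(logits)
    return expz / expz.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))                     # 12 samples, 5 features (toy data)
A_hat = normalized_adjacency(knn_adjacency(X, k=3))
W1 = rng.normal(size=(5, 8))                     # untrained weights, illustration only
W2 = rng.normal(size=(8, 3))                     # 3 hypothetical classes
probs = gcn_forward(X, A_hat, W1, W2)
print(probs.argmax(axis=1))                      # predicted class per sample
```

In practice `W1` and `W2` would be trained with a cross-entropy loss on the labelled samples. The graph enters only through `A_hat`, which is why swapping in a different feature-derived graph changes the predictions without changing the model.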
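The summary does not say which spectral sparsifier was used. One standard construction is the effective-resistance sampling scheme of Spielman and Srivastava, sketched below under that assumption. Each edge is drawn with probability proportional to its weight times its effective resistance, and every kept edge is reweighted so that the sparsified Laplacian equals the original one in expectation. The function names and the sample count `q` are illustrative, and the dense pseudo-inverse makes this O(n^3), so it suits small graphs only.

```python
import numpy as np

def laplacian(W: np.ndarray) -> np.ndarray:
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W

def spectral_sparsify(W: np.ndarray, q: int, seed: int = 0) -> np.ndarray:
    """Sparsify a connected weighted graph by effective-resistance sampling."""
    L_pinv = np.linalg.pinv(laplacian(W))        # dense pseudo-inverse: small graphs only
    u, v = np.nonzero(np.triu(W, k=1))           # list each undirected edge once
    w = W[u, v]
    # Effective resistance R_uv = (e_u - e_v)^T L^+ (e_u - e_v)
    R = L_pinv[u, u] + L_pinv[v, v] - 2.0 * L_pinv[u, v]
    p = w * R / np.sum(w * R)                    # sampling probabilities
    counts = np.random.default_rng(seed).multinomial(q, p)
    new_w = counts * w / (q * p)                 # unbiased reweighting of kept edges
    H = np.zeros_like(W, dtype=float)
    H[u, v] = new_w
    H[v, u] = new_w
    return H

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-sq)                                  # dense Gaussian-kernel graph
np.fill_diagonal(W, 0.0)
H = spectral_sparsify(W, q=300)
print(np.count_nonzero(np.triu(W, 1)), np.count_nonzero(np.triu(H, 1)))
```

At most `q` distinct edges survive. Edges with high effective resistance (bridges between otherwise weakly connected regions) are the most likely to be kept, which is what preserves the graph's spectral structure.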

Highlights

  • Classifying samples into a given set of classes is one of the fundamental tasks of data analytics.[1]

  • Geometric graphs constructed from data features can aid sample classification. We consider geometric graph constructions that fall broadly into two groups: (1) three methods based on local neighborhoods, i.e., k-Nearest Neighbor (kNN), Mutual k-Nearest Neighbor (MkNN), and Continuous k-Nearest Neighbor (CkNN)[27] graphs; and (2) a method that balances local and global distances measured on the Minimum Spanning Tree (MST), i.e., the Relaxed Minimum Spanning Tree (RMST).[29]

  • We start from an MST to guarantee that the resulting graph comprises a single connected component, and then add edges based on the corresponding distance heuristics.
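The three local-neighborhood constructions differ only in how they turn neighbor lists into edges. The minimal NumPy sketch below uses the usual textbook definitions, since the summary does not state the exact variants or parameters. kNN links i and j if either selects the other. MkNN requires the choice to be mutual. CkNN, following Berry and Sauer, links i and j when d(i, j) < δ·sqrt(ρ_i·ρ_j), where ρ_i is the distance from i to its k-th nearest neighbor. All function names and the values of `k` and `delta` are illustrative.

```python
import numpy as np

def pairwise_distances(X: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between all rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def knn_indicator(D: np.ndarray, k: int) -> np.ndarray:
    """B[i, j] is True when j is one of the k nearest neighbours of i."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)                  # a sample is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]
    B = np.zeros(D.shape, dtype=bool)
    B[np.arange(D.shape[0])[:, None], nn] = True
    return B

def knn_graph(X, k):
    B = knn_indicator(pairwise_distances(X), k)
    return B | B.T                               # edge if either sample selects the other

def mutual_knn_graph(X, k):
    B = knn_indicator(pairwise_distances(X), k)
    return B & B.T                               # edge only if the choice is mutual

def cknn_graph(X, k, delta=1.0):
    """Continuous kNN: edge if d(i, j) < delta * sqrt(rho_i * rho_j)."""
    D = pairwise_distances(X)
    Dk = D.copy()
    np.fill_diagonal(Dk, np.inf)
    rho = np.sort(Dk, axis=1)[:, k - 1]          # distance to the k-th nearest neighbour
    A = D < delta * np.sqrt(np.outer(rho, rho))
    np.fill_diagonal(A, False)
    return A

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
for name, A in [("kNN", knn_graph(X, 4)),
                ("MkNN", mutual_knn_graph(X, 4)),
                ("CkNN", cknn_graph(X, 4))]:
    print(name, "edges:", int(A.sum()) // 2)
```

MkNN is always a subgraph of kNN for the same `k`, so it is the sparser of the two. CkNN adapts its connection radius to local density through ρ, with `delta` controlling overall edge density.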
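The MST-based construction can be sketched in the same style. The summary does not give the RMST rule, so this minimal sketch assumes the formulation of Beguerisse-Díaz and colleagues. An edge (i, j) is added when d_ij < mlink_ij + γ(ρ_i + ρ_j), where mlink_ij is the longest edge on the MST path between i and j and ρ_i is the distance from i to its k-th nearest neighbor. MST edges are always kept, so the result is a single connected component. Prim's algorithm is written out to avoid extra dependencies, and all names and parameter values are illustrative.

```python
import numpy as np

def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def mst_edges(D):
    """Prim's algorithm on a dense distance matrix; returns n - 1 tree edges."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()                           # cheapest link from each node to the tree
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = D[j] < best                     # does node j offer a shorter link?
        best = np.where(closer, D[j], best)
        parent = np.where(closer, j, parent)
    return edges

def max_link(D, edges):
    """M[i, j] = longest edge length on the MST path between i and j."""
    n = D.shape[0]
    nbrs = [[] for _ in range(n)]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    M = np.zeros((n, n))
    for s in range(n):
        stack = [(s, -1, 0.0)]                   # (node, previous node, max link so far)
        while stack:
            node, prev, m = stack.pop()
            M[s, node] = m
            for nb in nbrs[node]:
                if nb != prev:
                    stack.append((nb, node, max(m, D[node, nb])))
    return M

def rmst_graph(X, k=1, gamma=0.5):
    """Relaxed MST: MST edges plus (i, j) if d_ij < mlink_ij + gamma * (rho_i + rho_j)."""
    D = pairwise_distances(X)
    edges = mst_edges(D)
    Dk = D.copy()
    np.fill_diagonal(Dk, np.inf)
    rho = np.sort(Dk, axis=1)[:, k - 1]          # distance to the k-th nearest neighbour
    A = D < max_link(D, edges) + gamma * (rho[:, None] + rho[None, :])
    for a, b in edges:                           # the MST guarantees one connected component
        A[a, b] = A[b, a] = True
    np.fill_diagonal(A, False)
    return A

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 2))
for gamma in (0.0, 0.5, 1.0):
    print(gamma, int(rmst_graph(X, gamma=gamma).sum()) // 2)
```

With `gamma=0` the relaxation adds nothing (by the MST cycle property, no non-tree edge is shorter than the longest link on its tree path), so the graph is exactly the MST. Increasing `gamma` adds progressively more local shortcuts, trading sparsity for density.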


Introduction

Classifying samples into a given set of classes is one of the fundamental tasks of data analytics.[1] In a dataset of scientific articles, each article is described by features that encode its text, but we might also have information on citations between articles; in a dataset of patients, each person is associated with a series of clinical or socio-economic features, but we might also have information about their social interactions.
