Abstract

Predicting links between the nodes of a graph has become an important data mining task because of its direct applications to biology, social networking, communication surveillance, and other domains. Recent literature on time-series link prediction has shown that the Vector Auto Regression (VAR) technique is one of the most accurate for this problem. In this study, we apply a Support Vector Machine (SVM) to improve the VAR technique, which uses an unweighted adjacency matrix along with five similarity matrices: Common Neighbor (CN), Adamic-Adar (AA), Jaccard’s Coefficient (JC), Preferential Attachment (PA), and Resource Allocation Index (RA). A DBLP dataset covering the years 2003 to 2013 was collected and transformed into time-sliced graph representations. The matrices were computed from these graphs, mapped to the feature space, and then used to build baseline VAR models with a lag of 2 and corresponding SVM classifiers. Using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) as the main evaluation metric, the average result of 82.04% for VAR was improved to 84.78% with SVM. Additional experiments to handle the highly imbalanced dataset, by oversampling with SMOTE and by undersampling with K-means clusters, did not improve the average AUC-ROC of the baseline SVM.
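To make the five similarity measures concrete, the sketch below computes them for every node pair of a small undirected, unweighted graph using the standard textbook definitions. The function names and the edge-list input format are assumptions made for illustration; the study itself works with matrices computed from DBLP snapshots, and its exact implementation is not described here.

```python
import math
from itertools import combinations


def neighbor_sets(edges):
    """Build a node -> set-of-neighbors map from undirected edges."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def similarity_scores(edges):
    """Return CN, AA, JC, PA and RA for every unordered node pair.

    Hypothetical helper: standard definitions, not the authors' code.
    """
    nbrs = neighbor_sets(edges)
    scores = {}
    for u, v in combinations(sorted(nbrs), 2):
        common = nbrs[u] & nbrs[v]
        union = nbrs[u] | nbrs[v]
        scores[(u, v)] = {
            # Common Neighbor: number of shared neighbors
            "CN": len(common),
            # Adamic-Adar: shared neighbors weighted by 1 / log(degree);
            # a shared neighbor always has degree >= 2, so log > 0
            "AA": sum(1.0 / math.log(len(nbrs[z])) for z in common),
            # Jaccard's Coefficient: shared neighbors over all neighbors
            "JC": len(common) / len(union) if union else 0.0,
            # Preferential Attachment: product of the two degrees
            "PA": len(nbrs[u]) * len(nbrs[v]),
            # Resource Allocation: shared neighbors weighted by 1 / degree
            "RA": sum(1.0 / len(nbrs[z]) for z in common),
        }
    return scores


# Path graph 0 - 1 - 2: nodes 0 and 2 share the single neighbor 1.
example = similarity_scores([(0, 1), (1, 2)])
print(example[(0, 2)])
```

Arranging each score over all pairs yields one matrix per measure; computing them per time slice gives the multivariate time series that the VAR and SVM models consume.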

Highlights

  • One of the major problems in network analysis involves predicting the existence or emergence of links given a network

  • Because the Vector Auto Regression (VAR) model assumes a linear dependence of the temporal links on multiple time-series, we propose the use of Support Vector Machine (SVM) to handle non-linear dependencies more robustly while retaining the assumption that the dependency is on multiple time-series

  • We were able to improve on the performance of the VAR model by transforming its input multivariate time-series data into a set of feature vectors that was used to train a linear SVM
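One way to picture the transformation in the last highlight is to flatten, for each node pair, the values observed in the previous two time slices into a single feature vector and to use the presence of a link in the current slice as the class label. The sketch below does this for one pair; the function name, the input layout, and the ordering of the lagged values are assumptions, since the exact feature mapping is not given in this excerpt.

```python
def lagged_samples(series, labels, lag=2):
    """Turn one node pair's multivariate time series into (X, y) samples.

    series: list over time slices; each item is a list of per-slice
            values for the pair (for example adjacency, CN, AA, JC, PA, RA).
    labels: list over time slices of 0/1 link indicators.
    Hypothetical helper written for illustration only.
    """
    X, y = [], []
    for t in range(lag, len(series)):
        row = []
        # Most recent slice first: t-1, then t-2, ... down to t-lag
        for k in range(1, lag + 1):
            row.extend(series[t - k])
        X.append(row)
        y.append(labels[t])
    return X, y


# Four time slices with two values per slice for a single node pair.
series = [[1, 0], [2, 1], [3, 1], [4, 0]]
labels = [0, 1, 1, 0]
X, y = lagged_samples(series, labels)
print(X, y)
```

Stacking the samples of all node pairs gives a training set that could be passed to a linear SVM implementation, for example scikit-learn's `LinearSVC`. Whether the authors used that library is not stated here.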

Summary

Introduction

One of the major problems in network analysis involves predicting the existence or emergence of links given a network. Most previous work on link prediction uses a static network to predict hidden or future links. In the detection of hidden links, the network is based on a known partial snapshot, and the objective is to predict currently existing links [4]. In the prediction of future links, the network is based on a snapshot at time t, and the objective is to predict links at time t’ (t’ > t) [5]. In this static framework, insight into the dynamics of the network is disregarded, and information on the occurrence and frequency of links across time is lost. Recent work on link prediction instead uses a dynamic network, characterized by a series of snapshots that represent the network across time [4, 5].
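As a rough illustration of the dynamic-network view, the sketch below groups timestamped co-authorship records into one undirected edge set per year, so that each year becomes a snapshot. The record layout `(year, author_a, author_b)` and the function name are hypothetical; the actual DBLP preprocessing used in the study is not described in this excerpt.

```python
from collections import defaultdict


def yearly_snapshots(records):
    """Group (year, author_a, author_b) records into per-year edge sets.

    Returns a list of (year, edges) tuples sorted by year, where edges
    is a set of undirected pairs stored as (smaller, larger).
    """
    snaps = defaultdict(set)
    for year, a, b in records:
        if a != b:  # ignore self-pairs
            snaps[year].add((min(a, b), max(a, b)))
    return [(year, snaps[year]) for year in sorted(snaps)]


records = [
    (2004, "b", "a"),
    (2003, "a", "c"),
    (2004, "a", "b"),  # duplicate of the first record, reversed
]
for year, edges in yearly_snapshots(records):
    print(year, sorted(edges))
```

Storing each pair in a canonical order means a repeated or reversed record does not create a second edge, which matches the unweighted setting described in the abstract.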
