Abstract

The exponential rise in advanced software computing and low-cost hardware has broadened the horizon for the Internet of Medical Things (IoMT) and interoperable e-Healthcare systems, which serve varied purposes including electronic healthcare records (EHRs) and telemedicine. However, because these systems are heterogeneous and dynamic in nature, their database security remains a persistent challenge. Numerous intrusion attacks, including bot attacks and malware, have limited the suitability of major classical databases for e-Healthcare. Although NoSQL is more robust than structured query language (SQL) databases, the dynamic nature of data in a heterogeneous environment leaves it vulnerable to intrusion attacks, especially in interoperable e-Healthcare systems. Considering these challenges, this work proposed a first-of-its-kind semantic feature-driven NoSQL intrusion attack (NoSQL-IA) detection model for interoperable e-Healthcare systems. It assessed the efficacy of different semantic feature-extraction methods, namely Word2Vec, Continuous Bag of Words, N-Skip Gram (SKG), Count Vectorizer, TF-IDF, and GloVe, for NoSQL-IA prediction. Subsequently, to reduce computational cost, different feature selection methods were employed, including the Wilcoxon Rank Sum Test (WRST), significant predictor test, principal component analysis, Select K-Best, and variance threshold feature selection. To alleviate the data imbalance problem, different resampling methods, including upsampling, downsampling, and the synthetic minority oversampling technique (SMOTE), were applied over the selected features. Min–Max normalization was then performed over the input feature vectors to reduce the risk of overfitting.
For NoSQL-IA prediction, different machine learning methods, namely Multinomial Naïve Bayes, decision tree, logistic regression, support vector machine, k-NN, AdaBoost, Extra Trees classifier, random forest ensemble, and XGBoost, were applied to classify each input query as either a regular query or a NoSQL-IA attack query. An in-depth performance assessment revealed that Word2Vec SKG features combined with variance threshold feature selection (VTFS) and SMOTE resampling, processed with the bootstrapped random forest classifier, provided the best performance, with an accuracy of 98.86%, an F-Measure of 0.974, and an area under the curve (AUC) of 0.981, making the model suitable for interoperable e-Healthcare database security.
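
The preprocessing order described above (feature extraction, feature selection, resampling, normalization) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The example queries, labels, and all function names are hypothetical. A simple count vectorizer stands in for the Word2Vec SKG features, the oversampler interpolates between random minority pairs rather than the k nearest neighbours used by real SMOTE, and the classifier stage is omitted.

```python
import random
import re
from statistics import pvariance


def count_vectorize(queries):
    """Bag-of-words counts over a vocabulary built from the queries."""
    tokenized = [re.findall(r"[A-Za-z_$]+", q.lower()) for q in queries]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = [[toks.count(term) for term in vocab] for toks in tokenized]
    return rows, vocab


def variance_threshold(rows, vocab, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    keep = [j for j in range(len(vocab))
            if pvariance([row[j] for row in rows]) > threshold]
    return [[row[j] for j in keep] for row in rows], [vocab[j] for j in keep]


def smote_like(minority, n_new, rng):
    """Create synthetic rows by interpolating between random minority pairs.

    Simplification: real SMOTE interpolates toward k-nearest neighbours.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def min_max(rows):
    """Scale every column to [0, 1]; constant columns are mapped to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


# Hypothetical MongoDB-style queries: label 0 = regular, 1 = NoSQL-IA attack.
queries = [
    'db.patients.find({"name": "alice"})',
    'db.patients.find({"name": "bob"})',
    'db.patients.find({"id": "7"})',
    'db.patients.find({"name": {"$ne": ""}})',
    'db.patients.find({"$where": "sleep(5000)"})',
]
labels = [0, 0, 0, 1, 1]

features, vocab = count_vectorize(queries)
selected, kept = variance_threshold(features, vocab)

# Oversample the minority (attack) class until both classes are the same size.
rng = random.Random(0)
minority = [row for row, y in zip(selected, labels) if y == 1]
n_new = labels.count(0) - labels.count(1)
balanced_X = selected + smote_like(minority, n_new, rng)
balanced_y = labels + [1] * n_new

scaled = min_max(balanced_X)
print(kept)
```

Tokens that occur identically in every query (here `db`, `patients`, and `find`) have zero variance and are dropped by the selection step. The rows in `scaled`, paired with `balanced_y`, would then be passed to a classifier such as a random forest.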
