Asymmetric Data Research Articles

This paper aims to address two fundamental challenges arising in eigenvector estimation and inference for a low-rank matrix from noisy observations: 1) how to estimate an unknown eigenvector when the eigen-gap (i.e. the spacing between the associated eigenvalue and the rest of the spectrum) is particularly small; 2) how to perform estimation and inference on linear functionals of an eigenvector, a sort of “fine-grained” statistical reasoning that goes far beyond the usual $\ell_{2}$ analysis. We investigate how to address these challenges in a setting where the unknown $n\times n$ matrix is symmetric and the additive noise matrix contains independent (and non-symmetric) entries. Based on eigen-decomposition of the asymmetric data matrix, we propose estimation and uncertainty quantification procedures for an unknown eigenvector, which further allow us to reason about linear functionals of an unknown eigenvector. The proposed procedures and the accompanying theory enjoy several important features: 1) distribution-free (i.e. prior knowledge about the noise distributions is not needed); 2) adaptive to heteroscedastic noise; 3) minimax optimal under Gaussian noise. Along the way, we establish valid procedures to construct confidence intervals for the unknown eigenvalues. All this is guaranteed even in the presence of a small eigen-gap (up to $O(\sqrt{n/\mathrm{poly}\log(n)}\,)$ times smaller than the requirement in prior theory), which goes significantly beyond what generic matrix perturbation theory has to offer.
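As a rough illustration of the setting in this abstract, the sketch below builds a symmetric rank-one matrix, adds noise with independent (hence non-symmetric) entries, and reads the leading eigenvector off the asymmetric data matrix directly, without symmetrizing it first. It is not the paper's procedure: power iteration stands in for a full eigen-decomposition, and the rank-one model, Gaussian noise, and the values of `n`, `true_eigenvalue`, and `sigma` are all assumptions chosen for the example.

```python
import math
import random


def matvec(a, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def normalize(x):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]


def power_iteration(a, num_iters=100):
    """Approximate the dominant eigenvalue and a unit right eigenvector of a."""
    v = normalize([1.0] * len(a))
    for _ in range(num_iters):
        v = normalize(matvec(a, v))
    av = matvec(a, v)
    eigenvalue = sum(p * q for p, q in zip(av, v))  # v^T A v with ||v|| = 1
    return eigenvalue, v


# Hypothetical problem sizes, chosen only for illustration.
rng = random.Random(0)
n, true_eigenvalue, sigma = 60, 10.0, 0.1

# Ground truth: symmetric rank-one matrix true_eigenvalue * u_star u_star^T.
u_star = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])

# Observed data: truth plus independent noise in every entry, so the
# data matrix is NOT symmetric.
data = [
    [true_eigenvalue * u_star[i] * u_star[j] + rng.gauss(0.0, sigma)
     for j in range(n)]
    for i in range(n)
]

eig_hat, u_hat = power_iteration(data)
alignment = abs(sum(p * q for p, q in zip(u_hat, u_star)))
print(f"estimated eigenvalue: {eig_hat:.3f}")
print(f"|<u_hat, u_star>|:    {alignment:.4f}")
```

With a large, well-separated eigenvalue as assumed here, the estimate aligns closely with the truth; the small eigen-gap regime that the abstract targets would need a multi-eigenvector model and is not reproduced by this sketch.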

The emergence of ground-breaking technologies such as artificial intelligence, cloud computing, and big data powered by the Internet, together with their highly valued real-world applications involving symmetric and asymmetric data distributions, has changed our lives in many positive ways. However, it has also brought catastrophic cyberattacks that escalate daily, raising the need for researchers to harness the strengths of machine learning to design and implement intrusion detection systems (IDSs) that help mitigate these threats. Nevertheless, building trustworthy and effective IDSs remains a challenge because of low accuracy caused by vast, irrelevant, and redundant features; the inability of individual machine learning classifiers to detect all types of novel attacks; the costly and error-prone use of labeled training datasets, along with significant false alarm rates (FAR); and excessive model building and testing time. Therefore, this paper proposes a hybrid feature selection (HFS) method with an ensemble classifier, which efficiently selects relevant features and provides consistent attack classification. First, we harness the strengths of CfsSubsetEval, genetic search, and a rule-based engine to select subsets of features with high correlation, which considerably reduces model complexity and enhances the generalization of learning algorithms, both of which are symmetry learning attributes. Moreover, using a voting method and the average of probabilities, we present an ensemble classifier that combines K-means, One-Class SVM, DBSCAN, and Expectation-Maximization (abbreviated KODE) and consistently classifies the asymmetric probability distributions between malicious and normal instances. HFS-KODE achieves remarkable results under 10-fold cross-validation on the CIC-IDS2017, NSL-KDD, and UNSW-NB15 datasets across various metrics. For example, it outperformed all the selected individual classification methods, cutting-edge feature selection methods, and some current IDS techniques, with accuracies of 99.99%, 99.73%, and 99.997% and detection rates of 99.75%, 96.64%, and 99.93% for CIC-IDS2017, NSL-KDD, and UNSW-NB15, respectively, based on only 11, 8, and 13 selected relevant features from these datasets. Finally, considering the drastically reduced FAR and time, coupled with no need for labeled datasets, HFS-KODE performs remarkably well compared with many current approaches.
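The voting step and the reported metrics can be made concrete with a small sketch. It assumes each base model already yields a per-instance attack probability (the abstract does not say how K-means, One-Class SVM, DBSCAN, or Expectation-Maximization outputs are turned into probabilities), and it uses the standard confusion-matrix definitions of accuracy, detection rate, and FAR. The function names, the 0.5 threshold, and all numbers are hypothetical and are not taken from the paper.

```python
def average_probability_vote(model_probs, threshold=0.5):
    """Average attack probabilities across models; label 1 (attack) if >= threshold."""
    num_models = len(model_probs)
    averaged = [sum(column) / num_models for column in zip(*model_probs)]
    return [1 if p >= threshold else 0 for p in averaged]


def ids_metrics(y_true, y_pred):
    """Accuracy, detection rate TP/(TP+FN), and false alarm rate FP/(FP+TN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up attack probabilities from four base models for six instances.
model_probs = [
    [0.90, 0.20, 0.70, 0.10, 0.40, 0.80],
    [0.80, 0.30, 0.60, 0.20, 0.60, 0.30],
    [0.70, 0.10, 0.40, 0.30, 0.70, 0.40],
    [0.95, 0.25, 0.50, 0.05, 0.20, 0.30],
]
y_true = [1, 0, 1, 0, 0, 1]  # 1 = malicious, 0 = normal (labels used only for scoring)

predictions = average_probability_vote(model_probs)
metrics = ids_metrics(y_true, predictions)
print(predictions)
print(metrics)
```

Averaging probabilities lets a confident model outweigh several lukewarm ones, which a plain majority vote would not; in this toy data the last instance is missed because only one of the four models flags it strongly.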

Articles published on Asymmetric Data

Likelihood-Based Inference for the Asymmetric Exponentiated Bimodal Normal Model

Bayesian Estimation for the Stable Distributions in the Presence of Covariates with Applications in Clinical Issues

Analysis of Data-centric Financial Governance System

Modeling asymmetrically dependent multivariate ocean data using truncated copulas

Resource Allocation Algorithm Based on Power Control and Dynamic Transmission Protocol Configuration for HAPS-IMT Integrated System

Coverage and Spectral Efficiency of Network Assisted Full Duplex in a Millimeter Wave System

Asymmetric Density for Risk Claim-Size Data: Prediction and Bimodal Data Applications

Asymmetric Data Hiding for Compressed Images with High Payload and Reversibility

A Novel Generator of Continuous Probability Distributions for the Asymmetric Left-skewed Bimodal Real-life Data with Properties and Copulas

The Extended Birnbaum–Saunders Distribution Based on the Scale Shape Mixture of Skew Normal Distributions

PQMLE of a Partially Linear Varying Coefficient Spatial Autoregressive Panel Model with Random Effects

Tackling Small Eigen-Gaps: Fine-Grained Eigenvector Estimation and Inference Under Heteroscedastic Noise

Möbius Transformation-Induced Distributions Provide Better Modelling for Protein Architecture

Towards Potential Content-Based Features Evaluation to Tackle Meaningful Citations

Exponentiated Generalized Inverted Gompertz Distribution: Properties and Estimation Methods with Applications to Symmetric and Asymmetric Data

Forecasting Grain Production and Static Capacity of Warehouses Using the Natural Neighbor and Multiquadric Equations

Feature Selection and Ensemble-Based Intrusion Detection System: An Efficient and Comprehensive Approach

Modeling Extreme Values Utilizing an Asymmetric Probability Function

Deep Learning-Based Residual Control Chart for Binary Response

A Compound Class of the Inverse Gamma and Power Series Distributions
