A Hybrid Method to Measure Distribution Consistency of Mixed-Attribute Datasets

Yulin He, Xuan Ye, Defa Huang, Philippe Fournier-Viger, Joshua Zhexue Huang

doi:10.1109/tai.2022.3151724

Abstract

Random sample partition (RSP) is a recently developed data management and processing model for Big Data analysis. Applying the RSP model to Big Data computation tasks requires measuring the distribution consistency of different datasets. Existing measurement methods for continuous-attribute and discrete-attribute datasets cannot directly handle mixed-attribute datasets. In this article, we design a hybrid method, abbreviated as MLELM-GMMD, that measures the distribution consistency among different mixed-attribute datasets by combining a multilayer extreme learning machine (MLELM) with the generalized maximum mean discrepancy (GMMD) criterion. MLELM first transforms the original mixed-attribute datasets into corresponding deep encoding datasets. The GMMD criterion is then applied to check the distribution consistency of the deep encoding datasets. Four experiments validate the feasibility and effectiveness of MLELM-GMMD: the impact of MLELM on the amount of information during mixed-attribute data transformation, the impact of MLELM on the distributions of mixed-attribute data, the distribution consistencies of RSP and non-RSP data blocks, and a comparison with other measurement methods. The experimental results show that the proposed MLELM-GMMD method measures the distribution consistency of mixed-attribute datasets more accurately than one-hot encoding-based methods.
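
The two-stage idea in the abstract (encode mixed-attribute data, then compare the encodings) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: each encoder layer is a generic ELM autoencoder (random hidden weights, ridge-regression output weights), the plain biased squared maximum mean discrepancy with a Gaussian kernel stands in for the GMMD criterion, whose exact form the abstract does not give, and all function names, layer sizes, and the synthetic data are hypothetical.

```python
import numpy as np

def one_hot(codes, n_levels):
    """One-hot encode an integer-coded discrete attribute (preprocessing assumption)."""
    return np.eye(n_levels)[codes]

def elm_ae_fit(X, n_hidden, rng, ridge=1e-3):
    """Fit one ELM autoencoder layer and return its output weights (n_hidden x d)."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, untrained input weights
    b = rng.standard_normal(n_hidden)                # random, untrained biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    # Ridge-regularised least squares so that H @ beta approximates X.
    return np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ X)

def mlelm_fit(X, layer_sizes, seed=0):
    """Stack ELM autoencoder layers; each layer encodes the previous layer's output."""
    rng = np.random.default_rng(seed)
    betas = []
    for n_hidden in layer_sizes:
        beta = elm_ae_fit(X, n_hidden, rng)
        betas.append(beta)
        X = np.tanh(X @ beta.T)
    return betas

def mlelm_encode(X, betas):
    """Map data to its deep encoding with already fitted layers."""
    for beta in betas:
        X = np.tanh(X @ beta.T)
    return X

def sq_dists(P, Q):
    """Pairwise squared Euclidean distances between rows of P and rows of Q."""
    d2 = (P ** 2).sum(1)[:, None] + (Q ** 2).sum(1)[None, :] - 2.0 * P @ Q.T
    return np.maximum(d2, 0.0)

def mmd2(A, B, gamma):
    """Biased estimate of squared MMD with kernel exp(-gamma * ||x - y||^2)."""
    k = lambda P, Q: np.exp(-gamma * sq_dists(P, Q))
    return k(A, A).mean() + k(B, B).mean() - 2.0 * k(A, B).mean()

# Synthetic mixed-attribute data: two continuous attributes, one 3-level discrete one.
rng = np.random.default_rng(42)
n = 600
X = np.hstack([rng.standard_normal((n, 2)), one_hot(rng.integers(0, 3, size=n), 3)])

# One shared encoder, fitted on all data, so that block encodings are comparable.
Z = mlelm_encode(X, mlelm_fit(X, layer_sizes=(16, 8)))

# Kernel width from the median heuristic on the encoded data (an assumption).
d2 = sq_dists(Z, Z)
gamma = 1.0 / np.median(d2[np.triu_indices_from(d2, k=1)])

# RSP-style blocks: random split. Non-RSP blocks: contiguous chunks of sorted data.
perm = rng.permutation(n)
order = np.argsort(X[:, 0])
mmd_rsp = mmd2(Z[perm[: n // 2]], Z[perm[n // 2 :]], gamma)
mmd_sorted = mmd2(Z[order[: n // 2]], Z[order[n // 2 :]], gamma)
print(f"random blocks: {mmd_rsp:.4f}  sorted blocks: {mmd_sorted:.4f}")
```

In this sketch a smaller value indicates more consistent distributions, so the randomly split blocks would be expected to score lower than the blocks cut from sorted data. Fitting a single encoder on all of the data is a design choice of the sketch: encodings produced by separately fitted random layers would not share a common space.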

Journal: IEEE Transactions on Artificial Intelligence
Publication Date: Feb 1, 2023
Citations: 2

Similar Papers

Detecting Network Anomalies in Mixed-Attribute Data Sets
Khoi-Nguyen Tran ... Huidong Jin
01 Jan 2009

An Enhanced Hierarchical Extreme Learning Machine with Random Sparse Matrix Based Autoencoder
Tianlei Wang ... Chi-Man Vong
01 May 2019

Perbandingan Waktu Respon Aplikasi Database NoSQL Elasticsearch dan MongoDB pada Pengujian Operasi CRUD
Theresia Liana Sinaga ... Nur Heri Cahyana
JISKA (Jurnal Informatika Sunan Kalijaga) | VOL. 8
30 Jan 2023

A new method for measuring the distribution consistency of mixed-attribute data sets
Yulin He ... Dexin Dai
Journal of Shenzhen University Science and Engineering | VOL. 38
01 Mar 2021
