A novel region based neighbors searching classification algorithm for big data


Similar Papers
  • Conference Article
  • 10.1109/icetci55101.2022.9832399
Big Data Classification Model and Algorithm based on Blockchain
  • May 27, 2022
  • Jiyin Zhou

Although the 4V properties of big data have been widely discussed, most discussions still describe only the surface features of big data; its technical characteristics also need attention. Against this background, this article studies a big data classification model and algorithm based on blockchain. It addresses the problem of mutual trust in data classification caused by the lack of a transparent, open, and equal interaction environment in the current centralized big data classification model. The article analyzes the advantages of blockchain-based big data classification in depth and establishes a blockchain-based big data classification model. Through a data connection model that combines blockchain with a distributed file system, key information in all classification interactions is stored on the chain in an immutable and traceable manner. The article implements a big data classification prototype system based on Ethereum, IPFS, Laravel and other technologies. The experimental results show that the stand-alone running time is 4343 ms longer than the three-node parallel running time, and the blockchain-based execution time is 17615 ms longer than the serial running time. Building on the blockchain, the article introduces the InterPlanetary File System (IPFS), Zigzag coding, capability-based access control methods, and publish-subscribe models, and proposes a reliable data connection mechanism based on the BIZi network, a blockchain-based data capability access control mechanism, and a zone-based blockchain data service customization mechanism.

  • Book Chapter
  • Citations: 2
  • 10.1007/978-3-030-67871-5_28
Research on Big Data Classification Algorithm of Disease Gene Detection Based on Complex Network Technology
  • Jan 1, 2021
  • Yuan-Yuan Gao + 4 more

In order to improve the accuracy of the classification of the big data of disease gene detection, an algorithm for the classification of the big data of disease gene detection based on the complex network technology was proposed. On the basis of complex network technology, a distance-based membership function is first established. Considering the distance between the sample and the class center, the membership function of sample compactness is designed to complete the establishment of membership function of complex network. Combined with the design of the classification algorithm flow of the big data of disease gene detection, the design of the data classification algorithm was completed, and the classification of the big data of disease gene detection was realized. The experimental results show that the proposed algorithm is more accurate than the other two classification algorithms in the big data sets of different disease genes.

Keywords: Complex network technology, Disease genes, Big data, Classification algorithm
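
The abstract above does not give the membership formula. As a rough sketch only, one common distance-based form assigns each sample a membership of 1 - d / (r + delta), where d is its distance to the class center and r is the largest such distance in the class. The function names, the `delta` parameter, and the formula itself are assumptions for illustration, not the authors' method.

```python
import math

def class_center(samples):
    """Mean point of a class (hypothetical helper)."""
    dims = len(samples[0])
    return tuple(sum(s[j] for s in samples) / len(samples) for j in range(dims))

def distance_membership(samples, delta=1e-6):
    """Assumed membership form: 1 - d / (radius + delta).

    Samples close to the class center get values near 1,
    the farthest sample gets a value just above 0.
    """
    center = class_center(samples)
    dists = [math.dist(s, center) for s in samples]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]

samples = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (1.0, 4.0)]
memberships = distance_membership(samples)
```

Here the third sample lies closest to the center and receives the highest membership, while the fourth lies farthest and receives the lowest.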

  • Book Chapter
  • 10.1007/978-3-319-32055-7_5
Similarity-Based Classification for Big Non-Structured and Semi-Structured Recipe Data
  • Jan 1, 2016
  • Wei Chen + 1 more

In the current big data era, there has been an explosive growth of various data. Most of this large volume of data is non-structured or semi-structured (e.g., tweets, weibos or blogs), which makes it difficult to manage and organize. Therefore, an effective and efficient classification algorithm for such data is essential. In this article, we focus on a specific kind of non-structured/semi-structured data in our daily life: recipe data. We propose a document model and a similarity-based classification algorithm for big non-structured and semi-structured recipe data. Using the proposed algorithm and system, we conduct an experimental study on a real-world dataset. The results of the experimental study verify the effectiveness of the proposed approach and framework.

  • Research Article
  • 10.54691/sjt.v5i4.4738
A Survey of Fuzzy Pattern Tree Classification Algorithms
  • Apr 20, 2023
  • Scientific Journal of Technology
  • Ji Zhang + 1 more

Data classification algorithms are at the core of big data mining. Their main function is to extract valuable knowledge and information, analyze the characteristics of all kinds of information, and provide a data basis for further research. With the wide application of data mining technology, data classification algorithms continue to emerge and gradually improve. The classic classification algorithms include the decision tree classification algorithm, the naive Bayes algorithm, the support vector machine classification algorithm, the artificial neural network classification algorithm, the fuzzy pattern tree (FPT), and so on. This paper summarizes several common data classification algorithms and analyzes their characteristics to understand their principles and application scenarios.

  • Research Article
  • 10.51557/pt_jiit.v8i1.1705
Comparative Performance Evaluation Results of Classification Algorithm in Data Mining to Identify Types of Glass Based on Refractive Index and It’s Elements
  • Mar 18, 2023
  • PENA TEKNIK: Jurnal Ilmiah Ilmu-Ilmu Teknik
  • Rinto Suppa

Data science is becoming familiar to the public and companies in the era of Industrial Revolution 4.0. One part of data science is data mining. Data mining is the process of collecting information to find patterns in very large datasets and processing the discovered data so that it becomes knowledge based on the interpretation of the information obtained. The purpose of this paper is to compare the performance evaluation results of several classification algorithms in data mining (such as DT C-45, Neural Network, KNN, LDA, Naïve Bayes, SVM, and Rule Induction) for identifying types of glass based on the refractive index and its elements. The data set used is a glass identification dataset taken from the UCI Machine Learning Repository. The evaluation uses criteria such as Accuracy and Kappa with 10-fold cross-validation. As a result, the K-Nearest Neighbors (KNN) algorithm has the best Accuracy and Kappa values, namely 72.90% for Accuracy and 0.632 for Kappa. To determine the significance of the accuracy value, the T-Test method is used.
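
For readers unfamiliar with the two criteria named above, the following sketch computes accuracy and Cohen's kappa from a confusion matrix. The matrix values are made up for illustration and are not taken from the paper; the function names are likewise hypothetical.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def cohen_kappa(confusion):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    total = sum(sum(row) for row in confusion)
    observed = accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1.0 - expected)

# Illustrative 2-class matrix: rows are true classes, columns are predictions.
example = [[20, 5],
           [10, 15]]
acc = accuracy(example)
kappa = cohen_kappa(example)
```

Kappa corrects accuracy for chance agreement, which is why the same classifier can show a noticeably lower kappa than accuracy.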

  • Research Article
  • Citations: 11
  • 10.5391/ijfis.2008.8.1.001
A Comparison Study of Classification Algorithms in Data Mining
  • Mar 1, 2008
  • International Journal of Fuzzy Logic and Intelligent Systems
  • Seung-Joo Lee + 1 more

Generally, the analytical tools of data mining fall into two learning types: supervised and unsupervised learning algorithms. Classification and prediction are the main analysis tools for supervised learning. In this paper, we perform a comparison study of classification algorithms in data mining. We compare popular classification algorithms: LDA, QDA, kernel method, K-nearest neighbor, naive Bayesian, SVM, and CART. We use almost all classification data sets of the UCI machine learning repository for our experiments. According to our results, we are able to select proper algorithms for given classification data sets.

  • Conference Article
  • Citations: 6
  • 10.5753/sbsi.2013.5736
Evaluating the Influence of Missing Data on Classification Algorithms in Data Mining Applications
  • May 22, 2013
  • Luciano C Blomberg + 1 more

This paper presents an analysis of the influence of missing data on datasets submitted to traditional classification algorithms in data mining applications. For this purpose, we use ten UCI datasets and manipulate them to hold controlled levels of missing data. Our empirical analysis shows that classification performance decreases after significant insertion of missing values in all datasets tested. Among the analyzed algorithms, Naïve Bayes is the least influenced by missing data, followed by SMO. IBK is the most influenced, presenting the lowest accuracy, predominantly in datasets whose independent variables are continuous.

  • Research Article
  • 10.47191/etj/v9i07.21
Opinion-Mining Technique on Generative Artificial Intelligence Topic Using Data Classification Algorithms
  • Jul 30, 2024
  • Engineering and Technology Journal
  • Michael Albino

The study employed an opinion-mining technique using data classification algorithms on the topic of Generative Artificial Intelligence (GenAI) to determine the sentiments of Twitter users. The researcher used a sentiment analysis framework to gather the datasets for dataset training and predict the results using Naïve Bayes, Random Forest, and SVM algorithms. The result shows that SVM and Random Forest algorithms had the same precision and recall of 1.000 indicating that the result has no false positive values. On the other hand, the Naïve Bayes algorithm garnered a .949 precision and .939 recall which means fewer false positive results on the trained models. The overall result shows that the trained datasets indicate a successful prediction with fewer false positive results. Moreover, the result of the sentiment analysis shows that more positive sentiments were drawn on the topic of generative artificial intelligence indicating the use and benefits of using AI. Furthermore, based on the result of the study, the research recommended the use of the sentiment analysis framework through an opinion-mining technique using data classification algorithms as it may help analyze different emotions of social media users.
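
As a reminder of how the two reported metrics are defined, the sketch below computes precision and recall for one positive class. The labels are illustrative only and the function name is hypothetical. A precision of 1.0 means there are no false positives, and a recall of 1.0 means there are no false negatives.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative labels: one positive sample is missed, none is falsely flagged.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

In this toy case precision is perfect while recall is not, which shows that the two metrics capture different kinds of error.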

  • Conference Article
  • Citations: 23
  • 10.1109/bdcloud-socialcom-sustaincom.2016.34
A Hybrid Outlier Detection Method for Health Care Big Data
  • Oct 1, 2016
  • Ke Yan + 4 more

Technology advancements in health care informatics, digitalizing health records, and telemedicine have resulted in rapid growth of health care data. One challenge is how to effectively discover useful and important information in such a massive amount of data through techniques such as data mining. Outlier detection is a typical technique used in many fields to analyze big data. However, for large-scale and high-dimensional health care data, the conventional outlier detection methods are not efficient. This paper proposes a novel hybrid outlier detection method, namely, Pruning-based K-Nearest Neighbor (PB-KNN), which integrates density-based and cluster-based methods with the KNN algorithm to conduct effective outlier detection. The proposed PB-KNN adopts the case classification quality character (CCQC) as the medical quality evaluation model and uses the attribute overlapping rate (AOR) algorithm for data classification and dimensionality reduction. To evaluate the performance of the pruning operations in PB-KNN, we conduct extensive experiments. The experimental results show that the PB-KNN method outperforms the k-nearest neighbor (KNN) and local outlier factor (LOF) methods in terms of accuracy and efficiency.
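
The KNN baseline mentioned above can be illustrated with a plain k-th-nearest-neighbor distance score. This is a minimal sketch of that baseline idea only, under the assumption of Euclidean distance and brute-force search; it does not implement the pruning, CCQC, or AOR components of PB-KNN, and the function name is hypothetical.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Larger scores suggest more isolated points. Brute force, O(n^2).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Four clustered points and one isolated point (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points, k=2)
most_isolated = scores.index(max(scores))
```

The quadratic cost of this brute-force search is one reason pruning strategies are attractive for large data sets.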

  • Conference Article
  • Citations: 4
  • 10.1109/icetci55101.2022.9832370
E-commerce Big Data Classification and Mining Algorithm based on Artificial Intelligence
  • May 27, 2022
  • Yulin Luo + 2 more

After outlining the current e-commerce big data application mode, this paper analyzes e-commerce big data classification and mining algorithms from the perspective of artificial intelligence. The research points out that, in addition to low recursive efficiency, current e-commerce data classification and mining still suffer from frequent cycles and high redundancy; these problems hinder the intelligent development of e-commerce big data. In response, the paper proposes establishing a fast Spark architecture and setting up a vertical sequence of data control, based on the framework and guided by the data jurisdiction dimension, so as to deeply mine e-commerce big data and generate a user behavior tree. The user behavior tree and its data set are then classified by sequential mapping, and a simulation test is carried out. The approach can realize high-precision and efficient mining and classification of e-commerce big data and improve the intelligence of e-commerce big data applications.

  • Conference Article
  • 10.1109/dasc/picom/cbdcom/cy55231.2022.9927777
A Fast Distributed Accelerated Gradient Algorithm for Big Data Classification
  • Sep 12, 2022
  • Changsheng Wu + 1 more

With the development of mobile Internet technology, various applications generate a huge amount of data in Cyber-Physical-Social systems. The exponential growth of data brings great difficulties to big data classification, especially in time efficiency. The Alternating Direction Method of Multipliers (ADMM) is widely used for distributed machine learning tasks. However, it usually suffers from a slow convergence speed, and thus communication is still a significant bottleneck of distributed algorithms. To this end, in this paper, we pay attention to subproblem optimization in distributed algorithms, and propose a novel Distributed Accelerated Stochastic Variance Reduced Gradient algorithm (DAcSVRG+) for big data classification. Specifically, we study the alternating direction method of multipliers for the distributed learning framework, and transform the global classification problem into several small subproblems which can be solved in parallel. For the subproblem optimization, we adopt a variance reduction algorithm with a Nesterov acceleration strategy, the accelerated stochastic variance reduced gradient algorithm, and thus further improve the time efficiency. The experimental results on four public benchmark datasets show that our proposed distributed algorithm converges faster and achieves competitive accuracy compared with other distributed classification methods with variance reduction.
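
To make the variance-reduction idea concrete, here is a minimal single-machine sketch of plain SVRG applied to logistic regression. It is not DAcSVRG+: there is no ADMM splitting, no distribution, and no Nesterov acceleration. The data, step size, epoch count, and function names are all assumptions chosen for illustration.

```python
import math
import random

def sample_grad(w, x, y):
    """Gradient of the logistic loss for one sample, labels in {0, 1}."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xj for xj in x]

def full_grad(w, X, Y):
    """Average gradient over the whole data set."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        g = [a + b / len(X) for a, b in zip(g, sample_grad(w, x, y))]
    return g

def logistic_loss(w, X, Y):
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(wj * xj for wj, xj in zip(w, x))
        total += math.log1p(math.exp(z)) - y * z
    return total / len(X)

def svrg(X, Y, epochs=20, lr=0.1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    snapshot = [0.0] * d
    for _ in range(epochs):
        mu = full_grad(snapshot, X, Y)  # full gradient at the snapshot
        w = list(snapshot)
        for _ in range(n):
            i = rng.randrange(n)
            g_w = sample_grad(w, X[i], Y[i])
            g_s = sample_grad(snapshot, X[i], Y[i])
            # Variance-reduced step: stochastic gradient corrected by the snapshot.
            w = [wj - lr * (a - b + m) for wj, a, b, m in zip(w, g_w, g_s, mu)]
        snapshot = w
    return snapshot

# Toy data: first feature is a constant bias term.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
Y = [0, 0, 0, 1, 1, 1]
w = svrg(X, Y)
```

The correction term `a - b + m` keeps each update unbiased while shrinking its variance as `w` approaches the snapshot, which is what allows a constant step size.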

  • Research Article
  • Citations: 3
  • 10.1016/j.ijmedinf.2020.104242
The detection of hospitalized patients at risk of testing positive to multi-drug resistant bacteria using MOCA-I, a rule-based “white-box” classification algorithm for medical data
  • Jul 29, 2020
  • International Journal of Medical Informatics
  • Julie Jacques + 8 more

  • Research Article
  • Citations: 13
  • 10.1016/j.knosys.2019.104930
PwAdaBoost: Possible world based AdaBoost algorithm for classifying uncertain data
  • Aug 12, 2019
  • Knowledge-Based Systems
  • Han Liu + 2 more

  • Research Article
  • 10.20894/ijdmta.102.008.001.002
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in Data Mining
  • Jun 5, 2019
  • International Journal of Data Mining Techniques and Applications
  • J.Vijayalakshmi Ms

Data mining is an effective process for analysing chosen data from various points of view and summarising those views into useful information. Data mining includes several types of algorithms, such as classification, regression, segmentation, association, and sequence analysis algorithms. A classification algorithm can be used to split the given data set and predict one or more discrete variables based on the other attributes in the data set. The ID3 (Iterative Dichotomiser 3) algorithm begins with the original data set S as the root node. On each iteration, it calculates the entropy H(S) (or information gain IG(A)) of every unused attribute of S and selects the attribute with the smallest entropy (or largest information gain). The prime objective of this paper is to analyze data on kidney disorders due to diabetes using a classification technique to predict the class accurately.
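
The attribute-selection step of ID3 described above can be sketched as follows. The attribute names and the tiny data set are hypothetical and chosen only so that one attribute perfectly separates the classes; this covers the selection step, not the full recursive tree construction.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """IG(A) = H(S) minus the weighted entropy of the subsets split on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """Pick the unused attribute with the largest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical records: attribute "a" determines the class, "b" does not.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = [1, 1, 0, 0]
chosen = best_attribute(rows, labels, ["a", "b"])
```

A full ID3 implementation would apply this selection recursively to each resulting subset, removing the chosen attribute at each level.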

  • Conference Article
  • 10.1109/ickecs56523.2022.10060274
Data Classification Algorithm Based on Association Rules from the Perspective of Data Mining
  • Dec 28, 2022
  • Hongxing Liu + 3 more

Association rules (AR) are a common data classification method. They can create more value by studying how to better mine user information and establish connections between a large number (LR) of reusable objects. To better study data classification algorithms, this paper takes the perspective of data mining. It mainly discusses some common data mining technologies and designs a method for handling related events based on learning rules to solve problems in practical applications. The experimental data show that as the number of concurrent users increases, the running time of the different algorithms also increases, but the time spent on data mining remains under 2 minutes. This suggests that the data classification algorithm can play a useful role in data mining.
