Data Mining Classification Algorithms Research Articles

Generative AI tools exemplified by ChatGPT are becoming a new reality. This study is motivated by the premise that “AI generated content may exhibit a distinctive behavior that can be separated from scientific articles”. In this study, we show how articles can be generated by means of prompt engineering for various diseases and conditions. We then show how we tested this premise in two phases and prove its validity. Subsequently, we introduce xFakeSci, a novel learning algorithm that is capable of distinguishing ChatGPT-generated articles from publications produced by scientists. The algorithm is trained using network models derived from both sources. To mitigate overfitting, we incorporated a calibration step built upon data-driven heuristics, including proximity and ratios. Specifically, from a total of 3,952 fake articles for three different medical conditions, the algorithm was trained using only 100 articles, but calibrated using folds of 100 articles. The classification step was performed using 300 articles per condition. The actual labeling step took place against an equal mix of 50 generated articles and 50 authentic PubMed abstracts. The testing also spanned publication periods from 2010 to 2024 and encompassed research on three distinct diseases: cancer, depression, and Alzheimer’s. Further, we evaluated the accuracy of the xFakeSci algorithm against some of the classical data mining algorithms (e.g., Support Vector Machines, Regression, and Naive Bayes). The xFakeSci algorithm achieved F1 scores ranging from 80 to 94%, outperforming common data mining algorithms, which scored F1 values between 38 and 52%. We attribute the noticeable difference to the introduction of calibration and a proximity distance heuristic, which underpin this promising performance. Indeed, the prediction of fake science generated by ChatGPT presents a considerable challenge. Nonetheless, the introduction of the xFakeSci algorithm is a significant step on the way to combating fake science.
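For readers unfamiliar with the F1 metric quoted in the abstract, the following is a minimal sketch of how an F1 score is computed from confusion counts on a balanced test set of 50 generated and 50 authentic abstracts. The function name `f1_score` and the counts used in the example are hypothetical and chosen only for illustration; they are not results from the xFakeSci study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp: fake articles correctly flagged as fake
    fp: authentic articles wrongly flagged as fake
    fn: fake articles missed by the classifier
    """
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a 50/50 test mix (illustrative only, not study data):
# 40 of the 50 generated articles are caught, 10 are missed,
# and 10 of the 50 authentic abstracts are wrongly flagged.
example_f1 = f1_score(tp=40, fp=10, fn=10)
print(f"F1 = {example_f1:.2f}")
```

With these illustrative counts, precision and recall are both 0.8, so the F1 score is also 0.8, which is the lower end of the range the abstract reports for xFakeSci.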


Student outcomes are of great importance in higher education institutions. Accreditation bodies focus on them as an indicator of the performance and effectiveness of the institution. Forecasting students’ academic performance is crucial for every educational establishment seeking to enhance the performance and perseverance of its students and to reduce the failure rate in the future. The main goal of this study is to predict the performance of undergraduate first-level students in the Computer Department during the years 2016 to 2021, in order to enhance their future performance, by identifying the best algorithm for analyzing the educational data and determining students’ academic performance. The secondary data was collected from the Student Affairs Department at the Faculty of Specific Education at Damietta University and from the Statistics Department at the university. The dataset contained 830 instances after excluding 139 instances with missing values, irrelevant rows, and outliers. The dataset was divided into a training set (577 instances, 70%) and a test set (253 instances, 30%) and involved six features: year, midterm, practical exam, written exam, final total degree, and grade. This paper uses five machine learning (ML) algorithms, selected on the basis of the literature review and their high accuracy in educational data mining: Random Forest, Decision Tree, Naive Bayes, Neural Network, and K-Nearest Neighbours. For the purpose of comparison, they were evaluated with metrics such as the confusion matrix, accuracy, precision, recall, and F-measure. The Random Forest and Decision Tree classifiers emerged as the top-performing algorithms, accurately categorizing 250 instances when predicting students' performance in the statistics course. Out of a total of 253 instances in the testing set, they made only three incorrect classifications.
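The counts reported in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above (577 training instances, 253 test instances, 250 correct test predictions); the helper name `accuracy` is an illustrative choice, not something taken from the study.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of instances classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Figures as reported in the abstract.
train_size = 577
test_size = 253
dataset_size = train_size + test_size  # should equal the stated 830 instances

# Actual share of the data held out for testing (the abstract rounds to 30%).
test_fraction = test_size / dataset_size

# Random Forest / Decision Tree: 250 correct out of 253 test instances.
top_accuracy = accuracy(correct=250, total=test_size)

print(f"dataset size: {dataset_size}")
print(f"test fraction: {test_fraction:.3f}")
print(f"test accuracy: {top_accuracy:.3f}")
```

The split sums to the stated 830 instances, and 250 correct predictions out of 253 corresponds to a test accuracy of roughly 98.8%.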


Articles published on Data Mining Classification Algorithms

Impact Analysis of Filter and Wrapper-Based Feature Selection Techniques for Webpages Phishing Attacks Identification

Exploring the Effectiveness of Data Mining Classification Algorithms in Credit Card Fraud Detection

Detection of ChatGPT fake science with the xFakeSci learning algorithm

Technical–tactical differences between female and male elite football: A data mining approach through neural network analysis, binary logistic regression, and decision tree techniques

Prediction of student exam performance using data mining classification algorithms

Predicting Diabetes Disease Using Data Mining Classification Algorithms and Comparison of Algorithm Performances

Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method

An overview of big data mining and data privacy protection technologies

Educational data mining: Methods and applications

Data Cleaning in Medical Procurement Database: Performance Comparison of Data Mining Classification Algorithms for Tackling Missing Value

Application of the Decision Tree Method for Weather Forecasting in Bekasi City

Predicting the Success of Banking Service Marketing Using the Logistic Regression Algorithm

Modeling and Predicting the Changes in Hearing Loss of Workers with the Use of a Neural Network Data Mining Algorithm: A Field Study

RFID-based logistics big data asset evaluation and data mining research

Big Data Stream Mining Using Integrated Framework with Classification and Clustering Methods

Performance Comparison of Data Mining Classification Algorithms on Student Academic Achievement Prediction

Student performance prediction using datamining classification algorithms: Evaluating generalizability of models from geographical aspect

Comparative Performance Evaluation Results of Classification Algorithm in Data Mining to Identify Types of Glass Based on Refractive Index and It’s Elements

Multiple disease prediction using Machine learning algorithms

Identification of Significant Features and Data Mining Techniques in Predicting Heart Stroke
