Machine Learning and data mining tools applied for databases of low number of records

Hubert Anysz

doi:10.23947/2687-1653-2021-21-4-346-363

Abstract

The use of data mining and machine learning tools is becoming increasingly common. Their usefulness is most noticeable with large datasets, where the information sought, or new relationships, are extracted from information noise. As these tools develop, datasets with far fewer records, usually associated with specific phenomena, are also being explored. This specificity usually makes it impossible to increase the number of cases, although more cases would make it easier to find dependences in the phenomena under study. The paper discusses the particular features of applying selected tools to a small set of data. It presents methods of data preparation and methods for calculating the performance of the tools, taking into account the specifics of databases with a small number of records. The author proposes selected techniques that helped to break deadlocks in calculations, i.e., situations in which the results were much worse than expected. The need to apply methods that improve the accuracy of forecasts and of classification was caused by the small amount of analysed data. This paper is not a review of popular machine learning and data mining methods; nevertheless, the collected and presented material will help the reader to shorten the path to satisfactory results when using the described computational methods.

Highlights

Application of machine learning and data mining tools to databases with a small number of records
Hubert Anysz, Warsaw University of Technology (Warsaw, Poland), h.anysz@il.pw.edu.pl

Summary

System and process complexity

Data selection means choosing only a few independent variables on which classification or prediction of the output value by artificial intelligence (also known as machine learning) will be based. In [12] the number of dependent variables was reduced, and in [9] a new variable, the sum of two strongly positively correlated independent variables, was adopted for analysis (this was also technically justified). The most common form of data transformation is standardization, that is, transforming the values of the independent variables and the dependent variable so that they all fall within the same range. When the cases described in the database are assigned to several classes (for example, eight), classification may be 100% correct for five classes while the 10% of errors all fall in the other three classes. In medical applications, classification errors are commonly split into just two classes, where not administering medication to a healthy person is as important as not refusing treatment to a person who is actually ill (by mistaking them for healthy) (Fig. 7).
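The two data-preparation steps mentioned in the summary (merging two strongly positively correlated inputs into their sum, then rescaling every variable to a common range) can be sketched as follows. This is only an illustration under stated assumptions: the toy data, the 0.9 correlation threshold, the target range [0, 1], and the helper names `pearson` and `min_max_scale` are hypothetical and are not taken from the paper.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values onto [low, high]; a constant column maps to low."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]


# Hypothetical toy data: two strongly correlated inputs and one output.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8]
y = [10.0, 20.0, 30.0, 40.0, 50.0]

# Assumed rule: merge the two inputs when their correlation exceeds 0.9.
r = pearson(x1, x2)
if r > 0.9:
    inputs = [[a + b for a, b in zip(x1, x2)]]  # one summed variable
else:
    inputs = [x1, x2]

# Bring every input column and the output into the same range.
scaled_inputs = [min_max_scale(column) for column in inputs]
scaled_y = min_max_scale(y)
```

With few records, dropping one input this way reduces the number of parameters a model has to fit, which is the motivation the summary gives for data selection. Whether summing two variables is meaningful still has to be justified technically for the phenomenon at hand.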

Class assigned by the classifier
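The summary's point that an overall error rate can hide errors concentrated in a few classes can be checked by computing accuracy per class. The sketch below uses made-up labels (eight classes, ten cases each, all eight errors placed in three classes); the data and the helper name `per_class_accuracy` are assumptions for illustration only.

```python
from collections import Counter


def per_class_accuracy(actual, predicted):
    """Share of correctly classified cases, computed separately per actual class."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {cls: hits[cls] / totals[cls] for cls in sorted(totals)}


# Hypothetical data: 8 classes (0..7) with 10 cases each.
actual = [cls for cls in range(8) for _ in range(10)]
predicted = list(actual)

# Misclassify 3 cases of class 5, 3 of class 6 and 2 of class 7 as class 0.
for cls, n_errors in ((5, 3), (6, 3), (7, 2)):
    start = cls * 10
    for i in range(start, start + n_errors):
        predicted[i] = 0

overall = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
by_class = per_class_accuracy(actual, predicted)

print(f"overall accuracy: {overall:.2f}")
for cls, acc in by_class.items():
    print(f"class {cls}: {acc:.2f}")
```

Here the overall accuracy is 90%, yet classes 0 to 4 are classified perfectly while classes 5 to 7 reach only 70 to 80%. With a small database each class holds few cases, so reporting the per-class figures alongside the overall one avoids an overly optimistic reading.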

Dichotomous subsets at each stage


Journal: Advanced Engineering Research
Publication Date: Jan 10, 2022
Citations: 1
License type: CC BY


