Abstract
In recent years, Big Data requirements have evolved. More than ever, organizations are focusing their efforts on the industrial exploitation of all the data at their disposal and moving further away from the underpinning technologies. After investing in the Data Lake concept, organizations must now overhaul their data architecture to face the expansion of IoT (Internet of Things) and AI (Artificial Intelligence). Efficient and effective data mapping can help in understanding the importance of the data being transformed and used to support decision-making. As current relational databases are not able to manage large amounts of data, organizations have turned to NoSQL (Not only Structured Query Language) databases. One well-known NoSQL database is MongoDB, which offers high scalability. This article puts forward a new data model that extracts, classifies, and then maps data in order to generate new, more structured data that meets organizational needs. This is done by calculating weights for various metadata attributes, which are treated as important information. The article also clusters the data stored in MongoDB, using the data mining clustering algorithm K-Means.
Highlights
Around the world, organizations are looking for a complete data analytics solution to cut costs, accelerate development cycles, and provide valuable information to solve some of their biggest organizational problems.
The K-Means algorithm is applied to the entire data set collected from the Data Lake and stored in MongoDB, for data classification and clustering.
K-Means clustering was executed twice: once with the standard model and once with the model developed in this study.
Summary
Organizations are looking for a complete data analytics solution to cut costs, accelerate development cycles, and provide valuable information to solve some of their biggest organizational problems. They view their data assets as an engine driving economic activity and giving them a competitive edge. It becomes difficult, however, to place confidence in the accuracy and veracity of this data and to use it carefully [3] [4]. To solve this problem, organizations have implemented systems with a clustering strategy. This paper concentrates on various data sources centralized in a Data Lake and analyzes them based on a common targeted schema [8]. These data are collected and mapped into a NoSQL database, MongoDB. The K-Means algorithm is then applied to the entire data set collected from the Data Lake and stored in MongoDB, for data classification and clustering.
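To make the clustering step more concrete, the following is a minimal sketch of K-Means with per-attribute weights, written in plain Python. It is an illustration under stated assumptions, not the model developed in the paper. The documents are in-memory dictionaries standing in for records read from a MongoDB collection, the field names `size_kb` and `age_days` and the weight values are hypothetical, and using the weights inside a weighted Euclidean distance is only one plausible way for metadata weights to influence the clustering.

```python
from math import sqrt


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by a weight."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def kmeans(points, k, weights, max_iter=100):
    """Lloyd-style K-Means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged, so centroids are stable
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical documents, shaped like records fetched from a MongoDB collection.
docs = [
    {"_id": 1, "size_kb": 12.0, "age_days": 3.0},
    {"_id": 2, "size_kb": 15.0, "age_days": 2.0},
    {"_id": 3, "size_kb": 900.0, "age_days": 40.0},
    {"_id": 4, "size_kb": 950.0, "age_days": 45.0},
]
FIELDS = ("size_kb", "age_days")  # hypothetical metadata attributes
WEIGHTS = (0.7, 0.3)              # hypothetical attribute weights

points = [[d[f] for f in FIELDS] for d in docs]
labels, centroids = kmeans(points, 2, WEIGHTS)
for d, lab in zip(docs, labels):
    print(d["_id"], "-> cluster", lab)
```

Setting every weight to 1 reduces this to standard K-Means with the same seeding, which gives a simple way to compare a standard run against a weighted one on the same data. In practice the points would be built from documents returned by a MongoDB query, and a more robust seeding strategy than "first k points" would usually be preferable.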