Abstract

In recent years, Big Data requirements have evolved. Organizations are focusing more than ever on the industrial-scale exploitation of all the data at their disposal and moving further away from the underpinning technologies. After investing in the Data Lake concept, organizations must now overhaul their data architecture to cope with the expansion of IoT (Internet of Things) and AI (Artificial Intelligence). Efficient and effective data mapping can help in understanding the importance of the data being transformed and used to support decision-making. As current relational databases are not well suited to managing large amounts of data, organizations have turned to NoSQL (Not only Structured Query Language) databases. One well-known NoSQL database is MongoDB, which is highly scalable. This article puts forward a new data model able to extract, classify, and then map data in order to generate new, more structured data that meet organizational needs. This is carried out by calculating the weights of various metadata attributes, which are considered important information. The article also clusters the data stored in MongoDB, using the data mining clustering algorithm K-Means.

Highlights

  • Around the world, organizations are looking for a complete data analytics solution to cut costs, accelerate development cycles, and provide valuable information to solve some of their biggest organizational problems

  • The K-Means algorithm is applied to all the data collected from the Data Lake and stored in MongoDB, for data classification and clustering

  • Two K-means clusterings were executed: one with the standard model and one based on the model developed in this study

Summary

INTRODUCTION

Organizations are looking for a complete data analytics solution to cut costs, accelerate development cycles, and provide valuable information to solve some of their biggest organizational problems. They view their data assets as an engine driving economic activity and competitive advantage. However, it becomes difficult to place confidence in the accuracy and veracity of this data, as well as to use it carefully [3] [4]. To solve this problem, organizations have implemented systems with a clustering strategy. This paper concentrates on various data sources centralized in a Data Lake and analyzes them based on a common target schema [8]. These data are collected and mapped into a NoSQL database, MongoDB. The K-Means algorithm is then applied to all the data collected from the Data Lake and stored in MongoDB, for data classification and clustering.
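
To make the clustering step concrete, the following is a minimal sketch of K-Means (Lloyd's algorithm) applied to records that carry metadata weights. It is an illustration under stated assumptions, not the model developed in the article: the record layout, the field name `weights`, the sample values, and the choice of the first k points as initial centroids are all hypothetical. The records are held in memory as plain dictionaries shaped like MongoDB documents; in a real deployment they would be read from a MongoDB collection instead.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty group of vectors."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def k_means(points: Sequence[Vector], k: int,
            max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's algorithm.

    Assumption: the first k points serve as initial centroids, which
    keeps the sketch deterministic; real uses often pick them randomly.
    """
    centroids = list(points[:k])
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid_of(members)
    return labels, centroids


# Hypothetical records shaped like MongoDB documents; "weights" stands in
# for the metadata attribute weights (the values are made up).
records = [
    {"_id": 1, "weights": (0.90, 0.80)},
    {"_id": 2, "weights": (0.85, 0.90)},
    {"_id": 3, "weights": (0.95, 0.85)},
    {"_id": 4, "weights": (0.10, 0.20)},
    {"_id": 5, "weights": (0.15, 0.10)},
    {"_id": 6, "weights": (0.05, 0.15)},
]

labels, centroids = k_means([r["weights"] for r in records], k=2)
for record, label in zip(records, labels):
    print(record["_id"], "-> cluster", label)
```

With these sample values, the three high-weight records end up in one cluster and the three low-weight records in the other. The cluster numbering itself is arbitrary and depends on the initial centroids.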

Objectives and Contribution
RELATED WORKS
Metadata Analysis
K-means Algorithm
SPECIFIC OBJECTIVES OF PROPOSAL SYSTEM
Data Flow Diagram
Servers Availability Process
IMPLEMENTATION AND EVALUATION
Running MongoDB
Running K-means Algorithm based on Metadata
CONCLUSION AND FUTURE WORK
