Abstract

Of the four characteristics of big data complexity, namely volume, velocity, variety, and veracity (the 4Vs), this paper focuses on volume, with the aim of improving the performance of data extract, transform, and load (ETL) processes during data migration from one server to another, a migration made necessary by updates to the population data of Tegal City. Most programmers in the Department of Population and Civil Registration of Tegal City carry out the transfer by sending all available data (in a specific file format) to the database server regardless of file size. This approach is prone to errors that can disrupt the transfer, such as timeouts, oversized data packets, and lengthy execution times caused by the large data size. The research compares several approaches to extracting, transforming, and loading large data into a new database server, using the command line and native PHP (in object-oriented and procedural style) with three file formats: SQL, XML, and CSV. Our performance analysis showed that large-scale data transfer using the LOAD DATA INFILE statement with a comma-separated values (CSV) source file was the fastest and most effective method, and it is therefore recommended.
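As a rough illustration of the recommended method, the sketch below assembles a MySQL LOAD DATA INFILE statement for a CSV file. It only builds the SQL text and does not connect to a server. The function name, file path, table name, and column names are hypothetical and do not come from the paper.

```python
def build_load_data_statement(csv_path, table, columns, skip_header=True):
    """Return a MySQL LOAD DATA INFILE statement for a comma-separated file."""
    # Identifiers are wrapped in backticks; the path is a quoted string literal.
    column_list = ", ".join(f"`{name}`" for name in columns)
    escaped_path = csv_path.replace("\\", "\\\\").replace("'", "\\'")
    lines = [
        f"LOAD DATA INFILE '{escaped_path}'",
        f"INTO TABLE `{table}`",
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'",
        "LINES TERMINATED BY '\\n'",
    ]
    if skip_header:
        # Skip the first line when the CSV file starts with column names.
        lines.append("IGNORE 1 LINES")
    lines.append(f"({column_list});")
    return "\n".join(lines)


# Hypothetical path, table, and columns, used only for illustration.
statement = build_load_data_statement(
    "/tmp/population.csv", "population", ["id", "name", "birth_date"]
)
print(statement)
```

The resulting statement could then be run from the MySQL command-line client or from a PHP database connection. Which of the two the authors used for each measurement is not stated in this excerpt.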

Highlights

  • An information system can help an organization improve in several respects, including the efficiency and effectiveness of its business processes, decision making, productivity, and competitive advantage [1]

  • Several methods can be used for database migration: importing the data into the database server with its built-in import feature, using a third-party application such as phpMyAdmin [4] or Navicat [5], or using a standalone application developed by the programmers themselves that follows Extract, Transform, Load (ETL) procedures

  • The data source used in the research is the population data of Tegal City, obtained from the Department of Population and Civil Registration of Tegal City


Summary

Introduction

An information system can help an organization improve in several respects, including the efficiency and effectiveness of its business processes, decision making, productivity, and competitive advantage [1]. Data are processed daily and stored on the server, and their volume increases every year [2]. Because the volume of population data grows every year, the server must be upgraded and the database migrated to the new server. Several methods can be used for database migration: importing the data into the database server with its default (built-in) import feature, using a third-party application such as phpMyAdmin [4] or Navicat [5], or using a standalone application developed by the programmers themselves that follows Extract, Transform, Load (ETL) procedures. These procedures collect data from different sources as needed, modify it to fit the requirements, and upload it to a specific database server to be processed or displayed [6]. Tools frequently used for the ETL process include spreadsheets, relational databases, and NoSQL databases, among others [7].
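The three ETL steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses Python's built-in sqlite3 module as a stand-in for the target database server, and the table layout, column names, and sample rows are invented for the example. Rows are loaded in batches so that no single statement grows too large.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: read rows from a CSV source as dictionaries."""
    return csv.DictReader(io.StringIO(csv_text))


def transform(row):
    """Transform: trim whitespace and normalise the name to title case."""
    return (int(row["id"]), row["name"].strip().title(), row["birth_date"].strip())


def load(connection, rows, batch_size=1000):
    """Load: insert rows in batches and return the number of rows inserted."""
    insert_sql = "INSERT INTO population VALUES (?, ?, ?)"
    batch, total = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            connection.executemany(insert_sql, batch)
            total += len(batch)
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        connection.executemany(insert_sql, batch)
        total += len(batch)
    connection.commit()
    return total


# Invented sample data; an in-memory SQLite database stands in for the server.
source = "id,name,birth_date\n1, siti aminah ,1990-04-12\n2,budi santoso,1985-11-03\n"
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE population (id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)"
)
loaded = load(connection, (transform(r) for r in extract(source)), batch_size=1)
print(loaded)
```

Because the rows are passed through a generator, the whole file never has to be held in memory at once, which matters when the source file is large.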

