Abstract
The conventional extracting–transforming–loading (ETL) system is typically operated on a single machine that cannot handle huge volumes of geospatial big data. To deal with the considerable amount of big data in the ETL process, we propose D_ELT (delayed extracting–loading–transforming), which uses MapReduce-based parallelization. Among various kinds of big data, we concentrate on geospatial big data generated via sensors using Internet of Things (IoT) technology. In the IoT environment, update latency for sensor big data is typically short and old data are not worth further analysis, so the speed of data preparation is even more significant. We conducted several experiments measuring the overall performance of D_ELT and compared it with both traditional ETL and extracting–loading–transforming (ELT) systems, using different sizes of data and complexity levels for analysis. The experimental results show that D_ELT outperforms the other two approaches, ETL and ELT. In addition, the larger the amount of data or the higher the complexity of the analysis, the greater the parallelization effect of transform in D_ELT, leading to better performance than the traditional ETL and ELT approaches.
Highlights
Numerous types of sensors have been connected to the Internet of Things (IoT) and have produced huge volumes of data with high velocity
Extracting–loading–transforming (ELT) can speed up the transform step by using the parallelization and distribution supported by the Hadoop-based big data platform
In the ETL process, transform was conducted on a single machine without MapReduce, whereas in ELT, transform was performed in a parallel, distributed way based on MapReduce
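The contrast in the last highlight can be illustrated with a minimal sketch. It is not the paper's implementation: the records, the `grid_cell` helper, and the function names are hypothetical, and a thread pool merely stands in for the workers of a Hadoop cluster. The same aggregation (mean reading per grid cell) is computed once in a single sequential pass and once as a partitioned map phase followed by a reduce phase.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sensor records: (sensor_id, longitude, latitude, reading).
RECORDS = [
    ("s1", 127.01, 37.51, 20.0),
    ("s2", 127.75, 37.62, 22.0),
    ("s1", 127.02, 37.52, 24.0),
    ("s3", 128.10, 36.90, 18.0),
    ("s2", 127.76, 37.63, 26.0),
]


def grid_cell(lon, lat, size=0.5):
    """Assign a point to a square grid cell (an illustrative spatial key)."""
    return (int(lon // size), int(lat // size))


def transform_sequential(records):
    """Single-machine transform: one pass over all records."""
    sums = defaultdict(lambda: [0.0, 0])
    for _, lon, lat, value in records:
        acc = sums[grid_cell(lon, lat)]
        acc[0] += value
        acc[1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


def map_partition(partition):
    """Map phase: partial (sum, count) per cell for one partition."""
    partial = defaultdict(lambda: [0.0, 0])
    for _, lon, lat, value in partition:
        acc = partial[grid_cell(lon, lat)]
        acc[0] += value
        acc[1] += 1
    return {cell: tuple(acc) for cell, acc in partial.items()}


def reduce_partials(partials):
    """Reduce phase: merge partial aggregates and compute the means."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for cell, (total, n) in partial.items():
            merged[cell][0] += total
            merged[cell][1] += n
    return {cell: total / n for cell, (total, n) in merged.items()}


def transform_parallel(records, workers=2):
    """MapReduce-style transform; threads stand in for cluster workers."""
    partitions = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_partials(partials)


sequential_result = transform_sequential(RECORDS)
parallel_result = transform_parallel(RECORDS)
```

Both functions return the same mapping from grid cell to mean reading. The map phase emits sums and counts rather than means so that partial results from different partitions can be merged correctly in the reduce phase.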
Summary
Numerous types of sensors have been connected to the Internet of Things (IoT) and have produced huge volumes of data with high velocity. Once the sequence of operators has been incorporated, it is automatically transformed to map and reduce jobs in our Hadoop-based geospatial big data system. During this conversion process, our system controls the number of MapReduce steps in such a way as to achieve better performance by decreasing the overhead of mapping and reducing. We propose a method to start geospatial big data analysis in a short time by reducing the time required for data transformation under the Hadoop platform. With the increasing number of IoT sensing devices, the amount of sensor data is expected to grow significantly over time for a wide range of fields and applications. IoT-based sensor data are, however, essentially loosely structured and typically incomplete, much of it being directly unusable. In addition, in the IoT environment, the update period (the time between the arrival of raw data and when meaningful data are made available) recurs more frequently than for typical batch data. These difficulties require that considerable resources are used for transformation in the ETL process.
We focus in particular on explaining how a user-specified sequence of operators for spatial analysis is automatically converted to MapReduce steps. Section 3 describes up-to-date ETL research, followed by our approach to improving the performance of transformation in ETL processes based on MapReduce. Our experimental settings and results are described in Sections 4 and 5, respectively. Section 6 concludes our work and presents our plans for future research.
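The idea of converting an operator sequence into a small number of MapReduce steps can be sketched as follows. This summary does not give the system's actual conversion rules, so the sketch assumes one plausible strategy: consecutive record-at-a-time operators are fused into a single map stage, and each grouping operator closes a step. The `Operator` and `MRStep` classes, the `plan_steps` function, and the operator names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operator:
    name: str
    kind: str  # "map" (record-at-a-time) or "reduce" (needs grouping by key)


@dataclass
class MRStep:
    map_ops: List[str] = field(default_factory=list)
    reduce_op: Optional[str] = None  # None marks a map-only step


def plan_steps(operators):
    """Greedily fuse a sequence of operators into MapReduce steps.

    Assumed rule (not taken from the paper): consecutive "map" operators
    share one map stage, and each "reduce" operator ends the current step.
    """
    steps, current = [], MRStep()
    for op in operators:
        if op.kind == "map":
            current.map_ops.append(op.name)
        elif op.kind == "reduce":
            current.reduce_op = op.name
            steps.append(current)
            current = MRStep()
        else:
            raise ValueError(f"unknown operator kind: {op.kind}")
    if current.map_ops:  # trailing map-only step
        steps.append(current)
    return steps


# Hypothetical user-specified sequence of spatial-analysis operators.
PIPELINE = [
    Operator("parse_sensor_record", "map"),
    Operator("filter_bounding_box", "map"),
    Operator("assign_grid_cell", "map"),
    Operator("mean_per_cell", "reduce"),
    Operator("flag_hot_cells", "map"),
]

steps = plan_steps(PIPELINE)
for number, step in enumerate(steps, start=1):
    print(number, step.map_ops, step.reduce_op)
```

Under this assumed rule, the five operators are planned as two steps instead of one job per operator, which is the kind of reduction in mapping and reducing overhead the summary describes.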