Abstract

This paper presents HDSAnalytics, a data analytics framework for heterogeneous data sources. The framework uses data from a variety of sources that differ in format and volume and that may hold structured, semi-structured, or unstructured data. Integrating data from these sources into a single unified data source may lose information because of the semantic, syntactic, and schematic differences among them. Semantic heterogeneity arises when similar data is present in different forms in different data sources. Schematic heterogeneity arises from differences in the formats or schemas in which the data is stored, and syntactic heterogeneity from differences in the way the data is accessed and retrieved. Hence, accessing, retrieving, and utilizing information from different data sources poses challenges such as (1) mapping similar attributes among different data sources, (2) retrieving the specific attributes from different data sources that are relevant to a user's query, and (3) retrieving data from different data sources in the different formats requested by different components of the system. The proposed HDSAnalytics framework design aids analytic models in using heterogeneous data sources as-is, without integrating them into a single data source, thereby overcoming all of the above-mentioned challenges. Our prototype of the framework, evaluated using data from the Bangalore Metropolitan Transport Corporation (BMTC), India, demonstrates how bus fleet operations can be analyzed, diagnosed, and explored to improve bus fleet schedules and reduce operating costs, and it provides detailed insight into those operations. The prototype scales and works efficiently with an increasing number of heterogeneous data sources.
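
To make the first two challenges concrete, the following is a minimal illustrative sketch of querying sources as-is through an attribute mapping, rather than merging them first. It is not the HDSAnalytics implementation, which the abstract does not describe. The source names, field names, the ATTRIBUTE_MAP table, and the helper functions are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical mapping: canonical attribute -> local field name in each source.
# The sources keep their own schemas; only this table knows they correspond.
ATTRIBUTE_MAP = {
    "route_id": {"schedule_csv": "RouteNo", "gps_json": "route"},
    "bus_id": {"schedule_csv": "VehicleNo", "gps_json": "vehicle"},
}


def read_csv_source(text):
    """Parse a CSV source into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_source(text):
    """Parse a JSON source (an array of objects) into a list of dict records."""
    return json.loads(text)


def fetch(source_name, records, attributes):
    """Project each record onto the requested canonical attributes."""
    rows = []
    for record in records:
        row = {}
        for attribute in attributes:
            local_name = ATTRIBUTE_MAP[attribute].get(source_name)
            # Skip attributes this source does not provide.
            if local_name is not None and local_name in record:
                row[attribute] = record[local_name]
        rows.append(row)
    return rows


def query(sources, attributes):
    """Collect the requested attributes from every source, without merging the sources."""
    results = []
    for name, (reader, raw_text) in sources.items():
        results.extend(fetch(name, reader(raw_text), attributes))
    return results
```

Under these assumptions, a call such as query(sources, ["route_id", "bus_id"]) returns only the attributes relevant to the request, and each source is read in its native format at query time.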

