Abstract
In today’s world, data is collected at an unprecedented scale from a wide range of application areas. Before the arrival of Big Data, all generated data was handled manually; with data now produced in the range of terabytes, that is no longer possible. To make matters worse, almost 80% of the data generated by organizations is unstructured, meaning it cannot be understood in the format in which it is available. Making decisions based solely on such raw data is difficult and risky. To make decisions that are both quick and correct, the generated data has to be optimized. This paper presents an end-to-end system that converts approximately 6 million records of unstructured data, provided as .txt files of strings and numbers, into understandable, structured data. The structured data is then analysed to perform calculations on the dataset. Finally, the analysed data is represented as dashboards, which consist of tabular reports or charts. In this paper, the unstructured data in the .txt files is transformed into structured tables using SQL stored procedures in SQL Server Management Studio (SSMS). Alongside this data table, four further tables, called dimensions, are created, and all five tables are then integrated using SQL Server Integration Services (SSIS). An Online Analytical Processing (OLAP) cube is then built over this data using SQL Server Analysis Services (SSAS), with product, customer, currency and time as its dimensions. Lastly, the analysed data is reported through dashboards in SQL Server Reporting Services (SSRS), where the results are viewed as reports and charts. These reports are customizable, and an organization can perform a variety of operations on them as required.
Because these reports are short and informative, they are easy to understand and support easier, more accurate decision making.
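The cube and dashboard steps described above rest on two basic OLAP operations: rolling a measure up along some dimensions and slicing on a fixed dimension value. The following is a minimal sketch in plain Python, not the SSAS implementation used in the paper. It assumes a hypothetical in-memory fact table whose rows carry product, customer, currency and year values (year standing in for the time dimension) plus a numeric `amount` measure; the function names `roll_up` and `slice_facts` and all sample values are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical cube dimensions, mirroring the four named in the abstract
# ("time" is reduced to a year here for brevity).
DIMENSIONS = ("product", "customer", "currency", "year")

# Made-up fact rows: one value per dimension plus an `amount` measure.
FACTS: List[dict] = [
    {"product": "P1", "customer": "C1", "currency": "USD", "year": 2019, "amount": 100.0},
    {"product": "P1", "customer": "C2", "currency": "EUR", "year": 2019, "amount": 50.0},
    {"product": "P2", "customer": "C1", "currency": "USD", "year": 2020, "amount": 75.0},
    {"product": "P2", "customer": "C2", "currency": "USD", "year": 2020, "amount": 25.0},
]


def roll_up(facts: Iterable[dict], by: Tuple[str, ...]) -> Dict[tuple, float]:
    """Sum `amount` grouped by the dimensions in `by`.

    Dimensions left out of `by` are aggregated away (an OLAP roll-up);
    an empty `by` yields a single grand total under the key ().
    """
    unknown = set(by) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    totals: Dict[tuple, float] = defaultdict(float)
    for row in facts:
        totals[tuple(row[d] for d in by)] += row["amount"]
    return dict(totals)


def slice_facts(facts: Iterable[dict], **fixed) -> List[dict]:
    """Keep only rows whose dimensions equal the given values (an OLAP slice)."""
    return [row for row in facts if all(row[k] == v for k, v in fixed.items())]


if __name__ == "__main__":
    # Group by currency first, so amounts in different currencies are never summed together.
    print(roll_up(FACTS, ("currency",)))
    # {('USD',): 200.0, ('EUR',): 50.0}
    print(roll_up(FACTS, ("currency", "product")))
    # {('USD', 'P1'): 100.0, ('EUR', 'P1'): 50.0, ('USD', 'P2'): 100.0}
    print(roll_up(slice_facts(FACTS, currency="USD"), ("year",)))
    # {(2019,): 100.0, (2020,): 100.0}
```

A cube engine such as SSAS typically pre-computes and stores many of these aggregates so that report queries do not rescan the fact rows; this sketch recomputes them on every call, which is acceptable for a toy example but not for millions of records.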
Highlights
Information integration is an active and challenging research area, despite significant progress in recent years.
The volume of data available online and in electronic form has grown exponentially in recent years, increasing the importance of information integration for organizational growth. Today, 80% of the data generated by organizations is in the form of text.
Business Intelligence (BI) solutions exist primarily to address three main challenges in Big Data, shown in Fig. 1. One of these is Variety: data is generated by electronic systems in unstructured, semi-structured or mixed formats, which makes analysing the raw data difficult.
Summary
Information integration is an active and challenging research area, despite significant progress in recent years. Data stored in files may be unstructured or semi-structured. Unstructured data cannot be analysed directly because it has no predefined format, so no proper insights or key business drivers can be drawn from it. It must first be converted into structured data, which can then be analysed to gain insights. Business Intelligence (BI) solutions exist primarily to address three main challenges in Big Data, shown in Fig. 1. One of these is Variety: data is generated by electronic systems in unstructured, semi-structured or mixed formats, which makes analysing the raw data difficult.
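The conversion from unstructured to structured data can be sketched in a few lines. The example below is a minimal sketch in Python, not the paper's method (which uses SQL stored procedures). It assumes a hypothetical line layout of date, product, customer, currency and amount, parses each line into a typed record, and builds one dimension table per field with surrogate keys, so that the fact rows hold only keys and the numeric amount, loosely mirroring the fact table plus four dimension tables described in the abstract. The regular expression, the names `parse_line`, `build_dimension` and `to_star`, and the sample lines are all illustrative assumptions.

```python
import re
from datetime import date
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Hypothetical raw layout: "<yyyy-mm-dd> <product> <customer> <currency> <amount>",
# separated by any amount of whitespace. The paper's real .txt format is not specified.
LINE_RE = re.compile(
    r"^\s*(?P<day>\d{4}-\d{2}-\d{2})\s+(?P<product>\w+)\s+(?P<customer>\w+)"
    r"\s+(?P<currency>[A-Z]{3})\s+(?P<amount>-?\d+(?:\.\d+)?)\s*$"
)

# Fields that become dimension tables; `amount` stays in the fact table as the measure.
DIM_FIELDS = ("day", "product", "customer", "currency")


class Sale(NamedTuple):
    day: date
    product: str
    customer: str
    currency: str
    amount: float


def parse_line(line: str) -> Optional[Sale]:
    """Turn one raw text line into a typed record, or None if it does not fit the layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    try:
        day = date.fromisoformat(m["day"])
    except ValueError:  # e.g. "2019-13-40" matches the pattern but is not a real date
        return None
    return Sale(day, m["product"], m["customer"], m["currency"], float(m["amount"]))


def build_dimension(values: Iterable[object]) -> Dict[object, int]:
    """Assign surrogate keys 1, 2, ... to distinct values in first-seen order."""
    keys: Dict[object, int] = {}
    for v in values:
        keys.setdefault(v, len(keys) + 1)
    return keys


def to_star(lines: Iterable[str]) -> Tuple[List[tuple], Dict[str, Dict[object, int]], int]:
    """Return (fact rows of surrogate keys + amount, dimension tables, rejected line count)."""
    parsed = [parse_line(line) for line in lines]
    good = [s for s in parsed if s is not None]
    dims = {f: build_dimension(getattr(s, f) for s in good) for f in DIM_FIELDS}
    facts = [tuple(dims[f][getattr(s, f)] for f in DIM_FIELDS) + (s.amount,) for s in good]
    return facts, dims, len(parsed) - len(good)


if __name__ == "__main__":
    raw = [
        "2019-03-14  P1 C1 USD 100.00",
        "2019-03-14 P1 C2 EUR 50",
        "free text that does not fit the layout",
        "2020-01-02 P2 C1 USD 75.5",
    ]
    facts, dims, rejected = to_star(raw)
    print(facts)     # [(1, 1, 1, 1, 100.0), (1, 1, 2, 2, 50.0), (2, 2, 1, 1, 75.5)]
    print(rejected)  # 1
```

Counting rejected lines, rather than silently dropping them, is a deliberate choice in this sketch: with millions of records, the reject count is a cheap check that the assumed layout actually fits the data.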