Abstract

Nowadays, large amounts of data are generated at many organizations, and many of them handle Big Data inefficiently as its volume, velocity, and variety grow. Although data is a valuable resource, organizing Big Data remains a major challenge. A number of companies have adopted NoSQL databases such as Cassandra, MongoDB, and HBase, which can serve many requests at a time. For processing Big Data, Apache Spark, one of the most powerful processing engines, offers a number of benefits. The main programming abstraction in Apache Spark is the Resilient Distributed Dataset (RDD), which supports only procedural processing. However, the most common data processing paradigm is the relational query, which RDDs cannot handle directly. To overcome this, higher-level libraries on top of Apache Spark are needed. Spark SQL is a component of the Apache Spark framework that integrates relational processing with Apache Spark's functional programming API. It lets Apache Spark programmers use the benefits of relational processing, and it combines relational and procedural processing through a declarative DataFrame API. In this study, we therefore experiment with Spark SQL DataFrames to improve the processing of weather data stored in a Cassandra database. The study shows that Spark SQL DataFrames outperform the Spark Core RDDs that we evaluated in earlier experiments.

Keywords: Apache Spark-SQL, Data frames, Apache Cassandra
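
The contrast between procedural and relational processing described above can be illustrated with a small sketch. This is not Spark code and not the study's implementation: it is a plain-Python stand-in in which an explicit loop plays the role of an RDD-style procedural computation and an in-memory `sqlite3` query plays the role of a declarative, DataFrame-style relational query. The table name `weather`, the station names, and the readings are hypothetical values made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy weather readings: (station, temperature in Celsius).
READINGS = [
    ("S1", 20.0),
    ("S1", 24.0),
    ("S2", 10.0),
    ("S2", 14.0),
    ("S2", 18.0),
]


def mean_temp_procedural(readings):
    """Procedural style: spell out how to group and reduce, step by step."""
    totals = defaultdict(lambda: [0.0, 0])  # station -> [sum, count]
    for station, temp in readings:
        totals[station][0] += temp
        totals[station][1] += 1
    return {station: s / n for station, (s, n) in totals.items()}


def mean_temp_relational(readings):
    """Relational style: state what is wanted and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE weather (station TEXT, temp REAL)")
        conn.executemany("INSERT INTO weather VALUES (?, ?)", readings)
        rows = conn.execute(
            "SELECT station, AVG(temp) FROM weather GROUP BY station"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

Both functions return the same per-station means. The difference is that the relational version leaves grouping and aggregation strategy to the query engine, which is the kind of optimization opportunity a declarative API gives an engine such as Spark SQL.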
