Abstract
Big data applications are currently used in many domains, ranging from statistical applications to prediction systems and smart cities. However, the quality of these applications is far from perfect: they suffer from functional errors, failures and low performance. Consequently, assuring the overall quality of big data applications plays an increasingly important role. This paper summarizes and assesses existing quality assurance (QA) technologies that address quality issues in big data applications. We conducted a systematic literature review (SLR) by searching major scientific databases, identifying 83 primary, relevant studies on QA technologies for big data applications. The SLR results reveal the following main findings: (1) the quality attributes of concern for big data applications, including correctness, performance, availability, scalability and reliability, and the factors influencing them; (2) the existing implementation-specific QA technologies, including specification, architectural choice and fault tolerance, and the process-specific QA technologies, including analysis, verification, testing, monitoring, and fault and failure prediction; (3) the strengths and limitations of each kind of QA technology; (4) the existing empirical evidence for each QA technology. This study provides a solid foundation for research on QA technologies for big data applications and can help developers of big data applications select suitable QA technologies.
Highlights
The big data technology market grows at a 27% compound annual growth rate (CAGR), with big data market opportunities projected to reach over 203 billion dollars in 2020 [1]
We provide a comprehensive survey of quality assurance (QA) technologies that play significant roles in big data applications, covering 83 papers published from Jan. 2012 to Dec. 2019
A systematic literature review has been performed on QA technologies for big data applications
Summary
The big data technology market grows at a 27% compound annual growth rate (CAGR), with big data market opportunities projected to reach over 203 billion dollars in 2020 [1]. Big data applications are associated with the so-called 4V attributes, i.e., volume, velocity, variety and veracity [6]. Due to the large amount of data generated, the high velocity of arriving data and the variety of heterogeneous data types, data quality is far from ideal, which in turn makes the software quality of big data applications far from perfect [7]. Because of the volume and velocity attributes [8,9], big data applications generate extremely large amounts of data, all the more so with high-speed Internet, which may affect data accuracy and data timeliness [10] and lead to software quality problems such as performance and availability issues [10,11].