Abstract
The importance of big data is widely accepted in various fields. Organisations spend a lot of money to collect, process and mine the data to identify patterns. These patterns facilitate their future decision-making process to improve the organisational performance and profitability. However, among discovered patterns, there are some meaningless and misleading patterns which restrict the effectiveness of decision-making process. The presence of data discrepancies, noise and outliers also impacts the quality of discovered patterns and leads towards missing strategic goals and objectives. Quality inception of these discovered patterns is vital before utilising them in making predictions, decision-making process or strategic planning. Mining useful and credible patterns over social media is a challenging task. Often, people spread targeted content for character assassination or defamation of brands. Recently, some studies have evaluated the credibility of information over social media based on users’ surveys, experts’ judgement and manually annotating Twitter tweets to predict credibility. Unfortunately, due to the large volume and exponential growth of data, these surveys and annotation-based information credibility techniques are not efficiently applicable. This article presents a data quality and credibility evaluation framework to determine the quality of individual data instances. This framework provides a way to discover useful and credible patterns using credibility indicators. Moreover, a new Twitter bot detection algorithm is proposed to classify tweets generated by Twitter bots and real users. The results of conducted experiments showed that the proposed model generates a positive impact on improving classification accuracy and quality of discovered patterns.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.