Abstract

We present a novel approach to the linguistic summarization of numeric data, that is, summarization of the numbers-to-text type. By linguistic summarization we mean the comprehensive description of large and complex datasets by short statements in natural language. First, we briefly survey the main developments in the traditional, well-developed, and powerful approach to linguistic data summarization based on the tools and techniques of natural language generation (NLG), notably due to Reiter and his collaborators. We point out that this approach is seriously limited in representing and processing the imprecision that is characteristic of natural languages. We show that a fuzzy logic-based approach to linguistic data summarization can be a simple yet efficient solution in this respect. We present linguistic summaries represented by protoforms in the form of linguistically quantified propositions, which are handled with the tools and techniques of fuzzy logic to capture the inherent imprecision of natural language. Such linguistic data summaries can provide a human user, whose only natural means of articulation and communication is natural language, with a simple yet effective and efficient means of representing and manipulating knowledge about processes and systems. We concentrate on the linguistic summarization of dynamic processes and systems, that is, on data represented as time series. We extend the basic, static-data-oriented concept of a linguistic data summary to time series data, present various possible protoforms of linguistic summaries, and analyze their properties and ways of generating them. To show the power of the new tools, we present two of our own real-world applications of the linguistic summarization of time series: summarizing the quotations of an investment (mutual) fund and summarizing Web server logs. We also mention some other applications known from the literature.
We conclude with some remarks on the strengths of linguistic summarization for broadly understood data mining and knowledge discovery, and on some possible directions for further research.

WIREs Data Mining Knowl Discov 2016, 6:37–46. doi: 10.1002/widm.1175

This article is categorized under: Fundamental Concepts of Data and Knowledge > Human Centricity and User Interaction; Technologies > Computational Intelligence; Technologies > Machine Learning
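
The abstract refers to protoforms in the form of linguistically quantified propositions, such as "Most trends are increasing". As an illustration only, the following minimal sketch evaluates the degree of truth of such a summary using Zadeh's classic calculus of linguistically quantified propositions, where the truth is the quantifier's membership value at the mean membership of the data in the summarizer. The abstract does not state which formula or membership functions the article uses, so the piecewise-linear definition of "most", the `increasing` summarizer, its 0.5 threshold, and the sample slopes are all assumptions made for this example.

```python
from typing import Callable, Sequence


def most(proportion: float) -> float:
    """Assumed membership function of the fuzzy quantifier 'most'.

    Piecewise linear: 0 up to 0.3, 1 from 0.8, linear in between.
    These breakpoints are an illustrative choice, not taken from the article.
    """
    if proportion >= 0.8:
        return 1.0
    if proportion <= 0.3:
        return 0.0
    return 2.0 * proportion - 0.6


def increasing(slope: float) -> float:
    """Assumed membership of a trend slope in the fuzzy set 'increasing'.

    Hypothetical definition: 0 for non-positive slopes, rising linearly
    to 1 at a slope of 0.5.
    """
    return min(1.0, max(0.0, slope / 0.5))


def truth_of_summary(
    values: Sequence[float],
    summarizer: Callable[[float], float],
    quantifier: Callable[[float], float],
) -> float:
    """Degree of truth of 'Q values are S' in Zadeh's calculus.

    Computes quantifier(mean of summarizer memberships over the data).
    """
    if not values:
        raise ValueError("values must not be empty")
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)


# Made-up slopes of five trend segments of a time series.
slopes = [0.6, 0.5, 0.1, 0.7, -0.2]
truth = truth_of_summary(slopes, increasing, most)
print(f"Truth of 'Most trends are increasing': {truth:.2f}")
```

With these assumed definitions, three of the five segments fully count as increasing and one counts partially, so the summary is true only to an intermediate degree rather than simply true or false. This graded truth value is what lets the fuzzy approach represent the imprecision of a word such as "most".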
