Big Data Knowledge Discovery Research Articles

With the rapid development of various techniques, more and more data are being accumulated that are both large in size (tall) and high in dimension (wide). The era of big data is coming. How to understand these data and discover new knowledge from them has attracted growing attention from scholars and has become one of the most important tasks in data mining. Clustering analysis, a kind of unsupervised learning and one of the most important techniques in data mining, can group a set of data into groups (clusters) that are meaningful, useful, or both. The technique has therefore played a very important role in knowledge discovery in big data. However, when facing large and high-dimensional data, most current clustering methods show poor computational efficiency and place high demands on computational resources, which hinders efforts to clarify the intrinsic properties of the data and to discover the new knowledge behind them. Based on this consideration, we developed a clustering method called MUFOLD-CL. The principle of the method is to project the data points onto the centroid and then to measure the similarity between any two points by calculating their projections on the centroid. The proposed method achieves linear time complexity with respect to the sample size. Comparison with the K-Means method on very large data showed that our method produced better accuracy and required less computational time, demonstrating that MUFOLD-CL can serve as a valuable tool for big data clustering, or at least play a complementary role to other existing methods. Further comparisons with state-of-the-art clustering methods on smaller datasets showed that our method was the fastest and achieved comparable accuracy. For the convenience of scholars, a free software package was constructed.
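The projection idea described in the abstract above can be illustrated with a small sketch. This is not the MUFOLD-CL implementation, whose details the abstract does not give. The function name `projection_clusters`, the use of a scalar projection onto the centroid direction, and the rule of cutting the sorted projections at the largest gaps are all assumptions made for illustration. The sort also makes this sketch O(n log n) rather than the linear time reported for the actual method.

```python
import numpy as np

def projection_clusters(X, k):
    """Hypothetical sketch: group rows of X by their projection onto the centroid.

    Assumption (not from the paper): points are split into k groups by
    cutting the sorted 1-D projections at the k-1 largest gaps.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    centroid = X.mean(axis=0)
    norm = np.linalg.norm(centroid)
    if norm == 0.0:
        raise ValueError("centroid is the zero vector; projection is undefined")

    # Scalar projection of every point onto the centroid direction: O(n * d).
    proj = X @ (centroid / norm)

    # Sort the projections and measure the gaps between neighbours.
    order = np.argsort(proj)
    gaps = np.diff(proj[order])

    # Positions (in sorted order) of the k-1 largest gaps.
    if k > 1:
        cuts = np.sort(np.argsort(gaps)[-(k - 1):])
    else:
        cuts = np.array([], dtype=int)

    # Every point after a cut moves to the next group.
    sorted_labels = np.zeros(n, dtype=int)
    for c in cuts:
        sorted_labels[c + 1:] += 1

    # Map the labels back to the original row order.
    labels = np.empty(n, dtype=int)
    labels[order] = sorted_labels
    return labels
```

Calling `projection_clusters(X, 3)` on a small array with three well-separated groups returns one integer label per row. Because each point is reduced to a single number, two points are compared without computing a full pairwise distance, which is the source of the efficiency the abstract describes.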

Summary: Data structure description, conceptual modeling, and logic reasoning for knowledge discovery are three critical factors in the integration of heterogeneous information. In particular, NoSQL databases and the Internet of Things create an urgent need for a uniform expression of heterogeneous data, yet little research attention has been paid to the integration of NoSQL databases with traditional data models or to the semantic description of big data. To tackle these problems, this paper first proposes a concept-and-relation-oriented grid data model, called the GODM model, based on the definitions of Monad, Compounder, Relation, etc. The GODM model is then used to describe traditional data models and NoSQL data models uniformly, which eliminates the structural differences of heterogeneous data. Next, based on the GODM relation mechanism, an extendable semantic system is built, using SHOIQ(D) description logic as the example to establish a correspondence with a GODM grammar subset; this provides fundamental support for semantic integration and knowledge discovery over heterogeneous data. After that, comprehensive comparisons between GODM and other models are made, with particular attention to the distinctions between GODM and OWL in relation mechanism, hybrid schema, description logic, grammatical constructors, etc. In addition, after a general evaluation model is proposed, experimental evaluations and analyses of the time and space efficiency of some common primary data models are conducted. The results show that the GODM model has a clear advantage in expressiveness, flexibility, etc., and particularly in time and space efficiency.
In summary, the GODM model describes heterogeneous data in terms of both data structure and semantic relationships and realizes a hybrid schema that reconciles schemaful and schemaless data models, making it especially suitable for dynamic data integration and knowledge discovery from big data models.

Articles published on Big Data Knowledge Discovery

Sophisticated methods for noise filtering, subgroup discovery, and classification in big data analysis

Big Data Knowledge Discovery as a Service: Recent Trends and Challenges

Efficient Algorithms for Dynamic Incomplete Decision Systems

µBIGMSA-Microservice-Based Model for Big Data Knowledge Discovery: Thinking Beyond the Monoliths

Big Data Knowledge Discovery Platforms: A 360 Degree Perspective

Special Issue on Knowledge Discovery in Big Data (KDBD)

Assessing reliability of Big Data Knowledge Discovery process

A Fast Projection-Based Algorithm for Clustering Big Data.

ProTraS: A probabilistic traversing sampling algorithm

A general framework for big data knowledge discovery and integration

Machine learning in pain research.

Semantic genetic programming for fast and accurate data knowledge discovery

NYSOL: A User-Centric Framework for Knowledge Discovery in Big Data

Outlier Detection by Interaction with Domain Experts
