Categories for (Big) Data models and optimization

Laurent Thiry, Michel Hassenforder, Heng Zhao

doi:10.1186/s40537-018-0132-9

Open Access

https://doi.org/10.1186/s40537-018-0132-9

Journal: Journal of Big Data
Publication Date: Jul 7, 2018
Citations: 8
License type: open-access

Affiliation: University of Upper Alsace

Abstract

This paper proposes a theoretical foundation for Big Data. More precisely, it explains how “functors”, a concept from Category Theory, can model the various data structures commonly used to represent (large) data sets, and how “natural transformations” can formalize the relations between these structures. Algorithms, such as querying a specific piece of information, depend mainly on the data structure considered, so natural transformations can serve to optimize these algorithms and produce a result in a shorter time. The paper details four functors modeling tabular data, graph structures (e.g. triple stores), cached data and split data. Next, the paper explains how, by using a functional programming language, the concepts can be readily implemented to propose new tools (e.g. efficient information servers and query languages). As a complement to the mathematical models proposed, the paper also presents an optimized data server and a specific query language (based on “unification” to facilitate the search for information). Finally, the paper gives a comparison study and shows that this tool is more efficient than most of the standard tools available on the market: the functional server appears to be 10+ times faster than relational or document-oriented databases (MySQL and MongoDB), and 100+ times faster than a graph database (Neo4j).
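To make the functor and natural transformation vocabulary concrete, here is a minimal sketch. It is written in Python for illustration only and is not the authors' code; the names `fmap_table`, `fmap_index`, `to_index`, `lookup_table` and `lookup_index` are hypothetical, and the "table" and "index" structures are simplified stand-ins for the tabular and cached functors mentioned in the abstract. The sketch assumes that keys are unique.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Hypothetical "tabular" structure: a list of (key, value) rows.
Table = List[Tuple[str, A]]
# Hypothetical "indexed" structure: the same data held in a hash map.
Index = Dict[str, A]


def fmap_table(f: Callable[[A], B], table: List[Tuple[str, A]]) -> List[Tuple[str, B]]:
    """Functor action on tables: apply f to every value, keep keys and order."""
    return [(k, f(v)) for k, v in table]


def fmap_index(f: Callable[[A], B], index: Dict[str, A]) -> Dict[str, B]:
    """Functor action on indexes: apply f to every value, keep keys."""
    return {k: f(v) for k, v in index.items()}


def to_index(table: List[Tuple[str, A]]) -> Dict[str, A]:
    """Candidate natural transformation Table -> Index (assumes unique keys)."""
    return dict(table)


def lookup_table(key: str, table: List[Tuple[str, A]]) -> Optional[A]:
    """Linear scan: the cost grows with the number of rows."""
    for k, v in table:
        if k == key:
            return v
    return None


def lookup_index(key: str, index: Dict[str, A]) -> Optional[A]:
    """Hash lookup on the transformed structure."""
    return index.get(key)


people = [("alice", 30), ("bob", 25), ("carol", 41)]
double = lambda n: n * 2

# Naturality: mapping then converting equals converting then mapping.
assert to_index(fmap_table(double, people)) == fmap_index(double, to_index(people))

# The query gives the same answer on both structures.
assert lookup_table("bob", people) == lookup_index("bob", to_index(people)) == 25
```

The naturality check is what justifies the optimization: because `to_index` commutes with mapping, a query written against the table can be answered on the index instead, without changing its result.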

Highlights

Big Data centers on large amounts of data, which directly affects the performance of programs and requires specific architectures to improve it [1], e.g. the use of graph databases or distributed concurrent computations.
This article describes how Category Theory combined with a functional programming language can be useful in a Big Data context.
The paper analyzes the performance of this tool and compares it with standard databases: MySQL, MongoDB and Neo4j.

Summary

Introduction

Big Data centers on large amounts of data, which directly affects the performance of programs (e.g. when querying a specific piece of information) and requires specific architectures to improve it [1], e.g. the use of graph databases or distributed concurrent computations. Although many technologies are available today to put Big Data into practice, theories that help to understand the benefits and limitations of each architecture, to identify possible improvements, or to find ways of combining architectures are rarer [2]. In this context, the paper presents the capabilities offered by Category Theory together with a functional programming language (to implement the concepts and facilitate experimentation) to address this gap. All the code is presented in the following parts, which supports the claim that a functorial/functional approach leads to shorter programs than those developed in other paradigms (imperative or object-oriented in particular). As a complement, it shows that these “short” programs can implement complex algorithms (such as unification) by using the capabilities brought by the concepts (e.g. functors and higher-order functions). This category can be related to (functional) programs by considering that sets model basic datatypes (e.g. boolean, integer, etc.), and morphisms (e.g. f_j) correspond to programs with a parameter X_k and a result X_l [12].
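As an illustration of how short a unification-based query can be, the sketch below unifies a pattern containing variables with stored facts. It is a Python approximation under stated assumptions, not the query language from the paper: variables are assumed to be strings starting with `?`, facts are assumed to be tuples, the names `is_var`, `walk`, `unify` and `query` are hypothetical, and the occurs check is omitted for brevity.

```python
def is_var(term):
    """Assumption: a variable is any string that starts with '?'."""
    return isinstance(term, str) and term.startswith("?")


def walk(term, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return a substitution extending subst that makes a and b equal, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        # Unify component by component, threading the substitution through.
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def query(pattern, facts):
    """Return one substitution per fact that unifies with the pattern."""
    results = (unify(pattern, fact, {}) for fact in facts)
    return [s for s in results if s is not None]


# Hypothetical triple-style facts, in the spirit of a triple store.
facts = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", 30),
]

print(query(("alice", "knows", "?who"), facts))
```

Here `query` is a higher-order use of `unify`: the same function serves any pattern, so adding a new kind of query requires no new code, only a new pattern.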
