Abstract

RcppMsgPack: MessagePack Headers and Interface Functions for R

Highlights

  • MessagePack is a binary serialization format made for exchanging data between different programming languages (Furuhashi, 2018)

  • R support for these formats is available via the packages mongolite (Ooms, 2014) and RProtoBuf (Eddelbuettel et al, 2016); Redis is implemented in R through the RcppRedis package (Eddelbuettel, 2018)

Summary

Introduction

MessagePack (or MsgPack for short, or when referring to the actual implementation) is a binary serialization format made for exchanging data between different programming languages (Furuhashi, 2018). In order to support as much generality as possible in serialization and deserialization, the use of lists to represent arrays and maps is necessary. It is often the case in R that one would want to deal with large vectors or matrices of a single type without the computational and memory overhead of lists.

Transferring large datasets from Python to R and back

To evaluate the performance of MsgPack serialization, we benchmarked the transfer of the MNIST (LeCun et al, 2010) and CIFAR-10 (Krizhevsky, 2009) datasets to and from Python and R. We compared this approach to writing and reading in CSV format, and to writing and reading using feather (Wickham et al, 2016), a cross-platform library and specification for efficiently handling tabular data. When writing floating point data as CSV, both numpy and the data.table package truncate floating point numbers by default, which causes loss of precision.
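
The precision point can be illustrated with a minimal sketch that uses only the Python standard library. It is not the benchmark code from the paper. It assumes the MessagePack float 64 layout (a 0xcb marker byte followed by a big-endian IEEE 754 double) and compares a round trip through that binary form with a round trip through text truncated to a fixed number of significant digits. The helper names and the choice of 6 digits are hypothetical and chosen for illustration only; they are not the defaults of numpy, data.table, or any other library.

```python
import struct


def msgpack_float64(x: float) -> bytes:
    """Encode x as a MessagePack float 64: marker 0xcb + big-endian double."""
    return b"\xcb" + struct.pack(">d", x)


def decode_msgpack_float64(buf: bytes) -> float:
    """Decode the 9-byte float 64 encoding produced by msgpack_float64."""
    if len(buf) != 9 or buf[0] != 0xCB:
        raise ValueError("not a MessagePack float 64")
    return struct.unpack(">d", buf[1:])[0]


def text_roundtrip(x: float, digits: int = 6) -> float:
    """Format x with a fixed number of significant digits, then parse it back.

    The digit count is an illustrative assumption, not any library's default.
    """
    return float(f"{x:.{digits}g}")


value = 0.1 + 0.2  # not exactly representable as a short decimal string

binary_copy = decode_msgpack_float64(msgpack_float64(value))
text_copy = text_roundtrip(value)

print("binary round trip is exact:", binary_copy == value)
print("truncated text round trip is exact:", text_copy == value)
```

The binary form copies all 64 bits of the double, so the value survives unchanged, whereas the truncated text form keeps only the leading digits and parses back to a nearby but different number.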

MsgPack serialization memory usage
Serialization of large lists
Summary