Abstract

The growing prevalence of big and streaming data requires a new generation of tools. Data often has infinite size in the sense that new observations arrive continually (daily, hourly, etc.). In recent years, several new technologies such as Kafka (Apache Software Foundation, n.d.-a) and Spark Streaming (Apache Software Foundation, n.d.-b) have been introduced for processing streaming data. Statistical tools for data streams, however, are under-developed and offer only basic functionality. The majority of statistical software can only operate on finite batches and requires re-loading possibly large datasets for seemingly simple tasks such as incorporating a few more observations into an analysis.

OnlineStats is a Julia (Bezanson, Edelman, Karpinski, & Shah, 2017) package for high-performance online algorithms. The OnlineStats framework is easily extensible, includes a large catalog of algorithms, provides primitives for parallel computing, and offers a weighting mechanism that allows new observations to have a higher relative influence over the value of the statistic/model/visualization.

Highlights

  • The growing prevalence of big and streaming data requires a new generation of tools

  • The majority of statistical software can only operate on finite batches and requires re-loading possibly large datasets for seemingly simple tasks such as incorporating a few more observations into an analysis

  • A new type must implement the framework's interface functions in order to use the rest of the OnlineStats framework

Summary

The growing prevalence of big and streaming data requires a new generation of tools. Statistical tools for data streams are under-developed and offer only basic functionality. The OnlineStats framework is extensible, includes a large catalog of algorithms, provides primitives for parallel computing, and offers a weighting mechanism that allows new observations to have a higher relative influence over the value of the statistic/model/visualization. A new type must implement a small set of interface functions in order to use the rest of the OnlineStats framework. The update method determines how the statistic stat is updated with a single observation y. Each OnlineStat is a concrete subtype of OnlineStat{T}, where T is the type of a single observation. When fit!(stat::OnlineStat{T}, y::S) is called (where S is not a subtype of T), y is iterated through and fit! is called on each element.
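The update, iteration, and weighting behavior described above can be illustrated with a minimal, language-agnostic sketch. It is written in Python rather than Julia and is not the OnlineStats API: the class `OnlineMean`, its methods `fit` and `merge`, and the `weight` parameter are hypothetical names chosen for illustration. The sketch assumes a scalar running mean whose default weight for the n-th observation is 1/n.

```python
from typing import Callable


class OnlineMean:
    """Running mean updated one observation at a time (hypothetical class)."""

    def __init__(self, weight: Callable[[int], float] = lambda n: 1.0 / n):
        # weight(n) is the influence given to the n-th observation.
        # The default 1/n reproduces the ordinary sample mean; a constant
        # weight gives recent observations more influence.
        self.weight = weight
        self.n = 0
        self.mean = 0.0

    def _fit_one(self, y: float) -> None:
        # Update the statistic with a single observation.
        self.n += 1
        w = self.weight(self.n)
        self.mean += w * (y - self.mean)

    def fit(self, y) -> "OnlineMean":
        # A single number is used directly; anything else is treated as a
        # collection and fitted element by element.
        if isinstance(y, (int, float)):
            self._fit_one(y)
        else:
            for item in y:
                self.fit(item)
        return self

    def merge(self, other: "OnlineMean") -> "OnlineMean":
        # Combine two partial results, e.g. from parallel workers.
        # This formula is only exact for the default 1/n weighting.
        total = self.n + other.n
        if total > 0:
            self.mean += (other.n / total) * (other.mean - self.mean)
            self.n = total
        return self
```

Under these assumptions, `OnlineMean().fit([1, 2, 3, 4])` and four separate calls to `fit` on the individual numbers yield the same state, and two instances fitted on separate halves of the data can be combined with `merge` without revisiting the raw observations.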
