A Systematic Architectural-Perspective-Based Performance Analysis of A-MERIT-C: Dynamic-Learning, Multitiered, Ensemble-Based Real-Time Flight Data Analysis

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

Large-scale data analysis has been the subject of numerous studies in recent years. In many applications of today's data-intensive world, data typically arrives continually as data streams. Analytics engines that handle streaming data must be able to react to data in motion. Data streams pose special challenges because traditional methods for data mining and machine learning are designed for static information: they are less suited to the representative characteristics of data streams and are poorly equipped to effectively analyse data that grows quickly. Through this research the authors present A-MERIT-C, a dynamic-learning, multitiered, ensemble-based real-time flight data analysis system. The authors present an active-learning, dynamic, real-time data stream analysis model built on a self-tuning ensemble learning framework, able to adapt quickly to concepts in near-real-time streaming data analysis. The conceptual architectural framework illustrated in this research adapts to the dynamics of real-time data through an evolving classifier pool (i.e. the best-performing classifiers are added to the pool at every epoch). A further distinguishing characteristic of A-MERIT-C is that, instead of using traditional hold-out evaluation, it evaluates classifiers prequentially. A-MERIT-C's unique features provide significant gains in accuracy, precision, and AUC for streaming data analytics; through incremental learning and feedback, it can also overcome the drawbacks of current algorithms, including concept evolution and feature drift.
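The prequential ("test-then-train") evaluation the abstract contrasts with hold-out evaluation can be sketched as follows: each arriving instance is first used to test the current ensemble, then to train it. This is a minimal illustrative sketch, not the paper's implementation; the classifier, the majority-vote rule, and all names are assumptions, and the epoch-wise pool evolution described in the abstract is omitted for brevity.

```python
# Minimal sketch of prequential (test-then-train) evaluation over an
# ensemble pool. MeanClassifier is a toy incremental learner used only
# to make the loop runnable; it is not from the paper.

class MeanClassifier:
    """Toy incremental learner: nearest class mean on one feature."""
    def __init__(self):
        self.sums = {0: 0.0, 1: 0.0}
        self.counts = {0: 0, 1: 0}

    def predict(self, x):
        means = {c: self.sums[c] / self.counts[c]
                 for c in self.sums if self.counts[c] > 0}
        if len(means) < 2:
            return 0
        return min(means, key=lambda c: abs(x - means[c]))

    def learn(self, x, y):
        self.sums[y] += x
        self.counts[y] += 1

def prequential_accuracy(stream, pool):
    correct = total = 0
    for x, y in stream:
        # Test first: majority vote over the current pool...
        votes = [clf.predict(x) for clf in pool]
        y_hat = max(set(votes), key=votes.count)
        correct += (y_hat == y)
        total += 1
        # ...then train every pool member on the same instance.
        for clf in pool:
            clf.learn(x, y)
    return correct / total

stream = [(0.1, 0), (0.9, 1), (0.2, 0), (0.8, 1), (0.15, 0), (0.85, 1)]
pool = [MeanClassifier() for _ in range(3)]
acc = prequential_accuracy(stream, pool)
```

Because every instance is tested before it is learned, the accuracy estimate tracks how the model would have performed on the live stream, which is why prequential evaluation suits evolving data better than a fixed hold-out set.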

Similar Papers
  • Research Article
  • 10.3390/math13071054
DIA-TSK: A Dynamic Incremental Adaptive Takagi–Sugeno–Kang Fuzzy Classifier
  • Mar 24, 2025
  • Mathematics
  • Hao Chen + 6 more

In order to continuously adapt to dynamic data distributions, existing incremental and online learning methods adopt bagging or boosting structures, in which some sub-classifiers are abandoned when the data distribution varies significantly in the learning process. As such, these ensemble classifiers may fail to reach the global optimum. Furthermore, the training of static sub-classifiers, which are dropped when concept drift emerges, leads to unnecessary computational costs. To solve these issues, this study proposes a novel training method consisting of a single dynamic classifier—named the dynamic incremental adaptive Takagi–Sugeno–Kang fuzzy classifier (DIA-TSK)—which leverages the superior non-linear modeling capabilities and interpretability of the TSK fuzzy system. DIA-TSK utilizes a multi-dimensional incremental learning strategy that is capable of dynamically learning from new data in real time while maintaining global optimal solutions across various online application scenarios. DIA-TSK incorporates two distinct learning paradigms: online learning (O-DIA-TSK) and batch incremental learning (B-DIA-TSK). These modules can work separately or collaborate synergistically to achieve rapid, precise and resource-efficient incremental learning. With the implementation of O-DIA-TSK, we significantly reduce the computational complexity in incremental processes, effectively addressing real-time learning requirements for high-frequency dynamic data streams. Moreover, the novel incremental update mechanism of O-DIA-TSK dynamically adjusts its parameters to ensure progressive optimization, enhancing both real-time performance and learning accuracy. For large-scale data sets, DIA-TSK evolves into B-DIA-TSK, which implements batch updates for multiple samples based on the Woodbury matrix identity. 
This extension substantially improves computational efficiency and robustness during incremental learning, making it particularly suitable for high-dimensional and complex data sets. Extensive comparative experiments demonstrate that the DIA-TSK approaches significantly outperform existing incremental learning methods across multiple dynamic data sets, exhibiting notable advantages in terms of computational efficiency, classification accuracy and memory management. In the experimental comparison, O-DIA-TSK and B-DIA-TSK reach significant superiority in classification performance with respect to comparative methods, with up to 33.3% and 55.8% reductions in training time, respectively, demonstrating the advantage of DIA-TSK in classification tasks using dynamic data.
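The Woodbury-identity batch update mentioned above can be illustrated in its simplest rank-1 (Sherman-Morrison) form, which refreshes an inverse from its previous value instead of re-inverting from scratch. The scalar, single-feature case below is an assumption made purely for illustration; B-DIA-TSK's actual update operates on full matrices over sample batches.

```python
# Sketch of the incremental-inverse idea behind B-DIA-TSK's batch update.
# The Woodbury matrix identity lets (A + U C V)^{-1} be refreshed from
# A^{-1} without re-inverting; shown here in its rank-1 (Sherman-Morrison)
# scalar form for a single-feature regularized least-squares model.

def rank1_update(P, x):
    """Fold one sample into P = 1/(lam + sum of x_i^2)."""
    # Sherman-Morrison: P' = P - P x x P / (1 + x P x)
    return P - (P * x) * (x * P) / (1.0 + x * P * x)

lam = 1.0
xs = [0.5, -1.2, 2.0, 0.3]

# Incremental path: start from P = 1/lam and fold in each sample.
P = 1.0 / lam
for x in xs:
    P = rank1_update(P, x)

# Direct path: invert the full accumulated quantity once.
P_direct = 1.0 / (lam + sum(x * x for x in xs))
```

Both paths agree, but the incremental path costs O(1) per sample here (and O(d^2) per rank-1 update in the matrix case), which is the efficiency gain the batch Woodbury update generalizes to multiple samples at once.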

  • Conference Article
  • Cited by 9
  • 10.1109/eee.2004.1287312
Single-pass algorithms for mining frequency change patterns with limited space in evolving append-only and dynamic transaction data streams
  • Jan 1, 2004
  • Hua-Fu Li + 1 more

We propose an online single-pass algorithm MFC-append (mining frequency change patterns in append-only data streams) for online mining frequent frequency change items in continuous append-only data streams. An online space-efficient data structure called Change-Sketch is developed for providing fast response time to compute dynamic frequency changes between data streams. A modified approach MFC-dynamic (mining frequency change patterns in dynamic data streams) is also presented to mine frequency changes in dynamic data streams. The theoretic analyses show that our algorithms meet the major performance requirements of single-pass, bounded storage, and real time for streaming data mining.
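The frequency-change mining idea can be sketched with plain counts over two successive windows: an item is reported when its relative frequency grows past a threshold. This toy version uses exact dictionaries and a hypothetical smoothing constant; the paper's Change-Sketch is a space-bounded approximate structure, which is not reproduced here.

```python
# Toy sketch of mining frequency-change items between two stream windows.
# Exact counting stands in for the space-efficient Change-Sketch.
from collections import Counter

def frequency_changes(window_old, window_new, min_change=2.0):
    """Return items whose relative frequency grew by >= min_change."""
    old, new = Counter(window_old), Counter(window_new)
    n_old, n_new = len(window_old), len(window_new)
    changed = {}
    for item, c_new in new.items():
        f_new = c_new / n_new
        f_old = old.get(item, 0.5) / n_old  # smooth items unseen before
        if f_new / f_old >= min_change:
            changed[item] = f_new / f_old
    return changed

# 'a' triples its relative frequency between the two windows.
hot = frequency_changes(list("aabbccdd"), list("aaaaaabc"))
```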

  • Research Article
  • Cited by 7
  • 10.1109/access.2018.2883666
Online Real-Time Analysis of Data Streams Based on an Incremental High-Order Deep Learning Model
  • Jan 1, 2018
  • IEEE Access
  • Yuliang Li + 2 more

As the core part of the new generation of information technology, the Internet of Things has accumulated a large number of real-time data streams of various types and structures. The data stream is generated at an extremely fast speed, and its content and distribution characteristics undergo high-speed dynamic changes, so it must be processed in real time. Therefore, the feature learning algorithm is required to support incremental updates and to learn the characteristics of high-speed dynamically changing data in real time. Most current machine learning models for processing big data are static learning models. The batch learning method makes it impossible to analyze data streams in real time, and their ability to learn from dynamic data streams is poor. Therefore, this paper proposes an incremental high-order deep learning model that extends the data from the vector space to the tensor space and updates the parameters and structure of the network model in the high-order tensor space. In the process of parameter updating, a first-order approximation is introduced to avoid incrementing parameters by the iterative method and to improve parameter update efficiency, so that the updated model can quickly learn the characteristics of dynamically changing big data and satisfy the real-time requirements of big data feature learning while preserving the original knowledge of the neural network model as much as possible. To evaluate the performance of the proposed model, experiments were performed on a real image data set, MNIST, and the model was evaluated for stability, plasticity, and run time. The experimental results show that the model not only incrementally learns the characteristics of new data online but also retains the ability to learn the original data features, improves model update efficiency, and maximizes the online analysis and real-time processing of dynamic data streams.

  • Research Article
  • Cited by 25
  • 10.1016/j.eswa.2021.115591
Incremental semi-supervised Extreme Learning Machine for Mixed data stream classification
  • Jul 14, 2021
  • Expert Systems with Applications
  • Qiude Li + 5 more

  • Research Article
  • Cited by 5
  • 10.1016/j.energy.2023.130149
Predicting the electric power consumption of office buildings based on dynamic and static hybrid data analysis
  • Dec 27, 2023
  • Energy
  • Rongwei Zou + 5 more

  • Conference Article
  • Cited by 1
  • 10.1109/padsw.2014.7097853
Continuous similarity join on data streams
  • Dec 1, 2014
  • Jia Cui + 3 more

Similarity join plays an important role in many applications, such as data cleaning and integration, to address the problem of poor data quality. Most existing studies focused on performing similarity join on static datasets, but few have addressed running it on dynamic data streams. With the development of network technology, the data access paradigm has shifted from disk-oriented mode to online data streams, which makes performing similarity join as a continuous query on data streams a novel query processing paradigm. Different from a static dataset, a data stream is unbounded, continuous, and unpredictable. These significant differences pose serious challenges, such as real-time query performance. To this end, we study the problem of continuous similarity join on data streams in this paper, based on the edit distance metric and a filter-and-verify framework with sliding-window semantics. Two subcases of this problem are studied: self similarity join on a single data stream and similarity join on two streams. We introduce a basic-window-based sliding window model to facilitate the update of the sliding window and its index. Details of our method, including signature extraction schemes, filtering and verification algorithms, and re-evaluation strategies, are discussed respectively. Finally, extensive experimental results show that our method works efficiently on real data streams.
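The filter-and-verify framework with an edit-distance predicate can be sketched as below: a cheap length filter prunes candidate pairs before the exact dynamic-programming verification. The window contents and the simple length filter are illustrative assumptions; the paper's signature schemes, indexing, and sliding-window maintenance are omitted.

```python
# Minimal filter-and-verify sketch for a similarity join between two
# stream windows under an edit-distance threshold tau.

def edit_distance(s, t):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

def similarity_join(window_a, window_b, tau):
    """Pairs within edit distance tau; a length filter prunes first."""
    out = []
    for s in window_a:
        for t in window_b:
            if abs(len(s) - len(t)) > tau:   # filter: cheap lower bound
                continue
            if edit_distance(s, t) <= tau:   # verify: exact computation
                out.append((s, t))
    return out

pairs = similarity_join(["flight", "stream"], ["fright", "streaming"], 1)
```

The length difference is a valid lower bound on edit distance, so the filter never discards a true match; stronger signature-based filters in the paper prune far more aggressively while preserving the same guarantee.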

  • Research Article
  • Cited by 1
  • 10.5194/isprs-archives-xli-b2-177-2016
CHANGE SEMANTIC CONSTRAINED ONLINE DATA CLEANING METHOD FOR REAL-TIME OBSERVATIONAL DATA STREAM
  • Jun 7, 2016
  • The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • Yulin Ding + 2 more

Abstract. Recent breakthroughs in sensor networks have made it possible to collect and assemble increasing amounts of real-time observational data by observing dynamic phenomena at previously impossible time and space scales. Real-time observational data streams present potentially profound opportunities for real-time applications in disaster mitigation and emergency response, by providing accurate and timely estimates of the environment's status. However, the data are always subject to inevitable anomalies (including errors and anomalous changes/events) caused by various effects produced by the environment they are monitoring. The "big but dirty" real-time observational data streams can rarely achieve their full potential in downstream real-time models or applications due to low data quality. Therefore, timely and meaningful online data cleaning is a necessary prerequisite to ensure the quality, reliability, and timeliness of real-time observational data. In general, a straightforward streaming data cleaning approach is to define various types of models/classifiers representing the normal behavior of sensor data streams and then declare any deviation from this model as normal or erroneous data. The effectiveness of these models is affected by dynamic changes in the deployed environments. Due to the changing nature of the complicated process being observed, real-time observational data are characterized by diversity and dynamics, showing typical Big (Geo) Data characteristics. Dynamics and diversity are reflected not only in the data values but also in the complicated changing patterns of the data distributions. This means the pattern of the real-time observational data distribution is not stationary or static but changing and dynamic. After the data pattern changes, it is necessary to adapt the model over time to cope with the changing patterns of real-time data streams. Otherwise, the model will not fit the subsequent observational data streams, which may lead to large estimation errors. In order to achieve the best generalization error, it is an important challenge for the data cleaning methodology to characterize the behavior of data stream distributions and adaptively update a model to include new information and remove old information. However, the complicated data-changing property invalidates traditional data cleaning methods, which rely on the assumption of a stationary data distribution, and drives the need for more dynamic and adaptive online data cleaning methods. To overcome these shortcomings, this paper presents a change-semantics-constrained online filtering method for real-time observational data. Based on the principle that the filter parameter should vary in accordance with the data change patterns, this paper embeds a semantic description, which quantitatively depicts the change patterns in the data distribution, to self-adapt the filter parameter automatically. Real-time observational water-level data streams of different precipitation scenarios are selected for testing. Experimental results prove that by means of this method, more accurate and reliable water-level information can be made available, which supports scientific and prompt flood assessment and decision-making.

  • Conference Article
  • Cited by 4
  • 10.1109/icdmw51313.2020.00065
MIR_MAD: An Efficient and On-line Approach for Anomaly Detection in Dynamic Data Stream
  • Nov 1, 2020
  • Chang How Tan + 2 more

Anomaly detection in a dynamic data stream is a challenging task. The unbounded extent and high arrival rate of the data prohibit anomaly detection models from storing all observations in memory for processing. In addition, the dynamically moving properties of the data stream exhibit concept drift. While recent studies focus on feature extraction for anomaly detection, the majority of them assume data streams are static, ignoring the possibility of concept drift. Anomaly detection models must operate efficiently in order to deal with high-volume, high-velocity data; that is, they must have low complexity and learn incrementally from each arriving observation. Incremental learning allows the model to adapt to concept drift. In cases where the drift rate is higher than the adaptation rate, the capability to detect concept drift and train a new model is much preferable, to minimize performance losses. In this paper, we propose MIR_MAD, an approach based on multiple incremental robust Mahalanobis estimators that is efficient, learns incrementally, and has the capability to detect concept drift. MIR_MAD is fast, can be initialized with a small amount of data, and is able to estimate the drift location in the data stream. Our empirical results show that MIR_MAD achieves state-of-the-art performance and is significantly faster. We also performed a case study to show that detecting concept drift is critical to minimizing the reduction in the model's performance.
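The incremental estimation underlying this kind of detector can be sketched with a running mean and variance (Welford's algorithm) and a Mahalanobis-style distance as the anomaly score. The univariate, non-robust form below is a deliberate simplification of the paper's multiple robust multivariate estimators; all names and the sample values are illustrative.

```python
# Toy sketch: one incremental estimator scoring each observation by its
# Mahalanobis-style distance (here: standardized distance) from a
# running mean/variance maintained in O(1) memory per update.
import math

class IncrementalMahalanobis:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def score(self, x):
        """Distance of x from the running estimate, in std deviations."""
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std

est = IncrementalMahalanobis()
for v in [10.0, 10.5, 9.5, 10.2, 9.8]:
    est.update(v)
normal_score = est.score(10.1)
outlier_score = est.score(15.0)
```

Because the estimator never stores past observations, it satisfies the bounded-memory requirement of streams; robust variants replace the mean/variance with estimates that outliers cannot easily corrupt.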

  • Conference Article
  • 10.1109/iciea.2017.8283162
A gradient-based algorithm for trend and outlier prediction in dynamic data streams
  • Jun 1, 2017
  • Dawei Sun + 2 more

Trend and outlier are frequently used to derive early-warning predictive signals for decision makers in order to achieve quality decision outcomes in domain-specific (e.g. commercial, scientific, biomedical, and engineering) applications. We develop a gradient-based algorithm using the sample entropy gradient (SEG) for trend and outlier prediction in high-frequency time-series data streams. An L2 similarity measure (the Euclidean distance between two linearized gradient curves) is computed to quantify the degree of similarity and compared with a threshold L2 value to judge the extent of dissimilarity that would be classified as an outlier. The SEG algorithm circumvents the need to pre-specify the tolerance parameter that cross-sample-entropy (CSE)-based algorithms invariably require a domain expert to set. We conduct real-data experiments with the SEG algorithm in two application areas: a dynamic wind-speed data stream, and financial time-series data. Our experiments demonstrate that the SEG algorithm can feasibly be used in an online implementation to derive predictive early-warning signals for domain-specific decision makers.
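The gradient-curve comparison can be sketched as follows: compute the gradients of two windows and flag an outlier when the L2 (Euclidean) distance between the gradient curves exceeds a threshold. As an assumption for illustration, the entropy-based gradient of the paper is replaced by a plain finite difference, and the threshold value is arbitrary.

```python
# Sketch of trend/outlier flagging via L2 distance between gradient
# curves: similar trends give near-zero distance, spikes give a large one.
import math

def gradient(series):
    """First differences as a stand-in for the paper's entropy gradient."""
    return [b - a for a, b in zip(series, series[1:])]

def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def is_outlier(window, reference, threshold):
    return l2_distance(gradient(window), gradient(reference)) > threshold

reference = [1.0, 2.0, 3.0, 4.0]   # steady upward trend
similar   = [1.1, 2.1, 3.1, 4.1]   # same gradients, shifted values
spiky     = [1.0, 2.0, 9.0, 4.0]   # sudden spike disrupts the gradients

flag_similar = is_outlier(similar, reference, threshold=1.0)
flag_spiky = is_outlier(spiky, reference, threshold=1.0)
```

Comparing gradients rather than raw values makes the test invariant to level shifts, which is why the shifted-but-parallel window is not flagged while the spike is.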

  • Conference Article
  • Cited by 40
  • 10.1145/2245276.2245432
Real-time visual analytics for event data streams
  • Mar 26, 2012
  • Fabian Fischer + 2 more

Real-time analysis of data streams has become an important factor for success in many domains such as server and system administration, news analysis and finance to name just a few. Introducing real-time visual analytics into such application areas promises a lot of benefits since the rate of new incoming information often exceeds human perceptual limits when displayed linearly in raw formats such as textual lines and automatic aggregation often hides important details. This paper presents a system to tackle some of the visualization challenges when analyzing such dynamic event data streams. In particular, we introduce the Event Visualizer, which is a loosely coupled modular system for collecting, processing, analyzing and visualizing dynamic real-time event data streams. Due to the variety of different analysis tasks the system provides an extensible framework with several interactive linked visualizations to focus on different aspects of the event data stream. Data streams with logging data from a computer network are used as a case study to demonstrate the advantages of visual exploration.

  • Conference Article
  • 10.1109/wicom.2008.1330
Query Quality of Service Management Based on Data Relationship over Real-Time Data Stream Systems
  • Oct 1, 2008
  • Jun Xiang + 3 more

Many real-time applications and data services in distributed environments need to operate on continuous, unbounded data streams, following the development of large wired and wireless sensor networks. Conventional one-time queries are not suitable for providing continuous results as data and updates stream into the system, and continuous queries represent a new paradigm for interacting with dynamically changing data. At the same time, many real-time applications have inherent timing constraints in their tasks, so providing deadline guarantees on data objects for continuous queries over dynamic data streams is a challenging problem. A novel performance metric and a quality-of-service management scheme for continuous queries over dynamic data streams are proposed to guarantee system performance based on the relationships among all updated data items. Experimental simulations demonstrate that the presented algorithm can guarantee performance and decrease query miss ratios under dynamic workload fluctuations, especially transient overloads.

  • Conference Article
  • Cited by 7
  • 10.1109/bracis.2019.00043
Dynamic Correlation-Based Feature Selection for Feature Drifts in Data Streams
  • Oct 1, 2019
  • Jorge C Chamby-Diaz + 2 more

Learning from data streams requires efficient algorithms capable of constructing a model according to the arrival of new instances. These data stream learners need a quick, real-time response, but above all they must be able to adapt to possible changes in the data distribution, a condition known as concept drift. However, recent works have shown that changes in the relevant feature subsets over time, called feature drift, may have a significant impact on the learning process, despite being commonly disregarded until now in the underlying concept of a data stream. To improve the performance of feature-drifting data stream classification, in this work we present an algorithm called DCFS (Dynamic Correlation-based Feature Selection) that determines which features are the most important at each moment of a data stream. By implementing an adaptive strategy based on a drift monitor, the algorithm uses a correlation-based feature selection method to dynamically update the relevant feature subsets for data streams. The experimental results demonstrate that implementing our feature selection algorithm inside an incremental, online classifier leads the model to perform well on data stream datasets with feature drift, in some cases surpassing state-of-the-art data stream classifiers.
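The correlation-based selection step can be sketched as ranking features by absolute Pearson correlation with the class label and keeping the top k. In the paper this re-runs whenever the drift monitor fires; the fixed-k policy, the toy data, and all names below are illustrative assumptions.

```python
# Minimal sketch of correlation-based feature selection: score each
# feature by |Pearson correlation| with the label, keep the top k.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(X, y, k):
    """X: list of rows; returns indices of the k most correlated features."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y))
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]

# Feature 0 tracks the label perfectly; feature 1 is weakly related.
X = [[0.0, 5.0, 1.0],
     [1.0, 4.0, 0.0],
     [0.0, 6.0, 1.0],
     [1.0, 5.0, 0.0]]
y = [0, 1, 0, 1]
top = select_features(X, y, k=1)
```

Re-running a cheap scorer like this only when drift is detected, rather than on every instance, is what keeps the adaptive strategy viable at stream rates.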

  • Research Article
  • Cited by 6
  • 10.2118/0616-0072-jpt
Stuck-Pipe Prediction With Automated Real-Time Modeling and Data Analysis
  • Jun 1, 2016
  • Journal of Petroleum Technology
  • Chris Carpenter

This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 178888, "Stuck-Pipe Prediction Using Automated Real-Time Modeling and Data Analysis," by Kent Salminen, SPE, Curtis Cheatham, SPE, Mark Smith, SPE, and Khaydar Valiulin, SPE, Weatherford, prepared for the 2016 SPE/IADC Drilling Conference and Exhibition, Fort Worth, Texas, USA, 1–3 March. The paper has not been peer reviewed. A real-time method is presented to predict impending stuck pipe with sufficient warning to prevent it. The new method uses automated analysis of real-time modeling coupled with real-time data analysis. It can be applied to all well types for any well operation. The new method combines two types of analysis: (1) deviation of real-time data from real-time model predictions by use of hydraulics and torque-and-drag software, and (2) trend analysis of real-time data.

Data Types and Frequency. The approach taken was to first study real-time data sets from wells in which stuck-pipe incidents occurred and determine the root cause of each. The majority of these wells were drilled between 2009 and 2013 in the Eagle Ford shale. Specific patterns in the data were then identified as potential leading indicators of stuck pipe. One of the first issues identified was that the type, frequency, and quality of data available are not consistent from well to well. To ensure that the alerting system was configured to work on different well types with a high degree of functionality, the decision was made that it would be designed to monitor a well and provide alerts even if only "critical" data streams are available. These critical streams would meet the following criteria:
  • Available on most rigs capable of capturing real-time data
  • Be useful in determining information about the drillstring, drilling fluid, and the wellbore itself
  • Fit within a logical progression that can indicate impending stuck pipe

  • Conference Article
  • Cited by 9
  • 10.1109/bigdatacongress.2017.24
Bleach: A Distributed Stream Data Cleaning System
  • Jun 1, 2017
  • Yongchao Tian + 2 more

Existing scalable data cleaning approaches have focused on batch data cleaning. However, batch data cleaning is not suitable for streaming big data systems, in which dynamic data is generated continuously. Despite the increasing popularity of stream-processing systems, few stream data cleaning techniques have been proposed so far. In this paper, we bridge this gap by addressing the problem of rule-based stream data cleaning, which sets stringent requirements on latency, rule dynamics and ability to cope with the continuous nature of data streams. We design a system, called Bleach, which achieves real-time violation detection and data repair on a dirty data stream. Bleach relies on efficient, compact and distributed data structures to maintain the necessary state to repair data. Additionally, it supports rule dynamics and uses a cumulative sliding window operation to improve cleaning accuracy. We evaluate a prototype of Bleach using both synthetic and real data streams and experimentally validate its high throughput, low latency and high cleaning accuracy, which are preserved even with rule dynamics.
