
Related Topics

  • Types Of Queries
  • Spatial Queries
  • Data Query
  • Query Processing
  • Approximate Query
  • Query Execution

Articles published on Analytical Queries

308 search results, sorted by recency
  • Research Article
  • 10.1038/s41598-026-35284-0
Scalable privacy-preserving data analytics for IoMT via FHE and zk-SNARK-enabled edge aggregation.
  • Jan 13, 2026
  • Scientific reports
  • Soufiane Ben Othman + 1 more

The Internet of Medical Things (IoMT) enables real-time health monitoring and intelligent clinical decision-making by continuously collecting and processing sensitive physiological data from wearable, implantable, and edge-connected devices. However, this data aggregation paradigm introduces critical privacy and security challenges, including data leakage, aggregator misbehavior, and adversarial attacks, while existing frameworks often fail to simultaneously ensure confidentiality, verifiability, and efficiency. To address these limitations, we propose MedGuard, a novel end-to-end secure data aggregation framework for IoMT that synergistically integrates Fully Homomorphic Encryption (FHE) based on the CKKS scheme and Groth16 zero-knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARKs). MedGuard enables healthcare providers to perform complex analytical queries, such as statistical analysis, anomaly detection, and trend forecasting, directly on encrypted data without decryption, ensuring compliance with privacy regulations. By allowing edge nodes to generate cryptographic proofs of correct computation and enabling cloud-based verification, MedGuard eliminates reliance on trusted intermediaries and mitigates insider threats. Our comprehensive evaluation, conducted in a high-fidelity OMNeT++ 6.0.1 simulation environment with 1,000 IoMT devices, 100 edge nodes, and an Amazon EC2 c5.4xlarge cloud server, uses a hybrid dataset combining real-world and GMM-augmented synthetic data. Results show that MedGuard achieves an end-to-end latency of 64.8 ms, a 13.3% improvement over state-of-the-art baselines, communication efficiency of 1.465 GB/s, per-query energy consumption of 1.489 mJ, and sustained throughputs of 1,200 packets/s, 120 aggregates/s, and 1,200 queries/s. These performance gains, combined with a robust [Formula: see text] security level, demonstrate that MedGuard delivers scalable, verifiable, and privacy-preserving analytics for next-generation smart healthcare systems.
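
The MedGuard pipeline itself is not reproduced here; as a minimal sketch of the CKKS idea it builds on (computing on ciphertexts, decrypting only the aggregate), the snippet below uses the open-source TenSEAL library. The parameters and heart-rate values are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of CKKS-style encrypted aggregation using the open-source
# TenSEAL library. This is NOT MedGuard's implementation; it only illustrates
# computing a statistic over ciphertexts without decrypting individual readings.
import tenseal as ts

# CKKS context; parameters are illustrative, not the paper's.
ctx = ts.context(ts.SCHEME_TYPE.CKKS,
                 poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

# Each "device" encrypts its readings independently (values are made up).
readings = [[72.0, 75.0], [68.0, 70.0], [90.0, 88.0]]
cts = [ts.ckks_vector(ctx, r) for r in readings]

# An aggregator sums ciphertexts element-wise without seeing plaintexts.
agg = cts[0]
for c in cts[1:]:
    agg = agg + c

# Only the key holder can decrypt the aggregate (CKKS arithmetic is approximate).
print(agg.decrypt())  # ~[230.0, 233.0]
```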

  • Research Article
  • Cited by: 1
  • 10.1097/mlr.0000000000002235
Lessons Learned From Using PCORnet® to Support the Pathways to Cardiovascular Disease Prevention and Impact of Specialty Referral Among People With HIV From Underrepresented Racial and Ethnic Groups in the Southern United States (PATHWAYS Study)
  • Jan 8, 2026
  • Medical Care
  • Keith Marsolo + 9 more

Objective: The PATHWAYS Study utilized data from the PCORnet® Common Data Model (CDM) at 4 sites participating in the STAR Clinical Research Network to assess the frequency of cardiology encounters for under-represented racial and ethnic minority group people living with Human Immunodeficiency Virus and to evaluate the determinants associated with specialty encounters from 2014 to 2020. This study dealt with several factors that other projects leveraging PCORnet might face. We describe benefits of working with the network, challenges, and recommendations for future study teams. Methods: PATHWAYS used a mix of queries through the study, including study-specific data quality and analytic queries. A “sidecar” table was created for the PCORnet® Common Data Model to support the inclusion of referral data. Linkage to the National Death Index was incorporated into the study to allow for more comprehensive information on participant deaths. Results: Data quality assessments identified several issues over the course of the study that needed to be addressed by the data teams at each site. The referral data proved not to be robust enough to support the proposed analyses, so an alternative strategy was required that leveraged encounter information. The National Death Index included information on participant deaths that were not part of each site’s PCORnet® CDM. Conclusion: Incorporating study-specific data characterization into the overall analysis plan is important. When working with new data, or variables not commonly used within studies, teams should include time and effort for site resources to investigate their local clinical workflows and potential mappings to the PCORnet® CDM.
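
The "sidecar" pattern the study describes is, in essence, a site-local extension table joined to the unmodified CDM. A hypothetical sketch, with table and column names invented for illustration (only ENCOUNTER/ENCOUNTERID echo the CDM's naming):

```python
# Hypothetical sketch of a "sidecar" table keyed to a PCORnet CDM encounter
# table. Column names for the sidecar are illustrative, not the PATHWAYS schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ENCOUNTER (             -- stand-in for the CDM table
    ENCOUNTERID TEXT PRIMARY KEY,
    PATID       TEXT,
    ADMIT_DATE  TEXT
);
CREATE TABLE REFERRAL_SIDECAR (      -- site-local extension, joined on demand
    REFERRALID    TEXT PRIMARY KEY,
    ENCOUNTERID   TEXT REFERENCES ENCOUNTER(ENCOUNTERID),
    SPECIALTY     TEXT,              -- e.g. 'CARDIOLOGY'
    REFERRAL_DATE TEXT
);
""")

# Analytic queries join the sidecar to the unmodified CDM tables.
rows = conn.execute("""
    SELECT e.PATID, r.SPECIALTY, r.REFERRAL_DATE
    FROM ENCOUNTER e
    JOIN REFERRAL_SIDECAR r USING (ENCOUNTERID)
    WHERE r.SPECIALTY = 'CARDIOLOGY'
""").fetchall()
```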

  • Research Article
  • 10.24042/ijecs.v5i2.28108
Hybrid Database Architecture for Retail Big Data Analytics: PostgreSQL vs MongoDB Performance Analysis
  • Dec 20, 2025
  • International Journal of Electronics and Communications Systems
  • Tubagus Firman Iskandar Noor + 4 more

The rapid growth of retail big data has intensified the challenge of selecting a database architecture that can balance analytical performance and resource efficiency, particularly in data-intensive retail environments. This study conducts a comparative performance analysis between PostgreSQL 16 and MongoDB 8.0 in the context of implementing big data analytics in the retail industry. An experimental quantitative approach is used, utilizing a large-scale, real-world retail sales and inventory dataset to benchmark PostgreSQL 16 and MongoDB 8.0 across a range of representative analytical workloads. Results show MongoDB is 28-31% faster in query processing, but PostgreSQL is 13-17% more efficient in resource usage (CPU, RAM, storage I/O) and requires 6x less storage. These results indicate that MongoDB consistently achieves faster execution times for read-intensive analytical queries, especially in large-scale aggregation operations. Conversely, PostgreSQL exhibits superior storage efficiency and lower computational resource consumption due to its normalized relational architecture. These findings reveal a fundamental trade-off between analytical speed and infrastructure efficiency in retail big data systems. This research contributes to the development of hybrid data architecture strategies for big data analytics in the retail industry, supporting performance optimization and informed decision-making in data-rich environments.
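
To make the workload contrast concrete, here is one and the same analytical aggregation phrased for both engines; the table/collection and field names are our own, not the study's benchmark:

```python
# The same analytical aggregation expressed for both systems (names are
# hypothetical): PostgreSQL via SQL, MongoDB via an aggregation pipeline.

pg_query = """
    SELECT store_id, SUM(amount) AS revenue
    FROM sales
    WHERE sale_date >= DATE '2024-01-01'
    GROUP BY store_id
    ORDER BY revenue DESC;
"""

mongo_pipeline = [
    {"$match": {"sale_date": {"$gte": "2024-01-01"}}},
    {"$group": {"_id": "$store_id", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
]

# With psycopg and pymongo respectively (connections elided):
#   cur.execute(pg_query)
#   db.sales.aggregate(mongo_pipeline)
```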

  • Research Article
  • 10.1145/3774755
The BigFAIR Architecture: Enabling Big Data Analytics in FAIR-compliant Repositories
  • Dec 4, 2025
  • Journal of Data and Information Quality
  • João Pedro De Carvalho Castro + 3 more

Collaboration is essential for scientific research. This is the foundation behind Open Science and the FAIR Principles, aimed at standardizing the development of scientific data sharing repositories. However, developing FAIR-compliant repositories can be a challenge, mostly due to managing a huge volume and variety of research data and metadata generated at a high velocity. We address these challenges by proposing BigFAIR, a novel FAIR-compliant architecture capable of managing this type of information in a massive scale. BigFAIR leverages existing local repositories, using separate infrastructures to handle scientific data and metadata. With this separation, it can reduce development and maintenance efforts, support data ownership, and increase flexibility. We define pipelines to demonstrate how BigFAIR answers queries in different contexts, introduce guidelines to support its instantiation, and propose a generic metadata warehouse model to support analytical query processing. We also demonstrate the applicability of BigFAIR through a case study in the context of two real-world datasets, detailing different types of queries and highlighting their importance to big data analytics.

  • Research Article
  • 10.15587/1729-4061.2025.342297
Devising a code-free method for detecting signs of informational-psychological influences in messages
  • Oct 31, 2025
  • Eastern-European Journal of Enterprise Technologies
  • Dmytro Lande + 3 more

This study investigates text messages that potentially contain signs of informational-psychological operations (IPSOs). The task addressed is the detection of signs of IPSOs in the media space. An innovative method for detecting such signs has been proposed, based on the construction and analysis of semantic networks and implemented without the use of program code by using large language models (LLMs). This makes it possible to generate formalized analytical queries to LLMs in the form of a code-free system based on the composition of structured prompts. The method's unique feature is the parallel analysis of data from two sources of knowledge: internal and external. The internal one contains generalized IPSO patterns formed on the basis of a wide corpus of data. The external one includes verified examples of fake messages from social networks, news outlets, and archives of fact-checking organizations. To improve the accuracy of analysis, semantic normalization of concepts is used, which employs embedded vectors to unify terminology, as well as comparison of causal paths in semantic networks to identify connections. The assessment of the probability of a message belonging to an IPSO is formed by aggregating the results using a weighted average, which makes it possible to take into account semantic and structural similarity. An example of applying the method to the analysis of a disinformation message is given, demonstrating the ability to detect key signs of psychological influence: manipulative narratives, emotional loading, and cause-and-effect relationships. The proposed method is flexible, reproducible, and accessible to researchers without programming skills, which makes it a valuable tool for monitoring information threats and analyzing disinformation in the context of information confrontations.

  • Research Article
  • 10.62056/akgy11fgx
HiSE: Hierarchical (Threshold) Symmetric-key Encryption
  • Oct 6, 2025
  • IACR Communications in Cryptology
  • Pousali Dey + 3 more

Threshold symmetric encryption (TSE) [DiSE, CCS 2018] provides a practical decentralized solution for symmetric encryption by distributing the secret-key at all times, thus avoiding a single point of attack or failure. TSE was further enhanced [ATSE, CCS 2021] by an amortization which enables a “more privileged” client to encrypt bulk records by interacting only once with the key servers, while decryption must be performed individually for each record, potentially by a “less privileged” client. However, a typical enterprise generates data once and queries it several times for various data analyses; i.e., enterprise workloads are often decryption heavy! ATSE does not meet the bar for this setting because of linear interaction / computation (in the number of records to be decrypted) – our experiments show that ATSE provides a sub-par throughput of a few hundred records/sec. Our work starts with an observation that a large and useful class of analytics queries accesses some time-windowed sequence of database records (e.g. log entries or user transactions). Can we offer faster decryption for such access patterns, without compromising the benefits of prior schemes? To that end, we build a new TSE scheme that allows for both encryption and decryption with flexible granularity, in that a client's interactions with the key servers are at most logarithmic in the number of records. Our idea is to employ a binary-tree structure, where one interaction is needed to decrypt all ciphertexts in a sub-tree, and thus only log-many for any arbitrary sub-sequence. Our scheme incorporates ideas from binary-tree encryption by Canetti et al. [Eurocrypt 2003] and carefully combines that with Merkle-tree commitments. We show that our scheme satisfies all essential TSE properties, such as correctness, privacy and authenticity for our notion, formalized as hierarchical threshold symmetric-key encryption (HiSE). Our analysis relies on a well-known XDH assumption and a new assumption, that we call ℓ-masked BDDH, over asymmetric bilinear pairing in the programmable random oracle model. We also show that our new assumption holds in the generic group model. Our extensive implementation shows 10-65× improvement in latency and throughput over ATSE. HiSE can decrypt over 6K records/sec on server-grade hardware, but the logarithmic overhead in encryption (not decryption) only lets us encrypt up to 3K records/sec (~4.5× slowdown) and incurs roughly 500 bytes of ciphertext expansion per record – while reducing this penalty is an important future work, we believe HiSE offers an acceptable trade-off in practice.
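
The logarithmic interaction count rests on a standard binary-tree fact: any contiguous range of leaves is covered by O(log n) maximal aligned subtrees, so one interaction per subtree root suffices. A small illustration of that cover computation (ours, not the HiSE protocol):

```python
# Minimal set of aligned subtrees covering leaf range [lo, hi) in a perfect
# binary tree (illustrates why HiSE-style decryption needs only O(log n)
# interactions; this is not the HiSE protocol itself).
def subtree_cover(lo, hi):
    """Yield (start, size) of maximal aligned subtrees covering [lo, hi)."""
    while lo < hi:
        # Largest power-of-two block aligned at lo (or any fitting block at 0).
        size = lo & -lo or 1 << ((hi - lo).bit_length() - 1)
        while size > hi - lo:     # shrink until the block fits in the range
            size >>= 1
        yield (lo, size)
        lo += size

print(list(subtree_cover(3, 11)))
# [(3, 1), (4, 4), (8, 2), (10, 1)]  -> 4 subtree roots cover 8 leaves
```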

  • Research Article
  • 10.23939/ictee2025.02.083
A Method of Combined Partitioning of Big Data in Information Systems
  • Oct 1, 2025
  • Information and communication technologies, electronic engineering
  • V Solohub + 1 more

In today’s environment of rapidly growing data volumes, information systems must not only provide storage and access to massive datasets but also maintain stable performance when handling diverse query types. A critical challenge lies in balancing the efficiency of analytical (OLAP) operations with the responsiveness of transactional (OLTP) processes. Traditional approaches to data organization in relational DBMS often lose effectiveness in large-scale environments, resulting in longer processing times, reduced flexibility, and greater complexity in database management. This underscores the importance of developing new partitioning optimization methods capable of ensuring both high performance and scalability in hybrid information systems. This article investigates existing data partitioning methods in information systems designed to handle large volumes of structured information while simultaneously serving OLAP and OLTP workloads. The mechanisms of table partitioning in modern DBMSs are analyzed, and the strengths and limitations of each approach are identified with respect to performance, scalability, and ease of data management. A combined partitioning method (range + list) is proposed, tailored for hybrid information systems that concurrently process analytical and transactional workloads. Unlike traditional approaches, the proposed method not only applies combined partitioning for analytical tasks but also provides a comprehensive evaluation of its impact on query performance and transaction processing speed. The results demonstrate that the developed method achieves a balance between OLAP and OLTP performance, enhances scalability and flexibility of information systems, and can be considered a universal approach to managing large-scale data. To conduct the study, a unified simulation model of data processing was built using a star schema with a fact sales table, supporting both business analytics queries and transactional CRUD operations. Experimental findings confirm that the combined partitioning approach reduces analytical query execution time by 30–40% without significant degradation of CRUD performance, making it an effective tool for improving the performance of large-scale information systems.
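
The article describes the method in DBMS-neutral terms; in PostgreSQL's declarative-partitioning dialect, a range + list combination for a star-schema fact table looks roughly like this (table names and partition bounds are our illustration):

```python
# PostgreSQL-flavoured sketch of combined range (by date) + list (by region)
# partitioning for a star-schema fact table. Names are illustrative; the
# article does not publish its schema. Execute with e.g. psycopg if desired.
DDL = """
CREATE TABLE fact_sales (
    sale_id   BIGINT,
    sale_date DATE NOT NULL,
    region    TEXT NOT NULL,
    amount    NUMERIC
) PARTITION BY RANGE (sale_date);

-- One range partition per year, itself list-partitioned by region.
CREATE TABLE fact_sales_2024 PARTITION OF fact_sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY LIST (region);

CREATE TABLE fact_sales_2024_eu PARTITION OF fact_sales_2024
    FOR VALUES IN ('EU');
CREATE TABLE fact_sales_2024_us PARTITION OF fact_sales_2024
    FOR VALUES IN ('US');
"""
print(DDL)
```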

  • Research Article
  • 10.52783/jisem.v10i60s.13044
Unified Framework for Real-Time Big Data Analytics with AI Integration
  • Sep 30, 2025
  • Journal of Information Systems Engineering and Management
  • Gopinath Ramisetty

The intersection of distributed computing technologies with artificial intelligence competencies has revolutionized enterprise analytical environments, opening doors to unprecedented capabilities for real-time data processing and intelligent decision-making across various industrial segments. Contemporary unified analytical environments integrate in-memory processing engines, serverless data warehouse models, graph-based workflow orchestration systems, and advanced machine learning algorithms to provide end-to-end solutions that support gigantic datasets while responding on sub-second timescales. This unification makes it possible for organizations to handle streaming data workloads at high throughput while simultaneously running complex analytical queries on petabyte-scale data repositories. Sophisticated distributed computing frameworks harness complex memory management and smart caching hierarchies to maximize performance under different operating conditions, with serverless environments scaling analytical workloads without manual infrastructure provisioning. Graph-based workflow systems utilize adaptive scheduling algorithms and thorough fault tolerance mechanisms to facilitate reliable execution of processing pipelines in distributed computing environments. AI frameworks incorporate ensemble learning techniques and automated decision-making systems to offer predictive analytics and intelligent workflow orchestration. Implementation tactics consist of data-driven parameter optimization methods and privacy-improving analytics mechanisms that ensure the best performance while regulatory compliance and data safety requirements are upheld throughout the complete processing life cycle.

  • Research Article
  • 10.15574/sp.2025.9(149).5462
Optimization of scientific research of theoretical and methodological foundations of the formation of a culture of personal health
  • Sep 28, 2025
  • Modern pediatrics. Ukraine
  • A.S Sverstyuk + 5 more

Aim: to optimize the scientific search for theoretical and methodological foundations of personal health culture formation by conducting a comprehensive bibliometric analysis in the Scopus database. Materials and methods. To achieve this goal, bibliometric analysis of publications in the Scopus scientometric database was used. An analytical query was formed with keywords reflecting the concepts of ‘culture of personal health’, ‘theoretical and methodological foundations’, ‘health care’, etc. Quantitative indicators (dynamics of publications; distribution by subject area, geography, and authors) and bibliometric metrics (CiteScore, SJR, SNIP) were selected and analyzed. Results. 828 scientific publications were identified, 538 of which were published in the last decade, indicating the growing relevance of the problem of health culture formation in the modern scientific space. Most research relates to the medical field, social sciences, and engineering, which emphasises the interdisciplinary nature of the topic. The leading countries in terms of the number of publications are the United States, the United Kingdom, Australia, Canada, and China. The largest number of sources falls on recent years, which reflects the growing interest in health promotion, in particular in the context of global challenges. Conclusions. The results of the study confirm the growing attention to the formation of a culture of personal health as an important component of preventive medicine, education, and social initiatives. The identified trend towards the integration of various fields of knowledge opens up additional prospects for the development of comprehensive methods and strategies in the field of health promotion. In the future, it is advisable to carry out an in-depth analysis of the qualitative aspects of publications, as well as to develop scientifically sound recommendations for the implementation of appropriate approaches in practice. No conflict of interest was declared by the authors.
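
For illustration only, a query of this kind could be reconstructed against the Scopus Search API roughly as follows; the query string, parameters, and key are our assumptions, not the authors' exact search:

```python
# Hypothetical reconstruction of a Scopus-style analytical query (the authors'
# exact query string is not published in the abstract). Requires a registered
# Elsevier API key.
import requests

query = ('TITLE-ABS-KEY("culture of personal health" OR "health culture") '
         'AND TITLE-ABS-KEY("theoretical and methodological foundations" '
         'OR "health care")')

resp = requests.get(
    "https://api.elsevier.com/content/search/scopus",
    params={"query": query, "count": 25},
    headers={"X-ELS-APIKey": "YOUR_API_KEY"},  # placeholder
)
total = resp.json()["search-results"]["opensearch:totalResults"]
print(total)
```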

  • Research Article
  • 10.3233/shti251122
Allowing Vector Attributes in Relational Database for Japanese Insurance Claims.
  • Aug 7, 2025
  • Studies in health technology and informatics
  • Jumpei Sato + 7 more

Japanese public healthcare insurance claims are a promising population-scale dataset to offer tangible evidence for healthcare policy making. The dataset has employed a nested-tuple form, which no known techniques can efficiently transform to the standard relational data model. This paper explores the technical benefit of allowing vector attributes in the relational database for managing public healthcare insurance claims. The experiment with the commercial implementation and a Japanese population-scale dataset confirms that the proposed approach performs substantially (up to 3.4 times) faster for analytical queries with composite predicates, while incurring only a 10% time penalty for data preprocessing and database import.
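
The evaluated implementation is commercial and unnamed; PostgreSQL's array columns offer a rough public analog of a vector attribute, sketched below purely for illustration (schema and codes are ours):

```python
# Rough public analog of "vector attributes": a PostgreSQL array column
# holding the repeated (nested) part of a claim in one row. Illustrative
# only; the paper evaluates an unnamed commercial implementation.
DDL_AND_QUERY = """
CREATE TABLE claim (
    claim_id    BIGINT PRIMARY KEY,
    patient_id  BIGINT,
    diag_codes  TEXT[]            -- vector attribute: all diagnosis codes inline
);

-- Composite predicate over the vector, with no join against a child table:
SELECT count(*)
FROM claim
WHERE diag_codes @> ARRAY['E11.9']   -- contains a diabetes code
  AND diag_codes @> ARRAY['I10'];    -- and a hypertension code
"""
print(DDL_AND_QUERY)
```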

  • Research Article
  • 10.14778/3750601.3750618
Cloudy with a Chance of JSON
  • Aug 1, 2025
  • Proceedings of the VLDB Endowment
  • Murtadha Al Hubail + 26 more

Couchbase Capella is a scalable document-oriented database service in the cloud. Its existing Capella Operational service is based on a shared-nothing architecture and supports high volumes of low-latency queries and updates for JSON documents. Its new Capella Columnar cloud service complements the Operational service. The Capella Columnar service supports complex analytical queries (e.g., ad hoc joins and aggregations) over large collections of JSON documents that can originate from a variety of Couchbase and non-Couchbase data sources and formats and can either be stored and managed by the Capella Columnar service or externally stored and accessed on demand at query time. This paper describes the new Capella Columnar service, looking both over and under the hood.

  • Research Article
  • 10.1051/sands/2025010
HTAP Benchmark in Financial Scenarios
  • Jul 31, 2025
  • Security and Safety
  • Weilong Wu + 6 more

With the development of HTAP (Hybrid Transactional/Analytical Processing), the use of HTAP databases has become increasingly common. HTAP databases integrate OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) into a single system, replacing traditional ETL methods and thereby reducing the cost of system usage and maintenance. However, financial scenarios impose higher demands on data and service solutions, and the data and business processes in financial contexts are much more complex. This complexity poses a series of challenges for the application of HTAP databases in financial scenarios. To address these challenges, this paper proposes a benchmark tailored for financial scenarios. This benchmark evaluates the key characteristics of HTAP systems in financial contexts: throughput frontier and freshness. It simulates data distributions specific to financial scenarios and generates data consistent with financial use cases. Based on financial data patterns, it defines a variety of analytical queries, including those for marketing, risk control, single-table aggregation, and multi-table joins. It also simulates common banking transactions such as account operations, deposits, and withdrawals, and supports risk detection rollback to reflect the real-world behavior of banking transactions. The testing program allows the generation of transactional and analytical clients in varying proportions based on parameters, enabling concurrent database testing. This paper presents adaptations of the benchmark to a community edition of a specific database for testing purposes. Considering the growing adoption of cloud databases, where resource allocation has become more flexible, this paper also investigates the changes in database performance and workload isolation under different resource allocation scenarios for transactional and analytical clients. The study finds that appropriate resource allocation can significantly enhance workload isolation. Additionally, the benchmark allows database users to assign intent weights to transactional and analytical workloads, enabling a unified performance evaluation metric for the overall performance of the database. A comparison of two different resource allocation strategies shows that appropriate resource allocation improved the comprehensive performance score of the database from 55.0 to 60.0.

  • Research Article
  • Cited by: 1
  • 10.3390/cancers17142376
Decoding the JAK-STAT Axis in Colorectal Cancer with AI-HOPE-JAK-STAT: A Conversational Artificial Intelligence Approach to Clinical-Genomic Integration.
  • Jul 17, 2025
  • Cancers
  • Ei-Wen Yang + 2 more

Background/Objectives: The Janus kinase-signal transducer and activator of transcription (JAK-STAT) signaling pathway is a critical mediator of immune regulation, inflammation, and cancer progression. Although implicated in colorectal cancer (CRC) pathogenesis, its molecular heterogeneity and clinical significance remain insufficiently characterized, particularly within early-onset CRC (EOCRC) and across diverse treatment and demographic contexts. We present AI-HOPE-JAK-STAT, a novel conversational artificial intelligence platform built to enable the real-time, natural language-driven exploration of JAK/STAT pathway alterations in CRC. The platform integrates clinical, genomic, and treatment data to support dynamic, hypothesis-generating analyses for precision oncology. Methods: AI-HOPE-JAK-STAT combines large language models (LLMs), a natural language-to-code engine, and harmonized public CRC datasets from cBioPortal. Users define analytical queries in plain English, which are translated into executable code for cohort selection, survival analysis, odds ratio testing, and mutation profiling. To validate the platform, we replicated known associations involving JAK1, JAK3, and STAT3 mutations. Additional exploratory analyses examined age, treatment exposure, tumor stage, and anatomical site. Results: The platform recapitulated established trends, including improved survival among EOCRC patients with JAK/STAT pathway alterations. In FOLFOX-treated CRC cohorts, JAK/STAT-altered tumors were associated with significantly enhanced overall survival (p < 0.0001). Stratification by age revealed survival advantages in younger (age < 50) patients with JAK/STAT mutations (p = 0.0379). STAT5B mutations were enriched in colon adenocarcinoma and correlated with significantly more favorable trends (p = 0.0000). Conversely, JAK1 mutations in microsatellite-stable tumors did not affect survival, emphasizing the value of molecular context. Finally, JAK3-mutated tumors diagnosed at Stage I-III showed superior survival compared to Stage IV cases (p = 0.00001), reinforcing stage as a dominant clinical determinant. Conclusions: AI-HOPE-JAK-STAT establishes a new standard for pathway-level interrogation in CRC by empowering users to generate and test clinically meaningful hypotheses without coding expertise. This system enhances access to precision oncology analyses and supports the scalable, real-time discovery of survival trends, mutational associations, and treatment-response patterns across stratified patient cohorts.

  • Research Article
  • 10.37547/tajiir/volume07issue07-04
A Five-Layer Framework for Cost Optimization in Snowflake: Applied to P&C Insurance Workloads
  • Jul 7, 2025
  • The American Journal of Interdisciplinary Innovations and Research
  • Shreekant Malviya

The use of Snowflake as a cloud-native data warehouse has dramatically changed the management of analytics workloads for Property and Casualty (P&C) insurers, while simultaneously presenting serious cost governance challenges. Heavy query volumes, large-scale data retention, and decentralized business intelligence operations are industry-standard practices that tend to lead to uncontrolled credit usage and overspending on storage. This research introduces a modular five-layer optimization framework focused on property and casualty insurance data, combining workload segmentation and compute sizing with Snowflake's account usage metadata. The framework is tested and validated using Kaggle’s Insurance Agency Data, representing real-world P&C operations across 17 states. Benchmark queries simulating core insurance workloads were designed using modified TPC-H logic, a standard decision support benchmark that enables realistic performance evaluation under analytical query conditions, achieving up to 82% cost reduction and a 64% reduction in execution time without compromising the results. These results highlight the efficiency of the framework in facilitating proactive and elastic cost control. Future studies can investigate AI-driven query forecasting, scalable warehouse dynamics, and real-time anomaly detection to further advance cloud-native data ecosystem governance.
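
The account-usage layer such a framework builds on can be probed with queries like the following; WAREHOUSE_METERING_HISTORY and its columns are standard Snowflake ACCOUNT_USAGE objects, while the 30-day credit-attribution grouping is our guess at the kind of input the framework consumes:

```python
# Sketch of a credit-attribution query over Snowflake's ACCOUNT_USAGE
# metadata. The view and columns are standard Snowflake objects; the
# workload segmentation built on top of this is illustrative.
CREDITS_BY_WAREHOUSE = """
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY credits DESC;
"""

# Run with the snowflake-connector-python package, e.g.:
#   import snowflake.connector
#   cur = snowflake.connector.connect(...).cursor()
#   cur.execute(CREDITS_BY_WAREHOUSE)
print(CREDITS_BY_WAREHOUSE)
```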

  • Research Article
  • 10.31861/sisiot2025.1.01004
Information Technology for Data Compression and Transformation by Means of Amazon EMR
  • Jun 30, 2025
  • Security of Infocommunication Systems and Internet of Things
  • Yevhen Kyrychenko + 1 more

As data processing volumes grow in various fields, the demand for applications capable of efficiently managing, processing, and transforming large amounts of information is also increasing. Modern approaches to storing and processing large amounts of data are primarily based on universal text formats, such as CSV and JSON. Their prevalence can be explained by their compatibility with a wide range of software tools and ease of integration. These formats are inefficient, however, when dealing with massive volumes of data, particularly when scaling systems or executing analytical queries. The lack of built-in compression, structure, and metadata consumes significant time and computing resources, creating a conflict between the requirements for speed and cost-effectiveness of processing and the technical capabilities of traditional text formats. Columnar storage formats, such as Parquet and ORC, offer an alternative. They employ a compact structure tailored for quick analytical queries in distributed computing settings. Effective coding, indexing, and built-in compression techniques considerably lower data sizes and speed up processing. This research aims to develop and experimentally verify the technology of automated data conversion from inefficient text formats to the Parquet and ORC formats using Apache Airflow and Amazon EMR. The proposed architecture involves creating a cloud pipeline that performs data conversion and subsequent storage in formats focused on analytical workloads. The system uses Apache Airflow for process orchestration, Amazon EMR and Apache Spark for distributed processing, AWS S3 as scalable storage, AWS Glue for metadata management, and Amazon Athena for SQL access to transformed data. This approach solves performance problems by offering a flexible, reliable, cost-effective solution that adapts to different work scenarios and workloads.
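
The conversion step at the core of such a pipeline is compact; a minimal PySpark sketch (bucket paths, options, and the partition column are illustrative, and the Airflow/EMR orchestration is omitted):

```python
# Minimal PySpark sketch of the CSV -> Parquet conversion step at the heart
# of such a pipeline. Paths, schema inference, and the partition column are
# illustrative; Airflow orchestration and EMR provisioning are omitted.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-columnar").getOrCreate()

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3://my-bucket/raw/events/*.csv"))   # hypothetical S3 prefix

(df.write
   .mode("overwrite")
   .partitionBy("event_date")                    # hypothetical column
   .parquet("s3://my-bucket/curated/events/"))   # or .orc(...) for ORC

spark.stop()
```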

  • Research Article
  • 10.46460/ijiea.1563777
Optimizing Big Data Management on Microsoft SQL Server: Enhancing Performance through Normalization and Advanced Analytical Techniques
  • Jun 30, 2025
  • International Journal of Innovative Engineering Applications
  • Süleyman Burak Altinişik + 1 more

This study investigates Big Data management challenges and solutions in cable manufacturing using Microsoft SQL Server (MSSQL), focusing on performance optimization, normalization, and advanced analytical techniques. Addressing the 4Vs of Big Data, our case study collects data from 45 TAGs at one-minute intervals, generating approximately 56 million daily records. We employ OPC technology for data acquisition, strategic normalization processes, and advanced MSSQL optimization techniques. Normalization significantly reduced data redundancy, decreasing the dataset from 56 million to 283 rows per day and improving query execution times from over 40 minutes to less than 0.1 seconds for complex analytical queries. We also propose a database-independent software development approach to balance cost and performance. This research contributes practical insights into performance optimization, scalability, and cost-effective solutions for organizations managing large-scale data processing challenges in industrial settings, offering a blueprint for efficient Big Data management that balances technical performance with economic considerations.
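
The reported row-count collapse is the effect of aggregating a per-minute TAG stream into compact per-TAG summaries; a rough pandas stand-in for that idea (the study implements this with MSSQL normalization, not Python):

```python
# Rough illustration of how per-minute TAG readings collapse into compact
# daily summary rows (a pandas stand-in; the study does this inside MSSQL).
import numpy as np
import pandas as pd

rng = pd.date_range("2024-01-01", periods=1440, freq="min")  # one day
raw = pd.DataFrame({
    "ts":    np.tile(rng, 45),                                  # 45 TAGs x 1440 min
    "tag":   np.repeat([f"TAG{i:02d}" for i in range(45)], 1440),
    "value": np.random.rand(45 * 1440),
})

daily = (raw.groupby(["tag", raw["ts"].dt.date])["value"]
            .agg(["min", "max", "mean", "count"])
            .reset_index())
print(len(raw), "->", len(daily), "rows")  # 64800 -> 45
```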

  • Research Article
  • 10.1145/3725259
Accelerate Distributed Joins with Predicate Transfer
  • Jun 17, 2025
  • Proceedings of the ACM on Management of Data
  • Yifei Yang + 1 more

Join is one of the most critical operators in query processing. One effective way to optimize multi-join performance is to pre-filter rows that do not contribute to the query output. Techniques that reflect this principle include predicate pushdown, semi-join, the Yannakakis algorithm, Bloom join, predicate transfer, etc. Among these, predicate transfer is the state-of-the-art pre-filtering technique that removes most non-contributing rows through a series of Bloom filters, thereby delivering significant performance improvement. However, the existing predicate transfer technique has several limitations. First, the current algorithm works only on a single-threaded system, while real analytics databases for large workloads are typically distributed across multiple nodes. Second, some predicate transfer steps may not filter out any rows in the destination table, thus introducing performance overhead with no speedup. This issue is exacerbated in a distributed environment, where unnecessary predicate transfers lead to extra network latency and traffic. In this paper, we aim to address both limitations. First, we explore the design space of distributed predicate transfer and propose cost-based adaptive execution to maximize the performance for each individual transfer step. Second, we develop a pruning algorithm to effectively remove unnecessary transfers that do not make a positive contribution to performance. We implement both techniques and evaluate them on a distributed analytics query engine. Results on standard OLAP benchmarks including TPC-H and DSB with a scale factor up to 400 show that distributed predicate transfer can improve the query performance by over 3×, and reduce the amount of data exchange by over 2.7×.
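
The pre-filtering principle is easy to see in miniature: build a Bloom filter over one side's join keys, transfer it, and drop non-matching rows from the other side before the join. A toy single-node sketch; the paper's actual contributions (distributed, cost-based, pruned transfer scheduling) are not reproduced here:

```python
# Toy Bloom-join sketch: the pre-filtering principle behind predicate
# transfer, on a single node and with a hand-rolled Bloom filter.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 16, k=4):
        self.m, self.k, self.bits = m_bits, k, bytearray(m_bits // 8)

    def _positions(self, key):
        h = hashlib.sha256(str(key).encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(h[4 * i:4 * i + 4], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# Build the filter from the smaller side's join keys...
orders = [(1, "2024-01-01"), (3, "2024-01-02")]        # (cust_id, date)
bf = BloomFilter()
for cust_id, _ in orders:
    bf.add(cust_id)

# ...then transfer it and pre-filter the larger side before the join.
customers = [(1, "Ann"), (2, "Bob"), (3, "Cal"), (4, "Dee")]
survivors = [c for c in customers if bf.might_contain(c[0])]
print(survivors)   # rows 2 and 4 dropped (modulo Bloom false positives)
```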

  • Research Article
  • 10.1145/3725329
Nested Parquet Is Flat, Why Not Use It? How To Scan Nested Data With On-the-Fly Key Generation and Joins
  • Jun 17, 2025
  • Proceedings of the ACM on Management of Data
  • Alice Rey + 2 more

Parquet is the most commonly used file format to store data in a columnar, binary structure. The format also supports storing nested data in this flattened columnar layout. However, many query engines either do not support nested data or process it with substantially worse performance than relational data. In this work, we close this gap and present a new way to leverage relational query engines for nested data that is stored in this flat columnar file format. Specifically, we demonstrate how to process nested Parquet files much more efficiently. Our approach does not store a copy of the data in an internal format but reads directly from the Parquet file. During query computation, the required flat columns are scanned independently and the nesting is reconstructed using joins with on-the-fly generated join keys. Our approach can be easily integrated into existing query engines to support querying nested Parquet files. Furthermore, we achieve orders of magnitude faster analytical query performance than existing solutions, which makes it a valuable addition.
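
The reconstruction trick can be mimicked in miniature with pandas: scan the flat column chunks separately, generate surrogate keys on the fly, and rebuild the nesting with an ordinary join. A toy illustration, not the paper's engine:

```python
# Toy pandas illustration of reconstructing nesting from independently
# scanned flat columns via generated join keys. The paper does this inside
# a relational engine over Parquet's flattened layout; this is not its code.
import pandas as pd

# Flat "column chunks" as they might come out of a nested Parquet file:
# one table per nesting level, with positions implied by scan order.
orders = pd.DataFrame({"order_id": [10, 11]})
items = pd.DataFrame({
    "parent_pos": [0, 0, 1],          # keys generated on the fly from the
    "sku": ["a", "b", "c"],           # repetition levels during the scan
    "qty": [2, 1, 5],
})

# Surrogate key generation for the outer level is just the row position.
orders["pos"] = range(len(orders))

# The nesting is reconstructed with an ordinary relational join.
nested = items.merge(orders, left_on="parent_pos", right_on="pos")
print(nested[["order_id", "sku", "qty"]])
```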

  • Research Article
  • 10.9790/0661-2703050105
Forensic Recovery of Chat Artifacts from the Heesay App: A Manual Analysis Approach
  • Jun 1, 2025
  • IOSR Journal of Computer Engineering
  • M B Pathak + 5 more

The growing popularity of specialized dating and social networking sites, especially those serving the LGBTQ community, has presented significant difficulties for digital forensic investigators in recent years. The Heesay app is one such platform that was used in a criminal case and may have included important communication evidence. An organized forensic investigation of an Android handset running the Heesay app is presented in this case study. A thorough manual inspection revealed that the app's primary SQLite database file (blued2015.db) was present despite the lack of tool support. The file contained several tables, such as ChattingModel, SessionModel, and UserAccountsModel, which were amenable to scripting and structured queries. The results highlight the vital significance of hybrid forensic techniques that incorporate programmatic parsing, timezone-aware timestamp translation, and conventional database analysis. This case study provides useful, repeatable advice for digital forensic specialists working with unsupported or specialized mobile applications by outlining the file locations, database schema, and sequential analytical queries. The study emphasizes how useful tool-agnostic investigation techniques are for locating and verifying digital evidence in contemporary mobile settings.
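
The combination of structured queries and timezone-aware timestamp translation described above looks roughly like this in Python; the database and table names come from the study, while the column names and epoch-milliseconds encoding are our assumptions:

```python
# Sketch of the scripted analysis described above. blued2015.db and
# ChattingModel come from the study; the column names and the assumption
# that timestamps are epoch milliseconds are ours, for illustration.
import sqlite3
from datetime import datetime, timezone, timedelta

IST = timezone(timedelta(hours=5, minutes=30))   # assumed local timezone

conn = sqlite3.connect("blued2015.db")
rows = conn.execute(
    "SELECT sender_id, receiver_id, msg_text, created_at "  # assumed columns
    "FROM ChattingModel ORDER BY created_at"
).fetchall()

for sender, receiver, text, ts_ms in rows:
    # Timezone-aware translation of a stored epoch-milliseconds timestamp.
    utc = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    print(utc.astimezone(IST).isoformat(), sender, "->", receiver, text)
```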

  • Research Article
  • Cited by: 1
  • 10.30574/wjarr.2025.26.2.1669
Optimizing database architectures for high-frequency trading and financial analytics: A comprehensive analysis
  • May 30, 2025
  • World Journal of Advanced Research and Reviews
  • Pranith Kumar Reddy Myeka

Financial institutions increasingly rely on sophisticated database architectures to gain competitive advantages in high-frequency trading and analytics environments. This article examines optimal database technologies for financial applications, comparing in-memory, columnar, time-series, and distributed ledger architectures across standardized financial workloads. Multiple case studies demonstrate how different architectures excel in specific contexts: in-memory processing delivers superior performance for order processing, columnar storage enables faster analytical queries for market analysis, while time-series databases efficiently handle pattern recognition for fraud detection. Performance bottlenecks, consistency trade-offs, regulatory compliance challenges, and security considerations are explored in depth. The results indicate that no single architecture provides optimal performance across all financial application requirements; instead, financial institutions must select technologies based on specific use cases, with heterogeneous architectures often delivering superior results. The article concludes by examining emerging technologies with potential to transform financial database landscapes, including persistent memory, hardware acceleration, specialized indexing structures, AI-integrated engines, and hybrid blockchain solutions.
