Articles published on Schema Design
- Research Article
- 10.28925/2663-4023.2025.29.847
- Sep 26, 2025
- Cybersecurity: Education, Science, Technique
- Anatolii Kurotych + 2 more
The article explores the potential of using Large Language Models (LLMs) for automatically restoring relationships between tables in SQL databases with incompletely defined foreign keys. To evaluate the ability of LLMs to infer foreign keys from textual descriptions of table structures, an experimental database was created. The database schema, excluding relationships, was provided as input to two large language models: ChatGPT-4o and Claude 3.7 Sonnet. For analysis purposes, only basic information was provided to the LLMs: table names, field names, and primary keys, without any data examples. The ChatGPT-4o model successfully detected all relationships between tables but demonstrated limitations in determining the types of these relationships: all were classified as “one-to-one”, regardless of their actual structure. This indicates the model's inability to accurately interpret the type of relationships based on textual descriptions. In contrast, the Claude 3.7 Sonnet model not only correctly identified all existing relationships, but also correctly determined their types (e.g., one-to-many), demonstrating higher accuracy and a deeper understanding of the database structure within the task at hand. The description of the table structure was provided to the language models in PlantUML format, ensuring a standardized, clear and unambiguous representation of the input data. Based on the modeling results, ER diagrams were also constructed in PlantUML format. The experiment confirms the effectiveness of LLMs in reconstructing missing foreign keys and shows potential for automated analysis, documentation, and improvement of existing databases. Following consistent naming conventions during schema design significantly simplifies both the work of developers and the automated processing of database structures by intelligent systems, playing a crucial role in these processes.
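A minimal sketch of this style of experiment is shown below: it renders a toy two-table schema (table names, field names, and primary keys only) as PlantUML and assembles a prompt asking a language model to infer the missing foreign keys. The tables and the `ask_llm` stand-in are illustrative assumptions, not the authors' actual setup.

```python
# Hypothetical sketch: render a toy schema (table names, field names, primary keys;
# no data and no foreign keys) as PlantUML and build a prompt that asks an LLM to
# infer the missing relationships. The tables and the ask_llm() stand-in are
# illustrative assumptions, not the authors' actual experimental setup.

TABLES = {
    "customers": {"fields": ["id", "name", "email"], "primary_key": "id"},
    "orders": {"fields": ["id", "customer_id", "created_at"], "primary_key": "id"},
}

def schema_to_plantuml(tables: dict) -> str:
    """Render table names, field names, and primary keys as PlantUML entities."""
    lines = ["@startuml"]
    for name, spec in tables.items():
        lines.append(f"entity {name} {{")
        for column in spec["fields"]:
            marker = " <<PK>>" if column == spec["primary_key"] else ""
            lines.append(f"  {column}{marker}")
        lines.append("}")
    lines.append("@enduml")
    return "\n".join(lines)

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to ChatGPT-4o or Claude 3.7 Sonnet via a vendor SDK."""
    raise NotImplementedError("plug in the LLM client of your choice")

prompt = (
    "Given the following PlantUML schema with no foreign keys defined, list the "
    "likely relationships between tables and their cardinality "
    "(one-to-one, one-to-many, many-to-many):\n\n" + schema_to_plantuml(TABLES)
)
print(prompt)  # pass this to ask_llm(prompt) to reproduce the style of experiment
```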
- Research Article
- 10.1007/s00203-025-04413-0
- Aug 12, 2025
- Archives of microbiology
- Anand Eruvessi Pudavar + 3 more
Gut bacteria are well known to significantly influence human health and physiology. Knowledge Graphs (KGs) can effectively integrate the heterogeneous factors modulating gut bacteria-host associations. Few studies describe the construction and application of KGs capturing these associations for domain experts. This work outlines a methodology for constructing a microbiome-centric KG and demonstrates how it enhances conventional microbiome data analysis workflows. Towards construction and deployment of this domain-centric KG, the methodologies involved in collecting data, selecting relevant entities and relationships, and preprocessing them are discussed. Key entities include bacteria, host genetic and immune factors, chemicals, and diseases. KG construction in both the RDF (Resource Description Framework) and LPG (Labeled Property Graph) models is demonstrated. A comparison of the querying techniques in the two models and applications of the KG through biologically relevant case studies are also presented. Overall, the work is intended to provide domain experts with a complete protocol for constructing a microbiome-centric KG, from entity selection and schema design to using the KG for microbiome data analysis and hypothesis generation.
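As a rough illustration of the RDF side of such a construction, the sketch below (assuming rdflib is installed) builds a tiny bacteria-disease association graph and queries it with SPARQL; the namespace, entities, and evidence count are invented for the example.

```python
# Minimal sketch of the RDF side of a microbiome-centric knowledge graph,
# assuming rdflib is installed. The namespace, entity names, and the single
# bacteria-disease association are illustrative, not taken from the paper.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/microbiome/")

g = Graph()
g.bind("ex", EX)

# Entities: a gut bacterium, a disease, and an association between them.
g.add((EX.Akkermansia_muciniphila, RDF.type, EX.Bacterium))
g.add((EX.Type2Diabetes, RDF.type, EX.Disease))
g.add((EX.Akkermansia_muciniphila, EX.associatedWith, EX.Type2Diabetes))
g.add((EX.Akkermansia_muciniphila, EX.evidenceCount, Literal(12)))

# SPARQL query: which bacteria are associated with which diseases?
results = g.query("""
    PREFIX ex: <http://example.org/microbiome/>
    SELECT ?bacterium ?disease
    WHERE { ?bacterium ex:associatedWith ?disease . }
""")
for row in results:
    print(row.bacterium, "->", row.disease)
```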
- Research Article
- 10.3390/electronics14153116
- Aug 5, 2025
- Electronics
- Yuanhao Zhang + 4 more
Dietary reviews contain rich emotional and objective information; however, existing knowledge extraction methods struggle with low-resource scenarios due to sparse and imbalanced label distributions. To address these challenges, this paper proposes a few-shot supervised learning approach. First, we develop a professional dietary–emotional schema by integrating domain knowledge with real-time data to ensure the coverage of diverse emotional expressions. Next, we introduce a dataset optimization method based on dual constraints—label frequency and quantity—to mitigate label imbalance and improve model performance. Utilizing the optimized dataset and a tailored prompt template, we fine-tune the DRE-UIE model for multivariate knowledge extraction. The experimental results demonstrate that the DRE-UIE model achieves a 20% higher F1 score than BERT-BiLSTM-CRF and outperforms TENER by 1.1%. Notably, on a 20-shot subset, the model on the Chinese dataset scores 0.841 and attains a 15.16% F1 score improvement over unoptimized data, validating the effectiveness of our few-shot learning framework. Furthermore, the approach also exhibits robust performance across Chinese and English corpora, underscoring its generalization capability. This work offers a practical solution for low-resource dietary–emotional knowledge extraction by leveraging schema design, dataset optimization, and model fine-tuning to achieve high accuracy with minimal annotated data.
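The dual-constraint dataset optimization is not specified in detail here, but a filter of that general shape might look like the following sketch; the thresholds, data layout, and sample reviews are assumptions, and the DRE-UIE model itself is not reproduced.

```python
# Illustrative sketch of a dual-constraint (label frequency and quantity) filter
# for a few-shot extraction dataset. Thresholds and data layout are assumptions;
# the paper's DRE-UIE pipeline is not reproduced here.
from collections import Counter

def select_few_shot(samples, max_per_label=20, min_label_freq=2):
    """Keep at most `max_per_label` examples per label and drop labels that
    occur fewer than `min_label_freq` times across the corpus."""
    freq = Counter(label for s in samples for label in s["labels"])
    kept, per_label = [], Counter()
    for s in samples:
        labels = [l for l in s["labels"] if freq[l] >= min_label_freq]
        if labels and all(per_label[l] < max_per_label for l in labels):
            kept.append(s)
            per_label.update(labels)
    return kept

corpus = [
    {"text": "The soup was too salty but comforting.", "labels": ["taste", "emotion"]},
    {"text": "Great value set menu.", "labels": ["price"]},
    {"text": "Portion was tiny.", "labels": ["portion"]},
]
print(len(select_few_shot(corpus, max_per_label=20, min_label_freq=1)))
```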
- Research Article
- 10.32628/cseit2511162
- Jul 27, 2025
- International Journal of Scientific Research in Computer Science, Engineering and Information Technology
- Amit Kanojia + 1 more
As more applications are migrated to NoSQL databases, they often rely on general guidelines to select appropriate schemas, but these methods do not fully address the unique challenges posed by NoSQL systems. Traditional relational database schema optimization techniques are not directly applicable to NoSQL environments, leading to inefficiencies in schema design. This paper introduces an approach for designing optimal database schemas specifically tailored for NoSQL databases like MongoDB. We propose a semi-automated schema model that recommends schemas and query plans based on metadata extracted from SQL queries. The model captures metadata from the SQL database and uses it as a suggestive measure for designing the NoSQL database. The key parameters captured are primary key, foreign key, table size, and cardinality. The decision is based on these parameters together with manual identification of frequently executed SQL queries that involve joins. This approach aims to simplify the development process and to enhance database performance and scalability through our proposed model. To evaluate the impact of the proposed model, three benchmark workloads were implemented using the Yahoo! Cloud Serving Benchmark (YCSB) framework, with a particular focus on eliminating joins.
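A hedged sketch of the kind of rule such a semi-automated model could apply is given below: using SQL metadata (table size, cardinality) plus a manual hint about hot join paths, it suggests embedding or referencing in MongoDB. The thresholds and field names are illustrative, not the paper's actual decision procedure.

```python
# Sketch of a rule-based embed-vs-reference suggestion driven by SQL metadata.
# Thresholds, field names, and the example tables are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TableMeta:
    name: str
    row_count: int
    avg_child_cardinality: float  # average number of related rows per parent row
    frequently_joined: bool       # manual input: appears in hot JOIN queries

def suggest_document_design(parent: TableMeta, child: TableMeta) -> str:
    # Bounded one-to-few relationships on hot join paths: embed to eliminate joins.
    if child.frequently_joined and child.avg_child_cardinality <= 50:
        return f"embed `{child.name}` documents inside `{parent.name}`"
    # Unbounded or rarely joined children: keep a reference resolved at query time.
    return f"reference `{child.name}` from `{parent.name}` by an _id-like key"

orders = TableMeta("orders", 5_000_000, 0, True)
items  = TableMeta("order_items", 20_000_000, 4.2, True)
print(suggest_document_design(orders, items))
```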
- Research Article
- 10.1080/10447318.2025.2528998
- Jul 17, 2025
- International Journal of Human–Computer Interaction
- Shilong Liu + 7 more
Facilitating the collection and sharing of heterogeneous scientific data among researchers is a key approach to unlocking the full value of data. For data platforms to gain widespread adoption, they must satisfy two criteria simultaneously: (1) allow non-technical researchers to submit personalized data autonomously, and (2) comply with the FAIR principles (Findable, Accessible, Interoperable, and Reusable). To meet these requirements, we propose the Dynamic Container (DC) paradigm, which enables heterogeneous data collection and shifts the role of data schema designer from database experts to non-technical end users. We further introduce a data representation model that decomposes schemas into components familiar to end users and supports dynamic schema construction through an intuitive drag-and-drop interaction method. We have folded these ideas into a system and evaluated its usability through user studies. The results indicate that our system makes schema customization more usable, efficient, and less error-prone for end users.
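One way to picture "decomposing schemas into familiar components" is the small sketch below, in which fields are grouped and groups are dropped into a container without any DDL; the class names and structure are assumptions for illustration, not the paper's Dynamic Container implementation.

```python
# Rough sketch: end users assemble fields into groups and groups into a container,
# never touching DDL. Names and structure are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Field:
    name: str
    kind: str  # "text", "number", "date", "file", ...

@dataclass
class Group:
    title: str
    fields: List[Field] = field(default_factory=list)

@dataclass
class DynamicContainer:
    name: str
    groups: List[Group] = field(default_factory=list)

    def add_group(self, group: Group) -> None:  # the drag-and-drop "drop" action
        self.groups.append(group)

container = DynamicContainer("field-campaign-2024")
container.add_group(Group("Sample", [Field("site", "text"), Field("depth_cm", "number")]))
container.add_group(Group("Measurement", [Field("ph", "number"), Field("collected_on", "date")]))
print([g.title for g in container.groups])
```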
- Research Article
- 10.1145/3725362
- Jun 17, 2025
- Proceedings of the ACM on Management of Data
- Zhuoxing Zhang + 1 more
State-of-the-art relational schema design generates a lossless, dependency-preserving decomposition into Third Normal Form (3NF) that is in Boyce-Codd Normal Form (BCNF) whenever possible. In particular, dependency-preservation ensures that data integrity can be maintained on individual relation schemata without having to join them, but may need to tolerate a priori unbounded levels of data redundancy and integrity faults. As our main contribution we parameterize 3NF schemata by the numbers of minimal keys and functional dependencies they exhibit. Conceptually, these parameters quantify, already at schema design time, the effort necessary to maintain data integrity, and allow us to break ties between 3NF schemata. Computationally, the parameters enable us to optimize normalization into 3NF according to different strategies. Operationally, we show through experiments that our optimizations translate from the logical level into significantly smaller update overheads during integrity maintenance. Hence, our framework provides access to parameters that guide the computation of logical schema designs which reduce operational overheads.
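The parameters in question can be computed with textbook machinery; the sketch below derives attribute closures under a set of functional dependencies and enumerates the minimal keys of a small example schema (the schema and FDs are illustrative, and the paper's optimization strategies are not reproduced).

```python
# Textbook sketch: attribute closure under functional dependencies and
# enumeration of minimal keys. The example schema and FDs are illustrative.
from itertools import combinations

def closure(attrs: frozenset, fds) -> frozenset:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def minimal_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if closure(frozenset(combo), fds) == frozenset(schema):
                # keep only keys with no proper subset already identified as a key
                if not any(set(k) < set(combo) for k in keys):
                    keys.append(combo)
    return keys

schema = {"emp", "dept", "mgr"}
fds = [({"emp"}, {"dept"}), ({"dept"}, {"mgr"})]
print(minimal_keys(schema, fds))  # [('emp',)]: one minimal key, two FDs
```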
- Research Article
- 10.32996/jcsts.2025.7.5.67
- Jun 4, 2025
- Journal of Computer Science and Technology Studies
- Jagan Nalla
Cloud data warehouses have emerged as the cornerstone of modern enterprise analytics infrastructure, yet achieving optimal performance across platforms like Redshift, Snowflake, and Synapse requires specialized knowledge that extends beyond traditional on-premises optimization techniques. This article presents a systematic framework for performance tuning in cloud data warehouse environments, encompassing critical aspects from foundational data modeling principles to advanced query optimization strategies. The interplay between schema design decisions, partitioning schemes, and indexing mechanisms significantly impacts both performance outcomes and cost efficiency in cloud deployments. Platform-specific considerations are examined alongside universal best practices, offering data engineers and warehouse architects practical guidance for identifying and resolving performance bottlenecks. Through careful attention to workload characteristics, resource allocation, and caching strategies, organizations can establish a balanced approach to cloud data warehouse optimization that delivers both technical performance advantages and business value. The optimization techniques outlined provide a comprehensive toolkit for navigating the distinct challenges presented by cloud data warehouse architectures while leveraging their inherent scalability and flexibility.
- Research Article
- 10.30574/wjaets.2025.15.2.0470
- May 30, 2025
- World Journal of Advanced Engineering Technology and Sciences
- Madhuri Koripalli
Intelligent assistants, including AI-driven copilots and specialized extensions, transform how data professionals interact with database environments. These tools leverage advanced language models and contextual understanding to automate routine tasks while providing sophisticated recommendations for query optimization, schema design, and performance tuning. By integrating with platforms like SQL Server Management Studio and Azure Data Studio, these assistants offer capabilities ranging from natural language query translation to predictive code completion and error prevention. Success stories across financial services, healthcare, and retail demonstrate their potential to accelerate development cycles, improve code quality, and democratize data access. However, implementation requires careful consideration of adoption frameworks, governance policies, and technical prerequisites. These systems face challenges despite their value, including performance limitations with complex queries, organizational resistance, potential skill erosion, and privacy concerns. The evolution of these intelligent companions represents a significant shift from passive tools to active collaborators in data management.
- Research Article
- 10.61841/turcomat.v11i3.15236
- May 7, 2025
- Turkish Journal of Computer and Mathematics Education (TURCOMAT)
- Sai Vinod Vangavolu
MongoDB, a document-oriented NoSQL database, is crucial in modern web applications, particularly in the MEAN (MongoDB, Express.js, Angular, Node.js) stack. Optimizing MongoDB schemas is challenging because of the schema-less design approach combined with changing workload requirements. This article investigates optimal approaches to creating and optimizing MongoDB schema designs. It focuses on normalization and denormalization decisions, sharding and replication strategies, workload-related optimizations, and scalability. The article also evaluates modern AI-driven schema optimization trends and automated performance tuning that can dramatically boost operational efficiency in large-scale applications. With these implementation methods, MEAN applications can obtain superior scalability, reduced query latency, and enhanced system performance. Combined with reliable data protection and schema longevity, the article provides organizations with a complete template for optimizing MongoDB schemas for peak operational efficiency.
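A small sketch of the normalization-versus-denormalization trade-off discussed above, assuming a local MongoDB instance and pymongo; collection names and fields are invented for the example.

```python
# Hedged sketch (assuming a local MongoDB and pymongo) contrasting an embedded
# design with a referenced one and adding an index to keep a hot query path cheap.
# Collection names and fields are illustrative.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
db = client["mean_app"]

# Denormalized / embedded: order items live inside the order document.
db.orders.insert_one({
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
})

# Normalized / referenced: reviews are unbounded, so they live in their own
# collection and point back to the order.
db.reviews.insert_one({"order_id": 1001, "rating": 5, "text": "fast delivery"})

# Index the reference field so the application-side join stays efficient.
db.reviews.create_index([("order_id", ASCENDING)])
```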
- Research Article
- 10.37745/ejcsit.2013/vol13n291327
- Apr 15, 2025
- European Journal of Computer Science and Information Technology
- Ashif Anwar
Event-driven architecture (EDA) represents a transformative paradigm in distributed systems development, enabling organizations to build more responsive, scalable, and resilient applications. By facilitating asynchronous communication through events that represent significant state changes, EDA establishes loosely coupled relationships between system components that can operate independently. This architectural approach addresses fundamental challenges in distributed systems including component coordination, state management, and fault isolation. Microsoft Azure cloud services provide comprehensive support for implementing event-driven architectures through specialized offerings such as Event Grid for event routing, Service Bus for enterprise messaging, and Functions for serverless computing. These services create a foundation for sophisticated event processing pipelines that adapt dynamically to changing business requirements. When properly implemented with attention to event schema design, idempotent processing, appropriate delivery mechanisms, and comprehensive monitoring strategies, event-driven architectures deliver substantial benefits across diverse industry sectors including financial services, healthcare, manufacturing, and retail. The integration of EDA with microservices architecture creates particularly powerful synergies, enabling systems to evolve incrementally while maintaining operational resilience. As distributed systems continue to evolve, event-driven patterns implemented through cloud-native services will play an increasingly central role in meeting the demands for real-time responsiveness and elastic scalability.
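Two of the practices named above, an explicit event schema and idempotent processing, can be illustrated with the short sketch below; the event shape and the in-memory deduplication store are assumptions, and in an Azure deployment the event would typically arrive via Event Grid or Service Bus rather than as a local JSON string.

```python
# Illustrative sketch of an explicit event schema plus an idempotent handler that
# ignores redeliveries. The event shape and the in-memory dedup store are
# assumptions; in an Azure deployment the event would arrive via Event Grid or
# Service Bus rather than as a local JSON string.
import json
from dataclasses import dataclass
from typing import Set

@dataclass(frozen=True)
class OrderShippedEvent:
    event_id: str    # unique id used for deduplication
    order_id: str
    shipped_at: str  # ISO-8601 timestamp

_processed_ids: Set[str] = set()

def handle(event_json: str) -> bool:
    """Process an event at most once per event_id; return True if work was done."""
    event = OrderShippedEvent(**json.loads(event_json))
    if event.event_id in _processed_ids:
        return False  # duplicate delivery: safely ignored
    _processed_ids.add(event.event_id)
    print(f"updating read model for order {event.order_id}")
    return True

msg = json.dumps({"event_id": "evt-42", "order_id": "o-1001",
                  "shipped_at": "2025-04-01T10:00:00Z"})
assert handle(msg) is True
assert handle(msg) is False  # redelivery is a no-op
```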
- Research Article
- 10.29121/granthaalayah.v13.i3.2025.6185
- Mar 31, 2025
- International Journal of Research -GRANTHAALAYAH
- Aparna Sharma
The impact of sustainability on graphic index design forms part of the physical/internal schema of relational design. As a technical and aesthetic concept, graphic index design refers to the idea, content, and layout of print and electronic media. Sustainability in graphic index design is considered at the level of economic, social, and environmental (planetary) reusability. Sustainability may be defined in terms of three pillars: environmental, social, and economic sustainability; graphic index design meets all of these requirements on a visual basis. The graphic index (content) conveys the inspiration to purchase an item, generates entrepreneurial ideas, and opens new sources from old ones. Graphic index design means graphic content design in the form of colour schemes, text schemes, and geometric shapes or abstract topology. Social sustainability in graphic index design refers to the aesthetic sense of the law of attraction: visual impact works scientifically on the mind and soul. Graphic index design falls under both verbal and non-verbal communication; when we hear advertising content or an index, a punch line, or copy, we feel attracted to the product and develop a desire to purchase it. SFX (special effects) and VFX (visual effects) play a very important role in graphic index design. Graphic index design encompasses schema design, layout design, and content design. Large product houses and small-scale production houses alike employ designers of catchy copy, content, and indexes. For a news channel, every day brings the challenge of designing the content/index of headline news, and the audience's response to that content determines the channel's TRP. Advertising likewise depends on the content or index design of product and social advertisements; target audiences want strong copy and index design, and in the market a single product may face a hundred competitors in the same field. Magazine design faces the same problem: only when the graphic index design is strong will anyone take an interest in reading the magazine. Graphics are visual data, and whatever concept we visualise through text, colour, shapes, transitions, and effects takes a graphic form. We combine text, colour, and shapes in terms of layout, content, and content-index design for books, film, advertising, and magazines/newspapers. Audiences and viewers take inspiration from books and make films based on them, which means that graphic index design provides sustainability.
- Research Article
- 10.71097/ijsat.v16.i1.2921
- Mar 28, 2025
- International Journal on Science and Technology
- Naresh Pala
Event-Driven Architecture (EDA) has emerged as a powerful architectural paradigm for building scalable and resilient systems in today's complex digital landscape. This article explores the core principles, implementation considerations, and real-world applications of EDA, with a particular focus on retail and e-commerce environments. By reconceptualizing system interactions around the production, detection, and consumption of events, organizations can create loosely coupled systems that adapt more readily to changing requirements. The article examines how this architectural approach enables real-time operations, seamless data synchronization, and enhanced customer experiences through decoupled and resilient design. Through a comprehensive case study of a mid-sized retailer's transformation journey, the article demonstrates how EDA principles translate into tangible business advantages, from improved inventory accuracy to accelerated feature delivery velocity. Technical implementation considerations, including message broker selection, schema design, and consistency models, provide practical guidance for organizations embarking on their own event-driven transformation.
- Research Article
- 10.56028/aetr.13.1.1062.2025
- Mar 27, 2025
- Advances in Engineering Technology Research
- Xusheng Mei
This article examines the challenges traditional relational databases face in dealing with the growing demands of modern digital environments and proposes MongoDB (a NoSQL database) as a powerful alternative solution. MongoDB's document-oriented model provides flexible schema design, high performance, and easy scalability, making it particularly suitable for applications requiring rapid processing of large amounts of data, such as real-time analytics and the Internet of Things. Through in-depth analysis, we explored the key features of MongoDB, including automatic sharding, JSON style BSON data format, and dynamic load balancing technology considering access frequency, which significantly improved its efficiency and reliability. We also discussed MongoDB's evolution, technological advancements, and role in promoting agile development and scalable solutions in data-intensive applications. The article concludes by acknowledging the crucial role of MongoDB in modern application development, driven by its ability to meet the dynamic and complex demands of big data and cloud computing.
- Research Article
- 10.48175/ijarsct-24406
- Mar 24, 2025
- International Journal of Advanced Research in Science, Communication and Technology
- Sai Teja Battula
This article examines the transformative role of Event-Driven Architectures (EDA) in revolutionizing real-time risk analysis and decision-making processes within the fintech sector. As financial markets become increasingly dynamic and complex, traditional batch processing methods prove inadequate for managing emerging risks effectively. The article explores how EDA enables financial institutions to process market events as they occur, providing unprecedented visibility into evolving risk landscapes. By implementing key architectural patterns such as Event Sourcing, CQRS, and Saga patterns, alongside technologies like Apache Kafka and specialized time-series databases, financial organizations can achieve significant improvements in risk detection, analysis, and mitigation capabilities. It details best practices for successful EDA implementation, including robust event schema design, effective event enrichment strategies, comprehensive monitoring frameworks, and rigorous compliance mechanisms. Despite clear advantages, adoption challenges remain, including data quality issues, consistency-availability tradeoffs, and skills gaps. The article concludes that EDA represents a fundamental paradigm shift in financial risk management that will increasingly distinguish industry leaders from laggards.
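As a compact illustration of the Event Sourcing pattern mentioned above, the sketch below appends immutable trade events to a log and rebuilds net exposure by replay; the event fields and in-memory log are placeholders for a real broker such as Apache Kafka and a time-series store.

```python
# Compact Event Sourcing sketch: risk-relevant market events are appended to an
# immutable log and current exposure is derived by replaying them. The event
# fields and the in-memory log stand in for a real broker and time-series store.
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TradeExecuted:
    account: str
    symbol: str
    quantity: int      # signed: positive buy, negative sell
    price: float

event_log: List[TradeExecuted] = []

def append(event: TradeExecuted) -> None:
    event_log.append(event)  # events are never updated in place

def exposure(account: str) -> dict:
    """Rebuild net exposure per symbol by replaying the full event stream."""
    positions: dict = {}
    for e in event_log:
        if e.account == account:
            positions[e.symbol] = positions.get(e.symbol, 0) + e.quantity * e.price
    return positions

append(TradeExecuted("acct-1", "AAPL", 100, 190.0))
append(TradeExecuted("acct-1", "AAPL", -40, 195.0))
print(exposure("acct-1"))  # {'AAPL': 11200.0}
```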
- Research Article
- 10.3390/su17052089
- Feb 28, 2025
- Sustainability
- Chuan Shi + 4 more
The article presents an innovative design schema for air circulation within collective housing, which effectively reduces energy consumption and improves the indoor environment. It also solves the problem of the high operating and maintenance costs caused by the simultaneous installation of air conditioners and radiators. Employing dynamic energy consumption calculation software THERB for HAM, the energy-saving benefits of this design are simulated. The strategy involves capturing heat within the sunspace and transferring it to the conditioning chamber, from where the air is tempered and circulated throughout the habitable spaces to minimize heating. The findings suggest that by strategically using sunspace heat, heating energy can be significantly reduced by 43%. It helps to promote the development of sustainable building design. A comparative analysis of window materials in the sunspace, including single glazing, double glazing, and low-e double glazing, indicates that windows with enhanced insulation properties can substantially decrease the heating energy. Considering both energy efficiency and economic feasibility, low-e double glazing is identified as a particularly advantageous choice.
- Research Article
- 10.32628/cseit251112323
- Feb 18, 2025
- International Journal of Scientific Research in Computer Science, Engineering and Information Technology
- Sagar Chaudhari
This comprehensive article explores the evolution and implementation of Event-Driven Architecture (EDA) in modern enterprise systems. It explores the fundamental components, patterns, and best practices that enable organizations to build responsive and scalable applications. The article analyzes various aspects of EDA implementation, including message broker selection, schema design, and performance optimization strategies, while providing detailed insights into real-world applications across different industries. Through extensive analysis of deployment scenarios and performance metrics, the article demonstrates how EDA transforms traditional request-response patterns into dynamic, asynchronous communication models that significantly enhance system responsiveness, operational efficiency, and business outcomes.
- Research Article
- 10.55124/ijccscm.v1i2.240
- Jan 1, 2025
- International Journal of Cloud Computing and Supply Chain Management
- Suresh Pandipati
A Data Mart in Palantir Foundry is a curated dataset designed to support specific business use cases, enabling efficient data access and analysis. Building a Data Mart in Foundry involves ingesting raw data, transforming it using Code Repositories (Pipelines, Functions, and Workflows), and organizing it in the Ontology for easy discovery and governance. Foundry’s Schema Workflows and Quiver Tables help structure data effectively, ensuring high performance. With granular access controls and versioning, Foundry enables secure collaboration. A well-built Data Mart empowers analysts and applications with reliable, up-to-date insights while maintaining data integrity and scalability across the organization. The significance of researching Data Mart building in Palantir Foundry development lies in its impact on data-driven decision-making, efficiency, and scalability. Data Marts streamline data access by organizing domain-specific datasets, enhancing performance and user experience. In Palantir Foundry, efficient Data Mart design ensures optimized data pipelines, governance, and interoperability, enabling organizations to extract actionable insights with minimal redundancy. Understanding its construction aids in improving data modeling, security, and analytics workflows, leading to faster, more informed business decisions. This research contributes to best practices for scalable, resilient, and efficient data ecosystems in modern enterprises using Foundry’s powerful capabilities. The methodology for building a Data Mart in Palantir Foundry begins with Requirement Analysis, where business needs, data sources, and user requirements are identified. Next, Data Ingestion is performed using Foundry’s pipelines to integrate structured and unstructured data. The Data Transformation phase leverages Foundry’s Code Repositories or Transform features to clean, enrich, and normalize data. Schema Design follows, using Object Builders and the Foundry Ontology to define the data model. To enhance performance, Data Optimization techniques such as caching, indexing, and partitioning are applied. Security & Governance measures ensure access controls and audit policies are in place. Rigorous Validation & Testing ensures data quality through Foundry’s testing frameworks. Finally, the Deployment & Maintenance phase involves continuous monitoring and optimization to keep the Data Mart efficient and up to date. In the comparative evaluation, the Cloud-Native Data Mart ranks first and the Self-Service Data Mart ranks last. Keywords: Cloud-Native Data Mart, Hybrid Storage Mart, Real-Time Analytics Mart, Ease of Integration, User Adoption Rate (%)
- Research Article
- 10.1142/s0218488525500011
- Jan 1, 2025
- International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
- Ashwini Mandale + 1 more
A schema database functions as a repository for interconnected data points, facilitating comprehension of data structures by organizing information into tables with rows and columns. These databases utilize established connections to arrange data, with attribute values linking related tuples. This integrated approach to data management and distributed processing enables schema databases to maintain models even when the working set size surpasses available RAM. However, challenges such as data quality, storage, scarcity of data science professionals, data validation, and sourcing from diverse origins persist. Notably, while schema databases excel at reviewing transactions, they often fall short in updating them effectively. To address these issues, a Chimp-based radial basis neural model (CbRBNM) is employed. Initially, the schemaless database was considered and integrated into the Python system. Subsequently, compression functions were applied to both schema and schema-less databases to optimize relational data size by eliminating redundant files. Performance validation involved calculating compression parameters, with the proposed method achieving memory usage of 383.37 MB, a computation time of 0.455 s, a training time of 167.5 ms, and a compression rate of 5.60%. Extensive testing demonstrates that CbRBNM yields a favorable compression ratio and enables direct searching on compressed data, thereby enhancing query performance.
- Research Article
- 10.62225/2583049x.2024.4.6.4267
- Dec 31, 2024
- International Journal of Advanced Multidisciplinary Research and Studies
- Jeffrey Chidera Ogeawuchi + 5 more
As Software-as-a-Service (SaaS) applications and cloud-based Business Intelligence (BI) platforms proliferate across industries, the demand for scalable, responsive, and intelligent data modeling and reporting solutions has become paramount. This paper presents a comprehensive conceptual and technical exploration of scalable data architectures and reporting mechanisms tailored for SaaS ecosystems and cloud-native BI environments. It begins by contextualizing the rise of multi-tenant cloud systems and outlines the need for resilient data modeling practices that balance extensibility with operational efficiency. Foundational concepts such as multi-tenant architecture patterns, schema design, and metadata governance are examined, followed by a review of emerging innovations in real-time pipelines, self-service analytics, and cost-optimized performance strategies. The paper further investigates the integration of automation and machine learning into data workflows, highlighting AutoML for predictive reporting, AI-based data quality monitoring, and automated documentation through semantic modeling tools. Critical challenges are addressed, including trade-offs between scalability and system complexity, evolving compliance and privacy standards, and the ethical implications of AI-enhanced reporting. Finally, the discussion identifies emerging trends such as generative AI, data fabric architectures, and edge analytics, offering a roadmap for future research and development in the domain of scalable cloud data systems. This work contributes to both academic discourse and practical understanding by framing a holistic view of the data lifecycle in contemporary SaaS and BI applications.
- Research Article
- 10.37394/23205.2024.23.33
- Dec 31, 2024
- WSEAS TRANSACTIONS ON COMPUTERS
- Bisera Chauleva + 2 more
In a highly competitive, data-driven marketplace, optimizing customer journeys is essential for businesses. This paper examines the combination of advanced analytics techniques with Google BigQuery’s data warehousing capabilities, utilizing data from Google Analytics 4 (GA4). GA4 provides a comprehensive view of user interactions across platforms, but extracting actionable insights requires a robust data infrastructure. Google BigQuery’s scalable architecture supports real-time analysis of massive datasets, offering valuable insights into customer behavior. This research explores methodologies such as sequence analysis, network analysis, and clustering to analyze customer journeys and enhance marketing strategies. Our technical contributions include the development of a scalable ELT pipeline using Dataform for processing GA4 data, the implementation of optimized star schema design for enhanced query performance in BigQuery, and the integration of advanced analytics techniques, such as sequence, cluster, and network analysis, to drive actionable insights and improve decision-making accuracy. Through practical implementations and real-world examples, the study demonstrates the effectiveness of this integration. Key findings show sequence analysis improves purchase flow, network analysis identifies product relationships, and clustering analysis enables customer segmentation for targeted marketing. The paper concludes with recommendations for businesses to fully leverage GA4 data, improving user experiences and fostering sustainable growth.
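A hedged sketch of the kind of sequence query such a pipeline might run against the GA4 BigQuery export is shown below (assuming google-cloud-bigquery is installed and an export dataset exists); the project and dataset names are placeholders.

```python
# Hedged sketch of a sequence-analysis style query over the GA4 BigQuery export:
# order each user's events to inspect the path leading up to a purchase.
# The project and dataset names are placeholders, not the paper's actual setup.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
SELECT
  user_pseudo_id,
  ARRAY_AGG(event_name ORDER BY event_timestamp) AS event_sequence
FROM `my-project.analytics_123456.events_*`
WHERE event_date BETWEEN '20241201' AND '20241231'
GROUP BY user_pseudo_id
HAVING LOGICAL_OR(event_name = 'purchase')
LIMIT 100
"""

for row in client.query(QUERY).result():
    print(row.user_pseudo_id, row.event_sequence[:5])
```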