Abstract—Scaling data infrastructure for high-volume manufacturing presents significant challenges owing to the rapid growth, diversity, and complexity of the data generated by modern production processes. This review explores the key challenges and solutions in big data engineering that enable efficient, scalable, and reliable data management in manufacturing environments. The primary challenges include handling the volume, velocity, and variety of data; ensuring real-time processing and analysis; managing data storage and retrieval at scale; and maintaining data quality and consistency. To address these challenges, various big data engineering solutions are discussed, including distributed computing frameworks, cloud-based storage and computing resources, data lakes, data governance and metadata management, stream processing technologies, and machine learning and AI for predictive analytics. This review also examines the role of data architecture and infrastructure in building scalable systems, highlighting the importance of microservices, containerization, orchestration, NoSQL databases, and data security and privacy. Performance optimization techniques, such as query optimization, data partitioning, sharding, caching, and data compression, are explored to ensure the efficient operation of large-scale data systems. The review includes case studies of successful implementations and discusses emerging trends such as edge computing, as well as the growing importance of data interoperability and standardization. Finally, future research directions are identified, emphasizing the need for ongoing development in this field to meet the ever-growing demands of high-volume manufacturing.
Keywords—high-volume manufacturing, big data, data infrastructure, distributed computing frameworks, cloud computing, data lakes, microservices, containerization, orchestration, query optimization, data partitioning, sharding, caching, data compression