Scientific Big Data Research Articles

Interferometric synthetic aperture radar (InSAR) has developed rapidly in recent years and, benefiting from growing data quantities and improving data quality, is considered an important method for surface deformation monitoring. However, handling SAR big data poses significant challenges for the related algorithms and pipelines, particularly in large-scale SAR data processing. In addition, InSAR algorithms are highly complex and their task dependencies are intricate, and efficient optimization models and task scheduling for the InSAR pipeline are lacking. In this paper, we design parallel time-series InSAR processing models based on multithreading to process InSAR big data efficiently. These models concentrate on parallelizing critical, high-complexity algorithms, with a focus on deconstructing two computationally intensive algorithms through loop unrolling. Our parallel models improve performance by a factor of 10–20. We have also developed a parallel optimization tool, Simultaneous Task Automatic Runtime (STAR), which uses a data flow optimization strategy with thread pool technology to address the low CPU utilization caused by multiple modules and task dependencies in the InSAR processing pipeline. STAR provides a data-driven pipeline and enables concurrent execution of multiple tasks, using a predetermined task flow to keep the CPU busy and further improve CPU utilization. Additionally, following the framework of time-series InSAR data processing, we have constructed a supercomputing-based system for processing massive InSAR scientific big data and providing technical support for nationwide surface deformation measurement. Using this system, we processed InSAR data volumes of 500 TB and 700 TB in 5 and 7 days, respectively. Finally, we generated two maps of land surface deformation covering all of China.
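
The data-flow idea in the abstract, running each task as soon as the tasks it depends on have finished, can be illustrated with a small thread pool sketch. This is not the STAR implementation, which the abstract does not detail; the function `run_dataflow` and the task names are hypothetical and chosen only for illustration. In CPython, threads do not speed up pure-Python CPU-bound work, so the sketch shows the scheduling pattern only.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_dataflow(tasks, deps, max_workers=4):
    """Run each callable in `tasks` once all of its dependencies have finished.

    tasks: dict mapping a task name to a callable that takes a dict of
           dependency results (hypothetical interface for this sketch)
    deps:  dict mapping a task name to the names it depends on
    Returns a dict mapping each task name to its result.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    results = {}
    running = {}  # future -> task name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every task whose dependencies are all satisfied.
            ready = [name for name, pending in remaining.items() if not pending]
            for name in ready:
                del remaining[name]
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            if not running:
                raise ValueError("dependency cycle or unknown dependency")
            # Block until at least one task finishes, then release its dependents.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                for pending in remaining.values():
                    pending.discard(name)
    return results


# Hypothetical four-task flow: "square" and "negate" can run concurrently.
tasks = {
    "load": lambda inputs: [1, 2, 3],
    "square": lambda inputs: [x * x for x in inputs["load"]],
    "negate": lambda inputs: [-x for x in inputs["load"]],
    "merge": lambda inputs: inputs["square"] + inputs["negate"],
}
deps = {"square": ["load"], "negate": ["load"], "merge": ["square", "negate"]}
results = run_dataflow(tasks, deps)
```

Because tasks are submitted whenever their inputs become available rather than in fixed stages, independent branches of the flow keep the worker threads occupied while slower branches finish.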

With the Square Kilometer Array (SKA) project and the new Multi-Purpose Reactor (MPR) soon coming on-line, South Africa and other collaborating countries in Africa will need to make the management, analysis, publication, and curation of "Big Scientific Data" a priority. In addition, the recent draft Open Science policy from the South African Department of Science and Innovation (DSI) requires both Open Access to scholarly publications and research outputs, and an Open Data policy that facilitates equal opportunity of access to research data. The policy also endorses the deposit, discovery and dissemination of data and metadata in a manner consistent with the FAIR principles, which make data Findable, Accessible, Interoperable and Re-usable. The challenge of achieving Open Science in Africa starts with open access for research publications and the provision of persistent links to the supporting data. With the deluge of research data expected from the new experimental facilities in South Africa, the problem of how to make such data FAIR takes center stage. One promising approach to making such scientific datasets more "Findable" and "Interoperable" is to rely on the Dataset representation of the Schema.org vocabulary, which has been endorsed by all the major search engines. The approach adds some semantic markup to Web pages and makes scientific datasets more "Findable" by search engines. This paper does not address all aspects of the Open Science agenda but instead focuses on the management and analysis challenges of the "Big Scientific Data" that will be produced by the SKA project. The paper summarizes the role of the SKA Regional Centers (SRCs) and then discusses the goal of ensuring reproducibility for the SKA data products. Experiments at the new MPR neutron source will also have to conform to the DSI's Open Science policy. The Open Science and FAIR data practices used at the ISIS Neutron source at the Rutherford Appleton Laboratory in the UK are then briefly described. The paper concludes with some remarks about the important role of interdisciplinary teams of research software engineers, data engineers and research librarians in research data management.
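
As a rough illustration of the Schema.org approach mentioned in the abstract, the sketch below builds a `Dataset` description as JSON-LD and wraps it in the script tag that a Web page would embed. The property names (`name`, `description`, `url`, `identifier`, `license`, `keywords`) are standard Schema.org `Dataset` properties; the helper functions and every value, including the example.org address and the identifier, are hypothetical placeholders rather than anything taken from the paper.

```python
import json


def dataset_jsonld(name, description, url, identifier, license_url, keywords):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "license": license_url,
        "keywords": list(keywords),
    }


def as_script_tag(record):
    """Wrap a JSON-LD dictionary in the script element used for page markup."""
    body = json.dumps(record, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are placeholders for illustration only.
record = dataset_jsonld(
    name="Example radio telescope image cube",
    description="Placeholder description of a hypothetical observation dataset.",
    url="https://example.org/datasets/example-cube",
    identifier="https://doi.org/10.0000/example",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["radio astronomy", "image cube"],
)
markup = as_script_tag(record)
```

Placing the resulting `markup` string in a dataset's landing page is what lets search engines read the description; a persistent identifier in `identifier` supports the persistent links to supporting data that the abstract calls for.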

Articles published on Scientific Big Data

Special Issue: Big Scientific Data and Machine Learning in Science and Engineering.

EGQCY: A smart contract-based scientific big data system approach for incentive sharing and transaction on the cost of data quality

Scientific Data Spaces - Experiences from the EGI-ACE project.

Parallel Optimization for Large Scale Interferometric Synthetic Aperture Radar Data Processing

Stability of scientific big data sharing mechanism based on two-way principal-agent

Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment

Open science and Big Data in South Africa.

Reflections on the Innovation of University Scientific Research Management in the Era of Big Data

RENDA: Resource and Network Aware Data Placement Algorithm for Periodic Workloads in Cloud

Call for Special Issue Papers: Big Scientific Data and Machine Learning in Science and Engineering.

Stigma, Gender, and Intersectionality in the Poetry of Xela Arias

Digital Innovation Risk Management Model of Discrete Manufacturing Enterprise Based on Big Data Analysis

Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment

SwCS: Section-wise Content Similarity Approach to Exploit Scientific Big Data

Research on the Innovation Path of University Ideological and Political Work Based on Big Data Technology

An efficient list scheduling algorithm with task duplication for scientific big data workflow in heterogeneous computing environments

An Efficient Multiresolution Clustering for Motif Discovery in Complex Networks.

Segmented In-Advance Data Analytics for Fast Scientific Discovery
