Guest Editorial: TSAS Special Issue on Parallel and Distributed Processing of Spatial Data: Algorithms and Systems
1
- 10.1145/3719202
- Apr 11, 2025
- ACM Transactions on Spatial Algorithms and Systems
1
- 10.1145/3703157
- Apr 11, 2025
- ACM Transactions on Spatial Algorithms and Systems
- Book Chapter
4
- 10.1007/978-3-662-46632-2_12
- Jan 1, 2015
As the data volume of GNSS observation networks continues to grow, so does the computational load of data processing. The undifferenced precise point positioning (PPP) model is one of the main strategies for GNSS network data processing. As the number of stations grows, the processing time of the PPP approach increases linearly, and traditional serial processing consumes a large amount of computing time. Because the PPP solutions of individual stations are independent of one another, the model is well suited to station-level parallel processing. This paper establishes a distributed parallel processing strategy based on the PPP model, which improves both data-processing efficiency and hardware utilization. However, the high concurrency of data access and processing makes parallel programming challenging and error-prone. By analyzing the workflow characteristics of the PPP method, a parallel GNSS data-processing model at the multi-core and multi-node level was set up, and a lightweight parallel programming model was adopted to implement it. Extensive data tests and experiments demonstrate highly efficient parallel processing of GNSS data based on the PPP model: on four multi-core nodes, parallel processing is at least six times faster than traditional serial processing.
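The abstract above does not give implementation details; as a minimal sketch of the station-level parallelism it describes, the following Python snippet distributes independent per-station solutions across worker processes. The function `solve_ppp` and the station identifiers are hypothetical placeholders, not the authors' code.

```python
# Minimal sketch of station-level parallel PPP processing, assuming each
# station's undifferenced PPP solution is independent (as the abstract notes).
# `solve_ppp` and the station identifiers are hypothetical placeholders.
from multiprocessing import Pool

def solve_ppp(station_id):
    """Placeholder for a single-station undifferenced PPP solution."""
    # ... read observations, apply precise orbits/clocks, estimate position ...
    return station_id, {"x": 0.0, "y": 0.0, "z": 0.0}

def process_network(station_ids, workers=8):
    # Independent stations map cleanly onto a process pool (one node);
    # the same pattern extends to several nodes via a job scheduler or MPI.
    with Pool(processes=workers) as pool:
        return dict(pool.map(solve_ppp, station_ids))

if __name__ == "__main__":
    print(process_network([f"STN{i:03d}" for i in range(32)]))
```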
- Research Article
3
- 10.1016/j.proeng.2012.06.261
- Jan 1, 2012
- Procedia Engineering
Improved Task Graph-based Parallel Data Processing for Dynamic Resource Allocation in Cloud
- Book Chapter
2
- 10.1007/978-3-319-47247-8_10
- Jan 1, 2016
The GPU-Services project fits into the context of research and development of methods for processing three-dimensional sensor data in mobile robotics and intelligent vehicles. The implemented methods, called services in this project, provide 3D point-cloud pre-processing algorithms such as data alignment, segmentation of safe/unsafe navigable zones (e.g., separating the ground from obstacles and borders/curbs), and detection of elements of interest. Because the sensors deliver a large amount of data that must be processed in a very short time, these services use the GPU (NVIDIA CUDA) to perform partial or complete parallel processing of the data. The project aims to provide data-processing services to an autonomous car, which pushes the services toward real-time processing, defined here as completing all data-processing routines before the arrival of the sensor's next frame. This work considers 3D data acquired from a LIDAR, specifically a Velodyne HDL-32. The sensor data are structured as a cloud of three-dimensional points, which lends itself well to parallel processing. The major challenge, however, is the high data rate of this sensor (around 700,000 points/s, or 70,000 points per frame at 10 Hz), which motivates the project: to use the full potential of the sensor and to exploit the parallelism of GPU programming efficiently. The GPU services are divided into four steps: the first is an intelligent extraction, reorganization, and spatial correction of the data provided by the Velodyne multi-layer laser sensor; the second is the segmentation of planar data; the third is object segmentation; and the fourth is a methodology that unites the results of the previous steps to better detect curbs. The services were implemented and their performance evaluated against traditional sequential processing (CPU) and parallel processing (GPU CUDA implementations). Different NVIDIA GPUs were also tested, allowing the acquired data to be processed much faster than on the CPU and, in some cases, faster than the Velodyne sensor delivers it.
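The abstract does not detail the segmentation services; as a rough illustration of the ground/obstacle separation in the second stage, the sketch below applies a simple height threshold to a point cloud with NumPy, vectorized in the same spirit as a GPU kernel. The threshold values and array shapes are assumptions, not the paper's method.

```python
# Minimal sketch of ground/obstacle segmentation on a LIDAR point cloud.
# A height threshold relative to an estimated ground level is a crude stand-in
# for the paper's planar-segmentation stage; the values are assumptions.
import numpy as np

def segment_ground(points, ground_tolerance=0.15):
    """points: (N, 3) array of x, y, z in metres; returns a boolean ground mask."""
    z = points[:, 2]
    ground_height = np.percentile(z, 5)          # rough ground-level estimate
    return z < ground_height + ground_tolerance

if __name__ == "__main__":
    cloud = np.random.rand(70_000, 3) * [50.0, 50.0, 3.0]   # one 10 Hz frame
    ground_mask = segment_ground(cloud)
    print(f"ground: {ground_mask.sum()} points, obstacles: {(~ground_mask).sum()}")
```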
- Research Article
- 10.14201/adcaij.31506
- Jun 5, 2024
- ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal
The volume of data is growing at an astonishing rate, and traditional techniques for storing and processing it, such as relational and centralized databases, have become inefficient and time-consuming. Linked data and the Semantic Web make internet data machine-readable, but as the volume of linked data and Semantic Web data grows, storing and processing it with traditional approaches strains limited hardware resources. To solve this problem, datasets must be stored using distributed and clustered methods. Hadoop can store such datasets because it distributes them across many disks in a cluster, and Apache Spark can process data in parallel more efficiently than Hadoop MapReduce because Spark works in memory rather than on disk. In this paper, Semantic Web data are stored and processed using Apache Spark GraphX and the Hadoop Distributed File System (HDFS). Spark's in-memory, distributed computing enables efficient analysis of massive datasets stored in HDFS, and Spark GraphX supports graph-based processing of Semantic Web data. The fundamental objective of this work is to combine Semantic Web and big-data technologies efficiently so that their combined strengths can be exploited for data analysis and processing. First, the proposed approach uses the SPARQL query language to extract Semantic Web data from DBpedia, a large, publicly available Semantic Web dataset built from Wikipedia. Second, the extracted data are converted to the GraphX format, generating vertex and edge files; the conversion is implemented with Apache Spark GraphX. Third, both the vertex and edge tables are stored in HDFS and are available for visualization and analysis. Furthermore, the proposed technique improves storage efficiency by roughly halving the required space when converting from Semantic Web data to GraphX files (an RDF size of about 133.8 versus about 75.3 for GraphX, as reported), and the parallel processing provided by Apache Spark reduces the required processing and analysis time. The article concludes that Apache Spark GraphX can enhance Semantic Web and big-data technologies: converting Semantic Web data to the GraphX format minimizes data size and processing time, enabling efficient data management and seamless integration.
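GraphX itself is a Scala/JVM API; to keep the examples in this listing in a single language, the sketch below approximates the RDF-to-graph conversion step with PySpark DataFrames, producing vertex and edge tables and writing them to HDFS. The triple data, column names, and HDFS paths are hypothetical, not the paper's code.

```python
# Minimal PySpark sketch of the RDF -> vertex/edge conversion the abstract
# describes. GraphX itself is a Scala API; this approximation builds the
# equivalent vertex and edge tables with DataFrames. Paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rdf-to-graph").getOrCreate()

# Assume triples already extracted via SPARQL: (subject, predicate, object).
triples = spark.createDataFrame(
    [("dbr:Berlin", "dbo:country", "dbr:Germany"),
     ("dbr:Germany", "dbo:capital", "dbr:Berlin")],
    ["s", "p", "o"],
)

# Vertices: distinct resources with generated numeric ids.
vertices = (triples.select(F.col("s").alias("uri"))
            .union(triples.select(F.col("o").alias("uri")))
            .distinct()
            .withColumn("id", F.monotonically_increasing_id()))

# Edges: join subjects/objects back to their ids, keep the predicate as label.
edges = (triples
         .join(vertices.withColumnRenamed("uri", "s").withColumnRenamed("id", "src"), "s")
         .join(vertices.withColumnRenamed("uri", "o").withColumnRenamed("id", "dst"), "o")
         .select("src", "dst", F.col("p").alias("label")))

# Store both tables in HDFS (paths are hypothetical).
vertices.write.mode("overwrite").parquet("hdfs:///graphs/dbpedia/vertices")
edges.write.mode("overwrite").parquet("hdfs:///graphs/dbpedia/edges")
```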
- Research Article
66
- 10.1126/sciadv.abm8537
- Apr 8, 2022
- Science Advances
Convolutional neural networks (CNNs) have gained much attention because they can provide superior complex image recognition through convolution operations. Convolution requires repeated multiplication and accumulation operations, which are difficult tasks for conventional computing systems. Compute-in-memory (CIM), which uses parallel data processing, is an ideal device structure for convolution operations. CIM based on two-terminal synaptic devices with a crossbar structure has been developed, but unwanted leakage-current paths and high power consumption remain challenges. Here, we demonstrate integrated ferroelectric thin-film transistor (FeTFT) synaptic arrays that provide efficient parallel programming and data processing for CNNs through selective and accurate control of the polarization in the ferroelectric layer. In addition, three-terminal FeTFTs can act as both nonvolatile memory and access devices, which addresses the issues of two-terminal devices. An integrated FeTFT synaptic array with parallel programming capability can perform convolution operations to extract image features with high recognition accuracy.
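To make the multiply-accumulate workload concrete, the sketch below shows how a convolution unrolls into the matrix-style products a crossbar array evaluates in parallel (im2col plus a matrix product). It models only the arithmetic, not the device physics, and the shapes are assumptions.

```python
# Minimal sketch of mapping a convolution onto crossbar-style multiply-accumulate
# operations, the workload a compute-in-memory synaptic array parallelizes.
# This models only the arithmetic (im2col + matrix product), not device physics.
import numpy as np

def conv2d_as_crossbar(image, kernels):
    """image: (H, W); kernels: (K, kh, kw). Returns (K, H-kh+1, W-kw+1)."""
    K, kh, kw = kernels.shape
    H, W = image.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Unroll every receptive field into a column ("input voltages").
    cols = np.stack([image[i:i+kh, j:j+kw].ravel()
                     for i in range(oh) for j in range(ow)], axis=1)
    # Each kernel is one row of crossbar conductances; the matrix product is
    # the array of parallel multiply-accumulate operations.
    weights = kernels.reshape(K, kh * kw)
    return (weights @ cols).reshape(K, oh, ow)

if __name__ == "__main__":
    img = np.random.rand(8, 8)
    ker = np.random.rand(4, 3, 3)
    print(conv2d_as_crossbar(img, ker).shape)   # (4, 6, 6)
```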
- Research Article
1
- 10.5120/12196-7913
- May 31, 2013
- International Journal of Computer Applications
Parallel data processing has become an increasingly practical proposition with the advent of cloud computing, especially Infrastructure-as-a-Service (IaaS) clouds. Cloud service providers such as IBM, Google, Microsoft, and Oracle have made provision for parallel data processing in their cloud services. Nevertheless, the frameworks currently in use assume a static, homogeneous cluster environment. The problem with these frameworks is that resource allocation for large submitted jobs is inefficient, taking more processing time and incurring more cost. In this paper we discuss the possibilities of parallel processing and its challenges, and present one IaaS product intended for parallel processing. VMs are allocated to tasks dynamically for job execution. With the proposed framework we performed parallel job processing based on the MapReduce programming model, and we compare the results with Hadoop.
- Research Article
283
- 10.1109/tpds.2011.65
- Jun 1, 2011
- IEEE Transactions on Parallel and Distributed Systems
In recent years ad hoc parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and deploy their programs. However, the processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and unnecessarily increase processing time and cost. In this paper, we discuss the opportunities and challenges of efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to the popular data processing framework Hadoop.
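Nephele itself is a Java framework; purely as a toy illustration of the scheduling idea described above (per-task VM types, instantiated on demand and terminated when the task finishes), the following sketch models the control flow in Python. All names are hypothetical placeholders, not Nephele's API.

```python
# Toy illustration of per-task VM-type scheduling: each task of a job declares
# a VM type, an instance is started on demand and released when the task ends.
# Names are hypothetical placeholders, not the real Nephele API.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    vm_type: str       # e.g. a small instance for I/O-bound, a large one for CPU-bound work
    run: callable

def run_job(tasks):
    for task in tasks:
        vm = f"vm-{task.name}-{task.vm_type}"      # stand-in for an IaaS provisioning call
        print(f"allocating {vm}")
        try:
            task.run()
        finally:
            print(f"terminating {vm}")             # release as soon as the task ends

if __name__ == "__main__":
    run_job([
        Task("extract", "m1.small", lambda: print("  reading input splits")),
        Task("aggregate", "c1.xlarge", lambda: print("  reducing partial results")),
    ])
```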
- Conference Article
105
- 10.1145/1646468.1646476
- Nov 16, 2009
In recent years cloud computing has emerged as a promising new approach for ad-hoc parallel data processing. Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and deploy their programs. However, the processing frameworks currently in use stem from the field of cluster computing and disregard the particular nature of a cloud. As a result, the allocated compute resources may be inadequate for large parts of the submitted job and unnecessarily increase processing time and cost. In this paper we discuss the opportunities and challenges of efficient parallel data processing in clouds and present our ongoing research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's compute clouds for both task scheduling and execution. It allows the particular tasks of a processing job to be assigned to different types of virtual machines and takes care of their instantiation and termination during job execution. Based on this new framework, we perform evaluations on a compute cloud system and compare the results to the existing data processing framework Hadoop.
- Research Article
92
- 10.1155/2021/3839800
- Dec 31, 2021
- Mathematical Problems in Engineering
The traditional distributed database storage architecture suffers from low efficiency and limited storage capacity when managing the data resources of seafood products. We review storage and retrieval technologies for such big-data resources and propose a block storage layout optimization method based on the Hadoop platform together with a parallel data processing and analysis method based on the MapReduce model. The processing and analysis method uses a multireplica consistent hashing algorithm based on data correlation and spatio-temporal properties. The data distribution strategy and block-size adjustment are studied on the Hadoop platform. A multi-data-source parallel join query algorithm and a multichannel data fusion feature extraction algorithm based on optimized data storage are designed for the big-data resources of seafood products within the MapReduce parallel framework. Practical verification shows that the storage optimization and data-retrieval methods support the construction of a big-data resource-management platform for seafood products and realize efficient organization and management of these resources. The execution time of the multi-data-source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme.
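The paper's algorithm additionally weights block placement by data correlation and spatio-temporal properties; the sketch below shows only the basic multireplica consistent hashing idea on which it builds, with node names and parameters chosen for illustration.

```python
# Minimal sketch of multireplica consistent hashing: each data block is mapped
# onto a hash ring and assigned to the next `replicas` distinct nodes. The
# paper's refinement by data correlation and spatio-temporal properties is
# omitted; node names and parameters are assumptions.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=64):
        self.ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def placement(self, block_id: str, replicas=3):
        """Return `replicas` distinct nodes for a block, walking the ring clockwise."""
        start = bisect.bisect(self.keys, _hash(block_id)) % len(self.ring)
        chosen, i = [], start
        while len(chosen) < replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen

if __name__ == "__main__":
    ring = ConsistentHashRing([f"datanode{i}" for i in range(1, 6)])
    print(ring.placement("seafood-block-0421"))
```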
- Research Article
- 10.1088/1742-6596/2294/1/012007
- Jun 1, 2022
- Journal of Physics: Conference Series
To meet the needs of large-scale users for personalized streaming-media services with high speed, low delay, and high quality in a 5G mobile network environment, this paper studies a resource allocation mechanism for streaming media over 5G from the perspective of user demand prediction, which can relieve pressure on the mobile network and improve both the utilization of streaming-media resources and the quality of the user experience. The augmented-reality visualization of large-scale social media data must rely on the computing power of distributed clusters. This paper therefore constructs a distributed parallel processing framework in a high-performance cluster environment with a loosely coupled organizational structure: each module can be combined, called, and extended arbitrarily as long as it follows a unified interface. The paper also proposes an algebraic method for describing parallel processing tasks and for organizing and invoking large-scale data-parallel processing operators, which supports the business requirements of large-scale parallel processing of spatial social media data and removes its processing bottleneck.
- Research Article
- 10.9790/0661-0230105
- Jan 1, 2012
- IOSR Journal of Computer Engineering
In recent years ad-hoc parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and deploy their programs. However, the processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and unnecessarily increase processing time and cost. In this paper we discuss the opportunities and challenges of efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution.
- Conference Article
- 10.1364/ecbo.2011.80910v
- Jan 1, 2011
In this contribution we describe a specialised data-processing system for Spectral Optical Coherence Tomography (SOCT) biomedical imaging that utilises massively parallel data processing on a low-cost Graphics Processing Unit (GPU). One of the most significant limitations of SOCT is the data-processing time on the computer's main processor (CPU), which is generally longer than the data acquisition. Real-time imaging of acceptable quality is therefore limited to a small number of tomogram lines (A-scans). Recent progress in graphics-card technology offers a promising solution to this problem: the newest graphics processing units allow not only very high-speed three-dimensional (3D) rendering but also general-purpose parallel numerical calculations with efficiency higher than that of the CPU. The presented system uses a CUDA-enabled graphics card and allows very effective real-time SOCT imaging. The total imaging speed for 2D data consisting of 1200 A-scans exceeds the refresh rate of a 120 Hz monitor, and 3D rendering of volume data built from 10,000 A-scans is performed at about 9 frames per second. These frame rates include data transfer from the frame grabber to the GPU, data processing, and 3D rendering to the screen. The software description covers data flow, parallel processing, and the organization of threads. For illustration we show real-time, high-resolution SOCT imaging of human skin and the eye.
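The abstract does not spell out the processing chain; as a minimal sketch of why this workload parallelizes so well, the snippet below reconstructs a frame of A-scans with a batched FFT in NumPy, which stands in for the CUDA kernels. The background-subtraction step, spectrum length, and frame size are assumptions, not the paper's pipeline.

```python
# Minimal sketch of per-A-scan SOCT reconstruction: every spectral fringe is
# transformed independently, so a frame of A-scans is an embarrassingly
# parallel batch. NumPy stands in for the CUDA kernels; the 2048-sample
# spectra and the 1200-A-scan frame size are assumptions.
import numpy as np

def reconstruct_frame(spectra):
    """spectra: (n_ascans, n_samples) raw fringes -> (n_ascans, n_samples//2) depth profiles."""
    fringes = spectra - spectra.mean(axis=0)          # remove fixed-pattern background
    depth = np.fft.fft(fringes, axis=1)               # one FFT per A-scan, done as a batch
    half = depth.shape[1] // 2
    return 20.0 * np.log10(np.abs(depth[:, :half]) + 1e-12)   # log-magnitude image

if __name__ == "__main__":
    frame = np.random.rand(1200, 2048)                # one 2D frame of raw spectra
    print(reconstruct_frame(frame).shape)             # (1200, 1024)
```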
- Conference Article
4
- 10.1117/12.889805
- Jun 9, 2011
In this contribution we describe a specialised data-processing system for Spectral Optical Coherence Tomography (SOCT) biomedical imaging that utilises massively parallel data processing on a low-cost Graphics Processing Unit (GPU). One of the most significant limitations of SOCT is the data-processing time on the computer's main processor (CPU), which is generally longer than the data acquisition. Real-time imaging of acceptable quality is therefore limited to a small number of tomogram lines (A-scans). Recent progress in graphics-card technology offers a promising solution to this problem: the newest graphics processing units allow not only very high-speed three-dimensional (3D) rendering but also general-purpose parallel numerical calculations with efficiency higher than that of the CPU. The presented system uses a CUDA-enabled graphics card and allows very effective real-time SOCT imaging. The total imaging speed for 2D data consisting of 1200 A-scans exceeds the refresh rate of a 120 Hz monitor, and 3D rendering of volume data built from 10,000 A-scans is performed at about 9 frames per second. These frame rates include data transfer from the frame grabber to the GPU, data processing, and 3D rendering to the screen. The software description covers data flow, parallel processing, and the organization of threads. For illustration we show real-time, high-resolution SOCT imaging of human skin and the eye.
- Research Article
1
- 10.14257/ijgdc.2016.9.3.05
- Mar 31, 2016
- International Journal of Grid and Distributed Computing
Cloud computing is a technology in which Cloud Service Providers (CSPs) offer users many virtual servers on which to store their information. Faults occurring during the assignment and release of virtual machines, as well as the processing cost of resource allocation, must also be considered, and the parallel processing of information on the virtual machines must be done effectively and efficiently. A variety of systems have been developed to facilitate Many-Task Computing (MTC); these systems aim to hide the issues of parallelism and fault tolerance and are used in many applications. In this paper we introduce Nephele, a data-processing framework that exploits the dynamic resource provisioning offered by IaaS clouds. The performance of the virtual machines is evaluated, the allocation and deallocation of job tasks to specific virtual machines is considered, and a performance comparison with the well-known data-processing framework Hadoop is carried out. The paper thus describes how data can be processed effectively and efficiently in parallel while allocating appropriate resources to each task, and how the cost of resource utilization can be reduced by exploiting dynamic resource provisioning.
- Book Chapter
- 10.1007/978-3-642-17313-4_2
- Jan 1, 2010
Data-intensive applications are widespread, including massive data mining, search engines, and high-throughput computing in bioinformatics. Data processing becomes a bottleneck as the data scale keeps exploding, and the cost of processing very large datasets rises dramatically in traditional relational databases because traditional approaches tend to rely on high-performance computers. The rise of cloud computing brings a new solution for data processing thanks to its easy scalability, robustness, large-scale storage, and high performance, and it provides a cost-effective platform for implementing distributed parallel data-processing algorithms. In this paper we propose CPLDP (Cloud-based Parallel Large Data Processing System), a MapReduce-based parallel data-processing system developed to satisfy the urgent requirements of large-scale data processing. In CPLDP we propose a new method called operation dependency analysis to model the data-processing workflow and, furthermore, to reorder and combine operations when possible. This optimization reduces intermediate file reads and writes. Performance tests show that optimizing the processing workflow reduces both processing time and the volume of intermediate results.
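The abstract does not specify how operations are combined; as a toy sketch of the general idea (fusing consecutive per-record operations so they run in one pass and no intermediate file is materialized between them), the snippet below composes adjacent map-like stages of a small pipeline. The stage representation and function names are hypothetical, not CPLDP's implementation.

```python
# Toy sketch of workflow optimization by operation fusion: consecutive
# per-record ("map-like") operations with a linear dependency are composed and
# executed in a single pass, avoiding an intermediate file between them.
# The stage representation is a hypothetical simplification of CPLDP.
def fuse_map_stages(stages):
    """stages: list of ('map' | 'reduce', fn). Fuses adjacent map stages."""
    fused = []
    for kind, fn in stages:
        if kind == "map" and fused and fused[-1][0] == "map":
            prev = fused[-1][1]
            fused[-1] = ("map", lambda rec, f=fn, g=prev: f(g(rec)))  # compose in one pass
        else:
            fused.append((kind, fn))
    return fused

if __name__ == "__main__":
    pipeline = [
        ("map", lambda r: r.strip()),
        ("map", lambda r: r.lower()),          # fused with the stage above
        ("reduce", lambda acc, r: acc + [r]),
    ]
    optimized = fuse_map_stages(pipeline)
    print(len(pipeline), "->", len(optimized), "stages")   # 3 -> 2 stages
```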
- Research Article
- 10.1145/3732286
- May 22, 2025
- ACM Transactions on Spatial Algorithms and Systems
- Research Article
- 10.1145/3722555
- May 8, 2025
- ACM Transactions on Spatial Algorithms and Systems
- Research Article
- 10.1145/3729226
- May 7, 2025
- ACM Transactions on Spatial Algorithms and Systems
- Research Article
- 10.1145/3721363
- Apr 11, 2025
- ACM Transactions on Spatial Algorithms and Systems
- Research Article
- 10.1145/3701989
- Apr 11, 2025
- ACM Transactions on Spatial Algorithms and Systems
- Research Article
- 10.1145/3699511
- Apr 11, 2025
- ACM Transactions on Spatial Algorithms and Systems
- Research Article
- 10.1145/3716825
- Mar 5, 2025
- ACM Transactions on Spatial Algorithms and Systems
- Research Article
- 10.1145/3715910
- Feb 25, 2025
- ACM Transactions on Spatial Algorithms and Systems