Parallel Data Retrieval Research Articles

Many scientific experiments are carried out in collaboration with researchers around the world to use existing infrastructures and conduct experiments at massive scale. Data produced by such experiments are thus replicated and cached at multiple geographic locations. This gives rise to new challenges when selecting distributed data and compute resources so that the execution of applications is time- and cost-efficient. Existing heuristic techniques select the ‘best’ data source for retrieving data to a compute resource and subsequently perform task-resource assignment. However, this approach to scheduling, which is based only on single-source data retrieval, may not give time-efficient schedules when: (i) tasks are interdependent on data, (ii) the average size of data processed by most tasks is large, and (iii) data transfer time exceeds task computation time by at least one order of magnitude. To address these characteristics of data-intensive applications, we propose to leverage the presence of replicated data sources, retrieve data in parallel from multiple locations, and thus achieve time-efficient schedules. In this article, we propose two multi-source data-retrieval-based scheduling heuristics that assign interdependent tasks to compute resources based on both data retrieval time and task computation time. We carry out experiments using real applications and deploy them on emulated as well as real environments. With a combination of data retrieval and task-resource mapping techniques, we show that our heuristics produce time-efficient schedules that are better than existing heuristic-based techniques for scheduling application workflows.
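The intuition in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' heuristic: it handles a single task rather than interdependent tasks, and it assumes an idealized model in which a file split across replicas in proportion to bandwidth arrives in `size / sum(bandwidths)` seconds, with latency and contention ignored. The names `Resource`, `single_source_time`, `multi_source_time`, and `pick_resource`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    compute_seconds: float      # assumed estimate of task run time on this resource
    replica_bandwidths: tuple   # assumed MB/s from each replica site to this resource


def single_source_time(size_mb, bandwidths):
    """Retrieval time when only the fastest replica is used."""
    return size_mb / max(bandwidths)


def multi_source_time(size_mb, bandwidths):
    """Idealized retrieval time when the file is split across all replicas
    in proportion to bandwidth, so that every stream finishes together."""
    return size_mb / sum(bandwidths)


def pick_resource(size_mb, resources, retrieval_time):
    """Return (resource, completion time) minimizing retrieval + compute time."""
    candidates = (
        (r, retrieval_time(size_mb, r.replica_bandwidths) + r.compute_seconds)
        for r in resources
    )
    return min(candidates, key=lambda pair: pair[1])


resources = [
    Resource("A", compute_seconds=10.0, replica_bandwidths=(50.0, 5.0)),
    Resource("B", compute_seconds=20.0, replica_bandwidths=(30.0, 30.0, 40.0)),
]

best_single = pick_resource(2000.0, resources, single_source_time)
best_multi = pick_resource(2000.0, resources, multi_source_time)
print(best_single[0].name, best_single[1])  # A 50.0
print(best_multi[0].name, best_multi[1])    # B 40.0
```

In this toy setting the preferred resource changes once aggregate bandwidth is taken into account: resource B has slower compute but is reachable from three replicas, so parallel retrieval makes it the faster choice overall.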

We present a novel, to our knowledge, architecture for parallel database processing called the multiwavelength optical content-addressable parallel processor (MW-OCAPP). The MW-OCAPP is designed to provide efficient parallel data retrieval and processing by means of moving the bulk of database operations from electronics to optics. It combines a parallel model of computation with the many degrees of processing freedom that light provides. The MW-OCAPP uses a polarization and wavelength-encoding scheme to achieve a high level of parallelism. Distinctive features of the proposed architecture include (1) the use of a multiwavelength encoding scheme to enhance processing parallelism, (2) multicomparand word-parallel bit-parallel equality and magnitude comparison with an execution time independent of the data size or the word size, (3) the implementation of a suite of 11 database primitives, and (4) multicomparand two-dimensional data processing. The MW-OCAPP architecture realizes 11 relational database primitives: difference, intersection, union, conditional selection, maximum, minimum, join, product, projection, division, and update. Most of these operations execute in constant time, independent of the data size. We outline the architectural concepts and motivation behind the MW-OCAPP's design and describe the architecture required for implementing the equality and intersection-difference processing cores. Additionally, a physical demonstration of the multiwavelength equality operation is presented, and a performance analysis of the proposed system is provided.
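As a rough software analogue of the equality and intersection-difference operations named in the abstract above, the following Python sketch builds a match matrix from multicomparand equality tests and derives intersection and difference from it. This is only an illustration of the relational semantics: it runs sequentially, so it has none of the constant-time behavior claimed for the optical design, and the function names and sample relations are hypothetical.

```python
def equality_matrix(relation, comparands):
    """matrix[i][j] is True when tuple i of the relation equals comparand j."""
    return [[row == c for c in comparands] for row in relation]


def intersection(a, b):
    """Tuples of a that match at least one tuple of b."""
    matches = equality_matrix(a, b)
    return [row for row, flags in zip(a, matches) if any(flags)]


def difference(a, b):
    """Tuples of a that match no tuple of b."""
    matches = equality_matrix(a, b)
    return [row for row, flags in zip(a, matches) if not any(flags)]


a = [(1, "x"), (2, "y"), (3, "z")]
b = [(2, "y"), (4, "w")]

print(intersection(a, b))  # [(2, 'y')]
print(difference(a, b))    # [(1, 'x'), (3, 'z')]
```

Both operations share the same match matrix and differ only in whether a row with any match is kept or discarded, which is one way to read the pairing of intersection and difference in a single processing core.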

Articles published on Parallel Data Retrieval

Scheduling Workflow Applications Based on Multi-source Parallel Data Retrieval in Distributed Computing Networks

Distributed data organization and parallel data retrieval methods for huge laser scanner point clouds

Multiwavelength optical content-addressable parallel processor for high-speed parallel relational database processing
