Query Size Research Articles

Background

The Basic Local Alignment Search Tool (BLAST) is a suite of commonly used algorithms for identifying matches between biological sequences. The user supplies a database file and a query file of sequences, and BLAST finds similar sequences between the two. Because databases and queries typically contain millions of sequences, BLAST is computationally challenging but also well suited to parallelization on high-performance computing (HPC) clusters. The efficacy of parallelization depends on how the data are partitioned, and optimal partitioning relies on an accurate performance model. In previous studies, a BLAST job was sped up 27-fold by partitioning the database and query among thousands of processor nodes; however, the optimality of that partitioning method was not studied. BLAST performance models proposed in the literature usually treat problem size and hardware configuration as the only variables, whereas the execution time of a BLAST job is a function of database size, query size, and hardware capability. In this work, the nucleotide BLAST application BLASTN was profiled using three methods: shell-level profiling with the Unix “time” command, code-level profiling with the built-in “profiler” module, and system-level profiling with the Unix “gprof” program. Runtimes were measured for six node types, using six database files and 15 query files, on a heterogeneous HPC cluster with more than 500 nodes. The empirical measurements were fitted with quadratic functions to build performance models that were used to guide data parallelization for BLASTN jobs.

Results

Profiling showed that BLASTN contains more than 34,500 different functions, but a single function, RunMTBySplitDB, accounts for 99.12% of the total runtime. Among its 53 child functions, five core functions were identified that together make up 92.12% of the overall BLASTN runtime.
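The quadratic performance models mentioned above can be illustrated with an ordinary least-squares fit of runtime against one input size. This is a minimal sketch under stated assumptions, not the study's fitting code: it fits t(q) = c0 + c1·q + c2·q² for a single variable q (for example query size), and the function names `fit_quadratic` and `predict` are hypothetical. The idea that one such curve would be fitted per node type and database file is likewise an assumption. The data in the usage lines are synthetic values generated from a known quadratic, not measurements from the paper.

```python
from typing import List, Sequence


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares fit of y ~ c0 + c1*x + c2*x**2; returns [c0, c1, c2]."""
    if len(xs) != len(ys) or len(set(xs)) < 3:
        raise ValueError("need equal-length inputs with at least three distinct x values")
    # Normal equations A c = b with A[i][j] = sum(x**(i+j)) and b[i] = sum(y * x**i).
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    a = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 3):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        rest = sum(a[row][k] * coeffs[k] for k in range(row + 1, 3))
        coeffs[row] = (b[row] - rest) / a[row][row]
    return coeffs


def predict(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the fitted quadratic model at x."""
    c0, c1, c2 = coeffs
    return c0 + c1 * x + c2 * x * x


# Hypothetical usage: synthetic runtimes (seconds) generated from a known quadratic
# in query size (thousands of sequences); these are NOT measurements from the study.
query_sizes = [float(q) for q in range(1, 11)]
runtimes = [2.0 + 0.5 * q + 0.01 * q * q for q in query_sizes]
model = fit_quadratic(query_sizes, runtimes)
print("fitted coefficients:", model, "predicted t(20):", predict(model, 20.0))
```

Solving the 3×3 normal equations directly keeps the sketch dependency-free; with NumPy available, a library polynomial fit would be the more robust choice for real measurement data.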
Based on the performance models, static load-balancing algorithms can be applied to the BLASTN input data to minimize the runtime of the longest job on an HPC cluster. Four test cases were run on homogeneous and heterogeneous clusters. Experimental results showed that redistributing the workload reduced the runtime by 81% on a homogeneous cluster and by 20% on a heterogeneous cluster.

Discussion

Optimal data partitioning can improve BLASTN’s overall runtime 5.4-fold compared with dividing the database and query into the same number of fragments. The proposed methodology can be applied to other applications in the BLAST+ suite, or to any other application whose source code is available.
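The abstract does not say which static load-balancing algorithm was used, so the following is only an illustration of the general idea of minimizing the runtime of the longest job. It swaps in the classic longest-processing-time-first (LPT) greedy heuristic: jobs are sorted by predicted cost and each one is placed on the node where it would finish earliest. The inputs are assumptions: `job_costs` stands for runtimes predicted by a performance model on a reference node, and `node_speeds` for hypothetical relative node speeds on a heterogeneous cluster; `balance` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def balance(job_costs: Sequence[float],
            node_speeds: Sequence[float]) -> Tuple[Dict[int, List[int]], float]:
    """Greedy LPT assignment of fragment jobs to nodes (illustrative heuristic).

    job_costs[j]:   predicted runtime of job j on a reference node (speed 1.0).
    node_speeds[n]: relative speed of node n (2.0 means twice as fast as reference).
    Returns (node index -> assigned job indices, makespan = latest node finish time).
    """
    if not node_speeds or any(s <= 0 for s in node_speeds):
        raise ValueError("need at least one node with positive speed")
    finish = [0.0] * len(node_speeds)
    assignment: Dict[int, List[int]] = {n: [] for n in range(len(node_speeds))}
    # Longest-processing-time first: place the largest jobs while nodes are still lightly loaded.
    order = sorted(range(len(job_costs)), key=lambda j: job_costs[j], reverse=True)
    for j in order:
        # Choose the node on which this job would complete earliest (ties go to the lower index).
        best = min(range(len(node_speeds)),
                   key=lambda n: finish[n] + job_costs[j] / node_speeds[n])
        finish[best] += job_costs[j] / node_speeds[best]
        assignment[best].append(j)
    return assignment, max(finish)


# Hypothetical example: six fragment jobs, two reference-speed nodes and one node twice as fast.
plan, makespan = balance([8.0, 7.0, 6.0, 5.0, 4.0, 3.0], [1.0, 1.0, 2.0])
print(plan, makespan)
```

LPT is a standard approximation heuristic for makespan minimization, not necessarily the method evaluated in the study; it is chosen here because it needs only per-job cost estimates, which is exactly what a fitted performance model provides.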

Private information retrieval (PIR) protocols ensure that a user can download a file from a database without revealing any information about the identity of the requested file to the servers storing the database. While existing protocols strictly require that no information about the file’s identity be leaked, this work initiates the study of the tradeoffs that can be achieved by relaxing the perfect privacy requirement. We refer to such protocols as weakly-private information retrieval (WPIR) protocols. In particular, for the case of multiple noncolluding replicated servers, we study how the download rate, the upload cost, and the access complexity can be improved when the perfect privacy constraint is relaxed. To quantify the information leaked about the requested file’s identity, we consider mutual information (MI), worst-case information leakage, and maximal leakage (MaxL). We present two WPIR schemes, denoted Scheme A and Scheme B, based on two recent PIR protocols, and show that the download rate of the former can be optimized by solving a convex optimization problem. We also show that Scheme A achieves an improved download rate compared to the recently proposed scheme by Samy et al. under the so-called ε-privacy metric. Additionally, a family of schemes based on partitioning is presented. Moreover, we provide an information-theoretic converse bound on the maximum possible download rate for the MI and MaxL privacy metrics under a practical restriction on the alphabet size of queries and answers. For two servers and two files, the bound is tight under the MaxL metric, which settles the WPIR capacity in this particular case. Finally, we compare the performance of the proposed schemes and their gap to the converse bound.
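To make two of the leakage metrics concrete, the sketch below computes mutual information I(X;Y) and maximal leakage for a discrete channel, where X is taken to be the index of the requested file and Y the query observed by one server. It uses the standard definitions I(X;Y) = Σ P(x)P(y|x) log₂(P(y|x)/P(y)) and MaxL(X→Y) = log₂ Σ_y max_{x: P(x)>0} P(y|x). The channel matrices are hypothetical toy examples, not Scheme A or Scheme B from the paper. A perfectly private query distribution (every row identical) gives zero leakage under both metrics.

```python
import math
from typing import Sequence


def mutual_information(prior: Sequence[float],
                       channel: Sequence[Sequence[float]]) -> float:
    """I(X;Y) in bits. prior[x] = P(X=x); channel[x][y] = P(Y=y | X=x)."""
    n_y = len(channel[0])
    # Marginal distribution of the query seen by the server.
    p_y = [sum(prior[x] * channel[x][y] for x in range(len(prior))) for y in range(n_y)]
    total = 0.0
    for x, px in enumerate(prior):
        for y in range(n_y):
            pxy = px * channel[x][y]
            if pxy > 0:  # zero-probability terms contribute nothing
                total += pxy * math.log2(channel[x][y] / p_y[y])
    return total


def maximal_leakage(prior: Sequence[float],
                    channel: Sequence[Sequence[float]]) -> float:
    """MaxL(X -> Y) in bits: log2 of the sum over y of max over supported x of P(y | x)."""
    support = [x for x, px in enumerate(prior) if px > 0]
    n_y = len(channel[0])
    return math.log2(sum(max(channel[x][y] for x in support) for y in range(n_y)))


# Hypothetical two-file examples with a uniform prior over the requested file.
prior = [0.5, 0.5]
perfectly_private = [[0.5, 0.5], [0.5, 0.5]]  # query independent of the file index
weakly_private = [[0.75, 0.25], [0.25, 0.75]]  # query partially reveals the file index
for name, ch in [("private", perfectly_private), ("weak", weakly_private)]:
    print(name, mutual_information(prior, ch), maximal_leakage(prior, ch))
```

Relaxing privacy in this toy setting means allowing the rows of the channel matrix to differ; how much they may differ is what the chosen leakage metric bounds.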

Articles published on Query Size

ArcMatch: high-performance subgraph matching for labeled graphs by exploiting edge domains

Efficient computation of Top-k G-Skyline groups on large-scale database

Selectivity Estimation for Queries Containing Predicates over Set-Valued Attributes

EnsemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R

Blockchain-Assisted Secure Data Sharing Protocol With a Dynamic Multiuser Keyword Search in IIoT

Cardinality estimation of activity trajectory similarity queries using deep learning

Profiling the BLAST bioinformatics application for load balancing on high-performance computing clusters

A PID-Based kNN Query Processing Algorithm for Spatial Data

Transformer for Nonintrusive Load Monitoring: Complexity Reduction and Transferability

A plurality problem with three colors and query size three

Coordinate Indexing Algorithms Taking into Account the Classification Signs-Terms in the Subject Area

A Hybrid Method for Equivalence Checking Between System Level and RTL

Multi-Server Weakly-Private Information Retrieval

A Connection Access Mechanism of Distributed Network based on Block Chain

On non-adaptive majority problems of large query size

SamQL: a structured query language and filtering tool for the SAM/BAM file format

The parameterized complexity and kernelization of resilience for database queries

Finite Open-world Query Answering with Number Restrictions

Virtualization Based Efficient Service Matching and Discovery in Internet of Things

Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries
