Abstract

Data-intensive computing brings a new set of challenges that do not completely overlap with those met by typical, or even state-of-the-art, High Performance Computing (HPC) systems. Working with ‘big data’ can involve analyzing thousands of files that must be rapidly opened, examined, and cross-correlated, tasks that classic HPC systems may not be designed to handle. Such tasks can be conducted efficiently on a data-intensive supercomputer like the Wrangler supercomputer at the Texas Advanced Computing Center (TACC). Wrangler allows scientists to share and analyze, in a user-friendly manner, the massive collections of data being produced in nearly every field of research today. It was designed to work closely with the Stampede supercomputer, TACC's HPC flagship, which TOP500 ranks as the tenth most powerful system in the world. Wrangler retains much of what was successful in systems like Stampede while introducing new features: a very large flash storage system, a very large distributed spinning-disk storage system, and high-speed network access. Together, these give users whose data analysis needs were not being met by traditional HPC systems like Stampede a new way to access HPC resources. In this chapter, we provide an overview of the Wrangler data-intensive HPC system along with some of the big data use cases it enables.
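To make the workload pattern concrete, the following is a minimal, hypothetical sketch of the many-small-files analysis the abstract describes: thousands of files opened, examined, and summarized in parallel. The directory path and the per-file statistic are illustrative placeholders, not part of the Wrangler software stack; the point is that each task is dominated by file opens and small reads, the metadata-heavy access pattern that flash storage serves far better than spinning disk.

```python
# Hypothetical sketch of a metadata-heavy "open, examine, cross-correlate"
# workload: scan every file under a directory tree in parallel and collect
# a simple per-file statistic. Paths and the statistic are placeholders.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def examine(path: Path) -> tuple:
    """Open one file and return its name and line count."""
    with path.open("rb") as f:
        return path.name, sum(1 for _ in f)


def scan_collection(root: str) -> dict:
    """Fan out across all files under `root`. Each task is dominated by
    open/close overhead and small reads rather than bulk bandwidth."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(examine, files))


if __name__ == "__main__":
    stats = scan_collection("./data")  # placeholder data directory
    print(f"examined {len(stats)} files")
```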
