Achieving High Performance Results from Accumulo and other Open Source Big Data Frameworks

Sponsored

Abstract

Open source frameworks can provide productive, fault-resilient access to large amounts of data on scale-out commodity hardware systems. High Performance Data Analytic systems can preserve this framework productivity while improving data analyst performance. One proven approach is to integrate the open source framework with High Performance Computing (HPC) hardware. To achieve this performance, the framework must be treated as an application and migrated to the HPC system. The migration may include the surgical replacement of key functions, which creates a custom version of the framework.

We will discuss the merits of, and procedures for, standing up open source frameworks such as Accumulo on an HPC-class system. Additionally, the panelists will share best practices for addressing bottlenecks, improving performance, and working with the open source community.

Speakers

Doug Gerrelts
Senior Technical Lead, General Dynamics Mission Systems
(Moderator) Doug Gerrelts is the Senior Technical Lead for GDMS’ High Performance Computing (HPC) group. In this and other roles, he has been leading technical teams developing HPC systems and solutions for Government customers for more than ten years.
Bill Leinberger
Technical Lead for High Performance Data Analytics, General Dynamics Mission Systems
(Panelist) Dr. Leinberger has worked in High Performance Computing (HPC) for over 30 years. He began his career with Control Data Corporation, which transitioned to Computing Devices International, which was then acquired by General Dynamics. Throughout his career, he has worked with the same HPC group on the design, development, integration, and deployment of mission-specific HPC systems. He is currently the Technical Lead for the High Performance Data Analytics (HPDA) team in the Cyber Systems Division of General Dynamics Mission Systems. He holds a Bachelor’s degree in Computer and Electrical Engineering from Purdue University and a PhD in Computer Science and Engineering from the University of Minnesota.
Josh Elser
Member of Technical Staff, Hortonworks
(Panelist) Josh is a member of the engineering staff at Hortonworks. He is a strong advocate for open source software and is an Apache Accumulo committer and PMC member. He is also a committer and PMC member of Apache Slider (incubating) and regularly contributes to other projects in the Apache Hadoop ecosystem. He holds a Bachelor's degree in Computer Science from Rensselaer Polytechnic Institute.
James Maltby, Ph.D.
Product Manager, Analytics Products, Cray Inc.
(Panelist) Dr. James Maltby is a Product Manager for Cray, Inc. and specializes in mapping scientific and business applications to new computer architectures. He has an academic background in physics and engineering, specializing in radiation transport. He has worked for Cray since 2000, developing software for the massively multithreaded Cray XMT (and its MTA-1 predecessor) as well as other Cray systems. He also led the Bioinformatics practice at Cray for several years, using HPC to solve life science problems. His most recent project involved developing a highly parallel in-memory semantic database for the XMT architecture, now available from Cray, Inc. as Urika-GD.