Introducing Datawave: Scalable Data Ingest and Query on Apache Accumulo

APIs/Frameworks

Abstract

Out of the box, Accumulo's strengths are difficult to appreciate without first building an application that showcases its ability to handle massive amounts of data. Unfortunately, building such an application is non-trivial for many would-be users, which hinders Accumulo's adoption.

In this talk, we introduce Datawave, a complete ingest, query, and analytic framework for Accumulo. Datawave, recently open-sourced by the National Security Agency, capitalizes on Accumulo's capabilities, provides an API for working with structured and unstructured data, and boasts a robust, flexible, and scalable backend.

We'll take a deep dive into Datawave's project layout, table structures, and APIs, and demonstrate the Datawave quickstart, a tool that makes it easy to get started with Accumulo and Datawave without first developing a complete application.

Speakers

Drew Farris
Chief Technologist, Booz Allen Hamilton

Drew Farris is a software developer and technology consultant at Booz Allen Hamilton, where he helps his clients solve problems related to large-scale analytics, distributed computing, and machine learning. He is a member of the Apache Software Foundation and a contributing author to Manning Publications’ “Taming Text” and the Booz Allen Hamilton “Field Guide to Data Science”.

Hannah Pellón
Cyber Software Engineer, Northrop Grumman

Hannah Pellón is a software developer at Northrop Grumman focusing on distributed computing technologies.