Accumulo In-Depth: Building Bulk Ingest

Abstract

Bulk ingest enables Accumulo to import externally prepared data into existing tables. Unlike ingest via batch writers, bulk ingest leaves much of the work of organizing data to external processing frameworks such as MapReduce, which can be scaled independently of the Accumulo cluster itself. This reduces the work required of the tablet servers to support ingest, freeing resources for other operations.

Under the hood, bulk ingest involves a number of moving parts and must account for a variety of failure scenarios. This talk covers the components of the bulk ingest process in depth and describes past, current, and future implementations of this capability. Attendees will leave this session with an understanding of bulk ingest that will enable troubleshooting, capacity estimation, and performance management.

Speakers

Eric Newton
Senior Software Developer, SWComplete
Eric Newton has been a programmer for over 30 years and has worked on Accumulo since 2009. He has been an open-source contributor and consumer since 1988. Through the years, his work on distributed communications systems has included air traffic control, systems monitoring, and databases. Eric has started three companies of his own and helped several other businesses get started.