Bulk ingest enables Accumulo to import externally-prepared data into existing tables. Unlike ingest via batch writers, much of the work of organizing data can be left to external processing frameworks such as MapReduce and scaled independently of the Accumulo cluster itself. This reduces the work required of the tablet servers to support ingest, freeing resources to support other operations.
Under the hood, bulk ingest involves a number a moving parts and accounting for a variety of failure scenarios. This talk covers the components of the bulk ingest process in-depth and describes past, current and future implementations of this capabiltiy. Attendees will leave this session with an understanding of bulk ingest that will enable troubleshooting capacity estimation and performance management.