From Key Value Pairs to Base Pairs - Apache Accumulo and Precision Medicine

Use Case


In this talk, we will show how Apache Accumulo can be used to provide quick and secure access to billions of genomic observations for clinical and research purposes.

We’ll start by introducing the precision medicine problem space, focusing specifically on critical challenges related to cohort analysis.

Essentially, these challenges are “two sides of the same coin”: mapping from genotype (an organism’s full hereditary information) to phenotype (an organism’s actual observed properties) and then back again. We will explore how you can define a key schema in Accumulo to move between these two “sides” easily and efficiently.
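Accumulo keeps keys sorted in lexicographic byte order, so a key schema has to encode fields so that string order matches the order you want to scan in. The following is a minimal sketch of that constraint. It assumes a hypothetical row format of `chromosome:position:ref:alt` with zero-padded fields; this is our own illustration, not necessarily the schema used in the talk. A sorted Python list searched with `bisect` stands in for a table's sorted key space, and no Accumulo client is involved.

```python
from bisect import bisect_left

POS_WIDTH = 10  # hypothetical fixed width, wide enough for any base position


def variant_row(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical row ID, zero-padded so string order equals numeric order.

    The chromosome is padded to 2 characters; ordering across chromosomes
    is not meaningful here, only ordering within one chromosome.
    """
    return f"{chrom.zfill(2)}:{pos:0{POS_WIDTH}d}:{ref}:{alt}"


def scan_region(sorted_rows: list[str], chrom: str, start: int, end: int) -> list[str]:
    """Rows for positions start..end (inclusive), like a single range scan."""
    low = f"{chrom.zfill(2)}:{start:0{POS_WIDTH}d}:"
    high = f"{chrom.zfill(2)}:{end + 1:0{POS_WIDTH}d}:"  # exclusive upper bound
    return sorted_rows[bisect_left(sorted_rows, low):bisect_left(sorted_rows, high)]


# Without padding, "99" sorts after "100", which would break range scans.
assert "99" > "100"

rows = sorted([
    variant_row("7", 150, "A", "T"),
    variant_row("7", 99, "G", "C"),
    variant_row("7", 1000, "C", "T"),
    variant_row("12", 150, "A", "G"),
])
print(scan_region(rows, "7", 90, 200))
# ['07:0000000099:G:C', '07:0000000150:A:T']
```

Because the encoding preserves order, a query over a genomic region becomes one contiguous range rather than a filter over the whole table. In Accumulo, that range would map onto the tablets holding those rows.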

We will also demonstrate how the Accumulo SeekingFilter and well-understood constructs (like a transpose table) can be used to address these core challenges.
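The transpose-table and seek-ahead ideas can be sketched together in plain Python, as a rough illustration under stated assumptions. `transpose` inverts hypothetical patient-to-variant entries into a variant-to-patient index; this is the same facts with the row and qualifier swapped. `intersect_seeking` then answers a multi-criteria cohort query ("patients carrying all of these variants") with a leapfrog intersection that jumps ahead by binary search instead of examining every key. The code does not use or reproduce Accumulo's `SeekingFilter` API. It only shows the general kind of key skipping a seeking iterator makes possible. All function names and the toy data are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def transpose(patient_variants: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert patient -> variants entries into variant -> sorted patient IDs."""
    index: dict[str, set[str]] = defaultdict(set)
    for patient, variants in patient_variants.items():
        for variant in variants:
            index[variant].add(patient)
    return {variant: sorted(patients) for variant, patients in index.items()}


def intersect_seeking(lists: list[list[str]]) -> list[str]:
    """Leapfrog intersection of sorted, duplicate-free lists.

    Each cursor jumps (by binary search) straight to the current candidate,
    skipping runs of non-matching keys. This illustrates seek-based skipping
    in general; it is not Accumulo's SeekingFilter API.
    """
    if not lists or any(not lst for lst in lists):
        return []
    pos = [0] * len(lists)
    result: list[str] = []
    candidate = max(lst[0] for lst in lists)
    while True:
        matched = True
        for i, lst in enumerate(lists):
            pos[i] = bisect_left(lst, candidate, pos[i])  # "seek" forward
            if pos[i] == len(lst):
                return result  # one list is exhausted: no more matches
            if lst[pos[i]] != candidate:
                candidate = lst[pos[i]]  # larger key becomes the new candidate
                matched = False
                break
        if matched:
            result.append(candidate)
            pos = [p + 1 for p in pos]
            if any(p == len(lst) for p, lst in zip(pos, lists)):
                return result
            candidate = max(lst[p] for p, lst in zip(pos, lists))


def cohort(index: dict[str, list[str]], variants: list[str]) -> list[str]:
    """Patients carrying every variant in `variants` (genotype -> cohort)."""
    return intersect_seeking([index.get(v, []) for v in variants])


# Synthetic toy data; IDs and labels are placeholders, not real observations.
patient_variants = {"P1": ["v1", "v2"], "P2": ["v1"], "P3": ["v1", "v2", "v3"]}
phenotypes = {"P1": "phenotype-A", "P2": "phenotype-B", "P3": "phenotype-A"}

index = transpose(patient_variants)
matches = cohort(index, ["v1", "v2"])
print(matches)                           # ['P1', 'P3']
print([phenotypes[p] for p in matches])  # ['phenotype-A', 'phenotype-A']
```

The forward entries answer "what does this patient carry?", while the transposed index answers "who carries this?". Keeping both lets a query move from genotype to a cohort and back to phenotype without a full scan.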

Next, we will discuss the access control requirements of the precision medicine domain, and how Accumulo’s cell-level security model can satisfy these requirements from both regulatory and organizational perspectives.
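In Accumulo, every cell carries a visibility label, which is a boolean expression over authorization tokens. A scan returns only the cells whose expression is satisfied by the scanning user's authorizations. Accumulo itself enforces this server-side using `ColumnVisibility` and `Authorizations`. The Python below is only a simplified evaluator to show the semantics, assuming hypothetical labels such as `clinician` and `site_vancouver`. It follows Accumulo's rule that `&` and `|` cannot be mixed without parentheses, but it omits quoted labels and other details of the real grammar.

```python
import re

# Simplified label charset (letters, digits, _ - . : /); quoted labels omitted.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_.:/-]+|[&|()])")


def _tokenize(expr: str) -> list[str]:
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def can_see(expr: str, auths: set[str]) -> bool:
    """Evaluate a visibility-style expression against a user's authorizations."""
    if not expr.strip():
        return True  # an empty label is visible to everyone
    tokens = _tokenize(expr)
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in {"&", "|"}:
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parse, so every token is consumed
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in {"&", "|", ")"}:
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result


auths = {"clinician", "site_vancouver"}  # hypothetical user authorizations
print(can_see("clinician&site_vancouver", auths))             # True
print(can_see("clinician&(researcher|irb_approved)", auths))  # False
```

Because the label travels with each individual cell, rules like "only clinicians at this site" or "researchers with approval" can be applied to single observations rather than to whole tables.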

Finally, we will demonstrate an implementation of these concepts, using Spark and Zeppelin to analyze a dataset of several billion genomic observations. The demonstration will show how Accumulo’s distributed index delivers sub-second responses to multi-criteria point queries as well as interactive access to large datasets.


Russ Weeks
Software Architect, PHEMI Systems
Russ Weeks is a Software Architect at PHEMI. Prior to joining PHEMI Systems, Russ worked in the network management groups at Ericsson and Cray Supercomputers, where he discovered a passion for distributed data structures and algorithms. PHEMI Systems is a Vancouver, BC-based startup focused on the storage, retention and governance of structured and unstructured data.