Apache Accumulo is known as a high-performance sorted key/value database, but achieving high performance in your own application still requires good development practices. Developers often extrapolate from small-scale tests to argue that an application will perform well at larger scales. Unfortunately, design and implementation flaws that aren't visible at small scale inevitably show up in production, where they are much more expensive to fix.
Sqrrl is an application built on Accumulo that combines log storage, indexing, graphs, and statistical modeling while supporting high-throughput ingest and distributed analytic processing. At Sqrrl, we ensure reliable performance using a variety of modeling and simulation techniques. This talk will show examples of the insights and performance improvements we gained from micro-benchmarking, analog simulation, and predictive model validation.