Collecting and analyzing large amounts of data is a growing challenge within the scientific community. The widening gap between data and users calls for innovative tools that address the challenges posed by big data: volume, velocity, and variety.
This tutorial aims to provide researchers and practitioners with a range of tools and techniques that they can use in conjunction with Apache Accumulo to close this gap. It will focus on building solid fundamentals using a rapid prototyping tool, the Dynamic Distributed Dimensional Data Model (D4M), to quickly prototype new algorithms that can be tested with Apache Accumulo. The tutorial is suitable for participants at all levels of experience with Apache Accumulo. It will begin with a general introduction to the big data landscape in order to align terminology and provide a unified view of the system regardless of participant background, and will then discuss systems engineering and how it applies to big data systems. We will then introduce D4M and provide examples of D4M being used for analytics such as dimensional analysis and background model fitting, followed by a discussion of current areas of research on security and privacy as well as graph algorithms. Tutorial slides will be distributed to participants, and brief demonstrations will be used to reinforce concepts.
The goals of the tutorial are 1) to provide participants with a theoretical foundation of big data; 2) to demonstrate how Accumulo can be used to solve real problems from diverse domains; and 3) to describe future avenues of research. This tutorial provides a deep dive into the topics presented at the 2014 Accumulo Summit in the presentation entitled “Addressing Big Data Challenges through Innovative Architecture, Databases and Software”.
Technical Staff, Lincoln Laboratory, MIT
Dr. Vijay Gadepally is a Technical Staff Member at the Massachusetts Institute of Technology, Lincoln Laboratory and Computer Science and AI Laboratory (CSAIL). Vijay pursues research in the areas of big data, machine learning, high performance computing, and pattern recognition. Vijay has previously worked as a Post-Graduate Intern with Raytheon Company, a visiting scholar with the Rensselaer Polytechnic Institute, and an Intern with the Indian Institute of Technology, Mumbai. Vijay holds an M.Sc. and a PhD in Electrical and Computer Engineering from The Ohio State University. At Ohio State, Vijay's research focused on the estimation of driver behavior for autonomous vehicle applications and on high performance computing. His dissertation in signal processing focused on developing mathematical models to accurately estimate and predict driver behavior to improve the safety of autonomous vehicles operating in a mixed-urban environment. At Ohio State, Vijay held concurrent appointments with the Department of Electrical and Computer Engineering and the Ohio Supercomputer Center and was a recipient of a 2012 Outstanding Graduate Student award. Vijay Gadepally holds a Bachelor of Technology (B.Tech) degree in Electrical Engineering from the Indian Institute of Technology, Kanpur. For further information: http://vijayg.mit.edu/about-vijay
Associate Technical Staff, Lincoln Laboratory, MIT
Lauren Edwards is an Associate Technical Staff Member at the Massachusetts Institute of Technology, Lincoln Laboratory. Lauren’s interests include big data, machine learning, database technologies, and the application of these to diverse fields. Prior to joining MIT Lincoln Laboratory, Lauren worked as a Product Development Intern at Genscape, where she developed a nearest neighbor model to pinpoint similar past power market days and their corresponding electricity price drivers. Lauren received an M.S. in Industrial Mathematics from The University of Massachusetts Lowell, focusing on computer science applications such as machine learning and algorithms, and a B.S. in Mathematical Sciences from Worcester Polytechnic Institute, where she explored biological applications of mathematical modeling.
Senior Technical Staff, Lincoln Laboratory, MIT
Dr. Jeremy Kepner is a Senior Technical Staff Member at MIT Lincoln Laboratory, MIT CSAIL, and the MIT Math Department. Prior to joining MIT he was a DoE Computational Science Fellow at Princeton University, where he received his Ph.D. in astrophysics. Dr. Kepner leads the supercomputing and big data research efforts at MIT Lincoln Laboratory, which has 3,500 employees, is the largest lab at MIT, and accounts for half of MIT’s external research funding. His team conducts research in a wide range of computing areas and oversees the operation of a variety of supercomputers that serve hundreds of users at MIT. Throughout his career the focus of Dr. Kepner’s research has been creating and delivering computing systems that require minimal training to operate, thus allowing scientists to be scientists and engineers to be engineers. In addition to his two books, both of which are SIAM bestsellers, Dr. Kepner has published over a hundred works in data mining, databases, high performance computing, graph algorithms, cyber security, visualization, cloud computing, random matrix theory, abstract algebra, and bioinformatics. Dr. Kepner is the most published author in the 60-year history of MIT Lincoln Laboratory. For further information: http://www.mit.edu/~kepner/