Graphulo: Graph Analytics in Apache Accumulo




APIs / Frameworks Graph


In this presentation, we will describe Graphulo, a tool built for Apache Accumulo to enable in-database graph analytics. Specifically, we will describe its architecture, usage, and application to graph algorithms of interest to the community. Graphs are used in many applications to determine the relationships among entities and to understand how complex networks are structured. As graphs grow rapidly in size, performing efficient large-scale graph analytics becomes difficult. Graphulo leverages the Accumulo computational infrastructure to perform graph analytics directly in-database. Fundamentally, Graphulo conforms to the GraphBLAS standard, which has been shown to represent a diverse range of graph analytics. This ensures that Graphulo can be applied to a wide variety of analytics and can grow with the GraphBLAS community. Further, this representation allows us to naturally connect Graphulo with emerging polystore database systems. Initial results obtained using Graphulo indicate scalability and applicability to a wide variety of graph problems. Finally, we solicit feedback from the Accumulo community and describe our future development goals.
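To make the GraphBLAS idea concrete, here is a minimal sketch in plain Python of how a graph analytic can be phrased as repeated sparse vector-matrix products. It is an illustration only and does not use Graphulo's API or Accumulo. The names `vxm`, `bfs_levels`, and `EXAMPLE_GRAPH` are hypothetical, and the adjacency matrix is assumed to fit in memory as a dictionary of sets.

```python
from typing import Dict, Set

# Sparse adjacency "matrix": row vertex -> set of column vertices it points to.
# (Hypothetical in-memory stand-in; a database would hold these entries in a table.)
Adjacency = Dict[str, Set[str]]


def vxm(frontier: Set[str], adj: Adjacency) -> Set[str]:
    """Vector-matrix product over the Boolean (OR, AND) semiring.

    The frontier is a sparse Boolean vector; the result marks every vertex
    reachable in one hop from any vertex in the frontier.
    """
    reached: Set[str] = set()
    for vertex in frontier:
        reached |= adj.get(vertex, set())
    return reached


def bfs_levels(adj: Adjacency, source: str) -> Dict[str, int]:
    """Breadth-first search expressed as repeated vector-matrix products."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Mask out vertices that already have a level (already visited).
        frontier = vxm(frontier, adj).difference(levels)
        for vertex in frontier:
            levels[vertex] = depth
    return levels


# Small made-up example graph: a -> b, a -> c, b -> d, c -> d, d -> e.
EXAMPLE_GRAPH: Adjacency = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d"},
    "d": {"e"},
}

if __name__ == "__main__":
    print(bfs_levels(EXAMPLE_GRAPH, "a"))
```

Swapping the Boolean semiring for another one (for example, min and plus for shortest paths) changes the analytic without changing the loop structure, which is the property that lets one small set of matrix operations cover many graph algorithms.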


Dr. Vijay Gadepally
Scientist, Massachusetts Institute of Technology
Dr. Vijay Gadepally is a scientist at MIT's Lincoln Laboratory and Computer Science and Artificial Intelligence Laboratory (CSAIL). Vijay's research is in the area of high-performance computing, big data/IoT systems, security, analytics, and advanced database technologies. He holds a Ph.D. in Electrical and Computer Engineering from The Ohio State University and a B.Tech degree in Electrical Engineering from the Indian Institute of Technology (IIT), Kanpur. At Ohio State, Vijay’s research focused on high-performance computing, intelligent transportation systems, and mathematical models for driver behavior. Vijay has also worked at Raytheon Company and Rensselaer Polytechnic Institute. Vijay has over 50 peer-reviewed publications and talks.
Dr. Timothy Weale
Scientist, Department of Defense (DoD)
Dr. Timothy Weale is a scientist with the U.S. Department of Defense. Dr. Weale currently researches enterprise computing technologies, with a focus on big data infrastructure. He previously led software efforts in the area of information sharing and compliance. Dr. Weale holds a Ph.D. in Computer Science and Engineering from The Ohio State University and a Bachelor of Science degree in Computer Science from the University of Dayton.
Dr. Jeremy Kepner
Laboratory Fellow, Massachusetts Institute of Technology
Dr. Jeremy Kepner is an MIT Lincoln Laboratory Fellow who brought supercomputing to Lincoln Laboratory through the establishment of LLGrid, the Massachusetts Green High Performance Computing Center, and the MIT Lincoln Laboratory Supercomputing Center. He developed a novel database management language and schema (D4M) and is the architect of pMatlab (Parallel Matlab Toolbox), which is used by hundreds of Lincoln Laboratory staff and tens of thousands of scientists and engineers worldwide. Using the Accumulo database, his team has set world records in database performance. He also led the development of the Parallel Vector Tile Optimizing Library, which earned a 2011 R&D 100 Award. Dr. Kepner leads a number of standards groups and has chaired the IEEE HPEC Conference for many years. He has authored two SIAM bestselling books on parallel MATLAB and graph algorithms.
Dylan Hutchison
Graduate Student, University of Washington
As a University of Washington graduate student and avid idealist, Dylan Hutchison bridges databases and high performance computing in order to build faster, more universal data management systems. He emphasizes a mathematics-first approach, grounded in relational and linear algebra. Starting from work at the MIT Lincoln Laboratory, Dylan authored the Graphulo library for matrix processing in the Apache Accumulo database. He is supported by a National Science Foundation fellowship and holds a B.E. in Computer Engineering, M.S. in Computer Science, and M.S. in Applied Mathematics from Stevens Institute of Technology.