Title: The HaLoop approach to large-scale iterative data analysis
Publication Type: Journal Article
Year of Publication: 2012
Authors: Bu Y, Howe B, Balazinska M, Ernst MD
Journal: The VLDB Journal
Volume: 21
Pagination: 169–190
Abstract: The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce has enjoyed particular success. However, MapReduce lacks built-in support for iterative programs, which arise naturally in many applications including data mining, web ranking, graph analysis, and model fitting. This paper (an extended version of the VLDB 2010 paper "HaLoop: Efficient Iterative Data Processing on Large Clusters", PVLDB 3(1):285–296, 2010) presents HaLoop, a modified version of the Hadoop MapReduce framework that is designed to serve these applications. HaLoop allows iterative applications to be assembled from existing Hadoop programs without modification, and it significantly improves their efficiency by providing inter-iteration caching mechanisms and a loop-aware scheduler to exploit these caches. HaLoop retains the fault-tolerance properties of MapReduce through automatic cache recovery and task re-execution. We evaluated HaLoop on a variety of real applications and real datasets. Compared with Hadoop, HaLoop shuffled only a fraction as much data between mappers and reducers in the applications that we tested.
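For context, the sketch below is not taken from the paper; all class names, paths, and the PageRank-style workload are hypothetical placeholders. It illustrates how an iterative computation is typically driven on stock Hadoop: a client-side loop resubmits a MapReduce job each iteration and re-reads the loop-invariant data every time. HaLoop's inter-iteration caches, loop-aware scheduler, and built-in termination testing target exactly this pattern without requiring the mapper and reducer bodies to change.

// Minimal sketch of a hand-rolled iterative driver on plain Hadoop.
// RankMapper, RankReducer, and all paths are illustrative placeholders.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {

  // Placeholder mapper: a real PageRank mapper would emit partial rank
  // contributions keyed by destination page.
  public static class RankMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      if (parts.length == 2) {
        context.write(new Text(parts[0]), new Text(parts[1]));
      }
    }
  }

  // Placeholder reducer: a real PageRank reducer would sum contributions
  // and apply the damping factor.
  public static class RankReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text v : values) {
        context.write(key, v);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String graphPath = args[0];                 // loop-invariant data (the link graph)
    String ranksPath = args[1];                 // loop-variant data (current ranks)
    int maxIterations = Integer.parseInt(args[2]);

    for (int i = 0; i < maxIterations; i++) {
      Job job = Job.getInstance(conf, "rank-iteration-" + i);
      job.setJarByClass(IterativeDriver.class);
      job.setMapperClass(RankMapper.class);
      job.setReducerClass(RankReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(Text.class);

      // On vanilla Hadoop the invariant graph is re-read and re-shuffled in
      // every iteration; HaLoop's inter-iteration caches aim to avoid this cost.
      FileInputFormat.addInputPath(job, new Path(graphPath));
      FileInputFormat.addInputPath(job, new Path(ranksPath));

      String nextRanks = ranksPath + "-next-" + (i + 1);
      FileOutputFormat.setOutputPath(job, new Path(nextRanks));

      if (!job.waitForCompletion(true)) {
        System.exit(1);
      }
      ranksPath = nextRanks;
      // A real driver would also check a convergence condition here, analogous
      // to HaLoop's built-in fixed-point termination test.
    }
  }
}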
Downloads:
VLDB 2010 slides (PDF): https://homes.cs.washington.edu/~mernst/pubs/haloop-vldb2010-slides.pdf
VLDB 2010 slides (PowerPoint): https://homes.cs.washington.edu/~mernst/pubs/haloop-vldb2010-slides.ppt
HaLoop implementation: https://code.google.com/archive/p/haloop/
PDF: https://homes.cs.washington.edu/~mernst/pubs/haloop-vldb2012.pdf
Citation Key: BuHBErnst2012