MapReduce and Its Discontents

Apache Hadoop is the current darling of the “Big Data” world. At its core is MapReduce, a computing model for decomposing large data-analysis jobs into smaller tasks and distributing those tasks around a cluster. MapReduce was pioneered at Google for indexing the Web and for other computations over massive data sets.
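The map/reduce decomposition can be sketched in plain Python. This word-count example is illustrative only (the function names and the in-memory shuffle are my own simplifications, not Hadoop's actual API), but it shows the two phases and the grouping step between them:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit an intermediate (word, 1) pair for every word in one input split.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Reduce: combine all counts emitted for a single key.
    return key, sum(values)

def mapreduce(documents):
    # Shuffle: group intermediate pairs by key, then reduce each group.
    # In a real cluster, maps and reduces run in parallel on many machines.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())

print(mapreduce(["the quick fox", "the lazy dog"]))
# {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

Because each map call touches only its own input split and each reduce call touches only one key's values, the framework can spread both phases across a cluster, which is the source of MapReduce's scalability.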
 
In this talk, I describe MapReduce and discuss its strengths, such as cost-effective scalability, as well as its weaknesses, such as its limitations for real-time event stream processing and the relative difficulty of writing MapReduce programs. I briefly show how higher-level languages ease the development burden and provide useful abstractions for the developer.
 
Then I discuss emerging alternatives, such as Google’s Pregel system for graph processing and event stream processing systems like Storm, as well as the role of higher-level languages in optimizing the productivity of developers. Finally, I speculate about the future of Big Data technology.
 