
Tech Talk: Matei Zaharia (UC Berkeley) --

SPARK: A FRAMEWORK FOR ITERATIVE AND INTERACTIVE CLUSTER COMPUTING
Matei Zaharia (UC Berkeley)
Tuesday, February 8, 2011

ABSTRACT

Although the MapReduce programming model has been highly successful, it is not suitable for all applications. We present Spark, a framework optimized for one such class of applications: iterative jobs in which a dataset is reused across multiple parallel operations, as is common in many machine learning and graph algorithms. Spark provides a functional programming model similar to MapReduce, but also lets users store datasets in memory between iterations, which yields up to 10x better performance than Hadoop. Spark also makes jobs easy to program by integrating into the Scala programming language. Finally, Spark's ability to load a dataset into memory and query it repeatedly makes it especially suitable for interactive analysis of big datasets. We have modified the Scala interpreter so that Spark can be used interactively, providing a much more responsive experience than Hive and Pig (sub-second latency, as opposed to tens of seconds for Hadoop).
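
As a rough illustration of why keeping a dataset in memory helps iterative jobs, here is a minimal single-machine sketch in plain Python. It does not use Spark or its API: the `Dataset` class, `load_points`, and `fit_slope` are hypothetical names invented for this example, and the "expensive load" only stands in for re-reading input from distributed storage on every pass.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y)


class Dataset:
    """Hypothetical stand-in for a distributed dataset; not Spark's API."""

    def __init__(self, loader: Callable[[], List[Point]], cache: bool = False):
        self._loader = loader
        self._cache = cache
        self._cached: Optional[List[Point]] = None
        self.loads = 0  # number of times the expensive loader actually ran

    def records(self) -> List[Point]:
        # Serve from memory when caching is enabled and data is already loaded.
        if self._cache and self._cached is not None:
            return self._cached
        self.loads += 1
        data = self._loader()
        if self._cache:
            self._cached = data
        return data


def load_points() -> List[Point]:
    # Assumption: stands in for an expensive read from a distributed file system.
    return [(float(x), 2.0 * x) for x in range(1, 6)]


def fit_slope(ds: Dataset, iterations: int = 50, lr: float = 0.01) -> float:
    """Gradient descent for y = w * x; every iteration rescans the dataset."""
    w = 0.0
    for _ in range(iterations):
        points = ds.records()
        grad = sum((w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w
```

Calling `fit_slope` on `Dataset(load_points, cache=False)` runs the loader once per iteration (50 times with the defaults), while `Dataset(load_points, cache=True)` runs it once and reuses the in-memory copy. Both arrive at the same slope, so the only difference is how often the input is re-read.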

BIOGRAPHY

Matei Zaharia is a fourth-year graduate student at UC Berkeley. He works with professors Scott Shenker and Ion Stoica on topics in cloud computing, operating systems, and networking. He is also a committer on the Apache Hadoop project. He received his undergraduate degree from the University of Waterloo in Canada.