Write For Us

Lighting Fast Cluster Computing With PySpark

E-Commerce Solutions SEO Solutions Marketing Solutions
247 Views
Published
This talk will introduce PySpark a framework for cluster scale data-intensive computing. PySpark is a deeply integrated set of Python API bindings for the popular Spark computation engine. Spark provides rich data-flow abstractions and sophisticated use of distributed caching to speed up complex analytic processing by several orders of magnitude. It interfaces directly with popular storage layers (e.g. Hadoop HDFS) and is optimized for advanced analytic functions such as machine learning, OLAP processing, and ETL. The talk will introduce the PySpark API and architecture, present use cases, and walk through a demo.
Category
Computing
Tags
Python, data analysis, PySpark, machine learning
Sign in or sign up to post comments.
Be the first to comment