SPARK 2.0

What is new in Spark 2.0: Easier: SQL and Streamlined APIs. One thing we are proud of in Spark is creating APIs that are simple, intuitive, and expressive. Spark 2.0 continues this tradition, with a focus on two areas: (1) standard SQL support and (2) unifying the DataFrame/Dataset API. On the SQL side, we have significantly expanded …

Hadoop vs Spark

Listen in on any conversation about big data, and you’ll probably hear mention of Hadoop or Apache Spark. Here’s a brief look at what they do and how they compare. 1: They do different things. Hadoop and Apache Spark are both big-data frameworks, but they don’t really serve the same purposes. Hadoop is essentially a …

Few minutes guide to Understand the Significance of Apache Spark

So what is Spark? Spark is another execution framework. Like MapReduce, it works with the filesystem to distribute your data across the cluster and process that data in parallel. Like MapReduce, it also takes a set of instructions from an application written by a developer. MapReduce was generally coded in Java; Spark supports not only …
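The "distribute the data, then process it in parallel" idea in this excerpt can be sketched in plain Python. This is only a toy illustration under stated assumptions, not Spark or MapReduce code: threads on one machine stand in for cluster nodes, and the helper names (`split_into_partitions`, `count_words`, `parallel_word_count`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into lists (a stand-in for data blocks on a cluster)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """The developer's 'instructions': count the words in one slice of the data."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_partitions=3):
    """Count words per partition in parallel, then merge the partial counts."""
    partitions = split_into_partitions(lines, num_partitions)
    # Threads are an assumption made for brevity; a real cluster uses separate machines.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = [
    "spark is an execution framework",
    "mapreduce is an execution framework",
    "spark processes data in parallel",
]
word_counts = parallel_word_count(lines)
print(word_counts["spark"], word_counts["framework"])
```

The per-partition function never sees the whole data set, which is what lets the work be spread across workers; only the small partial counts are brought together at the end.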

Apache Spark

Apache Spark is a fast and general engine for large-scale data processing. Although MapReduce is great for large-scale data processing, it is not well suited to iterative algorithms or interactive analytics, because the data has to be reloaded for each iteration or materialized and replicated on the distributed file system between successive jobs. Apache Spark is designed …
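The cost of reloading data on every iteration, as described in this excerpt, can be made concrete with a small pure-Python sketch. It is a toy model rather than Spark code: `CountingSource` is a hypothetical stand-in for a distributed file system that counts full reads, and the update rule is an arbitrary iterative computation chosen only for illustration.

```python
class CountingSource:
    """Hypothetical data source that records how many times it is read in full."""

    def __init__(self, values):
        self._values = list(values)
        self.loads = 0

    def load(self):
        self.loads += 1
        return list(self._values)


def iterate_with_reload(source, steps):
    """Re-read the data on every iteration, as a chain of separate jobs would."""
    estimate = 0.0
    for _ in range(steps):
        data = source.load()
        mean = sum(data) / len(data)
        estimate += 0.5 * (mean - estimate)  # move halfway toward the mean
    return estimate


def iterate_with_cache(source, steps):
    """Read the data once and keep it in memory across iterations."""
    data = source.load()
    estimate = 0.0
    for _ in range(steps):
        mean = sum(data) / len(data)
        estimate += 0.5 * (mean - estimate)
    return estimate


reload_source = CountingSource([2.0, 4.0, 6.0])
cached_source = CountingSource([2.0, 4.0, 6.0])
reload_result = iterate_with_reload(reload_source, steps=10)
cached_result = iterate_with_cache(cached_source, steps=10)
print(reload_source.loads, cached_source.loads)
```

Both functions compute the same estimate; they differ only in how many times the source is read, which is the overhead the excerpt attributes to running iterative work as successive jobs.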