SPARK 2.0

What is new in Spark 2.0. Easier: SQL and Streamlined APIs. One thing we are proud of in Spark is creating APIs that are simple, intuitive, and expressive. Spark 2.0 continues this tradition, with a focus on two areas: (1) standard SQL support and (2) unifying the DataFrame/Dataset API. On the SQL side, we have significantly expanded…

Why should we learn Hadoop, R and Python? Part III

PS: This is a continuation of Part II. Python: Python is easy to learn. Like Java, C, and Perl, the basics of Python are easy for newcomers to grasp. A programmer coding in Python needs to write less code, owing to beginner-friendly features such as code readability, simple syntax, and ease of implementation. Python is…

Why should we learn Hadoop, R and Python? Part II

PS: This is a continuation of Part I. R: At heart, a good data scientist is a passionate coder-slash-statistician, and there’s no better programming language for a statistician to learn than R. The standard among statistical programming languages, R is sometimes called the ‘golden child’ of data science. It’s a popular skill among big data analysts,…

Why should we learn Hadoop, R and Python? Part I

As the Big Data Analytics domain continues to gain prominence at SaaS (Software as a Service) companies, the rush to break into Big Data has reached unprecedented levels. With plenty of job opportunities and considerably high pay, Big Data Analytics is a safe bet for any professional looking for a high-paying career that…

Open Data Platform

The Open Data Platform (ODP) initiative is an industry effort focused on simplifying adoption of Apache Hadoop for the enterprise, and enabling big data solutions to flourish through improved ecosystem interoperability. It relies on the governance of the Apache Software Foundation community to innovate and deliver the Apache project technologies included in the ODP core…

Moving Big Data from Mainframe to Hadoop

A blog from Cloudera. Apache Sqoop provides a framework to move data between HDFS and relational databases in parallel using Hadoop’s MapReduce (MR) framework. As Hadoop becomes more popular in enterprises, there is a growing need to move data from non-relational sources, such as mainframe datasets, to Hadoop. The following are possible reasons for this: HDFS…