Learn Spark for Data Science with Python
Created by Stanford and IIT alumni with work experience at Google and Microsoft, this course teaches you how to process data with Spark for analytics, machine learning and data science.
Big Data analysis is a very valuable skill in the job market and this course will teach you the hottest technology in big data analytics: Apache Spark.
What is Spark? If you are an analyst or a data scientist, you are probably used to juggling several platforms for working with data, such as SQL, Python, R and Java. Apache Spark is a fast cluster computing framework for large-scale data processing. With Spark, you have a single engine in which you can explore large datasets, run machine learning algorithms, and then use the same system to put your code into production.
By the end of this course you will be able to:
- Work with a variety of datasets and tasks, from predicting airplane departure delays to analyzing social networks and product ratings.
- Use Spark's features and libraries, including RDDs, DataFrames, Spark SQL, MLlib, Spark Streaming and GraphX.
- Utilize Apache Spark for a number of analytics and Machine Learning tasks.
- Implement complex algorithms such as PageRank and music recommendations.
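To give a feel for the kind of algorithm mentioned above, here is a minimal sketch of PageRank using power iteration. It is written in plain Python rather than Spark, and it is not the course's own implementation: the function name `pagerank`, the parameter defaults, and the tiny `example_links` graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not course settings.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * ranks[page] / n
                for target in pages:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical three-page graph, used only to exercise the function.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(example_links)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

On a cluster, the same idea is typically expressed as repeated join and reduce steps over a distributed collection of links, which is where an engine like Spark comes in.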
Prerequisites and Target Audience
Prerequisites for the course:
- You need a working knowledge of Python and must be able to write Python code directly in the PySpark shell. If you have IPython Notebook installed, this course will show you how to configure it for Spark.
- To get a firm grasp of the Java module, you should know Java. An IDE that supports Maven, such as IntelliJ IDEA or Eclipse, would be useful.
- All examples work with or without Hadoop. If you want to use Spark with Hadoop, you need Hadoop installed on your system, in either pseudo-distributed or cluster mode.