
Rs. 599 (regular price Rs. 1499)

Pig Tutorial: Process Big Data

Transform unstructured data into structured, predictable, and useful information that is ready for reporting and analysis, using Pig

Lifetime access
81 learners
Course Overview

Pig is aptly named. It will devour any data that you feed it and bring home the bacon!

Let's parse that:

Omnivorous: Pig works with unstructured data. Many of Pig's operations resemble SQL, but Pig can also operate on data that has no fixed schema. Pig is awesome at wrangling data into a form that is both clean and storable in a data warehouse, where it can then be readily used for reporting and analysis.

Brings home the bacon: With Pig, you can transform data into a form that is organized, predictable, and ready for consumption.
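To give a taste of what that looks like in practice, here is a minimal Pig Latin sketch of the load-filter-store pattern the course teaches. The file name, delimiter, and fields below are illustrative, not from the course itself:

```pig
-- Load raw, semi-structured log lines (path and schema are hypothetical)
logs = LOAD 'server_logs.txt' USING PigStorage('\t')
       AS (ts:chararray, level:chararray, message:chararray);

-- Keep only error records, much like a SQL WHERE clause
errors = FILTER logs BY level == 'ERROR';

-- Store the cleaned relation, ready for a warehouse or a report
STORE errors INTO 'clean_errors' USING PigStorage(',');
```

Note how the script reads as a data flow: each statement names a relation and builds on the previous one.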

What's Covered?

Pig Basics: Operating modes, running a Pig script, the Grunt shell, loading data and creating your first relation, scalar data types, complex data types such as the Tuple, Bag, and Map, partial schema specification for relations, and displaying and storing relations (the dump and store commands)
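For orientation, the complex data types mentioned above look roughly like this in a Pig script. The file name and fields are made up for illustration:

```pig
-- A relation whose schema uses a tuple, a bag, and a map (illustrative data)
students = LOAD 'students.txt' AS (
    name:chararray,
    address:tuple(city:chararray, zip:chararray),   -- tuple: an ordered set of fields
    courses:bag{t:(course:chararray)},              -- bag: an unordered collection of tuples
    scores:map[int]                                 -- map: string keys mapped to int values
);

DUMP students;   -- print the relation to the console
```

The dump command prints a relation for inspection, while store writes it out to the file system.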


Course Objectives

By the end of this Pig tutorial, you will:

  • Extract, transform, and store unstructured data in a usable form
  • Munge data by writing intermediate-level Pig scripts
  • Work on large data sets by optimizing Pig operations
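As a hedged sketch of the kind of intermediate-level script these objectives describe, here is a group-and-aggregate transformation. The file, fields, and output path are illustrative:

```pig
-- Group sales by region and compute per-region totals (names are hypothetical)
sales   = LOAD 'sales.csv' USING PigStorage(',')
          AS (region:chararray, amount:double);

grouped = GROUP sales BY region;

-- 'group' holds the grouping key; SUM aggregates over each bag of rows
totals  = FOREACH grouped GENERATE group AS region,
                                   SUM(sales.amount) AS total;

STORE totals INTO 'totals_by_region';
```

This is the same group-by-and-aggregate pattern covered in section 5 of the course plan.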

Prerequisites and Target Audience

The prerequisites for this Pig course are:

  • You should know the basics of SQL and working with data
  • You should know the basics of the Hadoop ecosystem and MapReduce tasks

Course Plan
Certificate of completion

1. Pig For Wrangling Big Data
1 video
You, This Course and Us 01:47
2. Where does Pig fit in?
5 videos
How does Pig compare with Hive? 10:15

Pig Latin as a data flow language 06:17

Pig with HBase 05:18
3. Pig Basics
6 videos
Complex data types - The Tuple, Bag and Map 13:45

Partial schema specification for relations 10:00

Displaying and storing relations - The dump and store commands 03:54
4. Pig Operations And Data Transformations
5 videos
Selecting fields from a relation 10:22

Built-in functions 05:08

Evaluation functions 10:31

Using the distinct, limit and order by keywords 05:04

Filtering records based on a predicate 11:01
5. Advanced Data Transformations
7 videos
Group by and aggregate transformations 12:12

Combining datasets using Join 16:19

Concatenating datasets using Union 04:32

Generating multiple records by flattening complex fields 05:24

Using Co-Group, Semi-Join and Sampling records 09:26

The nested Foreach command 13:47

Debug Pig scripts using Explain and Illustrate 12:55
6. Optimizing Data Transformations
4 videos
Parallelize operations using the Parallel keyword 08:02

Join Optimizations: Multiple relations join, large and small relation join 10:34

Join Optimizations: Skew join and sort-merge join 08:51

Common sense optimizations 05:25
7. A real-world example
2 videos
Summarizing error logs 08:47
8. Installing Hadoop in a Local Environment
4 videos
Hadoop Install Modes 08:32

Setup a Virtual Linux Instance (For Windows users) 15:31

Hadoop Standalone mode Install 09:33

Hadoop Pseudo-Distributed mode Install 14:25

Meet the Author

4 alumni of Stanford, IIM-A, and the IITs; veterans of Google, Microsoft, and Flipkart

Loonycorn is a team of four: Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh, who have spent years (decades, actually) working in the tech sector across the world.

  • Janani: Graduated from Stanford and worked for 7 years at Google (New York, Singapore). She has also worked at Flipkart and Microsoft.
  • Vitthal: Studied at Stanford; worked at Google (Singapore), Flipkart, Credit Suisse, and INSEAD.
  • Swetha: An alumna of IIM Ahmedabad and IIT Madras with experience working at Flipkart.
  • Navdeep: An IIT Guwahati alumnus and longtime Flipkart employee.
Ratings and Reviews: 4.7/5