The Complete Apache Oozie Tutorial

This Apache Oozie tutorial, created by a team of Stanford alumni, teaches you how to work with Workflows, Coordinators, and Bundles in Oozie, with real-world examples of scheduling Hadoop jobs.

03h:47m · Lifetime access · 47 learners

Introduction to the Course on Apache Oozie

Apache Oozie, a workflow scheduler system for Apache Hadoop, makes it easy to work with complex dependencies, manage a multitude of jobs on different time schedules, and manage end-to-end data pipelines. It is sometimes considered formidable, because Oozie applications are written entirely in XML and can be challenging to debug when things go wrong. But once you have figured out how it works, it's a piece of cake. Oozie lets you manage Hadoop jobs, Java programs, scripts, and other executables that share the same basic setup, and it facilitates clean, logical management of dependencies. The key to mastering Oozie is knowing the right configuration parameters that will get the job done.
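
To make this concrete, here is a minimal sketch of what a workflow.xml can look like: a single Shell action wired between control nodes, the kind of file covered in section 2 of the course plan below. The application name, the echo command, and the parameter names are illustrative placeholders, not files from the course:

    <workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
        <start to="shell-node"/>
        <action name="shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <exec>echo</exec>
                <argument>Hello, Oozie</argument>
            </shell>
            <ok to="end"/>       <!-- on success, transition to the end node -->
            <error to="fail"/>   <!-- on failure, transition to the kill node -->
        </action>
        <kill name="fail">
            <message>Shell action failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>

The ${jobTracker} and ${nameNode} variables would typically be supplied through a job.properties file, which the course covers in its own lesson.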

This Apache Oozie tutorial broadly covers Workflow Management, Time-based and Data-based triggers for Workflows, and Data Pipelines using Bundles.
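
As a taste of the time-based triggers, a minimal coordinator.xml might run the Workflow sketched above once a day. The coordinator name, the dates, and the HDFS app-path here are hypothetical:

    <coordinator-app name="daily-coord" frequency="${coord:days(1)}"
                     start="2023-01-01T00:00Z" end="2023-12-31T00:00Z"
                     timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
        <action>
            <workflow>
                <!-- HDFS directory containing the workflow.xml to trigger -->
                <app-path>${nameNode}/user/demo/apps/demo-wf</app-path>
            </workflow>
        </action>
    </coordinator-app>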

Course Objectives

By the end of this Apache Oozie tutorial, you will be able to:

  • Install and set up Oozie on your system
  • Configure Workflows to run jobs on Hadoop
  • Configure time-triggered and data-triggered Workflows
  • Use Bundles to configure data pipelines (sketched just after this list)
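
As a sketch of that last objective, a bundle.xml groups one or more Coordinators into a single data pipeline and starts them together at a kick-off time (the subject of section 4 below). The names, path, and date are hypothetical:

    <bundle-app name="demo-bundle" xmlns="uri:oozie:bundle:0.2">
        <controls>
            <!-- when Oozie should materialize the coordinators below -->
            <kick-off-time>2023-01-01T00:00Z</kick-off-time>
        </controls>
        <coordinator name="daily-coord">
            <app-path>${nameNode}/user/demo/apps/daily-coord</app-path>
        </coordinator>
    </bundle-app>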

Prerequisites and Target Audience

This Apache Oozie tutorial requires you to have basic knowledge of the Hadoop ecosystem. You should also know how to run MapReduce jobs on Hadoop.

Course Plan
Certificate of completion

1. A Brief Overview Of Oozie
3 videos

2. Workflows: A Directed Acyclic Graph Of Tasks
7 videos
Running MapReduce on the command line 04:40

The lifecycle of a Workflow 06:12

Running our first Oozie Workflow MapReduce application 11:15

The job.properties file 08:45

The workflow.xml file 12:07

A Shell action Workflow 07:46

Control nodes, Action nodes and Global configurations within Workflows 09:57

3. Coordinators: Managing Workflows
6 videos
Running our first Coordinator application 12:27

A time-triggered Coordinator definition 08:52

Coordinator control mechanisms 07:09

Data availability triggers 10:03

Running a Coordinator which waits for input data 06:11

Coordinator configuration to use data triggers 15:25

4. Bundles: A Collection Of Coordinators For Data Pipelines
2 videos
Bundles and why we need them 09:15

The Bundle kick-off time 11:12

5. Installing Hadoop in a Local Environment
4 videos
Hadoop Install Modes 08:32

Set up a Virtual Linux Instance (For Windows users) 15:31

Hadoop Standalone mode Install 09:33

Hadoop Pseudo-Distributed mode Install 14:25

Meet the Authors

Loonycorn
4 alumni of Stanford, IIM-A, and the IITs, with experience at Google, Microsoft, and Flipkart

Loonycorn is a team of four graduates of top universities. Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh have spent years (decades, actually) working in the tech sector around the world.

  • Janani: Graduated from Stanford and has worked for 7 years at Google (New York, Singapore). She also worked at Flipkart and Microsoft.
  • Vitthal: Studied at Stanford; worked at Google (Singapore), Flipkart, Credit Suisse, and INSEAD.
  • Swetha: Studied at IIM Ahmedabad and IIT Madras; worked at Flipkart.
  • Navdeep: An IIT Guwahati alumnus and longtime Flipkart employee.

Ratings and Reviews: 4.9/5