Hadoop is open source software for affordable supercomputing. It provides the distributed file system and the parallel processing required to run a massive computing cluster. This course introduces Pig, a data flow scripting tool for working with Hadoop. You'll learn how to install and configure Pig and see a demonstration of Pig in action. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Learning Objectives
  • Start the course
  • Describe Pig and its strengths
  • Recall the minimal edits needed in the configuration file
  • Install and configure Pig
  • Recall the complex data types used by Pig
  • Recall some of the relational operators used by Pig
  • Use the Grunt shell with Pig Latin
  • Set parameters both from a text file and from the command line
  • Write a Pig script
  • Use a Pig script to filter data
  • Use the FOREACH operator with a Pig script
  • Set parameters and arguments in a Pig script
  • Write a Pig script to count data
  • Perform data joins using a Pig script
  • Group data using a Pig script
  • Cogroup data with a Pig script
  • Flatten data using a Pig script
  • Recall the languages that can be used to write user-defined functions
  • Create a user-defined function for Pig
  • Recall the different error categories
  • Use the EXPLAIN operator in a Pig script
  • Install Pig, use Pig operators and Pig Latin, and retrieve and group records
Data Factory with Pig Online course
  • Duration: 113 minutes