Hadoop is open source software for affordable supercomputing. It provides the distributed file system and the parallel processing required to run a massive computing cluster. This course explains Pig, a data flow scripting tool for interfacing with Hadoop. You'll learn how to install and configure Pig and explore a demonstration of Pig in action. This course can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
  • describe Pig and its strengths
  • recall the minimal edits that need to be made to the configuration file
  • install and configure Pig
  • recall the complex data types used by Pig
  • recall some of the relational operators used by Pig
  • use the Grunt shell with Pig Latin
  • set parameters both from a text file and from the command line
  • write a Pig script
  • use a Pig script to filter data
  • use the FOREACH operator with a Pig script
  • set parameters and arguments in a Pig script
  • write a Pig script to count data
  • perform data joins using a Pig script
  • group data using a Pig script
  • cogroup data with a Pig script
  • flatten data using a Pig script
  • recall the languages that can be used to write user-defined functions
  • create a user-defined function for Pig
  • recall the different error categories
  • use the EXPLAIN operator in a Pig script
  • install Pig, use Pig operators and Pig Latin, and retrieve and group records
Data Factory with Pig Online course
  • Course ID: 264104
  • Duration: 113 minutes
  • Price: $75