Course Description
Create a data set with the Kite SDK
Develop custom Flume components for data ingestion
Manage a multi-stage workflow with Oozie
Analyze data with Crunch
Write user-defined functions for Hive and Impala
Transform data with Morphlines
Index data with Cloudera Search
Agenda
1. Application Architecture
Scenario Explanation
Understanding the Development Environment
Identifying and Collecting Input Data
Selecting Tools for Data Processing and Analysis
Presenting Results to the User
2. Defining and Using Data Sets
Metadata Management
What is Apache Avro?
Avro Schemas
Avro Schema Evolution
Selecting a File Format
Performance Considerations
3. Using the Kite SDK Data Module
What is the Kite SDK?
Fundamental Data Module Concepts
Creating New Data Sets Using the Kite SDK
Loading, Accessing, and Deleting a Data Set
4. Importing Relational Data with Apache Sqoop
What is Apache Sqoop?
Basic Imports
Limiting Results
Improving Sqoop's Performance
Sqoop 2
5. Capturing Data with Apache Flume
What is Apache Flume?
Basic Flume Architecture
Flume Sources
Flume Sinks
Flume Configuration
Logging Application Events to Hadoop
6. Developing Custom Flume Components
Flume Data Flow and Common Extension Points
Custom Flume Sources
Developing a Flume Pollable Source
Developing a Flume Event-Driven Source
Custom Flume Interceptors
Developing a Header-Modifying Flume Interceptor
Developing a Filtering Flume Interceptor
Writing Avro Objects with a Custom Flume Interceptor
7. Managing Workflows with Apache Oozie
The Need for Workflow Management
What is Apache Oozie?
Defining an Oozie Workflow
Validation, Packaging, and Deployment
Running and Tracking Workflows Using the CLI
Hue UI for Oozie
8. Processing Data Pipelines with Apache Crunch
What is Apache Crunch?
Understanding the Crunch Pipeline
Comparing Crunch to Java MapReduce
Working with Crunch Projects
Reading and Writing Data in Crunch
Data Collection API
Functions
Utility Classes in the Crunch API
9. Working with Tables in Apache Hive
What is Apache Hive?
Accessing Hive
Basic Query Syntax
Creating and Populating Hive Tables
How Hive Reads Data
Using the RegexSerDe in Hive
10. Developing User-Defined Functions
What are User-Defined Functions?
Implementing a User-Defined Function
Deploying Custom Libraries in Hive
Registering a User-Defined Function in Hive
11. Executing Interactive Queries with Impala
What is Impala?
Comparing Hive to Impala
Running Queries in Impala
Support for User-Defined Functions
Data and Metadata Management
12. Understanding Cloudera Search
What is Cloudera Search?
Search Architecture
Supported Document Formats
13. Indexing Data with Cloudera Search
Collection and Schema Management
Morphlines
Indexing Data in Batch Mode
Indexing Data in Near Real Time
14. Presenting Results to Users
Solr Query Syntax
Building a Search UI with Hue
Accessing Impala through JDBC
Powering a Custom Web Application with Impala and Search
Audience
Developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems