-
Configuring Hue
After watching this video; you will be able to configure Hue and Hue users.
-
Configuring Hue with MySQL
After watching this video; you will be able to configure Hue with MySQL.
-
Configuring Oozie
After watching this video; you will be able to configure Oozie.
-
Installing Hadoop
After watching this video; you will be able to use Cloudera Manager to install Hadoop.
-
Adding DataNodes
After watching this video; you will be able to add a DataNode to a Hadoop cluster.
-
Administering HDFS
After watching this video; you will be able to describe the tools fsck and dfsadmin.
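As a sketch, the two tools might be invoked like this (the paths are examples, and the commands assume a running HDFS, so each call is guarded on the `hdfs` binary being available):

```shell
# Illustrative HDFS administration commands; they need a live cluster,
# so each invocation is guarded on the hdfs binary being present.
FSCK_CMD="hdfs fsck / -files -blocks -locations"  # file-system health check
REPORT_CMD="hdfs dfsadmin -report"                # DataNode status summary
command -v hdfs >/dev/null && $FSCK_CMD
command -v hdfs >/dev/null && $REPORT_CMD
command -v hdfs >/dev/null && hdfs dfsadmin -safemode get  # query safe mode
true  # keep a clean exit status on machines without Hadoop installed
```

fsck reports on block health (missing, corrupt, under-replicated), while dfsadmin covers cluster-level state such as capacity, safe mode, and live/dead DataNodes.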
-
Balancing Hadoop Clusters
After watching this video; you will be able to balance a Hadoop cluster.
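A balancing run can be sketched with the built-in balancer; the threshold value below is an example, not a recommendation for every cluster:

```shell
# Rebalance block distribution across DataNodes. The threshold is the
# allowed deviation (in percent) of each node's disk utilization from the
# cluster average; 10 is an example value.
THRESHOLD=10
command -v hdfs >/dev/null && hdfs balancer -threshold "$THRESHOLD"
true  # keep a clean exit status on machines without Hadoop installed
```

The balancer runs until every DataNode is within the threshold of the cluster mean or no more blocks can be moved.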
-
Balancing Resources
After watching this video; you will be able to describe how resources are distributed over the total capacity.
-
Building Client Servers
After watching this video; you will be able to build images for required servers in the Hadoop cluster.
-
Building Hadoop Clients
After watching this video; you will be able to build the Hadoop clients.
-
Building Images for Baseline Servers
After watching this video; you will be able to build an image for a baseline server.
-
Building Images for DataServers
After watching this video; you will be able to build an image for a DataServer.
-
Building Images for Master Servers
After watching this video; you will be able to build an image for a Master Server.
-
Calculating Storage Amounts
After watching this video; you will be able to calculate the correct number of disks required for a storage solution.
-
Configure Logging for Jobs
After watching this video; you will be able to describe how to configure Hadoop jobs for logging.
-
Configure Speculative Execution
After watching this video; you will be able to configure speculative execution.
-
Configuring for High Availability
After watching this video; you will be able to edit the Hadoop configuration files for high availability.
-
Configuring Hadoop Clusters
After watching this video; you will be able to configure a Hadoop cluster.
-
Configuring Hadoop for AWS
After watching this video; you will be able to prepare to install and configure a Hadoop cluster on AWS.
-
Configuring Hadoop Logs
After watching this video; you will be able to configure Hadoop logs.
-
Configuring HCatalog Daemons
After watching this video; you will be able to configure HCatalog daemons.
-
Configuring HDFS for Kerberos
After watching this video; you will be able to configure HDFS for Kerberos.
-
Configuring Hive Daemons
After watching this video; you will be able to configure Hive daemons.
-
Configuring Hive for Kerberos
After watching this video; you will be able to configure Hive for Kerberos.
-
Configuring Hue for Kerberos
After watching this video; you will be able to configure Hue for use with Kerberos.
-
Configuring JobHistoryServer Logs
After watching this video; you will be able to describe how to configure JobHistoryServer logs.
-
Configuring log4j for Hadoop
After watching this video; you will be able to describe how to configure log4j for Hadoop.
-
Configuring Logging
After watching this video; you will be able to configure logging for the Hadoop cluster.
-
Configuring Minimum Resources
After watching this video; you will be able to configure minimum share on the fair scheduler.
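A minimum share for a queue is declared in the fair scheduler's allocation file. The sketch below writes a minimal example; the queue name and resource amounts are hypothetical:

```shell
# Write a minimal fair-scheduler allocation file declaring a guaranteed
# minimum share for one queue. Queue name and amounts are examples.
ALLOC_FILE=/tmp/fair-scheduler.xml
cat > "$ALLOC_FILE" <<'EOF'
<?xml version="1.0"?>
<allocations>
  <queue name="analytics">
    <!-- guaranteed minimum share for this queue -->
    <minResources>10240 mb,4 vcores</minResources>
  </queue>
</allocations>
EOF
grep -c minResources "$ALLOC_FILE"
```

YARN reads this file from the location named by `yarn.scheduler.fair.allocation.file` in yarn-site.xml, and rereads it periodically, so queue changes do not require a restart.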
-
Configuring MySQL Databases
After watching this video; you will be able to configure a MySQL database.
-
Configuring Oozie for Kerberos
After watching this video; you will be able to configure Oozie for use with Kerberos.
-
Configuring Pig and HTTPFS for Kerberos
After watching this video; you will be able to configure Pig and HTTPFS for use with Kerberos.
-
Configuring Preemption
After watching this video; you will be able to configure preemption for the fair scheduler.
-
Configuring Single Resource Fairness
After watching this video; you will be able to configure single resource fairness.
-
Configuring YARN for Kerberos
After watching this video; you will be able to configure YARN for Kerberos.
-
Copying Data
After watching this video; you will be able to use distcp to copy data from one cluster to another.
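An inter-cluster copy can be sketched as follows; the NameNode hostnames and paths are hypothetical, and the command assumes both clusters are reachable:

```shell
# DistCp runs as a MapReduce job, copying blocks in parallel across
# DataNodes. Hostnames and paths below are hypothetical examples.
SRC="hdfs://nn-old.example.com:8020/data/logs"
DST="hdfs://nn-new.example.com:8020/data/logs"
# -update skips files already present and unchanged; -p preserves
# attributes such as permissions and replication factor.
command -v hadoop >/dev/null && hadoop distcp -update -p "$SRC" "$DST"
true  # keep a clean exit status on machines without Hadoop installed
```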
-
Creating Access Control Lists
After watching this video; you will be able to create access control lists.
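A typical ACL change can be sketched like this; the user name and directory are examples, and HDFS ACLs must be enabled (`dfs.namenode.acls.enabled` set to true) for the commands to succeed:

```shell
# Grant an additional user read/execute access to a directory via an
# HDFS ACL. The user and path are hypothetical examples.
ACL_SPEC="user:analyst:r-x"   # extra ACL entry beyond the POSIX bits
TARGET=/data/reports
if command -v hdfs >/dev/null; then
  hdfs dfs -setfacl -m "$ACL_SPEC" "$TARGET"   # -m modifies/adds entries
  hdfs dfs -getfacl "$TARGET"                  # verify the resulting ACL
fi
```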
-
Creating an Amazon Cluster
After watching this video; you will be able to create an Amazon cluster.
-
Creating an Amazon Machine Image
After watching this video; you will be able to create an Amazon Machine Image.
-
Creating an AWS Account
After watching this video; you will be able to create an AWS account.
-
Creating an EC2 Baseline Server
After watching this video; you will be able to create an EC2 baseline server.
-
Creating High Availability Auto Failovers
After watching this video; you will be able to create an automated failover for the NameNode.
-
Creating Kerberos Diagrams
After watching this video; you will be able to diagram Kerberos and label the primary components.
-
Defining Cluster Management
After watching this video; you will be able to describe what cluster management entails and recall some of the tools that can be used.
-
Defining Hadoop Fault Tolerance
After watching this video; you will be able to describe how Hadoop leverages fault tolerance.
-
Defining HDFS High Availability
After watching this video; you will be able to describe the functions of Hadoop high availability.
-
Defining Supercomputing
After watching this video; you will be able to describe the principles of supercomputing.
-
Deploying Clusters
After watching this video; you will be able to use Cloudera Manager to deploy a cluster.
-
Deploying Hadoop Releases
After watching this video; you will be able to deploy a Hadoop release.
-
Deploying Support Tools
After watching this video; you will be able to distribute configuration files and admin scripts.
-
Design Considerations for Hadoop Clusters
After watching this video; you will be able to identify the hardware and networking recommendations for a Hadoop cluster.
-
Diagnosing with Cloudera Manager
After watching this video; you will be able to manage logs through Cloudera Manager.
-
Editing Oozie Workflows with Hue
After watching this video; you will be able to use Hue to edit Oozie workflows and coordinators.
-
Encrypting Data at Rest
After watching this video; you will be able to describe how to encrypt data at rest.
-
Encrypting Data in Motion
After watching this video; you will be able to encrypt data in motion.
-
Evaluating Storage Options
After watching this video; you will be able to compare the use of commodity hardware with enterprise disks.
-
Examining Additional Design Principles
After watching this video; you will be able to describe the design principles of "move processing, not data," "embrace failure," and "build applications, not infrastructure."
-
Examining Amazon Web Services
After watching this video; you will be able to recall some of the most common services of the EC2 service bundle.
-
Examining Applications Modelling
After watching this video; you will be able to describe the purpose of application modelling.
-
Examining AWS Access Keys
After watching this video; you will be able to describe the use of AWS access keys.
-
Examining AWS Credentials
After watching this video; you will be able to describe how the AWS credentials are used for authentication.
-
Examining AWS Elastic MapReduce
After watching this video; you will be able to recall the advantages and limitations of using AWS EMR.
-
Examining Axioms of Supercomputing
After watching this video; you will be able to describe the three axioms of supercomputing.
-
Examining Benchmarking
After watching this video; you will be able to describe the practice of benchmarking on a Hadoop cluster.
-
Examining Best Practices for Benchmarking
After watching this video; you will be able to describe the different tools used for benchmarking a cluster.
-
Examining Best Practices for Network Tuning
After watching this video; you will be able to list some of the best practices for network tuning.
-
Examining Best Practices for Performance Tuning
After watching this video; you will be able to describe different strategies of performance tuning.
-
Examining Capacity Management
After watching this video; you will be able to compare the differences between availability and performance.
-
Examining Capacity Strategies
After watching this video; you will be able to describe different strategies of resource capacity management.
-
Examining Cloud Computing
After watching this video; you will be able to describe how cloud computing can be used as a solution for Hadoop.
-
Examining Cloudera Manager
After watching this video; you will be able to describe the purpose and functionality of Cloudera Manager.
-
Examining Cluster Management Tools
After watching this video; you will be able to describe different tools from a functional perspective.
-
Examining Common Problems
After watching this video; you will be able to describe the categories of errors for a Hadoop cluster.
-
Examining Configuration Management Tools
After watching this video; you will be able to describe the configuration management tools.
-
Examining Data in Motion
After watching this video; you will be able to describe how to encrypt data in motion for Hadoop, Sqoop, and Flume.
-
Examining Data Server Recommendations
After watching this video; you will be able to recall some of the recommendations for a data server.
-
Examining DataNode Recovery
After watching this video; you will be able to describe the operation of the DataNode during a recovery.
-
Examining DataNode Reliability
After watching this video; you will be able to recall the most common causes for DataNode failure.
-
Examining Dominant Resource Fairness
After watching this video; you will be able to describe dominant resource fairness.
-
Examining EMR and End-users
After watching this video; you will be able to describe EMR end-user connections and EMR security levels.
-
Examining Engineering Teams
After watching this video; you will be able to recall the roles and skills needed for the Hadoop engineering team.
-
Examining Fair Scheduler Algorithms
After watching this video; you will be able to describe the primary algorithm and the configuration files for the fair scheduler.
-
Examining Fair Schedulers
After watching this video; you will be able to describe how the fair scheduling method allows all applications to get equal amounts of resource time.
-
Examining Flume and Kerberos
After watching this video; you will be able to describe how to configure Flume for use with Kerberos.
-
Examining Ganglia
After watching this video; you will be able to recall what Ganglia is and what it can be used for.
-
Examining Ganglia Functionality
After watching this video; you will be able to recall how Ganglia monitors Hadoop clusters.
-
Examining Hadoop Change Management
After watching this video; you will be able to describe the purpose of change management.
-
Examining Hadoop Cloud Implementations
After watching this video; you will be able to recall the advantages and limitations of using Hadoop in the cloud.
-
Examining Hadoop Cluster Architecture
After watching this video; you will be able to describe the different rack architectures for Hadoop.
-
Examining Hadoop logs
After watching this video; you will be able to describe how to manage logging levels.
-
Examining Hadoop Metrics2
After watching this video; you will be able to describe Hadoop Metrics2.
-
Examining Hadoop Security
After watching this video; you will be able to recall the primary security threats faced by the Hadoop cluster.
-
Examining Hardware Responsibilities
After watching this video; you will be able to recall the primary responsibilities for the master, data, and edge servers.
-
Examining HDFS Data Blocks
After watching this video; you will be able to describe the sizing and balancing of the HDFS data blocks.
-
Examining High Availability Auto Failovers
After watching this video; you will be able to recall the requirements for enabling an automated failover for the NameNode.
-
Examining Hive with Kerberos
After watching this video; you will be able to describe how to configure Hive for use with Kerberos.
-
Examining Hostnames and DNS Recommendations
After watching this video; you will be able to recall some of the recommendations for hostnames and DNS entries.
-
Examining Identification and Access Management
After watching this video; you will be able to describe AWS identification and access management.
-
Examining Input and Output Tune Up Options
After watching this video; you will be able to recall some of the rules for tuning the DataNode.
-
Examining Java Tune Up Options
After watching this video; you will be able to describe the purpose of Java tuning.
-
Examining Kerberos
After watching this video; you will be able to describe Kerberos and recall some of the common commands.
-
Examining MapReduce Job Management
After watching this video; you will be able to describe MapReduce job management on a Hadoop cluster.
-
Examining MapReduce Tune Up Options
After watching this video; you will be able to describe the configuration files and parameters used in performance tuning of MapReduce.
-
Examining Minimum Resources
After watching this video; you will be able to describe the minimum share function of the fair scheduler.
-
Examining MRv2
After watching this video; you will be able to describe the two major functions of the JobTracker.
-
Examining Nagios
After watching this video; you will be able to recall what Nagios is and what it can be used for.
-
Examining NameNode Recovery
After watching this video; you will be able to describe the operation of the NameNode during a recovery.
-
Examining NameNode Reliability
After watching this video; you will be able to recall the most common causes for NameNode failure.
-
Examining Network Clusters
After watching this video; you will be able to recall the best practices for different types of network clusters.
-
Examining Operating System Tune Up Options
After watching this video; you will be able to describe the configuration files and parameters used in performance tuning of the operating system.
-
Examining Pig, Sqoop, and Oozie with Kerberos
After watching this video; you will be able to describe how to configure Pig, Sqoop, and Oozie for use with Kerberos.
-
Examining Preemption
After watching this video; you will be able to describe the preemption functions of the fair scheduler.
-
Examining Problem Management Best Practices
After watching this video; you will be able to recall some of the best practices for problem management.
-
Examining Rack Awareness
After watching this video; you will be able to describe rack awareness.
-
Examining Scheduler Behaviors
After watching this video; you will be able to describe the default behavior of the fair scheduler methods.
-
Examining Schedulers
After watching this video; you will be able to describe how schedulers perform various resource management.
-
Examining Security Risks
After watching this video; you will be able to describe the four pillars of the Hadoop security model.
-
Examining Single Resource Fairness
After watching this video; you will be able to describe the policy for single resource fairness.
-
Examining Single Resource Fairness Configurations
After watching this video; you will be able to identify different configuration options for single resource fairness.
-
Examining Storage Options
After watching this video; you will be able to describe the advantages of using a JBOD configuration.
-
Examining Tune Up Options for HDFS
After watching this video; you will be able to describe the configuration files and parameters used in performance tuning of HDFS.
-
Examining Tune UP Options for YARN
After watching this video; you will be able to describe the configuration files and parameters used in performance tuning of YARN.
-
Examining YARN Containers
After watching this video; you will be able to describe the functions of YARN containers.
-
Examining YARN High Availability
After watching this video; you will be able to describe the system view of the ResourceManager configurations set for high availability.
-
Examining YARN Job Reliability
After watching this video; you will be able to recall the most common causes of YARN job failure.
-
Examining YARN Task Reliability
After watching this video; you will be able to recall the most common causes for YARN task failure.
-
Exploring Amazon Web Services
After watching this video; you will be able to recall some of the most common services that Amazon offers.
-
Exploring Big Data Solutions
After watching this video; you will be able to recall the advantages and shortcomings of using Hadoop as a supercomputing platform.
-
Exploring Checkpoint Node
After watching this video; you will be able to recall the uses for the Checkpoint node.
-
Exploring Cloudera Manager Admin Console
After watching this video; you will be able to describe the different parts of the Cloudera Manager Admin Console.
-
Exploring Cloudera Manager Architecture
After watching this video; you will be able to describe the Cloudera Manager internal architecture.
-
Exploring Cluster Architecture
After watching this video; you will be able to describe the layout and structure of the Hadoop cluster.
-
Exploring DataNode Replications
After watching this video; you will be able to set up the DataNode for replication.
-
Exploring Design Principles for Hadoop
After watching this video; you will be able to describe the "dumb hardware and smart software" and "share nothing" design principles.
-
Exploring Event Management
After watching this video; you will be able to describe the importance of event management.
-
Exploring Ganglia
After watching this video; you will be able to describe how to use Ganglia to monitor a Hadoop cluster.
-
Exploring Incident Management
After watching this video; you will be able to describe the importance of incident management.
-
Exploring Master Server Best Practices
After watching this video; you will be able to recall some of the recommendations for a master server and edge server.
-
Exploring Operating Systems Best Practices
After watching this video; you will be able to recall some of the recommendations for an operating system.
-
Exploring Problem Management
After watching this video; you will be able to describe the different methodologies used for root cause analysis.
-
Exploring Problem Management Lifecycle
After watching this video; you will be able to describe the problem management lifecycle.
-
Exploring SSH Keys
After watching this video; you will be able to describe the use of SSH key pairs for remote access.
-
Exploring the AWS Command Line Interface
After watching this video; you will be able to describe what the command line interface is used for.
-
Format HDFS and Run a Hadoop Program
After watching this video; you will be able to install Hadoop onto the admin server.
-
Implementing Security Groups
After watching this video; you will be able to install security groups for AWS.
-
Importing Data with Hue
After watching this video; you will be able to import data using Hue.
-
Improving Performance
After watching this video; you will be able to improve cluster performance with Cloudera Manager.
-
Install Trash and Add a DataNode
After watching this video; you will be able to install Trash and add a DataNode.
-
Installing and Configuring Impala
After watching this video; you will be able to install and configure Impala.
-
Installing and Configuring Sentry
After watching this video; you will be able to install and configure Sentry.
-
Installing Cloudera Manager
After watching this video; you will be able to install Cloudera Manager.
-
Installing Compression
After watching this video; you will be able to install compression.
-
Installing Ganglia
After watching this video; you will be able to install Ganglia.
-
Installing Hadoop Metrics2 for Ganglia
After watching this video; you will be able to install Hadoop Metrics2 for Ganglia.
-
Installing Kerberos
After watching this video; you will be able to install Kerberos.
-
Installing Nagios
After watching this video; you will be able to install Nagios.
-
Installing Rack Awareness
After watching this video; you will be able to write configuration files for rack awareness.
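A rack-awareness configuration usually centers on a topology script. The sketch below is a hypothetical script: the NameNode invokes it with one or more hostnames or IPs and expects one rack path per argument on stdout; the mapping-file location and format are assumptions:

```shell
#!/bin/bash
# Hypothetical topology script for Hadoop rack awareness.
# Assumed mapping file format: "<host> <rack>" per line.
MAP_FILE=${MAP_FILE:-/etc/hadoop/conf/topology.map}
DEFAULT_RACK=/default-rack
for host in "$@"; do
  # Look up the host in the map; fall back to the default rack.
  rack=$(awk -v h="$host" '$1 == h { print $2 }' "$MAP_FILE" 2>/dev/null)
  echo "${rack:-$DEFAULT_RACK}"
done
```

Hadoop finds the script via the `net.topology.script.file.name` property in core-site.xml; any host not listed in the map lands in the default rack.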
-
Installing Trash
After watching this video; you will be able to install and configure trash.
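Trash is enabled through a single property in core-site.xml; the retention value below is an example:

```shell
# HDFS trash is controlled by fs.trash.interval in core-site.xml, given
# in minutes. 1440 = deleted files are kept for one day before purge.
TRASH_MINUTES=1440
# The property block to add to core-site.xml (shown as a string here):
TRASH_PROP="<property><name>fs.trash.interval</name><value>${TRASH_MINUTES}</value></property>"
echo "$TRASH_PROP"
# With trash enabled, 'hdfs dfs -rm' moves files into /user/<name>/.Trash
# instead of deleting them immediately; -skipTrash bypasses the trash.
```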
-
Locking Down Networks
After watching this video; you will be able to recall the ports required for Hadoop and how network gateways are used.
-
Managing Hadoop Balancing
After watching this video; you will be able to describe the process for balancing a Hadoop cluster.
-
Managing Hadoop Service Levels
After watching this video; you will be able to monitor and improve service levels.
-
Managing Hadoop Upgrades
After watching this video; you will be able to plan an upgrade of a Hadoop cluster.
-
Managing HDFS
After watching this video; you will be able to use fsck and dfsadmin to check the HDFS file system.
-
Managing HDFS Backup and Recovery
After watching this video; you will be able to describe the operations involved for backing up data.
-
Managing HDFS DataNodes
After watching this video; you will be able to manage an HDFS DataNode.
-
Managing HDFS Scaling
After watching this video; you will be able to describe the operations for scaling a Hadoop cluster.
-
Managing Hosts with Cloudera Manager
After watching this video; you will be able to manage hosts with Cloudera Manager.
-
Managing Performance Tuning
After watching this video; you will be able to recall the two laws of performance tuning.
-
Managing Resources
After watching this video; you will be able to use Cloudera Manager to manage resources.
-
Managing User Access
After watching this video; you will be able to describe the use of POSIX and ACL for managing user access.
-
Managing User Security
After watching this video; you will be able to describe the security model for users on a Hadoop cluster.
-
Monitoring Fair Share
After watching this video; you will be able to monitor the behavior of Fair Share.
-
Monitoring Hadoop Security
After watching this video; you will be able to describe how to monitor Hadoop security.
-
Monitoring with Cloudera Manager
After watching this video; you will be able to use Cloudera Manager's monitoring features.
-
Moving Data Into AWS
After watching this video; you will be able to describe the various ways to move data into AWS.
-
Optimizing Memory for Containers
After watching this video; you will be able to recall why the NodeManager kills containers.
-
Optimizing Memory for Daemons
After watching this video; you will be able to describe the configuration files and parameters used in performance tuning of memory for daemons.
-
Optimizing Memory for YARN
After watching this video; you will be able to describe the purpose of memory tuning for YARN.
-
Perform Security Level Tasks for Hadoop
After watching this video; you will be able to configure HBase for Kerberos.
-
Performance Tuning HDFS
After watching this video; you will be able to performance tune HDFS.
-
Performance Tuning MapReduce
After watching this video; you will be able to tune up MapReduce for performance reasons.
-
Performing Cluster Management
After watching this video; you will be able to use Cloudera Manager to manage a cluster.
-
Performing MapReduce Job Management
After watching this video; you will be able to perform MapReduce job management on a Hadoop cluster.
-
Performing Root Cause Analysis
After watching this video; you will be able to conduct a root cause analysis on a major problem.
-
Planning a Deployment
After watching this video; you will be able to plan for the deployment of a Hadoop cluster.
-
Preparing for Kerberos Installation
After watching this video; you will be able to prepare for a Kerberos installation.
-
Provision Admin Servers
After watching this video; you will be able to provision an admin server.
-
Provisioning a Micro EC2
After watching this video; you will be able to provision a micro instance of EC2.
-
Provisioning Hadoop Clusters
After watching this video; you will be able to provision a Hadoop cluster.
-
Recover from a NameNode Failure
After watching this video; you will be able to recover from a NameNode failure.
-
Recovering Missing Data Blocks
After watching this video; you will be able to identify and recover from a missing data block scenario.
-
Replacing a DataNode
After watching this video; you will be able to use include and exclude files to replace a DataNode.
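Decommissioning the old node is the first half of a replacement; the sketch below assumes `dfs.hosts.exclude` in hdfs-site.xml points at the exclude file, and the hostname and file location are examples:

```shell
# Decommission a DataNode by listing it in the exclude file, then tell
# the NameNode to reread its node lists. Hostname/path are examples.
EXCLUDE_FILE=/tmp/dfs.exclude
echo "dn-old.example.com" >> "$EXCLUDE_FILE"
command -v hdfs >/dev/null && hdfs dfsadmin -refreshNodes
true  # keep a clean exit status on machines without Hadoop installed
```

Once the node's blocks have been re-replicated elsewhere, it shows as Decommissioned and can be removed; the replacement node is then added to the include file and another `-refreshNodes` is issued.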
-
Run a Hadoop Program
After watching this video; you will be able to format HDFS, create an HDFS directory, import data, run a WordCount job, and view the results.
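The full sequence can be sketched as below; the examples-jar path varies by distribution and is an assumption here, as are the input file and directory names:

```shell
# End-to-end WordCount run. The jar location is distribution-dependent
# and assumed here; adjust for your install.
EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
IN=/user/$USER/input
OUT=/user/$USER/output
if command -v hdfs >/dev/null; then
  hdfs namenode -format -nonInteractive   # only on a brand-new cluster,
                                          # before the daemons are started!
  hdfs dfs -mkdir -p "$IN"                # create the input directory
  hdfs dfs -put ./data.txt "$IN"/         # import local data into HDFS
  hadoop jar "$EXAMPLES_JAR" wordcount "$IN" "$OUT"
  hdfs dfs -cat "$OUT"/part-r-00000       # view the results
fi
```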
-
Running EMR Jobs
After watching this video; you will be able to run an EMR job from the Web Console.
-
Running EMR Jobs with Hue
After watching this video; you will be able to run an EMR job with Hue.
-
Running EMR Jobs with the Command Line Interface
After watching this video; you will be able to run an EMR job with the command line interface.
-
Running Hive Jobs with Hue
After watching this video; you will be able to run various Hive jobs using Hue.
-
Scaling Hadoop Architectures
After watching this video; you will be able to describe the best practices for scaling a Hadoop cluster.
-
Setting Up Cloudera Manager for High Availability
After watching this video; you will be able to set up Cloudera Manager for high availability.
-
Setting HDFS Quotas
After watching this video; you will be able to set quotas for the HDFS file system.
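Both quota types can be sketched with dfsadmin; the directory and values are examples:

```shell
# Set a name quota (max number of file-system objects) and a space quota
# on a directory. Path and values are hypothetical examples.
DIR=/user/projects/alpha
NAME_QUOTA=100000   # at most 100k files and directories under DIR
SPACE_QUOTA=1t      # at most 1 TB of raw space; replication counts
                    # against this, so 1 TB / replication 3 = ~333 GB of data
if command -v hdfs >/dev/null; then
  hdfs dfsadmin -setQuota "$NAME_QUOTA" "$DIR"
  hdfs dfsadmin -setSpaceQuota "$SPACE_QUOTA" "$DIR"
  hdfs dfs -count -q "$DIR"   # show quotas and current usage
fi
```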
-
Setting Quotas
After watching this video; you will be able to set quotas for the HDFS file system.
-
Setting Up Checkpoint Servers
After watching this video; you will be able to provision a checkpoint server.
-
Setting Up EMR Clusters
After watching this video; you will be able to set up an EMR cluster.
-
Setting Up Flash Drive Installer
After watching this video; you will be able to set up a flash drive as boot media.
-
Setting Up Flash Drives
After watching this video; you will be able to set up a flash drive as boot media.
-
Setting Up High Availability for ResourceManagers
After watching this video; you will be able to set up high availability for the ResourceManager.
-
Setting Up Identification and Access Management
After watching this video; you will be able to set up AWS IAM.
-
Setting Up NameNode High Availability
After watching this video; you will be able to set up a high availability solution for NameNode.
-
Setting Up Network Installer
After watching this video; you will be able to set up a network installer.
-
Setting Up S3
After watching this video; you will be able to set up S3 and import data.
-
Simulating Configuration Management Tools
After watching this video; you will be able to simulate a configuration management tool.
-
Starting and Stopping a Hadoop Cluster
After watching this video; you will be able to start and stop a Hadoop cluster.
-
Stress Testing and Benchmarking Hadoop Clusters
After watching this video; you will be able to perform a benchmark of a Hadoop cluster.
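A common stress test is the TeraGen/TeraSort/TeraValidate sequence; the jar path and row count below are examples:

```shell
# Standard TeraSort benchmark. The examples-jar path is distribution-
# dependent and assumed here; the row count is an example.
EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
ROWS=10000000   # 10 million 100-byte rows = ~1 GB of input
if command -v hadoop >/dev/null; then
  hadoop jar "$EXAMPLES_JAR" teragen  "$ROWS" /bench/terasort-in
  hadoop jar "$EXAMPLES_JAR" terasort /bench/terasort-in  /bench/terasort-out
  hadoop jar "$EXAMPLES_JAR" teravalidate /bench/terasort-out /bench/terasort-val
fi
```

TeraGen exercises write throughput, TeraSort exercises the full MapReduce shuffle, and TeraValidate confirms the output is correctly sorted.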
-
Swapping NameNodes
After watching this video; you will be able to swap to a new NameNode.
-
Testing Application Reliability
After watching this video; you will be able to test application reliability.
-
Testing Data Blocks
After watching this video; you will be able to describe the use of TestDFSIO.
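A TestDFSIO run can be sketched like this; the test-jar path is distribution-dependent and assumed here, and flag names vary slightly across Hadoop versions:

```shell
# TestDFSIO write/read benchmark against HDFS. The jobclient test jar
# location is an assumption; adjust for your distribution.
TEST_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
if command -v hadoop >/dev/null; then
  hadoop jar "$TEST_JAR" TestDFSIO -write -nrFiles 10 -size 128MB
  hadoop jar "$TEST_JAR" TestDFSIO -read  -nrFiles 10 -size 128MB
  hadoop jar "$TEST_JAR" TestDFSIO -clean   # remove benchmark output
fi
```

The write pass measures HDFS write throughput, the read pass measures read throughput on the same files, and `-clean` removes the generated data.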
-
Testing DataNode Reliability
After watching this video; you will be able to test the availability for the DataNode.
-
Testing NameNode Failure
After watching this video; you will be able to test the availability for the NameNode.
-
Testing YARN Container Reliability
After watching this video; you will be able to test YARN container reliability.
-
Tuning a Hadoop Cluster
After watching this video; you will be able to optimize memory and benchmark a Hadoop cluster.
-
Tuning Memory for Hadoop Clusters
After watching this video; you will be able to performance tune memory for the Hadoop cluster.
-
Use Monitoring Tools
After watching this video; you will be able to use different monitoring tools to identify problems, failures, errors, and solutions.
-
Using a Hadoop Cluster on AWS
After watching this video; you will be able to write an Elastic MapReduce script for AWS.
-
Using Cloudera Manager for Administration
After watching this video; you will be able to perform backups, snapshots, and upgrades using Cloudera Manager.
-
Using Ganglia
After watching this video; you will be able to use Ganglia to monitor a Hadoop cluster.
-
Using Hadoop Metrics2 for Nagios
After watching this video; you will be able to use Hadoop Metrics2 for Nagios.
-
Using Hive for Sentry Administration
After watching this video; you will be able to implement security administration using Hive.
-
Using Nagios
After watching this video; you will be able to use Nagios to monitor a Hadoop cluster.
-
Using Nagios Commands
After watching this video; you will be able to use Nagios commands.
-
Using the AWS Command Line Interface
After watching this video; you will be able to use the command line interface.
-
Using the Fair Scheduler
After watching this video; you will be able to use the fair scheduler with multiple users.
-
Validating Flume, Sqoop, HDFS, and MapReduce
After watching this video; you will be able to test the functionality of Flume, Sqoop, HDFS, and MapReduce.
-
Validating Hive and Pig
After watching this video; you will be able to test the functionality of Hive and Pig.
-
Writing Init Scripts
After watching this video; you will be able to write init scripts for Hadoop.
-
Writing Service Levels
After watching this video; you will be able to write service levels for performance.
-
Securing Hadoop Clusters
Hadoop development has allowed big data technologies to reach companies in all sectors of the economy, but as adoption grows, so do the security concerns. In this course, you will examine the risks and learn how to implement the security protocols for Hadoop clusters. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- describe the four pillars of the Hadoop security model
- recall the ports required for Hadoop and how network gateways are used
- install security groups for AWS
- describe Kerberos and recall some of the common commands
- diagram Kerberos and label the primary components
- prepare for a Kerberos installation
- install Kerberos
- configure Kerberos
- describe how to configure HDFS and YARN for use with Kerberos
- configure HDFS for Kerberos
- configure YARN for Kerberos
- describe how to configure Hive for use with Kerberos
- configure Hive for Kerberos
- describe how to configure Pig, Sqoop, and Oozie for use with Kerberos
- configure Pig and HTTPFS for use with Kerberos
- configure Oozie for use with Kerberos
- configure Hue for use with Kerberos
- describe how to configure Flume for use with Kerberos
- describe the security model for users on a Hadoop cluster
- describe the use of POSIX and ACL for managing user access
- create access control lists
- describe how to encrypt data in motion for Hadoop, Sqoop, and Flume
- encrypt data in motion
- describe how to encrypt data at rest
- recall the primary security threats faced by the Hadoop cluster
- describe how to monitor Hadoop security
- configure HBase for Kerberos
-
Hadoop in the Cloud
Amazon Web Services, also known as AWS, is a secure cloud-computing platform offered by Amazon.com. This course introduces AWS and its most prominent tools, such as IAM, S3, and EC2. Additionally, we will cover how to install, configure, and use a Hadoop cluster on AWS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- describe how cloud computing can be used as a solution for Hadoop
- recall some of the most common services of the EC2 service bundle
- recall some of the most common services that Amazon offers
- describe how the AWS credentials are used for authentication
- create an AWS account
- describe the use of AWS access keys
- describe AWS identification and access management
- set up AWS IAM
- describe the use of SSH key pairs for remote access
- set up S3 and import data
- provision a micro instance of EC2
- prepare to install and configure a Hadoop cluster on AWS
- create an EC2 baseline server
- create an Amazon machine image
- create an Amazon cluster
- describe what the command line interface is used for
- use the command line interface
- describe the various ways to move data into AWS
- recall the advantages and limitations of using Hadoop in the cloud
- recall the advantages and limitations of using AWS EMR
- describe EMR end-user connections and EMR security levels
- set up an EMR cluster
- run an EMR job from the web console
- run an EMR job with Hue
- run an EMR job with the command line interface
- write an Elastic MapReduce script for AWS
-
Performance Tuning of Hadoop Clusters
The Apache Hadoop software library is a framework that allows for the distributed processing of large datasets across clusters of computers using a simple programming model. Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage. This course will focus on performance tuning of the Hadoop cluster. We will examine best practices and recommendations for performance tuning of the operating system, memory, HDFS, YARN and MapReduce. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- recall the three main functions of service capacity
- describe different strategies of performance tuning
- list some of the best practices for network tuning
- install compression
- describe the configuration files and parameters used in performance tuning of the operating system
- describe the purpose of Java tuning
- recall some of the rules for tuning the datanode
- describe the configuration files and parameters used in performance tuning of memory for daemons
- describe the purpose of memory tuning for YARN
- recall why the Node Manager kills containers
- performance tune memory for the Hadoop cluster
- describe the configuration files and parameters used in performance tuning of HDFS
- describe the sizing and balancing of the HDFS data blocks
- describe the use of TestDFSIO
- performance tune HDFS
- describe the configuration files and parameters used in performance tuning of YARN
- configure Speculative execution
- describe the configuration files and parameters used in performance tuning of MapReduce
- tune up MapReduce for performance reasons
- describe the practice of benchmarking on a Hadoop cluster
- describe the different tools used for benchmarking a cluster
- perform a benchmark of a Hadoop cluster
- describe the purpose of application modeling
- optimize memory and benchmark a Hadoop cluster
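The YARN memory-tuning objectives above revolve around one simple bound: a NodeManager can host only as many containers as both its memory and its vcores allow. A hedged sketch of that arithmetic (the function name and default container size are illustrative, not from any Hadoop API):

```python
def containers_per_node(node_mem_mb, node_vcores,
                        container_mem_mb=2048, container_vcores=1):
    """Containers a NodeManager can host, bounded by whichever
    resource (memory or vcores) runs out first."""
    return min(node_mem_mb // container_mem_mb,
               node_vcores // container_vcores)

# e.g. a worker with 64 GB reserved for YARN and 16 vcores:
containers_per_node(64 * 1024, 16)  # memory allows 32, but vcores cap it at 16
```

This is why memory tuning and vcore configuration (yarn.nodemanager.resource.memory-mb and friends) must be balanced together: over-provisioning one resource just leaves the other as the bottleneck.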
-
Cloudera Manager and Hadoop Clusters
Cloudera Manager is a simple, automated, customizable management tool for Hadoop clusters. In this course, you will become familiar with the various web consoles available with Cloudera Manager. You will learn how to use Cloudera Manager to perform everything from a Hadoop cluster installation, to performance tuning, to diagnosing issues. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- describe what cluster management entails and recall some of the tools that can be used
- describe different tools from a functional perspective
- describe the purpose and functionality of Cloudera Manager
- install Cloudera Manager
- use Cloudera Manager to deploy a cluster
- use Cloudera Manager to install Hadoop
- describe the different parts of the Cloudera Manager Admin Console
- describe the Cloudera Manager internal architecture
- use Cloudera Manager to manage a cluster
- manage Cloudera Manager's services
- manage hosts with Cloudera Manager
- set up Cloudera Manager for high availability
- use Cloudera Manager to manage resources
- use Cloudera Manager's monitoring features
- manage logs through Cloudera Manager
- improve cluster performance with Cloudera Manager
- install and configure Impala
- install and configure Sentry
- implement security administration using Hive
- perform backups, snapshots, and upgrades using Cloudera Manager
- configure Hue with MySQL
- import data using Hue
- use Hue to run a Hive job
- use Hue to edit Oozie workflows and coordinators
- format HDFS, create an HDFS directory, import data, run a WordCount, and view the results
-
Hadoop Cluster Availability
When examining Hadoop availability, it's important not to focus solely on the NameNode. There is a tendency to do so, since it is the single point of failure for HDFS and many components in the ecosystem rely on HDFS, but Hadoop availability is a larger, more general issue. In this course we are going to examine availability, and how to recover from failures, for the NameNode, DataNode, HDFS, and YARN. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- describe how Hadoop leverages fault tolerance
- recall the most common causes for NameNode failure
- recall the uses for the Checkpoint node
- test the availability for the NameNode
- describe the operation of the NameNode during a recovery
- swap to a new NameNode
- recall the most common causes for DataNode failure
- test the availability for the DataNode
- describe the operation of the DataNode during a recovery
- set up the DataNode for replication
- identify and recover from a missing data block scenario
- describe the functions of Hadoop high availability
- edit the Hadoop configuration files for high availability
- set up a high availability solution for NameNode
- recall the requirements for enabling an automated failover for the NameNode
- create an automated failover for the NameNode
- recall the most common causes for YARN task failure
- describe the functions of YARN containers
- test YARN container reliability
- recall the most common causes of YARN job failure
- test application reliability
- describe the system view of the Resource Manager configurations set for high availability
- set up high availability for the Resource Manager
- move the Resource Manager HA to alternate master servers
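The NameNode HA objectives above are expressed in hdfs-site.xml. A minimal sketch, assuming a hypothetical nameservice called "mycluster" with two NameNodes on hosts master1 and master2 (the property names are the standard HDFS HA ones; the host and service names are invented for illustration):

```xml
<!-- hdfs-site.xml: logical nameservice with two NameNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2:8020</value>
</property>
<!-- enable automatic failover (requires ZooKeeper and the ZKFC daemons) -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

Clients then address the nameservice ("hdfs://mycluster") rather than a specific host, which is what allows a failover to be transparent.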
-
Stabilizing Hadoop Clusters
Apache Hadoop is increasing in popularity as a framework for large-scale, data-intensive applications. Tuning Hadoop clusters is vital to improving cluster performance. In this course you will look at the importance of incident and log management and examine the best practices for root cause analysis. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- describe the importance of event management
- describe the importance of incident management
- describe the different methodologies used for root cause analysis
- recall what Ganglia is and what it can be used for
- recall how Ganglia monitors Hadoop clusters
- install Ganglia
- describe Hadoop Metrics2
- install Hadoop Metrics2 for Ganglia
- describe how to use Ganglia to monitor a Hadoop cluster
- use Ganglia to monitor a Hadoop cluster
- recall what Nagios is and what it can be used for
- install Nagios
- use Nagios commands
- use Nagios to monitor a Hadoop cluster
- use Hadoop Metrics2 for Nagios
- describe how to manage logging levels
- describe how to configure Hadoop jobs for logging
- describe how to configure log4j for Hadoop
- describe how to configure JobHistoryServer logs
- configure Hadoop logs
- describe the problem management lifecycle
- recall some of the best practices for problem management
- describe the categories of errors for a Hadoop cluster
- conduct a root cause analysis on a major problem
- use different monitoring tools to identify problems, failures, errors and solutions
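The logging objectives above are configured through Hadoop's conf/log4j.properties. A brief sketch of the kind of settings involved (these are standard entries from Hadoop's shipped log4j.properties; the size and backup values are assumed examples, not recommendations):

```properties
# Root logger level and appender for Hadoop daemons
hadoop.root.logger=INFO,RFA

# Rolling file appender: cap file size and keep a bounded history,
# so daemon logs cannot fill the disk
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=10
```

Logging levels can also be changed on a running daemon (without a restart) through the daemon's web UI, which the log-management videos cover.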
-
Designing Hadoop Clusters
Hadoop is an Apache Software Foundation project and open source software platform for scalable, distributed computing. Hadoop can provide fast and reliable analysis of both structured data and unstructured data. In this course you will learn about the design principles, the cluster architecture, considerations for servers and operating systems, and how to plan for a deployment. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- describe the principles of supercomputing
- recall the roles and skills needed for the Hadoop engineering team
- recall the advantages and shortcomings of using Hadoop as a supercomputing platform
- describe the three axioms of supercomputing
- describe the "dumb hardware and smart software" and "share nothing" design principles
- describe the "move processing, not data," "embrace failure," and "build applications, not infrastructure" design principles
- describe the different rack architectures for Hadoop
- describe the best practices for scaling a Hadoop cluster
- recall the best practices for different types of network clusters
- recall the primary responsibilities for the master, data, and edge servers
- recall some of the recommendations for a master server and edge server
- recall some of the recommendations for a data server
- recall some of the recommendations for an operating system
- recall some of the recommendations for hostnames and DNS entries
- describe the recommendations for HDD
- calculate the correct number of disks required for a storage solution
- compare the use of commodity hardware with enterprise disks
- plan for the development of a Hadoop cluster
- set up flash drives as boot media
- set up a kickstart file as boot media
- set up a network installer
- identify the hardware and networking recommendations for a Hadoop cluster
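The storage-sizing objective above reduces to a short calculation: usable capacity must be multiplied by the replication factor and padded for non-HDFS overhead (OS, logs, intermediate data) before dividing by per-disk capacity. A minimal sketch, with assumed defaults of 3x replication, 4 TB disks, and 25% overhead (the function and its parameters are illustrative, not from any Hadoop tool):

```python
import math

def disks_needed(usable_tb, disk_tb=4.0, replication=3, overhead=0.25):
    """Number of disks required to provide usable_tb of HDFS capacity,
    accounting for replication and reserved non-HDFS overhead."""
    raw_tb = usable_tb * replication / (1 - overhead)
    return math.ceil(raw_tb / disk_tb)

disks_needed(100)  # 100 TB usable -> 400 TB raw -> 100 x 4 TB disks
```

The same arithmetic, run in reverse, tells you how much usable capacity an existing disk inventory actually provides — a common planning mistake is to quote raw capacity instead.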
-
Deploying Hadoop Clusters
There are important decisions you must make to ensure network, disks, and hosts are configured correctly when deploying a Hadoop Cluster. This course will walk you through all of the steps to install Hadoop in a pseudo-distributed mode and the set up of some of the common open source software used to create a Hadoop Ecosystem. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- describe the configuration management tools
- simulate a configuration management tool
- build an image for a baseline server
- build an image for a DataServer
- build an image for a master server
- provision an admin server
- describe the layout and structure of the Hadoop cluster
- provision a Hadoop cluster
- distribute configuration files and admin scripts
- use init scripts to start and stop a Hadoop cluster
- configure a Hadoop cluster
- configure logging for the Hadoop cluster
- build images for required servers in the Hadoop cluster
- configure a MySQL database
- build the Hadoop clients
- configure Hive daemons
- test the functionality of Flume, Sqoop, HDFS, and MapReduce
- test the functionality of Hive and Pig
- configure HCatalog daemons
- configure Oozie
- configure Hue and Hue users
- install Hadoop onto the admin server
-
Capacity Management for Hadoop Clusters
Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. This course focuses on the capacity management of Hadoop clusters. You will be introduced to the concepts of resource management through scheduling. You will learn how to use the Fair Scheduler Tool, and how to plan for scaling. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- compare availability versus performance
- describe different strategies of resource capacity management
- describe how schedulers manage resources
- set quotas for the HDFS file system
- recall how to set the maximum and minimum memory allocations per container
- describe how the fair scheduling method allows all applications to get equal amounts of resource time
- describe the primary algorithm and the configuration files for the Fair Scheduler
- describe the default behavior of the Fair Scheduler methods
- monitor the behavior of Fair Share
- describe the policy for single resource fairness
- describe how resources are distributed over the total capacity
- identify different configuration options for single resource fairness
- configure single resource fairness
- describe the minimum share function of the Fair Scheduler
- configure minimum share on the Fair Scheduler
- describe the preemption functions of the Fair Scheduler
- configure preemption for the Fair Scheduler
- describe dominant resource fairness
- write service levels for performance
- use the Fair Scheduler with multiple users
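The Fair Scheduler objectives above — queues, minimum share, weights, and preemption — are all expressed in the fair-scheduler.xml allocation file. A minimal sketch, assuming a hypothetical queue named "analytics" (the element names follow the standard allocation-file format; the queue name and values are invented for illustration):

```xml
<allocations>
  <queue name="analytics">
    <!-- minimum share: resources guaranteed to this queue when it has demand -->
    <minResources>10000 mb,10 vcores</minResources>
    <!-- weight: this queue gets twice the fair share of a weight-1 queue -->
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <!-- after this timeout, starved queues may preempt containers elsewhere -->
  <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
  <queuePlacementPolicy>
    <rule name="specified"/>
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
```

The scheduler reloads this file periodically, so queues can be added or re-weighted without restarting the ResourceManager.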
-
Operating Hadoop Clusters
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. In this course we will examine many of the HDFS administration and operational processes required to operate and maintain a Hadoop cluster. We will take a look at how to balance a Hadoop cluster, manage jobs, and perform backup and recovery for HDFS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.
- start the course
- monitor and improve service levels
- deploy a Hadoop release
- describe the purpose of change management
- describe rack awareness
- write configuration files for rack awareness
- start and stop a Hadoop cluster
- write init scripts for Hadoop
- describe the tools fsck and dfsadmin
- use fsck to check the HDFS file system
- set quotas for the HDFS file system
- install and configure trash
- manage an HDFS DataNode
- use include and exclude files to replace a DataNode
- describe the operations for scaling a Hadoop cluster
- add a DataNode to a Hadoop cluster
- describe the process for balancing a Hadoop cluster
- balance a Hadoop cluster
- describe the operations involved for backing up data
- use distcp to copy data from one cluster to another
- describe MapReduce job management on a Hadoop cluster
- perform MapReduce job management on a Hadoop cluster
- plan an upgrade of a Hadoop cluster
- write and complete a plan to install HBase with high availability
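The rack-awareness objectives above depend on a topology script: an executable, named by net.topology.script.file.name in core-site.xml, that maps each host or IP to a rack path. A hedged sketch in Python, assuming two invented subnets (the subnet-to-rack mapping here is purely illustrative):

```python
#!/usr/bin/env python3
"""Hypothetical topology script: Hadoop invokes it with one or more
host/IP arguments and expects one rack path per argument on stdout."""
import sys

# Assumed mapping of IP prefixes to rack paths -- adjust to your network
RACKS = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
}

def resolve_rack(host):
    """Return the rack path for a host, or the default rack if unknown."""
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return "/default-rack"

if __name__ == "__main__":
    print(" ".join(resolve_rack(h) for h in sys.argv[1:]))
```

With this in place, HDFS places block replicas across racks (not just across nodes), which is what makes the cluster survive the loss of an entire rack.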
-
Managing Services
After watching this video, you will be able to manage Cloudera Manager's services.