
Data Science Essentials elearning bundle
Course ID: 271335
Description:
This is a bundled training package. It contains training for each of the bundled items below:
Course
Data Gathering  
Data Communication and Visualization  
Data Integration  
Data Exploration  
Data Filtering  
Data Transformation  
Data Science Overview  
Data Analysis Concepts  
Data Classification and Machine Learning 
Bundle Price: $319.00
Total Savings: $355.55
Data Gathering
To carry out data science, you need to gather data. Extracting, parsing, and scraping data from various sources, both internal and external, is a critical first part in the data science pipeline. In this course, you'll explore examples of practical tools for data gathering.
 start the course
 describe problems and software tools associated with data gathering
 use curl to gather data from the Web
 use in2csv to convert spreadsheet data to CSV format
 use agate to extract data from spreadsheets
 use agate to extract tabular data from dbf files
 extract data from particular tags in an HTML document
 distinguish between metadata and data
 work with metadata in HTTP Headers
 work with Linux log files
 work with metadata in email headers
 perform a secure shell connection to a remote server
 copy remote data using a secure copy
 synchronize data from a remote server
 download an HTML file and explore table data
Data Communication and Visualization
The final step in the data science pipeline is to communicate the results or findings. In this course, you'll explore communication and visualization concepts needed by data scientists.
 start the course
 choose appropriate visualization techniques
 describe the difference between correlation and causation
 define Simpson's paradox
 communicate data science results informally
 communicate data science results formally
 implement strategies for effective data communication
 use scatter plots
 use line graphs
 use bar charts
 use histograms
 use box plots
 create a network visualization
 create a bubble plot
 create an interactive plot
 find a data set that is well suited to a scatter plot and plot it
Data Integration
Data integration is the last step in the data wrangling process, where data is put into a usable, structured format for analysis. In this course, you'll explore examples of practical tools and techniques for data integration.
 start the course
 use csvjoin to concatenate CSV data
 use the cat command to concatenate separate logs into a single file
 sort lines in a text file
 merge separate xml files into a single schema
 aggregate data from a CSV file into a table of summarized values
 normalize data from unstructured sources
 denormalize data from a structured source
 use pivot tables to cross tabulate data
 insert missing values in a data set
 use csvjoin to merge two compatible CSV documents into one
Data Exploration
Once data is transformed into a usable format, the next step is to carry out preliminary exploration of the data. In this course, you'll explore examples of practical tools and techniques for data exploration.
 start the course
 use csvgrep to search for matching rows in CSV data
 use csvstat to explore values in CSV data
 use csvsql to query CSV data like a SQL database
 use gnuplot to quickly plot data on the command line
 use wc to count words, characters, and lines within a text file
 explore a subdirectory tree from the command line
 use natural language processing to count word frequencies in a text document
 take random samples from a list of records
 find the top rows by value and percent in a data set
 find repeated records in a data set
 identify outliers using standard deviation
 perform a word frequency count on a classic book from Project Gutenberg
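
As an illustration of the "identify outliers using standard deviation" objective above, here is a minimal Python sketch. It is not taken from the course material: the function name find_outliers, the sample readings, and the cutoff of two standard deviations are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Hypothetical sample data: six similar readings and one extreme value.
readings = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(readings))
```

Note that a single extreme value inflates both the mean and the standard deviation, so with small samples a cutoff of three standard deviations may flag nothing at all.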
Data Filtering
Once data is gathered for data science, it is often in an unstructured or raw format. Data must be filtered for content and validity. In this course, you'll explore examples of practical tools and techniques for data filtering.
 start the course
 identify common filtering techniques and tools
 extract date elements from common date formats
 parse content types in HTTP headers
 use csvcut to filter CSV data
 use sed to replace values in a text data stream
 drop duplicate records from data
 extract headers from a jpeg image
 use pdfgrep to extract data from searchable pdf files
 detect invalid or impossible data combinations
 parse robots.txt from a web site to decide what should and shouldn't be crawled or indexed
 drop records from a CSV file based on date range
Data Transformation
Once data is filtered, the next step is to transform it into a usable format. In this course, you'll explore examples of practical tools and techniques for data transformation.
 start the course
 convert CSV data to JSON format
 convert XML data to JSON format
 create SQL inserts from CSV data
 extract CSV data from SQL
 change delimiters in a csv file from commas to tabs
 convert basic date formats to standard ISO 8601 format
 convert numeric formats within a CSV document
 round floating point decimals to two places within a CSV document
 use optical character recognition (OCR) to extract text from a jpeg image
 use optical character recognition (OCR) to extract text from a pdf document
 read various date formats and convert them to standards-compliant ISO 8601 format
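
The date conversion objectives above can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's own method: the list of accepted input formats (KNOWN_FORMATS) and the function name to_iso8601 are assumptions, and slashed dates are assumed to be month-first.

```python
from datetime import datetime

# Assumed input formats, tried in order. Slashed dates are treated as
# month/day/year, which is an assumption and not a universal convention.
KNOWN_FORMATS = ("%m/%d/%Y", "%d %B %Y", "%b %d, %Y", "%Y-%m-%d")


def to_iso8601(text):
    """Parse a date string in one of KNOWN_FORMATS and return YYYY-MM-DD."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # Not this format, try the next one.
    raise ValueError(f"unrecognized date format: {text!r}")


for raw in ("03/14/2016", "14 March 2016", "Mar 14, 2016"):
    print(raw, "->", to_iso8601(raw))
```

Trying formats in a fixed order keeps the behavior predictable, but an ambiguous input such as 04/05/2016 is always read as April 5 under the month-first assumption.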
Data Science Overview
Data science differentiates itself from academic statistics and application programming by using what it needs from a variety of disciplines. In this course, you'll explore what it is to be a data scientist and study what sets data science apart from other disciplines. It prepares learners to navigate the foundational elements of data science.
 start the course
 define data science and what it is to be a data scientist
 describe the data wrangling aspect of data science
 describe the big data aspect of data science
 describe the machine learning aspect of data science
 use common data science terminology
 recognize ways to communicate the results of your data science work
 recall the steps in data science analysis
 compare various tools and software libraries used for data science
Data Analysis Concepts
There are many software and programming tools available to data scientists. Before applying those tools effectively, you must understand the underlying concepts. In this course, you'll explore the underlying data analysis concepts needed to employ the software and programming tools effectively.
 start the course
 perform basic math operations required by data scientists
 perform basic vector math operations required by data scientists
 perform basic matrix math operations required by data scientists
 perform a matrix decomposition
 identify different forms of data
 describe probability in terms of events and sample space size
 describe basic properties of outcomes
 apply probability rules in calculation
 identify common continuous probability distributions
 identify common discrete probability distributions
 apply Bayes' theorem and describe how it is used in email spam algorithms
 apply random sampling to A/B tests
 identify and describe various statistical measures
 describe the difference between an unbiased and biased estimator
 describe sampling distributions and recognize the central limit theorem
 define confidence intervals and work with margins of error
 carry out hypothesis tests and work with p-values
 apply the chi-square test for categorical values
 identify the given data set descriptions by their types
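
To make the Bayes' theorem objective above concrete, the following Python sketch computes the probability that a message is spam given that it contains a particular word. The function name posterior_spam and all of the probabilities are made-up values for illustration. Real spam filters combine evidence from many words.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """Apply Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)."""
    p_ham = 1.0 - p_spam
    # Law of total probability: P(word) across spam and non-spam ("ham") mail.
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word


# Hypothetical numbers: the word appears in 40% of spam, 1% of ham,
# and 20% of all mail is spam.
print(round(posterior_spam(0.4, 0.01, 0.2), 3))
```

Even though only 20% of mail is assumed to be spam, seeing a word that is 40 times more common in spam raises the estimate to roughly 91%.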
Data Classification and Machine Learning
Machine learning is an area of data science that builds models from data rather than relying on explicitly programmed rules. In this course, you'll explore the conceptual elements of various machine learning techniques.
 start the course
 identify problems in which supervised learning techniques apply
 identify problems in which unsupervised learning techniques apply
 apply linear regression to machine learning problems
 identify predictors in machine learning
 apply logistic regression to machine learning problems
 describe the use of dummy variables
 use naive Bayes classification techniques
 work with decision trees
 describe k-means clustering
 define cluster validation
 define principal component analysis
 describe machine learning errors
 describe underfitting
 describe overfitting
 apply k-fold cross-validation
 describe feedforward and backpropagation in neural networks
 describe SVMs and their use
 choose the appropriate machine learning method for the given example problems
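
The k-fold cross-validation objective above can be illustrated with a short Python sketch that only produces the train and test index splits. The function name k_fold_indices is hypothetical, and the sketch assumes the records are already in random order, since it does no shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

Each record lands in the test set exactly once, so a model would be trained k times and its scores averaged across the folds.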
Data Science Essentials elearning bundle
Course ID: 271335
Duration: n/a
Price: $319