Course Description
This course covers the essentials of using Python as a tool for data scientists to perform exploratory data analysis, complex visualizations, and large-scale distributed processing on “Big Data”. It covers essential mathematical and statistical libraries such as NumPy, Pandas, SciPy, and SciKit-Learn; frameworks such as TensorFlow and Spark; and visualization and imaging tools such as matplotlib, PIL, and Seaborn. The course is intermediate level: it assumes that attendees have a solid background in data analytics and data science and basic Python knowledge. Topics are introductory in nature but are covered in depth, at a pace geared to experienced students.
This course is roughly 50% hands-on lab and 50% lecture, combining engaging instructor presentations, demos, and practical group discussions with extensive machine-based student labs and project work. Throughout the course, students will learn to write Python scripts and apply them within a scientific framework, working with the latest technologies listed on the agenda. The course provides a practical grounding in the range of technologies at the leading edge of data science development.
Working in a hands-on learning environment led by our expert practitioner, students will learn:
How to work with Python in a data science context
How to use NumPy, Pandas, and matplotlib
How to create and process images with PIL
How to visualize with Seaborn
Key features of SciPy and SciKit-Learn
How to interact with Spark using DataFrames
How to use Spark SQL, MLlib, and Spark Streaming with Big Data
Substitution & Cancellation Policy: You may cancel or reschedule up to 21 days prior to the start date of the class at no penalty. For any cancellation or reschedule requests within 21 days, the full course tuition is still due and not eligible for refund. Any paid tuition will be credited towards a future class and must be used within 12 months.
*Partner delivered courses may be subject to different cancellation terms.
Agenda
Session: Python for Data Science
Lesson: Python Review (Optional)
- Python Language
- Essential Syntax
- Lists, Sets, Dictionaries, and Comprehensions
- Functions
- Classes, Modules, and imports
- Exceptions
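As a rough indication of the level this optional review assumes, the short sketch below touches several of the listed topics at once: comprehensions, sets and dictionaries, functions, an import, a small class, and exception handling. All names in it (`EmptyDataError`, `word_lengths`, `mean`, and the sample words) are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
from collections import Counter


class EmptyDataError(ValueError):
    """Hypothetical exception raised when there is nothing to summarize."""


def word_lengths(words):
    """Map each distinct word to its length (set + dict comprehension)."""
    return {word: len(word) for word in set(words)}


def mean(values):
    """Return the arithmetic mean, refusing empty input explicitly."""
    values = list(values)
    if not values:
        raise EmptyDataError("cannot average an empty sequence")
    return sum(values) / len(values)


words = ["spark", "numpy", "pandas", "numpy"]
lengths = word_lengths(words)            # duplicates collapse via set()
counts = Counter(words)                  # tally occurrences of each word
even_squares = [n * n for n in range(10) if n % 2 == 0]

try:
    mean([])
except EmptyDataError as exc:
    error_message = str(exc)             # keep the message for inspection
```

Attendees who can read this comfortably are likely ready to skip the review.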
Lesson: IPython
- IPython basics
- Terminal and GUI shells
- Creating and using notebooks
- Saving and loading notebooks
- Ad hoc data visualization
- Web Notebooks (Jupyter)
Lesson: numpy
- numpy basics
- Creating arrays
- Indexing and slicing
- Large number sets
- Transforming data
- Advanced tricks
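To give a flavor of the array creation, indexing, slicing, and transformation topics in this lesson, here is a minimal sketch. It assumes NumPy is installed and imported as `np`; the array contents and variable names are illustrative only.

```python
import numpy as np

# Create a 3x4 array holding the integers 0..11.
grid = np.arange(12).reshape(3, 4)

second_row = grid[1]           # basic indexing: one row
last_column = grid[:, -1]      # slicing: every row, last column
evens = grid[grid % 2 == 0]    # boolean mask: returns a flattened 1-D array

# Transforming data: subtract each column's mean from that column.
# Broadcasting stretches the 1-D means across all three rows.
col_means = grid.mean(axis=0)
centered = grid - col_means
```

After centering, every column of `centered` has a mean of zero, which is a common preprocessing step before modeling.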
Lesson: scipy
- What can scipy do?
- Most useful functions
- Curve fitting
- Modeling
- Data visualization
- Statistics
Lesson: A tour of scipy subpackages
- Clustering
- Physical and mathematical constants
- FFTs
- Integral and differential solvers
- Interpolation and smoothing
- Input and Output
- Linear Algebra
- Image Processing
- Orthogonal Distance Regression
- Root-finding
- Signal Processing
- Sparse Matrices
- Spatial data and algorithms
- Statistical distributions and functions
- C/C++ Integration
Lesson: pandas
- pandas overview
- DataFrames
- Reading and writing data
- Data alignment and reshaping
- Fancy indexing and slicing
- Merging and joining data sets
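The merging and joining topic can be illustrated with a small sketch. It assumes pandas is installed; the two tables, their column names, and the values are made up for illustration and do not come from the course labs.

```python
import pandas as pd

# Two hypothetical tables that share a "student_id" key.
students = pd.DataFrame(
    {"student_id": [1, 2, 3], "name": ["Ada", "Grace", "Linus"]}
)
scores = pd.DataFrame(
    {"student_id": [1, 1, 2], "score": [90, 80, 70]}
)

# A left join keeps every student, including those with no scores;
# the missing score shows up as NaN.
merged = students.merge(scores, on="student_id", how="left")

# Average score per student; a student with no scores yields NaN.
avg = merged.groupby("name")["score"].mean()
```

Choosing `how="left"` rather than the default inner join is a deliberate choice here: it makes unmatched rows visible instead of silently dropping them.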
Lesson: matplotlib
- Creating a basic plot
- Commonly used plots
- Ad hoc data visualization
- Advanced usage
- Exporting images
Lesson: The Python Imaging Library (PIL)
- PIL overview
- Core image library
- Image processing
- Displaying images
Lesson: seaborn
- Seaborn overview
- Bivariate and univariate plots
- Visualizing Linear Regressions
- Visualizing Data Matrices
- Working with Time Series data
Lesson: SciKit-Learn Machine Learning Essentials
- SciKit overview
- SciKit-Learn overview
- Algorithms Overview
- Classification, Regression, Clustering, and Dimensionality Reduction
- SciKit Demo
Lesson: TensorFlow Overview
- TensorFlow overview
- Keras
- Getting Started with TensorFlow
Session: Python on Spark
Lesson: PySpark Overview
- Python and Spark
- SciKit-Learn vs. Spark MLlib
- Python at Scale
- PySpark Demo
Lesson: RDDs and DataFrames
- DataFrames and Resilient Distributed Datasets (RDDs)
- Partitions
- Adding variables to a DataFrame
- DataFrame Types
- DataFrame Operations
- Dependent vs. Independent variables
- Map/Reduce with DataFrames
Lesson: Spark SQL
- Spark SQL Overview
- Data stores: HDFS, Cassandra, HBase, Hive, and S3
- Table Definitions
- Queries
Lesson: Spark MLlib
- MLlib overview
- MLlib Algorithms Overview
- Classification Algorithms
- Regression Algorithms
- Decision Trees and forests
- Recommendation with ALS
- Clustering Algorithms
- Machine Learning Pipelines
- Linear Algebra (SVD, PCA)
- Statistics in MLlib
Lesson: Spark Streaming
- Streaming overview
- Integrating Spark SQL, MLlib, and Streaming