Comprehensive Data Science with Python

Course Number: PYTH-124
Duration: 5 days (32.5 hours)
Format: Live, hands-on

Python for Data Science Training Overview

This Comprehensive Data Science with Python training course teaches engineers, data scientists, data analysts, statisticians, and other quantitative professionals the Python programming skills they need to chart and visualize data and to apply inferential statistics. Attendees learn the essentials of Python, including data structures, variables, and libraries, as well as how Python is used in data science. Students also learn how to clean and explore their data, build predictive models, and develop data-driven web applications. Our experienced instructors guide you through the full range of topics, starting with the basics, and equip you for advanced data science work.

Let us work with you to customize this course or design a Data Science upskilling solution for your team or entire organization.

Location and Pricing

Accelebrate offers instructor-led enterprise training for groups of three or more, online or at your site. Most Accelebrate classes can be flexibly scheduled for your group, including delivery in half-day segments across one or more weeks. To receive a customized proposal and price quote for private corporate training on-site or online, please contact us.

In addition, some courses are available as live, instructor-led training from one of our partners.

Objectives

  • Understand the differences between Python's basic data types
  • Know when to use different Python collections
  • Implement Python functions
  • Understand control flow constructs in Python
  • Handle errors via exception handling constructs
  • Quantitatively define an answerable, actionable question
  • Import both structured and unstructured data into Python
  • Parse unstructured data into structured formats
  • Understand the differences between NumPy arrays and pandas dataframes
  • Simulate data through random number generation
  • Understand mechanisms for missing data and analytic implications
  • Explore and clean data
  • Create compelling graphics to reveal analytic results
  • Reshape and merge data to prepare for advanced analytics
  • Test for group differences using inferential statistics
  • Implement linear regression from a frequentist perspective
  • Understand non-linear terms, confounding, and interaction in linear regression
  • Extend to logistic regression to model binary outcomes

Prerequisites

All attendees should have prior programming experience and an understanding of basic statistics.

Outline

An Accelerated Introduction and Overview to Python for Data Science Foundations
  • Introduction to course and computing environment
  • Up and running with Jupyter notebooks
  • Fundamental Python types: strings, numbers, Booleans, and dates
  • Understanding Python ‘variables’ (reference assignment)
  • Slicing syntax
  • Fundamental collections: tuples, lists, dictionaries, and sets
  • Control flow and iteration in Python (if/elif/else, for, while, list comprehensions)
  • Writing your own functions
  • Handling exceptions
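
As a minimal sketch of several topics in this module (reference assignment, slicing, the core collections, list comprehensions, and exception handling), the following uses only built-in Python. The names `readings`, `alias`, `snapshot`, and `safe_ratio` are illustrative assumptions and are not taken from the courseware.

```python
# Illustrative sketch only; all names and values here are hypothetical.

# Reference assignment: both names refer to the same list object.
readings = [3.2, 4.8, 5.1, 7.4]
alias = readings
alias.append(9.0)            # mutates the shared list
snapshot = readings[:]       # a full slice makes a shallow copy
snapshot.append(-1.0)        # does not affect `readings`

# Slicing: the first two elements, then every other element.
head = readings[:2]
every_other = readings[::2]

# Fundamental collections.
point = (1, 2)                    # tuple (immutable)
counts = {"a": 1, "b": 2}         # dictionary
unique = set([1, 1, 2, 3])        # set drops duplicates

# List comprehension with a filter condition.
large = [r for r in readings if r > 5]


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when dividing by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None
```

Because `alias` and `readings` name the same object, appending through one is visible through the other, while the sliced copy stays independent.
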
Matrix Computing with NumPy
  • Introduction to the ndarray
  • Dtypes in NumPy
  • NumPy operations and universal functions (ufuncs)
  • Broadcasting
  • Missing data in NumPy (masked array)
  • Random number generation
Managing, Exploring, and Cleaning Data with Pandas
  • Fundamental Pandas: Series and DataFrames
  • Exploring objects with attributes/methods
  • Importing data from different structured sources
  • Basic DataFrame summaries
  • Creating new variables (columns)
  • Scaling and standardizing data elements
  • Discretizing continuous data
  • Mapping categorical data to new values
  • Establishing dummy codes (one hot encoding)
  • Filtering rows and selecting columns
  • Managing the indices
  • Identifying duplicate rows
  • Quantifying and managing missing data
  • Combining datasets
  • Merging datasets
  • Transposing datasets
  • Changing data from long to wide formats and back
Exploratory Data Analysis with Pandas (including visualization with Seaborn)
  • Univariate statistical summaries and outlier detection, both graphical and numerical
  • Multivariate statistical summaries and outlier detection, both graphical and numerical
  • Groupwise calculations
  • Pivot Table type operations to aggregate by group
  • Pandas DataFrame plotting methods
Data Pseudo-Coding Process, Extension to Data-Centric Problems
  • Identifying data verbs
  • Answering a question using a well-formatted analytic dataframe
  • Understanding the unit of analysis
    • Identifying the unit of analysis for a given question – is my dataframe organized this way?
  • Leveraging normalized data to create the analytic dataframe through combinations of data verbs
    • Identify the question and unit of analysis
    • Define the desired analytic dataframe
    • Examine the normalized source data
    • Create data pseudo-code to map source data to the final analytic dataframe
    • Implement with Python
Focus on Graphics with Python: Seaborn, Matplotlib, and Plotly
  • Using Seaborn for one- and two-variable summaries
  • Advanced statistical plots with Seaborn
  • Controlling plot details through Seaborn
  • Making graphs interactive with Plotly
  • Introduction to Matplotlib for full control of parameters
Overview of Descriptive versus Inferential Analytics
  • Identifying the null hypothesis
  • P-value interpretation
  • The idea of statistical power and Type I/Type II errors
Implementing Inferential Statistics in Python
  • Analyzing an A/B randomized test:
    • T-tests/ANOVA
    • Chi-square tests
  • Correlation methods
Multivariate Models: Linear Regression
  • Estimating the mean
  • Identifying p-values of interest
  • Adding a categorical predictor and the link to t-tests
  • Nonlinear trends: Polynomial regression and spline modeling
  • Interaction terms
  • Confounding
  • Model building approaches (choosing the best model)
  • Scoring new data from the model (making predictions)
Multivariate Models: Logistic Regression
  • GLMs and the link function
  • Understanding the logit function
  • The binomial distribution
  • Recovering the average event probability from the model
  • Interpreting the coefficient – the odds ratio
  • Categorical predictors and the connection to the chi-square test
  • Expansion to more complex models (non-linear trends, multiple predictors)
  • Confounding
  • Interaction terms
  • Making predictions
  • Comparing models and picking the ‘best’ model
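
The logit function, the recovery of an event probability, and the odds-ratio reading of a coefficient listed above can be sketched with the standard library alone. This is a hedged illustration, not the course's own code: the helper names `logit` and `inverse_logit` and the coefficient values are assumptions made up for the example.

```python
import math


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inverse_logit(x):
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-x))


# In an intercept-only logistic model, the fitted intercept equals the
# logit of the observed event proportion, so the inverse logit recovers it.
observed_proportion = 0.3                      # hypothetical: 30 events in 100 trials
intercept_only = logit(observed_proportion)
recovered = inverse_logit(intercept_only)

# Hypothetical slope (invented for illustration, not a fitted result).
slope = 0.5                                    # change in log-odds per unit of the predictor
odds_ratio = math.exp(slope)                   # multiplicative change in the odds per unit
```

Exponentiating a coefficient gives an odds ratio because the model is linear on the log-odds scale, so a one-unit increase adds `slope` to the log-odds and multiplies the odds by `exp(slope)`.
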
Conclusion
Optional modules depending on student interest and timing
  • Analyzing unstructured data with Python
    • Overview of structured versus unstructured data
    • Implementing regular expressions in Python
    • Converting unstructured data to structured data for analysis
  • Missing Data
    • Exploring and understanding patterns in missing data
    • Missing at Random
    • Missing Not at Random
    • Missing Completely at Random
    • Data imputation methods
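
For the optional module on unstructured data, converting free text to structured records with regular expressions might look like the sketch below. The input lines, the field layout, and the names `raw_lines`, `pattern`, and `records` are invented for illustration and do not come from the courseware.

```python
import re

# Hypothetical free-text records; this format is an assumption for the example.
raw_lines = [
    "2021-03-04 patient=A17 temp=38.2C",
    "2021-03-05 patient=B02 temp=36.9C",
    "malformed line without fields",
]

# Named groups label each captured field.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"patient=(?P<patient>\w+)\s+"
    r"temp=(?P<temp>\d+\.\d+)C"
)

records = []
for line in raw_lines:
    match = pattern.search(line)
    if match is None:
        continue  # skip lines that do not fit the expected shape
    row = match.groupdict()
    row["temp"] = float(row["temp"])  # convert the captured text to a number
    records.append(row)
```

The resulting list of dictionaries has one entry per parsed line and could be passed to a tabular structure such as a pandas DataFrame for analysis.
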

Training Materials

All Python for Data Science training attendees receive comprehensive courseware.

Software Requirements

  • Anaconda Python 3.6 or later
  • Spyder IDE and Jupyter Notebook (both come with Anaconda)


"Just completed Python with Data Science with Accelebrate and REALLY enjoyed the experience. The courseware and presentation/platform were awesome, and our instructor was amazing at relating the topics to real-world applications. Would definitely recommend this course to those looking for an introduction to data science with Python (PANDAs in particular...) as an icebreaker!" - Robert C, US Army

Learn faster

Our live, instructor-led lectures are far more effective than pre-recorded classes

Satisfaction guarantee

If your team is not 100% satisfied with your training, we do what's necessary to make it right

Learn online from anywhere

Whether you are at home or in the office, we make learning interactive and engaging

Multiple Payment Options

We accept check, ACH/EFT, major credit cards, and most purchase orders



Recent Training Locations

  • Alabama: Birmingham, Huntsville, Montgomery
  • Alaska: Anchorage
  • Arizona: Phoenix, Tucson
  • Arkansas: Fayetteville, Little Rock
  • California: Los Angeles, Oakland, Orange County, Sacramento, San Diego, San Francisco, San Jose
  • Colorado: Boulder, Colorado Springs, Denver
  • Connecticut: Hartford
  • DC: Washington
  • Florida: Fort Lauderdale, Jacksonville, Miami, Orlando, Tampa
  • Georgia: Atlanta, Augusta, Savannah
  • Hawaii: Honolulu
  • Idaho: Boise
  • Illinois: Chicago
  • Indiana: Indianapolis
  • Iowa: Cedar Rapids, Des Moines
  • Kansas: Wichita
  • Kentucky: Lexington, Louisville
  • Louisiana: New Orleans
  • Maine: Portland
  • Maryland: Annapolis, Baltimore, Frederick, Hagerstown
  • Massachusetts: Boston, Cambridge, Springfield
  • Michigan: Ann Arbor, Detroit, Grand Rapids
  • Minnesota: Minneapolis, Saint Paul
  • Mississippi: Jackson
  • Missouri: Kansas City, St. Louis
  • Nebraska: Lincoln, Omaha
  • Nevada: Las Vegas, Reno
  • New Jersey: Princeton
  • New Mexico: Albuquerque
  • New York: Albany, Buffalo, New York City, White Plains
  • North Carolina: Charlotte, Durham, Raleigh
  • Ohio: Akron, Canton, Cincinnati, Cleveland, Columbus, Dayton
  • Oklahoma: Oklahoma City, Tulsa
  • Oregon: Portland
  • Pennsylvania: Philadelphia, Pittsburgh
  • Rhode Island: Providence
  • South Carolina: Charleston, Columbia, Greenville
  • Tennessee: Knoxville, Memphis, Nashville
  • Texas: Austin, Dallas, El Paso, Houston, San Antonio
  • Utah: Salt Lake City
  • Virginia: Alexandria, Arlington, Norfolk, Richmond
  • Washington: Seattle, Tacoma
  • West Virginia: Charleston
  • Wisconsin: Madison, Milwaukee
  • Alberta: Calgary, Edmonton
  • British Columbia: Vancouver
  • Manitoba: Winnipeg
  • Nova Scotia: Halifax
  • Ontario: Ottawa, Toronto
  • Quebec: Montreal
  • Puerto Rico: San Juan