Introduction to Apache Spark 2 Programming

SPRK-100 (3 Days)
Apache Spark 2 Training Overview

Accelebrate's Introduction to Apache Spark 2 training provides students with a solid technical introduction to the Spark architecture and how Spark works. Attendees learn the basic building blocks of Spark, including RDDs and the distributed compute engine, as well as higher-level constructs that provide a simpler and more capable interface, including Spark SQL and DataFrames.

This course also covers more advanced capabilities such as the use of Spark Streaming to process streaming data, and provides an overview of Spark Graph Processing (GraphX and GraphFrames) and Spark Machine Learning (SparkML Pipelines). Finally, the class explores possible performance issues, troubleshooting, cluster deployment techniques, and strategies for optimization.

Location and Pricing

Accelebrate courses are taught as private, customized training for groups of 3 or more at your site. In addition, we offer live, private online training for teams who may be in multiple locations or wish to save on travel costs. To receive a customized proposal and price quote for private on-site or online training, please contact us.

Online Training with Accelebrate. In the wake of the sudden emergence of the Coronavirus crisis, we want to help our clients continue to learn and grow while at home. We are offering 10% off any new booking of private online classes for your team of 3 or more if purchased before May 15, 2020. Please use discount code online2020 in the comments section of the contact form. For more information, and to see how we ensure successful online training, please see this Online Training Q&A.

Apache Spark 2 Training Objectives

All students will:

  • Understand the need for Spark in data processing
  • Understand the Spark architecture and how it distributes computations to cluster nodes
  • Be familiar with basic installation / setup / layout of Spark
  • Use Spark for interactive and ad-hoc operations
  • Use Dataset/DataFrame/Spark SQL to efficiently process structured data
  • Understand basics of RDDs (Resilient Distributed Datasets), and data partitioning, pipelining, and computations
  • Understand Spark's data caching and its usage
  • Understand performance implications and optimizations when using Spark
  • Be familiar with Spark Graph Processing and SparkML machine learning

Apache Spark 2 Training Outline

Scala Ramp Up (Optional)
  • Scala Introduction, Variables, Data Types, Control Flow
  • The Scala Interpreter
  • Collections and their Standard Methods (e.g. map())
  • Functions, Methods, Function Literals
  • Class, Object, Trait
Introduction to Spark
  • Overview, Motivations, Spark Systems
  • Spark Ecosystem
  • Spark vs. Hadoop
  • Typical Spark Deployment and Usage Environments
RDDs and Spark Architecture
  • RDD Concepts, Partitions, Lifecycle, Lazy Evaluation
  • Working with RDDs - Creating and Transforming (map, filter, etc.)
  • Caching - Concepts, Storage Type, Guidelines
Datasets/DataFrames and Spark SQL
  • Introduction and Usage
  • Creating and Using a Dataset
  • Working with JSON
  • Using the Dataset DSL
  • Using SQL with Spark
  • Data Formats
  • Optimizations: Catalyst and Tungsten
  • Datasets vs. DataFrames vs. RDDs
Creating Spark Applications
  • Overview, Basic Driver Code, SparkConf
  • Creating and Using a SparkContext/SparkSession
  • Building and Running Applications
  • Application Lifecycle
  • Cluster Managers
  • Logging and Debugging
Spark Streaming
  • Overview and Streaming Basics
  • Structured Streaming
  • DStreams (Discretized Streams)
  • Architecture, Stateless, Stateful, and Windowed Transformations
  • Spark Streaming API
  • Programming and Transformations
Performance Characteristics and Tuning
  • The Spark UI
  • Narrow vs. Wide Dependencies
  • Minimizing Data Processing and Shuffling
  • Caching - Concepts, Storage Type, Guidelines
  • Using Caching
  • Using Broadcast Variables and Accumulators
(Optional): Spark GraphX Overview
  • Introduction
  • Constructing Simple Graphs
  • GraphX API
  • Shortest Path Example
(Optional): MLlib Overview
  • Introduction
  • Feature Vectors
  • Clustering / Grouping, K-Means
  • Recommendations
  • Classifications
Conclusion

Lecture/Demo: 40%

Lab: 60%

Course Number:

SPRK-100

Duration:

3 Days

Prerequisites:

Students should have an introductory knowledge of Python or Scala. An overview of Scala is provided if needed. (Class can be customized for SQL data analysts, emphasizing SQL techniques and minimizing procedural coding.)

Training Materials:

All Spark training students receive comprehensive courseware.

Software Requirements:

  • Windows, Mac, or Linux PCs with the current Chrome or Firefox browser.
    • Most class activities create Spark code and visualizations in a browser-based notebook environment. The class also covers how to export these notebooks and how to run Spark code outside of this environment.
  • Internet access

Contact Us:

Accelebrate’s training classes are available for private groups of 3 or more people at your site or online anywhere worldwide.

Don't settle for a "one size fits all" public class! Have Accelebrate deliver exactly the training you want, privately at your site or online, for less than the cost of a public class.

For pricing and to learn more, please contact us.

Toll-free in US/Canada:
877 849 1850
International:
+1 678 648 3113

Toll-free in US/Canada:
866 566 1228
International:
+1 404 420 2491

925B Peachtree Street, NE
PMB 378
Atlanta, GA 30309-3918
USA

© 2013-2020 Accelebrate, Inc. All Rights Reserved. All trademarks are owned by their respective owners.