Analyzing Big Data with R Programming

4.5 out of 5 (23 reviews)  

RPROG-112 (4 Days)
Big Data with R Training Overview

Accelebrate's Analyzing Big Data with R Programming training teaches attendees how to perform in-memory, on-disk, and distributed analysis using H2O, Hadoop, and Apache Spark, and how to integrate Microsoft Machine Learning Server with R.

Location and Pricing

Accelebrate courses are taught as private, customized training for groups of 3 or more at your site. In addition, we offer live, private online training for teams who may be in multiple locations or wish to save on travel costs. To receive a customized proposal and price quote for private on-site or online training, please contact us.

In addition, some courses are available as live, online classes for individuals. See a schedule of online courses.

Big Data with R Training Objectives

All students will be able to:

  • Use data.table to manipulate large datasets that fit in memory
  • Understand batch processing via SQL queries
  • Implement online learning style models using R
  • Describe the difference between Hadoop and Spark
  • Understand the HDFS file system
  • Use the SparkR package to leverage Spark through an R API
  • Manage data via H2O and the R API
  • Implement models using H2O and the R API
  • Explain how Microsoft R Server and Microsoft R Client work
  • Use R Client with R Server to explore big data held in different data stores
  • Visualize data by using graphs and plots
  • Transform and clean big datasets
  • Build and evaluate regression models generated from big data
  • Create, score, and deploy partitioning models generated from big data
  • Use R in the SQL Server and Hadoop environments

Big Data with R Training Outline

Introduction
In-memory Big Data: data.table
  • Why do we need data.table?
  • Why is it 
  • The i and the j arguments in data.table
  • Renaming Columns
  • Adding new columns
  • Binning data (continuous to categorical)
  • Combining categorical values
  • Transforming Variables
  • Group-by functions with data.table
  • Handling missing data
  • Long to Wide and Back
  • Merging datasets together
  • Stacking datasets together (concatenation)
SQL Connections and Sequential data updates
Implementing Online Learning (Sequential Model Updates)
  • The biglm package
Data Munging and Machine Learning Via H2O
  • Intro to H2O
  • Launching the cluster, checking status
  • Data import and manipulation in H2O
  • Unstructured data analysis: Word2Vec
  • Fitting models in H2O
    • Generalized Linear Models
    • Naïve Bayes
    • Random Forest
    • Gradient Boosting Machine (GBM)
  • Ensemble model building
Overview of Hadoop
  • Distributed data versus distributed analytics
  • HDFS and map-reduce
Apache Spark
  • Overview of Spark
  • APIs to use Apache Spark with R
    • Sparklyr versus SparkR
    • R, Python, Java and Scala APIs to Spark
  • Applied Examples using SparkR
  • Data import and manipulation in Spark(R)
  • The Spark machine learning library MLlib:
    • Generalized Linear Models
    • Random Forest
    • Naïve Bayes
Microsoft Machine Learning Server Overview
  • What is Microsoft R Server?
  • Using Microsoft R Client
  • The ScaleR functions
Data Munging
  • Understanding XDF files
  • Data I/O
  • Variable transformations
  • Data subsetting, splitting, and merging
Data Summarization
  • Creating visualizations
  • Numerical summaries
Processing Big Data
  • Transforming Big Data
  • Managing datasets
Implementing General Linear Models
  • Establishing and leveraging partitions/clusters
  • Fitting regression models and making predictions
Implementing Other Models
  • Decision Trees and Random Forests
  • Naïve Bayes
Conclusion

Lecture/Demo: 50%

Lab: 50%

Course Number: RPROG-112

Duration: 4 Days

Prerequisites:

In addition to their professional experience, students who attend this course should have:

  • Programming experience using R, and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality

Training Materials:

All R training students receive comprehensive courseware.

Software Requirements:

  • R 3.0 or later with console
  • IDE or text editor of your choice (RStudio recommended)

Contact Us:

Accelebrate’s training classes are available for private groups of 3 or more people at your site or online anywhere worldwide.

Don't settle for a "one size fits all" public class! Have Accelebrate deliver exactly the training you want, privately at your site or online, for less than the cost of a public class.

For pricing and to learn more, please contact us.

Toll-free in US/Canada:
877 849 1850
International:
+1 678 648 3113

Fax: +1 404 420 2491

925B Peachtree Street, NE
PMB 378
Atlanta, GA 30309-3918
USA

© 2013-2020 Accelebrate, Inc. All Rights Reserved. All trademarks are owned by their respective owners.