Spark for Developers

This training class, intended for developers and data analysts, introduces Apache Spark.


Delegates will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, Spark APIs, Spark SQL, Spark Streaming, machine learning, and GraphX.

Scala Primer

A Quick Introduction to Scala


Spark Basics

Background and History

Spark and Hadoop

Spark Concepts and Architecture

Spark Ecosystem (Core, Spark SQL, MLlib, Streaming)


RDDs

Running Spark in Local Mode

Spark Web UI

Spark Shell

Analyzing a Dataset - Part 1

Inspecting RDDs


RDDs In Depth

Partitions

RDD Operations / Transformations

RDD Types

Key-Value Pair RDDs

MapReduce on RDD

Caching and Persistence


Spark and Hadoop

Hadoop Intro (HDFS / YARN)

Hadoop + Spark Architecture

Running Spark on Hadoop YARN

Processing HDFS Files Using Spark


Spark API Programming

Introduction to Spark API / RDD API

Submitting the First Program to Spark

Debugging / Logging

Configuration Properties


Spark SQL

SQLContext

Defining Tables and Importing Datasets

Querying


Spark Streaming

Streaming Overview

Streaming Operations

Sliding Window Operations

Writing Spark Streaming Applications


Spark MLlib

MLlib Intro

MLlib Algorithms

Writing MLlib Applications


Spark GraphX

GraphX Library Overview

GraphX APIs

Processing Graph Data Using Spark


Spark Performance and Tuning

Broadcast Variables

Accumulators

Memory Management

Courses that can help you meet these prerequisites: Introduction to Python, Java SE Programming Essentials

Program Details
Duration: 3 Days
Capacity: Max 12 Persons
Training Type: Classroom / Virtual Classroom

