Spark for Developers

This training class, intended for developers and data analysts, introduces Apache Spark.

Delegates will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, Spark APIs, Spark SQL, Spark Streaming, machine learning with MLlib, and GraphX.

Scala Primer

A Quick Introduction to Scala

Spark Basics

Background and History

Spark and Hadoop

Spark Concepts and Architecture

Spark Ecosystem (Core, Spark SQL, MLlib, Streaming)


Running Spark in Local Mode

Spark Web UI

Spark Shell

Analyzing a Dataset - Part 1

Inspecting RDDs

RDDs In Depth


RDD Operations / Transformations

RDD Types

Key-Value Pair RDDs

MapReduce on RDD

Caching and Persistence

Spark and Hadoop

Hadoop Intro (HDFS / YARN)

Hadoop + Spark Architecture

Running Spark on Hadoop YARN

Processing HDFS Files Using Spark

Spark API Programming

Introduction to Spark API / RDD API

Submitting the First Program to Spark

Debugging / Logging

Configuration Properties

Spark SQL

SQL Context

Defining Tables and Importing Datasets


Spark Streaming

Streaming Overview

Streaming Operations

Sliding Window Operations

Writing Spark Streaming Applications

Spark MLlib

MLlib Intro

MLlib Algorithms

Writing MLlib Applications

Spark GraphX

GraphX Library Overview

GraphX APIs

Processing Graph Data Using Spark

Spark Performance and Tuning

Broadcast Variables


Memory Management

Courses that can help you meet these prerequisites: Introduction to Python, Java SE Programming Essentials

Program Details
Duration: 3 Days
Capacity: Max 12 Persons
Training Type: Classroom / Virtual Classroom
