Spark Performance Tuning For Data Engineers: Part1 - Storage

Posted By: ELK1nG
Published 5/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 kHz
Language: English | Size: 1.36 GB | Duration: 3h 23m

Data Engineering & Apache Spark Optimization Techniques on Databricks to Boost Speed, Reduce Cost & Handle Big Data

What you'll learn

Hands-on demos based on different scenarios and use cases

Learn the nuances of Spark performance tuning

Get detailed insights into different operations in Spark

Get a clear understanding of how Spark configs work hand in hand and which combinations give optimal results

Learn to identify and solve bottlenecks and errors in your Spark application

Requirements

Basic Spark Architecture & internals

Spark programming in PySpark or Scala

Databricks Cloud Platform

Description

Unlock the true potential of Apache Spark by mastering storage-related performance tuning techniques. This hands-on course is packed with real-world scenarios, guided demos, and practical use cases that will help you fine-tune Spark storage strategies for speed, efficiency, and scalability.

This course is perfect for intermediate Data Engineers and Spark Developers, as well as aspiring Architects, who want to optimize Spark jobs, reduce resource costs, and ensure fast, reliable performance for large-scale data applications.

What You’ll Learn
1. Understand how Apache Spark handles storage internally: memory vs disk
2. Learn when and how to use Spark caching and persistence effectively
3. Compare and choose the right storage levels: MEMORY_ONLY, MEMORY_AND_DISK, etc.
4. Use real-world examples and hands-on demos to benchmark storage decisions
5. Learn how to monitor storage metrics using the Spark UI
6. Handle memory spills, disk I/O bottlenecks, and storage tuning in cluster environments
7. Apply best practices for storage optimization in cloud and on-prem Spark clusters

Why Take This Course?
100% hands-on: focused on practical implementation, not just theory
Designed for Data Engineers, Spark Developers, and Big Data Practitioners
Covers both foundational concepts and advanced tuning techniques
Teaches how to measure performance gains using real metrics
Helps you make cost-efficient decisions for big data storage

Tools & Technologies Covered
Apache Spark (2.x and 3.x)
Databricks
Spark UI
HDFS, Data Lake (for storage scenarios)

Overview

Section 1: Introduction

Lecture 1 Introduction

Lecture 2 What is Optimization

Lecture 3 What is Benchmarking

Section 2: Important Concepts

Lecture 4 Spark High Level Architecture

Lecture 5 Spark Job Execution

Lecture 6 Reading Spark UI

Lecture 7 Physical Plans & DAG - Part 1

Lecture 8 Physical Plans & DAG - Part 2

Section 3: Optimizing Storage

Lecture 9 Schema Inference Problem

Lecture 10 Reuse DataFrame

Lecture 11 Column Elimination

Lecture 12 Row Elimination

Lecture 13 Directory Scan Problem

Lecture 14 Optimal File Size

Lecture 15 Haystack Query

Who this course is for

Data Engineers and Spark Developers, as well as aspiring Architects, curious about advanced techniques of performance tuning and optimization