GCP Data Engineering-End to End Project-Healthcare Domain
Last updated 4/2025
Duration: 7h 47m | .MP4 1280x720, 30 fps(r) | AAC, 44100 Hz, 2ch | 3.51 GB
Genre: eLearning | Language: English
Industry-standard project in the healthcare domain using GCP services such as GCS, BigQuery, Dataproc, and Composer, together with GitHub and CI/CD
What you'll learn
- Understand the End to End Data Engineering Project
- Design and Implement Scalable ETL Pipelines for Healthcare Data
- Implement Key Techniques like Incremental Data Loading, SCD2, Metadata-Driven Approach, Medallion Architecture, Error Handling, CDM, CI/CD, and more
- Develop and Deploy Data Solutions with CI/CD Practices
Requirements
- Basic knowledge of Python and SQL
Description
This project focuses on building a data lake in Google Cloud Platform (GCP) for Revenue Cycle Management (RCM) in the healthcare domain.
The goal is to centralize, clean, and transform data from multiple sources, enabling healthcare providers and insurance companies to streamline billing, claims processing, and revenue tracking.
GCP Services Used:
Google Cloud Storage (GCS): Stores raw and processed data files.
BigQuery: Serves as the analytical engine for storing and querying structured data.
Dataproc: Used for large-scale data processing with Apache Spark.
Cloud Composer (Apache Airflow): Automates ETL pipelines and workflow orchestration.
Cloud SQL (MySQL): Stores transactional Electronic Medical Records (EMR) data.
GitHub & Cloud Build: Enables version control and CI/CD implementation.
CI/CD (Continuous Integration & Continuous Deployment): Automates deployment pipelines for data processing and ETL workflows.
Techniques involved:
Metadata Driven Approach
SCD type 2 implementation
CDM (Common Data Model)
Medallion Architecture
Logging and Monitoring
Error Handling
Optimizations
CI/CD implementation
Many more best practices
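To make the SCD type 2 item above more concrete, here is a minimal sketch in plain Python. It is not taken from the course material: the `PatientRow` fields, the `apply_scd2` helper, and the sample values are all hypothetical, and in a BigQuery-based pipeline the same logic would more likely be written in SQL. The idea is only that a changed attribute closes the current row and opens a new one, so history is preserved.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PatientRow:
    # Hypothetical dimension row; column names are assumptions for illustration.
    patient_id: str
    address: str
    valid_from: date
    valid_to: Optional[date] = None
    is_current: bool = True


def apply_scd2(
    history: List[PatientRow],
    incoming: Dict[str, str],
    load_date: date,
) -> List[PatientRow]:
    """Apply an SCD type 2 update.

    `incoming` maps patient_id to the latest address from the source system.
    """
    result: List[PatientRow] = []
    keys_with_current_row = set()

    for row in history:
        if row.is_current:
            keys_with_current_row.add(row.patient_id)
        new_address = incoming.get(row.patient_id)
        if row.is_current and new_address is not None and new_address != row.address:
            # Attribute changed: expire the old version and open a new one.
            result.append(replace(row, valid_to=load_date, is_current=False))
            result.append(PatientRow(row.patient_id, new_address, load_date))
        else:
            # Unchanged or already-expired rows are carried over as-is.
            result.append(row)

    # Keys never seen before get their first current row.
    for patient_id, address in incoming.items():
        if patient_id not in keys_with_current_row:
            result.append(PatientRow(patient_id, address, load_date))

    return result


history = [PatientRow("P1", "Old St", date(2024, 1, 1))]
incoming = {"P1": "New Ave", "P2": "Elm Rd"}
updated = apply_scd2(history, incoming, date(2025, 4, 1))
```

After this call, `updated` holds three rows: the expired "Old St" version of P1, the new current version of P1, and a first row for P2.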
Data Sources
EMR (Electronic Medical Records) data from two hospitals
Claims files
CPT (Current Procedural Terminology) codes
NPI (National Provider Identifier) Data
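As a rough illustration of how a metadata-driven, incremental approach might treat sources like these, the sketch below drives extraction from a small config list instead of hard-coding one job per table. Everything here is an assumption for illustration only: the `SourceConfig` fields, the table and column names, and the `build_extract_query` helper do not come from the course.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceConfig:
    # Hypothetical metadata record; one entry per source table.
    source: str
    table: str
    load_type: str  # "full" or "incremental"
    watermark_column: str = ""


def build_extract_query(cfg: SourceConfig, last_watermark: Optional[str] = None) -> str:
    """Build the extract query for one config entry.

    Incremental tables only pull rows newer than the last stored watermark.
    String formatting keeps the sketch short; real code should use
    parameterized queries instead.
    """
    base = f"SELECT * FROM {cfg.table}"
    if cfg.load_type == "incremental" and last_watermark is not None:
        return f"{base} WHERE {cfg.watermark_column} > '{last_watermark}'"
    return base


# Hypothetical config entries for two hospital EMR sources.
CONFIG: List[SourceConfig] = [
    SourceConfig("hospital_a", "encounters", "incremental", "modified_date"),
    SourceConfig("hospital_b", "departments", "full"),
]

queries = [build_extract_query(cfg, "2025-04-01 00:00:00") for cfg in CONFIG]
```

Adding a new table then becomes a config change rather than a code change, which is the main appeal of the metadata-driven style.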
Expected Outcomes
Efficient Data Pipeline: Automating the ingestion and transformation of RCM data.
Structured Data Warehouse: Gold-layer tables in BigQuery for analytical queries.
KPI Dashboards: Insights into revenue collection, claims processing efficiency, and financial trends.
Who this course is for:
- Aspiring Data Engineers, Data Professionals