Data Engineering

DataFlow MW

Automated ETL Pipeline & Analytics Dashboard for Public Health Data

Python, Apache Airflow, PostgreSQL, Grafana, DHIS2 API, Docker

Problem

Malawi's Ministry of Health publishes data on DHIS2 (District Health Information Software 2), but the raw data is messy, siloed, and hard to use for analysts who need it in a queryable, visual format.

Solution

DataFlow MW is an Airflow-orchestrated ETL pipeline that pulls data from the DHIS2 API nightly, cleans and normalizes it with Python/Pandas, loads it into PostgreSQL, and surfaces it through a Grafana dashboard of district-level health indicators.
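As a rough illustration of the "cleans and normalizes" step, the sketch below flattens a headers/rows style API payload into one record per row and converts blank or non-numeric values to None. It uses only the standard library rather than Pandas so it runs anywhere. The payload shape, the column names (dx, pe, ou, value), and the function name rows_to_records are assumptions made for this example, not the project's actual code.

```python
from typing import Any


def rows_to_records(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a headers/rows payload into one dict per row (assumed shape)."""
    columns = [header["name"] for header in payload.get("headers", [])]
    records = []
    for row in payload.get("rows", []):
        record = dict(zip(columns, row))
        raw = record.get("value")
        # Blank or non-numeric values become None so they can be
        # handled explicitly later (for example by an imputation step).
        try:
            record["value"] = float(raw) if raw not in (None, "") else None
        except (TypeError, ValueError):
            record["value"] = None
        records.append(record)
    return records


# Hypothetical sample payload, invented for illustration only.
sample = {
    "headers": [{"name": "dx"}, {"name": "pe"}, {"name": "ou"}, {"name": "value"}],
    "rows": [
        ["indicator_a", "202401", "district_1", "12.5"],
        ["indicator_a", "202401", "district_2", ""],
    ],
}
records = rows_to_records(sample)
```

The resulting list of dicts could be passed to pandas.DataFrame(records) for further cleaning.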

Real-World Impact

DataFlow MW demonstrates a production-grade data engineering workflow applicable to any NGO or government body that needs automated reporting. The pipeline pattern is reusable for any DHIS2 instance across Africa.

Challenges Faced

DHIS2's API pagination and rate limiting required careful backoff logic. Data quality was highly variable: some districts had 40% missing values, which required sophisticated imputation strategies.
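One way the pagination and backoff logic could look is sketched below: a generator walks the pages and retries a throttled request with exponential backoff plus a little jitter. Everything named here is hypothetical (the RateLimited exception, the fetch_page callable, and the "items" / "page_count" response keys). The real DHIS2 response format and the project's actual retry policy may differ.

```python
import random
import time
from typing import Any, Callable, Iterator


class RateLimited(Exception):
    """Hypothetical error raised by the fetcher when the server throttles us."""


def fetch_all_pages(
    fetch_page: Callable[[int], dict[str, Any]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield items from every page, backing off when a page is throttled."""
    page = 1
    while True:
        for attempt in range(max_retries + 1):
            try:
                body = fetch_page(page)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the final retry
                # Exponential backoff: base, 2x, 4x, ... plus up to 10% jitter.
                delay = base_delay * (2 ** attempt)
                sleep(delay + random.uniform(0, delay * 0.1))
        yield from body["items"]
        if page >= body["page_count"]:
            return
        page += 1


# Fake fetcher for demonstration: the second request is throttled once.
calls = {"n": 0}


def fake_fetch(page: int) -> dict[str, Any]:
    calls["n"] += 1
    if calls["n"] == 2:
        raise RateLimited()
    return {"items": [f"row-{page}"], "page_count": 3}


delays: list[float] = []  # record sleeps instead of actually waiting
result = list(fetch_all_pages(fake_fetch, sleep=delays.append))
```

Injecting the sleep function keeps the sketch testable without real waiting. In an Airflow deployment, task-level retries could complement this kind of request-level backoff.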

Key Learnings

Data engineering discipline (idempotent pipelines, proper logging, retry logic) is what separates a script from a production system.
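To make "idempotent" concrete, here is a minimal sketch of a load step that can be re-run on the same batch without creating duplicate rows, using an upsert keyed on indicator, period, and district. It uses the standard library's sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. The table name, column names, and load_records function are assumptions for illustration, not the project's schema.

```python
import sqlite3
from typing import Any


def load_records(conn: sqlite3.Connection, records: list[dict[str, Any]]) -> None:
    """Upsert records so that re-running a batch leaves the table unchanged."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS indicator_values (
            indicator TEXT NOT NULL,
            period    TEXT NOT NULL,
            district  TEXT NOT NULL,
            value     REAL,
            PRIMARY KEY (indicator, period, district)
        )
        """
    )
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            """
            INSERT INTO indicator_values (indicator, period, district, value)
            VALUES (:indicator, :period, :district, :value)
            ON CONFLICT (indicator, period, district)
            DO UPDATE SET value = excluded.value
            """,
            records,
        )


conn = sqlite3.connect(":memory:")
batch = [
    {"indicator": "indicator_a", "period": "202401", "district": "district_1", "value": 12.5},
    {"indicator": "indicator_a", "period": "202401", "district": "district_2", "value": None},
]
load_records(conn, batch)
load_records(conn, batch)  # a retried run must not duplicate rows
row_count = conn.execute("SELECT COUNT(*) FROM indicator_values").fetchone()[0]
```

PostgreSQL supports a similar INSERT ... ON CONFLICT ... DO UPDATE form, so the same idea should carry over with a PostgreSQL driver. With this pattern a retried nightly run overwrites rows in place instead of appending them.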

Demo & Execution Screenshots

[Image: DataFlow MW screenshot 1]
[Image: DataFlow MW screenshot 2]
