Data Engineering
DataFlow MW
Automated ETL Pipeline & Analytics Dashboard for Public Health Data
Problem
Malawi's Ministry of Health publishes data on DHIS2 (District Health Information Software 2), but the raw data is messy, siloed, and inaccessible to analysts who need it in a queryable, visual format.
Solution
DataFlow MW is an Airflow-orchestrated ETL pipeline that pulls data from the DHIS2 API nightly, cleans and normalizes it with Python/Pandas, loads it into PostgreSQL, and surfaces it through a Grafana dashboard of district-level health indicators.
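As a rough illustration of the normalize step, the sketch below flattens an analytics-style response into one record per row. It assumes the payload carries a "headers" list and a "rows" list, as DHIS2 analytics responses generally do. The function name and sample values are hypothetical, and it uses only the standard library rather than Pandas to stay dependency-free.

```python
from typing import Any


def analytics_to_records(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a headers/rows payload into a list of dicts, one per row.

    Assumption: payload["headers"] is a list of {"name": ...} objects and
    payload["rows"] is a list of lists in the same column order.
    """
    columns = [header["name"] for header in payload["headers"]]
    records = []
    for row in payload["rows"]:
        record = dict(zip(columns, row))
        # Values arrive as strings; blanks or non-numeric text become None
        # so that missing data stays explicit downstream.
        try:
            record["value"] = float(record.get("value"))
        except (TypeError, ValueError):
            record["value"] = None
        records.append(record)
    return records


# Hypothetical sample payload, not real Ministry of Health data.
sample = {
    "headers": [{"name": "dx"}, {"name": "pe"}, {"name": "ou"}, {"name": "value"}],
    "rows": [
        ["ind1", "202401", "districtA", "12.5"],
        ["ind1", "202401", "districtB", ""],
    ],
}
records = analytics_to_records(sample)
```

Keeping missing values as None rather than zero lets a later imputation step tell "not reported" apart from "reported as zero".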
Real-World Impact
Demonstrates a production-grade data engineering workflow applicable to any NGO or government body that needs automated reporting. The pipeline pattern is reusable for any DHIS2 instance across Africa.
Challenges Faced
DHIS2's API pagination and rate limiting required careful backoff logic. Data quality was highly variable: some districts had 40% missing values, which required sophisticated imputation strategies.
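The backoff logic mentioned above might look something like the following sketch. It is not the project's actual code: `RateLimited`, `fetch_with_backoff`, `iter_pages`, and the `fetch_page` callable are hypothetical names. The response is assumed to include a "pager" object with a "pageCount" field, which is how DHIS2 paged responses are commonly shaped.

```python
import random
import time
from typing import Any, Callable, Iterator


class RateLimited(Exception):
    """Raised by the (hypothetical) page fetcher when the server throttles us."""


def fetch_with_backoff(
    fetch_page: Callable[[int], dict[str, Any]],
    page: int,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Fetch one page, retrying with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fetch_page(page)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up and let the orchestrator mark the task failed
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("unreachable")


def iter_pages(
    fetch_page: Callable[[int], dict[str, Any]], **backoff_kwargs: Any
) -> Iterator[dict[str, Any]]:
    """Yield pages until the pager reports that the last page was reached."""
    page = 1
    while True:
        payload = fetch_with_backoff(fetch_page, page, **backoff_kwargs)
        yield payload
        # Assumption: the response has payload["pager"]["pageCount"].
        if page >= payload["pager"]["pageCount"]:
            break
        page += 1
```

Passing `sleep` as a parameter keeps the retry path testable without real waiting; in production `fetch_page` would wrap the HTTP call and raise `RateLimited` on a throttling response.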
Key Learnings
Data engineering discipline (idempotent pipelines, proper logging, retry logic) is what separates a script from a production system.
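One common way to make a load step idempotent is an upsert keyed on the natural identity of each row, so rerunning a nightly job overwrites rather than duplicates. The sketch below illustrates the idea with the standard library's sqlite3 as a stand-in for PostgreSQL. The table and column names are hypothetical. PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE` form, though a driver such as psycopg uses `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical schema: one value per (indicator, period, district).
DDL = """
CREATE TABLE IF NOT EXISTS indicator_values (
    indicator TEXT NOT NULL,
    period    TEXT NOT NULL,
    district  TEXT NOT NULL,
    value     REAL,
    PRIMARY KEY (indicator, period, district)
)
"""

UPSERT = """
INSERT INTO indicator_values (indicator, period, district, value)
VALUES (?, ?, ?, ?)
ON CONFLICT (indicator, period, district)
DO UPDATE SET value = excluded.value
"""


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert or update rows in a single transaction; safe to rerun."""
    with conn:  # commits on success, rolls back on error
        conn.execute(DDL)
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
rows = [
    ("ind1", "202401", "districtA", 12.5),
    ("ind1", "202401", "districtB", None),
]
load(conn, rows)
load(conn, rows)  # a rerun of the same batch leaves the table unchanged
row_count = conn.execute("SELECT COUNT(*) FROM indicator_values").fetchone()[0]
```

Because the primary key encodes what makes a row unique, a retried or backfilled run converges to the same table state instead of appending duplicates.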
Demo & Execution Screenshots

