Automated Data Cleaning Pipeline
Challenge Overview
The Problem
Producing the monthly sales report required manually cleaning and merging 15 Excel files from different branches, consuming about 6 hours of work each month.
My Approach
I wrote a Python script using `pandas` to iterate through the folder, read each Excel file, standardize the column names, handle missing values, and merge everything into a single master dataset ready for analysis.
Key Lesson
"Automation not only saves time but drastically reduces human error in data preparation tasks."
The Bottleneck
At the end of every month, branch managers would email Excel files with varying formats (merged cells, different headers) to the central team. Collating this took a full working day.
The Script
I developed a Python script using the `pandas` and `os` libraries to:
- Scan: Iterate through a designated directory for `.xlsx` files.
- Normalize: Map different column headers (e.g., "CustID", "CIF", "Client ID") to a standard schema.
- Clean: Drop empty rows, fill missing numerical values with 0, and validate date formats.
- Consolidate: Append all cleaned DataFrames into a master list and export them as a single CSV.
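The steps above can be sketched roughly as follows. The header variants in `COLUMN_MAP`, the function names, and the folder/output paths are illustrative assumptions, not the original script:

```python
from pathlib import Path
import pandas as pd

# Header variants seen across branch files, mapped to one standard schema.
# These specific names are illustrative assumptions.
COLUMN_MAP = {"CustID": "customer_id", "CIF": "customer_id", "Client ID": "customer_id"}

def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize headers, drop empty rows, fill missing numerics with 0."""
    df = df.rename(columns=COLUMN_MAP)             # map headers to the standard schema
    df = df.dropna(how="all")                      # drop rows that are entirely empty
    num_cols = df.select_dtypes("number").columns
    df[num_cols] = df[num_cols].fillna(0)          # fill missing numerical values with 0
    return df

def build_master(folder: str) -> pd.DataFrame:
    """Scan a folder for .xlsx files, clean each one, and consolidate them."""
    frames = [clean_frame(pd.read_excel(p)) for p in Path(folder).glob("*.xlsx")]
    return pd.concat(frames, ignore_index=True)

# Hypothetical usage:
# build_master("monthly_reports").to_csv("master_sales.csv", index=False)
```

Keeping the per-file cleaning in its own function makes it easy to unit-test the header mapping and missing-value rules on small in-memory DataFrames, without needing real Excel files.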
Outcome
The process now takes 14 seconds. The team uses the saved time for deeper analysis of the sales data rather than just preparing it.