Lab Challenges
A log of technical problems solved, CTFs completed, and practical experiments in data engineering and analytics.
Optimizing High-Volume Transaction Queries
A critical daily report query was taking 45 minutes to execute, causing delays in morning briefings. The table contained over 10 million rows with multiple joins.
Analyzed the execution plan and identified a full table scan on the transaction log table. Created a composite non-clustered index on transaction_date and branch_id. Refactored the query to replace a correlated subquery with a CTE.
"Indexing strategies must align with query patterns. CTEs can improve readability and sometimes performance over subqueries."
Tools Used
Real-time Fraud Detection Dashboard
Fraud analysts needed to see suspicious transaction patterns in real-time, but the existing report was only updated once a day.
Set up a Power BI streaming dataset connected to a Python script that monitored the transaction log API. Designed a dashboard with card visuals for immediate alerts and a line chart for trend analysis over the last hour.
"Real-time dashboards require a different design philosophy than historical reports—focus on immediate outliers rather than deep aggregation."
Tools Used
Automated Data Cleaning Pipeline
Creating the monthly sales report involved manually cleaning and merging 15 different Excel files from various branches, taking 6 hours of manual work.
Wrote a Python script using Pandas to iterate through the folder, read each Excel file, standardize column names, handle missing values, and merge them into a single master dataset ready for analysis.
"Automation not only saves time but drastically reduces human error in data preparation tasks."