Professional Portfolio

ML Fraud Detection System (Digital Payments)

Detects fraudulent digital payment transactions that could harm company profit and reputation. The system was built using a combination of XGBoost, CatBoost, and Random Forest models, tuned to balance precision and recall. Accuracy improved from ~39% to >90%, and weekly fraud detection increased by ~150%. A Looker dashboard was developed for live monitoring; it shows consistent accuracy and declining PSI (Population Stability Index) trends.

Impact: Prevented financial loss, strengthened trust in the fraud system, and improved model stability.
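As an illustration of the PSI metric tracked on the dashboard, the following is a minimal sketch of the standard Population Stability Index calculation for one numeric feature. The function name `psi`, the equal-width binning, the bin count, and the `eps` floor are assumptions for illustration, not the production implementation.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a recent sample.

    Bins are equal-width over the baseline range (an assumed choice);
    empty bins are floored at `eps` so the logarithm stays defined.
    """
    lo, hi = min(expected), max(expected)
    width = ((hi - lo) / n_bins) or 1.0  # guard against a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic example: a baseline week versus a shifted week.
baseline = [i / 100 for i in range(100)]
shifted = [min(v + 0.3, 0.99) for v in baseline]
print(round(psi(baseline, baseline), 4), round(psi(baseline, shifted), 4))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, so a declining PSI trend per feature suggests the input distributions are settling after the model update.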

Post-improvement performance overview (Looker dashboard)

(Figure 1: Weekly Accuracy, Fraud Confirmed, and PSI Trend per Feature. Data covers subsequent monitoring weeks after model update.)

DTTOT Transaction Detection

In payment gateway operations, even a single high-risk transaction linked to a terrorist watchlist can expose the company to serious financial and reputational risks.

To mitigate this, an automated detection system was built to identify transactions potentially connected to names listed in the DTTOT (Suspected Terrorist and Terrorist Organizations) registry.

The system, developed in Python and orchestrated via Airflow, automatically scrapes the official DTTOT name list and compares each listed name with the customer and merchant names in the company’s daily transaction records using a Levenshtein-based name similarity algorithm.

Each transaction receives a risk score and a classification (Similar, Suspicious, or Not Similar), and those flagged as Similar are automatically held for further review by the finance and compliance teams.

Impact: Strengthened AML compliance, prevented reputational risks, and automated watchlist monitoring.
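The name-matching step described above can be sketched as follows. This is a minimal illustration, assuming a normalized Levenshtein similarity and two cut-off values; the thresholds (`similar_at`, `suspicious_at`), the function names, and the sample names are hypothetical and do not reflect the production settings or real watchlist entries.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1] after lowercasing and collapsing whitespace."""
    a, b = " ".join(a.lower().split()), " ".join(b.lower().split())
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def classify(score, similar_at=0.90, suspicious_at=0.75):
    """Map a similarity score to a label; thresholds are assumed values."""
    if score >= similar_at:
        return "Similar"
    if score >= suspicious_at:
        return "Suspicious"
    return "Not Similar"


def score_name(name, watchlist):
    """Best similarity of one transaction name against the whole list."""
    best = max(similarity(name, entry) for entry in watchlist)
    return best, classify(best)


# Synthetic names for illustration only.
watchlist = ["Budi Santoso Example", "Sample Organization"]
print(score_name("budi  santoso example", watchlist))
print(score_name("Unrelated Merchant", watchlist))
```

In a daily pipeline, a function like `score_name` would run once per customer and merchant name, and only rows labeled Similar would be routed to the hold queue.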

(Figure 2: Sample output of DTTOT Similarity Detection; synthetic data for illustration only.)

Credit Risk Scorecard Model (Collection Segmentation)

In lending operations, collection activities can be costly — assigning the wrong treatment (e.g., field visit vs. call) often leads to unnecessary expenses and lower recovery.
To address this, a credit risk scorecard model was developed to predict clients’ payment probability within 15 days and segment them into high or low treatment groups.

The model combines statistical and machine learning approaches (e.g., logistic regression with WOE transformation and feature regularization) to ensure both interpretability and stable performance over time.
By applying this segmentation, the company can allocate only 74% of contracts to high treatment while still achieving a higher net gain (payment collected minus collection cost) than any existing allocation method.
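To illustrate the WOE (Weight of Evidence) transformation mentioned above, here is a minimal sketch that computes WOE per bin of one feature. It assumes the convention WOE = ln(share of payers / share of non-payers); the `smoothing` term, the function name, and the bin labels are assumptions for illustration.

```python
import math


def woe_table(bins, paid, smoothing=0.5):
    """Weight of Evidence per bin.

    bins: bin label for each client (e.g. a bucketed feature value).
    paid: 1 if the client paid within the window, else 0.
    smoothing: small additive count to avoid division by zero (assumed).
    """
    total_good = sum(paid)
    total_bad = len(paid) - total_good
    counts = {}
    for label, y in zip(bins, paid):
        good, bad = counts.get(label, (0, 0))
        counts[label] = (good + y, bad + (1 - y))

    n_bins = len(counts)
    table = {}
    for label, (good, bad) in counts.items():
        good_share = (good + smoothing) / (total_good + smoothing * n_bins)
        bad_share = (bad + smoothing) / (total_bad + smoothing * n_bins)
        table[label] = math.log(good_share / bad_share)
    return table


# Synthetic example: clients with low days-past-due pay more often.
bins = ["low_dpd"] * 4 + ["high_dpd"] * 4
paid = [1, 1, 1, 0, 0, 0, 0, 1]
table = woe_table(bins, paid)
print({label: round(value, 3) for label, value in table.items()})
```

Replacing each raw feature with its WOE value gives the logistic regression inputs that are monotonic in the log-odds, which is one reason scorecards built this way stay interpretable.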

Impact: Improved net gain (payment collected minus collection cost) through data-driven treatment allocation.
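One way to find an optimal treatment share is to rank contracts by a model-derived priority and sweep the cut-off, as in the sketch below. The input layout (a priority plus an expected net gain under each treatment) and the function name are assumptions for illustration; the actual evaluation method may differ.

```python
def best_treatment_share(contracts):
    """Find the share of contracts sent to high treatment that maximizes
    total net gain.

    contracts: list of (priority, net_gain_high, net_gain_low) tuples,
    where priority is a hypothetical model-derived ranking value and each
    net gain is expected payment collected minus collection cost.
    """
    ranked = sorted(contracts, key=lambda c: c[0], reverse=True)
    n = len(ranked)
    total = sum(c[2] for c in ranked)  # start with everyone on low treatment
    best_share, best_total = 0.0, total
    for k, (_, gain_high, gain_low) in enumerate(ranked, start=1):
        total += gain_high - gain_low  # move the k-th contract to high
        if total > best_total:
            best_share, best_total = k / n, total
    return best_share, best_total


# Synthetic contracts: (priority, net gain if high, net gain if low).
contracts = [(0.9, 50, 10), (0.7, 40, 20), (0.4, 15, 25), (0.1, 5, 30)]
print(best_treatment_share(contracts))  # -> (0.5, 145)
```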

(Figure 3: Average Net Gain by Treatment Share. The optimal point (74%) yields the highest profit when using the model.)

LLM-Enabled Report Reader (Streamlit App)

Not everyone can easily interpret the daily and weekly reports produced by the data division. We often have to hold meetings to explain them, which can be time-consuming.

To address this, an LLM-powered Report Assistant was developed using Streamlit, enabling users to upload reports and interactively ask questions or request summaries in natural language.
The system uses a lightweight Retrieval-Augmented Generation (RAG) setup with Llama 3.2 (1B), retrieving relevant report sections and generating contextual answers aligned with common business questions discussed in meetings.
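The retrieval step of such a setup can be sketched as below. For a self-contained example, this swaps in a simple bag-of-words cosine similarity in place of whatever embedding model the app actually uses, and it stops at building the prompt; the call to Llama 3.2 is omitted. All function names and the prompt wording are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real embedding step."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, sections, top_k=2):
    """Return up to top_k report sections most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in sections]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]


def build_prompt(question, sections, top_k=2):
    """Assemble the context-grounded prompt that would go to the LLM."""
    context = "\n\n".join(retrieve(question, sections, top_k))
    return ("Answer using only the report excerpts below.\n\n"
            f"{context}\n\nQuestion: {question}\nAnswer:")


# Synthetic report sections for illustration only.
sections = [
    "Weekly collection rate rose to 82 percent.",
    "Attendance of field collectors was stable.",
    "Fraud alerts dropped this week.",
]
print(build_prompt("What was the collection rate?", sections, top_k=1))
```

Keeping retrieval separate from generation means the small 1B model only sees the few sections relevant to the question, which helps it stay within its context budget.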

Impact: Reduced meeting time and improved accessibility of complex analytics reports for non-technical teams.

(Figure 4: LLM-enabled Streamlit interface for interactive report summarization and Q&A.)

Automatic Monitoring Dashboard for Collectors Performance

An automated daily Power BI dashboard tracks 13 behavioral and operational metrics covering attendance, punctuality, engagement, and visit consistency.
The metrics are normalized into an overall performance score, and the dashboard automatically flags bottom performers and suspicious visit patterns.
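The scoring logic can be sketched as follows, assuming min-max normalization per metric, equal metric weights, and a fixed bottom share for flagging. These choices, the function names, and the sample data are assumptions for illustration; the dashboard itself is built in Power BI.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant metric maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def performance_scores(metrics):
    """Average of min-max scaled metrics per collector.

    metrics: {collector: {metric_name: value}}, where higher is better
    for every metric (an assumption; inverse metrics need flipping first).
    """
    names = sorted(metrics)
    metric_names = sorted(next(iter(metrics.values())))
    totals = {name: 0.0 for name in names}
    for metric in metric_names:
        scaled = min_max([metrics[name][metric] for name in names])
        for name, value in zip(names, scaled):
            totals[name] += value / len(metric_names)
    return totals


def flag_bottom(scores, share=0.1):
    """Return the lowest-scoring collectors (at least one)."""
    k = max(1, int(len(scores) * share))
    return sorted(scores, key=scores.get)[:k]


# Synthetic collectors with two of the metrics.
metrics = {
    "A": {"attendance": 1.0, "punctuality": 0.9},
    "B": {"attendance": 0.5, "punctuality": 0.4},
    "C": {"attendance": 0.8, "punctuality": 0.9},
}
scores = performance_scores(metrics)
print(scores, flag_bottom(scores, share=0.34))
```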

Impact: Enhanced transparency, improved field efficiency, and enabled bias-free performance review.

(Figure 5: Power BI dashboard highlighting collector score ranking, percentile grouping, and detailed metric breakdown per employee.)

Customer Satisfaction Prediction

In the collection process, the desk team regularly calls clients to gather feedback about field agent visits. These responses are written in free text, making it difficult to identify whether clients are satisfied or dissatisfied.
To solve this, a deep learning model was developed using TensorFlow to automatically classify client feedback sentiment as positive or negative, based on historical QA-tagged responses stored in the database.
The model was integrated into the company’s Airflow pipeline, running daily to analyze new feedback and store sentiment results back into the data warehouse.

Impact:
• Automated sentiment classification with high accuracy.
• Enabled daily monitoring of client satisfaction trends.
• Accelerated performance evaluation and coaching for field agents.

(Figure 6: Example of model output displaying daily customer feedback records with predicted sentiment classification, confidence scores, and satisfaction flags generated from synthetic data)

Individual Projects

To support my profile and showcase my skills, I have also created individual projects, which are available on GitHub:

My GitHub