Introduction

In this project, I explored the IBM HR Analytics Employee Attrition & Performance dataset with the goal of predicting employee attrition. The aim was to understand the factors that contribute to employees leaving the company and to build a predictive model that could support strategic HR decisions. The final solution also includes a Streamlit web app for interactive exploration of the results, along with the accompanying GitHub repositories.

Exploratory Data Analysis (EDA)

The project began with an initial exploratory data analysis (EDA) combined with basic data cleaning. I removed irrelevant columns, assessed the distribution of the target variable (Attrition), and identified key categorical variables, distinguishing between ordinal and nominal types.
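
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the exact code from the project: the small DataFrame is made-up sample data standing in for the real CSV, the choice to drop constant columns is one plausible reading of "irrelevant", and the ordinal/nominal assignment shown is an assumption.

```python
import pandas as pd

# Hypothetical sample standing in for the real CSV. Column names follow the
# public IBM HR Analytics dataset; the values are invented for illustration.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No"],
    "EmployeeCount": [1, 1, 1, 1, 1, 1],
    "Over18": ["Y"] * 6,
    "Education": [2, 1, 4, 3, 2, 5],
    "Department": ["Sales", "Research & Development", "Research & Development",
                   "Human Resources", "Sales", "Research & Development"],
    "MonthlyIncome": [2100, 5200, 6100, 4300, 2500, 9800],
})

# Assumption: a column with a single unique value is "irrelevant" for modeling.
constant_cols = [c for c in df.columns if df[c].nunique() == 1]
df = df.drop(columns=constant_cols)

# Share of each class in the target variable.
attrition_share = df["Attrition"].value_counts(normalize=True)

# Assumed split: Education is a 1-5 rating (ordinal), Department has no order (nominal).
ordinal_cols = ["Education"]
nominal_cols = ["Department"]
df["Education"] = pd.Categorical(df["Education"], categories=[1, 2, 3, 4, 5], ordered=True)

print(constant_cols)
print(attrition_share)
```

Marking ordinal columns as ordered categoricals keeps their ranking available to later encoding steps, while nominal columns can be one-hot encoded without implying any order.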

I created various visualizations to analyze the relationships between features and attrition. Early observations suggested that variables such as MonthlyIncome and TotalWorkingYears were noticeably associated with whether an employee left the company.
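
One simple way to check this kind of relationship numerically, before or alongside plotting, is to compare group statistics for employees who left and those who stayed. The sketch below uses pandas on invented sample values (not the real dataset) and is only meant to show the shape of the comparison.

```python
import pandas as pd

# Hypothetical sample data; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No"],
    "MonthlyIncome": [2100, 5200, 6100, 4300, 2500, 9800],
    "TotalWorkingYears": [1, 8, 10, 6, 2, 20],
})

# Median of each feature within the "Yes" and "No" attrition groups.
group_medians = df.groupby("Attrition")[["MonthlyIncome", "TotalWorkingYears"]].median()
print(group_medians)
```

A large gap between the two rows for a feature is a hint, not proof, that the feature is worth keeping for modeling; medians are used here because income distributions tend to be skewed.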

To support this, I generated a correlation heatmap for numerical variables. Although most correlations were expected (e.g., tenure and job level), they didn't reveal particularly novel insights for modeling purposes.
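
The matrix behind such a heatmap can be computed in a few lines. The following is a sketch under assumptions: it uses pandas with Pearson correlation (the pandas default) on made-up sample values, and the column selection is illustrative rather than the set used in the project.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample data with a non-numeric column mixed in.
df = pd.DataFrame({
    "JobLevel": [1, 2, 2, 3, 1, 4],
    "YearsAtCompany": [1, 5, 6, 9, 2, 15],
    "MonthlyIncome": [2100, 5200, 6100, 4300, 2500, 9800],
    "Department": ["Sales", "Research & Development", "Research & Development",
                   "Human Resources", "Sales", "Research & Development"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()

# List each unique pair once, strongest absolute correlation first.
pairs = sorted(
    ((a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)),
    key=lambda item: abs(item[2]),
    reverse=True,
)
for a, b, r in pairs:
    print(f"{a} vs {b}: {r:.2f}")
```

Passing `corr` to a plotting function such as seaborn's `heatmap` (for example with `annot=True`) would render it as a figure; the sorted pair list is a compact alternative when only the strongest relationships matter.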

Feature Engineering & Preprocessing

Next, I prepared the data for modeling. This included:

Model Benchmarking

I tested several classification algorithms to identify the best model for this problem, including: