Bankruptcy Prediction

Developed a bankruptcy prediction model from 64 financial indicators using an ensemble of XGBoost and a neural network. The model reached about 91% accuracy on the full test set and was built to improve financial risk assessment.

Project Overview:

Our project aimed to predict the probability of bankruptcy for firms by analyzing 64 financial indicators. This is a binary classification problem: each firm is labeled as financially distressed or not. Misclassification is costly in both directions, since a false positive denies credit to a creditworthy firm and a false negative extends credit to a firm in distress.

Key variables:

(Net profit + Depreciation) / Total liabilities
(Gross profit + Depreciation) / Total liabilities
These metrics highlight a company's ability to cover liabilities through operational earnings and efficiency.
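
As a small illustration, the two ratios above can be computed for a single firm as follows. This is a sketch only: the function name, the handling of zero liabilities, and the sample figures are assumptions made for the example, not values from the project data.

```python
def coverage_ratios(net_profit, gross_profit, depreciation, total_liabilities):
    """Return the two liability-coverage ratios listed above for one firm."""
    if total_liabilities == 0:
        # Assumption: with no liabilities the ratios are undefined,
        # so they are reported as missing rather than as infinity.
        return None, None
    net_ratio = (net_profit + depreciation) / total_liabilities
    gross_ratio = (gross_profit + depreciation) / total_liabilities
    return net_ratio, gross_ratio


# Hypothetical firm (figures invented for illustration).
net_ratio, gross_ratio = coverage_ratios(
    net_profit=120.0,
    gross_profit=300.0,
    depreciation=30.0,
    total_liabilities=600.0,
)
print(net_ratio, gross_ratio)
```

A higher ratio means a larger share of liabilities could be covered from earnings plus depreciation in the period.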

Data Processing:

Filter Node: Removed outliers to avoid skewed predictions.
Impute Node: Replaced null values with mean values to maintain predictive accuracy.
Data Partition Node: Split data into 80% training and 20% validation sets.
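
The three steps above can be approximated in plain Python as shown below. This is a sketch under stated assumptions, not the project's implementation: the original workflow used tool nodes, and the z-score outlier rule, the threshold of 3, the random seed, and the toy data are all choices made for this example.

```python
import random
import statistics


def filter_outliers(rows, z_max=3.0):
    """Drop rows where any known value is more than z_max std devs from its column mean.

    Assumption: a z-score rule stands in for whatever the Filter Node applied.
    """
    n_cols = len(rows[0])
    col_stats = []
    for j in range(n_cols):
        known = [r[j] for r in rows if r[j] is not None]
        col_stats.append((statistics.fmean(known), statistics.pstdev(known)))
    kept = []
    for r in rows:
        is_outlier = any(
            v is not None and sd > 0 and abs(v - mu) > z_max * sd
            for v, (mu, sd) in zip(r, col_stats)
        )
        if not is_outlier:
            kept.append(r)
    return kept


def impute_mean(rows):
    """Replace missing values (None) with the mean of the known values in that column."""
    n_cols = len(rows[0])
    means = [
        statistics.fmean([r[j] for r in rows if r[j] is not None])
        for j in range(n_cols)
    ]
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def partition(rows, train_frac=0.8, seed=0):
    """Shuffle the rows and split them into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Toy data: 19 ordinary rows, one extreme row, and one row with a missing value.
rows = [[float(i % 5), 10.0 + i % 3] for i in range(19)]
rows.append([1000.0, 10.0])  # extreme value in the first column
rows.append([None, 11.0])    # missing value in the first column

clean = impute_mean(filter_outliers(rows))
train, valid = partition(clean, train_frac=0.8)
print(len(rows), len(clean), len(train), len(valid))
```

The sketch follows the order listed above, so the imputation means are computed before the split. Computing them on the training partition only and reusing them for validation is a common alternative that keeps validation data out of the preprocessing statistics.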

Modeling & Evaluation:

Implemented an ensemble of a Gradient Boosting model and a Neural Network.
Used an Ensemble Node to average the predictions of the two models.
Achieved an accuracy of 93.38% on half of the test data.
Accuracy dropped to 90.88% on the entire test set, which suggests the model was overfitting.
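
The averaging step can be sketched as follows. The assumptions here are that each model outputs a bankruptcy probability per firm, that the two are combined with an unweighted mean, and that a 0.5 cutoff turns the average into a class label. The probabilities and labels are invented for the example and are not project results.

```python
def average_ensemble(probs_a, probs_b):
    """Average two models' predicted probabilities, firm by firm."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same firms")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


def accuracy(y_true, probs, threshold=0.5):
    """Share of firms whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    correct = sum(1 for pred, y in zip(preds, y_true) if pred == y)
    return correct / len(y_true)


# Hypothetical scores from the two models (1 = bankrupt, 0 = not bankrupt).
gb_probs = [0.90, 0.20, 0.60, 0.10, 0.40]
nn_probs = [0.70, 0.40, 0.30, 0.20, 0.70]
y_true = [1, 0, 1, 0, 1]

ensemble_probs = average_ensemble(gb_probs, nn_probs)
ensemble_accuracy = accuracy(y_true, ensemble_probs)
print(ensemble_probs, ensemble_accuracy)
```

Running the same accuracy function on a subset and on the full test set is one way to reproduce the kind of comparison reported above.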

Challenges and Lessons Learned:

Overfitting: The model performed well on training data but showed decreased accuracy on validation data. Addressing class imbalance and removing columns with a high Variance Inflation Factor (VIF) could have mitigated overfitting.
Iterations and Depth: Limiting tree depth and tuning the number of boosting iterations would have improved model generalization.
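
To illustrate the class imbalance point, the sketch below computes inverse-frequency class weights and the accuracy of a baseline that always predicts the majority class. The 95/5 split is a hypothetical proportion chosen for the example, not the project's actual class distribution, and the weighting formula is the common n / (k * count) heuristic rather than anything confirmed by the project.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical labels: 95 healthy firms (0) and 5 bankrupt firms (1).
labels = [0] * 95 + [1] * 5

weights = balanced_class_weights(labels)
baseline = majority_baseline_accuracy(labels)
print(weights, baseline)
```

With such a split, a model that never predicts bankruptcy already scores 95% accuracy, so accuracy alone can hide poor detection of the minority class. Weighting the rare class more heavily during training is one way to counter this.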

Future Enhancements:

Logarithmic Transformations: Apply log transformations to the 15 most skewed variables to make their distributions more symmetric and consistent.
Square Transformations: Apply square transformations to the 10 most important variables to increase their influence on the model.
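
A possible shape for the log transformation step is sketched below. Several assumptions are involved: skewness is measured with the moment-based coefficient, variables are ranked by its absolute value, and a signed log1p is used because financial ratios can be zero or negative, where a plain logarithm is undefined. The column names and values are hypothetical.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def signed_log1p(v):
    """log(1 + |v|) with the sign of v, so zero and negative values are handled."""
    return math.copysign(math.log1p(abs(v)), v)


def log_transform_most_skewed(columns, k):
    """Apply signed_log1p to the k columns with the largest absolute skewness."""
    ranked = sorted(columns, key=lambda name: abs(skewness(columns[name])), reverse=True)
    chosen = ranked[:k]
    transformed = {
        name: [signed_log1p(v) for v in vals] if name in chosen else list(vals)
        for name, vals in columns.items()
    }
    return transformed, chosen


# Hypothetical columns: x1 is symmetric, x2 has a long right tail.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [-1.0, 1.0, 2.0, 2.0, 100.0],
}

transformed, chosen = log_transform_most_skewed(columns, k=1)
print(chosen, transformed["x2"])
```

For the plan described above, k would be 15 and the input would be the 64 indicator columns.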

Conclusion:

This project underscores the importance of robust preprocessing and model tuning in predictive analytics. The experience gained will guide future endeavors in refining predictive models and achieving better generalization across varied datasets.