Trained and evaluated ML classification models to predict the probability of loan default, using a dataset on US small businesses provided by the U.S. Small Business Administration (SBA).
Final model: XGBoost with 93% test accuracy and 93% test sensitivity.

Report Repository

Abstract

The goal of the analysis is to predict whether a loan should be approved, based on a dataset provided by the U.S. Small Business Administration (SBA) and available on Kaggle. To do so, classification models that predict default are estimated using logistic regression, decision trees, random forest, and gradient boosting; these models are also used to understand which variables are most likely to influence the outcome. This dataset was chosen because the SBA is a reliable government organization, founded in 1953, that fosters small business formation and growth, which has considerable social benefits: it creates job opportunities and reduces unemployment in the United States.
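The classifiers named above are typically compared on test-set metrics such as accuracy and sensitivity. The following is a minimal sketch of how those two metrics can be computed from true and predicted labels. It is not the project's evaluation code: the function name `accuracy_and_sensitivity` is hypothetical, the labels are made-up toy values rather than SBA data, and it assumes that label `1` marks a default (the positive class).

```python
def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for two equal-length label sequences.

    Assumption: `positive` is the label of the class of interest
    (here taken to be "loan defaulted").
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")

    pairs = list(zip(y_true, y_pred))
    # True positives: actual defaults that the model flagged as defaults.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: actual defaults that the model missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Correct predictions of either class.
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Sensitivity (recall) is undefined when there are no actual positives.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Toy labels for illustration only (not taken from the SBA dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

Sensitivity is worth reporting next to accuracy in this setting because defaults that a model fails to flag are usually the costly errors, and accuracy alone can look high when one class dominates the data.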

Authors

  • Luigi Noto
  • Giacomo Bugli
  • Chiara D’Ignazio
  • Davide Drago
  • Nunzio Fallico
  • Mert Tekdemir