
NCAA Crossroads Classic Analytics Challenge

Predicted consumer activity type for the NCAA Women’s Basketball Tournament using an ensemble of XGBoost, Random Forest, and Logistic Regression, achieving an accuracy of 98.5%. The team secured 3rd place at the university level.

Project Objective:

Develop predictive models using Division I Women’s Basketball customer data and external datasets to forecast customer ticket purchase behavior and identify the market (primary or secondary) through which tickets will be purchased.


Data Overview:

  • Dataset: Division I Women’s Basketball dataset

  • Records: 200,000 rows

  • Features: 25 predictor variables

  • New Features Created: 10 additional features to capture relationships and improve model accuracy


Data Preprocessing:

Null Imputation:

  • Categorical columns: Replaced nulls with 'Unknown'

  • Numerical columns: Median imputation
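
The two imputation rules above can be sketched with pandas as follows. This is a minimal illustration, not the team's actual code: the column names and values are hypothetical, and treating every non-numeric column as categorical is an assumption.

```python
import pandas as pd

def impute_nulls(df: pd.DataFrame) -> pd.DataFrame:
    """Fill non-numeric nulls with 'Unknown' and numeric nulls with the column median."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    # Assumption: every non-numeric column is treated as categorical.
    cat_cols = out.columns.difference(num_cols)
    out[cat_cols] = out[cat_cols].fillna("Unknown")
    # fillna with a Series fills each column using its own median.
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    return out

# Hypothetical toy data; the real dataset's columns are not listed in this write-up.
raw = pd.DataFrame({
    "State": ["IN", None, "OH"],
    "TicketsPurchased": [2.0, None, 4.0],
})
clean = impute_nulls(raw)
print(clean)
```

In a train/test setting, the medians would normally be computed on the training data and reused for the test data, so that no test-set statistics leak into preprocessing.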


Feature Engineering:

  • Duration of events

  • College basketball rankings

  • Engagement score (based on email engagements)

  • Temporal attributes

  • Consumer information by zip code
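
As a rough sketch of how three of the engineered features listed above (event duration, temporal attributes, and an email-based engagement score) might be derived with pandas: every column name below is hypothetical, and the engagement formula (opens plus double-weighted clicks, divided by emails sent) is an illustrative assumption, since the write-up does not define the score.

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add duration, temporal, and engagement features (hypothetical column names)."""
    out = df.copy()
    start = pd.to_datetime(out["EventStart"])
    end = pd.to_datetime(out["EventEnd"])

    # Duration of the event in whole days.
    out["EventDurationDays"] = (end - start).dt.days

    # Temporal attributes taken from the event start date.
    out["EventMonth"] = start.dt.month
    out["EventDayOfWeek"] = start.dt.dayofweek  # Monday = 0

    # Illustrative engagement score; rows with zero emails sent get a score of 0.
    sent = out["EmailsSent"].where(out["EmailsSent"] > 0)
    score = (out["EmailsOpened"] + 2 * out["EmailsClicked"]) / sent
    out["EngagementScore"] = score.fillna(0.0)
    return out

# Hypothetical toy rows.
events = pd.DataFrame({
    "EventStart": ["2023-03-31", "2023-03-17"],
    "EventEnd": ["2023-04-02", "2023-03-17"],
    "EmailsSent": [10, 0],
    "EmailsOpened": [4, 0],
    "EmailsClicked": [1, 0],
})
features = add_features(events)
print(features[["EventDurationDays", "EventMonth", "EventDayOfWeek", "EngagementScore"]])
```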


Scaling: StandardScaler applied to both the training and test datasets
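
A minimal sketch of the scaling step with scikit-learn's StandardScaler. The write-up does not say how the scaler was fitted. Fitting on the training data only and reusing those statistics for the test data is the usual practice and is assumed here. The arrays are toy values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrices standing in for the real train/test splits.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
# Learn the per-column mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics so train and test share the same scale.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately zero per column
print(X_test_scaled)
```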


Multicollinearity: Dropped columns that were highly collinear with other predictors, such as “CustomerInstitutionAffinity”
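
The write-up does not state how multicollinearity was detected. One common approach, sketched here as an assumption, is to drop one column from every pair whose absolute Pearson correlation exceeds a threshold. The 0.9 threshold, the toy values, and the "AffinityScore" and "TicketsPurchased" columns are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9):
    """Drop the later column of each pair with |correlation| above the threshold."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle so each pair is examined once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Toy data: the second column is a linear function of the first.
toy = pd.DataFrame({
    "AffinityScore": [1, 2, 3, 4, 5],
    "CustomerInstitutionAffinity": [3, 5, 7, 9, 11],
    "TicketsPurchased": [5, 1, 4, 2, 3],
})
reduced, dropped = drop_highly_correlated(toy)
print(dropped)
```

Variance inflation factors are a stricter alternative, since they also catch a column that is a combination of several others rather than a near-copy of one.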


Model Building:


Models Used:

  • XGBoost: Estimators = 100, Learning Rate = 0.1, Max Depth = 6, Eval Metric = log loss

  • Random Forest: Iterations = 150, Learning Rate = 0.05, Max Depth = 8, Min Sample Split = 2, Min Samples Leaf = 1

  • Logistic Regression: Iterations = 800, C = 1.0
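
The following sketch shows one way the three models could be combined, using the hyperparameters listed above. Several points are assumptions rather than facts from the write-up: the ensemble is combined by soft voting (averaging predicted probabilities), "Iterations" is mapped to `n_estimators` for Random Forest and to `max_iter` for Logistic Regression, and the data is synthetic. scikit-learn's RandomForestClassifier has no learning-rate parameter, so the listed value of 0.05 has no counterpart here. If the xgboost package is not installed, scikit-learn's GradientBoostingClassifier is used as a stand-in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier
    boosted = XGBClassifier(n_estimators=100, learning_rate=0.1,
                            max_depth=6, eval_metric="logloss")
except ImportError:
    # Stand-in when xgboost is unavailable; not the model named in the write-up.
    boosted = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                         max_depth=6)

models = [
    boosted,
    RandomForestClassifier(n_estimators=150, max_depth=8, min_samples_split=2,
                           min_samples_leaf=1, random_state=0),
    LogisticRegression(max_iter=800, C=1.0),
]

# Synthetic binary data in place of the 200,000-row customer dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Soft voting (assumed): average the class probabilities of the fitted models.
probas = [m.fit(X_train, y_train).predict_proba(X_val) for m in models]
avg_proba = np.mean(probas, axis=0)
y_pred = avg_proba.argmax(axis=1)  # labels are 0/1, so the index is the label

val_accuracy = float((y_pred == y_val).mean())
print(f"Validation accuracy on synthetic data: {val_accuracy:.3f}")
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones. Stacking or weighted voting would be equally plausible readings of "ensemble".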



Model Performance:

  • Validation Accuracy: 99.1%

  • Public Leaderboard Accuracy: 98.575%

  • Precision: 71.031%

  • Recall: 69.673%
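
Accuracy can sit well above precision and recall when one class is rare, because correct predictions on the majority class dominate the accuracy figure. Whether that explains the numbers above is not stated in the write-up. The toy example below only illustrates the arithmetic with scikit-learn's metric functions, assuming a binary target with 4 positives in 100 samples.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy labels: 4 positives and 96 negatives.
y_true = [1] * 4 + [0] * 96
# Predictions: 3 true positives, 1 false negative, 1 false positive, 95 true negatives.
y_pred = [1, 1, 1, 0] + [1] + [0] * 95

accuracy = accuracy_score(y_true, y_pred)    # (3 + 95) / 100
precision = precision_score(y_true, y_pred)  # 3 / (3 + 1)
recall = recall_score(y_true, y_pred)        # 3 / (3 + 1)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With only two errors out of 100 predictions, accuracy is 98%, while precision and recall are both 75%.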


Recommendations:

  • Loyalty & Rewards Program: Introduce loyalty programs to reward customers for primary market purchases.

  • Data-Driven Event Management: Use predictive insights for better event logistics, venue selection, and staffing.

  • Use External Data: Incorporate economic trends, social media sentiment analysis, etc., to enhance model performance.

  • Feedback Loops: Capture customer feedback post-event to refine models.

  • Enhance Customer Service: Use customer likelihood insights to provide tailored assistance.


Achievements:

  • Predicted consumer activity type with 98.5% accuracy.

  • Secured 3rd place at the university level.

