
Improving Craigslist's Classification

Optimized Craigslist's listing classification system with an algorithm that combines an LSTM for text classification and a Random Forest for image classification, reducing misclassifications by 31%.

Overview:

This project set out to optimize Craigslist's classification system for its computer and computer-parts listings. By developing an algorithm that combines Long Short-Term Memory (LSTM) networks for text classification with a Random Forest for image classification, we reduced misclassifications by 31%.


Challenges Addressed:

  • Frequent misclassification of items, leading to inefficient searches and user frustration.

  • Poor discoverability of accurately listed items, limiting overall market efficiency.


Solution:

  • Text Classification: Implemented a comprehensive NLP pipeline covering tokenization, stopword removal, stemming, lemmatization, and TF-IDF vectorization (a sketch of this pipeline follows the list).

  • Image Classification: Preprocessed images by resizing, normalizing, and converting them to the RGB color space, then applied Convolutional Neural Network (CNN) and Random Forest models to classify them (see the image sketch below).

  • Algorithm: Combined the LSTM and Random Forest models to leverage the strengths of both text and image data for accurate classification (one way to fuse their outputs is sketched below).

  • Performance: Achieved a 19.7% misclassification rate on Craigslist test data, well below the platform's previous rate of ~30-40%.
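
The write-up above doesn't include code, so the following is a minimal sketch of the kind of text pipeline described, assuming NLTK and scikit-learn; the sample listings and variable names are hypothetical.

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer, WordNetLemmatizer
    from sklearn.feature_extraction.text import TfidfVectorizer

    # One-time downloads of the required NLTK resources
    nltk.download("punkt")
    nltk.download("stopwords")
    nltk.download("wordnet")

    stop_words = set(stopwords.words("english"))
    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    def preprocess(text):
        # Tokenize, drop stopwords and non-alphabetic tokens,
        # then stem and lemmatize each remaining token
        tokens = nltk.word_tokenize(text.lower())
        tokens = [t for t in tokens if t.isalpha() and t not in stop_words]
        return " ".join(lemmatizer.lemmatize(stemmer.stem(t)) for t in tokens)

    # Hypothetical listing texts standing in for the Craigslist data
    listings = ["GTX 1080 graphics card, lightly used",
                "Dell XPS 13 laptop for sale, 16GB RAM"]
    corpus = [preprocess(doc) for doc in listings]

    # TF-IDF vectorization over the cleaned corpus
    vectorizer = TfidfVectorizer()
    X_text = vectorizer.fit_transform(corpus)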
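
For the image side, a sketch along these lines covers the stated preprocessing steps (RGB conversion, resizing, normalization) feeding a Random Forest; the image size, estimator count, and synthetic training data are assumptions, not values from the project.

    import numpy as np
    from PIL import Image
    from sklearn.ensemble import RandomForestClassifier

    IMG_SIZE = (64, 64)  # assumed target resolution

    def preprocess_image(path):
        # Convert to RGB, resize, and scale pixel values to [0, 1]
        img = Image.open(path).convert("RGB").resize(IMG_SIZE)
        return np.asarray(img, dtype=np.float32).flatten() / 255.0

    # Synthetic stand-ins for preprocessed listing photos and labels;
    # in practice: X_img = np.stack([preprocess_image(p) for p in paths])
    rng = np.random.default_rng(0)
    X_img = rng.random((200, IMG_SIZE[0] * IMG_SIZE[1] * 3))
    y = rng.integers(0, 5, size=200)  # e.g. 5 hypothetical part categories

    rf = RandomForestClassifier(n_estimators=200, random_state=42)
    rf.fit(X_img, y)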
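
The write-up doesn't say how the two models' outputs were merged, so the sketch below shows one common late-fusion approach: a weighted average of per-class probabilities from the text model (LSTM) and the image model (Random Forest). The weight and the example probabilities are illustrative only.

    import numpy as np

    def combine_predictions(p_text, p_image, w_text=0.5):
        # Late fusion: weighted average of per-class probabilities
        # from the LSTM (text) and Random Forest (image) models
        return w_text * p_text + (1.0 - w_text) * p_image

    # In practice these would come from, e.g., lstm_model.predict(...)
    # and rf.predict_proba(...); here they are illustrative values
    p_text = np.array([[0.7, 0.2, 0.1]])
    p_image = np.array([[0.4, 0.5, 0.1]])

    combined = combine_predictions(p_text, p_image)
    predicted_class = combined.argmax(axis=1)  # -> array([0])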


Impact:

  • User Experience: Enhanced search precision, reducing user frustration and improving the overall experience.

  • Operational Efficiency: Decreased the need for manual oversight, reducing operational costs.

  • Monetization Opportunities: Improved categorization paves the way for new revenue streams through targeted advertising and premium listing features.
