
Improving Craigslist's Classification
Optimized Craigslist's classification system with an algorithm combining an LSTM for text classification and a Random Forest for image classification, reducing misclassifications by 31%.
Overview:
This project aimed to optimize Craigslist's classification system for its computer and computer-parts listings. By developing an algorithm that combines a Long Short-Term Memory (LSTM) network for text classification with a Random Forest for image classification, we reduced misclassifications by 31%.
Challenges Addressed:
Frequent misclassification of items leading to inefficient searches and user frustration.
Improving the discoverability of accurately listed items to bolster market efficiency.
Solution:
Text Classification: Implemented a comprehensive NLP pipeline including tokenization, stopword removal, stemming, lemmatization, and TF-IDF vectorization.
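A minimal sketch of that text pipeline, assuming scikit-learn for TF-IDF. The stopword list and suffix-stripping stemmer here are simplified stand-ins for the full NLTK-style preprocessing, and the example listings are hypothetical:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative stopword list; the real pipeline would use a fuller set.
STOPWORDS = {"a", "an", "the", "for", "with", "and", "or", "is", "in", "to"}

def preprocess(text: str) -> str:
    """Lowercase, tokenize, drop stopwords, and apply a crude
    suffix-stripping stem (stand-in for a Porter stemmer)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    stemmed = [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]
    return " ".join(stemmed)

# Hypothetical listing titles for demonstration.
listings = [
    "Selling a gaming laptop with RTX graphics card",
    "Used DDR4 RAM sticks for desktop computers",
    "Refurbished office desktop, includes keyboard",
]

vectorizer = TfidfVectorizer(preprocessor=preprocess)
X = vectorizer.fit_transform(listings)  # sparse TF-IDF matrix: one row per listing
```

The resulting TF-IDF matrix is what a downstream text classifier (here, the LSTM operating on token sequences, or any baseline model) consumes.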
Image Classification: Applied preprocessing steps such as resizing, normalizing, and converting images to RGB color space, then classified the images with Convolutional Neural Network (CNN) and Random Forest models.
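The preprocessing steps above can be sketched with plain NumPy; this is an illustrative version (nearest-neighbor resize, grayscale-to-RGB replication, [0, 1] scaling), not the project's exact implementation:

```python
import numpy as np

def preprocess_image(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Resize to size x size (nearest-neighbor), ensure 3 RGB channels,
    and scale pixel values to the [0, 1] range."""
    if img.ndim == 2:                        # grayscale -> replicate to RGB
        img = np.stack([img] * 3, axis=-1)
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size       # nearest-neighbor row indices
    cols = np.arange(size) * w // size       # nearest-neighbor column indices
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

# Synthetic grayscale "photo" standing in for a real listing image.
raw = np.random.randint(0, 256, (120, 90), dtype=np.uint8)
x = preprocess_image(raw)                    # shape (64, 64, 3), values in [0, 1]
```

Each preprocessed array can then be fed to the CNN directly, or flattened into a feature vector for the Random Forest.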
Algorithm: Combined LSTM and Random Forest models to leverage the strengths of both text and image data for accurate classification.
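One common way to combine the two models is late fusion: average the class-probability outputs of the text and image classifiers. A sketch with scikit-learn, where the LSTM's softmax outputs are replaced by synthetic probabilities (the features, labels, and 0.6/0.4 weights are illustrative assumptions, not the project's tuned values):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_listings, n_classes = 6, 3

# Stand-in for the LSTM's per-class softmax probabilities over listing text.
lstm_probs = rng.dirichlet(np.ones(n_classes), size=n_listings)

# Random Forest trained on (synthetic) image feature vectors.
img_features = rng.normal(size=(n_listings, 8))
labels = np.array([0, 1, 2, 0, 1, 2])        # toy labels covering all classes
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(img_features, labels)
rf_probs = rf.predict_proba(img_features)

# Late fusion: weighted average of the two models' class probabilities.
fused = 0.6 * lstm_probs + 0.4 * rf_probs
predictions = fused.argmax(axis=1)           # final category per listing
```

In practice the fusion weights would be chosen on a validation set, and other combiners (e.g. a meta-classifier stacked on both outputs) are possible.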
Performance: Achieved a 19.7% misclassification rate on Craigslist test data, well below the platform's previous rate of roughly 30-40%.
Impact:
User Experience: Enhanced search precision, reducing user frustration and improving the overall experience.
Operational Efficiency: Decreased the need for manual oversight, reducing operational costs.
Monetization Opportunities: Improved categorization paves the way for new revenue streams through targeted advertising and premium listing features.