This project predicts the sale price of used cars from features such as mileage, age, model, and condition. The goal is to help sellers and buyers set competitive, fair prices.
Data Collection
The dataset includes car attributes like make, model, year, mileage, transmission, fuel type, accident history, and the target variable (price).
Data Preprocessing
The preprocessing steps involved:
- Handling Missing Data: Missing values in attributes such as mileage and accident history were imputed using averages of the observed values.
- Feature Engineering: Features like car age were derived from the car’s manufacturing year, and one-hot encoding was applied to categorical variables like fuel type and transmission.
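The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the project's actual pipeline: the toy rows, the column names, the reference year of 2024, and the choice to fill the accident flag with its most frequent value (rather than a numeric mean) are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy rows; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "year": [2015, 2018, 2012, 2020],
    "mileage": [60000.0, None, 120000.0, 15000.0],
    "accident_history": [0.0, 1.0, None, 0.0],  # 1.0 = at least one reported accident
    "fuel_type": ["petrol", "diesel", "petrol", "electric"],
    "transmission": ["manual", "automatic", "manual", "automatic"],
    "price": [9500, 14000, 5200, 27000],
})

REFERENCE_YEAR = 2024  # assumed "current" year used to compute car age


def preprocess(frame: pd.DataFrame, reference_year: int) -> pd.DataFrame:
    out = frame.copy()
    # Mean imputation for the numeric mileage column.
    out["mileage"] = out["mileage"].fillna(out["mileage"].mean())
    # Assumption: the binary accident flag is filled with its most frequent value.
    most_common = out["accident_history"].mode().iloc[0]
    out["accident_history"] = out["accident_history"].fillna(most_common)
    # Derive car age from the manufacturing year, then drop the raw year.
    out["car_age"] = reference_year - out["year"]
    out = out.drop(columns=["year"])
    # One-hot encode the categorical columns.
    return pd.get_dummies(out, columns=["fuel_type", "transmission"])


clean = preprocess(df, REFERENCE_YEAR)
print(clean.columns.tolist())
```

In a real workflow the imputation values would be computed on the training split only and reused on the test split, so that no information leaks from held-out data.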
Exploratory Data Analysis
Several insights were derived from the EDA:
- Price vs. Mileage: Price showed a strong negative correlation with mileage; cars with higher mileage were priced lower.
- Price vs. Age: Older cars had lower prices, although luxury brands retained their value better over time.
- Correlation Analysis: Mileage, age, and the number of previous owners were found to be significant predictors of price.
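A correlation check of this kind might look like the sketch below. The data here is synthetic and generated only so the example runs; the column names and the coefficients used to simulate prices are assumptions and do not reproduce the project's dataset or its findings.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): price falls with age and mileage, plus noise.
rng = np.random.default_rng(0)
n = 500
car_age = rng.integers(0, 15, size=n)
mileage = np.clip(car_age * 12000 + rng.normal(0, 8000, size=n), 0, None)
price = 30000 - 900 * car_age - 0.05 * mileage + rng.normal(0, 1500, size=n)

cars = pd.DataFrame({"car_age": car_age, "mileage": mileage, "price": price})

# Pearson correlation of each feature with the target, most negative first.
corr_with_price = cars.corr()["price"].drop("price").sort_values()
print(corr_with_price)
```

Pearson correlation only captures linear association, so a scatter plot of price against each feature is a useful companion to the table.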
Model Development
The following models were developed:
- Linear Regression: Used as the baseline model.
- Random Forest Regressor & XGBoost: Applied to capture non-linear relationships and improve prediction accuracy.
- Hyperparameter Tuning: Tree depth and the number of estimators were optimized using GridSearchCV.
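The modelling steps above could be set up along these lines with scikit-learn. This is a sketch under stated assumptions: the data is synthetic, the grid values and the MAE-based scoring are illustrative choices, and only the random forest is tuned here (XGBoost is omitted to keep the example dependent on a single library).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (assumption): price decays non-linearly with age.
rng = np.random.default_rng(42)
n = 300
car_age = rng.uniform(0, 15, size=n)
mileage = rng.uniform(0, 200_000, size=n)
price = 30000 * np.exp(-0.12 * car_age) - 0.03 * mileage + rng.normal(0, 500, size=n)

X = np.column_stack([car_age, mileage])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

# Baseline: ordinary least squares.
baseline = LinearRegression().fit(X_train, y_train)

# Tune tree depth and number of estimators with a small, illustrative grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("baseline R^2:", baseline.score(X_test, y_test))
print("tuned forest R^2:", search.best_estimator_.score(X_test, y_test))
```

Keeping a held-out test split separate from the cross-validation folds means the reported score is not influenced by the hyperparameter search itself.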
Model Evaluation
- R-squared: 0.89, meaning the model explains about 89% of the variance in price.
- Mean Absolute Error (MAE): $1,200, meaning the model’s predictions differed from the actual price by $1,200 on average.
- Residual Analysis: Residuals showed no discernible pattern, suggesting no systematic bias in the model’s predictions.
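For reference, the two metrics and the residuals can be computed as in the sketch below. The four price pairs are made-up numbers chosen for easy arithmetic; they are not the project's results.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r_squared(y_true, y_pred) -> float:
    """1 - SS_res / SS_tot: the share of variance in y_true explained by y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up example values (assumption), in dollars.
actual = np.array([10000, 15000, 20000, 25000], dtype=float)
predicted = np.array([11000, 14000, 21000, 24000], dtype=float)

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
residuals = actual - predicted  # positive = model under-predicted

print(f"MAE: {mae:.0f}")
print(f"R^2: {r2:.3f}")
print("mean residual:", residuals.mean())
```

A mean residual near zero is necessary but not sufficient for an unbiased model; plotting residuals against predicted price or against individual features is what reveals patterns such as systematic under-pricing of a particular segment.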
Conclusion & Recommendations
Mileage, age, and accident history are the most important factors influencing a car’s price. Buyers and sellers can use this model to estimate fair market prices and adjust pricing strategies accordingly.