Rohit Godke Dsbda Report Sppu

The document describes a mini project on used car price prediction using machine learning algorithms. It discusses developing models to accurately predict used car prices based on their features. Various algorithms like linear regression, ridge regression, lasso regression, decision trees and ensemble methods like random forest, gradient boosting and XGBoost are implemented and compared. The project involves collecting a dataset, preprocessing it, training models and selecting the best performing algorithm based on accuracy. The selected model can then predict prices for new cars based on input features.

Uploaded by

Aniket Bhoknal
Savitribai Phule Pune University


A Report

On
Mini Project
Prepared by
Ghodke Rohit Sanjay
Pawar Baban Navanath
Mane Kunal Dhanajay

709

DEPARTMENT OF COMPUTER TECHNOLOGY

DEPARTMENT OF COMPUTER ENGINEERING


VISHWABHARATI ACADEMY’S COLLEGE OF ENGINEERING & POLYTECHNIC,
Sarola Baddi, Jamkhed Road, Ahmednagar, 414201.
ACADEMIC YEAR 2021-22

DEPARTMENT OF COMPUTER ENGINEERING

CERTIFICATE

This is to certify that Mr. Ghodake Rohit Sanjay of Third Year Computer Engineering has
successfully completed his mini project, Used Car Prediction System, in Data Science and Big Data
Analytics for the academic year 2021-22 at Vishwabharati Academy's College of Engineering,
Ahmednagar, in partial fulfilment of the requirements of Computer Engineering.

Prof. Sapike N. S.                              Prof. Joshi S. G.
Guide                                           Head of Department

Prof. Dhondage V. S.
Principal

Acknowledgement
Achievement is finding out what you have been doing and what you have to do. The higher the
summit, the harder the climb. The goal was fixed, and we began with determined resolve and put in
ceaseless, sustained hard work. The greater the challenge, the greater was our determination, and it
guided us to overcome all difficulties. It has been rightly said that we are built on the shoulders of
others. For everything we have achieved, the credit goes to those who really helped us to complete
this seminar and provided timely guidance and infrastructure. Before we proceed any further, we
would like to thank all those who have helped us along the way. To start with, we thank our guide
for his guidance, care and support, which he offered whenever we needed it the most. We would also
like to take this opportunity to thank our respected Head of Department, Prof. Joshi S. G. We are
also thankful to the Honorable Principal, Prof. Nawale Mam, for encouragement and support.

Content

Sr No.  Title

1.      Introduction
2.      Methodology
3.      Proposed System
4.      Requirements (Hardware, Software)
5.      Implementation
6.      Conclusion

1 Introduction

1.1 Introduction:
Determining whether the listed price of a used car is fair is a challenging task, due
to the many factors that drive a used vehicle's price on the market. The focus
of this project is developing machine learning models that can accurately
predict the price of a used car based on its features, so that buyers can make
informed purchases. We implement and evaluate various learning methods
on a dataset consisting of the sale prices of different makes and models. We
compare the performance of machine learning algorithms such as
Linear Regression, Ridge Regression, Lasso Regression, Elastic Net and
Decision Tree Regressor, and choose the best of them. The price of the car is
determined from its various parameters. Regression algorithms are
used because they output a continuous value rather than a
category, which makes it possible to predict the actual
price of a car rather than a price range.

1.2 Motivation:
Data analysis is an area of rapidly growing diversity. It can be defined in
relation to the need to parse large datasets from multiple sources and to
produce information in real time or near real time.
It requires massive performance and scalability. Common problems are that old
platforms can't scale to big data volumes, load data too slowly, respond to
queries too slowly, lack processing capacity for analytics and can't handle
concurrent mixed workloads. Big data analytics is the application of
advanced analytic techniques to very big datasets.
The Cars Prediction system analyses a collection of used car listings and
their features, whose attributes are segregated using different library
functions, so that a buyer can judge a listing from the data instead of
relying on public reviews.

2 Methodology

There are two primary phases in the system:
1. Training phase: The system is trained using the data in the dataset and fits a model
(line/curve) based on the chosen algorithm.
2. Testing phase: The system is provided with inputs and tested for correct working, and its
accuracy is checked.
The data used to train or test the model therefore has to be appropriate. The system is
designed to detect and predict the price of a used car, and hence appropriate algorithms must
be used for these tasks. Before the algorithms were selected for further use, different
algorithms were compared for accuracy, and the one best suited to the task was chosen.
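
The two phases can be illustrated with a minimal sketch. It is not the project's actual code: the data is synthetic (car age against price, with made-up coefficients and noise), the 80/20 split is an assumed choice, and the helper names `fit_line` and `r2_score` are hypothetical. Accuracy is measured here with the R² score on the held-out test set.

```python
import random

def fit_line(xs, ys):
    """Training phase: least-squares fit of y = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return m, mean_y - m * mean_x

def r2_score(ys, preds):
    """Fraction of the variance in ys explained by preds (1.0 is perfect)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

rng = random.Random(0)

# Synthetic data (assumption): price falls by about 1.1 units per year of age.
ages = [rng.uniform(1, 15) for _ in range(200)]
prices = [20.0 - 1.1 * a + rng.gauss(0, 0.8) for a in ages]

# Shuffle the row indices and hold out 20% of the rows for testing.
idx = list(range(len(ages)))
rng.shuffle(idx)
train, test = idx[:160], idx[160:]

# Training phase: fit on the training rows only.
m, b = fit_line([ages[i] for i in train], [prices[i] for i in train])

# Testing phase: predict on unseen rows and check the accuracy.
test_preds = [m * ages[i] + b for i in test]
score = r2_score([prices[i] for i in test], test_preds)
print(f"slope={m:.3f} intercept={b:.3f} test R^2={score:.3f}")
```

Keeping the test rows out of the fit is what makes the score an estimate of how the model behaves on cars it has not seen.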
1. Linear Regression: Linear Regression was chosen as the first model due to its simplicity
and comparatively small training time. The features, without any feature mapping, were used
directly as the feature vectors. No regularization was used since the results clearly showed
low variance.
2. Random Forest: Random Forest is an ensemble learning based regression model. As the name
suggests, it uses multiple decision trees to generate the ensemble model, which collectively
produces a prediction. The benefit of this model is that the trees are produced in parallel
and are relatively uncorrelated, thus producing good results as each tree is not prone to the
individual errors of other trees. This uncorrelated behavior is partly ensured by the use of
Bootstrap Aggregation, or bagging, which provides the randomness required to produce robust
and uncorrelated trees. This model was hence chosen to account for the large number of
features in the dataset and to compare a bagging technique with the following gradient
boosting methods.
3. Gradient Boost: Gradient Boosting is another decision tree based method that is
generally described as “a method of transforming weak learners into strong learners”. This
means that, like a typical boosting method, observations are assigned different weights and,
based on certain metrics, the weights of difficult-to-predict observations are increased and
then fed into another tree to be trained. In this case the metric is the gradient of the loss
function. This model was chosen to account for non-linear relationships between the features
and predicted price, by splitting the data into 100 regions.
4. XGBoost: Extreme Gradient Boosting, or XGBoost [4], is one of the most popular machine
learning models in current times. XGBoost is quite similar at its core to the original
gradient boosting algorithm but adds many features that significantly improve its
performance, such as built-in support for regularization and parallel processing, as well as
additional hyperparameters to tune, such as tree pruning, subsampling and the number of
decision trees. A maximum depth of 16 was used and the algorithm was run on all cores in
parallel.
5. LightGBM: LightGBM [5] is another gradient boosting based framework which is gaining
popularity due to its higher speed and accuracy compared to XGBoost or the original gradient
boosting method. Similar to
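
The difference between bagging (the idea behind Random Forest) and gradient boosting described in this section can be illustrated with a minimal sketch. It rests on several assumptions: the weak learner is a one-split regression tree (a "stump") on a single feature rather than a full decision tree, the loss is squared error (so the negative gradient is simply the residual), the mileage and price figures are made up, and the function names are hypothetical. It is not the project's code and does not use XGBoost or LightGBM.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree: predict one mean on each side of a threshold."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x is identical: fall back to the overall mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagging(xs, ys, n_trees, rng):
    """Bagging: train each stump on a bootstrap sample, then average the predictions."""
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

def boosting(xs, ys, n_trees, lr=0.5):
    """Gradient boosting for squared loss: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + lr * sum(s(x) for s in stumps)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]  # negative gradient of the loss
        stumps.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: mileage (thousand km) against price, non-linear and decreasing.
mileage = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
price = [9.0, 7.6, 6.5, 5.7, 5.0, 4.5, 4.1, 3.8, 3.6, 3.5]

single = fit_stump(mileage, price)
bagged = bagging(mileage, price, n_trees=25, rng=random.Random(0))
boosted = boosting(mileage, price, n_trees=50)

for name, model in [("single stump", single), ("bagged", bagged), ("boosted", boosted)]:
    print(f"{name}: training MSE = {mse(model, mileage, price):.4f}")
```

In the bagging function the stumps are independent of each other and could be trained in parallel, while in the boosting function each stump depends on the errors left by the previous ones and must be trained in sequence.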

3 Proposed System

As shown in the figure above, the process starts by collecting the dataset. The next step is data
preprocessing, which includes data cleaning, data reduction and data transformation. Then, using
various machine learning algorithms, we predict the price. The algorithms are Linear Regression,
Ridge Regression and Lasso Regression. The model that predicts the most accurate price is selected.
After the best model is selected, the predicted price is displayed to the user according to the
user's inputs. The user can give input to the machine learning model through a website for used car
price prediction.

Linear Regression
Linear regression attempts to model the relationship between two variables by fitting a linear
equation to observed data. One variable is considered to be an explanatory variable, and the other
is considered to be a dependent variable. For example, a modeler might want to relate the weights of
individuals to their heights using a linear regression model.

Fig – 2: Linear Regression

Linear regression is also useful for finding the relationship between multiple continuous variables.
There are multiple independent variables and a single dependent variable:

y = m1X1 + m2X2 + ... + b

where m1, m2, m3, ... are the slopes, b is the y-intercept, X1, X2, X3, ... are the independent
variables and y is the dependent variable.

Ridge Regression
A Ridge regressor is basically a regularized version of a linear regressor. The regularization term
has the parameter 'alpha', which controls the regularization of the model, i.e. helps in reducing
the variance of the estimates.
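
The equation above and the role of 'alpha' can be illustrated with a minimal sketch. It assumes the usual ridge objective (squared error plus alpha times the sum of squared slopes, with the intercept b left unpenalized) and solves it in closed form. The car data and the helper names `solve` and `fit_ridge` are hypothetical, and with alpha = 0 the fit reduces to ordinary linear regression.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_ridge(rows, ys, alpha=0.0):
    """Fit y = m1*X1 + m2*X2 + ... + b; returns (slopes, intercept)."""
    X = [list(r) + [1.0] for r in rows]  # the extra column of ones carries the intercept b
    d = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(d)]
    for i in range(d - 1):  # penalize the slopes only, not the intercept
        xtx[i][i] += alpha
    w = solve(xtx, xty)
    return w[:-1], w[-1]

# Hypothetical cars: (age in years, mileage in thousand km).
rows = [(1, 10), (2, 35), (3, 20), (5, 60), (7, 50), (8, 90), (10, 80)]
# Prices generated from a known linear rule so the fit can be checked.
ys = [10.0 - 0.5 * age - 0.02 * km for age, km in rows]

ols_slopes, ols_b = fit_ridge(rows, ys, alpha=0.0)
ridge_slopes, ridge_b = fit_ridge(rows, ys, alpha=10.0)
print("linear regression:", ols_slopes, ols_b)
print("ridge (alpha=10): ", ridge_slopes, ridge_b)
```

Raising alpha shrinks the slopes towards zero, which trades a little bias for lower variance in the estimates.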

4 Requirements

Hardware requirements

Operating system: Windows 7, 8, 10
Processor: dual core 2.4 GHz (i5 or i7 series Intel processor or equivalent AMD)
RAM: 4 GB

Software Requirements
Python, PyCharm, PIP 2.7
Jupyter Notebook
Google Colab

5 Implementation

To analyze the degree to which our features are linearly related to price, we plotted the Price against
Mileage and Year for a particular Make and Model. There seemed to be a fair degree of linearity for
these two features.
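
Besides plotting, the degree of linearity can be quantified with the Pearson correlation coefficient. The following is a minimal sketch under stated assumptions: the eight rows stand in for one make and model and are made-up values, not taken from the project's dataset, and `pearson` is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient: +1 or -1 means a perfectly linear relationship."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical listings for a single make and model.
year = [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
mileage = [120, 105, 98, 80, 66, 52, 35, 20]          # thousand km
price = [2.1, 2.6, 2.9, 3.6, 4.1, 4.9, 5.6, 6.4]

r_year = pearson(year, price)
r_mileage = pearson(mileage, price)
print(f"corr(Year, Price)    = {r_year:.3f}")
print(f"corr(Mileage, Price) = {r_mileage:.3f}")
```

A coefficient close to +1 or -1 supports using a linear model for that feature, while a value near 0 suggests the relationship is weak or non-linear.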

6 Conclusion
Owing to the increased prices of new cars and customers' financial inability to buy them, used
car sales are increasing globally. Therefore, there is an urgent need for a Used Car Price Prediction
system which effectively determines the worthiness of the car using a variety of features. The
proposed system will help to determine an accurate price for a used car. This paper
compares 3 different machine learning algorithms: Linear Regression, Lasso Regression and
Ridge Regression.
