Summer Training Report on Machine Learning using Python
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted By:
DRISHTI GUPTA (8716113)
MACHINE LEARNING
(USING PYTHON)
Under the
supervision of
Mr. Manoj Dhiman
Mentor
TCIL-IT
Chandigarh
DECLARATION
Date:
Place:
DRISHTI GUPTA
Roll No.:-8716113
Computer Science and Engineering
ACKNOWLEDGEMENT
First and foremost, we wish to express our profound gratitude to Mr. Manoj Dhiman, Chief
Mentor, TCIL-IT, Chandigarh, for giving us the opportunity to carry out our project at
TCIL-IT. We take great pleasure in expressing our unfeigned thanks to our trainer Mr.
Jitender Kumar for his invaluable guidance, support and useful suggestions at every stage
of this project work.
No words can express our deep sense of gratitude to Mr. Jitender, without whom this
project would not have turned out this way. Our heartfelt thanks to him for his immense
help and support, useful discussions and valuable recommendations throughout the course
of our project work.
We wish to thank our respected faculty and our classmates for their support.
Last but not the least, we thank the Almighty for enlightening us with his blessings.
DRISHTI GUPTA
Roll No.:-8716113
Computer Science and Engineering
TCIL-IT is a leading provider of six-month and six-week industrial training in Chandigarh for
IT students. TCIL-IT is the training division of TCIL, a premier engineering organization and a
Government of India Enterprise under the Ministry of Communications and Information Technology,
placed under the administrative control of the Department of Telecommunications, which was
established in the year 1978. Later, in the year 1999, ICS initiated the six-month/six-week
training division with TCIL-IT, which is managed by ICSIL in Chandigarh. This joint venture is a
collaboration of the Delhi State Industrial Infrastructure Development Corporation (DSIIDC), an
undertaking of the Delhi Government, and Telecommunications Consultants India Limited (TCIL) itself.
Software Development
We provide the best and latest IT software training, which helps freshers and corporate
professionals understand current tools well and gives them the knowledge to keep pace with the
latest technologies.
TCIL-IT helps all new instructors get the best exposure to show their talent in the right
way.
At TCIL-IT, workshops are held to deepen understanding, because theoretical knowledge alone is
not always enough. We also provide the best placement services and do our best to support every
trainee.
PREFACE
During the 60 days of summer training we surveyed several languages and then chose to
learn Machine Learning (with Python), because Python is easy to manage, is object oriented
and has good debugging tools available. We then searched for the best institute offering
summer training in Python, found that TCIL-IT is the best company dealing in Python, and
started our 60-day summer training at TCIL-IT. First we learned how to write basic programs
in Python; then we started Machine Learning concepts with Python. Machine Learning is a
field of Artificial Intelligence that uses statistical techniques to give computer systems
the ability to learn from a given dataset without being explicitly programmed. After the
60 days of training we are able to develop applications in Python. During the training we
applied this technology to an automated house loan predictor system.
CHAPTER 2
LITERATURE REVIEW
2.1 Python:-
The diverse application of the Python language is a result of the combination of features
which give this language an edge over others. Some of the benefits of programming in
Python include:
The Python Package Index (PyPI) contains numerous third-party modules that make
Python capable of interacting with most other languages and platforms.
Python provides a large standard library covering areas like internet protocols,
string operations, web services tools and operating system interfaces. Many commonly used
programming tasks have already been scripted into the standard library, which significantly
reduces the amount of code that has to be written.
Python language is developed under an OSI-approved open source license, which makes
it free to use and distribute, including for commercial purposes.
Further, its development is driven by the community which collaborates for its code
through hosting conferences and mailing lists, and provides for its numerous modules.
Python offers excellent readability and uncluttered simple-to-learn syntax which helps
beginners to utilize this programming language. The code style guidelines, PEP 8,
provide a set of rules to facilitate the formatting of code. Additionally, the wide base of
users and active developers has resulted in a rich internet resource bank to encourage
development and the continued adoption of the language.
Python has built-in list and dictionary data structures which can be used to construct fast
runtime data structures. Further, Python also provides the option of dynamic high-level
data typing which reduces the length of support code that is needed.
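For instance, a minimal sketch (with made-up values) of how the built-in list and dictionary
types combine with dynamic typing:
# Built-in containers need no declarations or supporting boilerplate
scores = [78, 92, 85]                              # list: ordered and mutable
student = {"name": "Drishti", "scores": scores}    # dict: key/value mapping

student["average"] = sum(scores) / len(scores)     # values may be of any type
print(student["name"], "average:", student["average"])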
Python has clean object-oriented design, provides enhanced process control capabilities,
and possesses strong integration and text processing capabilities and its own unit testing
framework, all of which contribute to the increase in its speed and productivity. Python
is considered a viable option for building complex multi-protocol network applications.
“Data science” is just about as broad of a term as they come. It may be easiest to describe
what it is by listing its more concrete components:
Included here: Pandas; NumPy; SciPy; a helping hand from Python’s Standard
Library.
2) Data visualization:- A pretty self-explanatory name. Taking data and turning it into
something colorful.
4) Deep learning:- This is a subset of machine learning that is seeing a renaissance, and
is commonly implemented with Keras, among other libraries. It has seen monumental
improvements over the last ~5 years, such as AlexNet in 2012, which was the first
design to incorporate consecutive convolutional layers.
5) Data storage and big data frameworks:- Big data is best defined as data that is
either literally too large to reside on a single machine, or that can't be processed in the
absence of a distributed environment. The Python bindings to Apache technologies
play heavily here.
6) Odds and ends. Includes subtopics such as natural language processing, and image
manipulation with libraries such as OpenCV.
1. Data loading: Load the dataset "prisoners.csv" using Pandas and display the first and last five
rows of the dataset. Then find the number of columns using the describe method in Pandas.
2. Data Manipulation: Create a new column -“total benefitted”, which is the sum of inmates
benefitted through all modes.
3. Data Visualization: Create a bar plot with each state name on the x-axis and their total
benefitted inmates as their bar heights.
Solution:
For data loading, write the code below:
import pandas as pd
import matplotlib.pyplot as plot

# The next line is a Jupyter/IPython magic; omit it outside a notebook
%matplotlib inline

file_name = "prisoners.csv"
prisoners = pd.read_csv(file_name)   # load the dataset into a DataFrame
print(prisoners.head())              # first five rows
print(prisoners.tail())              # last five rows
Now to use the describe method in Pandas, just type the statement below:
prisoners.describe()   # summary statistics of the numeric columns
Next, let us perform the data manipulation step - creating the "total_benefited" column.
prisoners["total_benefited"] = prisoners.sum(axis=1, numeric_only=True)   # row-wise sum of the numeric columns
prisoners.head()
And finally, let us perform some visualization. Refer to the code below:
import numpy as np

xlabels = prisoners['STATE/UT'].values
plot.figure(figsize=(20, 3))
plot.xticks(np.arange(xlabels.shape[0]), xlabels, rotation='vertical', fontsize=18)
plot.bar(np.arange(prisoners.values.shape[0]), prisoners['total_benefited'], align='edge')
OUTPUT: bar chart of the total benefitted inmates for each state/UT.
2.3 MACHINE LEARNING:-
Machine Learning is adept at reviewing large volumes of data and identifying patterns
and trends that might not be apparent to a human. For instance, a machine learning
program may successfully pinpoint a relationship between two events that a human analyst
would overlook. This makes the technology highly effective at data mining, particularly on
a continual, ongoing basis, as would be required for an algorithm.
Machine Learning technology typically improves efficiency and accuracy over time
thanks to the ever-increasing amounts of data that are processed. This gives the
algorithm or program more “experience,” which can, in turn, be used to make better
decisions or predictions.
A great example of this improvement over time involves weather prediction models.
Predictions are made by looking at past weather patterns and events; this data is then
used to determine what’s most likely to occur in a particular scenario. The more data
you have in your data set, the greater the accuracy of a given forecast. The same basic
concept is also true for algorithms that are used to make decisions or
recommendations.
Machine Learning allows for instantaneous adaptation, without the need for human
intervention. An excellent example of this can be found in security and anti-virus
software programs, which leverage machine learning and AI technology to implement
filters and other safeguards in response to new threats.
These systems use data science to identify new threats and trends. Then, the AI
technology is used to implement the appropriate measures for neutralizing or
protecting against that threat. Data Science has eliminated the gap between the time
when a new threat is identified and the time when a response is issued. This near-
immediate response is critical in a niche where bots, viruses, worms, hackers and other
cyber threats can impact thousands or even millions of people in minutes.
4. Automation
On the flip side, you have a computer running the show and that’s something that is
certain to make any developer squirm with discomfort. For now, technology is
imperfect. Still, there are workarounds. For instance, if you’re employing Data Science
technology in order to develop an algorithm, you might program the Data Science
interface so it just suggests improvements or changes that must be implemented by a
human.
This workaround adds a human gatekeeper to the equation, thereby eliminating the
potential for problems that can arise when a computer is in charge. After all, an
algorithm update that looks good on paper may not work effectively when it is put into
practice.
2.4 NumPy
NumPy is the fundamental package for scientific computing with Python. It contains, among
other things, a powerful N-dimensional array object, sophisticated (broadcasting) functions,
tools for integrating C/C++ and Fortran code, and useful linear algebra, Fourier transform,
and random number capabilities.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined. This allows NumPy to
seamlessly and speedily integrate with a wide variety of databases.
NumPy is licensed under the BSD license, enabling reuse with few restrictions. The core
functionality of NumPy is its "ndarray" (n-dimensional array) data structure. These
arrays are strided views on memory. In contrast to Python's built-in list data structure (which,
despite the name, is a dynamic array), these arrays are homogeneously typed: all elements of
a single array must be of the same type. NumPy also has built-in support for memory-mapped
arrays.
1. zeros(shape[, dtype, order]) - Return a new array of the given shape and type, filled with
zeros.
2. array(object[, dtype, copy, order, subok, ndmin]) - Create an array.
3. asarray(a[, dtype, order]) - Convert the input to an array.
4. asanyarray(a[, dtype, order]) - Convert the input to an ndarray, but pass ndarray
subclasses through.
5. arange([start,] stop[, step][, dtype]) - Return evenly spaced values within a given
interval.
6. linspace(start, stop[, num, endpoint, ...]) - Return evenly spaced numbers over a
specified interval.
There are many more functions of this kind, each performing a specified operation on the
given input values.
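As a quick illustration, a minimal sketch (shapes and values chosen arbitrarily) of the
array-creation functions listed above:
import numpy as np

z = np.zeros((2, 3))               # 2x3 array filled with zeros
a = np.array([[1, 2], [3, 4]])     # array built from a nested Python list
b = np.asarray(a)                  # no copy is made, since 'a' is already an ndarray
r = np.arange(0, 10, 2)            # [0 2 4 6 8] - evenly spaced by step
l = np.linspace(0.0, 1.0, num=5)   # [0. 0.25 0.5 0.75 1.] - evenly spaced by count

print(z.shape, a.dtype, r, l)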
2.5 Pandas
The name Pandas is derived from the term "Panel Data", an econometrics term for
multidimensional structured data sets.
In 2008, developer Wes McKinney started developing pandas when he needed a high-performance,
flexible tool for data analysis.
Prior to Pandas, Python was mainly used for data munging and preparation and contributed
very little to data analysis itself. Pandas solved this problem. Using Pandas, we can
accomplish five typical steps in the processing and analysis of data, regardless of the origin
of the data: load, prepare, manipulate, model, and analyze.
Python with Pandas is used in a wide range of academic and commercial domains, including
finance, economics, statistics and analytics. Key features of the library include:
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time Series functionality.
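The sketch below illustrates a few of these features - loading a file, boolean subsetting and
a group-by aggregation. The file name and column names are hypothetical, not part of the actual
project data:
import pandas as pd

# Hypothetical CSV with columns: state, year, inmates_benefited
df = pd.read_csv("inmates.csv")

recent = df[df["year"] >= 2015]                                 # boolean subsetting
by_state = recent.groupby("state")["inmates_benefited"].sum()   # group-by aggregation
print(by_state.sort_values(ascending=False).head())             # top states by total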
2.6 Matplotlib
Matplotlib tries to make easy things easy and hard things possible. You can generate plots,
histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines
of code. For examples, see the sample plots and thumbnail gallery.
For simple plotting the pyplot module provides a MATLAB-like interface, particularly when
combined with IPython. For the power user, you have full control of line styles, font
properties, axes properties, etc, via an object-oriented interface or via a set of functions
familiar to MATLAB users.
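For example, a minimal sketch (with made-up data) of the pyplot interface described above:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y, marker="o", linestyle="--", label="y = x^2")   # line style, markers and legend label
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()   # or plt.savefig("plot.png") to write the figure to a file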
2.7 Scikit-Learn
Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python
programming language. It features various classification, regression and clustering algorithms
including support vector machines, random forests, gradient boosting, k-means and
DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries
NumPy and SciPy.
The scikit-learn project started as scikits.learn, a Google Summer of Code project by David
Cournapeau. Its name stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately-
developed and distributed third-party extension to SciPy. The original codebase was later
rewritten by other developers. In 2010, Fabian Pedregosa, Gael Varoquaux, Alexandre
Gramfort and Vincent Michel, all from INRIA, took leadership of the project and made the
first public release on 1 February 2010. Of the various scikits, scikit-learn as well as
scikit-image were described as "well-maintained and popular" in November 2012.
As of 2018, scikit-learn is under active development.
Scikit-learn is largely written in Python, with some core algorithms written in Cython to
achieve performance. Support vector machines are implemented by a Cython wrapper around
LIBSVM; logistic regression and linear support vector machines by a similar wrapper around
LIBLINEAR. [10]
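As an illustration of the library's estimator interface, here is a minimal sketch using the
bundled iris sample dataset and a default random forest classifier (an arbitrary choice, not the
model used later in this project):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)              # every scikit-learn estimator exposes fit()
predictions = model.predict(X_test)      # ...and predict()
print("accuracy:", accuracy_score(y_test, predictions))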
CHAPTER 3
SYSTEM REQUIREMENT SPECIFICATION
To be used efficiently, all computer software needs certain hardware components or other software
resources to be present on a computer. These prerequisites are known as (computer) system
requirements and are often used as a guideline as opposed to an absolute rule. Most software defines
two sets of system requirements: minimum and recommended. A software requirements specification
establishes the basis for an agreement between customers and contractors or suppliers on how the
software product should function.
Non-functional requirements are constraints on the functions offered by the system, such as time
constraints and constraints on the development process and standards. The non-functional
requirements are as follows:
Speed: The system should process the given input into output within an appropriate time.
Ease of use: The software should be user friendly, so that customers can use it easily
and do not require much training time.
Reliability: The rate of failures should be low; only then is the system reliable.
User Interfaces: The external users are the clients. All the clients can use this software
for choosing and buying health plans.
Hardware Interfaces: The external hardware interface used for searching is the personal
computers of the clients. The PCs may be laptops with wireless LAN, as the internet
connections provided will be wireless.
Software Interfaces: The Operating Systems can be any version of Windows.
Prerequisites are generally not included in the software installation package and need to be
installed separately before the software is installed.
HTML: HTML stands for Hyper Text Markup Language, which is the most widely used language on the
Web to develop web pages. Hypertext refers to the way in which Web pages (HTML documents) are
linked together. HTML is a markup language, which means you use HTML to simply "mark up" a
text document with tags that tell a Web browser how to structure it for display.
CSS: CSS is the acronym for "Cascading Style Sheet". CSS handles the look and feel part of a web
page. Using CSS, you can control the color of the text, the style of fonts, the spacing between
paragraphs, how columns are sized and laid out, what background images or colors are used, layout
designs, variations in display for different devices and screen sizes, as well as a variety of other effects.
Bootstrap: Bootstrap is a popular HTML, CSS and JavaScript framework for developing responsive
and mobile-friendly websites.
1. Processor – 64 bit
2. RAM – 4 GB for development and evaluation.
3. Hard disk – 40 GB for installation.
CHAPTER 5
SYSTEM ANALYSIS
The tool has been designed using Jupyter (a Python integrated development environment), big data
and Hadoop. The user interacts with the tool using a GUI.
The GUI operates in two parts, a backend and a frontend.
The frontend shows the interface: Tkinter frames, windows, canvases, etc.
The backend is used to execute several queries to extract useful insights.
The product handling team module can log into the system in case it has to get feedback.
An eligible voter can log in, register, cast their vote and extract the feedback.
Voter Module:
The frontend shows the interface: Tkinter frames, windows, canvases, etc.
The backend is used to execute several queries to extract useful insights.
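A rough sketch of the kind of Tkinter frontend described above is shown below; the widget names,
labels and the print placeholder are illustrative assumptions, not the project's actual interface:
import tkinter as tk

root = tk.Tk()
root.title("Prediction Tool")

frame = tk.Frame(root, padx=10, pady=10)    # container frame holding the widgets
frame.pack()

tk.Label(frame, text="Enter applicant income:").pack()
income_entry = tk.Entry(frame)
income_entry.pack()

def on_submit():
    # the backend query / prediction call would be invoked here
    print("Income entered:", income_entry.get())

tk.Button(frame, text="Predict", command=on_submit).pack()
root.mainloop()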
CHAPTER 6:
SYSTEM ANALYSIS TRANSFIGURATION
CHAPTER 7:
IDLE requires no extra installation, since it comes bundled with the Python interpreter.
You can see the cursor blinking right after >>>. This is where you will be writing your code.
Also, the current running version of Python is also mentioned at the top.
In IDLE we write code line by line. One line handles one thing. You type whatever you want
in that line and press Enter to execute it. IDLE works much like a terminal or command prompt.
We can also create a Python file containing a complete multi-line program and execute it using
IDLE as well. A Python script has the extension .py.
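For example, a short interactive session and an equivalent script might look like this (the
values and the file name area.py are arbitrary examples):
>>> radius = 3
>>> 3.14159 * radius ** 2
28.27431
The same logic saved in a file, say area.py, can be executed as one program from IDLE
(Run > Run Module):
# area.py - a minimal multi-line script
radius = 3
area = 3.14159 * radius ** 2
print("Area of the circle:", area)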
Re-running a whole file after every small change gets tedious and is not well suited for
beginners. When we open IDLE, a session is created which saves all the lines of code that you
write and execute in that one window, as if they were a single program. This is the reason why
what you wrote earlier may affect what you write later, for example when reusing a variable.
IDLE is pretty neat in its own way: you can choose custom colors for the background and text to
give it your own style, and there is an auto-complete feature which predicts what you are typing
and completes it for you.
CHAPTER 8
Python is a popular programming language. It was created by Guido van Rossum, and released in
1991.
It is used for web development (server-side), software development, mathematics and system
scripting, among other things.
Why Python?
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than some other
programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-oriented way or a functional way.
Good to know
The most recent major version of Python is Python 3, which we shall be using in this tutorial.
However, Python 2, although not being updated with anything other than security updates, is still
quite popular.
In this tutorial Python will be written in a text editor. It is also possible to write Python in
an Integrated Development Environment, such as Thonny, PyCharm, NetBeans or Eclipse, which are
particularly useful when managing larger collections of Python files.
Python was designed for readability, and has some similarities to the English language with
influence from mathematics.
Python uses new lines to complete a command, as opposed to other programming languages
which often use semicolons or parentheses.
Python relies on indentation, using whitespace, to define scope, such as the scope of loops,
functions and classes. Other programming languages often use curly brackets for this purpose.
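A small sketch of this difference - the bodies of the function, the loop and the if/else below
are delimited purely by indentation:
def describe(numbers):
    # everything indented under 'def' belongs to the function
    for n in numbers:
        if n % 2 == 0:           # the if/else blocks are one level deeper
            print(n, "is even")
        else:
            print(n, "is odd")

describe([1, 2, 3])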
PyCharm Installation
1. Go to this link: https://www.jetbrains.com/pycharm/download/ and download the Community
edition.
Launch PyCharm
Mac: Go to the Applications folder and click on the PyCharm icon. Alternatively, you can drag
the icon to your dock to open the IDE quickly by clicking on the icon in the dock.
Windows: If you have followed the default installation process, you can see the PyCharm icon on
your desktop. If you don't find the icon, go to the PyCharm folder -
C:\Program Files (x86)\JetBrains\PyCharm 2017.1\bin (the path may be different on your system) -
and click on the PyCharm.exe file to launch the IDE.
1. Now that we have created a Python project, it's time to create a Python program file to write
and run our first Python program. To create a file, right click on the folder name > New >
Python File (as shown in the screenshot). Give the file name as "HelloWorld" and click OK
(a minimal example of its contents is sketched after these steps).
3. Let's run the code. Right click on the HelloWorld.py file (or the name you have given while
creating the Python file) in the left sidebar and click on 'Run Hello World'.
4. You can see the output of the program at the bottom of the screen.
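The exact contents of HelloWorld.py are only shown in the screenshots, so the following is just
the conventional minimal program assumed for this step:
# HelloWorld.py - prints a greeting to the console
print("Hello World")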
CHAPTER 9:
SCREENSHOTS
Conclusion
The main aim of this project is, first, to protect health insurance companies from fraud and
losses, and, second, to reduce the load of maintaining two websites for the insurance company.
On one single website both the user and the insurance company will be able to perform their own
tasks: the user will be able to choose a plan and buy a policy, and at the same time the
insurance company can update a plan and run predictions. The project is quite beneficial for
health insurance companies as it protects them from fraud and false medical claims.
This project will be implemented as an online website in advanced Python using the Django
framework. The user and the insurance company can access this website very easily with the help
of the internet.
REFERENCES