Assignment 2 - Bayesian Classification

Uploaded by

farmar.marfar

DATA 430 – Assignment 2: Bayesian Classification

Introduction
Bayesian classification predicts the probability that an observation belongs to a given class.
Despite the simplicity of the Naïve Bayes assumption that the explanatory variables are
independent of one another given the class, these classifiers have been found to perform well
with a relatively small amount of training data. Naïve Bayes classifiers can use both continuous
and categorical independent variables. Common applications, among many others, include spam
detection and sentiment analysis.
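
As an illustration of the independence assumption described above, the following is a minimal sketch of a Gaussian Naïve Bayes classifier written from scratch. It is not the required implementation for this assignment. The function names (`fit_gaussian_nb`, `predict_proba`), the toy data, and the small variance floor of `1e-9` are all assumptions made for the example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The 1e-9 floor keeps variances strictly positive (illustrative choice).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = {"prior": n / len(X), "means": means, "vars": variances}
    return model

def predict_proba(model, x):
    """Return P(class | x), treating features as independent given the class."""
    log_scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        # Independence assumption: the joint log-likelihood is a sum over features.
        for value, mean, var in zip(x, params["means"], params["vars"]):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        log_scores[label] = score
    # Subtract the largest log score before exponentiating for numerical stability.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {label: s / total for label, s in exp_scores.items()}

# Hypothetical toy data: two continuous features, two classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 3.8], [2.8, 4.2]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
probs = predict_proba(model, [1.1, 2.1])
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

The output of `predict_proba` is a probability for each class, which is what distinguishes this approach from a classifier that returns only a label.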
1. General steps:
• Review the theoretical background and implementation examples; resources are provided on
the course content page.
• Your dataset will be the same dataset used in Assignment 1: Logistic Regression.
• Perform your analysis and fit a Naïve Bayes model using Python.
• Run the analysis, evaluate the model, and capture the results.
• Document your findings and analysis in a technical report using the template that
accompanies these instructions.

2. Deliverables with critical areas:


Overview: areas to address:
• Problem Domain: give some background and context about the problem domain
(application area). For instance, if you are doing the analysis for predicting heart disease,
provide some context about the disease and include some interesting statistics about it. Also,
discuss how the method is relevant for the chosen problem.
• Objective: clearly state the objective of the analysis in relation to the kind of algorithm
you are employing. Use specific language as to what question(s) you are trying to answer
using the specific analysis/modeling type.
Analysis: areas to address:
• Exploratory Analysis: describe the data, including the source, the collection method, and
the variables. Perform exploratory analysis. Also, select a few key variables (including the
target variable for supervised learning) and study their distributions using plots such as
histograms, box plots, and bar charts.
• Preprocessing: armed with the exploratory analysis, perform the necessary preprocessing,
both general and specific to the modeling type being employed.
• Model Fitting: explain the key steps and activities you perform to fit the model. Experiment
(as appropriate) with parameter tuning. This is key: what separates a highly accurate
model from a less accurate one is often the amount of performance tuning performed.
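
The parameter tuning mentioned under Model Fitting can be sketched as a small search over a smoothing parameter, scored on a hold-out set. The example below is only an illustration under stated assumptions: it uses a from-scratch categorical Naïve Bayes model with additive (Laplace) smoothing, and the helper names, the weather-style toy data, and the grid of `alpha` values are all hypothetical. With a real dataset you would more likely use a library implementation and cross-validation.

```python
import math
from collections import Counter

def fit_categorical_nb(X, y, alpha):
    """Count class and feature-value frequencies; alpha is the smoothing constant."""
    n_features = len(X[0])
    class_counts = Counter(y)
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    levels = [set() for _ in range(n_features)]
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            levels[j].add(v)
    return {"alpha": alpha, "n": len(y), "class_counts": class_counts,
            "value_counts": value_counts, "levels": levels}

def predict(model, row):
    """Return the class with the highest smoothed log-posterior score."""
    alpha, scores = model["alpha"], {}
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])
        for j, v in enumerate(row):
            k = len(model["levels"][j])  # distinct values seen for feature j
            count = model["value_counts"][c][j][v]
            score += math.log((count + alpha) / (n_c + alpha * k))
        scores[c] = score
    return max(scores, key=scores.get)

def accuracy(model, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)

# Hypothetical training and validation data (two categorical features).
train_X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"),
           ("overcast", "hot"), ("overcast", "cool"), ("sunny", "cool"), ("rainy", "hot")]
train_y = ["no", "no", "yes", "yes", "yes", "yes", "no", "yes"]
val_X = [("sunny", "hot"), ("rainy", "cool"), ("overcast", "mild")]
val_y = ["no", "yes", "yes"]

# Try each candidate alpha (all strictly positive) and keep the best validation score.
grid = [0.01, 0.1, 1.0, 10.0]
results = {a: accuracy(fit_categorical_nb(train_X, train_y, a), val_X, val_y) for a in grid}
best_alpha = max(results, key=results.get)
print(results, best_alpha)
```

Reporting the full `results` table, not just the best value, makes it easier to show in the report how sensitive the model is to the tuned parameter.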
Results: areas to address:

• Model Properties: explain the components of the fitted model and their characteristics.
Leverage functions to summarize the model properties. Also, leverage visualization as
required.
• Output Interpretation: explain the result and interpret the final model output using
terms that reflect the application area and in relation to the stated objective. This is where
you check whether or not the stated objective is met.
• Evaluation: employ appropriate metrics to quantitatively evaluate the performance of the
fitted model. For supervised classification, this includes simple accuracy and precision &
recall (or sensitivity & specificity), all of which can be derived from a confusion matrix,
as well as the ROC curve.
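
The relationship between the confusion matrix and the metrics named under Evaluation can be sketched as follows. This is a minimal illustration for a binary problem; the function names and the label vectors are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Derive common metrics from the four confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),
    }

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts)   # (4, 1, 1, 4)
print(metrics)
```

Here the counts are 4 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy, precision, recall, and specificity each work out to 0.8.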
Conclusion: areas to address:
• Summary: highlight the main findings in relation to the stated objective. You don’t need
to discuss the details of the analysis and the model such as accuracy here, just focus on
the key findings.
• Limitations & Improvement areas: discuss the limitations of the analysis and identify
potential improvement areas for future work. This could be related to the data, the algorithm,
or a combination of the two.
Miscellaneous:
• Use the template that accompanies these instructions to submit your responses to each
section, with Python code and any extended model outputs in the Appendix section.
• Proofread your report for correct structure, grammar, and spelling.
• The report should be entirely in your own words, with no direct quotes from any source.
However, keep in mind that any original ideas, information, or interpretation of your
dataset or regarding the general use of any algorithm, method, or model that you may
discover from a source must be cited. Follow appropriate APA format and provide all
necessary references. If you have any questions about this requirement, please ask your
instructor for clarification.
• Graphics, figures, or tables should be titled and explained. For example, screen captures
should be assigned a figure title and label (e.g. Figure 1.xxx) and have a description
associated with that figure providing details and context for the image.
• In addition to any Python code that you include in the Appendix, you should submit
your complete Python script as a separate file when you submit the assignment to LEO.
The total length of the report should be 6-9 pages, single-spaced within the text areas of the
template provided, excluding the appendix and the Python script. Large code snippets and graphs
should be in the appendix.
