Skip to content

jaideep-siva/Visualization_and_Featureselection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

Data Visualization Report from Jupyter Notebook

This report synthesizes key insights and visualizations from a diabetics dataset on data visualization. It aims to succinctly present the analytical and visual findings contained within the attached notebook

Notebook Overview

The notebook encompasses various stages of data analysis, including data importing, cleaning, visualization, feature selection, and model evaluation. Key insights are drawn from the visualizations to understand the data better and inform subsequent modeling decisions.

Data Importing and Cleaning

The notebook begins by importing necessary packages and the dataset, followed by initial data exploration. No missing data values were reported, which simplifies the preprocessing stage. However, outliers and data distribution were carefully analyzed to ensure data quality.

Data Visualization

Data Spread and Distribution

The notebook provides visualizations to understand the data's spread and distribution. It highlights the presence of outliers in features like glucose and blood pressure and notes the class imbalance in the outcomes.

Outlier Visualization

Further visualizations focus on identifying and analyzing outliers across different features. The analysis concludes that most outliers do not significantly impact the output, suggesting their removal might be safe. image

Model Building and Evaluation

Feature Selection

The notebook explores feature selection techniques to identify significant predictors. Glucose and insulin were identified as impactful features, and models built using only these features performed comparably to those using the full feature set.

Select Kbest

image

Principal Component Analysis (PCA)

PCA was employed to reduce dimensionality while retaining the essential variance in the data. The notebook demonstrates that a model with reduced dimensions via PCA can still yield accurate predictions.

image

Advanced Visualization Techniques

UMAP and t-SNE techniques were used for advanced data visualization, providing a deeper understanding of the data's structure.

image

newplot

image

Univariate Feature Selection

Using univariate methods to extract important features image

Conclusion

The notebook concludes with insights on the utility of various analysis and visualization techniques. It notes that while UMAP and t-SNE offer valuable data insights, PCA stands out for its ability to reduce dimensionality effectively without significantly compromising model accuracy.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published