Data Manipulation and Analysis

This part covers data harvesting, processing, aggregation, and analysis in Python, using Jupyter notebooks.

Introduction

Data analysis is crucial to evaluating and designing solutions and applications, as well as to understanding users' information needs and usage. In many cases the data we need is spread across many web pages, stored in a database, or available only as a large text file. Often these data (e.g. web server logs) are too large to obtain or process manually.
We need an automated way of gathering data, parsing it, and summarizing it before more advanced analysis.
Topics include techniques of exploratory data analysis: scripting, text parsing, Structured Query Language (SQL), regular expressions, graphing, and clustering methods for exploring data.
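
As a rough illustration of the gather, parse, and summarize workflow described above, here is a minimal sketch that parses web server log lines with a regular expression and counts requests by status code and path. The sample lines, the `LOG_PATTERN` expression, and the helper names `parse_log` and `summarize` are assumptions made for this example; they are not taken from the notebooks in this repository, and real logs may need a different pattern.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format (not from the repository's data).
SAMPLE_LOG = """\
127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
127.0.0.1 - - [10/Oct/2023:13:55:40 +0000] "GET /missing.html HTTP/1.1" 404 209
192.168.0.7 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326
this line is malformed and will be skipped
"""

# Assumed pattern: host, two ignored fields, timestamp, request line, status, size.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log(text):
    """Return one dict per line that matches LOG_PATTERN; skip the rest."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records


def summarize(records):
    """Aggregate parsed records into simple counts."""
    return {
        "requests": len(records),
        "by_status": Counter(r["status"] for r in records),
        "by_path": Counter(r["path"] for r in records),
    }


records = parse_log(SAMPLE_LOG)
summary = summarize(records)
print(summary["requests"])
print(summary["by_status"].most_common())
print(summary["by_path"].most_common(1))
```

For larger files, the same parsing function could be applied line by line while reading from disk, and the resulting records loaded into a pandas DataFrame for further analysis.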

Some data ready to use
Basic Data Manipulation.ipynb
CNAME
Classification.ipynb
Clustering for handwriting and document.ipynb
Clustering for music preference and Vector Quantization.ipynb
Dimension Reduction for gene expression dataset.ipynb
Dimension_Reduction Implementation.ipynb
Dimensionality Reduction Notes.pdf
Getting_Started.ipynb
Natural Language Processing Introduction.ipynb
Natural Language Processing for Project Gutenberg.ipynb
Pivoting, contingency tables, crosstabs, mosaic plots and chi-squared.ipynb
README.md
Univariate Statistics.ipynb
Visualization, Correlation, and Linear Models1.ipynb
Visualization, Correlation, and Linear Models2-case based.ipynb
_config.yml
pandas operations.ipynb