This project identifies distinct customer segments through a series of analytical steps, with the aim of refining promotional strategies and improving marketing efforts.
Data Loading and Exploration
- Load the Dataset: Import the dataset into the environment.
- Initial Exploration: Perform exploratory data analysis (EDA) to understand the dataset's structure, summary statistics, and potential anomalies.
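A minimal sketch of these two steps follows. It assumes the data arrives as a CSV file loaded with pandas. The file path and the `load_dataset` and `explore` helpers are hypothetical and are not part of the original project.

```python
import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Read the raw customer data (assumes a CSV file; the path is hypothetical)."""
    return pd.read_csv(path)


def explore(df: pd.DataFrame) -> dict:
    """Gather basic EDA facts: structure, missing values, duplicates, summary stats."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Cast counts to plain ints so the dict prints cleanly.
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        "duplicates": int(df.duplicated().sum()),
        # include="all" also summarizes non-numeric columns when present.
        "summary": df.describe(include="all"),
    }


# Usage with a tiny inline frame standing in for the real data:
sample = pd.DataFrame({"Age": [34, 51, None], "Income": [52000, 61000, 58000]})
report = explore(sample)
print(report["shape"])    # (3, 2)
print(report["missing"])  # {'Age': 1, 'Income': 0}
```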
Data Cleaning
- Handle Missing Values: Identify records with missing data and remove them.
- Remove Outliers: Detect and eliminate outliers that could skew analysis results.
- Correct Data Types: Ensure that all data types are appropriate for analysis.
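The cleaning steps could look roughly like the sketch below. The project does not say which outlier method it uses, so the interquartile-range (IQR) rule here is an assumed, swapped-in choice. The `clean` helper and its parameters are hypothetical. Type coercion runs first so that unparseable entries become missing values and are dropped along with the rest.

```python
import pandas as pd


def clean(df: pd.DataFrame, numeric_cols: list[str], k: float = 1.5) -> pd.DataFrame:
    """Coerce numeric types, drop rows with missing values, and remove IQR outliers.

    The IQR rule (values more than k * IQR outside the quartiles) is an assumed
    choice; the project description does not name an outlier method.
    """
    out = df.copy()
    # Correct data types first: unparseable entries become NaN so dropna removes them.
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Handle missing values by removal.
    out = out.dropna()
    # Keep only rows that fall inside the IQR fences for every numeric column.
    keep = pd.Series(True, index=out.index)
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= out[col].between(q1 - k * iqr, q3 + k * iqr)
    return out[keep].reset_index(drop=True)
```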
Feature Engineering
- Create New Features: Derive new features from existing columns using domain knowledge to make the resulting clusters more informative.
- Encode Categorical Variables: Convert categorical variables into numerical form (e.g., with one-hot encoding).
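As an illustration, the sketch below derives one hypothetical feature (total spending across product-category columns) and one-hot encodes categorical columns with `pandas.get_dummies`. The column names are placeholders. The project's actual engineered features and encoding scheme are not specified.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame, spend_cols: list[str], cat_cols: list[str]) -> pd.DataFrame:
    """Add a hypothetical TotalSpend feature and one-hot encode categorical columns."""
    out = df.copy()
    # Hypothetical derived feature: total spending across product categories.
    out["TotalSpend"] = out[spend_cols].sum(axis=1)
    # One-hot encode categoricals; dtype=int yields 0/1 columns instead of booleans.
    return pd.get_dummies(out, columns=cat_cols, dtype=int)
```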
Data Preprocessing
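- Scale and Normalize Features: Standardize features to zero mean and unit variance so that variables measured on larger scales (such as income) do not dominate variance- and distance-based methods like PCA and K-Means.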
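Here is a short sketch using scikit-learn's `StandardScaler`, which rescales each column to zero mean and unit variance. The helper returns the fitted scaler as well. That is a design choice, not something the project specifies, but it makes it possible to map values back to original units later with `inverse_transform`.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features(df: pd.DataFrame) -> tuple[pd.DataFrame, StandardScaler]:
    """Standardize every column of an all-numeric DataFrame."""
    scaler = StandardScaler()
    scaled = scaler.fit_transform(df)
    # Wrap the NumPy result back into a DataFrame to keep column names and index.
    return pd.DataFrame(scaled, columns=df.columns, index=df.index), scaler
```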
Dimensionality Reduction
- Apply Dimensionality Reduction Techniques: Use Principal Component Analysis (PCA) to reduce the number of features while retaining most of the variance in the data.
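The following is a minimal PCA sketch with scikit-learn. If you pass a float between 0 and 1 as `n_components` (with `svd_solver="full"`), PCA keeps the smallest number of components whose cumulative explained variance reaches that fraction. The 95% threshold is an assumed value, not one stated by the project.

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_dimensions(X_scaled: np.ndarray, variance_to_keep: float = 0.95):
    """Project standardized data onto just enough principal components to
    retain `variance_to_keep` of the total variance (threshold is assumed)."""
    pca = PCA(n_components=variance_to_keep, svd_solver="full")
    X_reduced = pca.fit_transform(X_scaled)
    return X_reduced, pca
```

After fitting, `pca.explained_variance_ratio_` reports how much variance each retained component captures. It is worth recording alongside the chosen threshold.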
Determining the Optimal Number of Clusters
- Elbow Method or Silhouette Analysis: Use one or both methods to select the number of clusters.
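One way to run both checks in a single pass is sketched below. For each candidate k, it records the K-Means inertia (for an elbow plot) and the mean silhouette score. The k range, `n_init`, and `random_state` values are assumptions. Silhouette scores are only defined for at least two clusters, so the range starts at 2.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def evaluate_k(X: np.ndarray, k_values=range(2, 11), random_state: int = 42) -> dict:
    """Fit K-Means for each k and record inertia (elbow) and silhouette score."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X)
        results[k] = {
            "inertia": km.inertia_,                     # within-cluster sum of squares
            "silhouette": silhouette_score(X, labels),  # mean silhouette in [-1, 1]
        }
    return results


def best_k_by_silhouette(results: dict) -> int:
    """Return the k with the highest mean silhouette score."""
    return max(results, key=lambda k: results[k]["silhouette"])
```

For the elbow method, the inertia values can be plotted against k with Matplotlib. Look for the point where further increases in k stop reducing inertia substantially.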
Cluster Analysis
- Cluster the Data: Run K-Means with the number of clusters chosen by the Elbow Method or Silhouette Analysis, and assign a cluster label to each data point.
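Below is a sketch of the final clustering step. It assumes `X_reduced` holds the PCA output in the same row order as the cleaned DataFrame. The labels are attached to the unscaled DataFrame so that the later evaluation can be read in original units. The `assign_clusters` helper, the `Cluster` column name, and the parameter values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def assign_clusters(df: pd.DataFrame, X_reduced: np.ndarray, n_clusters: int,
                    random_state: int = 42) -> tuple[pd.DataFrame, KMeans]:
    """Fit K-Means on the reduced features and add a Cluster column to a copy of df.

    Assumes row i of X_reduced corresponds to row i of df.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labeled = df.copy()
    labeled["Cluster"] = km.fit_predict(X_reduced)
    return labeled, km
```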
Cluster Demographic Evaluation
- Analyze Cluster Demographics: Evaluate and interpret the demographic characteristics (e.g., age, gender, income) of each cluster.
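The sketch below summarizes demographics for each cluster. It computes the means of numeric attributes, the cluster sizes, and the within-cluster share of each category via `pandas.crosstab(..., normalize="index")`. Column names such as `Age`, `Income`, and `Gender` are placeholders for whatever the dataset actually contains.

```python
import pandas as pd


def profile_demographics(labeled: pd.DataFrame, numeric_cols: list[str],
                         categorical_cols: list[str], cluster_col: str = "Cluster"):
    """Per-cluster means, sizes, and category shares (column names are placeholders)."""
    grouped = labeled.groupby(cluster_col)
    summary = grouped[numeric_cols].mean()
    summary["Size"] = grouped.size()
    # Share of each category within a cluster; each row sums to 1.
    shares = {
        col: pd.crosstab(labeled[cluster_col], labeled[col], normalize="index")
        for col in categorical_cols
    }
    return summary, shares
```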
Cluster Shopping Habits Evaluation
- Analyze Shopping Habits by Cluster: Examine the shopping behaviors (e.g., purchase frequency, product categories) within each cluster to draw insights about customer segments.
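Shopping habits can be summarized in a similar way. For each cluster, this sketch computes the fraction of total spending that goes to each product category and the average purchase frequency per customer. The spending and frequency column names are hypothetical.

```python
import pandas as pd


def shopping_profile(labeled: pd.DataFrame, spend_cols: list[str], freq_col: str,
                     cluster_col: str = "Cluster"):
    """Per-cluster category spending shares and mean purchase frequency."""
    grouped = labeled.groupby(cluster_col)
    totals = grouped[spend_cols].sum()
    # Fraction of each cluster's total spend that goes to each product category.
    category_share = totals.div(totals.sum(axis=1), axis=0)
    # Average purchase frequency per customer in each cluster.
    avg_freq = grouped[freq_col].mean()
    return category_share, avg_freq
```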
Tools and Libraries
- Python
- Pandas
- NumPy
- scikit-learn
- Matplotlib
- Seaborn