Analytical SQL Documentation
2023
Q1: The Query
Step 1:
Explore the dataset by counting the distinct invoices, products, customers, and countries.
as(
select
(select count(distinct invoiceno) from Online_Retail where invoiceno not like 'C%') as
num_sold_invoice,
from Online_Retail
select
to_char(num_sold_invoice,'fm999G999G999') as num_sold_invoice,
from ex_table
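
The idea behind step 1 can be illustrated with a minimal, self-contained sketch. It runs on SQLite through Python's `sqlite3` module so that it can be executed without a database server; the report's own SQL dialect may differ. The sample rows, the `stockcode` column, and the output names other than `num_sold_invoice` are assumptions made for illustration, not the report's actual query.

```python
import sqlite3

# Hypothetical miniature of Online_Retail; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Online_Retail "
    "(invoiceno text, stockcode text, customerid text, country text)"
)
conn.executemany(
    "insert into Online_Retail values (?, ?, ?, ?)",
    [
        ("536365", "85123A", "17850", "United Kingdom"),
        ("536365", "71053", "17850", "United Kingdom"),  # same invoice, second line
        ("536366", "22633", "13047", "France"),
        ("536367", "85123A", "12583", "France"),
        ("C536379", "71053", "14527", "United Kingdom"),  # cancelled invoice
    ],
)

# count(distinct ...) counts each invoice once even when it spans several rows.
# Invoice numbers starting with 'C' are treated as cancellations.
query = """
select
    count(distinct invoiceno) as num_invoices,
    count(distinct case when invoiceno not like 'C%' then invoiceno end) as num_sold_invoice,
    count(distinct case when invoiceno like 'C%' then invoiceno end) as num_cancelled_invoice,
    round(100.0 * count(distinct case when invoiceno like 'C%' then invoiceno end)
          / count(distinct invoiceno), 2) as cancelled_pct,
    count(distinct stockcode) as num_products,
    count(distinct customerid) as num_customers,
    count(distinct country) as num_countries
from Online_Retail
"""
summary = conn.execute(query).fetchone()
print(summary)
```

Note that `count(distinct invoiceno)` and `distinct count(invoiceno)` are not equivalent: the latter counts every row and then deduplicates the single result, so multi-line invoices would be counted more than once.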
Step 2:
Count the number of invoices for each customer and order the customers by that count, from
highest to lowest.
as (
from Online_Retail
from c_table
from Online_Retail
group by CustomerID
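
A minimal sketch of the per-customer count described in step 2, again on SQLite via Python's `sqlite3`. The sample rows are hypothetical, and counting distinct invoice numbers while excluding cancellations is an assumption; the report's query may count differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Online_Retail (invoiceno text, customerid text)")
conn.executemany(
    "insert into Online_Retail values (?, ?)",
    [
        ("536365", "17850"),
        ("536365", "17850"),  # second line of the same invoice
        ("536366", "17850"),
        ("536367", "13047"),
        ("536368", None),  # invoices recorded without a customer ID
        ("536369", None),
        ("536370", None),
    ],
)

# group by puts all rows with a NULL customerid into a single group,
# which is how a customer "without an ID" can top the ranking.
query = """
select customerid, count(distinct invoiceno) as num_invoices
from Online_Retail
where invoiceno not like 'C%'
group by customerid
order by num_invoices desc
"""
ranking = conn.execute(query).fetchall()
for row in ranking:
    print(row)
```

Keeping the `NULL` group in the output makes the missing-ID problem visible; adding `where customerid is not null` would hide it.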
Step 3:
Rank the countries by number of customers, income, and number of invoices, and select the
most ordered product and the most profitable product in each country.
as (
from Online_Retail
fv as most_ordered_product,
from c_table
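
The sketch below shows one way to combine a country ranking with `first_value` to pick the most ordered and most profitable product per country. It is an illustration on SQLite via Python's `sqlite3`, not the report's query: the `quantity`, `unitprice`, and `country` columns, the CTE names, and the sample rows are assumptions, income is taken as `quantity * unitprice`, and only the income rank is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Online_Retail "
    "(country text, description text, quantity integer, unitprice real)"
)
conn.executemany(
    "insert into Online_Retail values (?, ?, ?, ?)",
    [
        ("United Kingdom", "MUG", 10, 1.0),   # many units, little income
        ("United Kingdom", "LAMP", 2, 20.0),  # few units, most income
        ("France", "CLOCK", 5, 4.0),
        ("France", "MUG", 1, 1.0),
    ],
)

query = """
with product_totals as (
    select country, description,
           sum(quantity) as total_quantity,
           sum(quantity * unitprice) as total_income
    from Online_Retail
    group by country, description
),
per_country as (
    -- every row of a country carries the same window results,
    -- so distinct collapses them to one row per country
    select distinct country,
           sum(total_income) over (partition by country) as country_income,
           first_value(description) over (
               partition by country order by total_quantity desc
           ) as most_ordered_product,
           first_value(description) over (
               partition by country order by total_income desc
           ) as most_profitable_product
    from product_totals
)
select country, country_income,
       rank() over (order by country_income desc) as income_rank,
       most_ordered_product, most_profitable_product
from per_country
order by income_rank
"""
countries = conn.execute(query).fetchall()
for row in countries:
    print(row)
```

The ranking is computed in a separate outer query on purpose: applying `rank()` before the rows are deduplicated would rank product rows rather than countries.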
Step 4:
Find the most profitable month of each year and its percent rank.
from Online_Retail
select *,
round(cast(percent_rank() over (partition by year order by profit desc) * 100 as decimal), 2) as profit_percent_rank
from c_table
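
To make the `percent_rank` expression concrete, here is a small sketch on SQLite via Python's `sqlite3`. The `invoicedate`, `quantity`, and `unitprice` columns, the sample rows, and the definition of profit as `quantity * unitprice` are assumptions; `strftime` is SQLite's way of extracting the year and month and would be written differently in other dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Online_Retail (invoicedate text, quantity integer, unitprice real)"
)
conn.executemany(
    "insert into Online_Retail values (?, ?, ?)",
    [
        ("2010-12-01", 5, 10.0),
        ("2011-01-10", 10, 10.0),
        ("2011-02-05", 30, 10.0),
        ("2011-03-20", 20, 10.0),
    ],
)

query = """
with c_table as (
    select strftime('%Y', invoicedate) as year,
           strftime('%m', invoicedate) as month,
           sum(quantity * unitprice) as profit
    from Online_Retail
    group by year, month
)
select *,
       round(percent_rank() over (partition by year order by profit desc) * 100, 2)
           as profit_percent_rank
from c_table
order by year, profit desc
"""
months = conn.execute(query).fetchall()
for row in months:
    print(row)
```

Within each year, `percent_rank` is `(rank - 1) / (rows - 1)`, so with a descending order the most profitable month gets 0 and the least profitable gets 100. A year with a single month gets 0.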
Step 5:
Get the top 10 returned products.
as(
from Online_Retail
group by description
select *
from (
from t_table) t
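
A sketch of the "aggregate, rank in a subquery, then filter" shape that the fragments of step 5 suggest, run on SQLite via Python's `sqlite3`. It assumes that returns are the rows of cancelled invoices (invoice number starting with `C`) with negative quantities; the columns `quantity` and `unitprice`, the output names, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Online_Retail "
    "(invoiceno text, description text, quantity integer, unitprice real)"
)
conn.executemany(
    "insert into Online_Retail values (?, ?, ?, ?)",
    [
        ("536365", "MUG", 10, 2.0),    # regular sale, ignored by the filter
        ("C536379", "MUG", -5, 2.0),   # returns carry negative quantities
        ("C536380", "MUG", -3, 2.0),
        ("C536381", "LAMP", -2, 20.0),
    ],
)

query = """
with t_table as (
    select description,
           sum(abs(quantity)) as returned_quantity,
           sum(abs(quantity) * unitprice) as loss
    from Online_Retail
    where invoiceno like 'C%'
    group by description
)
select *
from (
    select *, rank() over (order by returned_quantity desc) as return_rank
    from t_table
) t
where return_rank <= 10
order by return_rank
"""
top_returns = conn.execute(query).fetchall()
for row in top_returns:
    print(row)
```

The rank has to be computed in an inner query because a window function cannot be referenced in the `where` clause of the same `select`.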
Q1: The Story
Our business is an online retailer, which means that customers have a plethora of options
for searching for, selecting, and purchasing products, information, and services over the
internet.
The business transactions have been recorded over many years and across many countries. To
get an overview of how the business is doing, I analyzed the dataset using SQL.
1. First, I wanted to explore the ongoing business process by answering the following
questions:
- How many invoices did the business issue? How many of them were actually sold
and how many were cancelled, and what percentage of the total does each represent?
- How many products does the business sell?
- How many customers does the business serve?
- How many countries does the business cover?
To answer these questions, I wrote the query in step 1. I found that cancelled invoices
represent 2% of the total, a small share that may indicate that our customers are
satisfied with the product quality.
2. Next, I wanted to learn more about the business's repeat customers, so I ran the query
in step 2, which orders the customers by their number of invoices. A high number of
invoices may indicate a frequent customer who visits our store regularly. I noticed that
the customer with the most invoices does not have an ID, but I decided to keep it in the
results to inform the business about this issue.
3. I wrote the query in step 3 because I wanted to know which country is the most
profitable for our business, so that we can maintain our presence there, and which
countries are the least profitable, so that we can give them more attention and schedule
more marketing campaigns. I also wanted to see whether the most ordered product is also
the most profitable product. In some cases it was and in others it was not; where it is
not, I recommend that the business try to make the most profitable product the most
ordered one too.
4. Later, I wondered whether there is a trend in sales over the past years, so I ran the
query in step 4 and found that sales may increase in Q2 and Q4.
5. Finally, I took a step further and identified the top 10 returned products, so that the
business may get rid of them, and I calculated how much loss they caused the business by
running the query in step 5.
Note: You can navigate by clicking on the step.
Q2: Customer Segmentation
as (
select customerid,
count(invoiceno) as frequency,
from Online_Retail
group by customerid
select *,
from collection_tabel
--- joining tables, rounding the average of the frequency and monetary scores
,ca.R_S as r_score , ((ca.F_S+ ca.M_S)/2) as fm_score
---customer segmentation
select * ,
THEN 'Champions'
THEN 'Promising'
WHEN (r_score = 1 AND fm_score= 5) or (r_score = 1 AND fm_score= 4)
THEN 'Hibernating'
THEN 'Lost'
END) as cust_segment
from t_table
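
The overall shape of the segmentation (per-customer recency, frequency, and monetary values, then 1 to 5 scores, then a `case` expression) can be sketched as follows on SQLite via Python's `sqlite3`. This is an illustration under several assumptions: the scores are computed with `ntile(5)`, recency is measured in days from the latest invoice date in the table, and the thresholds in the `case` expression are placeholders rather than the report's rules. The sample data, the `scores` CTE, and the columns `invoicedate`, `quantity`, and `unitprice` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Online_Retail "
    "(invoiceno text, customerid text, invoicedate text, quantity integer, unitprice real)"
)
# Hypothetical customers: (id, date of their invoices, number of invoices).
customers = [
    ("c1", "2011-12-09", 5),
    ("c2", "2011-12-01", 4),
    ("c3", "2011-11-01", 3),
    ("c4", "2011-09-01", 2),
    ("c5", "2011-01-01", 1),
]
rows = [
    (f"{cid}-{i}", cid, day, 1, 100.0)
    for cid, day, n in customers
    for i in range(n)
]
conn.executemany("insert into Online_Retail values (?, ?, ?, ?, ?)", rows)

query = """
with collection_table as (
    select customerid,
           julianday((select max(invoicedate) from Online_Retail))
               - julianday(max(invoicedate)) as recency,
           count(invoiceno) as frequency,
           sum(quantity * unitprice) as monetary
    from Online_Retail
    group by customerid
),
scores as (
    -- a smaller recency (more recent purchase) should score higher
    select customerid,
           ntile(5) over (order by recency desc) as r_score,
           ntile(5) over (order by frequency) as f_score,
           ntile(5) over (order by monetary) as m_score
    from collection_table
),
t_table as (
    select customerid, r_score,
           cast(round((f_score + m_score) / 2.0) as integer) as fm_score
    from scores
)
select customerid, r_score, fm_score,
       (case
            -- placeholder thresholds, for illustration only
            when r_score >= 4 and fm_score >= 4 then 'Champions'
            when r_score = 3 then 'Promising'
            when r_score = 2 then 'Hibernating'
            else 'Lost'
        end) as cust_segment
from t_table
order by customerid
"""
segments = conn.execute(query).fetchall()
for row in segments:
    print(row)
```

Dividing by `2.0` before rounding avoids integer division, which would otherwise truncate the average of the frequency and monetary scores.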