Introduction
Danny seriously loves Japanese food so at the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favorite foods: sushi, curry, and ramen.
Danny's Diner needs your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from its operation but has no idea how to use the data to help it run the business.
Problem Statement
Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they've spent, and which menu items are their favorite.
He plans on using these insights to understand his customers better and help him deliver a better and more personalized experience for his loyal customers.
Dataset Overview
Three different datasets have been provided for this case study:
sales
menu
members
Sales
The sales table captures all customer_id
level purchases with a corresponding order_date
and product_id
information for when and what menu items were ordered.
Menu
The menu table associates the product_id
to the actual product_name
and price
of each menu item.
Members
The members table captures the join_date
when a customer_id
joins the Danny's Diner loyalty program.
Case Study Questions
Q1
What is the total amount each customer spent at the restaurant?
The goal is to find the total amount each customer has spent at the restaurant. To get this, we have to find the products bought by the customer from the sales table and link each product to their price on the menu table by joining using the product_id on each table, then getting the total amount by summing the price of all the products bought.
SELECT
customer_id,
SUM(price) amount_spent
FROM
sales s
LEFT JOIN
menu m
ON
s.product_id = m.product_id
GROUP BY
customer_id;
The above query would return two columns, the customer_id
and amount_spent
. A preview of the result can be seen below
The result shows that Customer A
has spent the most amount of money and Customer C
has spent the least amount of money so far.
Q2
How many days has each customer visited the restaurant?
The given task is to find the number of days each customer has visited the restaurant but since there's no data that explicitly shows when a customer visits the restaurant, I did a distinct count of the order_date
column in the sales table as each customer would have to visit the restaurant to place an order.
SELECT
customer_id,
COUNT(DISTINCT order_date) days_visited
FROM
sales
GROUP BY
customer_id;
The result of the above query would contain 2 columns:customer_id
which is the unique identifier for each customerdays_visited
which is the number of days each customer has visited the restaurant
From the above result, it can be noted that Customer B
has visited the restaurant the most while Customer C
has visited the restaurant the least.
Although Customer B
has visited the restaurant more times than Customer A
, it is interesting to see that Customer A
has still spent more money than Customer B
.
Q3
What was the first item from the menu purchased by each customer?
The objective is to find the first item purchased by each customer from the menu. To do that, I would be breaking it down into steps.
The first step is to use a
cte
called "cte" to select all the relevant columns from their respective tables which in this case would be the menu and sales table.The second step is to use another
cte
called "cte2" to select thecustomer_id
andproduct_name
then rank them using therank()
window function according to theorder_date
for each customer.The final step is to select the
customer_id
andproduct_name
from cte2 with the condition that the rank is 1 before grouping by thecustomer_id
andproduct_name
and ordering bycustomer_id
. The final query should look like the one below.
WITH cte AS
(
SELECT
product_name,
s.product_id,
customer_id,
order_date
FROM
sales s
LEFT JOIN
menu m
ON
s.product_id = m.product_id
),
cte2 AS
(
SELECT
customer_id,
product_name,
RANK() OVER(PARTITION BY customer_id ORDER BY order_date) AS rank_
FROM
cte
)
SELECT
customer_id,
product_name
FROM
cte2
WHERE
rank_ = 1
GROUP BY
customer_id,
product_name
ORDER BY
customer_id;
Like the objective of this question, the above query returns the customer_id
and the name of the first product purchased from the menu.
The above result shows that although other customers purchased a single product the first time, Customer A
purchased two products the first time.
Q4
What is the most purchased item on the menu and how many times was it purchased by all customers?
To find the most purchased item by all customers and how many times it has been purchased, I would:
Select the
product_name
and count ofproduct_id
then rank the products using thedense_rank()
window function and thecount(product_id)
as the condition before storing it in aCTE
called items.Select the
product_name
andtimes_bought
column from the items CTE then filter the results with therank_
column. The final query should look like the one below.
WITH items AS
(
SELECT
product_name,
COUNT(s.product_id) times_bought,
DENSE_RANK() OVER(ORDER BY count(s.product_id) DESC) AS rank_
FROM
sales s
LEFT JOIN
menu m
ON
s.product_id = m.product_id
GROUP BY
product_name
)
SELECT
product_name,
concat(times_bought, ' times') AS times_bought
FROM
items
WHERE
rank_ =1;
This query returns two columns product_name
and times_bought
but just one row of results.
The result above shows that the product Ramen
has the highest number of purchases with 8
purchases from all customers.
Q5
Which item was the most popular for each customer?
To get the most popular item for each customer, I would be using the most bought item by each customer as this is the best way to gauge what their most popular item is.
Firstly, I would be creating a CTE called item_sales to select the product for each customer, how many times each product has been bought by using the
count
function, then using therank()
window function to rank the customer purchases, from the sales and menu table.After the CTE has been created, I would select the
customer_id
andproduct_name
then filter the results using therank_
column before ordering the results using thecustomer_id
column.
WITH items_sales AS
(
SELECT
customer_id,
product_name,
COUNT(s.product_id) times_bought,
RANK() OVER(PARTITION BY customer_id ORDER BY COUNT(s.product_id) DESC) rank_
FROM
sales s
LEFT JOIN
menu m
ON
s.product_id = m.product_id
GROUP BY
customer_id,
product_name
)
SELECT
customer_id,
product_name as popular_item
FROM
items_sales
WHERE
rank_ = 1
ORDER BY
customer_id;
The result of the above query would contain two columns namely:
customer_id
: this is the unique identifier for each customer.popular_item
: this is the name of the popular item.
The preview of the result above shows that Customer A
and Customer C
both enjoy ramen while Customer B
enjoys all the items on the menu.
Q6
Which item was purchased first by the customer after they became a member?
The approach for this is pretty easy but might get complex as I would be joining three tables based on different conditions to get the desired output.
Join the
menu
table and thesales
table on theproduct_id
column on both tables.Perform a right join between the menu and members table on the
customer_id
column including a condition thatjoin_date
on the members table is greater than or equal to theorder_date
on the sales table.
~~ Using the right join is to exclude the record of sales from non-members and members before they became members.
~~ Adding the condition "<sup>b.join_date >= s.order_date</sup>
" when joining, is so the output only includes purchases from customers after they became members.Using the combination of
ROW_NUMBER()
andPARTITION BY
window function to divide and order the purchases of the customers based on the number of days they've been members for "DATEDIFF(DAY, order_date, join_date)
"In the outer query, I would select the
customer_id
andproduct_id
from the CTE and filter the results using the "purchase_no = 1
" column to show the first purchases.
WITH cte AS
(
SELECT
b.customer_id,
m.product_name,
ROW_NUMBER() OVER(PARTITION BY b.customer_id ORDER BY DATEDIFF(DAY, order_date, join_date)) AS purchase_no
FROM
menu m
LEFT JOIN
sales s
ON
m.product_id = s.product_id
RIGHT JOIN
members b
ON
b.customer_id = s.customer_id
AND
b.join_date >= s.order_date
)
SELECT
customer_id,
product_name
FROM
cte
WHERE
purchase_no = 1;
The result of the above query would contain two columns namely:
customer_id
: this is the unique identifier for each customer.product_name
: this is the name of the item first purchased after becoming a member.
The first item Customer A
and Customer B
purchased after becoming members are curry and sushi respectively.
Q7
Which item was purchased just before the customer became a member?
The objective of this question is very similar to the previous question, so I would be following the same approach.
I would create a
CTE
to select the relevant columnsI would join the
menu
table and thesales
table using theproduct_id
column on both tables.I would perform a right join between the menu and members table using the
customer_id
column including a condition thatjoin_date
on the members table is less than theorder_date
on the sales table.
~~ Using the right join is to exclude the record of sales from non-members and customers after they became members.
~~ Adding the condition "<sup>b.join_date < s.order_date</sup>
" when joining, is so the output only includes purchases from customers before they became members.Using the combination of
ROW_NUMBER()
andPARTITION BY
window function to divide and order the purchases of the customers based on the number of days they've been members for "DATEDIFF(DAY, order_date, join_date) DESC
"In the outer query, I would select the
customer_id
andproduct_id
from the CTE and filter the results using the "purchase_pos = 1
" column to show the last purchase before membership.
WITH cte AS
(
SELECT
b.customer_id,
m.product_name,
DATEDIFF(DAY, order_date, join_date) ad,
ROW_NUMBER() OVER(PARTITION BY b.customer_id ORDER BY DATEDIFF(DAY, order_date, join_date) DESC) AS purchase_pos
FROM
menu m
LEFT JOIN
sales s
ON
m.product_id = s.product_id
RIGHT JOIN
members b
ON
b.customer_id = s.customer_id
AND
b.join_date < s.order_date
)
SELECT
customer_id,
product_name
FROM
cte
WHERE
purchase_pos = 1;
The result of the above query would contain two columns namely:
customer_id
: this is the unique identifier for each customer.product_name
: this is the name of the item purchased just before becoming a member.
The last item Customer A
and Customer B
purchased just before becoming members are ramen and sushi respectively.
Q8
What is the total items and amount spent for each member before they became a member?
I am asked to find the total items each member bought and the amount spent before becoming a member.
I would be utilizing a CTE to select the relevant columns from their specific table but these tables have to be joined first.
I would join the
menu
table and thesales
table using theproduct_id
column on both tables then I would perform a right join between the menu and members table using thecustomer_id
column including a condition thatjoin_date
on the members table is less than theorder_date
on the sales table.
~~ Using the right join is to exclude the record of sales from non-members and customers after they became members.
~~ Adding the condition "<sup>b.join_date < s.order_date</sup>
" when joining, is so the output only includes purchases from customers before they became members.Moving to the outer query, I would select the
customer_id
, the total items bought using theCOUNT()
function on theproduct_id
column, the total amount spent using theSUM()
function on theprice
column, from the CTE before grouping by the customer_id column.
WITH cte AS
(
SELECT
b.customer_id,
s.product_id,
m.price
FROM
menu m
LEFT JOIN
sales s
ON
m.product_id = s.product_id
RIGHT JOIN
members b
ON
b.customer_id = s.customer_id
AND
b.join_date < s.order_date
)
SELECT
customer_id,
COUNT(product_id) item_count,
SUM(price) total_spent
FROM
cte
GROUP BY
customer_id;
The result of the above query would contain three columns namely:
customer_id
: this is the unique identifier for each customer.item_count
: this is the number of items purchased by the member before membership.total_spent
: this is the total amount (in $) each member spent before becoming a member.
Customer A
has purchased 3 items and has spent a total of $36 while Customer B
has also purchased 3 items but has spent a total of $34.
Q9
If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
To solve this, I would:
Create a CTE to select the
product_id
column, then use a CASE WHEN statement to convert each dollar to points based on the given conditions, from the menu table.In the outer query, I would:
join the CTE to the sales table using the
product_id
column on both tables.select the
customer_id
column, then use theSUM()
function to get the total points.apply the
customer_id
column in the group by statement.
Following the steps stated above, the final query should look like the one below.
WITH points_cte AS
(
SELECT
product_id,
CASE
WHEN product_name = 'sushi' THEN 20 * price
ELSE 10 * price
END AS points
FROM
menu
)
SELECT
customer_id,
SUM(points) as total_points
FROM
sales s
LEFT JOIN
points_cte c
ON
s.product_id = c.product_id
GROUP BY
customer_id;
The result of the above query would contain three columns namely:
customer_id
: this is the unique identifier for each customer.total_points
: this is the total number of points each customer has.
Customer A
has a total of 860 points while Customer B
has the highest number of points with 940 points which shows that Customer B has probably purchased Sushi more than the remaining customers, then Customer C
has a total of 360 points.
Q10
In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
Similar to the last question, I am to assign points to customers based on specific conditions.
I would be utilizing a CTE to select the relevant columns which include
m_duration
column, gotten by using the DATEDIFF() function to find the difference between thejoin_date
and theorder_date
columns, from their specific table but these tables have to be joined first.I would join the
menu
table and thesales
table using theproduct_id
column on both tables then I would perform a right join between the menu and members table using thecustomer_id
column including a condition thatjoin_date
on the members table is less than theorder_date
on the sales table.
~~ Using the right join is to exclude the record of sales from non-members and customers before they became members.
~~ Adding the condition "<sup>b.join_date < s.order_date</sup>
" when joining, is so the output only includes purchases from customers after they became members.I would create another CTE to select the
customer_id
column, use a CASE WHEN statement to convert the price to points based on the given conditions, from the previous CTE then add a condition using the WHERE statement to only show orders in January.In the outer query, I would select the
customer_id
column and use the SUM() function to get the total points, from the second CTE before grouping by thecustomer_id
column.
WITH cte AS
(
SELECT
b.customer_id,
s.product_id,
order_date,
join_date,
DATEDIFF(DAY, join_date, order_date) AS m_duration,
price
FROM
menu m
LEFT JOIN
sales s
ON
m.product_id = s.product_id
RIGHT JOIN
members b
ON
b.customer_id = s.customer_id
AND
b.join_date < s.order_date
),
cte2 AS
(
SELECT
customer_id,
CASE
WHEN m_duration <= 7 THEN price * 20
ELSE price * 10
END AS points
FROM
cte
WHERE
DATEPART(MONTH, order_date) = 1
)
SELECT
customer_id,
SUM(points) as january_points
FROM
cte2
GROUP BY
customer_id
The result of the above query would contain three columns namely:
customer_id
: this is the unique identifier for each customer.january_points
: this is the total number of points each customer has at the end of January.
The result above shows that at the end of January, Customer A
had 720 points while Customer B
had a total of 440 points.