🍽 Case Study #1 - Danny's Diner

🍽 Case Study #1 - Danny's Diner


Introduction

Danny seriously loves Japanese food so at the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favorite foods: sushi, curry, and ramen.

Danny's Diner needs your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from its operation but has no idea how to use the data to help it run the business.


Problem Statement

Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they've spent, and which menu items are their favorite.

He plans on using these insights to understand his customers better and help him deliver a better and more personalized experience for his loyal customers.


Dataset Overview

Three different datasets have been provided for this case study:

  • sales

  • menu

  • members

Sales
The sales table captures all customer_id level purchases with a corresponding order_date and product_id information for when and what menu items were ordered.

Menu
The menu table associates the product_id to the actual product_name and price of each menu item.

Members
The members table captures the join_date when a customer_id joins the Danny's Diner loyalty program.


Case Study Questions


Q1

What is the total amount each customer spent at the restaurant?
The goal is to find the total amount each customer has spent at the restaurant. To get this, we have to find the products bought by the customer from the sales table and link each product to their price on the menu table by joining using the product_id on each table, then getting the total amount by summing the price of all the products bought.

SELECT
  customer_id,
  SUM(price) amount_spent
FROM
  sales s
LEFT JOIN
  menu m
ON
  s.product_id = m.product_id
GROUP BY
  customer_id;

The above query would return two columns, the customer_id and amount_spent. A preview of the result can be seen below

The result shows that Customer A has spent the most amount of money and Customer C has spent the least amount of money so far.


Q2

How many days has each customer visited the restaurant?
The given task is to find the number of days each customer has visited the restaurant but since there's no data that explicitly shows when a customer visits the restaurant, I did a distinct count of the order_date column in the sales table as each customer would have to visit the restaurant to place an order.

SELECT
  customer_id,
  COUNT(DISTINCT order_date) days_visited
FROM
  sales
GROUP BY
  customer_id;

The result of the above query would contain 2 columns:
customer_id which is the unique identifier for each customer
days_visited which is the number of days each customer has visited the restaurant

From the above result, it can be noted that Customer B has visited the restaurant the most while Customer C has visited the restaurant the least.
Although Customer B has visited the restaurant more times than Customer A, it is interesting to see that Customer A has still spent more money than Customer B.


Q3

What was the first item from the menu purchased by each customer?
The objective is to find the first item purchased by each customer from the menu. To do that, I would be breaking it down into steps.

  • The first step is to use a cte called "cte" to select all the relevant columns from their respective tables which in this case would be the menu and sales table.

  • The second step is to use another cte called "cte2" to select the customer_id and product_name then rank them using the rank() window function according to the order_date for each customer.

  • The final step is to select the customer_id and product_name from cte2 with the condition that the rank is 1 before grouping by the customer_id and product_name and ordering by customer_id. The final query should look like the one below.

WITH cte AS
(
  SELECT
    product_name,
    s.product_id,
    customer_id,
    order_date
  FROM
    sales s
  LEFT JOIN
    menu m
  ON
    s.product_id = m.product_id
),

cte2 AS
(
  SELECT
    customer_id,
    product_name,
    RANK() OVER(PARTITION BY customer_id ORDER BY order_date) AS rank_
  FROM 
    cte
)

SELECT
  customer_id,
  product_name
FROM 
  cte2
WHERE
  rank_ = 1
GROUP BY
  customer_id,
  product_name
ORDER BY
  customer_id;

Like the objective of this question, the above query returns the customer_id and the name of the first product purchased from the menu.

The above result shows that although other customers purchased a single product the first time, Customer A purchased two products the first time.


Q4

What is the most purchased item on the menu and how many times was it purchased by all customers?
To find the most purchased item by all customers and how many times it has been purchased, I would:

  • Select the product_name and count of product_id then rank the products using the dense_rank() window function and the count(product_id) as the condition before storing it in a CTE called items.

  • Select the product_name and times_bought column from the items CTE then filter the results with the rank_ column. The final query should look like the one below.

WITH items AS
(
  SELECT
    product_name,
    COUNT(s.product_id) times_bought,
    DENSE_RANK() OVER(ORDER BY count(s.product_id) DESC) AS rank_
  FROM
    sales s
  LEFT JOIN
    menu m 
  ON
    s.product_id = m.product_id
  GROUP BY
    product_name
)
SELECT
  product_name,
  concat(times_bought, ' times') AS times_bought
FROM
  items
WHERE
  rank_ =1;

This query returns two columns product_name and times_bought but just one row of results.

The result above shows that the product Ramen has the highest number of purchases with 8 purchases from all customers.


Q5

Which item was the most popular for each customer?
To get the most popular item for each customer, I would be using the most bought item by each customer as this is the best way to gauge what their most popular item is.

  • Firstly, I would be creating a CTE called item_sales to select the product for each customer, how many times each product has been bought by using the count function, then using the rank() window function to rank the customer purchases, from the sales and menu table.

  • After the CTE has been created, I would select the customer_id and product_name then filter the results using the rank_ column before ordering the results using the customer_id column.

WITH items_sales AS
(
  SELECT
    customer_id,
    product_name,
    COUNT(s.product_id) times_bought,
    RANK() OVER(PARTITION BY customer_id ORDER BY COUNT(s.product_id) DESC) rank_
  FROM
    sales s
  LEFT JOIN
    menu m 
  ON
    s.product_id = m.product_id
  GROUP BY
    customer_id,
    product_name
)

SELECT
  customer_id,
  product_name as popular_item
FROM
  items_sales
WHERE
  rank_ = 1
ORDER BY
  customer_id;

The result of the above query would contain two columns namely:

  • customer_id: this is the unique identifier for each customer.

  • popular_item: this is the name of the popular item.

The preview of the result above shows that Customer A and Customer C both enjoy ramen while Customer B enjoys all the items on the menu.


Q6

Which item was purchased first by the customer after they became a member?
The approach for this is pretty easy but might get complex as I would be joining three tables based on different conditions to get the desired output.

  • Join the menu table and the sales table on the product_id column on both tables.

  • Perform a right join between the menu and members table on the customer_id column including a condition that join_date on the members table is greater than or equal to the order_date on the sales table.
    ~~ Using the right join is to exclude the record of sales from non-members and members before they became members.
    ~~ Adding the condition "
    <sup>b.join_date &gt;= s.order_date</sup>" when joining, is so the output only includes purchases from customers after they became members.

  • Using the combination of ROW_NUMBER() and PARTITION BY window function to divide and order the purchases of the customers based on the number of days they've been members for "DATEDIFF(DAY, order_date, join_date)"

  • In the outer query, I would select the customer_id and product_id from the CTE and filter the results using the "purchase_no = 1" column to show the first purchases.

WITH cte AS
  (
    SELECT
      b.customer_id,
      m.product_name,
      ROW_NUMBER() OVER(PARTITION BY b.customer_id ORDER BY DATEDIFF(DAY, order_date, join_date)) AS purchase_no
    FROM
      menu m
    LEFT JOIN
      sales s
    ON
      m.product_id = s.product_id
    RIGHT JOIN
      members b
    ON
      b.customer_id = s.customer_id
    AND
      b.join_date >= s.order_date
  )

SELECT
  customer_id,
  product_name
FROM
  cte
WHERE
 purchase_no = 1;

The result of the above query would contain two columns namely:

  • customer_id: this is the unique identifier for each customer.

  • product_name: this is the name of the item first purchased after becoming a member.

The first item Customer A and Customer B purchased after becoming members are curry and sushi respectively.


Q7

Which item was purchased just before the customer became a member?
The objective of this question is very similar to the previous question, so I would be following the same approach.

  • I would create a CTE to select the relevant columns

  • I would join the menu table and the sales table using the product_id column on both tables.

  • I would perform a right join between the menu and members table using the customer_id column including a condition that join_date on the members table is less than the order_date on the sales table.
    ~~ Using the right join is to exclude the record of sales from non-members and customers after they became members.
    ~~ Adding the condition "
    <sup>b.join_date &lt; s.order_date</sup>" when joining, is so the output only includes purchases from customers before they became members.

  • Using the combination of ROW_NUMBER() and PARTITION BY window function to divide and order the purchases of the customers based on the number of days they've been members for "DATEDIFF(DAY, order_date, join_date) DESC"

  • In the outer query, I would select the customer_id and product_id from the CTE and filter the results using the "purchase_pos = 1" column to show the last purchase before membership.

WITH cte AS
  (
    SELECT
      b.customer_id,
      m.product_name,
      DATEDIFF(DAY, order_date, join_date) ad,
      ROW_NUMBER() OVER(PARTITION BY b.customer_id ORDER BY DATEDIFF(DAY, order_date, join_date) DESC) AS purchase_pos
    FROM
      menu m
    LEFT JOIN
      sales s
    ON
      m.product_id = s.product_id
    RIGHT JOIN
      members b
    ON
      b.customer_id = s.customer_id
    AND
      b.join_date < s.order_date
  )

SELECT
  customer_id,
  product_name
FROM
  cte
WHERE
  purchase_pos = 1;

The result of the above query would contain two columns namely:

  • customer_id: this is the unique identifier for each customer.

  • product_name: this is the name of the item purchased just before becoming a member.

The last item Customer A and Customer B purchased just before becoming members are ramen and sushi respectively.


Q8

What is the total items and amount spent for each member before they became a member?
I am asked to find the total items each member bought and the amount spent before becoming a member.

  • I would be utilizing a CTE to select the relevant columns from their specific table but these tables have to be joined first.

  • I would join the menu table and the sales table using the product_id column on both tables then I would perform a right join between the menu and members table using the customer_id column including a condition that join_date on the members table is less than the order_date on the sales table.
    ~~ Using the right join is to exclude the record of sales from non-members and customers after they became members.
    ~~ Adding the condition "
    <sup>b.join_date &lt; s.order_date</sup>" when joining, is so the output only includes purchases from customers before they became members.

  • Moving to the outer query, I would select the customer_id, the total items bought using the COUNT() function on the product_id column, the total amount spent using the SUM() function on the price column, from the CTE before grouping by the customer_id column.

WITH cte AS
(
  SELECT
    b.customer_id,
    s.product_id,
    m.price
  FROM
    menu m
  LEFT JOIN
    sales s
  ON
    m.product_id = s.product_id
  RIGHT JOIN
    members b
  ON
    b.customer_id = s.customer_id
  AND
    b.join_date < s.order_date
)

SELECT
  customer_id,
  COUNT(product_id) item_count,
  SUM(price) total_spent
FROM
  cte
GROUP BY
  customer_id;

The result of the above query would contain three columns namely:

  • customer_id: this is the unique identifier for each customer.

  • item_count: this is the number of items purchased by the member before membership.

  • total_spent: this is the total amount (in $) each member spent before becoming a member.

Customer A has purchased 3 items and has spent a total of $36 while Customer B has also purchased 3 items but has spent a total of $34.


Q9

If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
To solve this, I would:

  • Create a CTE to select the product_id column, then use a CASE WHEN statement to convert each dollar to points based on the given conditions, from the menu table.

  • In the outer query, I would:

    • join the CTE to the sales table using the product_id column on both tables.

    • select the customer_id column, then use the SUM() function to get the total points.

    • apply the customer_id column in the group by statement.

Following the steps stated above, the final query should look like the one below.

WITH points_cte AS
(
  SELECT
    product_id,
    CASE 
      WHEN product_name = 'sushi' THEN 20 * price
      ELSE 10 * price
    END AS points
  FROM
    menu 
)

SELECT
  customer_id,
  SUM(points) as total_points
FROM
  sales s
LEFT JOIN
  points_cte c
ON
  s.product_id = c.product_id
GROUP BY
  customer_id;

The result of the above query would contain three columns namely:

  • customer_id: this is the unique identifier for each customer.

  • total_points: this is the total number of points each customer has.

Customer A has a total of 860 points while Customer B has the highest number of points with 940 points which shows that Customer B has probably purchased Sushi more than the remaining customers, then Customer C has a total of 360 points.


Q10

In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
Similar to the last question, I am to assign points to customers based on specific conditions.

  • I would be utilizing a CTE to select the relevant columns which include m_duration column, gotten by using the DATEDIFF() function to find the difference between the join_date and the order_date columns, from their specific table but these tables have to be joined first.

  • I would join the menu table and the sales table using the product_id column on both tables then I would perform a right join between the menu and members table using the customer_id column including a condition that join_date on the members table is less than the order_date on the sales table.
    ~~ Using the right join is to exclude the record of sales from non-members and customers before they became members.
    ~~ Adding the condition "
    <sup>b.join_date &lt; s.order_date</sup>" when joining, is so the output only includes purchases from customers after they became members.

  • I would create another CTE to select the customer_id column, use a CASE WHEN statement to convert the price to points based on the given conditions, from the previous CTE then add a condition using the WHERE statement to only show orders in January.

  • In the outer query, I would select the customer_id column and use the SUM() function to get the total points, from the second CTE before grouping by the customer_id column.

WITH cte AS
(
  SELECT
    b.customer_id,
    s.product_id,
    order_date,
    join_date,
    DATEDIFF(DAY, join_date, order_date) AS m_duration,
    price
  FROM
    menu m
  LEFT JOIN
    sales s
  ON
    m.product_id = s.product_id
  RIGHT JOIN
    members b
  ON
    b.customer_id = s.customer_id
  AND
    b.join_date < s.order_date

),

cte2 AS 
(
  SELECT
    customer_id,
      CASE 
      WHEN m_duration <= 7 THEN price * 20
      ELSE price * 10
    END AS points 
  FROM 
    cte
  WHERE 
    DATEPART(MONTH, order_date) = 1
)

SELECT
  customer_id,
  SUM(points) as january_points
FROM
  cte2
GROUP BY
  customer_id

The result of the above query would contain three columns namely:

  • customer_id: this is the unique identifier for each customer.

  • january_points: this is the total number of points each customer has at the end of January.

The result above shows that at the end of January, Customer A had 720 points while Customer B had a total of 440 points.