Project

Sales Performance Analysis using SQL

A project to rank and categorize salespeople based on their sales performance. It utilises SQL's advanced features like Window Functions, Common Table Expressions and Aggregate Functions.

Empty image or helper icon

Sales Performance Analysis using SQL

Description

This project involves SQL to categorize salespeople into different ranks based on their performance. We start by extracting the total due which is total sales per salesperson and rank them using SQL window function. Then, the salespeople are categorized into 'Top 5' and 'The Rest' based on their rank. This data is then used to find out monthly sales for each category. The final output gives total monthly sales for each category (Top 5 and The Rest).

The original prompt:

Please explain this code in detail

WITH ranked_salespeople AS ( SELECT so.salespersonid, RANK() OVER (ORDER BY SUM(so.totaldue) DESC) AS rank, SUM(so.totaldue) AS total_sales FROM sales.salesorderheader so WHERE DATE_PART('year', so.orderdate) = 2013 GROUP BY so.salespersonid ), categorized AS ( SELECT rs.salespersonid, CASE WHEN rs.rank <= 5 THEN 'Top 5' ELSE 'The Rest' END AS category FROM ranked_salespeople rs ), monthly_sales AS ( SELECT c.category, DATE_TRUNC('month', so.orderdate) AS month, SUM(so.totaldue) AS total_sales FROM sales.salesorderheader so JOIN categorized c ON so.salespersonid = c.salespersonid WHERE DATE_PART('year', so.orderdate) = 2013 GROUP BY c.category, month ORDER BY month ) SELECT * FROM monthly_sales

Introduction

This write-up details a practical implementation of window functions, aggregate functions, and common table expressions (CTEs) in the SQL language. These powerful features of SQL will be used to build an application that ranks and categorizes salespeople based on their sales performance.

Window functions provide a way to apply functions to subsets of rows related to the current row - this is, a "window" of rows.

Aggregate functions compute a single output value given a set of input values. SQL includes many built-in aggregate functions, such as SUM, COUNT, AVG, MAX, and MIN.

CTEs or Common Table Expressions, offer the keys to enabling complex, multi-stage data processing tasks with SQL.

Database Setup

The first step is setting up our database. This will involve creating tables and inserting some mock data into these tables. For this example, we'll consider a Salesperson table and a Sales table.

CREATE TABLE Salesperson (
    id INT PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE Sales (
    id INT PRIMARY KEY,
    salesperson_id INT,
    amount INT,
    sale_date DATE,
    FOREIGN KEY(salesperson_id) REFERENCES Salesperson(id)
);

The Salesperson table includes salesperson details like id and name, while the Sales table records every sale made, who made it, the amount of the sale, and the date it was made.

Next, insert some data:

INSERT INTO Salesperson (id, name) 
VALUES (1, 'John Doe'), (2, 'Jane Smith'), (3, 'Mary Johnson');

INSERT INTO Sales (id, salesperson_id, amount, sale_date) 
VALUES 
    (1, 1, 100, '2021-08-01'), (2, 1, 200, '2021-08-02'),
    (3, 2, 150, '2021-08-05'), (4, 2, 250, '2021-08-10'),
    (5, 3, 300, '2021-08-12'), (6, 3, 350, '2021-08-15');

Usage of Window Functions and Aggregates

To rank salespeople by their total sales:

SELECT
    S.name,
    SUM(amount) OVER (PARTITION BY S.id) AS total_sales,
    RANK() OVER (
        ORDER BY SUM(amount) OVER (PARTITION BY S.id) DESC
    ) as rank
FROM
    Salesperson AS S
JOIN
    Sales AS B
ON
    S.id = B.salesperson_id;

Here, the SUM(amount) OVER (PARTITION BY S.id) window function is computing the total sales for each salesperson. The RANK() window function is ranking these salespeople in order of their total sales.

Common Table Expressions

Technically, you can nest the above query to the desired depth, but after the first 2-3 layers, it becomes a bit unreadable. This is where CTEs come into play. They allow for creating temporary views on which we can build upon.

Here's how you do the same ranking task with CTE:

-- Define a CTE that calculates the total sales for each salesperson.
WITH total_sales AS (
    SELECT
        S.id,
        S.name,
        SUM(B.amount) as total_sales
    FROM
        Salesperson AS S
    JOIN
        Sales AS B
    ON
        S.id = B.salesperson_id
    GROUP BY S.id, S.name
)

-- Use the CTE to rank salespeople by their total sales.
SELECT
    name,
    total_sales,
    RANK () OVER (
        ORDER BY total_sales DESC
    ) as rank
FROM
    total_sales;

This SQL recipe can be used to rank and categorize salespeople based on their sales performance. The window function and aggregate function techniques here can be modified to fit a variety of data analysis requirements beyond just ranking sales performances.

Data Extraction and Manipulation with SQL

This section is dedicated to extracting and manipulating data by using SQL. We are going to work on a database containing sales data. The goal is to rank and categorize salespeople based on their sales performance.

Data Extraction

Assuming that we have a table named sales_data with columns salesperson_id, product_id, quantity, sales_date and total_price and another table named salesperson with columns salesperson_id, name and region.

First, let's retrieve all sales data for a particular salesperson:

SELECT * 
FROM sales_data 
WHERE salesperson_id = 1;

This will give us all the sales data for the salesperson with ID 1.

The total sales for each salesperson can be found using the aggregate function SUM:

SELECT salesperson_id, 
       SUM(total_price) AS total_sales 
FROM sales_data 
GROUP BY salesperson_id;

Data Manipulation

Common Table Expressions (CTE)

Now, we can use a Common Table Expression (CTE) to store and manipulate the above result.

WITH sales_totals AS (
    SELECT salesperson_id, 
           SUM(total_price) AS total_sales 
    FROM sales_data 
    GROUP BY salesperson_id
)
SELECT * FROM sales_totals;

In the query above, sales_totals is a CTE that stores the total sales for each salesperson. CTE can be referenced like a table in the FROM clause of a SQL query.

Joining the tables

Join the salesperson and sales_totals CTE to include salesperson information:

WITH sales_totals AS (
    SELECT salesperson_id, 
           SUM(total_price) AS total_sales 
    FROM sales_data 
    GROUP BY salesperson_id
)
SELECT s.salesperson_id, 
       s.name, 
       s.region, 
       t.total_sales 
FROM salesperson AS s 
JOIN sales_totals AS t 
ON s.salesperson_id = t.salesperson_id;

Now we have the salespersonā€™s information along with their total sales.

Ranking salespersons

Rank salesperson based on their total sales:

WITH sales_totals AS (
    SELECT salesperson_id, 
           SUM(total_price) AS total_sales 
    FROM sales_data 
    GROUP BY salesperson_id
), 
sales_info AS (
    SELECT s.salesperson_id, 
           s.name, 
           s.region, 
           t.total_sales 
    FROM salesperson AS s 
    JOIN sales_totals AS t 
    ON s.salesperson_id = t.salesperson_id
)
SELECT salesperson_id, 
       name, 
       region, 
       total_sales, 
       RANK() OVER (ORDER BY total_sales DESC) AS sales_rank 
FROM sales_info;

The RANK() function is used to assign a unique rank to each row, sorting them by total sales in descending order.

Summarizing, these queries allow you to extract the necessary information from your database and manipulate it to rank and categorize your salespeople based on their sales performance.

Practical Implementation of Common Table Expressions and Case Statements in SQL

Common Table Expressions (CTEs)

A Common Table Expression (CTE) is a temporary result set that can be referenced within another SELECT, INSERT, UPDATE, or DELETE statement. This allows you to divide complex queries into simple, non-repeated parts.

Here is an example implementation in the context of your project. Suppose you have a table sales with the following schema:

  • sale_id
  • salesperson_id
  • sale_amount

You want to get a report of each salesperson's total sales. You can do this using a CTE:

WITH total_sales AS (
  SELECT salesperson_id, SUM(sale_amount) as total_sales
  FROM sales
  GROUP BY salesperson_id
)

SELECT *
FROM total_sales;

In this code:

  1. We first define a CTE total_sales that groups the sales table by salesperson_id and calculates each salesperson's total sales.
  2. The resulting CTE is a table that contains every salesperson_id along with their corresponding total_sales.
  3. We then select all rows from the total_sales CTE.

Case Statements

Case statements are used to create different outputs (usually in the SELECT statement). It is SQL's way of handling if-then logic.

Assuming you have a table salespeople with the following schema:

  • salesperson_id
  • name

You want to categorize salespeople based on their total sales. You can use a case statement along with the previously defined CTE:

WITH total_sales AS (
  SELECT salesperson_id, SUM(sale_amount) as total_sales
  FROM sales
  GROUP BY salesperson_id
)

SELECT sp.salesperson_id, sp.name,
  CASE
    WHEN total_sales > 100000 THEN 'Top Performer'
    WHEN total_sales > 75000 THEN 'High Performer'
    WHEN total_sales > 50000 THEN 'Average Performer'
    ELSE 'Low Performer'
  END as category
FROM salespeople sp JOIN total_sales ts
ON sp.salesperson_id=ts.salesperson_id;

In this code:

  1. We first define the same CTE total_sales as explained above.
  2. We select salesperson_id and name from the salespeople table.
  3. We then define a new column category that categorizes salespeople based on their total sales.
    • If total sales are greater than 100,000, salesperson falls into 'Top Performer' category.
    • If total sales are between 75,001 and 100,000, salesperson falls into 'High Performer' category.
    • If total sales are between 50,001 and 75,000, salesperson falls into 'Average Performer' category.
    • If total sales are 50,000 or less, salesperson falls into 'Low Performer' category.
  4. We join the salespeople table with the total_sales CTE on salesperson_id to match each salesperson with their total sales amount.

This will give you a final table with each salesperson's ID, name, and their categorized performance.

Grouping Data and Deriving Insights with SQL

In this section, we will examine how you can group your data using SQL, allowing us to derive more insights like ranking and categorizing salespeople based on their performance.

=> GROUP BY Statement

The SQL GROUP BY statement enables us to group the data that share a property so that an aggregate function can be applied to each group. In our case, this will allow us to summarize and analyze the sales data based on individual salespeople.

Example:

SELECT SalesPersonID, COUNT(*) as NumberOfSales, SUM(Amount) as TotalSales
FROM Sales
GROUP BY SalesPersonID

This query would return a list of salespeople, alongside the total number of sales they each made, and the total amount for those sales.

=> HAVING Clause

The HAVING clause is used in combination with GROUP BY to filter groups based on a condition. It is similar to a WHERE clause, but operates on groups of rows rather than on individual rows.

Example:

SELECT SalesPersonID, COUNT(*) as NumberOfSales, SUM(Amount) as TotalSales
FROM Sales
GROUP BY SalesPersonID
HAVING COUNT(*) > 10 AND SUM(Amount) > 1000

This query would only list the salespeople who have made more than ten sales and those sales totalled more than 1000 in amount.

=> ORDER BY Clause

We can use ORDER BY to sort our results. This can be useful when trying to decide who the top performing salespeople are based on different metrics.

Example:

SELECT SalesPersonID, COUNT(*) as NumberOfSales, SUM(Amount) as TotalSales
FROM Sales
GROUP BY SalesPersonID
ORDER BY TotalSales DESC

This query lists salespeople by the total amount of sales they've made (from highest to lowest)

=> JOIN Statement

The JOIN statement lets us combine rows from two or more tables based on related columns between them, which allows us to pull in related information.

Example:

SELECT SP.Name, COUNT(*) as NumberOfSales, SUM(S.Amount) as TotalSales
FROM Sales S
JOIN SalesPeople SP on S.SalesPersonID = SP.SalesPersonID
GROUP BY SP.Name
ORDER BY TotalSales DESC

Here we join the Sales table with the SalesPeople table to display names alongside their sales numbers and total amounts.

By using these techniques, we can group sales data in various ways to derive insights about salespeople's performance.