Sales Performance Analysis using SQL
Description
This project involves SQL to categorize salespeople into different ranks based on their performance. We start by extracting the total due which is total sales per salesperson and rank them using SQL window function. Then, the salespeople are categorized into 'Top 5' and 'The Rest' based on their rank. This data is then used to find out monthly sales for each category. The final output gives total monthly sales for each category (Top 5 and The Rest).
The original prompt:
Please explain this code in detail
WITH ranked_salespeople AS ( SELECT so.salespersonid, RANK() OVER (ORDER BY SUM(so.totaldue) DESC) AS rank, SUM(so.totaldue) AS total_sales FROM sales.salesorderheader so WHERE DATE_PART('year', so.orderdate) = 2013 GROUP BY so.salespersonid ), categorized AS ( SELECT rs.salespersonid, CASE WHEN rs.rank <= 5 THEN 'Top 5' ELSE 'The Rest' END AS category FROM ranked_salespeople rs ), monthly_sales AS ( SELECT c.category, DATE_TRUNC('month', so.orderdate) AS month, SUM(so.totaldue) AS total_sales FROM sales.salesorderheader so JOIN categorized c ON so.salespersonid = c.salespersonid WHERE DATE_PART('year', so.orderdate) = 2013 GROUP BY c.category, month ORDER BY month ) SELECT * FROM monthly_sales
Introduction
This write-up details a practical implementation of window functions, aggregate functions, and common table expressions (CTEs) in the SQL language. These powerful features of SQL will be used to build an application that ranks and categorizes salespeople based on their sales performance.
Window functions provide a way to apply functions to subsets of rows related to the current row - this is, a "window" of rows.
Aggregate functions compute a single output value given a set of input values. SQL includes many built-in aggregate functions, such as SUM
, COUNT
, AVG
, MAX
, and MIN
.
CTEs or Common Table Expressions, offer the keys to enabling complex, multi-stage data processing tasks with SQL.
Database Setup
The first step is setting up our database. This will involve creating tables and inserting some mock data into these tables. For this example, we'll consider a Salesperson table and a Sales table.
CREATE TABLE Salesperson (
id INT PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE Sales (
id INT PRIMARY KEY,
salesperson_id INT,
amount INT,
sale_date DATE,
FOREIGN KEY(salesperson_id) REFERENCES Salesperson(id)
);
The Salesperson
table includes salesperson details like id and name, while the Sales
table records every sale made, who made it, the amount of the sale, and the date it was made.
Next, insert some data:
INSERT INTO Salesperson (id, name)
VALUES (1, 'John Doe'), (2, 'Jane Smith'), (3, 'Mary Johnson');
INSERT INTO Sales (id, salesperson_id, amount, sale_date)
VALUES
(1, 1, 100, '2021-08-01'), (2, 1, 200, '2021-08-02'),
(3, 2, 150, '2021-08-05'), (4, 2, 250, '2021-08-10'),
(5, 3, 300, '2021-08-12'), (6, 3, 350, '2021-08-15');
Usage of Window Functions and Aggregates
To rank salespeople by their total sales:
SELECT
S.name,
SUM(amount) OVER (PARTITION BY S.id) AS total_sales,
RANK() OVER (
ORDER BY SUM(amount) OVER (PARTITION BY S.id) DESC
) as rank
FROM
Salesperson AS S
JOIN
Sales AS B
ON
S.id = B.salesperson_id;
Here, the SUM(amount) OVER (PARTITION BY S.id)
window function is computing the total sales for each salesperson. The RANK()
window function is ranking these salespeople in order of their total sales.
Common Table Expressions
Technically, you can nest the above query to the desired depth, but after the first 2-3 layers, it becomes a bit unreadable. This is where CTEs come into play. They allow for creating temporary views on which we can build upon.
Here's how you do the same ranking task with CTE:
-- Define a CTE that calculates the total sales for each salesperson.
WITH total_sales AS (
SELECT
S.id,
S.name,
SUM(B.amount) as total_sales
FROM
Salesperson AS S
JOIN
Sales AS B
ON
S.id = B.salesperson_id
GROUP BY S.id, S.name
)
-- Use the CTE to rank salespeople by their total sales.
SELECT
name,
total_sales,
RANK () OVER (
ORDER BY total_sales DESC
) as rank
FROM
total_sales;
This SQL recipe can be used to rank and categorize salespeople based on their sales performance. The window function and aggregate function techniques here can be modified to fit a variety of data analysis requirements beyond just ranking sales performances.
Data Extraction and Manipulation with SQL
This section is dedicated to extracting and manipulating data by using SQL. We are going to work on a database containing sales data. The goal is to rank and categorize salespeople based on their sales performance.
Data Extraction
Assuming that we have a table named sales_data
with columns salesperson_id
, product_id
, quantity
, sales_date
and total_price
and another table named salesperson
with columns salesperson_id
, name
and region
.
First, let's retrieve all sales data for a particular salesperson:
SELECT *
FROM sales_data
WHERE salesperson_id = 1;
This will give us all the sales data for the salesperson with ID 1.
The total sales for each salesperson can be found using the aggregate function SUM:
SELECT salesperson_id,
SUM(total_price) AS total_sales
FROM sales_data
GROUP BY salesperson_id;
Data Manipulation
Common Table Expressions (CTE)
Now, we can use a Common Table Expression (CTE) to store and manipulate the above result.
WITH sales_totals AS (
SELECT salesperson_id,
SUM(total_price) AS total_sales
FROM sales_data
GROUP BY salesperson_id
)
SELECT * FROM sales_totals;
In the query above, sales_totals
is a CTE that stores the total sales for each salesperson. CTE can be referenced like a table in the FROM
clause of a SQL query.
Joining the tables
Join the salesperson
and sales_totals
CTE to include salesperson information:
WITH sales_totals AS (
SELECT salesperson_id,
SUM(total_price) AS total_sales
FROM sales_data
GROUP BY salesperson_id
)
SELECT s.salesperson_id,
s.name,
s.region,
t.total_sales
FROM salesperson AS s
JOIN sales_totals AS t
ON s.salesperson_id = t.salesperson_id;
Now we have the salesperson’s information along with their total sales.
Ranking salespersons
Rank salesperson based on their total sales:
WITH sales_totals AS (
SELECT salesperson_id,
SUM(total_price) AS total_sales
FROM sales_data
GROUP BY salesperson_id
),
sales_info AS (
SELECT s.salesperson_id,
s.name,
s.region,
t.total_sales
FROM salesperson AS s
JOIN sales_totals AS t
ON s.salesperson_id = t.salesperson_id
)
SELECT salesperson_id,
name,
region,
total_sales,
RANK() OVER (ORDER BY total_sales DESC) AS sales_rank
FROM sales_info;
The RANK()
function is used to assign a unique rank to each row, sorting them by total sales in descending order.
Summarizing, these queries allow you to extract the necessary information from your database and manipulate it to rank and categorize your salespeople based on their sales performance.
Practical Implementation of Common Table Expressions and Case Statements in SQL
Common Table Expressions (CTEs)
A Common Table Expression (CTE) is a temporary result set that can be referenced within another SELECT, INSERT, UPDATE, or DELETE statement. This allows you to divide complex queries into simple, non-repeated parts.
Here is an example implementation in the context of your project. Suppose you have a table sales
with the following schema:
- sale_id
- salesperson_id
- sale_amount
You want to get a report of each salesperson's total sales. You can do this using a CTE:
WITH total_sales AS (
SELECT salesperson_id, SUM(sale_amount) as total_sales
FROM sales
GROUP BY salesperson_id
)
SELECT *
FROM total_sales;
In this code:
- We first define a CTE
total_sales
that groups thesales
table bysalesperson_id
and calculates each salesperson's total sales. - The resulting CTE is a table that contains every
salesperson_id
along with their correspondingtotal_sales
. - We then select all rows from the
total_sales
CTE.
Case Statements
Case statements are used to create different outputs (usually in the SELECT
statement). It is SQL's way of handling if-then logic.
Assuming you have a table salespeople
with the following schema:
- salesperson_id
- name
You want to categorize salespeople based on their total sales. You can use a case statement along with the previously defined CTE:
WITH total_sales AS (
SELECT salesperson_id, SUM(sale_amount) as total_sales
FROM sales
GROUP BY salesperson_id
)
SELECT sp.salesperson_id, sp.name,
CASE
WHEN total_sales > 100000 THEN 'Top Performer'
WHEN total_sales > 75000 THEN 'High Performer'
WHEN total_sales > 50000 THEN 'Average Performer'
ELSE 'Low Performer'
END as category
FROM salespeople sp JOIN total_sales ts
ON sp.salesperson_id=ts.salesperson_id;
In this code:
- We first define the same CTE
total_sales
as explained above. - We select
salesperson_id
andname
from thesalespeople
table. - We then define a new column
category
that categorizes salespeople based on their total sales.- If total sales are greater than 100,000, salesperson falls into 'Top Performer' category.
- If total sales are between 75,001 and 100,000, salesperson falls into 'High Performer' category.
- If total sales are between 50,001 and 75,000, salesperson falls into 'Average Performer' category.
- If total sales are 50,000 or less, salesperson falls into 'Low Performer' category.
- We join the
salespeople
table with thetotal_sales
CTE onsalesperson_id
to match each salesperson with their total sales amount.
This will give you a final table with each salesperson's ID, name, and their categorized performance.
Grouping Data and Deriving Insights with SQL
In this section, we will examine how you can group your data using SQL, allowing us to derive more insights like ranking and categorizing salespeople based on their performance.
=> GROUP BY Statement
The SQL GROUP BY
statement enables us to group the data that share a property so that an aggregate function can be applied to each group. In our case, this will allow us to summarize and analyze the sales data based on individual salespeople.
Example:
SELECT SalesPersonID, COUNT(*) as NumberOfSales, SUM(Amount) as TotalSales
FROM Sales
GROUP BY SalesPersonID
This query would return a list of salespeople, alongside the total number of sales they each made, and the total amount for those sales.
=> HAVING Clause
The HAVING
clause is used in combination with GROUP BY
to filter groups based on a condition. It is similar to a WHERE clause, but operates on groups of rows rather than on individual rows.
Example:
SELECT SalesPersonID, COUNT(*) as NumberOfSales, SUM(Amount) as TotalSales
FROM Sales
GROUP BY SalesPersonID
HAVING COUNT(*) > 10 AND SUM(Amount) > 1000
This query would only list the salespeople who have made more than ten sales and those sales totalled more than 1000 in amount.
=> ORDER BY Clause
We can use ORDER BY
to sort our results. This can be useful when trying to decide who the top performing salespeople are based on different metrics.
Example:
SELECT SalesPersonID, COUNT(*) as NumberOfSales, SUM(Amount) as TotalSales
FROM Sales
GROUP BY SalesPersonID
ORDER BY TotalSales DESC
This query lists salespeople by the total amount of sales they've made (from highest to lowest)
=> JOIN Statement
The JOIN
statement lets us combine rows from two or more tables based on related columns between them, which allows us to pull in related information.
Example:
SELECT SP.Name, COUNT(*) as NumberOfSales, SUM(S.Amount) as TotalSales
FROM Sales S
JOIN SalesPeople SP on S.SalesPersonID = SP.SalesPersonID
GROUP BY SP.Name
ORDER BY TotalSales DESC
Here we join the Sales table with the SalesPeople table to display names alongside their sales numbers and total amounts.
By using these techniques, we can group sales data in various ways to derive insights about salespeople's performance.