Data Visualization Techniques: Comparing Seaborn and Matplotlib

An in-depth exploration of Seaborn and Matplotlib: two powerful data visualization libraries in Python.

Description

This project provides a structured comparison of Seaborn and Matplotlib, aimed at understanding their capabilities, strengths, and weaknesses. Through practical examples and exercises, learners will gain hands-on experience in creating various types of visualizations and understanding when to use each library. By the end of the project, participants will be able to make informed decisions about which library to use for their specific data visualization needs.

Introduction to Data Visualization in Python

Setting Up the Environment

  1. Ensure you have Python installed on your machine. You can download it from python.org.
  2. Install the required libraries using pip.
pip install matplotlib seaborn

Importing Libraries

Start by importing the necessary libraries.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

Basic Plotting with Matplotlib

Line Plot

# Sample Data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, marker='o', linestyle='-', color='b', label='Sine Wave')
plt.title('Line Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

Bar Plot

# Sample Data
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]

# Plot
plt.figure(figsize=(8, 5))
plt.bar(categories, values, color='skyblue')
plt.title('Bar Plot Example')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()

Scatter Plot

# Sample Data
x = np.random.rand(50)
y = np.random.rand(50)

# Plot
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='r', marker='x')
plt.title('Scatter Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

Enhanced Plotting with Seaborn

Histogram

# Sample Data
data = np.random.randn(1000)

# Plot
plt.figure(figsize=(8, 6))
sns.histplot(data, kde=True, color='purple')
plt.title('Histogram Example with Seaborn')
plt.xlabel('Data')
plt.ylabel('Frequency')
plt.show()

Box Plot

# Sample Data
data = pd.DataFrame({
    'Category': np.random.choice(['A', 'B', 'C'], 100),
    'Values': np.random.randn(100)
})

# Plot
plt.figure(figsize=(8, 6))
sns.boxplot(x='Category', y='Values', data=data, palette='Set3')
plt.title('Box Plot Example with Seaborn')
plt.xlabel('Category')
plt.ylabel('Values')
plt.show()

Pair Plot

# Sample Data
data = sns.load_dataset('iris')

# Plot
sns.pairplot(data, hue='species', palette='bright', markers=['o', 's', 'D'])
plt.suptitle('Pair Plot Example with Seaborn', y=1.02)  # title the whole grid, not just the last subplot
plt.show()

Conclusion

This covers the basic introduction to data visualization in Python using Matplotlib and Seaborn. By following these examples, you can create various types of plots to visualize your data effectively.

Setting Up the Environment

To set up the environment for a project that uses Seaborn and Matplotlib for data visualization, follow these steps. This guide assumes Python is already installed.

Step 1: Create a Virtual Environment

  1. Navigate to your project directory:

    cd path/to/your/project
  2. Create a virtual environment:

    python -m venv venv
  3. Activate the virtual environment:

    • On Windows:
      venv\Scripts\activate
    • On macOS/Linux:
      source venv/bin/activate

Step 2: Install Required Libraries

  1. Upgrade pip:

    pip install --upgrade pip
  2. Install Seaborn and Matplotlib:

    pip install seaborn matplotlib
  3. Verify installation by checking the versions:

    python -c "import seaborn as sns; import matplotlib; print('Seaborn:', sns.__version__, 'Matplotlib:', matplotlib.__version__)"

Step 3: Set Up Jupyter Notebook (Optional but Recommended)

  1. Install Jupyter Notebook:

    pip install notebook
  2. Start Jupyter Notebook:

    jupyter notebook
    • Navigate to the provided URL, typically http://localhost:8888/tree, in your web browser.

Step 4: Configure Matplotlib Defaults (Optional)

  1. Set the defaults at the top of a script or notebook:

    import matplotlib.pyplot as plt
    plt.rcParams.update({
        'figure.figsize': (10, 6),
        'axes.titlesize': 16,
        'axes.labelsize': 14,
        'xtick.labelsize': 12,
        'ytick.labelsize': 12,
        'legend.fontsize': 12
    })
  2. Alternatively, save these settings in a Python script called plot_config.py for future reuse:

    def set_plot_defaults():
        import matplotlib.pyplot as plt
        plt.rcParams.update({
            'figure.figsize': (10, 6),
            'axes.titlesize': 16,
            'axes.labelsize': 14,
            'xtick.labelsize': 12,
            'ytick.labelsize': 12,
            'legend.fontsize': 12
        })
    • Then, you can import and use set_plot_defaults() in your main scripts.
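
As a minimal sketch of that reuse, the snippet below repeats set_plot_defaults() inline so it runs on its own; in a real project the function would be imported from plot_config.py, which is assumed to sit next to your script. The Agg backend line is also an assumption, added so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt


def set_plot_defaults():
    """Same settings as plot_config.py, repeated here to keep the sketch self-contained."""
    plt.rcParams.update({
        'figure.figsize': (10, 6),
        'axes.titlesize': 16,
        'axes.labelsize': 14,
        'xtick.labelsize': 12,
        'ytick.labelsize': 12,
        'legend.fontsize': 12
    })


# In a real script this line would be: from plot_config import set_plot_defaults
set_plot_defaults()

# Any figure created afterwards picks up the shared defaults
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8])
ax.set_title('Uses the shared defaults')

figure_size = tuple(fig.get_size_inches())
print(figure_size)
```

Call the function once, before creating any figures, because rcParams only affect figures created after the update.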

Step 5: Test the Environment Setup

  1. Create a simple test script or Jupyter Notebook cell:
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Load example dataset
    data = sns.load_dataset('iris')
    
    # Create a simple plot
    sns.scatterplot(data=data, x='sepal_length', y='sepal_width', hue='species')
    plt.title('Sepal Length vs Sepal Width')
    plt.show()

This will create a scatter plot using the Iris dataset, ensuring that your environment is correctly set up for data visualization with Seaborn and Matplotlib in Python.

Basic Plots with Matplotlib

Below are examples of creating basic plots using Matplotlib:

Line Plot

import matplotlib.pyplot as plt

# Data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Creating the plot
plt.plot(x, y)

# Adding title and labels
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')

# Displaying the plot
plt.show()

Scatter Plot

import matplotlib.pyplot as plt

# Data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Creating the plot
plt.scatter(x, y)

# Adding title and labels
plt.title('Simple Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')

# Displaying the plot
plt.show()

Bar Plot

import matplotlib.pyplot as plt

# Data
categories = ['A', 'B', 'C', 'D']
values = [3, 7, 5, 2]

# Creating the plot
plt.bar(categories, values)

# Adding title and labels
plt.title('Simple Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')

# Displaying the plot
plt.show()

Histogram

import matplotlib.pyplot as plt

# Data
data = [1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

# Creating the plot
plt.hist(data, bins=5)

# Adding title and labels
plt.title('Simple Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Displaying the plot
plt.show()
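
To see what bins=5 does with this data, the counts can be computed directly with NumPy. This is a small sketch using np.histogram, which applies the same equal-width binning that plt.hist uses by default.

```python
import numpy as np

# Same data as the histogram example
data = [1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

# Five equal-width bins spanning min(data) to max(data)
counts, edges = np.histogram(data, bins=5)

# Each bin includes its left edge; only the last bin also includes its right edge
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

The bin width here is (5 - 1) / 5 = 0.8, so each integer value falls into its own bin and the bar heights match the value counts.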

Pie Chart

import matplotlib.pyplot as plt

# Data
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']
explode = (0.1, 0, 0, 0)  # explode the 1st slice (i.e. 'A')

# Creating the plot
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)

# Adding title
plt.title('Simple Pie Chart')

# Displaying the plot
plt.show()

By following these implementations, you can create various basic plots using Matplotlib to visualize different types of data effectively.

Basic Plots with Seaborn

Seaborn is a Python visualization library based on Matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics. In this section, we will cover how to create some basic plots with Seaborn.

Importing Libraries

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np  # needed for estimator=np.mean in the bar plot example

Loading Example Dataset

We will use the built-in 'tips' dataset in Seaborn for our examples.

# Load the 'tips' dataset
df = sns.load_dataset('tips')

Scatter Plot

Scatter plots are used to observe relationships between variables.

# Scatter plot with regression line
sns.lmplot(x='total_bill', y='tip', data=df)
plt.title('Scatter Plot of Total Bill vs Tip')
plt.show()

# Scatter plot without regression line
sns.scatterplot(x='total_bill', y='tip', data=df)
plt.title('Scatter Plot of Total Bill vs Tip')
plt.show()

Line Plot

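Line plots connect data points with lines. When several rows share the same x value, as party size does here, sns.lineplot plots the mean of y and shades a 95% confidence interval around it.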

# Line plot
sns.lineplot(x='size', y='total_bill', data=df)
plt.title('Line Plot of Size vs Total Bill')
plt.show()

Histogram

Histograms are used to visualize the distribution of a single numerical variable.

# Histogram
sns.histplot(df['total_bill'], bins=30, kde=True)
plt.title('Histogram of Total Bill')
plt.show()

Box Plot

Box plots are used to show the distribution of quantitative data and compare between groups.

# Box plot
sns.boxplot(x='day', y='total_bill', data=df)
plt.title('Box Plot of Total Bill by Day')
plt.show()
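
The numbers behind a box plot can be worked out by hand. The sketch below uses a small made-up array (not the tips data) and assumes the common 1.5 × IQR whisker rule, which is Seaborn's default.

```python
import numpy as np

# Illustrative values with one obvious outlier
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, iqr)
print(lower_fence, upper_fence, outliers)
```

The whiskers stop at the most extreme data points inside the fences (1 and 8 here), not at the fences themselves.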

Bar Plot

Bar plots are useful for visualizing the count of each category, or a summary statistic such as the mean of a numeric variable per category.

# Bar plot of count per day
sns.countplot(x='day', data=df)
plt.title('Count Plot of Days')
plt.show()

# Bar plot of mean total_bill per day
sns.barplot(x='day', y='total_bill', data=df, estimator=np.mean)
plt.title('Mean Total Bill per Day')
plt.show()
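
The bar heights that sns.barplot draws by default are per-category means. As a rough sketch of the same aggregation done by hand, the snippet below uses pandas and plain Matplotlib on a small made-up table; the values are illustrative, not taken from the tips dataset, and the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical bills, two per day
bills = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat'],
    'total_bill': [10.0, 20.0, 12.0, 18.0, 30.0, 40.0],
})

# Mean per category, keeping the order in which days first appear
mean_per_day = bills.groupby('day', sort=False)['total_bill'].mean()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(mean_per_day.index, mean_per_day.values, color='skyblue')
ax.set_title('Mean Total Bill per Day (manual aggregation)')
ax.set_xlabel('day')
ax.set_ylabel('total_bill')

print(mean_per_day)
```

Seaborn additionally draws an error bar on each bar, which this manual version leaves out.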

By following the code snippets above, you can create various basic plots using Seaborn to visualize your data effectively. You can customize these plots further by referring to the Seaborn documentation for additional parameters and styling options.

Customizing Plots in Matplotlib

To customize plots in Matplotlib, we will look at different aspects such as title, axis labels, legend, and styles.

1. Importing Libraries

First, ensure that you have imported the necessary libraries:

import matplotlib.pyplot as plt
import numpy as np

2. Generating Sample Data

Create some sample data for demonstration.

x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

3. Basic Customizations

3.1 Setting Titles and Labels

plt.plot(x, y1, label='Sine Wave')
plt.plot(x, y2, label='Cosine Wave')

plt.title("Sine and Cosine Waves")             # Set the title
plt.xlabel("X-axis: Time (s)")                 # Set the x-axis label
plt.ylabel("Y-axis: Amplitude")                # Set the y-axis label

3.2 Adding a Legend

plt.legend(loc='upper right')                   # Set the location of the legend

3.3 Customizing Lines and Markers

plt.plot(x, y1, color='blue', linestyle='--', linewidth=2, marker='o', markersize=5)
plt.plot(x, y2, color='red', linestyle='-', linewidth=1, marker='x', markersize=5)

3.4 Adding a Grid

plt.grid(True)                                  # Display a grid

3.5 Adjusting Axis Limits

plt.xlim(0, 10)                                 # Set x-axis limits
plt.ylim(-1.5, 1.5)                             # Set y-axis limits

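3.6 Applying Styles

Matplotlib ships with several predefined styles that can drastically change the appearance of a plot. Apply a style before creating the figure, because it does not restyle plots that already exist.

plt.style.use('seaborn-v0_8-darkgrid')          # Seaborn-like style; named 'seaborn-darkgrid' before Matplotlib 3.6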
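
Style names vary between Matplotlib releases, so it can help to check plt.style.available before picking one. The sketch below tries a few candidates in order and falls back to 'default'; the candidate list is only an example, and the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Candidate styles in order of preference (example list, adjust as needed)
preferred = ['seaborn-v0_8-darkgrid', 'seaborn-darkgrid', 'ggplot']
style = next((name for name in preferred if name in plt.style.available), 'default')

x = np.linspace(0, 10, 100)

# style.context applies the style only inside the with-block
with plt.style.context(style):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), label='Sine Wave')
    ax.legend()

print('Using style:', style)
```

Unlike plt.style.use, the context manager leaves the global settings untouched once the block ends.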

3.7 Displaying the Plot

plt.show()                                     # Render the plot

Full Example

Bringing it all together, here's a full example:

import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Customizing plots
plt.style.use('seaborn-v0_8-darkgrid')          # Apply style ('seaborn-darkgrid' before Matplotlib 3.6)

plt.plot(x, y1, label='Sine Wave', color='blue', linestyle='--', linewidth=2, marker='o', markersize=5)
plt.plot(x, y2, label='Cosine Wave', color='red', linestyle='-', linewidth=1, marker='x', markersize=5)

plt.title("Sine and Cosine Waves")              # Title
plt.xlabel("X-axis: Time (s)")                  # X-axis label
plt.ylabel("Y-axis: Amplitude")                 # Y-axis label
plt.legend(loc='upper right')                   # Legend

plt.grid(True)                                  # Grid
plt.xlim(0, 10)                                 # X-axis limits
plt.ylim(-1.5, 1.5)                             # Y-axis limits

plt.show()                                      # Show plot

You now have a plot with customized titles, labels, legends, styling, and other elements that enhance its visual clarity and aesthetic appeal. This code can be directly run in a Python environment where Matplotlib is installed.

Customizing Plots in Seaborn

Customizing Seaborn plots involves modifying aesthetics, axes, titles, legends, and other elements to make the visuals more informative and appealing. Below are practical implementations to achieve these customizations:

Import Necessary Libraries

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Sample Data
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Values': [4, 3, 8, 6]
})

Basic Plot Customization

  1. Customizing Colors
sns.set(style='whitegrid')  # Set style
plt.figure(figsize=(8, 5))  # Set figure size

# Bar Plot with custom colors
bar_plot = sns.barplot(x='Category', y='Values', data=data, palette='viridis')
  2. Adding Titles and Labels
bar_plot.set_title('Custom Bar Plot Title', fontsize=16)  # Add title with custom font size
bar_plot.set_xlabel('Category Axis', fontsize=14)         # Add x-axis label with custom font size
bar_plot.set_ylabel('Values Axis', fontsize=14)           # Add y-axis label with custom font size
  3. Customizing Axes
# Customizing axis limits and tick parameters
bar_plot.set(ylim=(0, 10), xticks=[0, 1, 2, 3], yticks=[0, 2, 4, 6, 8, 10])

# Rotating x-axis labels for better readability
for item in bar_plot.get_xticklabels():
    item.set_rotation(45)
  4. Adding Annotations
# Adding annotations to bars
for idx, row in data.iterrows():
    bar_plot.text(idx, row['Values'] + 0.2, row['Values'], color='black', ha="center")

Advanced Plot Customization

  1. Customizing Legends
# Creating a line plot with different styles for legend customization example
line_plot = sns.lineplot(x='Category', y='Values', data=data, label='Line 1', color='blue')

# Customize legend
line_plot.legend(title='Legend Title', loc='upper left', fontsize='large', title_fontsize='13')
  2. FacetGrid for Complex Customization
# Creating a FacetGrid for multi-plot customization
order = ['A', 'B', 'C', 'D']
facet = sns.FacetGrid(data, col="Category", col_wrap=2, height=4, aspect=1.5)
# Pass an explicit order so every facet uses the same x positions
facet.map(sns.barplot, 'Category', 'Values', order=order)

# Adding titles and customizations to each facet
for category, ax in facet.axes_dict.items():
    ax.set_title(f'Category: {category}')
    ax.set_xlabel('Custom X Label')
    ax.set_ylabel('Custom Y Label')

    # Annotate only the bar that belongs to this facet
    value = data.loc[data['Category'] == category, 'Values'].iloc[0]
    ax.text(order.index(category), value + 0.2, value, ha="center")
  3. Customizing Grids and Styles
# Customizing the grid style
sns.set(style='whitegrid', context='talk')  # 'talk' context for larger elements

# Customizing ticks
sns.set_style("ticks", {"xtick.major.size": 8, "ytick.major.size": 8})
plt.figure(figsize=(8, 5))

# Regenerate a bar plot with new grid customizations
bar_plot = sns.barplot(x='Category', y='Values', data=data, palette='pastel')

Display Plot

# To ensure the plot renders in some environments
plt.show()
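
sns.set and sns.set_style change the look of every later plot. When only one figure should be styled, sns.axes_style can be used as a context manager instead. The sketch below is one way to do that, with the same small Category/Values table; the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Values': [4, 3, 8, 6]
})

grid_before = plt.rcParams['axes.grid']

# The whitegrid style applies only to figures created inside this block
with sns.axes_style('whitegrid'):
    grid_inside = plt.rcParams['axes.grid']
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.barplot(x='Category', y='Values', data=data, color='steelblue', ax=ax)
    ax.set_title('Styled only inside the with-block')

# Outside the block the previous settings are back in place
grid_after = plt.rcParams['axes.grid']
print(grid_before, grid_inside, grid_after)
```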

By integrating these codes into your Seaborn workflow, you can effectively customize various aspects of your visualizations to enhance readability and presentation quality.

Advanced Visualization Techniques with Matplotlib

1. Introduction

In this section, we will explore advanced visualization techniques using Matplotlib. We will cover the following topics:

  • Subplots and Combining Multiple Plots
  • 3D Plots
  • Customizing Color Maps
  • Creating Animations

2. Subplots and Combining Multiple Plots

Code Example

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Creating subplots
fig, axs = plt.subplots(2, 1, figsize=(10, 8))

axs[0].plot(x, y1, label='Sin(x)')
axs[0].set_title('Sine Wave')
axs[0].legend()

axs[1].plot(x, y2, label='Cos(x)', color='r')
axs[1].set_title('Cosine Wave')
axs[1].legend()

plt.tight_layout()
plt.show()

3. 3D Plots

Code Example

from mpl_toolkits.mplot3d import Axes3D

x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')

ax.set_title("3D Surface Plot")
plt.show()

4. Customizing Color Maps

Code Example

data = np.random.rand(10, 10)

plt.figure(figsize=(8, 6))
plt.imshow(data, cmap='coolwarm', interpolation='nearest')
plt.colorbar()

plt.title("Custom Color Map")
plt.show()

5. Creating Animations

Code Example

import matplotlib.animation as animation

fig, ax = plt.subplots()
x = np.linspace(0, 2*np.pi, 100)
line, = ax.plot(x, np.sin(x))

def update(frame):
    line.set_ydata(np.sin(x + frame / 10))
    return line,

ani = animation.FuncAnimation(fig, update, frames=100, interval=50, blit=True)

plt.show()
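
plt.show() needs an interactive window, so on a server or in a script it is often more practical to save the animation to a file. The sketch below writes a GIF with Matplotlib's PillowWriter; the output location (the system temp directory) and the file name are assumptions for illustration, and the frame count is reduced to keep the file small.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 100)
line, = ax.plot(x, np.sin(x))


def update(frame):
    # Shift the sine wave a little further on every frame
    line.set_ydata(np.sin(x + frame / 10))
    return line,


ani = FuncAnimation(fig, update, frames=20, interval=50)

# Hypothetical output path; change it to wherever the GIF should go
output_path = os.path.join(tempfile.gettempdir(), 'sine_wave.gif')
ani.save(output_path, writer=PillowWriter(fps=20))
plt.close(fig)

print('Saved animation to', output_path)
```

Keep a reference to the animation object (ani here) until it has been saved or shown, otherwise it may be garbage-collected before it runs.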

These examples illustrate some advanced visualization techniques you can use with Matplotlib to enhance your data visualizations in Python.

Advanced Visualization Techniques with Seaborn

In this section, we'll cover some advanced visualization techniques using Seaborn to help you create more informative and beautiful visualizations. We will explore:

  • Heatmaps
  • Pairplots
  • FacetGrid
  • JointPlots
  • Violin Plots

Heatmaps

Heatmaps are useful for visualizing matrix-like data, showing patterns within the data matrix.

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
# pivot() requires keyword arguments in pandas 2.0 and later
data = sns.load_dataset("flights").pivot(index="month", columns="year", values="passengers")

# Create a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data, annot=True, fmt="d", cmap="YlGnBu")
plt.title("Heatmap of Flight Passengers Over Years")
plt.show()
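
sns.heatmap expects a wide, matrix-like table, while data often arrives in long form with one row per observation. The sketch below shows that reshaping step on a tiny hand-made table; the numbers are illustrative and not taken from the flights dataset, and the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Long form: one row per (month, year) observation
long_df = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'year': [2020, 2021, 2020, 2021],
    'passengers': [100, 120, 90, 110],
})

# Wide form: months as rows, years as columns, passengers as cell values
wide_df = long_df.pivot(index='month', columns='year', values='passengers')
print(wide_df)

fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(wide_df, annot=True, fmt='d', cmap='YlGnBu', ax=ax)
ax.set_title('Passengers by month and year')
```

Each combination of index and column labels must appear only once in the long table, otherwise pivot raises an error; pivot_table with an aggregation function is the usual alternative in that case.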

Pairplots

Pairplots are used to visualize relationships between multiple variables in a dataset.

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = sns.load_dataset("iris")

# Create a pairplot
sns.pairplot(data, hue="species", palette="husl")
plt.suptitle("Pairplot of Iris Data", y=1.02)
plt.show()

FacetGrid

FacetGrid is used for plotting multiple graphs based on the categories of a variable.

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = sns.load_dataset("tips")

# Create a FacetGrid
g = sns.FacetGrid(data, col="time", row="smoker", margin_titles=True)
g.map(sns.scatterplot, "total_bill", "tip")
plt.subplots_adjust(top=0.9)
g.fig.suptitle("FacetGrid of Tips Data")
plt.show()

JointPlots

JointPlots are useful for visualizing the relationship between two variables along with their marginal distributions.

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = sns.load_dataset("penguins")

# Create a jointplot
sns.jointplot(x="flipper_length_mm", y="bill_length_mm", data=data, kind="hex", color="k")
plt.suptitle("Jointplot of Penguins Data", y=1.02)
plt.show()

Violin Plots

Violin plots are used for visualizing the distribution of the data and its probability density.

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = sns.load_dataset("tips")

# Create a violin plot
plt.figure(figsize=(10, 6))
sns.violinplot(x="day", y="total_bill", hue="sex", data=data, palette="muted", split=True)
plt.title("Violin Plot of Tips Data by Day and Sex")
plt.show()

You can integrate these advanced techniques into your existing project to elevate the quality and informativeness of your visualizations.

Comparative Analysis of Seaborn and Matplotlib

For this section, we will perform a comparative analysis of Seaborn and Matplotlib by generating similar visualizations using both libraries. This will illustrate their differences in terms of syntax, aesthetics, and functionalities.

Dataset

To ensure a fair comparison, we will use the same dataset for both Seaborn and Matplotlib. Let's use the famous Iris dataset for this comparison.

Code Implementation

Importing Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

Scatter Plot Comparison

Seaborn Implementation

# Seaborn Scatter Plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=iris_df, x='sepal length (cm)', y='sepal width (cm)', hue='species')
plt.title('Seaborn Scatter Plot of Sepal Length vs. Sepal Width')
plt.show()

Matplotlib Implementation

# Matplotlib Scatter Plot
plt.figure(figsize=(10, 6))
species_mapping = {'setosa': 'r', 'versicolor': 'g', 'virginica': 'b'}
for species, color in species_mapping.items():
    subset = iris_df[iris_df['species'] == species]
    plt.scatter(subset['sepal length (cm)'], subset['sepal width (cm)'], color=color, label=species)
plt.title('Matplotlib Scatter Plot of Sepal Length vs. Sepal Width')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.legend()
plt.show()

Histogram Comparison

Seaborn Implementation

# Seaborn Histogram
plt.figure(figsize=(10, 6))
sns.histplot(data=iris_df, x='sepal length (cm)', hue='species', multiple='stack')
plt.title('Seaborn Histogram of Sepal Length')
plt.show()

Matplotlib Implementation

# Matplotlib Histogram
plt.figure(figsize=(10, 6))
for species, color in species_mapping.items():
    plt.hist(iris_df[iris_df['species'] == species]['sepal length (cm)'], bins=15, color=color, alpha=0.5, label=species)
plt.title('Matplotlib Histogram of Sepal Length')
plt.xlabel('sepal length (cm)')
plt.ylabel('Frequency')
plt.legend()
plt.show()

Pair Plot Comparison

Seaborn Implementation

# Seaborn Pair Plot
sns.pairplot(iris_df, hue='species', height=2.5)
plt.suptitle('Seaborn Pair Plot', y=1.02)
plt.show()

Matplotlib Implementation

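# Matplotlib Pair Plot (via pandas' scatter_matrix, which draws with Matplotlib)
from pandas.plotting import scatter_matrix

# scatter_matrix creates its own figure, so no separate plt.figure() call is needed;
# only the numeric feature columns are passed in
scatter_matrix(iris_df[iris.feature_names], alpha=0.8, figsize=(12, 12), diagonal='hist', marker='o', c=iris.target, cmap='viridis')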
plt.suptitle('Matplotlib Pair Plot', y=1.02)
plt.show()

Conclusion

From these examples, we see that:

  • Seaborn provides a higher-level API for creating statistical graphics, providing built-in themes, and color palettes to make it easy to create aesthetically pleasing and complex visualizations.
  • Matplotlib is more versatile and offers a more granular level of control over the style and layout of plots. However, it often requires more lines of code to achieve the same results as Seaborn.
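
The two libraries also combine well, because Seaborn's axes-level functions draw on ordinary Matplotlib Axes. The sketch below lets Seaborn handle the grouping and colors, then uses Matplotlib methods for the fine-tuning. The small table is made up for illustration, and the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.axes
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data with two groups
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [2, 4, 3, 6],
    'group': ['a', 'a', 'b', 'b'],
})

fig, ax = plt.subplots(figsize=(6, 4))

# Seaborn: one call handles grouping, colors, and the legend entries
returned_ax = sns.scatterplot(data=df, x='x', y='y', hue='group', ax=ax)

# Matplotlib: fine-grained additions on the same Axes
ax.axhline(df['y'].mean(), color='gray', linestyle='--', label='mean of y')
ax.set_title('Seaborn plot, Matplotlib customization')
ax.legend(title='group')

print(type(returned_ax))
```

A reasonable working pattern is to start with Seaborn for the statistical plot and reach for Matplotlib only where the default result needs adjusting.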

This comparative analysis should give you a practical understanding of when to use each library and help you appreciate their respective strengths in data visualization tasks.

Case Studies and Practical Applications

Case Study 1: Analyzing Sales Trends with Matplotlib and Seaborn

Problem Statement:

A retail company wants to analyze its sales data over the past year to identify trends and make data-driven decisions. We will use Matplotlib for detailed customization and Seaborn for quick and informative visuals.

Data Preparation:

Assume we have the following columns in our sales data:

  • date: The date of the sales entry
  • sales: The amount of sales
  • category: Product category
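
If no sales_data.csv is at hand, a synthetic table with the same three columns is enough to try the plots. The sketch below generates one. Everything in it (the date range, the category names, the gamma-distributed amounts) is invented for illustration and says nothing about real sales.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so the table is reproducible

# One row per day for a hypothetical year
dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
categories = ['Electronics', 'Clothing', 'Grocery']  # hypothetical category names

data = pd.DataFrame({
    'date': dates,
    'sales': rng.gamma(shape=2.0, scale=150.0, size=len(dates)).round(2),
    'category': rng.choice(categories, size=len(dates)),
})

# Optional: write it out so pd.read_csv('sales_data.csv') finds a file
# data.to_csv('sales_data.csv', index=False)

print(data.head())
```

Using this DataFrame in place of the pd.read_csv call lets the rest of the case study run unchanged.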

Implementation:

# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
data = pd.read_csv('sales_data.csv')

# Convert 'date' column to datetime
data['date'] = pd.to_datetime(data['date'])

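# Resample to monthly sales, summing only the numeric 'sales' column
# (on pandas 2.2 and later, use the month-end alias 'ME' instead of 'M')
monthly_sales = data.resample('M', on='date')[['sales']].sum()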

# Plot monthly sales using Matplotlib
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales['sales'], marker='o')
plt.title('Monthly Sales Trend')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.show()

# Plot sales distribution by category using Seaborn
plt.figure(figsize=(10, 5))
sns.boxplot(x='category', y='sales', data=data)
plt.title('Sales Distribution by Category')
plt.xlabel('Category')
plt.ylabel('Sales')
plt.show()

Case Study 2: Visualizing Customer Demographics

Problem Statement:

A marketing team needs to understand the demographic distribution of customers to tailor their marketing strategies. We will create visualizations to highlight age and income distributions among customers.

Data Preparation:

Assume we have the following columns in our customer data:

  • customer_id: Unique identifier for customers
  • age: Age of the customer
  • income: Income of the customer

Implementation:

# Load data
customer_data = pd.read_csv('customer_data.csv')

# Age distribution using Seaborn
plt.figure(figsize=(10, 5))
sns.histplot(customer_data['age'], bins=20, kde=True)
plt.title('Customer Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# Income distribution using Matplotlib
plt.figure(figsize=(10, 5))
plt.hist(customer_data['income'], bins=20, edgecolor='black')
plt.title('Customer Income Distribution')
plt.xlabel('Income')
plt.ylabel('Frequency')
plt.show()

Case Study 3: Performance Metrics Visualization

Problem Statement:

A software development team wants to visualize key performance metrics such as code commits, bug fixes, and feature deployments over time.

Data Preparation:

Assume we have the following columns in our performance metrics data:

  • week: The week of the record
  • commits: Number of code commits
  • bug_fixes: Number of bug fixes
  • feature_deployments: Number of new features deployed
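
How the week column is parsed depends on how it is stored, which the case study does not specify. As one possible sketch, assume the values are strings such as '2023-05', meaning year and week number with Monday as the first day of the week. A week number alone does not identify a date, so a weekday is appended before parsing.

```python
import pandas as pd

# Hypothetical week labels in 'YYYY-WW' form
weeks = pd.Series(['2023-01', '2023-02', '2023-05'])

# Append '-1' (Monday) so that %W can be resolved to a concrete date:
# %Y = year, %W = week of year (Monday first), %w = weekday (1 = Monday)
parsed = pd.to_datetime(weeks + '-1', format='%Y-%W-%w')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
```

If the file uses ISO week numbering instead, the directives %G, %V, and %u are the matching set; check a few parsed dates against a calendar before plotting.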

Implementation:

# Load data
performance_data = pd.read_csv('performance_metrics.csv')

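# Convert 'week' column to datetime.
# Assumes values look like '2023-05' (year, then week number with Monday as the first day).
# A weekday is needed for the week number to take effect, so '-1' (Monday) is appended.
performance_data['week'] = pd.to_datetime(performance_data['week'].astype(str) + '-1', format='%Y-%W-%w')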

# Plotting performance metrics trends
plt.figure(figsize=(10, 5))

# Commits
sns.lineplot(x='week', y='commits', data=performance_data, marker='o', label='Commits')

# Bug fixes
sns.lineplot(x='week', y='bug_fixes', data=performance_data, marker='o', label='Bug Fixes')

# Feature deployments
sns.lineplot(x='week', y='feature_deployments', data=performance_data, marker='o', label='Feature Deployments')

plt.title('Weekly Performance Metrics')
plt.xlabel('Week')
plt.ylabel('Count')
plt.legend()
plt.grid(True)
plt.show()

These case studies provide real-world applications demonstrating how to leverage Matplotlib and Seaborn for data visualization in different scenarios. This implementation covers various aspects of data visualization, including temporal trends, categorical distributions, and performance metrics, making it readily applicable for practical usage.