
Simple CSV Data Analyzer

This Python function reads a CSV file, analyzes the data, and returns summary statistics like mean, median, standard deviation, and counts of missing values. It includes error handling for file operations and empty datasets.




Function to Analyze Data

This section builds a Python function that reads a dataset stored as a CSV file, performs a basic analysis, and returns summary statistics. A quick profile like this is a common first step when data scientists explore a new dataset.

1. Necessary Imports

We'll need the following libraries:

  • pandas: For data manipulation and analysis.
  • numpy: For numerical operations.

2. Function Definition

Here’s the function definition with detailed documentation.

import pandas as pd
import numpy as np

def analyze_data(file_path):
    """
    Analyzes a CSV dataset and returns summary statistics.

    Parameters:
        file_path (str): The path to the CSV file to be analyzed.

    Returns:
        dict: A dictionary containing summary statistics including means,
              medians, standard deviations, and count of missing values.

    Raises:
        FileNotFoundError: If the specified file does not exist.
        ValueError: If the dataset is empty or not in the expected format.
    """

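    # Attempt to read the CSV file
    try:
        data = pd.read_csv(file_path)
    except FileNotFoundError as err:
        raise FileNotFoundError(f"The file {file_path} was not found.") from err
    except pd.errors.EmptyDataError as err:
        # read_csv raises EmptyDataError when there are no columns to parse (e.g. a zero-byte file)
        raise ValueError(f"The file {file_path} contains no data.") from err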
    
    # Check if the dataset is empty
    if data.empty:
        raise ValueError("The provided dataset is empty.")
    
    # Initialize a dictionary to hold summary statistics
    summary_statistics = {
        "mean": data.mean(numeric_only=True).to_dict(),  # Calculate mean for numeric columns
        "median": data.median(numeric_only=True).to_dict(),  # Calculate median
        "std_dev": data.std(numeric_only=True).to_dict(),  # Calculate standard deviation
        "missing_values": data.isnull().sum().to_dict()  # Count missing values
    }

    return summary_statistics

3. Explanation of the Code

  • Imports: We import pandas for data handling and numpy for numerical calculations. analyze_data never calls numpy directly, so that import can be dropped without changing its behavior.

  • Function Purpose: analyze_data reads a CSV file specified by file_path, computes basic statistics, and returns a summary.

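  • Error Handling: If file_path does not exist, the function raises FileNotFoundError. If the file holds no data rows, it raises ValueError. Malformed CSV content surfaces as pandas.errors.ParserError, which is itself a subclass of ValueError, so callers can handle both data problems with a single except clause.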

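  • Summary Statistics: The function computes the statistics below. Mean, median, and standard deviation ignore non-numeric columns (numeric_only=True) and skip missing values (the pandas default, skipna=True):

    • Mean of numerical columns.
    • Median of numerical columns.
    • Standard deviation of numerical columns (the sample standard deviation, ddof=1, which is the pandas default).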
    • Count of missing values per column.
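
The statistics listed above can be reproduced with the standard library alone, which makes their definitions explicit. The sketch below approximates the pandas behavior and is not a replacement for analyze_data. It rests on stated assumptions: the helper name column_summary is hypothetical, it takes CSV text rather than a file path, only empty cells count as missing (pandas also recognizes markers such as NA and NaN by default), and a column is treated as numeric only when every non-missing value parses as a float.

```python
import csv
import io
import math
import statistics


def column_summary(csv_text):
    """Hypothetical stdlib sketch of the statistics that analyze_data reports.

    Assumptions: input is CSV text with a header row; only empty cells are
    treated as missing; a column is numeric only if every non-missing value
    parses as a float.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        # Mirrors the data.empty check: no data rows below the header
        raise ValueError("The provided dataset is empty.")

    summary = {"mean": {}, "median": {}, "std_dev": {}, "missing_values": {}}

    for col in reader.fieldnames:
        raw = [row[col] for row in rows]
        # DictReader fills short rows with None; treat None and "" as missing
        present = [value for value in raw if value not in ("", None)]
        summary["missing_values"][col] = len(raw) - len(present)

        try:
            numbers = [float(value) for value in present]
        except ValueError:
            continue  # non-numeric column, skipped like numeric_only=True
        if not numbers:
            continue  # no values at all, so no statistics to report

        # Missing cells were already dropped, similar to pandas' skipna=True
        summary["mean"][col] = statistics.fmean(numbers)
        summary["median"][col] = statistics.median(numbers)
        # stdev divides by n - 1 (like ddof=1); with one value pandas gives NaN
        summary["std_dev"][col] = (
            statistics.stdev(numbers) if len(numbers) >= 2 else math.nan
        )

    return summary
```

Using statistics.stdev rather than statistics.pstdev is what matches the ddof=1 default that pandas applies in DataFrame.std.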

4. Usage Example

Below is an example of how to use the analyze_data function. Make sure to replace 'path/to/your/data.csv' with the actual path of your CSV file.

if __name__ == "__main__":
    try:
        stats = analyze_data('path/to/your/data.csv')
        print("Summary Statistics:")
        print(stats)
    except (FileNotFoundError, ValueError) as e:
        print(f"Error: {e}")
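
The usage example prints the dictionary exactly as returned, grouped by statistic. When a per-column view is easier to scan, a small formatting helper can pivot it. The sketch below is one way to do that; format_summary is a hypothetical name, and it depends only on the dictionary shape that analyze_data returns (statistic name, then column name, then value). The numbers in the demo are made up for illustration.

```python
def format_summary(stats):
    """Pivot {statistic: {column: value}} into one text line per column.

    Hypothetical helper: it assumes only the dictionary shape that
    analyze_data returns, not any pandas API.
    """
    # missing_values covers every column, while mean/median/std_dev cover only
    # numeric ones, so gather the union of column names in first-seen order.
    columns = []
    for per_column in stats.values():
        for col in per_column:
            if col not in columns:
                columns.append(col)

    lines = []
    for col in columns:
        parts = []
        for name, per_column in stats.items():
            if col not in per_column:
                continue  # statistic not computed for this column
            value = per_column[col]
            # Floats (including numpy float64) get compact formatting
            text = f"{value:.4g}" if isinstance(value, float) else str(value)
            parts.append(f"{name}={text}")
        lines.append(f"{col}: " + ", ".join(parts))
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only, not output from a real dataset
    example = {
        "mean": {"price": 12.5},
        "median": {"price": 11.0},
        "std_dev": {"price": 3.2},
        "missing_values": {"price": 0, "city": 2},
    }
    print(format_summary(example))
    # price: mean=12.5, median=11, std_dev=3.2, missing_values=0
    # city: missing_values=2
```

In the script above, print(format_summary(stats)) could stand in for print(stats).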

Conclusion

This function offers a quick first look at a dataset: it handles missing-file and empty-data errors and returns a compact dictionary of per-column statistics, giving a clear starting point for further analysis.


