Project

Mastering Bayesian Inference with R: An Applied Guide

This project aims to provide a comprehensive, practical introduction to Bayesian inference using the R language. Learners will use the rstan package, an R interface to the Stan probabilistic programming language, to perform Bayesian analysis in real-world scenarios.


Description

Through this project, participants will gain structured insights into the principles and applications of Bayesian inference using R. The curriculum is designed to offer hands-on experience and step-by-step guidance, from the basics of Bayesian theory to complex model building and diagnostics using R and Stan. Each unit is self-contained, focusing on specific aspects of Bayesian analysis and practical implementation using R scripting.

The original prompt:

I’d like to see a detailed real world example of using this is R - Bayesian Analysis

Bayesian Inference: Leverage packages like rstan and Stan.

Introduction to Bayesian Theory

Overview

Bayesian inference is a method of statistical inference in which Bayes' Theorem is used to update the probability for a hypothesis as more evidence or information becomes available. This approach is particularly useful in real-world scenarios where we need to incorporate prior knowledge and continuously update our beliefs.

Bayes' Theorem

Bayes’ Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. The formula for Bayes' Theorem is as follows:

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

Where:

  • \( P(A|B) \) is the posterior probability: the probability of the hypothesis \( A \) given the data \( B \).
  • \( P(B|A) \) is the likelihood: the probability of the data given the hypothesis.
  • \( P(A) \) is the prior probability: the initial probability of the hypothesis before seeing the data.
  • \( P(B) \) is the marginal likelihood: the total probability of the data under all hypotheses.
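
To make these quantities concrete, here is a small, self-contained R example (hypothetical numbers for a diagnostic test) that applies the formula directly:

# Hypothetical diagnostic-test example of Bayes' Theorem
prior       <- 0.01   # P(A): prevalence of the condition
sensitivity <- 0.95   # P(B|A): probability of a positive test given the condition
false_pos   <- 0.05   # P(B|not A): probability of a positive test without the condition

# P(B): marginal probability of a positive test
marginal <- sensitivity * prior + false_pos * (1 - prior)

# P(A|B): posterior probability of the condition given a positive test
posterior <- sensitivity * prior / marginal
posterior   # roughly 0.16 -- a positive test raises the probability from 1% to about 16%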

Setting Up Your R Environment

Before we dive into Bayesian analysis using R, make sure you have the necessary packages installed. The rstan package provides an interface to Stan, a platform for statistical modeling and high-performance statistical computation.

Installing Required Packages

To install the necessary R packages, use the following commands:

install.packages("rstan", repos = "https://cloud.r-project.org/", dependencies = TRUE)
install.packages("StanHeaders", repos = "https://cloud.r-project.org/")

Loading the Packages

Once installed, load rstan in your R session (StanHeaders is installed as a dependency and is used internally by rstan, so it does not need to be attached separately):

library(rstan)

Performing Bayesian Analysis

Example Scenario: Bayesian Estimation of a Normal Distribution

Suppose you want to estimate the mean \( \mu \) and standard deviation \( \sigma \) of a normal distribution using Bayesian inference. Here’s a practical example to get you started.

Stan Model Code

Create a Stan model file, e.g., normal_model.stan:

data {
  int<lower=1> N;          // number of observations
  real y[N];               // observed data
}
parameters {
  real mu;                 // mean
  real<lower=0> sigma;     // standard deviation (must be positive)
}
model {
  mu ~ normal(0, 10);      // prior for mean
  sigma ~ cauchy(0, 5);    // half-Cauchy prior for standard deviation
  y ~ normal(mu, sigma);   // likelihood
}

Sample Data

Generate some sample data in R:

set.seed(123)
N <- 100
y <- rnorm(N, mean = 5, sd = 2)
data <- list(N = N, y = y)

Fitting the Model

Use rstan's stan function to fit the model:

fit <- stan(
  file = 'normal_model.stan',  # Stan model file
  data = data,                 # named list of data
  iter = 2000,                 # number of iterations
  chains = 4                   # number of Markov chains
)

Inspecting the Results

View the summary of the fitted model to inspect the posterior distributions:

print(fit)
plot(fit)
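
Beyond print() and plot(), you can request targeted posterior summaries; for example, posterior means and 95% credible intervals for mu and sigma:

# Posterior means and 95% credible intervals for mu and sigma
summary(fit, pars = c("mu", "sigma"), probs = c(0.025, 0.975))$summary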

Conclusion

This section provided a practical introduction to Bayesian Theory and demonstrated how to perform Bayesian analysis using R and the rstan package. By following the provided code examples, you should now have a basic setup and an understanding of how to implement Bayesian inference for a normal distribution.

Setting Up Your R Environment

In this unit, we will provide a guide for setting up your R environment to perform Bayesian inference using the rstan package. Here are the steps to get started with rstan and Stan for Bayesian analysis in R.

Step-by-Step Instructions

1. Install R and RStudio

Ensure you have the latest versions of R and RStudio installed on your machine.

2. Install Required R Packages

Launch RStudio and execute the following commands to install the necessary R packages for Bayesian analysis using rstan:

# Install RStan
install.packages("rstan", dependencies = TRUE)

# Install additional required packages
install.packages("StanHeaders")
install.packages("ggplot2")
install.packages("bayesplot")

# Optional: Install the development version of rstan if necessary
# devtools::install_github("stan-dev/rstan", ref = "develop", 
#                          subdir = "rstan/rstan")

3. Configuration for Efficient Use of rstan Package

Configure rstan to avoid recompilation of unchanged models and to set up the number of cores for parallel execution:

# Load rstan and configure
library(rstan)

# Set up options for rstan
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

4. Verify Installation

Run the following R code to verify that rstan is correctly installed and functioning:

# Example Stan model code (eight schools example)
stan_code <- "
data {
  int<lower=0> J;          // number of schools
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // standard error of effect estimates
}
parameters {
  real mu;                 // overall treatment effect
  real<lower=0> tau;       // standard deviation in treatment effects
  real eta[J];             // school-specific deviations from mu
}
transformed parameters {
  real theta[J];
  for (j in 1:J)
    theta[j] = mu + tau * eta[j];
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
"

# Prepare data for the example (eight schools data)
schools_data <- list(
  J = 8, 
  y = c(28,  8, -3,  7, -1,  1, 18, 12),
  sigma = c(15, 10, 16, 11,  9, 11, 10, 18)
)

# Fit Stan model
fit <- stan(model_code = stan_code, data = schools_data)

# Print fit summary
print(fit)

Executing the above script should provide an output summarizing the posterior distributions of the parameters in the model, confirming that your setup works correctly.

5. Writing Your Bayesian Inference Models Using rstan

With your environment correctly set up, you can now write your Bayesian inference models using rstan. Here is an example of how you would encapsulate the steps into a repeatable function:

# Define a function to run a Bayesian inference model
run_bayesian_model <- function(model_code, data) {
  library(rstan)
  rstan_options(auto_write = TRUE)
  options(mc.cores = parallel::detectCores())
  
  fit <- stan(model_code = model_code, data = data)
  return(fit)
}

# Example usage with eight schools data
# Reuse the stan_code and schools_data objects defined in step 4 above

fit <- run_bayesian_model(stan_code, schools_data)
print(fit)

Following these steps ensures that your R environment is ready for Bayesian analysis using rstan and Stan. You can now proceed with writing and analyzing your own Bayesian models.

Practical Implementation of Bayesian Inference in R

Unit 3: Basics of Bayesian Inference in R

1. Introduction to the Problem

We'll use a simple example to demonstrate Bayesian inference in R. Let's say we want to estimate the probability of success \( \theta \) in a series of Bernoulli trials (e.g., the probability of getting heads in a series of coin flips).

2. Required Packages

Ensure you have the required R packages:

install.packages("rstan")
library(rstan)

3. Define the Model

Create a model in the Stan language. Here is a simple Bernoulli-Beta model:

// bernoulli_beta.stan
data {
  int<lower=0> N;                  // Number of trials
  int<lower=0, upper=1> y[N];      // Successes (0 or 1)
}
parameters {
  real<lower=0, upper=1> theta;    // Probability of success
}
model {
  theta ~ beta(1, 1);   // Uniform prior on [0, 1]
  y ~ bernoulli(theta); // Likelihood
}

4. Prepare Data in R

Write the R code to prepare data and run the model:

# Simulated data: 10 coin flips with 7 heads
N <- 10
y <- c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0)

data <- list(N = N, y = y)

# Stan model code as a string
model_code <- "
data {
  int<lower=0> N;
  int<lower=0, upper=1> y[N];
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"

fit <- stan(model_code = model_code, data = data, iter = 2000, chains = 4)

print(fit)

5. Visualize and Inspect the Results

RStan provides tools to visualize and inspect the output:

# Extract samples
extracted <- extract(fit)

# Plot posterior distribution for theta
hist(extracted$theta, breaks=30, probability=TRUE,
     main="Posterior distribution of theta",
     xlab=expression(theta))

# Add a density estimate
lines(density(extracted$theta), col='blue')
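
Because the Beta(1, 1) prior is conjugate to the Bernoulli likelihood, the exact posterior for these data (7 successes, 3 failures) is Beta(1 + 7, 1 + 3) = Beta(8, 4). Overlaying it on the histogram is a quick sanity check on the MCMC output:

# Overlay the exact conjugate posterior, Beta(8, 4)
curve(dbeta(x, 8, 4), from = 0, to = 1, col = "red", lty = 2, add = TRUE)
legend("topleft", legend = c("MCMC density", "Exact Beta(8, 4)"),
       col = c("blue", "red"), lty = c(1, 2))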

Conclusion

This basic example introduces Bayesian inference using R and Stan. The key steps involve defining your model in Stan, preparing your data in R, running the model, and visualizing the results. This forms the foundation on which more complex models and data sets can be tackled in future units of your project.

Data Preprocessing and Cleaning in R

This section covers practical data preprocessing and cleaning techniques using R. The goal is to prepare your data for Bayesian analysis.

Load Necessary Libraries

library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

Import Data

# Replace 'file_path' with your actual dataset path
data <- read.csv('file_path.csv')

Inspect the Data

# View the first few rows of the dataset
head(data)

# Get summary statistics of the dataset
summary(data)

# Check the structure of the dataset
str(data)

Handling Missing Values

Identify Missing Values

# Identify missing values in the dataset
missing_values <- sapply(data, function(x) sum(is.na(x)))
print(missing_values)

Impute or Remove Missing Values

  • Impute with mean/median/mode for numerical variables.
  • Impute with the most frequent value for categorical variables or remove observations with large proportions of missing values.

# Example: Impute missing values with median for numeric columns
data <- data %>%
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), median(., na.rm = TRUE), .)))

# Example: Impute missing values with mode for categorical columns
mode_impute <- function(x) {
  ux <- unique(x[!is.na(x)])            # candidate values, excluding NA
  ux[which.max(tabulate(match(x, ux)))] # most frequent value
}

data <- data %>%
  mutate(across(where(is.factor), ~ replace(., is.na(.), mode_impute(.))))

Remove Duplicates

data <- data %>%
  distinct()

Handle Outliers

Identify Outliers

# Example using IQR method for numerical columns
outliers <- function(x) {
  Q1 <- quantile(x, 0.25)
  Q3 <- quantile(x, 0.75)
  IQR <- Q3 - Q1
  x < (Q1 - 1.5 * IQR) | x > (Q3 + 1.5 * IQR)
}

# Apply to each numeric column
outliers_data <- sapply(data %>% select(where(is.numeric)), outliers)
print(outliers_data)

Remove or Treat Outliers

# Replace outliers with NA and then impute, or simply remove the rows
data <- data %>%
  mutate(across(where(is.numeric), ~ ifelse(outliers(.), NA, .))) %>%
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), median(., na.rm = TRUE), .)))

Standardize / Normalize Data

Choose one of the following transformations, depending on what your model expects; you would not normally apply both in sequence.

Standardize Data

# Standardize numerical columns (as.numeric() drops the matrix attributes added by scale())
data <- data %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.))))

Normalize Data

# Normalize numerical columns
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

data <- data %>%
  mutate(across(where(is.numeric), normalize))

Convert Date Columns

Parse Dates

# Convert date columns; replace date_col with the names of your actual date columns
data <- data %>%
  mutate(across(any_of(c("date_col")), ~ parse_date_time(., orders = c('ymd', 'dmy', 'mdy'))))

Encode Categorical Variables

Use whichever encoding suits your model; one-hot encoding and label encoding are alternatives, not sequential steps.

One-hot Encoding

factor_cols <- names(data)[sapply(data, is.factor)]              # columns to encode
for (col in factor_cols) {
  dummies <- model.matrix(~ 0 + data[[col]])                     # one indicator per level
  colnames(dummies) <- paste(col, levels(data[[col]]), sep = "_")
  data <- cbind(data[setdiff(names(data), col)], dummies)
}

Label Encoding

data <- data %>%
  mutate(across(where(is.character), as.factor)) %>%
  mutate(across(where(is.factor), as.integer))

Ensure Data Consistency

Change Column Names

data <- data %>%
  rename_with(~ str_replace_all(., "[^[:alnum:]]", "_"))

Final Dataset

Check Final Dataset

# Inspect the cleaned dataframe
head(data)
summary(data)
str(data)

Now your data is cleaned and preprocessed, ready for Bayesian analysis with rstan and Stan.

Building Simple Bayesian Models in R

In this section, we will construct a simple Bayesian model using the rstan package. We will create a Bayesian linear regression model as an example.

Prerequisites

Make sure you have the rstan package installed and loaded in your R environment:

install.packages("rstan")
library(rstan)

Step 1: Prepare Data

For this example, let's assume we have a simple dataset with one predictor variable x and one response variable y.

# Simulated Data
set.seed(123)
x <- rnorm(100, mean = 5, sd = 2)
y <- 3 + 2 * x + rnorm(100)

data <- list(
  N = length(y),
  x = x,
  y = y
)

Step 2: Define the Model in Stan

Save the following Stan model to a file named simple_linear_regression.stan:

data {
  int<lower=0> N;      // number of observations
  vector[N] x;         // predictor variable
  vector[N] y;         // response variable
}

parameters {
  real alpha;          // intercept
  real beta;           // slope
  real<lower=0> sigma; // standard deviation of errors
}

model {
  y ~ normal(alpha + beta * x, sigma); // likelihood (flat priors on alpha, beta, sigma)
}

Step 3: Compile the Model

model_code <- "
data {
  int N; // number of observations
  vector[N] x;    // predictor variable
  vector[N] y;    // response variable
}

parameters {
  real alpha;     // intercept
  real beta;      // slope
  real sigma; // standard deviation of errors
}

model {
  y ~ normal(alpha + beta * x, sigma); // likelihood
}
"

stan_model <- stan_model(model_code = model_code)

Step 4: Fit the Model

fit <- sampling(
  object = compiled_model,     # compiled model from Step 3
  data = data,
  iter = 2000,
  chains = 4,
  seed = 123
)

Step 5: Inspect Results

print(fit)
plot(fit)

Step 6: Extract Posterior Samples

posterior_samples <- extract(fit)
alpha_samples <- posterior_samples$alpha
beta_samples <- posterior_samples$beta
sigma_samples <- posterior_samples$sigma

summary(alpha_samples)
summary(beta_samples)
summary(sigma_samples)

Step 7: Visualizing Posterior Distributions

hist(alpha_samples, main = "Posterior Distribution of Alpha", xlab = "Alpha")
hist(beta_samples, main = "Posterior Distribution of Beta", xlab = "Beta")
hist(sigma_samples, main = "Posterior Distribution of Sigma", xlab = "Sigma")
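
Posterior quantiles give credible intervals directly; for example, 95% intervals for each parameter:

# 95% credible intervals from the posterior draws
quantile(alpha_samples, probs = c(0.025, 0.975))
quantile(beta_samples,  probs = c(0.025, 0.975))
quantile(sigma_samples, probs = c(0.025, 0.975))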

Conclusion

You now have a simple Bayesian linear regression model built using R and rstan. This model can be extended to more complex cases by modifying the Stan code and the R implementation to handle additional variables and parameters.

Advanced Bayesian Modeling Techniques

This section covers the implementation of advanced Bayesian models using rstan and Stan in R for real-world applications.

Hierarchical Bayesian Model

A hierarchical Bayesian model allows us to handle data that has a nested structure. Here is a practical implementation using rstan and Stan.

Step 1: Install and Load Libraries

Ensure you have the necessary packages:

install.packages("rstan")
library(rstan)

Step 2: Define the Stan Model

Create a file named hierarchical_model.stan and define your model.

data {
  int<lower=1> N;                 // number of data points
  int<lower=1> J;                 // number of groups
  int<lower=1, upper=J> group[N]; // group indicator
  vector[N] y;                    // observations
}

parameters {
  real mu;                        // overall mean
  real<lower=0> tau;              // between-group standard deviation
  vector[J] alpha;                // group mean offsets
  real<lower=0> sigma;            // within-group standard deviation
}

model {
  tau ~ normal(0, 5);
  alpha ~ normal(0, tau);
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 2.5);

  for (n in 1:N)
    y[n] ~ normal(mu + alpha[group[n]], sigma);
}

Step 3: Prepare Your Data in R

Suppose you have a dataset data with columns y for observations and group for group indicators.

data <- read.csv("path_to_your_data.csv")
stan_data <- list(
  N = nrow(data),
  J = length(unique(data$group)),
  group = as.integer(as.factor(data$group)),  # map group labels to 1..J
  y = data$y
)
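
If you do not have a suitable dataset at hand, a small simulated stand-in (hypothetical values) lets you run the rest of the example end to end; it replaces the read.csv() step above:

# Hypothetical simulated data: 5 groups of 20 observations each
set.seed(42)
J <- 5
group <- rep(1:J, each = 20)
group_offsets <- rnorm(J, mean = 0, sd = 2)
y <- rnorm(length(group), mean = 10 + group_offsets[group], sd = 1.5)

data <- data.frame(y = y, group = group)
stan_data <- list(N = nrow(data), J = J, group = data$group, y = data$y)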

Step 4: Fit the Model

Compile and fit the model using rstan.

compiled_model <- stan_model("hierarchical_model.stan")

fit <- sampling(compiled_model, data = stan_data, iter = 2000, chains = 4)
print(fit)

Step 5: Evaluate and Visualize the Results

Extract the results and visualize convergences and estimates.

library(ggplot2)

# Extract posterior samples
posterior <- extract(fit)

# Compute group means (overall mean plus each group's offset)
alpha_means <- colMeans(posterior$alpha)
data$group_means <- mean(posterior$mu) + alpha_means[stan_data$group]

ggplot(data, aes(x = factor(group), y = y)) +
  geom_point() +
  geom_point(aes(y = group_means), color = "red", size = 3) +
  labs(title = "Group Observations and Group Means",
       x = "Group",
       y = "Observations") +
  theme_minimal()

By following these steps, you can implement an advanced hierarchical Bayesian model using rstan and Stan in R. This example can be adapted and extended based on the specific requirements of your real-world scenarios.

Model Diagnostics and Validation

In this section, we will focus on practical techniques for diagnosing and validating Bayesian models implemented using the rstan package in R. Proper diagnostics and validation are critical to ensuring that our models fit the data well and that the inferences we draw are reliable.

Step 1: Load Necessary Packages

# Load necessary libraries
library(rstan)
library(bayesplot)
library(loo)
library(posterior)

Step 2: Fit a Bayesian Model

Assuming that you have already prepared your data and defined your Stan model, you would typically fit your model as follows:

fit <- stan(file = 'model.stan', data = data, iter = 2000, warmup = 500, chains = 4, seed = 123)

Step 3: Model Diagnostics

3.1 Trace Plots

Trace plots help you to visualize the Markov chains' behavior. Ideally, the chains should mix well and converge, appearing as a "hairy caterpillar."

traceplot(fit)

3.2 R-hat Statistics

The potential scale reduction factor (R-hat) indicates how well the chains have converged. R-hat values close to 1 indicate convergence.

print(fit, pars = c("variable_name"), probs = c(0.025, 0.5, 0.975))

3.3 Effective Sample Size (ESS)

ESS measures the number of independent draws from the posterior distribution.

summary(fit)$summary[,"n_eff"]
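
rstan also ships a helper that summarizes sampler warnings (divergent transitions, maximum treedepth hits, low E-BFMI); checking it is a quick complement to R-hat and ESS:

# Summarize HMC-specific sampler diagnostics
check_hmc_diagnostics(fit)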

Step 4: Posterior Predictive Checks

Posterior predictive checks allow us to assess how well our model replicates the observed data.

# Extract posterior samples
posterior_samples <- extract(fit)

# posterior_predict() belongs to rstanarm/brms; with a plain stanfit object,
# take replicated data (y_rep) from a generated quantities block in your Stan program
y_rep <- posterior_samples$y_rep

# Plot posterior predictive checks (use a subset of draws for readability)
ppc_dens_overlay(data$y, y_rep[1:100, ])
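
If you would rather not modify the Stan program, you can simulate replicated data directly in R from the posterior draws. A minimal sketch, assuming the simple normal model from the first unit (parameters mu and sigma, observed vector data$y):

# Simulate one replicated dataset per posterior draw (normal model assumed)
n_draws <- length(posterior_samples$mu)
n_obs   <- length(data$y)
y_rep <- matrix(rnorm(n_draws * n_obs,
                      mean = posterior_samples$mu,
                      sd   = posterior_samples$sigma),
                nrow = n_draws)

ppc_dens_overlay(data$y, y_rep[1:100, ])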

Step 5: Model Comparison with LOO

We can use the loo package to perform approximate leave-one-out cross-validation (PSIS-LOO) and compare models. Note that loo() for a stanfit object expects the pointwise log-likelihood to be available, conventionally computed as log_lik in the model's generated quantities block.

# Compute LOO
loo_fit <- loo(fit)

# Print LOO results
print(loo_fit)
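
For a hand-coded model, a common alternative (assuming the Stan program saves a pointwise log_lik vector in its generated quantities block) is to extract the log-likelihood explicitly and pass it to loo():

# Extract pointwise log-likelihood and compute PSIS-LOO with relative efficiencies
log_lik <- extract_log_lik(fit, parameter_name = "log_lik", merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik))
loo_fit <- loo(log_lik, r_eff = r_eff)
print(loo_fit)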

If comparing multiple models:

# Assuming we have another model fit
loo_fit2 <- loo(fit2)

# Compare models
loo_compare(loo_fit, loo_fit2)

Step 6: Summary of Posterior Distributions

Finally, summarizing the posterior distributions of the parameters gives insight into the estimated values and their uncertainties.

print(fit, probs = c(0.025, 0.5, 0.975))
# Visualize posterior distributions
mcmc_areas(as.array(fit), pars = c("parameter1", "parameter2", "parameter3"), prob = 0.8)

Conclusion

These steps provide a practical implementation for diagnosing and validating Bayesian models using rstan in R. Following these practices ensures that your model is robust, reliable, and provides meaningful inferences from your data.

Interpreting and Reporting Results

Interpreting Bayesian Results

Once you've run your Bayesian models using rstan or Stan, the next step is to interpret the results effectively. This includes understanding the posterior distributions, credible intervals, and key statistics. Below is a practical guide to interpreting and reporting Bayesian results:

Key Concepts

  1. Posterior Distribution: Reflects our updated beliefs about the parameters after observing the data.
  2. Credible Interval: The range within which a parameter lies with a certain probability (e.g., 95% credible interval).
  3. Convergence Diagnostics: To check if the Markov Chain Monte Carlo (MCMC) procedure has properly converged to the posterior distribution.

Practical Steps to Interpret Results

First, assuming you have fitted a model using rstan, you can extract the results using the summary function.

library(rstan)

# Example: Extracting summary from the fitted Stan model
# fit <- ...   # your stanfit object, as returned by stan() or sampling()
fit_summary <- summary(fit, probs = c(0.025, 0.5, 0.975))$summary
print(fit_summary)

Diagnostic Checks

You should always check the convergence diagnostics like R-hat values and effective sample size (ESS).

print(fit_summary[,"Rhat"])
print(fit_summary[,"n_eff"])
  • R-hat: Values close to 1.0 indicate convergence.
  • Effective Sample Size (ESS): Higher values are better, indicating more reliable estimates.

Posterior Plots

Visualizing the posterior distribution helps in understanding the parameter estimates better.

# Plot the posterior distribution of a parameter
# (replace your_parameter with a parameter name from your model)
library(bayesplot)

mcmc_areas(as.matrix(fit), pars = "your_parameter", prob = 0.95)

This plots the posterior density with a 95% credible interval.

Reporting Results

Summary Table

Create a summary table including key statistics like mean, median, and credible intervals.

results <- data.frame(
  Parameter = rownames(fit_summary),
  Mean = fit_summary[,"mean"],
  SD = fit_summary[,"sd"],
  `2.5%` = fit_summary[,"2.5%"],
  `97.5%` = fit_summary[,"97.5%"],
  check.names = FALSE  # keep the percentile column names as-is
)

print(results)

Interpretation

Here's how you might interpret and report these results in text:

"We estimated the posterior distribution for the parameter extremely_important_parameter. The posterior mean was approximately x with a 95% credible interval ranging from y to z. The convergence diagnostics (R-hat < 1.01) indicate that the model has adequately converged."

Visualization

Include visualizations like trace plots, posterior density plots, and more to support your textual interpretations.

# Trace plot for checking convergence
mcmc_trace(as.array(fit), pars = "your_parameter")

# Density plot
mcmc_dens(as.matrix(fit), pars = "your_parameter")

Convergence diagnostics

library(shinystan)

# Launch ShinyStan for interactive exploration
launch_shinystan(fit)

ShinyStan provides a convenient way to interactively explore model diagnostics and results.

Conclusion

By following these steps, you can effectively interpret and report the results from your Bayesian inference models in R using rstan or Stan. This ensures that your analysis is both accurate and comprehensible to a broader audience.

Remember to include key statistics, diagnostic checks, and visual aids to communicate your findings effectively.

Part 9: Real-World Applications of Bayesian Inference

Practical Example: Bayesian Linear Regression

In this section, we will demonstrate how to apply Bayesian inference to a real-world problem using R. We will use the rstan package to perform Bayesian linear regression. Specifically, we'll predict house prices based on certain features like square footage and the number of bedrooms.

Step 1: Load the Required Libraries

# Load required libraries
library(rstan)
library(ggplot2)
library(dplyr)

Step 2: Import and Preprocess the Data

For this example, we'll use a sample dataset housing_data which contains house prices and features.

# Sample dataset
housing_data <- data.frame(
  square_footage = c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700),
  bedrooms = c(3, 3, 3, 4, 2, 3, 4, 4, 3, 3),
  price = c(245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000, 319000, 255000)
)

# Standardize the data (as.numeric() keeps plain numeric columns for Stan)
housing_data <- housing_data %>%
  mutate(
    scale_sqrt_ft = as.numeric(scale(square_footage)),
    scale_bed = as.numeric(scale(bedrooms)),
    scale_price = as.numeric(scale(price))
  )

Step 3: Define the Stan Model

Now, we'll specify the Stan model for Bayesian linear regression.

stan_model_code <- "
data {
  int N;
  vector[N] scale_sqrt_ft;
  vector[N] scale_bed;
  vector[N] scale_price;
}
parameters {
  real alpha;
  real beta_sqrt_ft;
  real beta_bed;
  real<lower=0> sigma;
}
model {
  scale_price ~ normal(alpha + beta_sqrt_ft * scale_sqrt_ft + beta_bed * scale_bed, sigma);
}
"

# Compile the Stan model
compiled_model <- stan_model(model_code = stan_model_code)

Step 4: Fit the Model Using rstan

# Prepare data for Stan model
stan_data <- list(
  N = nrow(housing_data),
  scale_sqrt_ft = housing_data$scale_sqrt_ft,
  scale_bed = housing_data$scale_bed,
  scale_price = housing_data$scale_price
)

fit <- sampling(
  compiled_model,
  data = stan_data,
  iter = 2000,
  chains = 4
)

# Print the results
print(fit)

Step 5: Posterior Predictive Checks

We'll conduct posterior predictive checks to ensure the quality of our model.

# Extract posterior samples
posterior_samples <- extract(fit)

# Posterior Prediction
posterior_predict <- function(x) {
  alpha <- posterior_samples$alpha
  beta_sqrt_ft <- posterior_samples$beta_sqrt_ft
  beta_bed <- posterior_samples$beta_bed
  sigma <- posterior_samples$sigma
  alpha + beta_sqrt_ft * x[1] + beta_bed * x[2] + rnorm(length(alpha), 0, sigma)
}

# Example prediction for a new data point
# (reuse the already standardized values of the first row)
new_data <- c(housing_data$scale_sqrt_ft[1],
              housing_data$scale_bed[1])
predictions <- posterior_predict(new_data)

# Plot predictions
ggplot(data = housing_data, aes(x = scale_sqrt_ft, y = scale_price)) +
  geom_point(color = "blue") +
  geom_abline(intercept = median(posterior_samples$alpha), 
              slope = median(posterior_samples$beta_sqrt_ft),
              color = "red")
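
The vector predictions holds one draw per posterior sample, so its quantiles give a posterior predictive interval for the new data point (on the standardized price scale):

# Posterior predictive median and 95% interval for the new data point
quantile(predictions, probs = c(0.025, 0.5, 0.975))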

This implementation provides a comprehensive, practical introduction to Bayesian inference for real-world scenarios using R. By utilizing the rstan package, learners can fit Bayesian models to their data and make informed predictions and analyses.

Project Implementation and Case Studies

Objective

The objective of this section is to apply the concepts learned in previous units. You will implement Bayesian inference on real-world datasets using R, focusing on practical implementation with code and case studies.

Structure

This section consists of two main parts:

  1. Project Implementation: Step-by-step implementation of Bayesian analysis using R.
  2. Case Studies: Real-world examples where Bayesian inference can be applied.

Part 1: Project Implementation

1. Defining the Problem

Let's consider a case where we want to predict the sales of a retail store based on advertising budget. We'll use a simple linear regression model with Bayesian inference.

2. Data Collection

For this example, we'll use a hypothetical dataset. You can replace this with a real dataset in practice.

# Load required libraries
library(rstan)
library(ggplot2)

# Hypothetical dataset
data <- data.frame(
  Advertising = c(50, 100, 150, 200, 250, 300, 350, 400, 450, 500),
  Sales = c(20, 25, 30, 34, 40, 41, 42, 44, 45, 48)
)

3. Preparing the Data for Stan

Stan requires the data to be in a list format.

# Data preparation
stan_data <- list(
  N = nrow(data),
  x = data$Advertising,
  y = data$Sales
)

4. Writing the Stan Model

We'll write a simple linear regression model in Stan.

# Stan model code
stan_code <- "
data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
"

5. Fitting the Model with Stan

Compile and fit the model using the stan function.

# Compile and fit the model
fit <- stan(
  model_code = stan_code,
  data = stan_data,
  iter = 2000,
  chains = 4
)

6. Analyzing the Results

Extract the results and visualize them.

# Extract results
print(fit)

# Plot the results
posterior <- extract(fit)
alpha_post <- posterior$alpha
beta_post <- posterior$beta

# Predictive plot: predicted_sales has one row per posterior draw and one column per observation
predicted_sales <- sapply(data$Advertising, function(x) alpha_post + beta_post * x)

data$Predicted <- colMeans(predicted_sales)

ggplot(data, aes(x = Advertising, y = Sales)) +
  geom_point() +
  geom_line(aes(y = Predicted), color = 'blue') +
  labs(title = "Advertising Budget vs Sales", 
       x = "Advertising Budget (in 1000s)",
       y = "Sales")
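
Because predicted_sales above contains one row per posterior draw, you can also summarize the uncertainty around the fitted line; a sketch adding a 95% credible band for the mean prediction:

# 95% credible interval for the mean prediction at each advertising level
data$Lower <- apply(predicted_sales, 2, quantile, probs = 0.025)
data$Upper <- apply(predicted_sales, 2, quantile, probs = 0.975)

ggplot(data, aes(x = Advertising, y = Sales)) +
  geom_ribbon(aes(ymin = Lower, ymax = Upper), fill = "grey80") +
  geom_line(aes(y = Predicted), color = "blue") +
  geom_point() +
  labs(title = "Posterior Mean Prediction with 95% Credible Band",
       x = "Advertising Budget (in 1000s)",
       y = "Sales")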

Part 2: Case Studies

Case Study 1: Predicting Housing Prices

Use Bayesian inference to predict housing prices based on features like the number of bedrooms, square footage, etc.

Case Study 2: A/B Testing for Website Conversion Rates

Implement a Bayesian A/B test to determine the effectiveness of two different website designs on conversion rates.

Case Study 3: Demand Forecasting in Supply Chain Management

Apply Bayesian techniques to forecast demand in a supply chain context, taking into account previous demand data and external factors.


Conclusion

By now, you should be able to implement Bayesian inference in R for various real-world problems. These case studies provide a practical approach to using Bayesian analysis. Feel free to expand on them or create your own case studies to gain deeper insights.