Project

Deep Dive in Geospatial Analysis: A Project for Data Analysts

An in-depth, systematic learning project designed to master various techniques of geospatial analysis for data analysts.

Description

The project delves into a wide range of techniques and variations used in geospatial analysis. The aim is to provide a comprehensive understanding of geospatial data manipulation, spatial analysis, mapping and visualization, model development, and problem-solving. Through a series of 12 independent curriculum units, learners will gain the ability to conduct sophisticated spatial analysis, enriching their data analyst skills.

Introduction to Geospatial Analysis

This implementation covers the first part of the project: getting acquainted with geospatial analysis. We will introduce geospatial data handling in Python, leveraging geopandas to read and manipulate geospatial data and matplotlib for visualization.

We assume that Python 3 is already installed.

Setup environment and libraries

Before we start our analysis, we need to set up the environment:

We are going to use Jupyter notebook for this analysis.

To install Jupyter notebook:

pip install jupyter

You can run the notebook by typing:

jupyter notebook

The next step is to install the required Python libraries. We will be using geopandas, matplotlib, and shapely.

These can be installed using pip:

pip install geopandas matplotlib shapely

If you encounter an error while installing geopandas, consider installing the gdal library first, as it is one of geopandas' dependencies:

pip install gdal
pip install geopandas matplotlib shapely

Fetch and load the data

First, import the libraries and load your data file. In this example, we will use a shapefile that contains polygons for different regions.

import geopandas as gpd
import matplotlib.pyplot as plt

# Load the data
gdf = gpd.read_file('regions.shp')

Visualize the data

Next, we will visualize the data:

gdf.plot()
plt.show()

Basic exploration

You can explore your GeoDataFrame. For example, you can list the first few rows with:

gdf.head()

or get shape information:

gdf.shape

These are some basic steps to get started with geospatial analysis in Python. To perform more complex operations such as calculating areas, distances, or intersections, you will need to understand the geometry in your data. One of the most common libraries for working with geometries is Shapely.
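
As a quick taste, here is a minimal sketch of Shapely-backed operations on the loaded GeoDataFrame (assuming it contains at least two polygon geometries; all units depend on the data's CRS):

# Each entry in gdf.geometry is a shapely geometry object
geom = gdf.geometry.iloc[0]
print(geom.area)       # polygon area, in CRS units
print(geom.centroid)   # centroid as a shapely Point
print(geom.distance(gdf.geometry.iloc[1]))  # distance to another region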

With the provided implementation, you are now set up to delve deeper into geospatial analysis.

References

  1. Geopandas documentation: https://geopandas.org/
  2. Matplotlib documentation: https://matplotlib.org/
  3. Shapely documentation: https://shapely.readthedocs.io/en/stable/

Spatial Data Manipulation Techniques

In Part 1, we learned the basics of geospatial analysis. Now let's dive into spatial data manipulation. We'll use Python to manipulate geographical and spatial data, mainly with the GeoPandas, Shapely, and pyproj libraries.

Before we start, please make sure to have your environment ready for running GeoPandas and related libraries.

Importing Libraries

As a first step, let's import the necessary libraries:

import geopandas as gpd
from shapely.geometry import Point, LineString, Polygon
import pyproj

Loading Spatial Data

Next, we load our spatial data (stored in a GeoJSON file) using GeoPandas, which gives us a GeoDataFrame:

gdf = gpd.read_file('path_to_your_geojson_file.geojson')

You can inspect the first few rows of your GeoDataFrame using gdf.head().

Manipulation Techniques

1. Projection

To project our data to another CRS (Coordinate Reference System), we can use the to_crs method. For example, to project to World Mercator (EPSG:3395):

gdf_projected = gdf.to_crs(epsg=3395)
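
You can verify the reprojection by inspecting the CRS before and after (a quick check; the exact output depends on your data):

print(gdf.crs)            # CRS of the source data
print(gdf_projected.crs)  # should now report EPSG:3395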

2. Geometric Transformations

With Shapely, we can perform various geometric transformations:

2.1 Translation

To translate (move) a Point, we don't need a special function; we can simply add to or subtract from its coordinates. Let's move a point 10 units to the right and 5 units up:

p = Point(0, 0)
p_translated = Point(p.x + 10, p.y + 5)

For a GeoDataFrame, we can use the translate method:

gdf_translated = gdf.translate(xoff=10.0, yoff=5.0)

2.2 Rotation

To rotate a shape around a point, we can use the rotate function from shapely.affinity. For example, to rotate a point around the origin (0, 0) by 45 degrees:

import shapely.affinity

p = Point(1, 0)
p_rotated = shapely.affinity.rotate(p, 45)

For a GeoDataFrame, we can use the rotate method:

gdf_rotated = gdf.rotate(45, origin='center') 

2.3 Scaling

To scale a shape, we can use the scale function from shapely.affinity. To scale a point:

p = Point(1, 1)
p_scaled = shapely.affinity.scale(p, xfact=2.0, yfact=2.0, origin=(0,0))

For a GeoDataFrame:

gdf_scaled = gdf.scale(xfact=2.0, yfact=2.0, origin='center') 

3. Geometric Operations

GeoPandas provides several methods to perform geometric operations on geometries, such as union, intersection, and difference.

3.1 Union

Let's create two circles and perform a Union operation:

c1 = Point(0,0).buffer(10)
c2 = Point(10,0).buffer(10)
union = c1.union(c2)

3.2 Intersection

We can find the intersection of two geometries as follows:

intersect = c1.intersection(c2)

3.3 Difference

The difference operation can be performed as follows:

diff = c1.difference(c2)

For GeoDataFrames, the equivalent set operations are available through gpd.overlay: gpd.overlay(gdf1, gdf2, how='union') for union, how='intersection' for intersection, and how='difference' for difference.
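
A minimal, self-contained sketch of overlay, reusing the two buffered circles from above as single-row GeoDataFrames (hypothetical data, no CRS set):

import geopandas as gpd

# Wrap the circles in one-row GeoDataFrames
gdf1 = gpd.GeoDataFrame(geometry=[c1])
gdf2 = gpd.GeoDataFrame(geometry=[c2])

union_gdf = gpd.overlay(gdf1, gdf2, how='union')
intersection_gdf = gpd.overlay(gdf1, gdf2, how='intersection')
difference_gdf = gpd.overlay(gdf1, gdf2, how='difference')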

This brief overview of spatial data manipulation techniques demonstrates how to manipulate the geometry of spatial data. Replace 'path_to_your_geojson_file.geojson' with your actual file path, and adjust the parameters to suit your data and requirements.

Task: Mapping & Visualization in Geospatial Analysis

Our task is to visualize geographical data for better understanding and insights. Here, I'll demonstrate how to plot geographical data in Python using the Geopandas and Matplotlib libraries.

1. Import Necessary Libraries

import geopandas as gpd
import matplotlib.pyplot as plt

2. Load the GeoDataFrame

Assuming you have a shapefile of a world map (let's call it 'world.shp'):

world = gpd.read_file('world.shp')

3. Plot the GeoDataFrame

world.plot()
plt.show()

This will plot a plain world map.

4. Customize the GeoDataFrame Plot

Plotting with feature highlights: coloring the map based on a particular feature (column). Assuming a 'POPULATION' column is available, each country can be colored by its population.

world.plot(column='POPULATION', cmap='coolwarm')
plt.show()

You can explore the different color maps provided by Matplotlib in its colormap documentation.

5. Overlaying Features on Maps

Assuming you have points of interest that you'd like to overlay on your map, those points should also be in a GeoDataFrame.

# Overlay POI over the world map.
fig, ax = plt.subplots()
world.plot(ax=ax, color='white', edgecolor='black')
# Assuming pois is your points GeoDataFrame
pois.plot(ax=ax, markersize=20, color='red', marker='o')
plt.show()

6. Interactive Visualization using Folium

Folium is a powerful Python library for creating several types of Leaflet maps.

import folium

# Initialize a Folium map with a start location
m = folium.Map(location=[40.693943, -73.985880])

# Add a marker
folium.Marker(
    location=[40.693943, -73.985880],
    popup='NYC',
    icon=folium.Icon(icon='cloud')).add_to(m)

# Add a circle
folium.Circle(
    radius=100,
    location=[40.6924, -73.9866],
    popup='NYC',
    color='crimson',
    fill=False).add_to(m)

m

This will generate an interactive map centered on the coordinates provided, showing a marker and a circle.

If you want to plot heat maps or choropleth maps, you can use folium's built-in heat map and choropleth utilities (a heat map sketch follows). Make sure your data matches the format folium expects.
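
For example, a minimal heat map sketch using folium's HeatMap plugin (the points below are made-up coordinates around the same NYC location):

from folium.plugins import HeatMap

# Hypothetical [lat, lon] points to visualize as a heat map
heat_points = [[40.6939, -73.9859], [40.6950, -73.9840], [40.6920, -73.9870]]

m_heat = folium.Map(location=[40.6939, -73.9859], zoom_start=14)
HeatMap(heat_points).add_to(m_heat)
m_heat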

This is a basic example; you can play around with parameters such as color and markersize based on your requirements.

Implementation of Statistics and Machine Learning in Geospatial Analysis

After having an understanding of Geospatial Analysis, Spatial Data Manipulation Techniques, and Mapping & Visualization in Geospatial Analysis, we move forward to apply the principles of statistics and machine learning in geospatial analysis. Here, we will focus on Python libraries like Geopandas, Scikit-learn, and Statsmodels.

In this implementation, we'll use a simple case study, predicting house prices based on locational and non-locational features. This is a common real-world scenario where geospatial data is critical.

Step 1: Load and Preprocess Data

The hypothesis of this case study is that the price of a house may depend on its location as well as on features such as floor space or number of rooms.

We'll start by loading the modules and the data, and preparing the data for analysis. We will use 'houses.csv', which contains locational data (latitude, longitude) and non-locational data (rooms, area).

import geopandas as gpd
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor

# Load the data
data = pd.read_csv('houses.csv')

# Drop houses with missing values
data = data.dropna()

# Turn data into a GeoDataFrame
gdf = gpd.GeoDataFrame(data, geometry=gpd.points_from_xy(data.Longitude, data.Latitude))

# Remove outliers using the interquartile range (numeric columns only,
# since quantiles are not defined for the geometry column)
numeric = gdf.select_dtypes(include=[np.number])
Q1 = numeric.quantile(0.25)
Q3 = numeric.quantile(0.75)
IQR = Q3 - Q1

gdf = gdf[~((numeric < (Q1 - 1.5 * IQR)) | (numeric > (Q3 + 1.5 * IQR))).any(axis=1)]

The data is now loaded as a GeoDataFrame, and outliers have been removed using the interquartile range.

Step 2: Data Exploration

Here we can employ basic statistical methods to gain insights into our data. For instance, we will check the correlation between the numeric variables.

# Get correlations between numeric columns (geometry is excluded)
gdf.select_dtypes(include=[np.number]).corr()

Step 3: Machine Learning

Once we have cleaned and explored our data, the next stage is to build a predictive model.

Training and Testing datasets

First, we need to split our data into training and testing datasets:

# Split the data into train and test datasets
X_train, X_test, Y_train, Y_test = train_test_split(gdf[['Longitude', 'Latitude', 'rooms', 'area']], \
                                                    gdf['price'], test_size=0.2, random_state=42)

Linear Regression Model

We start by fitting a simple linear regression model to our data. In Python, this can be done using the LinearRegression object from Scikit-learn.

# Create a linear regression model
model_LR = LinearRegression()

# Fit the model to the train data
model_LR.fit(X_train, Y_train)

# Predict prices for the test data
Y_pred = model_LR.predict(X_test)

# Calculate the mean squared error of the predictions
mse = mean_squared_error(Y_test, Y_pred)

Random Forest Regression Model

A more complex model we can use is the Random Forest.

# Create a Random Forest regression model
model_RF = RandomForestRegressor(random_state=42)

# Fit the model to the train data
model_RF.fit(X_train, Y_train)

# Predict prices for the test data
Y_pred_RF = model_RF.predict(X_test)

# Calculate the mean squared error of the predictions
mse_RF = mean_squared_error(Y_test, Y_pred_RF)
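
Since cross_val_score was imported above, we can also estimate out-of-sample error with k-fold cross-validation instead of relying on a single train/test split (a sketch on the same features):

# 5-fold cross-validation; scikit-learn reports negated MSE for this scoring
X = gdf[['Longitude', 'Latitude', 'rooms', 'area']]
y = gdf['price']
scores = cross_val_score(RandomForestRegressor(random_state=42), X, y,
                         cv=5, scoring='neg_mean_squared_error')
print('Cross-validated MSE:', -scores.mean())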

Conclusion:

In this implementation, we have performed Geospatial Analysis using Statistics and Machine Learning. We used a dataset containing house prices along with locational and non-locational features, cleaned and preprocessed it, performed basic statistical analyses, and finally, built and evaluated Linear Regression and Random Forest models to predict house prices. The concepts shown here are fundamental to Geospatial Analysis involving Statistical and Machine Learning techniques.

Note: The given code serves as a fundamental guideline. The models might need better preprocessing, feature engineering, model tuning, etc. for better predictions in real scenarios. Moreover, the performance of ML models like the ones used in this example (Linear Regression, Random Forest) depends greatly on the nature, quality, and specifics of the data used.

Analysis of Geospatial Raster Data

Geospatial raster data represents a matrix of cells (or pixels) organized into rows and columns (a grid), each cell holding a specific value.

In this practical implementation, we will load a raster data file, perform an exploratory analysis, extract values from the raster data for locations of interest, perform calculations such as computing the min, max and mean value of the raster, and finally visualize the raster data.

1. Loading the Raster Data

We will use the raster package in R to load the raster data.

# Load the required library
library(raster)

# Load a raster data file
r <- raster("path-to-your-raster-file")

2. Exploratory Analysis

Before performing any analysis, it's always beneficial to explore the data first.

# Get basic information about the raster
print(r)

# Summarize cell values (the summary includes the NA count)
summary(r)

3. Extracting Values from the Raster

The extract() function can be used to extract values from the raster.

# Extract values from raster for locations
locs <- cbind(lon=-79.417, lat=43.7)  # Coordinates of Toronto (example)
vals <- extract(r, locs)

4. Calculations

Calculations can be performed on the raster data directly.

# Calculate minimum, maximum, and mean of the raster values
min_val <- cellStats(r, stat = "min")
max_val <- cellStats(r, stat = "max")
mean_val <- cellStats(r, stat = "mean")

5. Visualizing Raster Data

Visualizing the raster data can be done with the spplot function.

# Visualize the raster data
spplot(r, main="Raster Plot")

This concludes the analysis of geospatial raster data. With this small R program, you can load, explore, perform calculations on, and visualize your own geospatial raster data.

Remember to replace "path-to-your-raster-file" with the path to the raster file you want to analyze and replace the coordinates in locs with the longitude and latitude of the location of interest.
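
If you prefer to stay in Python, here is a roughly equivalent sketch using rasterio (assuming a single-band raster; replace the path as above):

import rasterio

with rasterio.open('path-to-your-raster-file') as src:
    print(src.meta)                          # basic information about the raster
    band = src.read(1, masked=True)          # first band, with nodata masked out

    # Sample the raster at a (lon, lat) location -- Toronto, as in the R example
    vals = list(src.sample([(-79.417, 43.7)]))

print(band.min(), band.max(), band.mean())   # min, max and mean cell values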

Analysis of Geospatial Vector Data using Python

In this section, we will analyze geospatial vector data with Python. Geospatial vector data consists of geometric objects such as points, lines, and polygons, which can represent a wide variety of real-world things: the locations of cities, roads, the areas of states, and so on.

We'll be using Python libraries including pandas for data manipulation, geopandas for handling geospatial data, shapely for geometric operations, and matplotlib for plotting.

Reading the Geospatial data

Our first step is to import the necessary libraries and read our GeoJSON file using geopandas.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt

gdf = gpd.read_file("geo_data.json")

Filtering Data Based on Geospatial Characteristics

Assume we want to filter out certain areas; suppose we only want to work with areas whose AREA value is greater than 0.5.

gdf_filtered = gdf[gdf['AREA'] > 0.5]

Calculating Area and Length

We can calculate the area and length (perimeter) of the polygons as follows:

gdf['Area'] = gdf.geometry.area
gdf['Length'] = gdf.geometry.length
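
Note that area and length are expressed in the units of the layer's CRS, so for data in geographic coordinates (degrees) the numbers are not meaningful distances. A sketch of measuring after reprojection (EPSG:3857 is just an illustrative choice; pick a projected CRS suited to your region):

# Reproject to a projected CRS so that area/length come out in metres
gdf_m = gdf.to_crs(epsg=3857)
gdf['Area_m2'] = gdf_m.geometry.area
gdf['Length_m'] = gdf_m.geometry.length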

Geospatial Join

We can use a spatial join to find all points that lie within a polygon. It is a common GIS operation that uses the spatial relationship between layers.

Assuming we have another GeoDataFrame of points, we can carry out a spatial join to find all the points that lie within a polygon.

# Create a GeoDataFrame of points (using the polygons' CRS so the join is valid)
points_dict = {"Longitude": [30, 50, 70, 90], "Latitude": [15, 45, 30, 10]}
df_points = pd.DataFrame(points_dict)
df_points['Coordinates'] = list(zip(df_points.Longitude, df_points.Latitude))
df_points['Coordinates'] = df_points['Coordinates'].apply(Point)
gdf_points = gpd.GeoDataFrame(df_points, geometry='Coordinates', crs=gdf.crs)

# Perform spatial join (keep points that fall within a polygon)
joined = gpd.sjoin(gdf_points, gdf, predicate="within")

Merging Geospatial and Non-Geospatial Data

If there is a separate DataFrame that shares identifiers with the GeoDataFrame, we can merge the two into a single GeoDataFrame.

# Assuming the other DataFrame is 'data'
merged = gdf.merge(data, on='id', how='left')

Plotting

To represent the data graphically, the GeoDataFrame can be plotted using matplotlib.

fig, ax = plt.subplots(1, 1)
# Note: scheme='quantiles' requires the optional mapclassify package
gdf.plot(column='Population', scheme='quantiles', legend=True, ax=ax)
ax.set_axis_off()
plt.show()

This is a very basic level of analysis of geospatial vector data; the data processing itself depends mainly on the properties and complexity of the data. Python's geospatial libraries provide a wide range of functionality for more advanced use cases.

Geospatial Model Development Techniques

This section will cover the practical implementation of geospatial model development techniques, specifically focusing on the use of geostatistical models and spatial regression. We will use Python's libraries: GeoPandas, rasterio, scikit-learn, and PySAL for our implementation.

Import Libraries and Load Data

import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import rasterio
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import r2_score
import pysal


# load your spatial data (substitute your_file_location by the appropriate path)
gdf = gpd.read_file("your_file_location")

Geostatistics

Geostatistics uses statistical models that account for spatial correlation. Common geostatistical methods include variogram modeling and kriging.

Variogram Modeling

A variogram measures the spatial dependence of a dataset. Here we will use PySAL to characterize spatial dependence through closely related tools: spatial lags and local Moran's I.

import pysal.lib
from pysal.model import spreg
from pysal.explore import esda

# Build Queen contiguity weights once and reuse them
w_q = pysal.lib.weights.Queen.from_dataframe(gdf)

# Create a spatially lagged version of the measure
gdf['measure_lag'] = pysal.lib.weights.lag_spatial(w_q, gdf['measure'])

# Calculate local Moran's I
gdf['moran'] = esda.Moran_Local(gdf['measure'], w_q).Is

# Plot both the measures
fig, ax = plt.subplots(1, 1)
gdf.plot(column='measure', ax=ax, legend=True)
plt.show()

fig, ax = plt.subplots(1, 1)
gdf.plot(column='moran', ax=ax, legend=True)
plt.show()

Spatial Regression

Spatial regression models extend traditional regression models by accounting for spatial correlation. The PySAL library allows us to run spatial autoregressive models.

y = gdf['dep_var'].values.reshape(-1, 1)  # spreg expects an (n, 1) array
x = gdf['indep_var'].values.reshape(-1, 1)

# Create spatial weight matrix and row-standardize it
w = pysal.lib.weights.Queen.from_dataframe(gdf)
w.transform = 'r'

# Fit OLS with spatial diagnostics (including Moran's I on the residuals)
m1 = pysal.model.spreg.OLS(y, x, w=w, spat_diag=True, moran=True)
print(m1.summary)

The above code illustrates a simple ordinary least squares model with spatial diagnostics. Depending on the nature of your data and the problem at hand, various other regression models are available in PySAL's spreg, such as GM_Lag and GM_Error.
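
For instance, a spatial lag specification can be fit with GM_Lag, reusing y, x, and w from above (a sketch; check the diagnostics before trusting any single specification):

# Spatial lag model estimated by GMM/IV
m2 = pysal.model.spreg.GM_Lag(y, x, w=w)
print(m2.summary)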

This practical implementation on geospatial model development should serve as a starting point to build upon more complex geospatial analysis.

Spatial Correlation and Regression Practical Implementation

For this case, we'll use Python with the following libraries: Pandas for data processing, Geopandas for spatial data analysis, and PySAL for spatial statistics.

Let's assume you have a Geopandas GeoDataFrame named 'gdf' with a column 'variable1' for which you want to analyze spatial correlation and regression.

1. Check for Spatial Autocorrelation - Moran's I

Moran's I is a measure of global spatial autocorrelation that compares a feature's value with the values of its neighbors.

import pysal as ps

# Create a queen's case spatial weights
w = ps.lib.weights.Queen.from_dataframe(gdf)

# Row-standardize the weights
w.transform = 'r'

# Execute Moran's I
mi = ps.explore.esda.Moran(gdf['variable1'], w)

# Print Moran's I results
print('Moran\'s I: ', mi.I)
print('Expected I under random conditions: ', mi.EI)
print('p-value: ', mi.p_sim)

2. Local Indicators of Spatial Autocorrelation (LISA) - Hot Spots, Cold Spots, and Spatial Outliers

LISA provides indications of local spatial autocorrelation. A hot spot is a location with a high value that is surrounded by other locations with high values.

# Compute local Moran's I for each observation
local_mi = ps.explore.esda.Moran_Local(gdf['variable1'], w)

# Add LISA cluster map
gdf['local_mi'] = local_mi.Is
gdf['local_mi_p_sim'] = local_mi.p_sim
gdf['local_mi_q'] = local_mi.q
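
With these columns in place, you can, for example, flag statistically significant hot spots (quadrant 1 means high values surrounded by high values):

# Significant high-high clusters at the 5% pseudo-significance level
gdf['hotspot'] = (gdf['local_mi_q'] == 1) & (gdf['local_mi_p_sim'] < 0.05)
print(gdf['hotspot'].sum(), 'hot spot locations')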

Plot the data for visualization.

import matplotlib.pyplot as plt

# Plot local Moran's I result
fig, ax = plt.subplots(figsize=(10,10))
gdf.plot(ax=ax, column='local_mi', scheme='Quantiles', legend=True, linewidth=0.1);

3. Spatial Regression

Here we will fit an ordinary least squares model with spatial diagnostics. A full spatial lag model additionally includes a new variable, calculated as the average value of the dependent variable in the neighboring geometries, as a regressor (available in spreg as GM_Lag or ML_Lag).

First, standardize your variables.

gdf['variable1_std'] = (gdf['variable1'] - gdf['variable1'].mean()) / gdf['variable1'].std()

Define dependent and independent variables.

y = gdf['variable1_std']
X = gdf[['ind_var1', 'ind_var2']]

Fit the model.

# Fit an OLS model with spatial diagnostics
m1 = ps.model.spreg.OLS(y.values.reshape(-1, 1), X.values, w=w, spat_diag=True,
                        moran=True, name_y='variable1', name_x=['ind_var1', 'ind_var2'])

# Print results
print(m1.summary)

The above steps constitute a basic workflow for spatial correlation and regression analysis. Remember to interpret the results carefully, and consider including more variables or using different weighting schemes if necessary. The implementation may vary based on the characteristics of your data and the requirements of your analysis.

Practical Implementation of Remote Sensing and Image Analysis

In this implementation, we are going to work with a machine learning model specifically designed for remote sensing techniques and image analysis: The Convolutional Neural Network (CNN).

You can use any programming language of your preference to implement the CNN. However, this guide uses Python because of its simplicity and its compatibility with libraries like TensorFlow and Keras, which are helpful for image analysis and machine learning tasks.

For this example, we will use the SatImg dataset: a satellite image of mixed land-use/land-cover in which each pixel has been classified into one of nine land use/land cover classes. You can also use your own data.

Now that we are set, let's get started.

Import necessary libraries

Import the necessary packages and modules for the task.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import Adam

Load and Preprocess the Dataset

Load your dataset (in our case SatImg), preprocess it, and split it into training and testing sets.

# Load Dataset
df = pd.read_csv('./SatImg.csv')

# Define predictors and target
X = df.drop('Target', axis = 1)
y = df['Target']

# Normalize (Preprocessing)
sc = StandardScaler()
X = sc.fit_transform(X)

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Data Reshaping
X_train = X_train.reshape(-1, 28,28,1)
X_test = X_test.reshape(-1, 28,28,1)

# Convert target to categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

Building CNN Model

Now, to perform image analysis, we're building a Convolutional Neural Network (CNN) and compiling the model using Adam optimizer.

# Initialize model
model = Sequential()

# Add Convolution layer
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))

# Add MaxPooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))

# Add another Convolution layer
model.add(Conv2D(64, (3, 3), activation='relu'))

# Add MaxPooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))

# Flattening
model.add(Flatten())

# Full connection
model.add(Dense(128, activation='relu'))

# Add Dropout to prevent overfitting
model.add(Dropout(0.5))

# Output layer: one unit per land cover class
model.add(Dense(y_train.shape[1], activation='softmax'))

# Compile model with the Adam optimizer
model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

# Model training
model.fit(X_train, y_train, epochs=5, batch_size=128)

Next, we test our model on the test dataset to see how well it predicts.

# Model evaluation
score = model.evaluate(X_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
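
To recover class labels from the softmax probabilities for individual samples (a small sketch):

# Predict class probabilities, then take the most likely class per sample
probs = model.predict(X_test)
pred_classes = np.argmax(probs, axis=1)
true_classes = np.argmax(y_test, axis=1)
print('First 10 predictions:', pred_classes[:10])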

And done. This model can be used to analyze input satellite imagery: the CNN uses the pixel data to classify the land use and land cover classes.

Hopefully, this practical implementation gives you a better understanding of how to use machine learning models to analyze satellite images and perform geospatial analysis. Many other machine learning models can be used for the same task; the choice of model depends on the specific requirements of the project.

Geospatial Network Analysis

Geospatial network analysis involves assessing the spatial organization of networks. Network theory and its techniques provide a quantitative means to understand and model spatially embedded networks.

We will use Python and some of its libraries for this implementation. The libraries are NetworkX, OSMnx, and Geopandas. NetworkX is for creating, manipulating, and studying the structure, dynamics, and functions of complex networks. OSMnx is built on top of NetworkX, Matplotlib, and Geopandas for downloading spatial geometries and constructing, projecting, visualizing, and analyzing complex street networks from OpenStreetMap. Geopandas is for handling geometric operations.

Please, make sure you have these libraries installed in your working environment.

Importing Libraries

This is a hands-on implementation, so it's important to import the necessary libraries first.

import networkx as nx
import osmnx as ox 
import geopandas as gpd
import matplotlib.pyplot as plt

Getting the Spatial Network Data from Open Street Map (OSM)

Here, we retrieve the transportation network (e.g., roads, streets) of a place (area). For this implementation, we use the streets of Barcelona, Spain, as an example.

# Specify the name that is used to search for the data
place_name = "Barcelona, Spain"

# Fetch OSM street network from the location
graph = ox.graph_from_place(place_name)

Visualizing the Road Network Graph

You now have a network graph object for the streets of Barcelona, Spain. Visualize it using the plot_graph() function.

# Plot the streets
fig, ax = ox.plot_graph(graph)

You'll get a plot representing the road network of Barcelona, Spain.

Converting the Network Graph into Geodataframes

The road network graph needs to be converted into nodes and edges (Geopandas GeoDataFrames). Nodes are the junctions and edges are the roads connecting the junctions.

# Convert the graph into GeoDataFrames 
nodes, edges = ox.graph_to_gdfs(graph)

You'll get two GeoDataFrames, nodes and edges, which can be inspected like normal pandas DataFrames.

print(edges.head())
print(nodes.head())

Calculating Network Efficiency

Many spatial measurements are involved in the analysis of geospatial networks, one of which is the network's efficiency: how quickly you can navigate through the network. The global efficiency is the average of the inverse shortest-path distances between all pairs of nodes.

# Calculate network efficiency; global_efficiency requires an undirected graph,
# and computing all-pairs shortest paths is slow on large networks
efficiency = nx.global_efficiency(nx.Graph(graph))

print("The calculated value of the network efficiency: ", efficiency)

This gives you the efficiency of moving in this network.

There are many more metrics to try in NetworkX's algorithms documentation; a couple are sketched below.
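
For example, OSMnx ships summary statistics for street networks, and NetworkX offers node-level measures such as degree centrality (a quick sketch):

# Basic street-network statistics (node count, edge count, lengths, ...)
stats = ox.basic_stats(graph)
print(stats['n'], 'nodes,', stats['m'], 'edges')

# The most connected junction by degree centrality
centrality = nx.degree_centrality(graph)
print('Most connected node:', max(centrality, key=centrality.get))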

Creating a Shortest Path Route

Shortest-path analysis is a common function of network analysis. Here, we compute the shortest path between two arbitrary nodes.

# Choose source and target nodes (here simply the first and last node)
source = list(graph)[0]
target = list(graph)[-1]

# Calculate the shortest path
shortest_path = nx.shortest_path(G=graph, source=source, target=target)

# Show the shortest path
fig, ax = ox.plot_graph_route(graph, shortest_path)

This will show you the shortest route in the network graph.
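
By default, shortest_path minimizes the number of edges. To minimize travel distance instead, pass the edge length (which OSMnx stores on every edge) as the weight:

# Shortest route by metric length rather than edge count
shortest_by_length = nx.shortest_path(graph, source=source, target=target, weight='length')
fig, ax = ox.plot_graph_route(graph, shortest_by_length)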

In conclusion, geospatial network analysis is a broad topic, and this simple implementation gives only a taste of what can be achieved with geospatial network data. There are many more functions to explore, each serving a different purpose depending on the requirements of the analysis.

Spatial Data Infrastructure & Management

In this implementation, we will build a spatial data infrastructure using PostGIS, an open-source extension that allows GIS (Geographic Information System) objects to be stored in an SQL database; GeoServer, a Java-based server that allows users to view and edit geospatial data; and, for the management of geospatial metadata, GeoNetwork, a catalog application for managing spatially referenced resources.

Section 1: Setting up a PostGIS database

PostGIS extends PostgreSQL with geometric types and functions, turning it into a spatial database management system.

To interact with PostgreSQL and PostGIS we'll use psycopg2.

Assuming a PostgreSQL service is already running, let's create a database and enable the PostGIS extension on it. We'll also create a table and fill it with some spatial data:

import psycopg2
from shapely.geometry import Point

# Connect to your postgres DB
conn = psycopg2.connect("dbname=test user=postgres password=secret")
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction

# Open a cursor to perform database operations
cur = conn.cursor()

# create a database
cur.execute("CREATE DATABASE geodb")

# connect to the newly created database
conn = psycopg2.connect("dbname=geodb user=postgres password=secret")
cur = conn.cursor()

# enable PostGIS extension
cur.execute("CREATE EXTENSION postgis")

# create a table
cur.execute("CREATE TABLE test (id SERIAL PRIMARY KEY, geom GEOMETRY(Point, 4326))")

# create some data
points = [Point(0.0, 0.0), Point(1.0, 1.0), Point(-1.0, -1.0)]

# insert data into the table as WKT
for point in points:
    cur.execute("INSERT INTO test (geom) VALUES (ST_GeomFromText(%s, 4326))", (point.wkt,))

# commit changes
conn.commit()

# close connections
cur.close()
conn.close()

Section 2: Setting up GeoServer

Once we've stored our spatial data in PostGIS, we can serve it via WMS, WFS, or WCS requests with GeoServer.

First, let's create a workspace, add our PostGIS store to it, and publish a layer:

import requests
import json

# it's assumed GeoServer is running on localhost:8080
geoserver_url = "http://localhost:8080/geoserver/rest"

headers = {
    'Content-type': 'application/json',
}

# create a workspace
workspace_data = {
    "workspace": {
        "name": "geoworkspace"
    }
}
requests.post(f'{geoserver_url}/workspaces', headers=headers, data=json.dumps(workspace_data), auth=('admin', 'geoserver'))

# add our PostGIS store
store_data = {
    "dataStore": {
        "name": "geostore",
        "connectionParameters": {
            "entry": [
                {"@key":"dbtype", "$":"postgis"},
                {"@key":"host", "$":"localhost"},
                {"@key":"port", "$":"5432"},
                {"@key":"database", "$":"geodb"},
                {"@key":"user", "$":"postgres"},
                {"@key":"passwd", "$":"secret"},
                {"@key":"schema", "$":"public"}
            ]
        }
    }
}
requests.post(f'{geoserver_url}/workspaces/geoworkspace/datastores', headers=headers, data=json.dumps(store_data), auth=('admin', 'geoserver'))

# publish a layer from the "geostore" data store
layer_data = {
    "featureType": {
        "name": "test",
    }
}
requests.post(f'{geoserver_url}/workspaces/geoworkspace/datastores/geostore/featuretypes', headers=headers, data=json.dumps(layer_data), auth=('admin', 'geoserver'))

Now, our spatial data are ready to be served.
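
You can sanity-check the published layer with a simple WMS GetMap request (a sketch; adjust the layer name, bounding box, and image size to your data):

# Fetch a rendered PNG of the published layer over WMS
params = {
    'service': 'WMS', 'version': '1.1.0', 'request': 'GetMap',
    'layers': 'geoworkspace:test', 'bbox': '-2,-2,2,2',
    'width': 256, 'height': 256, 'srs': 'EPSG:4326', 'format': 'image/png',
}
r = requests.get('http://localhost:8080/geoserver/wms', params=params)
with open('preview.png', 'wb') as f:
    f.write(r.content)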

Section 3: Setting up GeoNetwork for Geospatial Metadata Management

We assume GeoNetwork is already set up and running.

  • Add a metadata entry:

import requests
import xml.etree.ElementTree as ET

geonetwork_url = "http://localhost:8080/geonetwork/srv/eng"

headers = {'Content-type': 'application/xml'}

# prepare xml data
data = ET.Element("request")
ET.SubElement(data, "name").text = "test"
ET.SubElement(data, "uuid").text = "uuid-test"
ET.SubElement(data, "schema").text = "iso19139"
xml_data = ET.tostring(data)

# add metadata
r = requests.post(f"{geonetwork_url}/metadata.insert", headers=headers, data=xml_data)

  • Get a metadata entry:

params = {"uuid": "uuid-test"}

r = requests.get(f"{geonetwork_url}/metadata.get", params=params)

metadata_xml = r.text  # this contains metadata in XML format

This implementation gives you a foundational spatial data infrastructure: a PostgreSQL + PostGIS setup to store and retrieve spatial data, a GeoServer instance to serve the stored data, and a GeoNetwork instance to manage its metadata.

Problem-solving with Geospatial Analysis

In this section, we will create a geospatial analysis solution for a specific problem, drawing on the techniques from the previous units. Let's imagine that we have data on property prices along with their locations given in latitude and longitude. Our task is to identify whether high-priced properties cluster spatially in any specific regions.

You should have input data in two formats:

  1. A property price dataset with columns: prop_id (unique property id), price (the property's price), and the corresponding geographical coordinates longitude and latitude.

  2. A shapefile that defines the boundaries of the city.

IMPORTANT: This practical implementation uses Python and needs libraries such as geopandas, pyproj, matplotlib, seaborn, and sklearn.

Data Loading

import geopandas as gpd
import pandas as pd
import numpy as np
from sklearn.cluster import DBSCAN
from pyproj import CRS
import matplotlib.pyplot as plt
import seaborn as sns

# Load property price data
prop_data = pd.read_csv('property_prices.csv')
geoprops = gpd.GeoDataFrame(prop_data, geometry=gpd.points_from_xy(prop_data.longitude, prop_data.latitude))

# Load city shapefile
city = gpd.read_file('city.shp')

# Ensure that both datasets have the same Coordinate Reference System (CRS)
geoprops = geoprops.set_crs(CRS('EPSG:4326'))
city = city.to_crs(CRS('EPSG:4326'))

Data Preparation

We will create a new column that separates high-priced properties from the rest. Let's consider properties whose price is in the top quarter of all property prices to be high priced.

# Create a new binary column that identifies high priced properties
high_price_threshold = prop_data['price'].quantile(0.75)
geoprops['high_price'] = np.where(geoprops['price']>=high_price_threshold, 1, 0)

Geospatial Clustering

We will then apply the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) technique to find clusters of high-priced properties.

# Prepare the coordinates of high-priced properties for the DBSCAN model
high = geoprops[geoprops['high_price'] == 1]
coords = np.column_stack([high.geometry.x, high.geometry.y])

# Initialize and fit the DBSCAN model
db = DBSCAN(eps=0.01, min_samples=10).fit(coords)

# Create a new column in geoprops that specifies which cluster each property belongs to
geoprops.loc[geoprops['high_price']==1, 'cluster'] = db.labels_
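
A quick check on what DBSCAN found (label -1 marks noise points that belong to no cluster):

# Number of clusters found, excluding noise (-1)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print('clusters:', n_clusters, '| noise points:', list(db.labels_).count(-1))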

Visualize the Clusters

Finally, let's visualize the clusters to understand the spatial patterns of high-priced properties.

fig, ax = plt.subplots(figsize=(10,10))
base = city.plot(ax=ax, color='lightgrey')
# Plot only the high-priced properties, colored by cluster label
geoprops[geoprops['high_price'] == 1].plot(ax=base, column='cluster', categorical=True, legend=True, markersize=10)
plt.show()

In the final map, different clusters of high-priced properties are shown in different colors.

Please note that this analysis assumes a hypothetical scenario and uses particular thresholds and parameters; you will have to adjust these according to your data and desired outcomes.

This was an example of how you can combine the techniques learned in the different units to solve a real-life problem using geospatial analysis.