Through a computer vision machine learning model!
Read on to find out how we tackled this.
We are a team of data scientists tackling the problem of distinguishing roads from background in satellite imagery.
Roads and terrain are ever-changing, and some places find it difficult to update their maps frequently.
We believe that no matter where one is in the world, they deserve access to readable and accurate maps, so we set out to build an efficient solution.
Background
Satellite imagery has become inextricably intertwined with our lives, especially in first-world countries, where we leverage it for many key functions such as GPS navigation.
But what about countries that are less technologically advanced and require extra assistance in mapping out their roads?
Problem Statement
Segmentation separates and clusters the elements of interest in an image. Another name for this is pixel-level classification: assigning a label to every pixel in the image. In our case, we only need to distinguish proper roads from mud roads, flat expanses, and the like, and then trace the roads out.
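To make pixel-level classification concrete, here is a minimal NumPy sketch; the shapes and the toy mask are illustrative, not taken from our pipeline:

```python
import numpy as np

# A satellite image: height x width x 3 RGB channels (values are illustrative).
image = np.random.randint(0, 256, size=(1024, 1024, 3), dtype=np.uint8)

# A segmentation model assigns a label to every pixel. For binary road
# segmentation, the output is a mask of the same height and width,
# where 1 = road and 0 = background.
mask = np.zeros((1024, 1024), dtype=np.uint8)
mask[500:510, :] = 1  # a toy horizontal "road" crossing the image

print(f"{mask.mean():.2%} of pixels are labeled road")
```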
In many developing countries, roads are not easily accessible or recognizable, and maps are hard to find; this limitation can hamper response activities during natural disasters.
We built a road segmentation model that predicts roads from satellite imagery. The intent is for non-profits and rescue teams to use it to identify roads, giving them the data they need to reach populations in need.
Data
We utilized the DeepGlobe Road Extraction dataset, which was part of a DeepGlobe Challenge held in 2018.
This dataset consists of 6226 RGB satellite images, each 1024 x 1024 pixels. The images were collected by DigitalGlobe satellites over areas of Thailand, Indonesia, and India at a pixel resolution of 50 cm.
Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., & Raskar, R. (2018). DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. ArXiv. https://doi.org/10.48550/ARXIV.1805.06561
TRAINING IMAGES AND MASKS: 6226
VALIDATION IMAGES: 1243
TEST IMAGES: 1101
PIXEL RESOLUTION: 50 cm
OUR APPROACH
Convolutional neural networks (CNNs) are well suited to analyzing visual imagery. A benefit of CNNs is that they require less preprocessing before training: instead of relying on hand-engineered features, they learn the relevant features directly from the data.
To validate our model, we split the training images and masks in an 80/20 ratio, since the validation and test images provided with the dataset do not have ground-truth masks.
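As a sketch of how such a split can be done (the file paths and naming pattern are assumptions, and scikit-learn's train_test_split is used for illustration):

```python
from glob import glob
from sklearn.model_selection import train_test_split

# Pair each satellite image with its ground-truth mask (paths are hypothetical).
images = sorted(glob("data/train/*_sat.jpg"))
masks = sorted(glob("data/train/*_mask.png"))

# Hold out 20% of the labeled training data for validation, since the
# dataset's own validation and test images ship without ground-truth masks.
train_imgs, val_imgs, train_masks, val_masks = train_test_split(
    images, masks, test_size=0.2, random_state=42
)
print(len(train_imgs), "training pairs,", len(val_imgs), "validation pairs")
```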
Our code is hosted on Github.io, with work done on the AWS cloud; each of us worked through SSH on parts of the model.
We leveraged AWS EC2 instances to power and train our models. One of our initial runs took nearly 10 hours! After switching instance types, we cut the duration down to 5 hours for 25 epochs.
OUR MODEL(S)
We started out with three models to test and found that Unet yielded the best results with the greatest efficiency.
Originally created for biomedical image segmentation, Unet uses an encoder path that captures the context of the image, while its decoder path recovers the precise, pixel-level localization needed to delineate features.
This was the final model we utilized after observing each model's performance.
Best performing model in terms of loss, accuracy, and epoch duration
Part of an open-source suite from Google, DeepLabV3+ uses atrous (dilated) convolution, which enlarges the receptive field of the convolution layers without reducing the resolution of their output, helping with dense, pixel-wise prediction (see the sketch after the model descriptions). This enhances the segmentation model's ability to capture context at multiple scales and enables larger, denser output feature maps.
Second best performing model in terms of loss
Third best performing model in terms of epoch duration
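To illustrate the atrous (dilated) convolution idea behind DeepLabV3+, here is a minimal PyTorch sketch; the channel counts and sizes are illustrative, not DeepLabV3+'s actual configuration:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)  # batch, channels, height, width

# A standard 3x3 convolution sees a 3x3 neighborhood.
standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)

# An atrous 3x3 convolution with dilation=2 spreads the same 9 weights
# over a 5x5 neighborhood, enlarging the receptive field at no extra
# parameter cost and without downsampling the feature map.
atrous = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

print(standard(x).shape)  # torch.Size([1, 64, 128, 128])
print(atrous(x).shape)    # torch.Size([1, 64, 128, 128]) - same resolution
```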
Built upon a pretrained ResNet-101 backbone, the Pyramid Attention Network (PAN) has a Feature Pyramid Attention (FPA) module, which fuses the outputs of convolutions at multiple scales into one, incorporating both larger and smaller features in the same output. Global Attention Upsample (GAU) modules allow the model to focus on specific features by ignoring (or gating out) irrelevant information.
Third best performing model in terms of loss
Second best performing model in terms of epoch duration
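All three architectures are available off the shelf. A minimal sketch of how they can be instantiated with the segmentation_models_pytorch library (the library choice and encoder settings here are assumptions for illustration, not necessarily our exact setup):

```python
import segmentation_models_pytorch as smp

# Binary road segmentation: 3 input channels (RGB), 1 output channel (road).
common = dict(
    encoder_name="resnet101",    # assumed backbone
    encoder_weights="imagenet",  # pretrained encoder weights
    in_channels=3,
    classes=1,
)

unet = smp.Unet(**common)              # encoder-decoder with skip connections
deeplab = smp.DeepLabV3Plus(**common)  # atrous convolution + ASPP decoder
pan = smp.PAN(**common)                # feature pyramid attention + GAU
```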
Outputs
Example outputs from all three models, showing varying accuracy across different road densities in a dense, urban landscape.
Image: Unet, DeepLabV3+, and PAN outputs from top to bottom
Outputs
Example outputs from all three models, showing varying accuracy across different road densities in a rural landscape.
Image: Unet, DeepLabV3+, and PAN outputs from top to bottom
OUR BEST MODEL
Our best model was a Unet trained for 25 epochs with a batch size of 100 and a learning rate of 0.003. It shows a large improvement over the initial model, as seen below.
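A condensed sketch of that training configuration: the epoch count, batch size, and learning rate are the ones quoted above, while the optimizer, loss function, and dummy data loader are assumptions for illustration:

```python
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="resnet101", in_channels=3, classes=1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=0.003)  # lr from above
loss_fn = smp.losses.DiceLoss(mode="binary")  # assumed loss; robust to the
                                              # unbalanced road/background split

# Dummy loader for illustration; in practice this yields real (image, mask) pairs.
data = torch.utils.data.TensorDataset(
    torch.randn(200, 3, 256, 256),
    torch.randint(0, 2, (200, 1, 256, 256)).float(),
)
train_loader = torch.utils.data.DataLoader(data, batch_size=100, shuffle=True)

for epoch in range(25):
    model.train()
    for images, masks in train_loader:
        optimizer.zero_grad()
        logits = model(images.cuda())
        loss = loss_fn(logits, masks.cuda())
        loss.backward()
        optimizer.step()
```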
Best Model Outputs
Example outputs from our best model
Image: 3 rows of outputs from the batch-size-100 Unet model
Below are our losses and IoU (Intersection over Union, used as a substitute for accuracy because the unbalanced road and background classes would skew pixel-wise accuracy), as well as a confusion matrix.
In the confusion matrix, "true" refers to background pixel classifications and "false" refers to road pixel classifications.
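As a sketch of how per-pixel IoU and confusion-matrix counts can be computed from binary masks (plain NumPy; note that this sketch treats road as the positive class, the opposite of the labeling convention in our matrix above):

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union for binary masks (1 = road, 0 = background)."""
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return intersection / union if union > 0 else 1.0

def confusion_counts(pred, target):
    """Per-pixel confusion-matrix entries, treating road (1) as positive."""
    tp = int(np.logical_and(pred == 1, target == 1).sum())
    fp = int(np.logical_and(pred == 1, target == 0).sum())
    fn = int(np.logical_and(pred == 0, target == 1).sum())
    tn = int(np.logical_and(pred == 0, target == 0).sum())
    return tp, fp, fn, tn
```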