SydneyF
Alteryx Alumni (Retired)

In the world of spatial analysis, there are two major varieties of data: vector and raster. The divide between these two data types, and between the people who use them, has raged on for decades.

 

“Raster is faster!” one side would yell.

 

“Vector is corrector!” the other would retort.

 

This verbal jousting escalated to fist fights in the streets, elaborate traps staged in musty academic buildings, and eventually the cold-war-style standoff we have today.

 

At least I'm pretty sure that’s what I remember from my intro to GIS class.

 

Don’t worry: unless you’re an old-school geographer, you don’t need to choose. Vector and raster are both ways of representing spatial data, and most data can be represented as either one, but the two data types are very different, and each can outshine the other for particular use cases and data sets.

 

Vector Data

 

Vector data represents the world with points, lines, and polygons. Vector data is stored as a list of coordinates that define vertices (points), and a set of rules that determine if and how the vertices are joined into lines or polygons. Vector data is most useful for representing spatial phenomena that have discrete boundaries, like county borders or streets.
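To make the "list of coordinates plus joining rules" idea concrete, here is a minimal base R sketch. The coordinates are made up for illustration, and real vector formats carry more than this (attributes, a coordinate reference system, and so on).

```r
# A point is a single coordinate pair.
pt <- c(x = 1, y = 2)

# A line is an ordered set of vertices; the rule is "join them in sequence".
line <- rbind(c(0, 0), c(2, 1), c(3, 3))

# A polygon is a ring of vertices whose last vertex repeats the first.
poly <- rbind(c(0, 0), c(4, 0), c(4, 3), c(0, 3), c(0, 0))

# Length of the line: sum of straight-line distances between neighbouring vertices.
line_length <- sum(sqrt(rowSums(diff(line)^2)))

# Area of the polygon via the shoelace formula.
x <- poly[, 1]; y <- poly[, 2]; n <- nrow(poly)
poly_area <- abs(sum(x[-n] * y[-1] - x[-1] * y[-n])) / 2

line_length
poly_area
```

Notice that nothing here depends on a grid: the geometry is exactly as precise as the coordinates themselves.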

 

Vector.png

 

Raster Data

 

Raster data represents the world as a continuous surface divided into a regular grid of cells (pixels), where each cell contains a value corresponding to the measured value for the area the cell represents. The spatial resolution of raster data is determined by the size of its cells (e.g., one cell in a raster map can represent a 10 x 10 m area on the surface of the Earth). Raster data can be continuous (e.g., elevation or rainfall) or discrete (e.g., land use or vegetation type).
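In code, a raster is little more than a matrix of values plus an origin and a cell size. The base R sketch below uses invented elevations and an invented 10 m cell size to show how cell positions and coverage fall out of those few numbers.

```r
# A raster is a matrix of values plus an origin and a cell size.
cell_size <- 10                      # metres per cell side (illustrative)
xmin <- 0; ymax <- 30                # top-left corner of the grid (illustrative)
elev <- matrix(c(12, 15, 19,
                 11, 14, 18,
                 10, 12, 16), nrow = 3, byrow = TRUE)

# Centre coordinates of each cell; row 1 is the northernmost row.
x_centre <- xmin + (col(elev) - 0.5) * cell_size
y_centre <- ymax - (row(elev) - 0.5) * cell_size

# Area covered by one cell and by the whole grid, in square metres.
cell_area <- cell_size^2
grid_area <- length(elev) * cell_area

cell_area
grid_area
```

Halving the cell size would quadruple the number of cells needed to cover the same area, which is the resolution versus storage trade-off in a nutshell.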

 

raster.png

 

Each data type has its own pros and cons, and the right choice depends on your use case and on the phenomenon you are trying to represent.

 

Vector data tends to be more aesthetically pleasing, and as a result, most maps are created with vector data. Another major advantage is that vector data is not dependent on grid size. Vector data tends to be easier to register, re-project, and scale, which can make it more straightforward to combine vector data from different sources. Vector data also supports network analysis, which raster data does not.

 

One of the most significant disadvantages of vector data is that it does not effectively represent continuous data.

 

Raster data, on the other hand, can be used to represent continuous or discrete data. Raster data is computationally less expensive than vector to render. Raster data also tends to lend itself to mathematical modeling and quantitative analysis due to its matrix-like format. An example of this is map algebra.

 

What is map algebra, you say?

 

Let’s imagine we have a series of rasters covering the state of Colorado, and we are interested in identifying the best place to build a new resort.

We might have criteria such as being above 6,000 feet (because we like mountains), having a southern exposure (so the snow melts off quickly), and having an average slope of less than 5% (so that it’s not too tricky to build on). We could transform our raster data so that each criterion is represented as a raster of 1’s and 0’s (1 = has feature, 0 = does not), and then add our raster layers together to find the cells that have the greatest number of requested features.
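The resort example can be sketched with plain matrices standing in for rasters. All of the values below are invented, and treating "southern exposure" as an aspect between 135 and 225 degrees is my own assumption for the sake of the example.

```r
# Three toy layers on the same 3 x 3 grid (all values invented).
elevation_ft <- matrix(c(5500, 6200, 7100,
                         5900, 6400, 8000,
                         6100, 6050, 9000), nrow = 3, byrow = TRUE)
aspect_deg   <- matrix(c(180, 170,  10,
                         200, 185, 350,
                          90, 175, 190), nrow = 3, byrow = TRUE)
slope_pct    <- matrix(c(3, 4,   12,
                         6, 2,   20,
                         4, 4.5,  3), nrow = 3, byrow = TRUE)

# Turn each criterion into a layer of 1's and 0's.
high   <- (elevation_ft > 6000) * 1
south  <- (aspect_deg >= 135 & aspect_deg <= 225) * 1   # assumed definition of "south-facing"
gentle <- (slope_pct < 5) * 1

# Map algebra: add the layers cell by cell, then find the top-scoring cells.
score <- high + south + gentle
best  <- which(score == max(score), arr.ind = TRUE)

score
best
```

Because every layer shares the same grid, the addition lines up cell for cell with no spatial joins required.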

 

Least Cost Path Analysis from Map Analysis by Joseph K. Berry

One of the major cons of raster data is that the cell size of a raster dictates spatial resolution/grain. It is also difficult to represent linear features or features with discrete boundaries with raster data, especially at coarse cell sizes. Because raster data has a value for every cell in the area it represents, it can require more storage space than vector data. Raster maps are also not always the most attractive (again, somewhat depending on cell size), so cartographers tend to stick to vector data.

 

Raster Data in Alteryx 

 

Currently, Alteryx only natively supports vector data (for help getting started with vector data in Alteryx, please see this great course from the Alteryx Academy). However, it is possible to incorporate analysis with raster data in Alteryx using the code-friendly (R and Python) tools. 

 

Let’s work through a simple example using landcover data for the island of Maui and an R tool. In this exercise, I want to determine the best locations to find Nene on the Island of Maui using raster data.

 

If you aren’t familiar with them, Nene (also known as Hawaiian Goose) are a majestic and endangered species of geese endemic to the Hawaiian Islands. Although they used to have a greater range, today they are only found on Hawaii, Kauai, and Maui. Nene tend to nest on the slopes of volcanoes, beneath shrubs and surrounded by barren lava.

 

nene.png

 

To aid in our Nene hunt, I have three datasets I would like to use. I have a dataset that shows landcover for the island of Maui, a digital elevation model (DEM) for the island, and a major land resource areas shapefile which I converted to a raster using the rasterize() function.

 

As a first step, I load the raster, rgdal, and tiff packages into my R tool in Alteryx. These packages do not come installed with the Alteryx predictive tools, so before conducting this analysis I had to install them.

 

library(raster)
library(rgdal)
library(tiff)

 

Instead of bringing the raster data into Alteryx, where it is not supported, I am going to load my raster data directly into the R tool with the raster() and stack() functions, which can be used to both create and load raster data. The raster() function brings in a single-layer raster, and the stack() function brings in a multi-layered raster.
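The loading step itself is not shown here, so the following is only a sketch of the pattern. To stay self-contained, it first writes a tiny made-up two-layer raster to a temporary file (the file name is hypothetical) and then reads it back both ways.

```r
library(raster)

# Build a small two-layer raster and write it to disk so there is a file to read.
# (A stand-in for real data; the file name and values are made up.)
r1 <- raster(matrix(1:6, nrow = 2))
r2 <- r1 * 10
f  <- file.path(tempdir(), "example_stack.grd")
writeRaster(stack(r1, r2), filename = f, overwrite = TRUE)

single <- raster(f)   # reads a single layer from the file
multi  <- stack(f)    # reads every layer in the file

nlayers(single)
nlayers(multi)
```

With real data you would point raster() or stack() at your own file path instead of the temporary file.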

 

Based on the ecological information I have and my data, I am going to use the raster reclassify() function to engineer features that account for the criteria below. The reclassify() function takes a raster data set and a matrix as inputs. The matrix provides a range of values to find in the raster, and the target value you would like them to be converted to. For each of my reclassified raster layers, a value of 1 indicates a Nene-preferred feature, and 0 indicates less suitable habitat conditions.

 

1. Nene live anywhere from sea level to 2400 m.

 

To account for this condition, we can reclassify the DEM data so that all cells greater than 2400 m are equal to 0, and all cells less than this elevation are equal to 1.

 

To use the reclassify function, we need to first create a vector with the from, to, and becomes values, and then convert the vector to a matrix.

 

# elevation less than 2400 m
m <- c(0, 2400, 1, 2400, Inf, 0) # create vector with conversion ranges
remat <- matrix(m, ncol = 3, byrow = TRUE) # convert vector to three-column matrix (from, to, becomes)
elevation.rc <- reclassify(HI.Terrain[[1]], remat) # apply reclassification to raster

 

The first two lines of code result in a matrix that looks like this:

 

matrix1.png

 

Which can be applied with the reclassify function to the DEM raster:

 

 

ElevationMaui.png

 

 

 

To create this new raster layer that flags cells at Nene-approved elevations:

 

 

ElevationMask.png
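To see what a from / to / becomes matrix does without needing any raster files, here is a base R analogue. The helper reclass_ranges() is hypothetical (it is not part of the raster package) and the elevations are made up. It treats each range as open on the left and closed on the right, which I believe matches the default behavior of reclassify(), but check ?reclassify if the edge cases matter for your data.

```r
# Hypothetical helper: apply a from / to / becomes matrix to a numeric vector.
# Each row recodes values in the interval (from, to]; unmatched values are kept.
reclass_ranges <- function(x, rcl) {
  out <- x
  for (i in seq_len(nrow(rcl))) {
    hit <- !is.na(x) & x > rcl[i, 1] & x <= rcl[i, 2]
    out[hit] <- rcl[i, 3]
  }
  out
}

remat <- matrix(c(0,    2400, 1,
                  2400, Inf,  0), ncol = 3, byrow = TRUE)
elev  <- c(150, 2400, 2401, 3055, NA)   # made-up elevations in metres
mask  <- reclass_ranges(elev, remat)

mask
```

Matching is done against the original values rather than the partly recoded ones, so a cell that has already become 1 cannot be caught again by a later row.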

 

 

2. Nene prefer herbaceous or shrubby environments (they will also live in lava fields, sand dunes, and golf courses).

 

For this criterion, we can reclassify the landcover raster data so that only herbaceous or shrubby environments are classified as 1, and all other landcover types are set equal to 0.

 

In this raster layer, categories of land cover are identified by integer values. To see what each integer value corresponds to, we consult the metadata. The metadata indicates that the values we are interested in are 8 - Grassland/Herbaceous, 12 - Scrub/Shrub, and 20 - Bare Land.

 

For discrete data values, we can create a two-column matrix for reclassification: value and becomes (e.g., 2 becomes 0, 8 becomes 1).

 

# grasslands, coastal dunes, lava plains, golf courses: 8, 12, 20
m <- c(1, 0, 2, 0, 5, 0, 6, 0, 7, 0, 8, 1, 9, 0, 10, 0, 11, 0, 12, 1, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0,
       19, 0, 20, 1, 21, 0, 22, 0)
remat <- matrix(m, ncol = 2, byrow = TRUE) # convert vector to two-column matrix (value, becomes)
landcover.rc <- reclassify(HI.landcover, remat) # reclassify

 

This code converts our land cover raster:

 

landcover.png

 

To a filter for suitable environments for Nene nests:

 

 

nenelandcover.png 
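A two-column "value, becomes" matrix is essentially a lookup table. The base R sketch below uses a hypothetical helper, reclass_lookup(), and a shortened made-up table to show the idea. It also shows what happens to a code that is missing from the table: it passes through unchanged, which as far as I know is also how reclassify() treats unlisted values. The matrix in this post has no rows for codes 3 and 4, so it is worth confirming that every code present in your data is covered.

```r
# Hypothetical helper: recode discrete values with a "value / becomes" matrix.
# Codes that do not appear in the first column are left unchanged.
reclass_lookup <- function(x, rcl) {
  idx <- match(x, rcl[, 1])
  ifelse(is.na(idx), x, rcl[idx, 2])
}

# Shortened, made-up lookup table: 8, 12, and 20 become 1; 2 and 21 become 0.
remat <- matrix(c(2, 0, 8, 1, 12, 1, 20, 1, 21, 0), ncol = 2, byrow = TRUE)

landcover <- c(8, 21, 12, 2, 20, 4)   # 4 is deliberately missing from the table
suitable  <- reclass_lookup(landcover, remat)

suitable
```

A leftover code such as the 4 above would inflate the final score once the layers are added together, so a quick table() or unique() on the reclassified values is a cheap sanity check.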

 

3. Nene like to nest on volcanic slopes.

 

For the slope factor, we are going to select for a slope range between 2 and 20 percent (this was a little arbitrary). 

 

# select slopes between 2 and 20 percent
m <- c(0, 2, 0, 2, 20, 1, 20, 100, 0)
remat <- matrix(m, ncol = 3, byrow = TRUE)
slope.rc <- reclassify(HI.Terrain[[2]], remat)

slopemask.png
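The thresholds above are in percent, so the slope layer needs to be in percent too. How the slope layer was derived is not shown here; if yours is in degrees instead, a conversion along these lines would be needed first. Both function names are hypothetical helpers.

```r
# Percent slope is rise over run times 100, i.e. the tangent of the slope angle.
deg_to_pct <- function(deg) tan(deg * pi / 180) * 100
pct_to_deg <- function(pct) atan(pct / 100) * 180 / pi

# The 2 and 20 percent thresholds expressed as angles in degrees.
pct_to_deg(c(2, 20))

# A 45 degree slope is a 100 percent slope.
deg_to_pct(45)
```

In degrees, the 2 to 20 percent window is a fairly gentle band of roughly 1 to 11 degrees.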

   

4. Nene prefer to nest in areas surrounded by barren lava.

 

For this last factor, we are going to use our major land resource area data to select for exposed lava, setting all other categories to 0.

 

The categories for this dataset are listed in this table:

 

descriptions.png

 

Given the descriptions, we want to select for Lava Flows and Rock Outcrops, or category 5.

 

m <- c(1, 0, 2, 0, 3, 0, 4, 0, 5, 1, 6, 0, 7, 0, 8, 0, 9, 0, 10, 0)
remat <- matrix(m, ncol = 2, byrow = TRUE)
mjr.land.rc <- reclassify(HI.mjr.land, remat)

 

 lavaflows.png 

 

 

Now that our data is converted to a useful format for our question, we can perform some map algebra to combine the layers and determine the most suitable locations to find Nene on Maui!

 

There are a lot of ways to execute map algebra in R, but the most intuitive is just to use a simple addition operator for each of the layers.

 

# find the best pixels to find nene
nene <- elevation.rc + landcover.rc + slope.rc + mjr.land.rc
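As one example of the other ways, the raster package's calc() function can apply a summary function across the layers of a stack. The sketch below uses three tiny made-up binary layers rather than the Maui data and checks that both approaches give the same answer.

```r
library(raster)

# Three tiny made-up binary layers on the same grid.
a <- raster(matrix(c(1, 0, 1, 1), nrow = 2))
b <- raster(matrix(c(1, 1, 0, 1), nrow = 2))
d <- raster(matrix(c(0, 1, 1, 1), nrow = 2))

# Option 1: the addition operator, as in the post.
by_plus <- a + b + d

# Option 2: stack the layers and sum across them with calc().
by_calc <- calc(stack(a, b, d), fun = sum)

# Both approaches should give the same cell values.
all.equal(as.vector(values(by_plus)), as.vector(values(by_calc)))
```

The stack-based form can be handy when the number of criteria grows, since adding a layer means adding it to the stack rather than editing the formula.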

 

We can create a raster plot of our final layer and put it out as an image.

 

#create graph output
AlteryxGraph(1)
#plot nene layer
plot(nene)
#make plot window invisible
invisible(dev.off())

alteryximage.png

And we can extract the centroids of the cells with the top score and export those coordinates back out to Alteryx! For this use case, I aggregated the raster prior to finding the centroids so we didn't end up with an overwhelming number of points.

 

#aggregate output raster
nene_agg <- aggregate(nene, fact=100, fun=modal)
#convert raster to points, selecting for points where all four criteria are met
p <- rasterToPoints(nene_agg, fun=function(x){x==4.0})
#write to alteryx as a dataframe
write.Alteryx(as.data.frame(p), 2)

 

alteryxOutput.png
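Here is the same aggregate-then-extract pattern on a toy 4 x 4 raster, so you can see what each step does on a grid small enough to check by hand. The values are invented, and the aggregation factor is 2 rather than 100.

```r
library(raster)

# A 4 x 4 raster of made-up suitability scores (values fill row by row).
r <- raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 4)
values(r) <- c(4, 4, 1, 2,
               4, 4, 1, 3,
               0, 1, 4, 4,
               2, 2, 4, 3)

# Merge each 2 x 2 block into one cell, keeping the most common value in the block.
r_agg <- aggregate(r, fact = 2, fun = modal)

# Keep only the aggregated cells where all four criteria are met;
# the result is a matrix of cell-centre coordinates plus the cell value.
p <- rasterToPoints(r_agg, fun = function(x) x == 4)

p
```

One thing to keep in mind with a modal aggregation is that a few high-scoring cells inside a mostly low-scoring block disappear, so a large factor trades detail for a manageable number of points.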

 

Based on the map from the Cornell Lab of Ornithology, it looks like the points we found are promising!

 

cornellmap.jpg

 

Although this example is simple, it demonstrates one way you can apply raster data to your spatial analysis. Another option might be to load point data into the R tool and extract corresponding values (like elevation or aspect) for those points in space from a raster. You could even leverage raster layers to generate a geospatial predictive model.
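As a rough sketch of the point-extraction idea, the raster package's extract() function can look up cell values at a set of coordinates. The raster and the points below are made up; in the R tool, the points would more likely come from an Alteryx input.

```r
library(raster)

# A 2 x 2 made-up elevation raster (values fill row by row, top row first).
elev <- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
values(elev) <- c(100, 200,
                  300, 400)

# Two made-up points: one in the top-left cell, one in the bottom-right cell.
pts <- cbind(x = c(0.5, 1.5), y = c(1.5, 0.5))

# Look up the raster value under each point.
elev_at_pts <- raster::extract(elev, pts)

elev_at_pts
```

The extracted values can then be attached to the point records and written back out to Alteryx as ordinary columns.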

 

Truth be told, as a (wannabe) old-school geographer, raster data is first in my heart. If you aren’t familiar with it or haven’t thought about how you can apply it to your spatial analysis, I hope you are now! For more help getting started working with raster data in R, check out the NEON data skills spatial tutorials, Applied Spatial Analysis with R by Bivand et al., and the Introduction to visualizing spatial data in R tutorial from Lovelace et al.

Sydney Firmin

A geographer by training and a data geek at heart, Sydney joined the Alteryx team as a Customer Support Engineer in 2017. She strongly believes that data and knowledge are most valuable when they can be clearly communicated and understood. She currently manages a team of data scientists that bring new innovations to the Alteryx Platform.
