Data Science

Machine learning & data science for beginners and experts alike.
SydneyF
Alteryx Alumni (Retired)

When there are missing values in a typical data set, you have a few options on how to handle them. You can create a new category for the missing values, you can remove the observations with missing values, or you can interpolate values for the missing observations.

 

But what about spatial data? What if you have a spatial data set for a continuous feature (e.g., annual rainfall), but that data set doesn't include a value for a point that you need?

 

whatabouthere.png 

This is a very common scenario for spatial data, particularly for environmental phenomena that are captured with sensors (e.g., precipitation, temperature, elevation, mineral concentration samples, etc.). Setting up sampling sites can be expensive, and it is often unreasonable to cover every square inch of a study site with sensors.

 

Scenarios like this are where spatial interpolation comes in handy. Spatial interpolation is the process of using points with known data values to estimate values at other, unknown points. Spatial interpolation is only relevant when there can be a meaningful value at every possible point in your study area (e.g., average rainfall is well suited to interpolation; the location of volcanoes is not). Most spatial interpolation methods take point data and create a continuous raster surface of values.

 

interpolatedsurface.png

 

Spatial interpolation works because of Tobler’s first law of Geography, which states: “everything is related to everything else, but near things are more related than distant things.”

 

This might seem obvious, but in the world of Geography and GIS, it’s a big deal. Tobler’s first law allows us to make assumptions about how things are related in space and create meaningful spatial analysis. Without this law, there would be no spatial patterns to study, making Geography a pointless pursuit.

 

Tobler’s first law of Geography also implies the existence of spatial autocorrelation, which is a fundamental concept in the fields of GIS and spatial statistics. Autocorrelation (of any type) violates the assumption of independence among observations that many standard statistical techniques rely on. However, the lack of independence between spatial points can be leveraged to perform a wide variety of spatial analyses.

 

Many different methods for spatial interpolation have been developed. The best method for your use case will depend on your data and application. Popular methods for spatial interpolation include inverse distance weighting (IDW), nearest neighbor, natural neighbor, triangulated irregular network (TIN), spline, kriging, and many others. In this post, we will review IDW, a deterministic interpolation method. As a side note, nearest neighbor, natural neighbor, and TIN are referenced in the blog post Voronoi (Thiessen) Polygons and Delaunay Triangles in Alteryx.

 

Inverse Distance Weighting (IDW)

 

IDW is one of the most straightforward methods for spatial interpolation. It is a deterministic method (meaning no randomness is incorporated into estimates), based on the assumption that the value at an unsampled point can be estimated as the weighted average of the values at nearby sampled points. Weights are inversely related to distance (i.e., points that are farther away have less influence on the estimated value).
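To make the weighted average concrete, here is a minimal sketch in plain Python. The function name `idw_estimate`, the `(x, y, value)` tuple layout, and the sample rainfall numbers are all made up for illustration, and the weight `1 / distance ** power` is one common choice of inverse-distance weight rather than the only one.

```python
import math

def idw_estimate(known_points, target, power=2.0):
    """Estimate the value at `target` as an inverse-distance-weighted average.

    known_points: list of (x, y, value) tuples (hypothetical layout)
    target: (x, y) tuple
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target sits exactly on a sample, so use the observed value.
            return value
        weight = 1.0 / distance ** power  # closer points get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Made-up samples: two points at distance 1 and one point farther away.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw_estimate(samples, (1.0, 0.0))
print(round(estimate, 2))
```

The two samples at distance 1 dominate the estimate, while the third sample (farther away) pulls it only slightly toward 30.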

 

To improve processing time, it is not uncommon to limit the number of points that influence the estimate at an unknown point with either a fixed search radius or a cutoff on the number of points (known as a variable search radius, where only the x closest points are considered).
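Both kinds of search neighborhood can be sketched as a filtering step that runs before any weighting. The helper `select_neighbors` and its `radius` and `max_points` parameters are hypothetical names chosen for this example, not options of any particular tool.

```python
import math

def select_neighbors(known_points, target, radius=None, max_points=None):
    """Return (distance, value) pairs for the samples allowed to influence `target`."""
    # Sort every sample by its distance to the target, nearest first.
    candidates = sorted(
        (math.hypot(x - target[0], y - target[1]), value)
        for x, y, value in known_points
    )
    if radius is not None:
        # Fixed search radius: keep only samples within the given distance.
        candidates = [(d, v) for d, v in candidates if d <= radius]
    if max_points is not None:
        # Variable search radius: keep only the closest `max_points` samples.
        candidates = candidates[:max_points]
    return candidates

# Made-up samples as (x, y, value).
samples = [(0.0, 0.0, 10.0), (3.0, 0.0, 20.0), (0.0, 4.0, 30.0), (6.0, 8.0, 40.0)]
within_radius = select_neighbors(samples, (0.0, 1.0), radius=3.5)
two_closest = select_neighbors(samples, (0.0, 1.0), max_points=2)
```

With a fixed radius the number of contributing points changes from place to place; with a point-count cutoff the effective radius changes instead, which is where the name "variable search radius" comes from.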

 

IDWSearchNeighborhood.png

 

Another specification for IDW is the power, which determines the distance decay function used to estimate the weights for points that are averaged to estimate the unknown value. Higher power values emphasize the influence of the points nearest to the unknown point, resulting in a more detailed and less smooth interpolated surface. A smaller power value gives more influence to distant points, and results in a more averaged and smoothed interpolated surface.
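A quick way to see the effect of the power is to look at how the total weight is shared among a few points as the power changes. The sketch below assumes weights of the form `1 / distance ** power`; the helper name `normalized_weights` and the distances are made up for illustration.

```python
def normalized_weights(distances, power):
    """Return each point's share of the total inverse-distance weight."""
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

# Three hypothetical samples at increasing distance from the unknown point.
distances = [1.0, 2.0, 4.0]
for power in (0.0, 1.0, 2.0, 4.0):
    shares = normalized_weights(distances, power)
    print(power, [round(s, 3) for s in shares])
```

At a power of 0 every point gets the same share, so the estimate collapses to a plain average. As the power grows, the nearest point takes a larger and larger share, which is why high powers produce a more detailed, less smooth surface.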

 

Source: http://planet.botany.uwc.ac.za/nisl/GIS/spatial/chap_1_32.htm

  

There are a few things to keep in mind about IDW. Because every estimate is a weighted average, IDW will never estimate a value above the maximum or below the minimum of the sampled values. It will not reproduce the local shape suggested by the data values, and it creates local extrema at the measured data points. IDW is an exact interpolation method, meaning that it will create values exactly equal to the observed values at all measured locations, which can result in jagged contour lines or bull's eye surfaces. IDW also treats all points that fall within the search radius the same way, weighting them by distance alone.
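Two of these properties are easy to check numerically: estimates stay within the minimum and maximum of the sampled values, and the sampled locations return their observed values exactly. The sketch below uses a compact, hypothetical `idw` helper with weights of `distance ** -power` and made-up samples; it evaluates a grid that deliberately extends well past the sampled area.

```python
import math

def idw(samples, target, power=2.0):
    """Compact inverse-distance-weighted average (illustrative helper)."""
    num = den = 0.0
    for x, y, value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # exact interpolator: honor the observation
        w = d ** -power
        num += w * value
        den += w
    return num / den

# Made-up samples as (x, y, value).
samples = [(0.0, 0.0, 5.0), (10.0, 0.0, 15.0), (5.0, 8.0, 9.0)]
values = [v for _, _, v in samples]

# Evaluate on a coarse grid that reaches far outside the sampled area.
estimates = [idw(samples, (float(gx), float(gy)))
             for gx in range(-20, 31, 5) for gy in range(-20, 31, 5)]

stays_in_range = all(min(values) <= e <= max(values) for e in estimates)
exact_at_samples = all(idw(samples, (x, y)) == v for x, y, v in samples)
print(stays_in_range, exact_at_samples)
```

This is also why the highest and lowest values on an IDW surface always sit at measured points, which is what produces the bull's eye pattern around isolated samples.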

  

IDW is best for point data that is dense and relatively evenly distributed throughout the study area. IDW assumes a constant (monotonic) trend related to distance and will not account for trends that occur within the data.

 

Spatial Interpolation in Alteryx

 

If you'd like to start dabbling with spatial interpolation yourself, feel free to use the IDW tool that @DrDan and I worked on together as an Alteryx Innovation Days project. To use the IDW tool, you need to provide a series of points with values for the phenomenon you would like to interpolate. You can also provide a mask shapefile to clip your interpolated values to (e.g., a state boundary or the boundaries of the study area). Currently, the IDW tool produces two outputs: an image of the interpolated raster surface, and a series of points with the estimated values for the centroids of each of the raster cells from the interpolated surface.
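If you are curious what a table of raster cell centroids with estimated values might look like, here is a rough Python sketch. It is not the tool's implementation: the helpers `idw` and `interpolate_grid`, the grid parameters, and the sample values are all assumptions made for illustration, and the mask step is left out.

```python
import math

def idw(samples, target, power=2.0):
    """Compact inverse-distance-weighted average (illustrative helper)."""
    num = den = 0.0
    for x, y, value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value
        w = d ** -power
        num += w * value
        den += w
    return num / den

def interpolate_grid(samples, x_min, y_min, cell_size, n_cols, n_rows, power=2.0):
    """Return (centroid_x, centroid_y, estimate) for every cell of a regular grid."""
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            # The centroid is half a cell in from the cell's lower-left corner.
            cx = x_min + (col + 0.5) * cell_size
            cy = y_min + (row + 0.5) * cell_size
            cells.append((cx, cy, idw(samples, (cx, cy), power)))
    return cells

# Made-up samples at the corners of a 4 x 4 study area.
samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 4.0, 30.0), (4.0, 4.0, 40.0)]
cells = interpolate_grid(samples, x_min=0.0, y_min=0.0, cell_size=1.0, n_cols=4, n_rows=4)
for cx, cy, estimate in cells[:4]:
    print(cx, cy, round(estimate, 2))
```

A smaller `cell_size` gives a finer surface at the cost of more estimates to compute, which is where a search radius starts to pay off.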

 

 2018-12-17_8-51-42.png

 

 

Output 1: a plot of the interpolated surface

 

 

Output 2: the centroids and interpolated values of the raster grid

 

This tool is available for download here

 

Additional Resources

 

If you would like to continue to learn about spatial interpolation, here is a collection of resources to start you on your journey!

 

Sydney Firmin

A geographer by training and a data geek at heart, Sydney joined the Alteryx team as a Customer Support Engineer in 2017. She strongly believes that data and knowledge are most valuable when they can be clearly communicated and understood. She currently manages a team of data scientists that bring new innovations to the Alteryx Platform.
