Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
Samantha_Jayne
Alteryx
Alteryx

Alteryx + Snowflake series (5).png

 

Introduction

 

The previous two blogs in this series have been about loading data into snowflake and bulk updating data in Snowflake using a MERGE statement.

 

My fellow Sales Engineers would say, so what? Why did we load spatial data into Snowflake?

 

Well we have loaded spatial data into Snowflake, to show the art of the possible but also to set you up, my dear reader, to try out spatial In-DB queries for yourself.

 

image002.gif

Source: imgur

 

This blog follows the previous one on how to ulitise spatial data in Snowflake.

 

We will use the In-DB Tools, so we can really harness the power of your databases and extract the detail we need from the start of your workflows.

 

For example, for all stores within a particular county, we can calculate how many stores are within an area and find your nearest stores. These are all great examples of when spatial querying is used within an organisation. Let’s do this!

 

image003.gif

Source: gfycat

 

How many stores are within each county?

 

Resource: https://docs.snowflake.com/en/sql-reference/functions/st_within.html

 

Let’s walk this through… we could bring in two datasets and do a spatial match tool and get the answer we want but remember spatial data is large and power/memory intensive.

 

My alternative option here is to run a WITHIN query on the database itself using the In-DB tools. It would be much faster and lighter for your production database to do this rather than you try and bring down the dataset each time, especially if you were then to filter on the specific county you wanted rather than the whole country.

 

Here is a simple WITHIN query workflow for the whole country to start with.

 

image004.png

INDB Within.PNG

 

Example Spatial SQL Within Clause:

 

Select s.ID, s.G STORE_LOCATION, c.ID county_id, c.G COUNTY_BOUNDARY 
from "Database"."Schema"."STORES" s, "Database"."Schema"."COUNTY" c
where ST_WITHIN(s."G", c."G") = 1;

 

Using SQL, we are selecting the fields from the tables s (stores) and c (county) we would like. Then within the WHERE clause, we use the function ST_Within(point, polygon)=1.

 

Basically, we ask Snowflake to return only those which are within a county boundary. If you had points outside of any boundaries, they would not be returned.

 

image006.png

Results: all my stores with their corresponding county boundary 😊

 

Map of point in polygon indb results.png

 

How many stores are within the area of Southampton?

 

Resource: https://docs.snowflake.com/en/sql-reference/functions/st_intersects.html

 

Another great example of Spatial querying is the use of a bounding box. For example, you may be looking at data within a specific location on a map, between four corners of a screen. 😉 And you want to see all the data that falls within that area.

 

This is when you would utilise a bounding box

 

There are many ways within Alteryx to get the coordinates of a bounding box—I can share this another time.

 

For now, let’s take Southampton:

 

image007.png

I would like to see all the stores that fall within this box/area that I am interested in.

 

Within Snowflake, this would be a ST_Intersects query, i.e., show me where on this plane (geometric) points intersect with it. Yes, we are treating this as 2D, not 3D.

 

Select * from "Database"."Schema"."STORES" s
Where ST_Intersects(s."G",
to_geography('POLYGON((-1.95682845068099 51.383915269452,-0.7293442833561 51.383915269452,-0.7293442833561 50.7061091658136,-1.95682845068099 50.7061091658136,-1.95682845068099 51.383915269452))'));

 

When you pop this code within your InDB tool, you get the results almost instantly.

 

image008.png

 

You receive 750 results within 3.7 seconds using AMP.

 

To do this traditionally in Alteryx, you would bring in the dataset and do a spatial match which means loading the entire dataset, doing the match within Alteryx, and then outputting the results. Here, you start with the results right at the start of your workflow to begin doing the deeper analysis. 😊

 

The time saving here on 17K points is around 5.9-3.7=2.2 seconds with AMP.

 

More than one table? No problem…

 

(Select * from "Database"."Schema"."STORES" s
Where ST_Intersects(s."G",
to_geography('POLYGON((-1.95682845068099 51.383915269452,-0.7293442833561 51.383915269452,-0.7293442833561 50.7061091658136,-1.95682845068099 50.7061091658136,-1.95682845068099 51.383915269452))')))
UNION
(Select * from "Database"."Schema"."COUNTY" c
Where ST_Intersects(c."G",
to_geography('POLYGON((-1.95682845068099 51.383915269452,-0.7293442833561 51.383915269452,-0.7293442833561 50.7061091658136,-1.95682845068099 50.7061091658136,-1.95682845068099 51.383915269452))')));

 

Wrap each one in brackets and stack them with a UNION clause. Keep the results simple here, i.e., only return details from each table where the columns align: name, description, spatial object, etc.

 

What about distance? Can you tell me my nearest stores?

 

Can Snowflake do that too? Yes, it can.

 

Resource: https://docs.snowflake.com/en/sql-reference/functions/st_distance.html

 

Quick use case, please tell me the top 5 stores closest to this point: -1.343086 51.045012

GeoJSON format: {"type":"Point", "coordinates":[-1.34308636701855,51.0450122176328]}

 

image009.png

 

INDB Distance.PNG

 

  

Here’s the spatial SQL to find the distance between the origin and the destinations:

 

Select
s."ID",
'{"type":"Point", "coordinates":[-1.34308636701855,51.0450122176328]}' as origin,
s."G" as destination,
ST_DISTANCE(to_geography('POINT(-1.34308636701855 51.0450122176328)'),s."G")/1000 as kilometers,
ST_DISTANCE(to_geography('POINT(-1.34308636701855 51.0450122176328)'),s."G")/1609 as miles
from "Database"."Schema"."STORES" s
order by
ST_DISTANCE(to_geography('POINT(-1.34308636701855 51.0450122176328)'),s."G")/1000;

 

image011.png

 

Want the top 5? No problem – just add a limit clause at the end of your SQL statement, in this case LIMIT 5;.

 

Select
s."ID",
'{"type":"Point", "coordinates":[-1.34308636701855,51.0450122176328]}' as origin,
s."G" as destination,
ST_DISTANCE(to_geography('POINT(-1.34308636701855 51.0450122176328)'),s."G")/1000 as kilometers,
ST_DISTANCE(to_geography('POINT(-1.34308636701855 51.0450122176328)'),s."G")/1609 as miles
from "Database"."Schema"."STORES" s
order by
ST_DISTANCE(to_geography('POINT(-1.34308636701855 51.0450122176328)'),s."G")/1000
LIMIT 5;

 

It took 6.2 seconds to calculate the distance between the location provided and all the stores within the stores table (~17K) but what is great is that we have all origins and destinations alongside their distances in both kilometers and miles. 

 

That’s a wrap!

 

We hope you have enjoyed this mini-series using Alteryx, Snowflake & Spatial Methods.

  • We have learned how to bulk update in Snowflake using Merge functions and the bulk update within Alteryx.
  • We also have ultised the merge function to push up spatial data into Snowflake – vital for our customers to be able to do this at speed.
  • And finally, we have learned how to apply Snowflake spatial functions to that data using In-DB tools within Alteryx.

Did you enjoy this series? Did it help you within your organisation? Please let us know your use cases and how you applied these techniques.

 

WANT MORE: We would love to know what you would like to learn more of next, please let us know in the comments.

 

Samantha Clifton
CGeog | ACE Emeritus | Sales Engineering

Samantha is well known for sharing her knowledge and evangelising Alteryx with new users both in her role as a Sales Engineer and at INSPIRE. Her passion lies with making every day business chaos, simple and automated giving people hours back in their day jobs and personal lives. With over 15 years experience in data, she enjoys solving puzzles, spatial analytics, writing blogs and helping others develop themselves into gurus like herself.

Samantha is well known for sharing her knowledge and evangelising Alteryx with new users both in her role as a Sales Engineer and at INSPIRE. Her passion lies with making every day business chaos, simple and automated giving people hours back in their day jobs and personal lives. With over 15 years experience in data, she enjoys solving puzzles, spatial analytics, writing blogs and helping others develop themselves into gurus like herself.