
Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
Samantha_Jayne
Alteryx


If you caught my previous blog post on bulk uploading data into Snowflake with speed, then you know I left you a little promise: to share how to bulk upload spatial data to Snowflake. Using primary keys, you can use this process to bulk update as well! Happy days.

 

Before we jump into how to do this, we need to understand a couple of things about Snowflake and spatial data.

 

Firstly, the good news: Snowflake can handle spatial geography. However, it must be presented in a particular way. Snowflake's documentation is exceptionally helpful on this. Here is a snippet, with the link below if needed.

 

Samantha_Jayne_1-1665008505804.png

Source: Snowflake Geography Documentation

 

The two most common formats are WKT (Well-Known Text) and GeoJSON; all the others are variations on the same theme.
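To make that concrete, here is the same point (hypothetical coordinates) in both formats; Snowflake's TO_GEOGRAPHY function parses either:

SELECT TO_GEOGRAPHY('POINT(-0.1276 51.5074)');                            -- WKT
SELECT TO_GEOGRAPHY('{"type":"Point","coordinates":[-0.1276,51.5074]}');  -- GeoJSON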

 

I have created a macro for WKT that I will share on Gallery (Sam, you just keep on giving – you are welcome!). However, Alteryx is a wonderful tool because it handles spatial data in the form of GeoJSON within the tool itself. All you need to do is change the data type from SpatialObj to VW_String within the Select tool, and you have exactly what you need, with no conversions required. Fantastic and simple! Believe it or not, this is easily forgotten, so bookmark this page now! 😀😀

 

Now, all we need is a dataset. I have again used Geolytix Retail Points to support me on my quest. We are simply going to:

  1. Input the data
  2. Create Points
  3. In the Select tool:
    1. Rename Centroid to g
    2. Change its data type to VW_String (reducing the string length to something reasonable – Snowflake doesn’t like MAX length strings)
    3. For this example, I have turned all other fields off, but you can leave them on. It just means adding more fields in your SQL code. (I’ll show you where to do this!)
  4. Create an ID with a Record ID tool, then add a Browse tool

 

Writing Points.PNG

 

 

The data should look like this:

 

Samantha_Jayne_3-1665008505822.png
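In case the screenshot doesn't come through: each row is simply the integer ID plus the centroid as a GeoJSON string, along these lines (the coordinates here are hypothetical):

id    g
1     {"type":"Point","coordinates":[-0.1276,51.5074]}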

 

Now we need to focus on the Output tool to push the data into Snowflake. My previous blog was all about Post SQL statements within the Output tool; to upload spatial data into Snowflake, we utilise both Pre and Post SQL statements. The Output tool itself writes the ID and GeoJSON string to a staging table (here "STORES_Temp"), and the Pre and Post SQL statements do the rest.

 

Spatial Write Points Output.PNG

 

First Stop: Pre SQL

 

Pre SQL Point Write.PNG

 

-- Remove the previous version of the table (this errors if the table doesn't exist yet)
drop table "Database"."Schema"."STORES";
-- Recreate it with an integer key and a native GEOGRAPHY column
create table "Database"."Schema"."STORES" (id integer, g geography);

 

As you can see, the Pre SQL statements are very simple. In fact, the first time you run the workflow, you only need the create table statement.

 

Let’s see if you are reading or just copying code :D. If you run the drop table line before the table exists, you will get an error.
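If you would rather avoid that first-run error altogether, Snowflake's DROP TABLE supports an IF EXISTS guard:

drop table if exists "Database"."Schema"."STORES";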

 

Here you are really creating the table using Snowflake syntax to ensure that it includes a geography column. You would have to include all the columns you want to write to the database, alongside their correct data types.
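For example, if you left extra fields switched on in the Select tool, the create statement simply grows with them; the extra column names below are hypothetical:

create table "Database"."Schema"."STORES" (
    id integer,
    retailer varchar(255),     -- hypothetical extra field
    store_name varchar(255),   -- hypothetical extra field
    g geography
);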

 

Second Stop: Post SQL

 

Point Write Post SQL.PNG

 

-- t1 is the production table created in the Pre SQL; t2 is the staging table the Output tool wrote.
-- Unquoted identifiers in the create statement are stored uppercase ("ID", "G"), while the
-- Alteryx-written staging columns keep their lowercase names ("id", "g"), hence the mixed case.
MERGE INTO "Database"."Schema"."STORES" t1
USING "Database"."Schema"."STORES_Temp" t2
ON t1."ID" = t2."id"

-- Existing IDs: update the geography in place (this is the bulk update)
WHEN MATCHED THEN UPDATE SET
t1."G" = t2."g"

-- New IDs: insert; the GeoJSON string is converted into the GEOGRAPHY column
WHEN NOT MATCHED THEN
INSERT (t1."ID", t1."G")
VALUES (t2."id", t2."g");

DROP TABLE "Database"."Schema"."STORES_Temp";

 

Now we use a Snowflake MERGE statement to push the data from the TEMP table into the PRODUCTION table. In doing so, the GeoJSON is copied from a string column into a geography column, which is why it needs to be in a format Snowflake recognises. 😉
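The string-to-geography conversion happens implicitly during the merge; if you ever want it to be explicit, the matched branch can be written as a plain update with TO_GEOGRAPHY. A sketch against the same tables:

-- Explicit-cast equivalent of the WHEN MATCHED branch above
UPDATE "Database"."Schema"."STORES"
SET "G" = TO_GEOGRAPHY(t2."g")
FROM "Database"."Schema"."STORES_Temp" t2
WHERE "ID" = t2."id";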

 

Then we drop the TEMP table to keep the database clean.

 

Now we have spatial data in Snowflake!
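Fancy a quick sanity check? Running something like this in a Snowflake worksheet confirms the strings really became geographies (ST_ASWKT and ST_DIMENSION are standard Snowflake geospatial functions; the table name is the one created above):

-- Confirm rows landed and were parsed into real geographies
SELECT "ID", ST_ASWKT("G") AS wkt, ST_DIMENSION("G") AS dim
FROM "Database"."Schema"."STORES"
LIMIT 5;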

 


 

Let’s do this again, but now push some polygon data into Snowflake.

 

Write Polygon.PNG

 

The process here is similar to the one above:

  1. Load spatial data
  2. Change the data type of SpatialObj to VW_String and set the max size to something reasonably large (polygons produce long GeoJSON strings)
  3. Add a Record ID tool
  4. Output tool:
    1. Pre SQL – remember it’s database.schema.table (and it’s your own database name and schema)

 

drop table "Database"."Schema"."COUNTY";
create table "Database"."Schema"."COUNTY" (id integer, g geography);

 

    2. Post SQL

 

MERGE INTO "Database"."Schema"."COUNTY" t1
USING "Database"."Schema"."COUNTY_Temp" t2
ON t1."ID" = t2."id"

WHEN MATCHED THEN UPDATE SET
t1."G" = t2."g"

WHEN NOT MATCHED THEN
INSERT (t1."ID", t1."G")
VALUES (t2."id", t2."g");

DROP TABLE "Database"."Schema"."COUNTY_Temp";

 

Spatial Write Polygon Output.PNG

 

Congratulations, you now have polygon and point datasets in Snowflake!!

 

Tune in same time next week to learn how to do cool spatial SQL in Snowflake using Alteryx InDB tools!

 

Samantha Clifton
CGeog | ACE Emeritus | Sales Engineering

Samantha is well known for sharing her knowledge and evangelising Alteryx with new users, both in her role as a Sales Engineer and at INSPIRE. Her passion lies in making everyday business chaos simple and automated, giving people hours back in their day jobs and personal lives. With over 15 years' experience in data, she enjoys solving puzzles, spatial analytics, writing blogs, and helping others develop into gurus like herself.


Comments
blyons
11 - Bolide

Thank you for this! The method I was using was to have 2 fields in the table, one varchar for geojson, the other geography. I bulk loaded to the geojson field, leaving the geography field null, then ran an update using TO_GEOGRAPHY to populate the geography field. I like that this method doesn't leave that artifact behind.
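For reference, the update blyons describes would look something like this; the table and column names are hypothetical:

-- Populate the GEOGRAPHY column from the raw GeoJSON string after the bulk load
UPDATE "Database"."Schema"."STORES"
SET "G" = TO_GEOGRAPHY("GEOJSON")
WHERE "G" IS NULL;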

Samantha_Jayne
Alteryx

So pleased this has helped, @blyons!! I would love to know more about your spatial data usage. Please reach out to me at samantha.hughes@alteryx.com so we can talk more post-INSPIRE, or better yet, if you are there, seek me out.

blyons
11 - Bolide

I have a variant of this process that I think is a slight improvement.

 

In the Pre-SQL, instead of dropping and creating a regular table, I create a temporary table.

e.g. create temporary table "Database"."Schema"."STORES_temp"...

 

(see https://docs.snowflake.com/en/user-guide/tables-temp-transient.html for more information)
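A minimal sketch of that Pre SQL, using the STORES example from the post (the column names and varchar size are assumptions):

-- Session-scoped staging table: no DROP needed, and no clash with other users
create temporary table "Database"."Schema"."STORES_Temp" ("id" integer, "g" varchar(1000000));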

 

The MERGE statement in the Post-SQL then works the same way you show above.

 

Advantages:

  • I no longer need to delete the table when I'm done, because regular temporary tables only last for the duration of the session (though explicitly deleting them may be advisable in some scenarios).
  • Temporary tables eliminate any possibility of conflict with other users or applications using the same table.
  • Per Snowflake documentation, "Note that creating a temporary table does not require the CREATE TABLE privilege on the schema in which the object is created." This is most helpful when publishing to the Gallery because the service account we use to connect to Snowflake has limited privileges.

I hope that is useful.

Samantha_Jayne
Alteryx

Perfect, @blyons! This crossed my mind when developing; thanks for reminding me of it! Perfect explanation too.

derreck0218
6 - Meteoroid

Thanks @Samantha_Jayne for all your help.

 

This post was timely and most helpful as a model for bulk loading spatial data into Snowflake. It is quite useful to be able to load tables quickly and repeat the process. It saved me untold hours, increased speed and efficiency, and made the knowledge easy to transfer to my colleagues. It has also allowed me to skip loading the data into SQL first as a blob and go straight into the data warehouse. I look forward to the macro development of this process.

 

I did have one unexpected issue with one of my loads, where the Post SQL said "geography too large". Now, I am curious about that. Again, I appreciate you getting us up to speed with this solution.

Samantha_Jayne
Alteryx

Ah, @derreck0218, large geography is a new one. You may need to simplify that one or remove duplicate nodes. Now we are getting technical. Just to rule out anything else, make sure the string size is set to Snowflake's largest, 16777216 (from memory ;)), and not Alteryx's largest string size.

Also try removing that record and seeing if everything else goes in; that way you can isolate which one it is. It may be worth mentioning to the supplier of the data too, so it can be sorted at source and you won't need to do this every time.
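If the staging table is still around, Snowflake's TRY_TO_GEOGRAPHY (which returns NULL instead of erroring) gives one possible way to isolate the offending rows; a sketch assuming the COUNTY example from the post:

-- Rows whose GeoJSON cannot be converted to a geography come back as NULL
SELECT "id"
FROM "Database"."Schema"."COUNTY_Temp"
WHERE TRY_TO_GEOGRAPHY("g") IS NULL;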

blyons
11 - Bolide

@derreck0218 , I also encountered problems with large geographies in Snowflake. I did a bunch of research and testing and opened tickets with Snowflake. See my post on 8/25/2022 in this thread for some details, limits, and workarounds. The fact that you are getting a "large geography" message is different than what I experienced, which tells me that Snowflake may have put in something to detect the issue I was having, rather than just ignoring it and leaving null objects as it did before. Not sure about that.

vizwhiz
5 - Atom

Several issues came up when attempting to follow this process to load the Census Tract shapefile from the US Census Bureau (https://hazards.fema.gov/nri/data-resources#shpDownload). I only attempted to load one statefips (11, District of Columbia):


The Post SQL did not work for DC, returning an error. Using the same SQL within Snowflake also resulted in an error message: Geography validation failed: Loop 0: Edge (-77.089012,38.903912) -- (-77.088527,38.903713) crosses edge (-77.088552,38.903721) -- (-77.089002,38.903908)
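One possible workaround for that validation failure: TO_GEOGRAPHY accepts an optional second allow_invalid argument that loads shapes failing validation. A sketch with a deliberately self-intersecting bowtie polygon; note this loads the shape but does not fix the underlying geometry:

-- TRUE = allow_invalid; this polygon's edges cross, so it would normally be rejected
SELECT TO_GEOGRAPHY('POLYGON((0 0, 1 1, 1 0, 0 1, 0 0))', TRUE);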

 

StateFIPS for Hawaii (15) did not return an error, but even though the resulting field seems to be a valid geography data type, Tableau version 2023.1.2 picks that field up as a text field, not as a spatial object. Using the ST_DIMENSION function on the Hawaii data returns that all the data is of type '2', or Polygon (per this article on Snowflake and Tableau: https://medium.com/starschema-blog/tableau-map-layers-snowflake-geospatial-support-b3b0c1ec0847).

So, the two questions remain for me:
1. How to overcome the US Census Bureau's data that returns edge validation errors?
2. Even if the data does reside in Snowflake, what's up with Tableau not recognizing the spatial geography datatype?