Hi Alteryx Community!
I have a potential project that will involve spatially matching billions of points to hundreds of millions of polygons on a daily basis. Essentially, the data will be coming in faster than one or even ten Designer instances can process it.
For anyone who's dealt with spatial operations of this magnitude: is this something that can be handled in Alteryx with clever parallelization, macros, or extremely powerful/fast hardware? Or will I likely need to invest in specialized cloud compute resources?
Thanks!
Hi @Ben_Edelman
Been there. For this scale at this pace, I recommend Google BigQuery. You can spatially match millions of points to millions of polygons in a matter of minutes. Getting data in and out of BigQuery/Google Storage is a little cumbersome; there are some tools on the Gallery for limited applications, but otherwise most of it can be automated and controlled via Alteryx and the command line. Alteryx has some support for Google BigQuery, but I also recommend checking out the Simba ODBC driver for use in other Alteryx tools.
https://help.alteryx.com/current/designer/google-bigquery-0
https://cloud.google.com/bigquery/docs/reference/odbc-jdbc-drivers
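To make the BigQuery suggestion concrete, here is a minimal sketch of the kind of spatial-join query involved. The project, dataset, table, and column names are all hypothetical placeholders; `ST_CONTAINS` and `ST_GEOGPOINT` are real BigQuery GIS functions. The client call is shown but commented out, since it needs credentials and a live project.

```python
# Sketch: build a BigQuery Standard SQL query that tags each point with the
# polygon that contains it. Table/column names below are hypothetical.

def build_spatial_match_query(points_table: str, polygons_table: str) -> str:
    """Return a point-in-polygon join. BigQuery evaluates ST_CONTAINS with
    its own spatial optimizations, so this scales to very large tables."""
    return f"""
    SELECT
      pt.point_id,
      poly.polygon_id
    FROM `{points_table}` AS pt
    JOIN `{polygons_table}` AS poly
      ON ST_CONTAINS(poly.geom, ST_GEOGPOINT(pt.longitude, pt.latitude))
    """

query = build_spatial_match_query("my_project.geo.points",
                                  "my_project.geo.polygons")

# To actually run it, use the official client (not executed here):
# from google.cloud import bigquery
# client = bigquery.Client()
# rows = client.query(query).result()
print("ST_CONTAINS" in query)  # True: the predicate doing the spatial match
```

From Alteryx, the same query can be issued through the Simba ODBC driver in an Input Data tool, or the whole load/query/export cycle can be scripted and kicked off from a Run Command tool.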
Can you expound further, @CharlieS?
The attached workflow below has been running for days, and I still have the Cities queued up to run.
I appreciate your response.
Looking at that image, I have a couple of suggestions for a spatial match of this size:
- Save off the universe input as a .yxdb file (which by default will have a spatial index) and use that file as the universe input inside the spatial match configuration.
- Also, before you save that file off, don't Summarize/combine the spatial objects; keep them as individual objects. Matching many smaller pieces is easier than matching one large combined object (even though the overall file size is larger).
Try that out and let us know how it goes.
Amazing, it works well. Thanks a lot, @CharlieS!