Even though the Spatial Match tool is extremely fast and efficient, there are additional ways its speed can be further improved. This article provides suggestions to increase performance.
As defined in the Help documentation, the Spatial Match tool establishes the spatial relationship (contains, intersects, touches, etc.) between two sets of spatial objects. At least one input stream should include Polygon type spatial objects. The other set can contain any of the other types of spatial objects, such as points or lines. But wait: which set of objects should be used for the Universe (U) and which for the Target (T)?
Under the hood, the Spatial Match tool writes the U input to a temporary YXDB file with a spatial index, a highly efficient format for spatial data. Rather than indexing the full geometry of each object (first image), the index stores each object's bounding box (second and third image).
This effectively means that when calculating a spatial match, only the few U objects whose bounding boxes are relevant need to be considered for the exact spatial calculation. Every object from the T input is then spatially matched against just those relevant U objects.
In one line: The Spatial Match tool can ignore most Universe records that do not match the Target record. Using this fact to your advantage can greatly speed up your workflow.
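The bounding-box idea described above can be sketched in a few lines of Python. This is only an illustration, not Alteryx's implementation: the names BBox, bbox_of, point_in_polygon and spatial_match are hypothetical, the "index" is a plain list scanned linearly rather than a real spatial index, and only the 'Universe contains Target' case with point targets is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box of a spatial object (hypothetical helper)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains_point(self, x, y):
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


def bbox_of(polygon):
    """Compute the bounding box of a polygon given as a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return BBox(min(xs), min(ys), max(xs), max(ys))


def point_in_polygon(x, y, polygon):
    """Exact (and comparatively expensive) test using ray casting."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_match(universe, targets):
    """Match target points to universe polygons ('Universe contains Target').

    universe: dict of name -> polygon, targets: dict of name -> (x, y).
    Returns the matches and the number of exact tests that were needed.
    """
    # "Index" step: precompute one bounding box per universe object.
    indexed = [(name, bbox_of(poly), poly) for name, poly in universe.items()]
    matches, exact_tests = [], 0
    for t_name, (x, y) in targets.items():
        for u_name, box, poly in indexed:
            if not box.contains_point(x, y):
                continue  # cheap rejection, no exact geometry test needed
            exact_tests += 1
            if point_in_polygon(x, y, poly):
                matches.append((t_name, u_name))
    return matches, exact_tests


universe = {
    "square": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "triangle": [(10, 10), (12, 10), (10, 12)],
}
targets = {"a": (1, 1), "b": (11.5, 11.5), "c": (50, 50)}
matches, exact_tests = spatial_match(universe, targets)
print(matches, exact_tests)
```

With three targets and two universe polygons, six pairs are possible, but only two exact tests are run: every other pair is rejected by the bounding-box check alone.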
Deselect spatial objects not needed
As with many other tools, the Spatial Match tool has a built-in Select tool that lets you deselect columns that are not needed. Discarding unnecessary columns not only makes data sets more readable, it can also be a real performance improvement. Spatial object fields in particular should be removed from the workflow once they are no longer needed, because unnecessary data consumes memory and takes away otherwise available resources.
In the example below, re-selecting the spatial object increases the tool output from 7 kB to 757 kB.
Spatial tool output with unnecessary spatial object
Spatial tool output without unnecessary spatial object
Consider Using the Dynamic Input Tool
In certain circumstances, the Dynamic Input tool can perform a spatial match faster than the native Spatial Match tool. Note: This can only be used for the spatial relationship 'Universe contains Target'.
To perform a spatial match using the Dynamic Input tool, select the spatial data file, then choose the second option: 'Modify SQL Query'. Select the latitude and longitude fields for the Universe object, and the spatial object field for the Target object. This SQL filter will only let through data that fall within the bounding rectangle of the polygon.
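To make the bounding-rectangle filter concrete, here is a minimal Python sketch of the kind of condition such a filter expresses. The SQL that the Dynamic Input tool actually generates is not shown in this article, so the clause format, the function names and the field names Latitude and Longitude are all assumptions made for illustration.

```python
def bounding_rectangle(polygon):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of (lon, lat) vertices."""
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def bbox_where_clause(polygon, lat_field="Latitude", lon_field="Longitude"):
    """Build a WHERE-style condition that keeps only points inside the rectangle.

    Field names are hypothetical; substitute the ones in your own data.
    """
    min_lon, min_lat, max_lon, max_lat = bounding_rectangle(polygon)
    return (
        f"{lat_field} >= {min_lat} AND {lat_field} <= {max_lat} AND "
        f"{lon_field} >= {min_lon} AND {lon_field} <= {max_lon}"
    )


# A hypothetical rectangular trade area given as (lon, lat) vertices.
trade_area = [(-105.3, 39.9), (-105.1, 39.9), (-105.1, 40.1), (-105.3, 40.1)]
clause = bbox_where_clause(trade_area)
print(clause)
```

Note that a rectangle test is looser than a polygon test: for a non-rectangular polygon, points in the corners of the rectangle pass the filter even though they lie outside the polygon itself.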
Harness the Power of Calgary Data and YXDB
The YXDB (.yxdb) and Calgary DB (.cydb) data formats use spatial indexing. As explained above, this can give the workflow a major efficiency boost. Therefore, when possible, it is strongly advised to read data from one of these two formats.
The second advantage is that both formats let you leverage the spatial index directly. For Calgary files, use the Calgary Join tool for spatial queries, as described in the Help documentation. If specifying a Calgary file, be aware that the Calgary spatial index uses 5 decimal places of precision for compression and speed, while the YXDB spatial index uses 6 decimal places. This adds a round-off error of up to 1.8 feet to Calgary indexes. In other words, it is possible for a point to be 1.8 feet inside a polygon and still be reported as "outside."
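The article does not say how the 1.8 feet figure is derived. One plausible reading is that it is the worst-case error from rounding a coordinate to 5 decimal places of a degree, which the short Python calculation below reproduces. The constant of roughly 111,320 meters per degree of latitude is an approximation, and the function name is made up for this sketch.

```python
# Approximate length of one degree of latitude; an assumption for this sketch.
METERS_PER_DEGREE_LAT = 111_320.0
FEET_PER_METER = 3.28084


def max_rounding_error_feet(decimal_places):
    """Worst-case distance error, in feet, from rounding a latitude
    to the given number of decimal places of a degree."""
    # Rounding moves a value by at most half of the last retained digit.
    half_step_degrees = 0.5 * 10 ** (-decimal_places)
    return half_step_degrees * METERS_PER_DEGREE_LAT * FEET_PER_METER


calgary_error = max_rounding_error_feet(5)  # 5 decimal places
yxdb_error = max_rounding_error_feet(6)     # 6 decimal places
print(round(calgary_error, 2), round(yxdb_error, 2))
```

Under these assumptions, 5 decimal places gives a worst case of about 1.8 feet, and 6 decimal places gives about a tenth of that.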
In summary, using the YXDB and Calgary DB data formats has the advantage of the highly efficient spatial indexing.
Use Integrated Tool Input in Spatial Tools
For larger data sets, the option to Use Records from File or Database can be used for added speed. This also uses the spatial index and has the advantage that the entire dataset does not have to be read into memory before the workflow can start. That matters because I/O is usually the biggest performance bottleneck for Alteryx.