Engine Works

ned_blog · 02-14-2011

We got an issue report on our internal support email about the non-overlapping drive time (DT) macro being slow. In this case it was running 29,000 five-minute DTs against a nationwide retail network, and it took 18 hours, which certainly does seem like a long time. Rob and I both reasoned, well, of course it's slow: the amount of spatial processing is far larger than for a regular DT polygon. We guessed it might take as much as 10X longer than running a regular DT.

We were in for a surprise when Rob ran the same points through a normal DT and it completed in 5 minutes (much of that speed is thanks to Rob's work multi-threading it). By our off-the-cuff estimate of 10X slower, the non-overlapping DT shouldn't have taken more than an hour.
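The gap between the guess and reality is easy to put in numbers. A quick check in Python, using only the timings quoted above:

```python
# Back-of-the-envelope check using the timings quoted in this post.
regular_dt_minutes = 5        # regular DT on the same 29,000 points
estimate_factor = 10          # the off-the-cuff "10X slower" guess
observed_minutes = 18 * 60    # the non-overlapping DT macro took 18 hours

expected_minutes = regular_dt_minutes * estimate_factor   # 50 minutes
slowdown_vs_regular = observed_minutes / regular_dt_minutes
slowdown_vs_estimate = observed_minutes / expected_minutes

print(f"estimate: at most {expected_minutes} min")
print(f"observed: {observed_minutes} min, "
      f"{slowdown_vs_regular:.0f}x the regular DT, "
      f"{slowdown_vs_estimate:.1f}x the estimate")
```

In other words, the macro ran about 216 times slower than the regular DT, more than 20 times worse than our already pessimistic guess.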

Not knowing where to start optimizing the macro, I took a sample of the records (the first 300) and set up the macro so I could run it directly. After running it, I turned on connection progress and just started to poke around. The problem appeared very quickly: while most of the connections showed about 2.1MB of data, a few showed more than 300GB! Worse, by raising or lowering the number of input records it was easy to see that the problem was exponential.
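One plausible reason the data balloons like this can be shown with a toy model, under loud assumptions: each DT polygon is treated as an equal-radius circle (a hypothetical stand-in; real drive-time polygons are irregular), and the centers are scattered uniformly over a fixed square. Overlapping pairs are what feed a combined intersection layer, and in this model their count grows roughly with the square of the number of points. The function `overlapping_pairs` and all of its parameters are illustrative only, not part of the macro.

```python
import math
import random

def overlapping_pairs(n, radius=5.0, extent=100.0, seed=0):
    """Count overlapping pairs among n equal circles (toy stand-ins for DT
    polygons) whose centers are scattered uniformly over an extent x extent square."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0, extent), rng.uniform(0, extent)) for _ in range(n)]
    limit = 2 * radius  # two circles overlap when their centers are closer than 2r
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) < limit:
                count += 1
    return count

for n in (200, 400, 800):
    print(n, overlapping_pairs(n))
```

Going from 200 to 800 points (4X the input) should multiply the overlap count by something in the neighborhood of 16, not 4. Each added point can overlap every point already there, so the intersection layer outpaces the input.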

What was going on? The algorithm for the non-overlapping DT finds the areas where the DT polygons intersect, splits those areas into grids, and assigns each grid cell to the nearest location by drive time. The problem was that we created a single spatial object for the entire layer's intersection. Because of the large input data set, this spatial object ended up having 17,000 regions and 1.9M points! Clipping every DT polygon against this beast of a region was a huge issue.
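The grid-assignment step can be sketched in a few lines of Python. This is a minimal illustration, not the macro's implementation: real drive times are replaced by a hypothetical `drive_time` function (straight-line distance divided by an assumed constant speed), and the overlap area is simplified to a rectangle covered by square cells. Ties go to whichever location is listed first.

```python
import math

# Hypothetical stand-in for a real drive-time lookup: straight-line distance
# divided by an assumed constant speed (distance units per minute).
ASSUMED_SPEED = 1.0

def drive_time(cell, location):
    return math.dist(cell, location) / ASSUMED_SPEED

def grid_cells(xmin, ymin, xmax, ymax, size):
    """Centers of the square cells covering a rectangular overlap area."""
    nx = round((xmax - xmin) / size)
    ny = round((ymax - ymin) / size)
    return [(xmin + (i + 0.5) * size, ymin + (j + 0.5) * size)
            for i in range(nx) for j in range(ny)]

def assign_cells(cells, locations):
    """Map each cell to the location with the smallest drive time.
    min() keeps the first minimum, so ties go to the location listed first."""
    return {cell: min(locations, key=lambda name: drive_time(cell, locations[name]))
            for cell in cells}

# Hypothetical example: two stores whose DT areas overlap over (0, 0)-(10, 4).
locations = {"A": (0.0, 2.0), "B": (10.0, 2.0)}
owner = assign_cells(grid_cells(0, 0, 10, 4, 1.0), locations)
print(list(owner.values()).count("A"), list(owner.values()).count("B"))
```

The assignment itself is cheap. In the macro, the expensive part was clipping each DT polygon against the one giant intersection object.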

The simple solution was to split the intersection polygon into its separate regions and then use a Spatial Match tool to find the pieces that intersect any given DT polygon. This probably should have used the PolySplit tool, but a limitation made it unusable in this case (more on that in another blog post).
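Why splitting helps can be sketched with axis-aligned bounding boxes standing in for the Spatial Match tool. This is an assumption for illustration; Spatial Match has its own options that are not reproduced here. Each DT polygon is clipped only against the regions whose boxes overlap it, so the work (counted in vertices) drops from "every DT times every vertex in the intersection" to "every DT times the vertices it actually touches". The layout and the 100-vertices-per-region figure are hypothetical.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Boxes overlap unless one lies entirely to one side of the other.
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)

def match_regions(dt_boxes, region_boxes):
    """For each DT polygon, the indices of the regions whose boxes overlap it.
    A linear scan for clarity; a spatial index would avoid testing every box."""
    return [[r for r, region in enumerate(region_boxes) if dt.intersects(region)]
            for dt in dt_boxes]

# Hypothetical layout: 100 separate intersection regions in a row, each with
# 100 vertices, and 100 DT polygons that each overlap exactly one region.
regions = [BBox(10 * i, 0, 10 * i + 4, 4) for i in range(100)]
vertices = [100] * len(regions)
dts = [BBox(10 * j - 1, -1, 10 * j + 5, 5) for j in range(100)]

matches = match_regions(dts, regions)
single_object_work = len(dts) * sum(vertices)              # clip every DT against everything
split_work = sum(vertices[r] for m in matches for r in m)  # clip only against matched regions
print(single_object_work, split_work)
```

In this made-up layout the clipping work falls by a factor of 100. The real savings depend on how many regions each DT polygon actually touches.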

The long and short of it is that this particular data set went from 18 hours to 12 minutes. Clearly, the algorithm matters most here. The newest version of this macro will be part of the Alteryx 6.1 release, which will be out soon.
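For scale, using the timings from this post:

```python
# Speedup from the algorithm change, using the timings quoted in this post.
before_minutes = 18 * 60   # original non-overlapping DT run
after_minutes = 12         # after splitting the intersection into separate regions
regular_dt_minutes = 5     # regular DT on the same points

speedup = before_minutes / after_minutes               # 1080 / 12
ratio_to_regular = after_minutes / regular_dt_minutes  # 12 / 5

print(f"{speedup:.0f}x faster; now {ratio_to_regular:.1f}x a regular DT")
```

That is a 90X speedup, and it puts the fixed macro at about 2.4 times the cost of a regular DT, comfortably inside our original 10X estimate.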


Algorithm Matters