
Alteryx Designer Desktop Discussions

SOLVED

Allocate Append and Non Overlap Drivetime tool - Excessive counts (Sum)

RickPack
8 - Asteroid

The workflow attached to this message sums the Massachusetts residents inside the Non Overlap Drivetime tool's trade areas (western MA), built from 100 addresses (sample data provided with Alteryx). That sum comes to 68% of the total number of Massachusetts residents, even though the trade areas include none of the Boston metropolitan area (eastern MA), which itself makes up about 68% of total MA residents. Is there some kind of deduplication I need to do?

 

The attached workflow took 8 minutes to complete on my machine. It uses data that I believe came with Alteryx, found at C:\Program Files\Alteryx\Samples\data\SampleData\AddressData.yxdb. I believe addressing this question requires the Location Insights or Business Insights data package.

 

In short:

  1. Geocode a 50-record sample of Massachusetts Alteryx sample data (AddressData.yxdb) with Street Geocoder 
  2. Use the Non Overlap Drivetime tool to calculate trade areas covering everything within a 30-minute drive time of each address
  3. Spatial Match so the trade areas are restricted to the state of Massachusetts 
  4. Sum the count of the MA population in those trade areas (should I be using Spatial Combine because the dataset has one row per Address and residents can still be counted multiple times despite the Non Overlap Drivetime usage?)
  5. Compare that count to the total MA population from Allocate Input.
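
To make the question in step 4 concrete, here is a minimal Python sketch of why a row-wise sum can exceed the population of the combined area when trade areas overlap. It is not Alteryx code: the block IDs, populations, and store names are hypothetical, and each trade area is simplified to a set of census-block IDs.

```python
# Hypothetical data: population per census block.
block_population = {"b1": 1200, "b2": 800, "b3": 500, "b4": 1500}

# Hypothetical trade areas, each reduced to the set of blocks it covers.
trade_areas = {
    "store_A": {"b1", "b2"},
    "store_B": {"b2", "b3"},  # b2 is shared with store_A
    "store_C": {"b3", "b4"},  # b3 is shared with store_B
}


def population(blocks):
    """Total population of a collection of block IDs."""
    return sum(block_population[b] for b in blocks)


# One row per address, then Sum: shared blocks are counted once per row.
sum_of_rows = sum(population(blocks) for blocks in trade_areas.values())

# Combine the areas first (the idea behind Spatial Combine), then count once.
combined_blocks = set().union(*trade_areas.values())
combined_total = population(combined_blocks)

overcount = sum_of_rows - combined_total
print(sum_of_rows, combined_total, overcount)
```

If the two totals differ, the difference is population counted more than once, which is the symptom described above.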
 

RickPack_0-1591979242613.png

 

RickPack_4-1591979355877.png

RickPack_2-1591979451385.png

RickPack_3-1591979464400.png

 

 

RickPack_4-1591979523480.png

 

Thank you to JD Love-Epp of Aimpoint Digital for your valuable help in getting to this point.

 

4 REPLIES
CharlieS
17 - Castor

Hi @RickPack 

 

Did you mean to attach the workflow? I have access to the data installs and would be interested in testing.

 

Here are a few thoughts before looking at the module:

- You can test to see if spatial overlap is your issue by taking your "non-overlapping drive time polygons" and combining them into a single object using a Summarize tool (Spatial>Combine). Use an Allocate Append on that summary area and compare that population to the sum of population counts from the individual polygons.

- You cite an estimate of the "Boston metropolitan area (eastern MA)". It's important to note that the Boston-Cambridge-Newton, MA-NH CMA is not limited to the state of Massachusetts (see below).

     - The current year population estimates of these areas, according to the Experian US 2019 Q3 data, are: State of Massachusetts [6,879,306], Boston-Cambridge-Newton CMA [4,847,653]. The intersecting area between the state and CMA has a population of [4,406,207]. I realize that 91% of the CMA's population estimate is within the state, but it's still important to note.

- While they should ideally be similar, there are always going to be some differences in population estimates from different sources including methodology and vintage of source/estimation. (your data vs "according to Google")
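
For anyone who wants to check the shares implied by the figures quoted above, a short Python sketch follows. The three population numbers are the ones cited in this reply; the variable names are only illustrative.

```python
# Figures quoted above (Experian US 2019 Q3, as cited in this reply).
state_pop = 6_879_306         # State of Massachusetts
cma_pop = 4_847_653           # Boston-Cambridge-Newton CMA
intersection_pop = 4_406_207  # part of the CMA inside Massachusetts

# Share of the CMA's population that lives inside the state.
share_of_cma_in_state = intersection_pop / cma_pop

# Share of the state's population that lives inside the CMA.
share_of_state_in_cma = intersection_pop / state_pop

print(f"{share_of_cma_in_state:.1%} of the CMA is in MA")
print(f"{share_of_state_in_cma:.1%} of MA is in the CMA")
```

With these inputs the first share rounds to 91% and the second to roughly 64%, so the "about 68%" figure in the question may come from a different source or vintage.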

 

20200612-MassPop.PNG

RickPack
8 - Asteroid

Thank you @CharlieS. I will attach the missing workflow momentarily. I am going to try your Summarise Tool -> Spatial Combine compared to "non-overlapping drive time polygons" population count comparisons shortly.  

CharlieS
17 - Castor

Thanks for sharing the workflow. Now that I have seen the points, I can tell you that your issue is definitely related to overlapping spatial objects from the Non-Overlapping Drivetime tool. 

 

"but why, Charlie? 'non-overlapping' is in the name!" The answer to this lies in the "Grid Size" settings on the tool configuration. The numeric up/down and the miles/kilometers dropdown control the grid size that is used as the basis for distributing the overlapping area. In this scenario, the points entered are very close (some are even the same points duplicated). If the tool cannot distinguish the trade area of one from another by at least the grid size entered, then it will not be able to split the polygons. 
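
As a rough illustration of the grid-size point, the Python sketch below snaps points to square grid cells and groups the points that land in the same cell. This is a simplified assumption for intuition only, not the macro's actual algorithm; the coordinates, the unit-free grid sizes, and the function names are hypothetical.

```python
import math


def grid_cell(point, grid_size):
    """Index of the square grid cell containing the point."""
    x, y = point
    return (math.floor(x / grid_size), math.floor(y / grid_size))


def distinguishable_groups(points, grid_size):
    """Group point names by grid cell; a group of 2+ cannot be told apart."""
    groups = {}
    for name, pt in points.items():
        groups.setdefault(grid_cell(pt, grid_size), []).append(name)
    return sorted(groups.values())


# Hypothetical centroids; D is an exact duplicate of A.
points = {"A": (0.2, 0.3), "B": (0.9, 0.4), "C": (3.6, 1.1), "D": (0.2, 0.3)}

coarse = distinguishable_groups(points, 5.0)  # every point shares one cell
fine = distinguishable_groups(points, 0.5)    # only the duplicates still collide

print(coarse)
print(fine)
```

In this toy model a coarse grid lumps all four points together, a finer grid separates A, B, and C, and the duplicated point D never separates from A at any grid size.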

 

- After the Street Geocoder, add a Unique tool set to the SpatialObj field so only unique centroids are sent downstream.

- Reduce the grid size settings to smaller settings (0.1 Km is the smallest) for increased accuracy (yay!) and increased runtime (/sad).
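
Outside of Designer, the effect of the Unique step in the first suggestion can be sketched in Python as keeping the first record for each distinct coordinate pair. The records, field names, and the rounding precision below are hypothetical assumptions, not the tool's implementation.

```python
# Hypothetical geocoded records; the first two share the same centroid.
records = [
    {"address": "Record 1", "lat": 42.1015, "lon": -72.5898},
    {"address": "Record 2", "lat": 42.1015, "lon": -72.5898},
    {"address": "Record 3", "lat": 42.3732, "lon": -72.5199},
]


def unique_centroids(rows, precision=6):
    """Keep the first row for each distinct (lat, lon) pair."""
    seen = set()
    kept = []
    for row in rows:
        # Rounding is an assumption to guard against float noise.
        key = (round(row["lat"], precision), round(row["lon"], precision))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


deduped = unique_centroids(records)
print([row["address"] for row in deduped])
```

Only one record per centroid is sent downstream, so identical points no longer produce identical, fully overlapping trade areas.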

 

Here's what happens when you reduce the grid size. The two examples use the same points and drive time (10 minutes), but one uses 5 mile grids and the other uses 0.1 Km grids. You can see that once the grid settings are small enough, each point gets its own unique area.

20200612-NODT_5MiExample.PNG

20200612-NODT_01KmExample.PNG

 

Make sure your grid size settings are small enough for the proximity of your centroids, or even smaller for the best accuracy. If the smallest setting available from the tool is not small enough for your use case, the Non-Overlapping Drivetime tool is a macro you can right-click > open to edit further (be sure you don't save over the original if you do make changes).

RickPack
8 - Asteroid

Thanks, Charlie! 

I experimented with adjusting the grid size parameter of the Non Overlap Drivetime tool to see how the adjustments impacted the population counts. As you suggested, the grid size had a massive impact on the estimate when Summarize -> [Spatial -> Combine] was not used. The impact when Spatial Combine was used was smaller and, interestingly (but not of sufficient concern for me to research further), not consistent:

 

The total Massachusetts population is 6,547,629. 

0.1 Mile Grid (run time 12 minutes)

RickPack_0-1592061834725.png

1 Mile Grid (run time 7 minutes)

RickPack_4-1592061938878.png

5 Mile Grid (run time 6 minutes)

RickPack_3-1592061919269.png

 

My conclusion: Given the proximity of these points, like you mentioned, and if one is aggregating data across all locations, I see support for generally using the Non Overlap Drivetime tool with a 5 mile grid. However, if one needs data specific to those within a drivetime from each location separately and a lack of overlap is key, using a 0.1 mile grid appears wise. Based on what I have seen with other data, this may mean running a workflow that takes 8 - 10 hours.

I wonder though: what is an example use case for needing to avoid overlap of, say, counted members of the population within 15 minutes of multiple locations, and needing a count per location separately? I can only imagine use cases where overlap would be fine: if X people are near store 1 and the count of Y people near store 2 includes some members of X, that seems fine for practical applications.

 

RickPack_0-1592062198338.png

 

 

 

RickPack_1-1592061874218.png

 
