Location Data Knowledge Base

mbarone · ‎03-31-2022

If you’re familiar with the process to get zip points to show up in Alteryx, then no doubt you know the frustrations of clicking through multiple Allocate registration windows and navigating to multiple deeply embedded data VGF files. It’s painful, and you might ask why there isn’t a simple option to register these point geographies with the regular install. Well, turns out there actually is a decent reason that has to do with how data is allocated to point geometries versus polygon geometries. I’m not a spatial expert, and don’t pretend to be, so I’ll leave it at that. But I do know that when we look at the data allocated to these points; we need to take it with the proverbial grain of salt.

Think of it this way, a zip point is often a main post office with hundreds of PO boxes:

If a retail trade business has their legal zip code as that zip point location (i.e., a po box in that building), and their actual physical building 10 miles away, where do they get counted as a retail trade business? If looking within a mile of that post office (the building with all the po boxes) for retail trade businesses, you will get them since they are retail trade, and their zip is that building, and that building is certainly within a mile of itself. But in reality, they are located 9 miles away.

Still, some of us do use the zip with points data. You never know; perhaps for a particular predictive model the fact that a business uses a po box (their zip would be a zip-point) could come out to be a strong predictor of some behavior. So, there are a number of us that want those point geographies to be made available in the Allocate tools.

Back to the original intent of this post – is an easier way than all that clicking and navigating to get those zip-points registered and therefore to show up in the Allocate tools? Yes, there is. But it involves modifying INI files immediately after installing the US Data Bundle, but before registering the data (from a network location) on your various machines.

Before I go over how to do this, it must be noted that this is not an Alteryx supported method. However, this method starts out by making a backup copy of the INI files that we’ll be modifying, so it’d be very simple to revert back to the old files and begrudgingly go through the tried and true click-happy process (again, linked above in the first paragraph).

Okay, here we go - you just installed the data package. That means there are now 3 INI files and 9 VGF files that were installed that need to be considered:

1 INI file and 3 VGF files for the “Full US” dataset
1 INI file and 3 VGF files for the plain old “US” dataset
1 INI file and 3 VGF files for the “Census” dataset

The 3 VGF files for each dataset bulleted above breaks down as follows:
- 1 VGF file for points
- 1 VGF file for points by county
- 1 VGF file for points by DMA

Let’s look “inside” one of those INI files, namely the one for the “Census” dataset. You can scroll down to the VGF portion of the INI file (open with your favorite text/code editor; here, I use Notepad ++):

The INI file automatically comes with lines of code referencing the 17 VGF files listed above. We need to add 3 more lines of code to this section (so VGF18, VGF19, and VGF20) for the 3 additional zips-with-point VGF files (points, points by county, and points by DMA). Note that the INI file is located in the folder you specified during the data install process.

And here is where those VGF files are:

The 3 additional lines of code we’d therefore add to the INI file are:

VGF18 = Data\ALTERYX_US_2010B_ZIPSwithPOINTS_Q4_2021_21A.vgf

VGF19 = Data\ ALTERYX_US_2010B_ZIPswPointsbyCnty_Q4_2021_21A.vgf

VGF20 = Data\ ALTERYX_US_2010B_ZIPswPointsbyDMA_Q4_2021_21A.vgf

The modified INI file would then look like this:

Then we just save the now-modified INI file, and when you register the datasets through the normal process on your various machines, those zip with points files will get registered and therefore show up in the Allocate tools:

Now we just need to repeat the process for the “Full US” dataset and the plain old “US” dataset.

But enough manual work. Let’s transition over to Designer and create a workflow and macro to do this all for us with a simple click of the “Run” button.

The first step is to locate the new INI files. As stated previously, this will be the location you specified during the data install process. For me, that’s \\[server name]\w$\Program Files (x86)\Alteryx\DataProducts. There’ll be lots of INI files there, and I’m only interested in the Allocate ones mentioned earlier, so I’ll be sure to only look for those:

Note that there are actually two INI files that begin with “ALTERYX_US_Experian_”: the one we are interested in, and the one for Simmons data which we are not interested in, so we make sure that one won’t be included.

Now we have the list of INI files we want to modify:

Next, we need to get a list of the VGF files for which we’ll be adding lines of code for (in the INI files above). Recall that each INI file will have 3 lines of code entered (one line of code for each of 3 VGF files). So, we’ll set up a separate stream in the workflow to get a list of those zip with point VGF files:

Now we ALMOST have the list of VGF files we want to add lines of code for (in the INI files):

If you archive and store previous vintages, you’ll see all VGF files over time (refer to DateCreated_VGF field). We need to pull only the most recent ones. So, we’ll simply take the max date for each file, and join the result with the whole file on max date to get only the VGF files that were just installed:

And that will give us just what we need:

Each highlighted block above represents the 3 VGF files that the corresponding INI file will need to be modified with.

At this point we have the list of INI files to modify:

And the list of VGF files to modify those INI files with:

Now that we have the list of INI file we need to modify, and the lines of code we need to modify them with (i.e., the names of the VGF files), we need to “link” or join them together (think yellow-to-yellow above, then green-to-green, and orange-to-orange). To do this, we’ll need to create a common key to join them on. I’ll call it “ini_vgfFile_Joiner”. And I’ll give it values of “FULL_US”, “US”, or “CENSUS”, based upon the file names and/or file paths listed above.

For the INI files, the joiner formula would look like this:

And for the VGF files, the joiner formula would look like this:

Now we’re ready to join the two data streams:

And the resulting inner join looks like this:

What I want to do next is create a macro to process this information. Since I want to make it as dynamic as possible, I’ll create a batch macro to read in the above information and process it one INI file and the corresponding 3 VGF files at a time. But to do this, I need to get each INI file into just one row. If I group by INI file, and concatenate the VGF file names using a pipe (“ | “) delimiter, this becomes a simple one-tool task:

⬇️

We now have one record per INI file with the associated VGF file names to feed into our batch macro:

For the batch macro questions, you simply give it the INI path and the VGF file name. Let’s look then at what the macro will do for each “INI file / VGF file names” pair:

First, we start with a Text Input tool so we have a starting point for the macro to feed each INI file / VGF file names pair. The Text Input tool is hardcoded with the FULL US INI file and it’s associate VGF file names:

And then the control parameters will update these fields with each incoming INI file / VGF file names pair:

Once we have the INI file full path, and the VGF file names, first thing to do is read in the INI file with a Dynamic Input tool, and make a backup of it before we modify it. The set up of the Dynamic Input tool is pretty standard for reading in a flat file (which is what an INI file is). Note the highlighted configuration fields (again, pretty standard for flat files):

And here’s what the results from the Dynamic Input Tool look like (I’ve scrolled down to the VGF section):

You can see how I brought in the full file name in the right column. This is for easy outputting to the backup file, as well as overwriting the INI file itself.

So first we’ll make the backup. I simply take the file as-is, add a “.bak” to the file name, and output it. I use a Block Until Done Tool so further processing is delayed until all records pass through the “1” anchor (and since these are flat files with only about 100 records or so, they will output almost instantly so there is no fear that downstream processing will complete before the backup file is output):

Now that the backup is created, we can start to process the modifications. The first thing I want to do is add record IDs to the data so we can keep the order of the lines of code, with the exception of inserting our additional 3 lines for the VGF file name references right where they belong (after the other VGF file name references). I also want to get rid of the file name at this point (I’ll bring it back later) so I have just the records, as they are in the INI file:

The remainder of this data stream (the one off the select tool) will focus on counting up the number of VGF file names in the INI file already, and then adding 3 additional ones. So here, we’ll dynamically figure out that there are 17 VGF file names so far, and will add 3 more (so 18, 19, 20).

The way I’ll do this is to first isolate just the number of the VGF file names:

I can now take the max VGF_Nbr, and add the next 3 numbers as fields by simply adding 1, 2, and 3 to the max VGF_Nbr. And then I’ll do the same for RecordID, but instead of adding 1, 2, and 3; I’ll add 0.1, 0.2, and 0.3 in order to preserve the incoming order:

So the result will look like this:

And then we can flip this data using a Transpose Tool, and do that for both the record IDs and the VGF numbers. We can then join them by position:

And now we’ll use a formula to create the start of the text that is needed for the lines of code:

Simultaneously, we’ll flip the delimited VGF file names from the Text Input tool (that’s updated each batch via the control parameter) back to vertical in preparation to join it (by position) to the above:

We now can join these two by position:

A quick formula and deselection of the intermediate fields will give us exactly the 3 additional lines of codes we need:

Now we just union back these new records with the original records and sort on RecordID to get:

And here’s the modified INI file that can be re-saved to its original location/name, for use when registering the data on your various machines.

Only thing left to do is dynamically bring in the original location/name so it can be saved:

And then we can connect that Appends Field data stream to an Output tool and configure it dynamically as follows:

And that’s it. Let’s take a look at it from a summary view.

We have our workflow:

And our macro:

As long as the two directory tools point to your install location, you can run the module and the 3 INI files will be updated, after making a backup of each:

You can now register your data package/files on your various machines just like normal, and the zips with points geographies will appear in the Allocate tools.

Attached is a YXZP file with both the workflow and the macro. You’ll just have to configure the Directory tools in the workflow, and the Text Input tool and Dynamic Input tool in the macro to all point to your installation directory.

Location Data Knowledge Base

An Easier Way to Get Point Zip Codes to Show Up in Alteryx (Allocate Geographies)