Morning All,
I've recently had the need to geocode IP addresses - not a thing Alteryx has a built in provision for. Having looked at the community for ideas I've seen that it's a problem that has come up a few times over the years with not much in the way of solutions.
I thought it might be useful to share my experiences.
There are a few approaches I took before landing on my ultimate solution to the problem which I'll quickly outline -
Methods I tried -
Ultimately I decided it was about narrowing down the intital group you match to.
My background is in Location Planning/Analysis so I'm well familiar with the spatial tools.
It occured to me that the spatial match tool is incredibly efficient at both loading large sets of data and performing match operations, with this in mind I decided to convert the IP ranges in the IP database to co-ordinates and create a point spatial object to represent them (I did this by making them fractions of the maximum possible IP and then multiplying them by 90, IP range start becomes the X co-ord, IP range end becomes the Y co-ord). I then embed this in the spatial match tool.
Then if you were to convert the IP you are interested in to co-ordinates as well (using the converted IP as both X and Y) you will get another spatial point. The next part of the process was to draw a small circle around the IP of interest and spatial match to full database. (the size of the circle determines the number of possible matches you will get), you can then just filter this small subset of possibles to get your answer.
Using this method I was able to bring the run time down to 1 minute per 10,000 geocodes.
If anyone is interested in this I'm happy to provide more detail.
Regards,
Ben
@Ben_H This is really fantastic! Great work! We need to get this into a knowledge article or blog post!
@Ben_H ,
I'm interested in finding out more (maybe not as excited as @joshuaburkhow ), but wanted to know if you would try using my spatial match macro in place of the spatial match tool?
AMP may eventually eliminate the performance benefits of my tool, but I'd be interested in hearing how much the macro improves your throughput.
https://gallery.alteryx.com/#!app/CReW-Spatial-Match/5c335c5f0462d70ba8a5b2cb
Otherwise, I could ask for your testing workflow (Exported to include both data and process) and I'll publish here the results.
Cheers,
Mark
Hi @MarqueeCrew
I had a go at implementing your Macro but it couldn't quite figure out how to add it in without completely reworking mine.
The size of the IP databases means that it doesn't really lend itself to zipping up and sharing but I'll break down my process a bit for more for you below. Hopefully that helps a little.
Top level - as you might expect you just run a set of IP addresses into a macro - in this instance I ran 10,000.
Batch Data - I split the list of IP addresses into batches - this was to save memory originally but I'm not sure if this isn't just making it take longer at this point. I would expect batching to be more beneficial for larger sets.
Geocode -
I then run actual matching part – each batch comes through and the IP addresses are converted to a number then co-ordinates.
A centroid is generated and a radius. (this is arbitrary – it’s too big at the moment I think)
The radius is then matched to the universe (all spatial objects are dumped at this point).
The spatial match output contains what I'm looking for - I filter down to matching record (i.e. where input IP is >= IP range from and <= IP range to) and then output.
I then add on the full set of data from the IP database – this again I think could be better – the embedded version contains as little information as possible to make that part quicker but I might just be slowing the whole process down here anyway.
Regards,
Ben