Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Geocoding IP Addresses

Ben_H
11 - Bolide

Morning All,

 

I've recently had the need to geocode IP addresses - not a thing Alteryx has a built in provision for. Having looked at the community for ideas I've seen that it's a problem that has come up a few times over the years with not much in the way of solutions.

I thought it might be useful to share my experiences.

 

There are a few approaches I took before landing on my ultimate solution to the problem which I'll quickly outline -

 

  • API based - Feeding a list to a API based system
    • This ultimately wouldn't work for me as I have a large number of IP addresses to geocode and no budget to pay for a service.
    • Free plans are typically limited to 10,000 queries per month.

 

  • Using a free database to build my own - There are a few free (but less accurate) IP databases provided by the same companies that provide the API services. I ended up using db-ip (https://db-ip.com/db/). The problem with this method is how to efficiently match an IP address to the data.
    • The data is across 3.2 million records, each representing a range of IP addresses. The total number of IP addresses covered being in the billions.
    • The question became how to join this data given that Alteryx has no in built function to join if a value is >= x and <= y.

Methods I tried -

  • Brute force 1 - You could just append the IP to the full list and then identify which range it falls in with a filter. Works, but very inefficient and very slow.
  • Brute force 2 - Generate a massive database of all the unique IPs from the ranges and then join to that. Again super inefficient and felt unneccesary.
  • Advanced Join Macro - Download the advanced join macro from the gallery - this does allow you to join a value that falls between ranges but after trying it, it was still extremely slow.

 

Ultimately I decided it was about narrowing down the intital group you match to.

 

My background is in Location Planning/Analysis so I'm well familiar with the spatial tools.

 

It occured to me that the spatial match tool is incredibly efficient at both loading large sets of data and performing match operations, with this in mind I decided to convert the IP ranges in the IP database to co-ordinates and create a point spatial object to represent them (I did this by making them fractions of the maximum possible IP and then multiplying them by 90, IP range start becomes the X co-ord, IP range end becomes the Y co-ord). I then embed this in the spatial match tool.

 

Then if you were to convert the IP you are interested in to co-ordinates as well (using the converted IP as both X and Y) you will get another spatial point. The next part of the process was to draw a small circle around the IP of interest and spatial match to full database. (the size of the circle determines the number of possible matches you will get), you can then just filter this small subset of possibles to get your answer.

 

Using this method I was able to bring the run time down to 1 minute per 10,000 geocodes.

 

If anyone is interested in this I'm happy to provide more detail.

 

Regards,

 

Ben

 

3 REPLIES 3
joshuaburkhow
ACE Emeritus
ACE Emeritus

@Ben_H This is really fantastic! Great work! We need to get this into a knowledge article or blog post! 

 

 

Joshua Burkhow - Alteryx Ace | Global Alteryx Architect @PwC | Blogger @ AlterTricks
MarqueeCrew
20 - Arcturus
20 - Arcturus

@Ben_H ,

 

I'm interested in finding out more (maybe not as excited as @joshuaburkhow ), but wanted to know if you would try using my spatial match macro in place of the spatial match tool?

 

AMP may eventually eliminate the performance benefits of my tool, but I'd be interested in hearing how much the macro improves your throughput.

 

https://gallery.alteryx.com/#!app/CReW-Spatial-Match/5c335c5f0462d70ba8a5b2cb 

 

Otherwise, I could ask for your testing workflow (Exported to include both data and process) and I'll publish here the results.

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Ben_H
11 - Bolide

Hi @MarqueeCrew 

 

I had a go at implementing your Macro but it couldn't quite figure out how to add it in without completely reworking mine.

 

The size of the IP databases means that it doesn't really lend itself to zipping up and sharing but I'll break down my process a bit for more for you below. Hopefully that helps a little.

 

Top level - as you might expect you just run a set of IP addresses into a macro - in this instance I ran 10,000.

Ben_H_1-1607354315875.pngBen_H_2-1607354315876.png

 

Ben_H_3-1607354315877.png

 

 

 

Batch Data - I split the list of IP addresses into batches - this was to save memory originally but I'm not sure if this isn't just making it take longer at this point. I would expect batching to be more beneficial for larger sets.

Ben_H_4-1607354315878.png

 

 

Geocode -

 

I then run actual matching part – each batch comes through and the IP addresses are converted to a number then co-ordinates.

A centroid is generated and a radius. (this is arbitrary – it’s too big at the moment I think)

The radius is then matched to the universe (all spatial objects are dumped at this point).

 

The spatial match output contains what I'm looking for - I filter down to matching record (i.e. where input IP is >= IP range from and <= IP range to) and then output.

 

I then add on the full set of data from the IP database – this again I think could be better – the embedded version contains as little information as possible to make that part quicker but I might just be slowing the whole process down here anyway.

 

Ben_H_5-1607354315882.png

Regards,

 

Ben

 

 

Labels