Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
ned_blog
8 - Asteroid

1st - a big thank you to Steve Pasetti of Accudata for bringing the MaxMind GeoLite City Database to my attention. He was asking me about how it might be loaded into Calgary. The biggest problem with it is that it gives a range of IP addresses pointing to a single location. It might refer to 207.12.246.129-207.12.246.254 as Boulder CO. The problem with that is Calgary has no way to do a lookup for a # inside that range. Given 207.12.246.158, Calgary has no easy way to look it up. Steve had some ideas that would have worked pretty well, but they required some changes to code. I figured that I could figure out a way to do it without having to change anything.

 

The 1st thing I did was load the city file into a Calgary. That's only about 250K records, so that is a quick load. The next step was the big one. I know that Calgary indexes can handle a LOT of data. I also know that they compress some types of data really well. I used a generate rows tool to generate every unique IP address and then load it into an advanced index. It ended up being 2.9 Billion records! The good news though is that Calgary indexes compress sequential #'s really well. The index ended up taking about 2 hours to build and coming in at about 100MB. The compression ratio for that index came out to about 300 to 1! Not too bad.

 

The second problem was much simpler. It is that the IP address are stored not as a string, but as a simple number. It is easy to write a formula to convert them and then I built a quick macro to accomplish the IP Geocoding with little pain for the user. It then becomes a simple task to geocode IP address to the city level and show them on a map.

 

MaxMind is kind enough to distribute this database under a GPL license so I can provide a direct Calgary download of this data. The database is current as of August 2009. For more recent or premium versions, see http://www.maxmind.com/.

 

So for an example - here is a map of the last 30 days of traffic to http://www.freedemographics.com:

 

 

Required acknowledgment: "This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com/.

Comments