Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

IP validation from a controlled file

calloni
9 - Comet

Hello

 

I have the following challenging scenario. In one datasource I have 18M rows, each row two columns: IP and Date; On the second datasource I have a controlled list of IPs that contain IP Range and Bot. This file has 1857 IP variations, some of the IPs are complete IPs (XXX.XXX.XXX.XXX) and others are IP ranges (XXX.XXX or XXX.XXX.XXX)

 

This is an example of what I need to achieve:

 

Datasource 1

IPDate
216.239.41.962018-01-01 00:00:00
64.233.173.1972018-01-01 00:00:00
64.68.90.802018-01-01 00:00:00
209.85.238.112018-01-01 00:00:00

 

Datasource 2

IP rangeBot
216.239Google
64.68.90.80Bing
64.233.173Yahoo
209.85Other

 

Final Table

IPDateBot
216.239.41.962018-01-01 00:00:00Google
64.233.173.1972018-01-01 00:00:00Yahoo
64.68.90.802018-01-01 00:00:00Bing
209.85.238.112018-01-01 00:00:00Other

 

At this moment I am running a workflow that contains a gigantic IF CONTAINS formula with all 1857 possibilities and of course that's super slow.

 

I wonder if there is a more intelligent way of doing this?

 

Thanks in advance

Rodrigo

5 REPLIES 5
BrandonB
Alteryx
Alteryx

One possibility would be to turn the ranges into lists using a generate rows tool. Your controlled list of IPs would then serve as a lookup table which you would join to the first dataset on IP address. 

calloni
9 - Comet

 Thanks Brandon

 

I've added a Generate Rows tool but I am not sure how to complete the IP ranges (2 and 3 parts) to transform then into IP with 4 parts.

 

Any examples I could follow?
Rodrigo

jrgo
14 - Magnetar

@calloni 

 

Would a Find Replace tool not work? I have it configured to find from the beginning of the string and appending the "Bot" field from DS2 when a match is found. Output matched your final table example, but maybe there's something i missed that wouldn't make this option feasible.

image.png

danilang
19 - Altair
19 - Altair

Hi @calloni 

 

Here's another technique you could try

 

w.png

Split the IP and Bot IPs into subnets IP1-IP4.  Use filters and corresponding joins to match.  For BOT ranges that only have the first 2 subnets, only join on those 2, etc.  Add a formula to indicate the number of subnets matched after each join, union everything and use the sample tool to pull out the record for each IP that matches the largest number of subnets.  This last bit handles the case where you have subcategories of BOTs i.e 

216.239Google 
216.239.1Google  Mail
216.239.2Google Calendar

Note: I just made these up to demonstrate the concept.  I have no idea if Google has sub categories of bots.

 

The end result looks like this.  Remove any columns you don't need 

 

r.png

Dan

  

BrandonB
Alteryx
Alteryx
Rodrigo, you would probably need to use filters to break out the different lengths and then use generate rows accordingly.

However, based on the other answers people have provided it looks like the find and replace at the beginning of each string may actually do exactly what you need with minimal configuration needed!
Labels
Top Solution Authors