Alteryx Designer Desktop Discussions

SOLVED

Proximity Searching on a String

theinsideguy
7 - Meteor

Hello,

 

Does Alteryx have a non-regex way to find whether a string "contains" one word NEAR another? For example:

 

Find: "Crazy" within 5 words of "Programmer"

 

1. "The crazy person was a programmer" would return true

2. "The crazy person hoped that one day he could be a programmer" would return false 

5 REPLIES
Qiu
20 - Arcturus

@theinsideguy 
I can only think of a Non-RegEx way. 😁

0203-theinsideguy.PNG

binuacs
20 - Arcturus

@theinsideguy Another method, using the Generate Rows tool:

binuacs_0-1643882800727.png

 

theinsideguy
7 - Meteor

Great solution! One of the issues I'm having is that I have around a million records that are many pages of text each. Placing each of these words into a row would end up being billions of rows. Processing would probably take days! Is this the only way? Or is there a faster way that anyone knows of?

 

Thanks!

trevorwightman
8 - Asteroid

Here is my solution. I think running it on 1M records should be no problem at all. Let me know what you think!

 

First, you want to find where the word "Crazy" is. Then take the length of the text that comes before it, minus the length of that same text after removing spaces. This will tell you how many spaces are before the word "Crazy".

 

Next, you will want to do the same thing as above, but for the word "Programmer". This will tell you how many spaces are before the word "Programmer".

 

Now that you know how many spaces are before each word, you can take the absolute value of the difference and add 1, which gives you the number of words in the span from "Crazy" to "Programmer", counting both words.

 

From here, all you need is a simple conditional statement to determine whether that number of words is within the limit (here, 5 or fewer) or not.

trevorwightman_0-1643952648314.png
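
For readers who cannot see the screenshot, the space-counting steps above can be sketched outside Alteryx in Python. This is only an illustration, not the formula from the workflow: the function names `spaces_before` and `within_n_words` are hypothetical, the search is case-insensitive, only the first occurrence of each word is used, words are assumed to be separated by single spaces, and "within 5 words" is assumed to mean a span of 5 or fewer.

```python
def spaces_before(text: str, word: str) -> int:
    """Count the spaces before the first occurrence of `word` (-1 if absent)."""
    pos = text.lower().find(word.lower())
    if pos < 0:
        return -1
    prefix = text[:pos]
    # Length of the prefix minus its length with spaces removed
    # equals the number of spaces in the prefix.
    return len(prefix) - len(prefix.replace(" ", ""))


def within_n_words(text: str, a: str, b: str, n: int = 5) -> bool:
    """True if `a` and `b` span at most `n` words (assumed threshold rule)."""
    sa = spaces_before(text, a)
    sb = spaces_before(text, b)
    if sa < 0 or sb < 0:
        return False  # one of the words is missing
    span = abs(sa - sb) + 1  # words from one term to the other, inclusive
    return span <= n


print(within_n_words("The crazy person was a programmer", "Crazy", "Programmer"))
print(within_n_words(
    "The crazy person hoped that one day he could be a programmer",
    "Crazy", "Programmer"))
```

Because each record is handled with a few string operations and no row explosion, the row count stays the same as the input.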

 

EDIT: You can actually make this a bit more bulletproof by first stripping out any punctuation (I did common ones like period and comma). Then search for " crazy " and " programmer " with spaces surrounding the words so you don't inadvertently pick up other words that contain these strings. But this may be a bit excessive since a quick Google search shows that there are many of these extra words :).

trevorwightman_0-1643984442069.png
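
A rough Python sketch of the same clean-up idea, again as an illustration rather than the Alteryx formula in the screenshot. The helper names `normalize` and `find_whole_word` and the punctuation list are assumptions. One deliberate difference: punctuation is replaced with a space rather than deleted, so that text like "crazy,programmer" does not fuse into one word.

```python
def normalize(text: str, punctuation: str = ".,;:!?") -> str:
    """Lowercase, replace common punctuation with spaces, and pad with spaces."""
    cleaned = text.lower()
    for ch in punctuation:
        cleaned = cleaned.replace(ch, " ")
    # Collapse whitespace runs, then pad both ends so that the first and
    # last words are also surrounded by spaces.
    return " " + " ".join(cleaned.split()) + " "


def find_whole_word(normalized: str, word: str) -> int:
    """Position of ' word ' in already-normalized text, or -1 if absent."""
    return normalized.find(" " + word.lower() + " ")


text = normalize("Crazy, programmers.")
print(repr(text))
print(find_whole_word(text, "crazy"))       # whole word is present
print(find_whole_word(text, "programmer"))  # "programmers" is not a match
```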

 

EDIT 2 (and final edit): I actually like this way a bit more. Instead of counting spaces, just count words in each substring and then find the difference.

trevorwightman_1-1643985151084.png
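
The word-counting variant could look like the following in Python. This is a sketch under assumptions: "each substring" is taken to mean the text that precedes each search word, `word_index` and `word_distance` are hypothetical names, and only the first occurrence of each word is considered. Whether to add 1 to the result depends on how "within 5 words" is defined.

```python
from typing import Optional


def word_index(text: str, word: str) -> int:
    """Number of words before the first occurrence of `word` (-1 if absent)."""
    pos = text.lower().find(word.lower())
    if pos < 0:
        return -1
    # split() with no arguments ignores repeated whitespace, so this is
    # less sensitive to double spaces than counting space characters.
    return len(text[:pos].split())


def word_distance(text: str, a: str, b: str) -> Optional[int]:
    """Difference in word positions of `a` and `b`, or None if either is missing."""
    ia = word_index(text, a)
    ib = word_index(text, b)
    if ia < 0 or ib < 0:
        return None
    return abs(ia - ib)


print(word_distance("The crazy person was a programmer", "crazy", "programmer"))
```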

 

Qiu
20 - Arcturus

@theinsideguy 
You have a point.
I am doing something like this.
I cut off the head and tail of the text, that is, everything before "crazy" and everything after "programmer".

Then I count the words in what remains and subtract 1 to get the distance.
This way the number of rows remains the same, but of course it needs some adjustment for your real data.

0204-theinsideguy-A.PNG
0204-theinsideguy-B.PNG
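
The trimming idea described above can be sketched in Python as follows. It is an approximation of the approach, not a transcription of the workflow in the screenshots: `distance_by_trimming` is a hypothetical name, matching is case-insensitive on the first occurrence of each word, and the two words may appear in either order.

```python
from typing import Optional


def distance_by_trimming(text: str, a: str, b: str) -> Optional[int]:
    """Keep only the text from one search word through the other,
    count its words, and subtract 1. Returns None if a word is missing."""
    lowered = text.lower()
    pa = lowered.find(a.lower())
    pb = lowered.find(b.lower())
    if pa < 0 or pb < 0:
        return None
    # Work out which word comes first so the slice runs left to right.
    if pa <= pb:
        start, end = pa, pb + len(b)
    else:
        start, end = pb, pa + len(a)
    middle = text[start:end]  # head and tail are cut away
    return len(middle.split()) - 1


print(distance_by_trimming("The crazy person was a programmer", "crazy", "programmer"))
```

Each record yields a single number, so the output has one row per input row.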
