Alteryx Designer Desktop Discussions

theinsideguy · 02-02-2022

Hello,

Does Alteryx have a non-regex way to find if a string "contains" one word NEAR another? For example:

Find: "Crazy" within 5 words of "Programmer"

1. "The crazy person was a programmer" would return true

2. "The crazy person hoped that one day he could be a programmer" would return false

Qiu · 02-02-2022

@theinsideguy
I can only think of a Non-RegEx way. 😁

binuacs · 02-03-2022

@theinsideguy Another method, using the Generate Rows tool.
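The workflow itself isn't shown in the thread. As a rough illustration of the one-row-per-word idea, here is a minimal Python sketch with hypothetical function names. It assumes that words are whitespace-separated tokens with surrounding punctuation trimmed, that matching is case-insensitive, and that "within 5 words" means the two words' positions differ by at most 5:

```python
import string
from typing import List, Tuple


def word_rows(text: str) -> List[Tuple[int, str]]:
    """One (position, word) pair per word, like splitting the text into rows.

    Words are whitespace-separated tokens, lowercased, with leading and
    trailing punctuation trimmed (an assumption about what counts as a word).
    """
    return [
        (i, token.strip(string.punctuation).lower())
        for i, token in enumerate(text.split())
    ]


def near_by_rows(text: str, a: str, b: str, max_gap: int = 5) -> bool:
    """True if some occurrence of `a` is at most `max_gap` positions from some `b`."""
    rows = word_rows(text)
    pos_a = [i for i, w in rows if w == a.lower()]
    pos_b = [i for i, w in rows if w == b.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


print(near_by_rows("The crazy person was a programmer", "Crazy", "Programmer"))
# True
print(near_by_rows(
    "The crazy person hoped that one day he could be a programmer",
    "Crazy", "Programmer"))
# False
```

Every word becomes its own row here, so the work grows with the total number of words across all records.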

theinsideguy · 02-03-2022

Great solution! One issue I'm having is that I have around a million records, each of which is many pages of text. Putting each word into its own row would produce billions of rows, and processing would probably take days! Is this the only way, or does anyone know of a faster one?

Thanks!

trevorwightman · 02-03-2022

Here is my solution. I think running 1M records through it should be no problem at all. Let me know what you think!

First, find where the word "Crazy" is. Then take the text to the left of that position and subtract from its length the length of that same text with the spaces removed. This tells you how many spaces come before the word "Crazy".
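A minimal Python sketch of this step, using a hypothetical helper name `spaces_before`. Python's `str.find` stands in for whatever string-search function your workflow uses, and the match is made case-insensitive (an assumption) because the question capitalizes "Crazy" while the sample text does not:

```python
def spaces_before(text: str, word: str) -> int:
    """Count the spaces to the left of the first occurrence of `word`.

    The match is case-insensitive (an assumption; the post doesn't say).
    Returns -1 if `word` does not occur in `text`.
    """
    pos = text.lower().find(word.lower())  # 0-based index, -1 if missing
    if pos == -1:
        return -1
    left = text[:pos]  # the text before the word
    # Length minus length-without-spaces = number of spaces.
    return len(left) - len(left.replace(" ", ""))


sentence = "The crazy person was a programmer"
print(spaces_before(sentence, "Crazy"))       # 1
print(spaces_before(sentence, "Programmer"))  # 5
```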

Next, do the same thing for the word "Programmer". This tells you how many spaces come before the word "Programmer".

Now that you know how many spaces come before each word, take the absolute value of the difference. That is how many word positions apart "Crazy" and "Programmer" are; adding 1 gives the number of words from "Crazy" through "Programmer", inclusive.

From here, all you need is a simple conditional statement that compares that number against your 5-word limit.
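Putting the space-counting steps together, here is a minimal Python sketch with hypothetical function names. It assumes case-insensitive matching and treats "within 5 words" as a position gap of at most 5; if you would rather compare the inclusive count (gap + 1) against 5, change the comparison. Both examples from the question come out the same either way (gap 4 vs. 10). Like the post, it looks only at the first occurrence of each word:

```python
def spaces_before(text: str, word: str) -> int:
    """Spaces before the first (case-insensitive) match of `word`, or -1."""
    pos = text.lower().find(word.lower())
    if pos == -1:
        return -1
    left = text[:pos]
    return len(left) - len(left.replace(" ", ""))


def within_by_spaces(text: str, a: str, b: str, max_gap: int = 5) -> bool:
    """Decide whether `a` is near `b` by comparing spaces-before counts."""
    sa, sb = spaces_before(text, a), spaces_before(text, b)
    if sa == -1 or sb == -1:
        return False  # one of the words is missing
    gap = abs(sa - sb)  # word positions apart; gap + 1 = words from a through b
    return gap <= max_gap


print(within_by_spaces("The crazy person was a programmer",
                       "Crazy", "Programmer"))  # True
print(within_by_spaces("The crazy person hoped that one day he could be a programmer",
                       "Crazy", "Programmer"))  # False
```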

EDIT: You can make this a bit more bulletproof by first stripping out punctuation (I did common ones like periods and commas). Then search for " crazy " and " programmer " with spaces around the words so you don't inadvertently pick up other words that contain these strings. This may seem a bit excessive, but a quick Google search shows that quite a few such words exist :).
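A minimal Python sketch of this hardening step, with hypothetical function names. Assumptions: all ASCII punctuation is removed (not just periods and commas), runs of whitespace including line breaks are collapsed to single spaces so the space count stays accurate, and the text is padded with one space on each side so a target at the very start or end still matches " word ":

```python
import string

# Remove every ASCII punctuation character (the post strips only common
# ones such as periods and commas; using all of them is an assumption).
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, and pad with spaces."""
    cleaned = text.lower().translate(_STRIP_PUNCT)
    # Padding lets a word at the very start or end still match " word ".
    return " " + " ".join(cleaned.split()) + " "


def spaces_before_whole_word(text: str, word: str) -> int:
    """Spaces before the first whole-word match of `word`, or -1 if absent."""
    padded = normalize(text)
    pos = padded.find(" " + word.lower() + " ")  # index of the space before the word
    if pos == -1:
        return -1
    left = padded[1:pos + 1]  # text before the word, without the leading pad
    return len(left) - len(left.replace(" ", ""))


def within_whole_words(text: str, a: str, b: str, max_gap: int = 5) -> bool:
    """True if whole words `a` and `b` are at most `max_gap` positions apart."""
    sa = spaces_before_whole_word(text, a)
    sb = spaces_before_whole_word(text, b)
    return sa != -1 and sb != -1 and abs(sa - sb) <= max_gap


print(within_whole_words("The crazy person was a programmer.", "Crazy", "Programmer"))      # True
print(within_whole_words("The crazyweed person was a programmer", "Crazy", "Programmer"))  # False
```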

EDIT 2 (and final edit): I actually like this approach a bit more. Instead of counting spaces, just count the words in each substring (the text before each target word) and then take the difference.
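A minimal Python sketch of the word-counting variant, with hypothetical function names. Python's no-argument `str.split()` plays the role of a word counter here (an assumption about how words are counted); because it splits on any run of whitespace, double spaces and line breaks don't inflate the count. Matching is case-insensitive and the limit is again a position gap of at most 5, both assumptions:

```python
def words_before(text: str, word: str) -> int:
    """Number of words in the substring before the first match of `word`, or -1."""
    pos = text.lower().find(word.lower())
    if pos == -1:
        return -1
    # split() with no argument splits on any run of whitespace.
    return len(text[:pos].split())


def within_by_word_count(text: str, a: str, b: str, max_gap: int = 5) -> bool:
    """True if the difference in words-before counts is at most `max_gap`."""
    wa, wb = words_before(text, a), words_before(text, b)
    return wa != -1 and wb != -1 and abs(wa - wb) <= max_gap


print(within_by_word_count("The  crazy\nperson was a programmer",
                           "Crazy", "Programmer"))  # True
```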

Qiu · 02-03-2022

@theinsideguy
You have a point.
I am doing something like this.
I cut off the head and the tail, i.e., everything before "crazy" and everything after "programmer".

Then I count the remaining words and subtract 1 to get the distance.
This way, the number of rows stays the same, but of course it will need some adjustment for your real data.
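Qiu's workflow isn't shown, but the trimming idea can be sketched in a few lines of Python. The function name `trimmed_distance` is hypothetical, and case-insensitive matching on the first occurrence of each word is an assumption. Each record stays a single string, so the row count does not grow:

```python
from typing import Optional


def trimmed_distance(text: str, a: str, b: str) -> Optional[int]:
    """Cut off everything before the earlier word and after the later one,
    count the remaining words, and subtract 1. Returns None if either is missing."""
    low = text.lower()
    ia, ib = low.find(a.lower()), low.find(b.lower())
    if ia == -1 or ib == -1:
        return None
    start = min(ia, ib)
    # End of whichever word comes later in the text.
    end = ib + len(b) if ib >= ia else ia + len(a)
    middle = text[start:end]  # e.g. "crazy person was a programmer"
    return len(middle.split()) - 1


d = trimmed_distance("The crazy person was a programmer", "Crazy", "Programmer")
print(d)                         # 4
print(d is not None and d <= 5)  # True
```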

Proximity Searching on a String