Alteryx Designer Desktop Discussions

Aldric · ‎03-19-2021

hello,

I'm struggling to create a table with matching from a list of string with wildcards.

I tried to follow the suggestion on this post using the Find and Replace tool, but it works only removing the wildcard * and only when the wildcard is not in the middle of my string to search. So I get only partially results.

I explain what I've been trying to achieve:

1. I have a pattern list with a related classification field in one table similar to this:

pattern	classification
activate access	class A
event log servicestopped	class B
app1stop	class C

2. I have a long list of descriptions from another table:

description

the user has activate access to xyz

the event log service was stopped

app1 has suddenly stopped

the event log service was restarted

..

Searching the pattern string from the first table into each description field, I have to obtain the classifications that match that pattern (for some I might not have the pattern which is fine). So resulting table should be something like this:

description	pattern	classification
User1 need to activate access to xyz	activate access	class A
the event log service was stopped	event log servicestopped	class B
app1 has stopped	app1stop	class C
the event log service was restarted	n/a	n/a
…
..

As said, with the find and replace tool (with removing the initial and last *) I match some pattern (like "activate access"), but where the * is in the middle I couldn't find a way to get the match that I need.

Any suggestion is appreciated!

thanks in advance!

AngelosPachis · ‎03-19-2021

Hi @Aldric ,

Instead of using the Find & Replace tool, maybe you can use Regex as an alternative. In the attached workflow, I have used an append fields tool to create all combinations of descriptions/patterns. Then, with a formula tool, I have replaced all asterisks to ".*?" which in Regex can be translated as everything (so something exists there, but it's not greedy, so it may also not exist).

The you can check if there is a match in the description to the pattern you have defined, with Regex Match.

Hope that helps, let me know if that worked for you

Cheers,

Angelos

Tyro_abc · ‎03-19-2021

Hi

Please try this formula with Regular expression

if REGEX_Match([Description],'.*activate access.*' )
then 'Class A'

elseif  REGEX_Match([Description],'.*event log service.*stopped' )

then 'Class B'

elseif  REGEX_Match([Description],'.*app1.*stop.*' )

then 'Class C'

else 'n/a' endif

Aldric · ‎03-19-2021

thanks AngelosPachis!

your idea seems working with the subset of dummy data, but using the full set of patterns and descriptions something gets lost.

I have ~300 patterns and +10K list of descriptions, which some of them can also be the same, but having different ID - which is unique, in my sample data I didn't report all the columns that I have for the descriptions, it is something like:

description	ID	team	date	...
the user has activate access to xyz	INC123	group1	..	..
the event log service was stopped	INC456	group2	..	..
app1 has suddenly stopped	INC789	group3	...	..
the event log service was restarted	INC012	group1	..	..
app1 has suddenly stopped	INC345	group3
..

I guess that point where I get confused is when I have the same description, but I I think we are on the right way. Let me try to play a bit more.

Aldric · ‎03-19-2021

Thanks Tyro_abc,

this work with a small list of patterns (like 3 as in your example), but I have more than 300 patterns that can potentially change in the patterns table, I'd need something that is not using the string hardcoded in the regex.

AngelosPachis · ‎03-19-2021

Ah I see @Aldric , probably something gets lost in the summarize tool.

Maybe try grouping on the new fields (ID, team) as well? There is always the solution of using a filter tool to only keep those that the pattern matches and process those that you can't find a match separately.

You can achieve that by using a filter tool after the formula tool where the regex_match formula is found

Tyro_abc · ‎03-19-2021

Hi Aldric

Try this , I dynamically generated expression for Regex. In the Sort tool, we have to use some text so that 'n/a' comes after any 'Class".

Let me know if you need any explanation.

Kept some extra field for easier understanding.

Regards

Arundhuti

Aldric · ‎03-20-2021

@AngelosPachis

I added the filtered the -1 match after the formula with the regex and added the grouping by ID (after the grouping by description).

Result is much better, but not perfect.

I tried with a subset of 200 descriptions and all the patterns table (313).

I ended up with 175 records, which classification is correct.

After the filter I have 207 records, all matching -1. I think I have different descriptions that matches more patterns (not a big issue, the first match is good enough).

I get the 175 records after grouping. What I’ve lost is the descriptions that don’t match

So I added a union tool with 200 descriptions not matching + the 175 matching (unique, not all 207)

Then keep grouping by ID and description as before, I get the right results: 200 records with 175 match (all ok) and 25 n/a

I already tried with the full list of descriptions (+10K), but despite creating 3M of rows in the first step, the workflow produces the expected result. Now I have to improve the pattern list, but that’s another issue.

Thanks a lot for your hints! I learned a lot this time

@Tyro_abc

I tried the same with 200 descriptions and the whole set of patterns.

I ended up with 145 records in total, which include 23 not matching, but are wrong for some reasons.

Thanks anyway!

Tyro_abc · ‎03-20-2021

can you share some example where it is not matching? Wondering if we have case sensitivity problem?

Aldric · ‎03-21-2021

@Tyro_abc

Hi Arundhuti,

I tried again, the matching step looks ok. I checked in detail. The issue is not there.

After the regex match I get 209 record matching (they looks ok and there few more than the initial number of records, because some pattern are matching twice).

The sorting step is also ok, but then I get 123 records matching (correctly) and 22 not matching. These ones are correctly not matching as well.

Problem is I get ultimately only 145 records, 55 are missing.

I checked the output at each step. Until the sort step I still have 209 matching (and thousands of not matching). With the sample step by “first” description I get only 123 matching.

I guess I should add a union like I did with the other workflow to have all the proper results.

Alteryx Designer Desktop Discussions

Matching pattern list with wildcard to another table

Zero to Advanced in 20 days

Re: Zero to Advanced in 20 days

Re: Zero to Advanced in 20 days

Re: Single App that filters user selections dynami...

Re: Sorting Sheet Names