Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Community is experiencing an influx of spam. As we work toward a solution, please use the 'Notify Moderator' option on the ellipsis menu to flag inappropriate posts.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Matching pattern list with wildcard to another table

Aldric
6 - Meteoroid

hello, 

I'm struggling to create a table with matching from a list of string with wildcards.

I tried to follow the suggestion on this post using the Find and Replace tool, but it works only removing the wildcard * and only when the wildcard is not in the middle of my string to search. So I get only partially results.

I explain what I've been trying to achieve:

 

1. I have a pattern list with a related classification field in one table similar to this:

patternclassification
*activate access*class A
*event log service*stoppedclass B
app1*stop*class C

 

2. I have a long list of descriptions from another table:

description
the user has activate access to xyz
the event log service was stopped
app1 has suddenly stopped
the event log service was restarted
..
..

 

Searching the pattern string from the first table into each description field, I have to obtain the classifications that match that pattern (for some I might not have the pattern which is fine). So resulting table should be something like this:

 

descriptionpatternclassification
User1 need to activate access to xyz*activate access*class A
the event log service was stopped*event log service*stoppedclass B
app1 has stoppedapp1*stop*class C
the event log service was restartedn/an/a
  
..  

 

As said, with the find and replace tool (with removing the initial and last *) I match some pattern (like "activate access"), but where the * is in the middle I couldn't find a way to get the match that I need.

 

Any suggestion is appreciated!

thanks in advance!

 

11 REPLIES 11
AngelosPachis
16 - Nebula

Hi @Aldric ,

 

Instead of using the Find & Replace tool, maybe you can use Regex as an alternative. In the attached workflow, I have used an append fields tool to create all combinations of descriptions/patterns. Then, with a formula tool, I have replaced all asterisks to ".*?" which in Regex can be translated as everything (so something exists there, but it's not greedy, so it may also not exist).

 

The you can check if there is a match in the description to the pattern you have defined, with Regex Match.

 

AngelosPachis_0-1616184625004.png

 

Hope that helps, let me know if that worked for you 

 

Cheers,

 

Angelos

Tyro_abc
11 - Bolide

Hi

 

Please try this formula with Regular expression

 

Tyro_abc_0-1616184929227.png

 

if REGEX_Match([Description],'.*activate access.*' )
then 'Class A'

elseif  REGEX_Match([Description],'.*event log service.*stopped' )

then 'Class B'

elseif  REGEX_Match([Description],'.*app1.*stop.*' )

then 'Class C'

else 'n/a' endif
Aldric
6 - Meteoroid

thanks AngelosPachis!

 

your idea seems working with the subset of dummy data, but using the full set of patterns and descriptions something gets lost.

I have ~300 patterns and +10K list of descriptions, which some of them can also be the same, but having different ID - which is unique, in my sample data I didn't report all the columns that I have for the descriptions, it is something like:

 

descriptionIDteamdate...
the user has activate access to xyzINC123group1....
the event log service was stoppedINC456group2....
app1 has suddenly stoppedINC789group3.....
the event log service was restartedINC012group1....
app1 has suddenly stoppedINC345group3  
..    

 

I guess that point where I get confused  is when I have the same description, but I I think we are on the right way. Let me try to play a bit more.

 

Aldric
6 - Meteoroid

Thanks Tyro_abc,

this work with a small list of patterns (like 3 as in your example), but I have more than 300 patterns that can potentially change in the patterns table, I'd need something that is not using the string hardcoded in the regex.

AngelosPachis
16 - Nebula

Ah I see @Aldric , probably something gets lost in the summarize tool.

 

Maybe try grouping on the new fields (ID, team) as well? There is always the solution of using a filter tool to only keep those that the pattern matches and process those that you can't find a match separately.

 

You can achieve that by using a filter tool after the formula tool where the regex_match formula is found

Tyro_abc
11 - Bolide

Hi Aldric

 

Try this , I dynamically generated expression for Regex. In the Sort tool, we have to use some text so that 'n/a' comes after any 'Class". 

Let me know if you need any explanation.

 

Tyro_abc_0-1616190287748.png

 

Kept some extra field for easier understanding.

 

Regards

Arundhuti

Aldric
6 - Meteoroid

@AngelosPachis

I added the filtered the -1 match after the formula with the regex and added the grouping by ID (after the grouping by description).

Result is much better, but not perfect.

I tried with a subset of 200 descriptions and all the patterns table (313).

I ended up with 175 records, which classification is correct.

After the filter I have 207 records, all matching -1. I think I have different descriptions that matches more patterns (not a big issue, the first match is good enough).

I get the 175 records after grouping. What I’ve lost is the descriptions that don’t match

 

So I added a union tool with 200 descriptions not matching + the 175 matching (unique, not all 207)

Then keep grouping by ID and description as before, I get the right results: 200 records with 175 match (all ok) and 25 n/a

 

I already tried with the full list of descriptions (+10K), but despite creating 3M of rows in the first step, the workflow produces the expected result. Now I have to improve the pattern list, but that’s another issue.

 

Thanks a lot for your hints! I learned a lot this time

 

I tried the same with 200 descriptions and the whole set of patterns.

I ended up with 145 records in total, which include 23 not matching, but are wrong for some reasons.

Thanks anyway!

Tyro_abc
11 - Bolide

can you share some example where it is not matching? Wondering if we have case sensitivity problem?

 

 

Aldric
6 - Meteoroid

@Tyro_abc

 

Hi Arundhuti,

 

I tried again, the matching step looks ok. I checked in detail. The issue is not there.

 

After the regex match I get 209 record matching (they looks ok and there few more than the initial number of records, because some pattern are matching twice).

The sorting step is also ok, but then I get 123 records matching (correctly) and 22 not matching. These ones are correctly not matching as well.

Problem is I get ultimately only 145 records, 55 are missing.

I checked the output at each step. Until the sort step I still have 209 matching (and thousands of not matching). With the sample step by “first” description I get only 123 matching.

I guess I should add a union like I did with the other workflow to have all the proper results.

Labels
Top Solution Authors