hello,
I'm struggling to create a table with matching from a list of string with wildcards.
I tried to follow the suggestion on this post using the Find and Replace tool, but it works only removing the wildcard * and only when the wildcard is not in the middle of my string to search. So I get only partially results.
I explain what I've been trying to achieve:
1. I have a pattern list with a related classification field in one table similar to this:
pattern | classification |
*activate access* | class A |
*event log service*stopped | class B |
app1*stop* | class C |
2. I have a long list of descriptions from another table:
description |
the user has activate access to xyz |
the event log service was stopped |
app1 has suddenly stopped |
the event log service was restarted |
.. |
.. |
Searching the pattern string from the first table into each description field, I have to obtain the classifications that match that pattern (for some I might not have the pattern which is fine). So resulting table should be something like this:
description | pattern | classification |
User1 need to activate access to xyz | *activate access* | class A |
the event log service was stopped | *event log service*stopped | class B |
app1 has stopped | app1*stop* | class C |
the event log service was restarted | n/a | n/a |
… | ||
.. |
As said, with the find and replace tool (with removing the initial and last *) I match some pattern (like "activate access"), but where the * is in the middle I couldn't find a way to get the match that I need.
Any suggestion is appreciated!
thanks in advance!
Solved! Go to Solution.
Hi @Aldric ,
Instead of using the Find & Replace tool, maybe you can use Regex as an alternative. In the attached workflow, I have used an append fields tool to create all combinations of descriptions/patterns. Then, with a formula tool, I have replaced all asterisks to ".*?" which in Regex can be translated as everything (so something exists there, but it's not greedy, so it may also not exist).
The you can check if there is a match in the description to the pattern you have defined, with Regex Match.
Hope that helps, let me know if that worked for you
Cheers,
Angelos
Hi
Please try this formula with Regular expression
if REGEX_Match([Description],'.*activate access.*' )
then 'Class A'
elseif REGEX_Match([Description],'.*event log service.*stopped' )
then 'Class B'
elseif REGEX_Match([Description],'.*app1.*stop.*' )
then 'Class C'
else 'n/a' endif
thanks AngelosPachis!
your idea seems working with the subset of dummy data, but using the full set of patterns and descriptions something gets lost.
I have ~300 patterns and +10K list of descriptions, which some of them can also be the same, but having different ID - which is unique, in my sample data I didn't report all the columns that I have for the descriptions, it is something like:
description | ID | team | date | ... |
the user has activate access to xyz | INC123 | group1 | .. | .. |
the event log service was stopped | INC456 | group2 | .. | .. |
app1 has suddenly stopped | INC789 | group3 | ... | .. |
the event log service was restarted | INC012 | group1 | .. | .. |
app1 has suddenly stopped | INC345 | group3 | ||
.. |
I guess that point where I get confused is when I have the same description, but I I think we are on the right way. Let me try to play a bit more.
Thanks Tyro_abc,
this work with a small list of patterns (like 3 as in your example), but I have more than 300 patterns that can potentially change in the patterns table, I'd need something that is not using the string hardcoded in the regex.
Ah I see @Aldric , probably something gets lost in the summarize tool.
Maybe try grouping on the new fields (ID, team) as well? There is always the solution of using a filter tool to only keep those that the pattern matches and process those that you can't find a match separately.
You can achieve that by using a filter tool after the formula tool where the regex_match formula is found
I added the filtered the -1 match after the formula with the regex and added the grouping by ID (after the grouping by description).
Result is much better, but not perfect.
I tried with a subset of 200 descriptions and all the patterns table (313).
I ended up with 175 records, which classification is correct.
After the filter I have 207 records, all matching -1. I think I have different descriptions that matches more patterns (not a big issue, the first match is good enough).
I get the 175 records after grouping. What I’ve lost is the descriptions that don’t match
So I added a union tool with 200 descriptions not matching + the 175 matching (unique, not all 207)
Then keep grouping by ID and description as before, I get the right results: 200 records with 175 match (all ok) and 25 n/a
I already tried with the full list of descriptions (+10K), but despite creating 3M of rows in the first step, the workflow produces the expected result. Now I have to improve the pattern list, but that’s another issue.
Thanks a lot for your hints! I learned a lot this time
I tried the same with 200 descriptions and the whole set of patterns.
I ended up with 145 records in total, which include 23 not matching, but are wrong for some reasons.
Thanks anyway!
can you share some example where it is not matching? Wondering if we have case sensitivity problem?
Hi Arundhuti,
I tried again, the matching step looks ok. I checked in detail. The issue is not there.
After the regex match I get 209 record matching (they looks ok and there few more than the initial number of records, because some pattern are matching twice).
The sorting step is also ok, but then I get 123 records matching (correctly) and 22 not matching. These ones are correctly not matching as well.
Problem is I get ultimately only 145 records, 55 are missing.
I checked the output at each step. Until the sort step I still have 209 matching (and thousands of not matching). With the sample step by “first” description I get only 123 matching.
I guess I should add a union like I did with the other workflow to have all the proper results.