String/Name match formula/algorithm | Finding substring in a string
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi all,
I'm trying to build a string/name match formula/algorithm that gives me a match between strings such as these:
(Example for reference)
String 1 : A | MndALv5 | D | EFJFJ
String 2: MndALv5C
MndALv5 is a subset of MndALv5C, therefore it should give me a match. But it doesn't with this formula that I tried:
IF contains([String1],[String2]) OR contains([String2],[String1])
THEN "Match"
ELSE "No Match"
ENDIF
NOTE1: Fuzzy match is not an option because the population of strings is too big to consider making adjustments to fuzzy logic
NOTE2: Doing split to rows by "|" and then checking although is possible but some strings have 50-100 pipes per user and there are 1000 of users in the file, so this approach compromises on the speed of execution.
Looking forward to your response(s).
Thanks.
- Labels:
- Expression
- Fuzzy Match
- Parse
- Regex
- Text Mining
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Miglani ,
are you trying to find matches for "MndALv5" in a column of data with various strings?
If so, this would work.
(where the data is in field "String")
--
IF contains([String],"MndALv5")
THEN "Match"
ELSE "No Match"
ENDIF
--
An example is attached for clarity.
Regards,
Aidan
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks for the reply.
It's not exactly substring because there is an extra "C" to it which is where the challenge stems from. Otherwise, it would have been straight forward.
But yes, I am searching the string is a column of strings where the column has values separated by pipes.
So unfortunately, this doesn't quite work because the Alteryx "Contains" only search from start and not from the middle.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Enter the additional "C" in the input(second argument in contains) and remove it from the strings in the input column.
It will show as "No Match" when it should show as "Match".
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I just took that as an example (for reference).
My actual data has like 20,000+ more records.
This is not really what I'm looking for.
I'll try to explain it again:
I have 2 columns with a lot of values.
Column1 Column2
MndALv5C A | MndALv5 | B | C
No role General|CCP Agent|CCP Agent|General|Report Access
I want to match values of column 1 & column 2.
For example: Here in the first row, the result should show "match" and in second row it should show "no match".
Notice the extra C, which is why contains doesn't work. I already put the contains formula with my question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
can column 1 record have any number of preceeding or following characters like
Column1 Column2
MndALv5C A | MndALv5 | B | C
ABMndALv5C A | MndALv5 | B | C
MndALv5DEF A | MndALv5 | B | C
abMndALv5CDEFGC A | MndALv5 | B | C
and can column 2 have the same ?
Column1 Column2
MndALv5C A | MndALv5de | B | C
ABMndALv5C A | BMndALv | B | C
MndALv5DEF A | MndALv5hbhj | B | C
abMndALv5CDEFGC A | abMndAL | B | C
if this is the case I would try to get the minimum number of characters required to make a successful "match" and go from there or "fuzzy match" but you've said this wont work?
You might be left with your mentioned option to parse the data on "|" and match that way, even if it may take longer, it will be more accurate
------------------------
edit -
Have a look at this post from which may help with the fuzzy match option.
I did a quick test of Ben's workflow and I think you could use it as a starting point?
see attached.
