Hi,
My scenario is in its basic terms:
if there are 4 rows as:
Jasdev
Jasdev Singh
Jasdev Singh Sachdeva
Jasdev Singh Sachdevaaaa
It should change all 4 of them to Jasdev Singh Sachdevaaaa. Even for the scenario
Logic I am trying to use for them is, if row 2 contains row 1 (complete string, if it is Jasdev R, it shouldnt work) then replace row 1 with row 2. The Expression in multi row formula I have used is created a new column checking and below:
IF CONTAINS([Row+1:Name],[Name])
THEN [Row+1:Name]
ELSE [Name]
ENDIF
Kindly provide some suggestions.
Solved! Go to Solution.
Hi @Jasdev you could try and approach this using the fuzzy match tool in purge mode
This is how I set up the fuzzy matching for the dummy dataset. You will probably need to tweak this to get it closer to what you need. The preprocess step of dealing with addresses I think will be useful for you
There's more information on fuzzy matching here: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-Fuzzy-Match/ta-p/45485
This is how the workflow looks:
Hi @OllieClarke
It kinda worked but can you please explain me what is happening? Please see the screenshot once. It brings the data into 2 different columns, i can just combine it right?
Hi @Jasdev I've attached an annotated version of the workflow below (and screenshot too). This is slightly different as it first creates a unique list of addresses, and purges that for performance reasons. I'm also assuming your screenshot is from just after the union tool. Your output looks a little different to mine, which I think comes from me having a different field name. So I've changed my field name to mirror yours. It's also worth checking the Select tool after the Join tool is renaming right_street 1 to Group and dropping the original Group
There's more information on the fuzzy match tool here too: https://help.alteryx.com/current/FuzzyMatch.htm and I would recommend looking at the example workflow that Alteryx provides (click on the tool in the toolbar and then click on 'open example')
So, the fuzzy match tool in purge mode allows you to group similar strings together. You can define what you deem to be similar in the match style configuration panel. In my case, I told Alteryx to pre-process the strings to remove normal address stuff, and then use the Levenshtein Distance of the strings to work out how similar they are. I then use the match threshold to say that I want strings which are >=75% similar according to the levenshtein distance to count as matches, and to ignore anything below that threshold.
The make group tool then takes the output of the fuzzy matching tool, and groups the record ids. (i.e. records 4,6,7 are all matches according to our fuzzy match tool, so the make group tool assigns them all the group '4').
We then join this group onto the unique list of addresses, and then join the group's address onto each record.
I then clean this slightly, by replacing the recordID in Group with the content of Right_street 1, and then drop that column (actually done by dropping group and renaming right_street 1 to group
However, we need to union the L output of the join tool onto this data so that we don't lose any addresses which weren't assigned a group. so we now have this:
At this point, we can replace the Null groups with field_1, as they can be thought of as in a group by themselves.
Finally, we can join the groups back onto the original data, so we don't lose any duplicates:
I hope that makes sense and helps
Thank you so much sir, it gives me alot better output so far. In your screenshots as well as my data, i observed this difference, do you know why isnt this handled?
1 CNN CENTER | 1 CNN CENTER |
1 CNN CENTER DOCK 4 | 1 CNN CENTER DOCK 4 |
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
5 |