Hi Team,
Hope your well.
I am hoping you can help me or point me in the right direction.
I would like to perform a fuzzy match / match that (compares two data sets against each other), but instead of running it across the whole data set, I would like the fuzzy match / match to only run for child records for a given parent
Hopefully the below explains my use case
Input 1: Master
Contains following 2 columns
id | address |
1111 | 2 high street, new york, united states |
2222 | 101 central avenue, new york, united states |
Input 2 : Data
Contains the following three columns;
Example data set ;
id | parentid | address |
1111 | 1 high street, new york, united states | |
1111a | 1111 | 2 high street, new york, united states |
1111b | 1111 | 3 high street, new york, united states |
1111c | 1111 | 4 high street, new york, united states |
2222 | 100 central avenue, new york, united states | |
2222a | 2222 | 101 central avenue, new york, united states |
2222b | 2222 | 102 central avenue, new york, united states |
2222c | 2222 | 103 central avenue, new york, united states |
Scenario : 1111
Scenario : 2222
What i would like to happen is that we are not comparing the address of "master" id 1111 against child records or 2222a,2222b,2222c, or "master" id 2222 against child records or 1111a,1111b,1111c
Solved! Go to Solution.
@ArtApa Thank for sharing your solution.
The first file yxwz doesn’t work on my version of Alteryx. Currently using version 19.3.
Checking the second workflow although I am understanding the flow , however its not giving me the output i am expecting. ( more than likely due to my description)
For example input "master"
id | address |
1111 | 2 high street, new york, united states |
Checking against the the "Data" input we can see that there are 3 children for this particular parent
id |
1111a |
1111b |
1111c |
Whilst comparing the addresses of the "master" vs the "child" input for the parent "1111" we can see that there is one child "1111a" where the addresses matches 100%. So it should return that as an outcome
id | address | child id | child address | match % |
1111 | 2 high street, new york, united states | 1111a | 2 high street, new york, united states | 100% |
Lets say in the "master" input i have two addresses for parent 1111
id | address |
1111 | 2 high street, new york, united states |
1111 | 95 highway avenue, Boston, united states |
Whilst comparing the addresses of the "master" vs the "child" input for the parent "1111" we can see that there is one child "1111a" where the addresses matches 100%. However the Boston address is not found in the list of children addresses
id |
1111a |
1111b |
1111c |
the expected outcome should be something like this..
id | address | id | child address | match % |
1111 | 2 high street, new york, united states | 1111a | 2 high street, new york, united states | 100% |
1111 | 95 highway avenue, Boston, united states |
hope that makes more sense? more than happy to connect if you have further questions
Hi @Masond3 - In your new scenario you don't need any Fuzzy logic. You'd use a Join to match 100% and a Union to bring the unmatched record back.
I guess it will be easier to offer you a solution, if you'd offer a data sample that is representative of the entire population of your data. We'd also need a sample of the desired output.
I have attached the data set i am currently working with. On the data tab this is all child information. all i would like to do, is see if the address on the "Master" tab is in the "Child" Tab for a given Id
My thinking is, if i can solve for this use case then it should hopefully meet all the uses cases. Why, as the below use cases has the following 3 scenarios ( and this most you can have )
Input : Master
ID | Address | City | State/Province | Zip/Postal Code | Country |
18003036 | 39 Jeays Street | Bowen Hills | Queensland | 4006 | Australia |
18003036 | 431-439 King William Street | Adelaide | South Australia | 5000 | Australia |
18003036 | 367 Collins Street | Melbourne | Victoria | 3000 | Australia |
18003036 | Level 1 58 Burwood Road | Hawthorn | Victoria | 3122 | Australia |
Input : Data
ID | Parent ID | Address (Line 1, Line 2, Line 3) | City | State/Province | Zip/Postal Code | Country |
001j000000eKjoPAAS | 18003036 | Level 1 431- 439 King William Street | Adelaide | South Australia | 5000 | Australia |
001j000000eKjoQAAS | 18003036 | Suite 11 39 Jeays Street | Bowen Hills | Queensland | 4006 | Australia |
001j000000cVKUrAAO | 18003036 | 367 Collins Street | Melbourne | Victoria | 3000 | Australia |
001j000000cVKT8AAO | 18003036 | Level 1 480 St Kilda Road | Melbourne | Victoria | 3004 | Australia |
001j000000eKh0KAAS | 18003036 | Unit 3 Level 2 10 Moorabool Street | Geelong | Victoria | 3040 | Australia |
Expected outcome
ID | Address | City | State/Province | Zip/Postal Code | Country | child id | child address | % | Reason |
18003036 | 39 Jeays Street | Bowen Hills | Queensland | 4006 | Australia | 001j000000eKjoQAAS | Suite 11 39 Jeays Street | Fuzzy Match | |
18003036 | 431-439 King William Street | Adelaide | South Australia | 5000 | Australia | 001j000000eKjoPAAS | Level 1 431- 439 King William Street | Fuzzy Match | |
18003036 | 367 Collins Street | Melbourne | Victoria | 3000 | Australia | 001j000000cVKUrAAO | Level 31 367 Collins St | Exact Match | |
18003036 | Level 1 58 Burwood Road | Hawthorn | Victoria | 3122 | Australia | No Address matches |
If I think about it. What I'm looking to do is fuzzy match two sources , however I want the matching to only be executed within groups.
For example Alteryx to fuzzy match within Group a, then move on to B, then onto group c, d, etc
Hi @Masond3 - The dataset helped. I designed something that I hope will meet your requirements:
Please note that your child data has 6 groups of similar addresses:
001j000000cVKPrAAO Level 22, ANZ Tower, 242 Pitt Street
001j000000heDOcAAM Level 23, 242 Pitt Street
001j000000eKjhiAAC 105 Mawson Drive
001j000000eKgcKAAS 105 Mawson Drive
001j000000eKjZSAA0 Level 4, 1 York Street
001j000000eKhEMAA0 Level 4, 1 York Street
001j000000cVKUlAAO Level 21, 25 Bligh Street
001j000000eKhGwAAK Level 21 25 Bligh Street
001j000000eKjaGAAS One International Towers, 100 Barangaroo Avenue
001j000000eKju3AAC One International Towers, 100 Barangaroo Avenue
001j000000eKiKZAA0 Suite 13, Edgecliff Mews, 201 New South Head Road
001f100001ZX5ESAA1 Suite 13,201 New South Head Rd
That is why the output has more records than input. You can use a Sample tool to pick just one or do something else, if you want to avoid this.
Hi @Masond3 - Can you please let me know if this worked for you? If so, please mark it as a solution.