
Alteryx Designer Desktop Discussions

SOLVED

Data Validation within a Data Set

Justin_SVU
7 - Meteor

I am hoping someone can point me in the right direction as I am spinning my wheels a bit on this one.

 

Scenario:

 

I have a snapshot of our employee population, which is about 25,000 employees. The spreadsheet has about 50 columns, 5 of which identify where in the organization you fall (Company, Region, Function, etc.). These are manually populated columns that are 95% accurate. You can determine the correct values by looking at someone's location, department, direct manager, and first indirect manager. I want Alteryx to look at this data, point out the 5% who don't look like the rest, and give us its best guess as to the proper value. This will allow us to improve our reporting and the data integrity in our HRIS system.

 

 

Any help or guidance in the right direction would be appreciated. So far I have tried the predictive tools, but I am unsure whether I am using them incorrectly or whether they are the wrong route altogether. I have watched videos on the predictive tools, but when I try them with this spreadsheet the workflow gets hung up at around 20%, and I'm not sure whether that is user error or the large amount of data.

 

Thank you!

 

3 REPLIES
Claje
14 - Magnetar

If I'm understanding this use case correctly, you're looking to separate the cases that match a pattern closely from the cases that don't, so you can decide what to do with the ones that don't.

At a very high level, I would recommend taking a look at the Fuzzy Match tool.  This tool can take some getting used to, but it can do things like see how close two strings are to one another, and provide confidence thresholds alongside this.  This will probably accomplish what you are looking for.
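To make the idea of "how close two strings are, with a confidence threshold" concrete, here is a minimal Python sketch. It is not how the Fuzzy Match tool works internally; it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function names, the list of valid regions, and the 0.8 threshold are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(value, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate.

    The candidate is None when the best score is below the threshold,
    which signals that no confident match was found.
    """
    scored = [(c, similarity(value, c)) for c in candidates]
    best, score = max(scored, key=lambda pair: pair[1])
    return (best if score >= threshold else None, score)


# Hypothetical list of valid values for a manually populated column.
valid_regions = ["Corporate", "Northeast", "Southwest"]

print(best_match("Corprate", valid_regions))  # a likely typo of a valid value
print(best_match("Unknown", valid_regions))   # nothing close enough
```

A value that scores above the threshold is probably a misspelling of a valid entry; one that scores below it needs a manual look.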

Justin_SVU
7 - Meteor

Thank you for the reply. I think your understanding is pretty close. At a basic level, if the leader is John and the department is HR, then the region should always be Corporate and the function HR. You can see this in the data, where 95% of people match this pattern, but 5% have an invalid region or function, and we want that 5% highlighted somehow.

 

I will take a look at the Fuzzy Match tool. Thank you!
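The rule described in this reply (same leader and department should imply the same region) can also be sketched as a simple majority vote, independent of any particular tool. The Python below is only an illustration under assumptions: the field names (`manager`, `department`, `region`), the sample rows, and the 0.8 dominance cutoff are all hypothetical, and a real workflow would read the 25,000-row snapshot instead.

```python
from collections import Counter, defaultdict


def flag_outliers(rows, key_fields, target_field, min_share=0.8):
    """Flag rows whose target value differs from the dominant value in their group."""
    # Group rows by the fields that should determine the target value.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in key_fields)].append(row)

    flagged = []
    for members in groups.values():
        counts = Counter(m[target_field] for m in members)
        top_value, top_count = counts.most_common(1)[0]
        # Only trust the majority when it clearly dominates the group.
        if top_count / len(members) < min_share:
            continue
        for m in members:
            if m[target_field] != top_value:
                # Keep the original row and attach the best-guess value.
                flagged.append({**m, "suggested": top_value})
    return flagged


# Hypothetical sample: nine consistent rows and one that breaks the pattern.
rows = [
    {"employee": i, "manager": "John", "department": "HR", "region": "Corporate"}
    for i in range(9)
]
rows.append({"employee": 9, "manager": "John", "department": "HR", "region": "West"})

flagged = flag_outliers(rows, ["manager", "department"], "region")
print(flagged)
```

Running the same function once per organizational column (Company, Region, Function, and so on) would give a list of suspect rows with a suggested value for each.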

Justin_SVU
7 - Meteor

As you said it took me a bit to figure out the fuzzy match, but I do believe this got us to where we needed to be. Thank you!
