I'm trying to put together an Alteryx workflow that I can use regularly as a standard for the QA of data contained in Excel files, rather than carrying out the QA in Excel before transferring the data to Alteryx.
Any help with this will be appreciated.
Thanks.
Olu
Hi @Ojay
There are a lot of tools under the "Data Investigation" tab in Alteryx that can do QA of your data.
If you give us an example of how you usually do it in Excel, we can suggest some ways to do it in Alteryx.
Cheers,
Thanks for your reply.
In Excel, I would go through every column individually to check for the following:
Kind regards
Here are some examples of how to do QA on your datasets:
- With the RegEx tool and the RegEx functions in the Formula tool, you can identify date format patterns and correct them.
If you don't know what RegEx is, I recommend learning more about it. Here are some topics in the Community and a website to help you:
https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-RegEx/ta-p/37689
https://community.alteryx.com/t5/Alteryx-Knowledge-Base/RegEx-Examples-12-Handy-Use-Cases/ta-p/40680
- The Basic Data Profile tool gives you a lot of useful information about all of your fields,
including leading and trailing whitespace, longest length, number of nulls, etc.
Here is a useful topic on this tool:
https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-Basic-Data-Profile/ta-p/28610
- The Field Info tool gives you the metadata: field types, field sizes, and names. Here is a topic about it:
https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-Field-Info/ta-p/60723
- The Frequency Table tool shows how often each value occurs in each field. It's perfect for finding duplicates.
- The Field Summary tool is similar to Basic Data Profile. It focuses on the fields you select.
These are some of the tools for data QA. The Browse tool can also do most of what these tools do, and you can use it for ad hoc analysis of the quality of your data.
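If it helps to see what these profiling checks actually compute, here is a minimal Python sketch. It is only an illustration, not how the Alteryx tools are implemented. It assumes one column has already been read as a list of strings, with None or an empty string standing in for a null; the function name `profile_column` and the sample values are made up for the example.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: nulls, whitespace issues, longest length, repeated values.

    Assumption: every non-null value is a string.
    """
    # Treat None and empty strings as nulls (an assumption for this sketch).
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "nulls": len(values) - len(non_null),
        # A value has leading/trailing whitespace if stripping changes it.
        "leading_or_trailing_whitespace": sum(1 for v in non_null if v != v.strip()),
        "longest_length": max((len(v) for v in non_null), default=0),
        # Values that occur more than once, with their frequency.
        "duplicates": {v: n for v, n in counts.items() if n > 1},
    }


# Hypothetical sample column, as it might come out of an Excel sheet.
sample = ["A100", " A101", "A100", None, "A102 ", ""]
report = profile_column(sample)
print(report)
```

Running the same function over every column gives a rough per-field QA report, similar in spirit to checking each column by hand in Excel.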
I'm attaching the package with the dataset analyzed and all the tools commented.
I hope this clears things up and boosts your interest in Alteryx.
Cheers,
@Thableaus
Truly appreciated.
I will look into the links you sent and try out the workflow.
Best regards
Hi @Thableaus
I'm actually trying it now.
How do I resolve instances of dates not matching the required format?
Your help is truly appreciated.
Regards
Check the RegEx tool (the green one in my workflow).
It's useful for checking formats, patterns, etc. That's how you can identify whether a string date is in the expected format.
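To illustrate the idea outside Alteryx, here is a rough Python sketch of a regex format check. The required format is not stated in this thread, so YYYY-MM-DD is only an assumption here, and the names `DATE_PATTERN` and `split_by_date_format` are hypothetical. Note that a pattern like this checks the shape of the string only, so something like 2019-99-99 would still pass.

```python
import re

# Assumed target format: YYYY-MM-DD (hypothetical; swap in your required format).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def split_by_date_format(values):
    """Separate strings that match the assumed date pattern from those that do not."""
    matching, non_matching = [], []
    for value in values:
        # fullmatch requires the whole string to fit the pattern, not just a prefix.
        if DATE_PATTERN.fullmatch(value):
            matching.append(value)
        else:
            non_matching.append(value)
    return matching, non_matching


# Hypothetical sample values.
sample_dates = ["2019-03-15", "15/03/2019", "2019-3-5", "2019-12-01"]
ok, bad = split_by_date_format(sample_dates)
print("match:", ok)
print("need fixing:", bad)
```

The non-matching values are the ones to review or correct before the field is converted to a real date type.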
Cheers,
@Thableaus: Could you help me clarify which expression (i.e. date format) the RegEx tool is checking in this instance?
Yes, the RegEx tool is checking whether the date is in the right format.
It takes a bit of understanding of how RegEx works, but working with dates is pretty intuitive and easy.
Cheers,