Hi, playing with the trial version of Alteryx Designer and threw some particularly dirty, but actual, data at it. My first challenge is that some of the user-entered dates have typos. I've split the date into 3 columns so that I can clean up the year then re-glue it back together again. For example I have the following years:
3019
3017
...
2107
2106
2071
2027
Most of these are pretty clearly typos especially since these dates are historical (test scores). So my question - do I really have to write a fairly convoluted IF to clean these? Is there a CASE/SWITCH statement I can use to at least group them? Is there a better way?
Thanks.
Solved! Go to Solution.
Using a filter tool you can say [date] <= datetimenow() so all of the historical records will come out of the True and future records will come out of the false. Typos can be tricky because there need to be ways that you can establish what was actually intended. For instance, if the first digit of the year starts with a 3, you may be able to infer that they always meant to put a 2. Similarly, if the number is 2071 they may have actually meant 2017. Where it gets tricky is if someone typed 2018 and actually meant 2017, it is impossible to know that was the case.
So maybe as a start, you can split to the historical and future streams using a filter tool, then use a regex tool to split the years into three new columns using something like (\d{1})(\d{1})(\d{2}). Then you can use a formula tool to set the 3 in the first field to a 2 using a formula tool and then figure out rules for how you want to deal with other potential typos.
@YeahMan Is only the first digit that's a type (3 -> 2) or sometimes all the 4 digits? How do you know which digits are wrong?
You can use the formula tool or the regex tool to parse out the digits into 4 fields then add logic to fix the YYYY.
Do you have a before > after for the dataset you shared?
I ended up using a variation of your solution with a RegEx_Replace function. Thank you for your help.
User | Count |
---|---|
17 | |
16 | |
14 | |
6 | |
5 |