Hello,
I have an excel worksheet with 6 columns of data - A to F
Now I need to filter data/highlight cells which contains any character other than a predefined set.
Current set: "A-Z" "a-z" "0-9" " " "/" "*" "(" ")" "&" "." "_", "-"
Anything other than the above in any cell of column A to F must be either highlighted, or those rows filtered out for DQ issues (Highlight if possible but filtering is alright as well)
Example:
231_ABCD&BH (This is right and not be highlighted or flagged)
213_HR. abc (This is right too)
980_*24/01 HSU-h ( This is right as well as all characters are part of that set)
23401_ABC, DEF (Wrong, highlight or flag as comma is not part of set)
675_*23/02 S@D (Wrong again as @ is not part of set)
Any character outside of that set will make that cell wrong and must be highlighted or flagged in columns A to F.
Hi @ShantanuDagar,
Whilst this is likely not an optimal solution, this works as intended.
1. The top text input is the strings desired for analysis.
2. A text input is added each for numeric characters (0-9), alpha characters (A-Z), and allowed symbols (including space, ' ').
3. Some data preparation takes place.
4. RegEx tool is used to split the strings to individual lines.
5. 3 successive joins take place to eliminate the 'allowed' characters.
6. The 'Left' output of the third join contains strings with any characters that were not found in the allowed character lists.
7. Summarize groups the remaining/invalid strings and shows the invalid characters.
8. Final join rejoins this to the original strings.
Again, this may not be optimal, but it does work.
The use of text input tools allows the lists of allowed characters to be customized more easily than having a complex Formula tool, so I opted to use this method instead.
Let me know if you have any questions.
EDIT: The way that you then go about flagging these cells is up to you. Happy to provide insights if you can provide a dataset to work from, but if not, it shouldn't be too tricky since the above will show the exact value that is erroneous.
Cheers,
Peter