Good day,
I am having trouble removing some bad data from my input file before writing the data to the database (Microsoft Sql Server).
The bad data has a lots of different values but one example for a field shows:
æ±äº¬éƒ½è±Šå³¶åŒºå—å¤ |
I am unsure how to remove these special characters or characters of some sort from my dataset.
I have tried importing the csv with different Code Pages and tried removing the values for the various fields using the following formula tools:
Formula tool 1: REPLACE(REGEX_REPLACE([LAST_NAME],"(.*)(?:[^x20-x7E])(.*)","$1"+"$2")," "," ")
Formula tool 2: Regex_Replace([LAST_NAME],"[[:unicode:]]",'FAIL') = [LAST_NAME]
I have also tried flagging the records by creating a new field:
IF REGEX_CountMatches([FIRST_NAME],"[^'a-zA-Z\s\d-]+") > 0
then 'INVALID' else 'Fine'
endif
Any help would be much appreciated.
Cheers,
Clayton.