I am attempting to run a fuzzy match on some customer data and reduce the list to as few records as possible, as an employee will have to review the data and make decisions. I am encountering an issue where there are line breaks/carriage returns in my file, which is causing a number of duplicate records that have a match score of 0 because it's treating it like a separate unique value. It's not entirely apparent when looking at the data to know which records are duplicated because of this.
I am looking for a way to remove all line breaks/carriage returns from a singular column within my dataset. I have attempted to use the data cleansing tool to remove these line breaks, but it is not working. I have also tried using the text to columns and trim features in excel, but those did not work either. Are there any other suggestions on how I could do this within Alteryx?
Thanks!
I would recommend regex. You can match all 'non printable' characters out of the standard ascii range with REGEX_Replace([Field1], '[^ -~]+', '') in a formula tool. If you want new line specifically it will be REGEX_Replace([Field1], '\n', '')
Not a problem. If you would like me to put the full workflow in here let me know.
Hey @mariaabbott, you can also just use a standard Replace() function here. For the target, literally just hit enter (a line break) inside your quotes:
Replace([DepotName], '
', '')
Thank you both for the suggestions! I tried both of them on the sample data and it worked perfectly, but when I applied it to my actual data set it didn't remove the line breaks. I'll play around with it more and hopefully be able to determine some additional questions.
#
You mentioned trying the Data Cleanse tool. Did you try the checkbox highlighted in yellow?