Dear community
I am standing in front of a problem which I don't seem to able to fix with the existing tips I found in the forum.
I still feel like this should be a common problem though which should be fixable without editing all the data manually. Maybe someone here is able to help me.
My problem is being caused by extra quotes in text fields (or semicolons). These make Alteryx unable to determine the end of a field correctly. As result, the data is moving to other columns in some places.
I hope the following example is understandable.
My input in csv format looks like this:
"Product";"Number";"Description";"Amount";"Unit";"Date" |
"Product";"1234";"NA";"250";"PCS";"20181212" |
"Product";"1234";"Screen 24"";"250";"PCS";"20181212" |
"Product";"1234";"Screen 28"";"250";"PCS";"20181212" |
"Product";"1234";"NA";"250";"PCS";"20181212" |
"Product";"1234";"NA";"250";"PCS";"20181212" |
"Product";"1234";"Book "Alteryx for dummies"";"250";"PCS";"20181212" |
"Product";"1234";"Game "Tetris" .";"250";"PCS";"20181212" |
"Product";"1234";"NA";"250";"PCS";"20181212" |
When being imported with delimiter ";" the result looks like this:
Row | Product | Number | Description | Amount | Unit | Date |
1 | Product | 1234 | NA | 250 | PCS | 20181212 |
2 | Product | 1234 | Screen 24;"250 | PCS | 20181213 | [NULL] |
3 | Product | 1234 | Screen 28;"250 | PCS | 20181213 | [NULL] |
4 | Product | 1234 | NA | 250 | PCS | 20181212 |
5 | Product | 1234 | NA | 250 | PCS | 20181212 |
6 | Product | 1234 | Book "Alteryx for dummies";"250 | PCS | 20181213 | [NULL] |
7 | Product | 1234 | Game "Tetris" . | 250 | PCS | 20181218 |
8 | Product | 1234 | NA | 250 | PCS | 20181212 |
As you can see in rows 2, 3 and 6 the data doesn't end up in the correct column.
For row 7 Alteryx still gives a warning but the data is in the correct column.
I have tried some workarounds I found here but sadly didn't get the desired result.
Reading in the data as one column with delimiter \0 and then replacing two double quotes ("") with one double quote doesn't work as the data may have trailing whitespaces separating the quotes. If I remove the whitespaces before, blank fields will be represented by two double quotes aswell and also be replaced with a single double quote.
I have also tried the different options in the "Ignore Delimiters in" setting. If I choose "None" I will get the error "too many fields in row XXX" as there are also text fields that contain a semicolon in the data.
I am kind of lost at moment and thankful for any advice.
Thank you!
Solved! Go to Solution.
Hi @artzee1556
You can use some REGEX.
Try REGEX_Replace([Field], "\s{2,}", "|")
This will replace sequences of 2 whitespaces or more for a single pipe.
Cheers,
Try the workflow I posted on 03-21-2019 with delimiter \t