I have a pipe delimited file, which I am bringing in using the input tool and reading it in as a delimited text file.
I need to find all fields with extra white spaces, but when I use REGEX_CountMatches, Nothing is returned. It appears that the white space is cleaned before I even do anything with the file. I am essentially running a data quality check on a file that a partner of ours will be sending us, so I need to be able to find all of the white spaces in the file, so they can correct this before sending it through an automated process.
A | B | C |
1 | 2 | 3 |
I need a way to return the spaces after 1 and the spaces before two, but alteryx is cleaning this up! How do I get Alteryx to keep the spaces??
Solved! Go to Solution.
Hi @MamaGidge
I'm not sure if this is standard behaviour or only happens with pipe delimiters, but you can get around it this way
Change the delimiter in your Input to \0(no delimiter).
Then split the field on the pipe and use a dynamic rename to get the field names
Edit: I checked with other delimiters and this behaviour occurs in all cases. According to the main wiki article on CSV, removing the leading and trailing spaces makes this implementation non-compliant with RFC-4180. It may be something you want to bring up as an issue.
Dan
Hi @MamaGidge
Interesting point.
A workaround would be using no delimiters (actually, \0 delimiter) and Text to Columns right after, but this would take extra work, since you would have to use a fixed number of columns or to do some tranposing if you split to rows.
Cheers,
Hey @MamaGidge
Try this:
Import the file with no delimiters and then use text to columns to seperate out the data
Hi @MamaGidge
If you read the file in without a delimiter (use \0 instead of |) and then use a Text to Column tool to parse the file using the pipe, both the leading and the trailing white space will be persisted.
Hope that helps!
Michal
Thanks! This did work. I had to make one change. Since the actual file I am working has 87 columns, I had to change the field length in the input file. Without changing the field length, I was losing more than half of my header fields.