Hello!
I have a text file that I am having trouble parsing out a file using the RegEx tool. I am able to parse out date and reference using the following regex but having trouble with Ful name and remaining columns.
(\d{2}/\d{2}/\d{4})\s(\<\w+\>)\s+
Sample text file attached.
Solved! Go to Solution.
Also having trouble with the data shifting if the code column is blank.
@CsmithData31 I won't lie, that was horrible. The data was tricky. Hopefully this works for you, I have no desire to do it again 🤣:
i think a better idea it to talk to someone about their data quality...
Another way of parsing the data by column width, not by separators.
When we parse a data with fixed width on Excel, we specify the length of each field before importing the data.
In parallel to this, I propose to add a line to pass the length information to Alteryx as below:
Then input this data to Alteryx:
Using the "+" sign on the 2nd line, we can split the data rows into fields.
I think there are pros and cons in this solution.
Pro: You don't need to "count" the column position to edit a long RegEx formula.
Con: You need to edit the input data beforehand. So this may not work in some use cases.
I hope you like it.
Hi, @CsmithData31
There is a specific way by regex method for you: (and it be easy, if you can give more sample data, maybe it can just for you want.)
1- Select : change you field data size to adjust more string.
2- Formula: format your string by regex expression again.
3- Text To Columns: Split string by specific delimiter.
4- Dynamic Select: remove the input field. (no need)
5- Data Cleansing: remove null columns.
6- Dynamic Rename: Rename by first row.
REGEX_Replace(REGEX_Replace(REGEX_Replace(REGEX_Replace([Txt], '(\/[[:alpha:]]+)(\s+?)(?=[A-Z]+)', '$1 '),'^[\d\/]+','$& '),'(COLUMN)(\s+?)(\d+)','$1$3 '),'\s{2,20}','|')
******
If it can help you , please mark it as a solution and give a like for more share.
@Yoshiro_Fujimori
Nice idea with the column width.
By the way:
I think one of the names crept into the next column!
@flying008 same issue with yours:
I wonder if that happens because of the column width idea? Interesting.
Unless CABABAGE is actually a code and I made a wrong assumption 😂.
Thanks!
Re your comment:
> I think one of the names crept into the next column!
Well, just looking at the column width, it seems to be working as intended...
In this particular case, we cannot use "\s+" as a separator, as "Full name" contains a space in it.
If we separate "Full name" to "First name" and "Last name", it might work. (assuming nobody has a "Middle name" 😀)
Input
Output
User | Count |
---|---|
19 | |
15 | |
13 | |
9 | |
8 |