Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

RegEx Tool

CsmithData31
7 - Meteor

Hello! 

 

I have a text file that I am having trouble parsing out a file using the RegEx tool. I am able to parse out date and reference using the following regex but having trouble with Ful name and remaining columns.

(\d{2}/\d{2}/\d{4})\s(\<\w+\>)\s+

 

Sample text file attached. 

 

CsmithData31_0-1680205944372.png

 

13 REPLIES 13
CsmithData31
7 - Meteor

Also having trouble with the data shifting if the code column is blank.

BS_THE_ANALYST
14 - Magnetar

@CsmithData31 I won't lie, that was horrible. The data was tricky. Hopefully this works for you, I have no desire to do it again 🤣:

BS_THE_ANALYST_0-1680208907612.png

 

 

Matthew
11 - Bolide

i think a better idea it to talk to someone about their data quality...

BS_THE_ANALYST
14 - Magnetar

@Matthew lmao. I was saying this as I was doing the RegEx. Bad data can be fine if it's consistently bad, but there's no consistency amongst this bad data 😂

 

Yoshiro_Fujimori
15 - Aurora

@CsmithData31 ,

Another way of parsing the data by column width, not by separators.

 

When we parse a data with fixed width on Excel, we specify the length of each field before importing the data.

In parallel to this, I propose to add a line to pass the length information to Alteryx as below:

Yoshiro_Fujimori_0-1680219196485.png

 

Then input this data to Alteryx:

Yoshiro_Fujimori_1-1680219254254.png

Using the "+" sign on the 2nd line, we can split the data rows into fields.

Yoshiro_Fujimori_0-1680220623139.png

 

 

I think there are pros and cons in this solution.

Pro: You don't need to "count" the column position to edit a long RegEx formula.

Con: You need to edit the input data beforehand. So this may not work in some use cases.

I hope you like it.

flying008
14 - Magnetar

Hi, @CsmithData31 

 

There is a specific way by regex method for you:  (and it be easy, if you can give more sample data, maybe it can just for you want.)

 

录制_2023_03_31_10_45_28_485.gif

 

1- Select : change you field data size to adjust more string.
2- Formula: format your string by regex expression again.
3- Text To Columns: Split string by specific delimiter.
4- Dynamic Select: remove the input field. (no need)
5- Data Cleansing: remove null columns.
6- Dynamic Rename: Rename by first row.

Spoiler
REGEX_Replace(REGEX_Replace(REGEX_Replace(REGEX_Replace([Txt], '(\/[[:alpha:]]+)(\s+?)(?=[A-Z]+)', '$1 '),'^[\d\/]+','$&  '),'(COLUMN)(\s+?)(\d+)','$1$3  '),'\s{2,20}','|')

******

If it can help you , please mark it as a solution and give a like for more share.

 

BS_THE_ANALYST
14 - Magnetar

@Yoshiro_Fujimori 
Nice idea with the column width.

By the way:

BS_THE_ANALYST_0-1680250079056.png


I think one of the names crept into the next column! 

 

BS_THE_ANALYST
14 - Magnetar

@flying008 same issue with yours:

BS_THE_ANALYST_1-1680250214069.png


I wonder if that happens because of the column width idea? Interesting.

Unless CABABAGE is actually a code and I made a wrong assumption 😂.

 

Yoshiro_Fujimori
15 - Aurora

@BS_THE_ANALYST ,

Thanks!

 

Re your comment:

> I think one of the names crept into the next column! 

 

Well, just looking at the column width, it seems to be working as intended...

 

In this particular case, we cannot use "\s+" as a separator, as "Full name" contains a space in it.

If we separate "Full name" to "First name" and "Last name", it might work. (assuming nobody has a "Middle name" 😀)

 

Input

Yoshiro_Fujimori_2-1680250921836.png

Output

Yoshiro_Fujimori_3-1680250960993.png

 

 

 

Labels