Important Community update: The process for changing your account details was updated on June 25th. Learn how this impacts your Community experience and the actions we suggest you take to secure your account here.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extracting a column of data from a text file using RegEx (regular expressions)

45179902
8 - Asteroid

I'm trying to extract some data from a .txt file. The data looks messy with stacked headings:

 

Form 1

First Name       Last Name         Age

Occupation      Organisation      Residency

------------------------------------------------------------

Jonna               Karly                 53

Instructor          NYU                  Los Angeles

Hsiao-Ran        Liu                     22

Student             NYU                  Kaohsiung

 

Form 2

First Name       Last Name         Age

Occupation      Organisation      Residency

------------------------------------------------------------

Jonna               Karly                 53

Instructor          NYU                  Los Angeles

Hsiao-Ran        Liu                     22

Student             NYU                  Kaohsiung

...

(Disclaimer: All data presented are fictional, no personal data is leaked.)

 

As can be seen, the headings are stacked on top of each other so each record has two rows of data. Note that the dashed line under the headings is also included in the text file. Also there're multiple forms sharing the same layout stacked on top of each other in the text file as shown here.

 

For the Jonna Karly record, First Name matches Jonna, Last Name matches Karly, Age matches 53, Occupation matches instructor, Organisation matches NYU, and Residency matches Los Angeles and so on.

 

Suppose that I'm trying to extract all the ages in the Age column (53, 22, etc.) for each form, how do I achieve this? I know regular expression can help, but I have absolutely no idea how to deal with this type of structure.

 

FYI, I've tried using delimiters when importing the text file into Alteryx Designer, but for whatever reason, the error "too many values in record #" continues to pop up, so I got to use no delimiter, and now everything's squeezed in one column. I believe RegEx is the only solution.

 

4 REPLIES 4
ArtApa
Alteryx
Alteryx

Hi @45179902 - Yes, you need to read it with no delimiters and then parse it. However, the structure of your file does matter a lot. Without it every solution will be just a guess. Can you please provide a sample file? It may have just a couple of rows and you can mask the values. However, the number of symbols in each row and symbols between columns should remain unchanged. 

45179902
8 - Asteroid

@ArtApa Thank you for the response. Can you help parse out the data in the given example? If so, maybe I can try find out a way to modify it and apply to my files.

 

How do I provide a sample file? How exactly can I mask the data? The files are full of customer information, and I'm in the banking sector, and we have zero tolerance for leaking internal data and files.

ArtApa
Alteryx
Alteryx

Hi @45179902 - Here is an example:

ArtApa_0-1641796062332.png

 

atcodedog05
22 - Nova
22 - Nova

Hi @45179902 

 

Here is how you can do it. I am use none(\0) as delimiter

Workflow:

atcodedog05_0-1641797172261.png

 

Hope this helps : )

Labels