Alteryx Designer Desktop Discussions

45179902 · ‎01-09-2022

I'm trying to extract some data from a .txt file. The data looks messy with stacked headings:

Form 1

First Name Last Name Age

Occupation Organisation Residency

------------------------------------------------------------

Jonna Karly 53

Instructor NYU Los Angeles

Hsiao-Ran Liu 22

Student NYU Kaohsiung

Form 2

First Name Last Name Age

Occupation Organisation Residency

------------------------------------------------------------

Jonna Karly 53

Instructor NYU Los Angeles

Hsiao-Ran Liu 22

Student NYU Kaohsiung

...

(Disclaimer: All data presented are fictional, no personal data is leaked.)

As can be seen, the headings are stacked on top of each other so each record has two rows of data. Note that the dashed line under the headings is also included in the text file. Also there're multiple forms sharing the same layout stacked on top of each other in the text file as shown here.

For the Jonna Karly record, First Name matches Jonna, Last Name matches Karly, Age matches 53, Occupation matches instructor, Organisation matches NYU, and Residency matches Los Angeles and so on.

Suppose that I'm trying to extract all the ages in the Age column (53, 22, etc.) for each form, how do I achieve this? I know regular expression can help, but I have absolutely no idea how to deal with this type of structure.

FYI, I've tried using delimiters when importing the text file into Alteryx Designer, but for whatever reason, the error "too many values in record #" continues to pop up, so I got to use no delimiter, and now everything's squeezed in one column. I believe RegEx is the only solution.

ArtApa · ‎01-09-2022

Hi @45179902 - Yes, you need to read it with no delimiters and then parse it. However, the structure of your file does matter a lot. Without it every solution will be just a guess. Can you please provide a sample file? It may have just a couple of rows and you can mask the values. However, the number of symbols in each row and symbols between columns should remain unchanged.

45179902 · ‎01-09-2022

@ArtApa Thank you for the response. Can you help parse out the data in the given example? If so, maybe I can try find out a way to modify it and apply to my files.

How do I provide a sample file? How exactly can I mask the data? The files are full of customer information, and I'm in the banking sector, and we have zero tolerance for leaking internal data and files.

ArtApa · ‎01-09-2022

Hi @45179902 - Here is an example:

atcodedog05 · ‎01-09-2022

Hi @45179902

Here is how you can do it. I am use none(\0) as delimiter

Workflow:

Hope this helps : )

Alteryx Designer Desktop Discussions

Extracting a column of data from a text file using RegEx (regular expressions)

extracting seelcted text from a column

Help in Regex Expression

regex expression to extract URL's

Extracting Text from Data Column Using Regex

SAP Purchasing - Extract Contract Data (Table Extr...

No data in configuration pane of tool immediately ...

Re: How to separate IDs from the text

Re: Time delay for Email Tool

Re: How to dynamically name files without creating...

Re: Iterative canvas execution with different file...