Calling all Alteryx customers: Refer your colleague to try Alteryx and receive a $50 gift card. Sign up now!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Parsing a Mainframe Report File

Jerid_Ahrens
5 - Atom

Hello, I'm new to Alteryx and was hoping someone could give me some pointers on what tools would be best for me to use to parse out files like the one shown below.  I need to capture the Store, Suite, Invoice Number, Order Date, Invoice Date, and Facility from the header and then add it to the detail lines.

 

I started using RegEx to capture the detail lines but I'm not sure how to collect the header fields and add to each detail record. 

 

Is this possible in Alteryx?

 

Desired Result

Jerid_Ahrens_3-1637608341073.png

 

 

 

Original Report File 

Jerid_Ahrens_1-1637607630378.png

 

 

3 REPLIES 3
Jerid_Ahrens
5 - Atom

I figured out how to fill in the columns using the Multi-Row Formula tool.

 

I guess at this point I'm just looking for validation that RegEx and the Multi-Row Formula tools along with a Filter and or Select tools are my best option to parse files like this.

CharltonFranklin
8 - Asteroid

Hi Jerid, you probably no longer need this solution, but I imagine that others looking this up will need a solution to this kind of problem. I've worked for banks and many core data providers for banks still provide a lot of data in this format. I've solved for this challenge with a few reports already. It's not simple, but it's not expert level either. I think you're heading down the right path with the Multi-Row formula tool. 

Here's an example of what one of my workflows looks like. I later converted it to an analytic app so that people can convert files on their own. Also, I didn't screenshot the whole workflow, because there's other things I do with the data after I've reformatted it into column and row format.

 
 

Screenshot 2025-02-13 171623.png

 

First you want to write a formula to identify different row types: Report Header, Column Headers, data rows. This is the primary step and can take some time figuring out the unique logic in your report. Once you know how to identify the headers, filter them out. In my screenshot above, I wrote the logic right into the filter tool and it looks like this:

 

FILTER TOOL

 

NOT(
Contains([Field_1], '***') 
or Contains([Field_1], 'BRANCH  ACCOUNT NUMBER') 
or Contains([Field_1], 'ACH NAME') 
or Contains([Field_1], '309         ')
or Contains([Field_1], 'DEBITS                     ')
or Contains([Field_1], 'CREDITS                     ')
or IsEmpty([Field_1])
)

 


While you may need data from the header rows, that's fine, you can extract what you need from the "F" anchor and append it back to your data later. For example, the report date is likely in one of those header rows, but you'll want to add it to every row later.

 

Once I have the data rows isolated, I now have the issue that one record is on multiple rows. In order to resolve that, I write a formula to identify which rows are the first record and which ones are not. In my case, the first row always starts with a number in a specific position.

 

FORMULA TOOL (Added Column "Filter")

 

ToNumber(Trim(Substring([Field_1],11, 12)))

 

 

Then I use the multi-row formula tool to combine all the data into one row. Keep in mind your logic is likely to differ on these formulas.

 

MULTI-ROW FORMULA TOOL 

 

if [Filter]=0 then '' 
elseif [Row+1:Filter]=0 then [Field_1]+[Row+1:Field_1] 
else [Field_1]
endif

 

 

To wrap up organizing my data, I filter out the blank rows or rows I don't want.

 

[Filter] != 0

 

 

Now I'm left with one column with all my data and I need to parse it. You can take two approaches. Use the formula tool to define each column and clean it up, or use the RegEx Parse tool. Here are examples:

FORMULA TOOL

Screenshot 2025-02-13 174853.png

 

REGEX PARSE

parse.jpg

The full formula is

 

(.{26})(.{27})(.{19})(.{17})(.{43})(.{8})(.{24})(.{9})(.{9})(.{36})(.{29})(.{8})(.{7})

 

Where each part in () represents a column and the number represents the column length.

 

After this you can take a similar approach for the headers and append them. Let me know if you get stuck, but hopefully this sets you in the right direction.

OTrieger
13 - Pulsar

@Jerid_Ahrens 
What type of file is it, xml, csv, xlsx, json, etc?

 

Maybe the solution is much simpler and not so complex as different file type has some settings in the Data Input Tool that might give you almost an immediate result without so much data transformation.

Labels
Top Solution Authors