Alteryx Designer Desktop Discussions

Jerid_Ahrens · ‎11-22-2021

Hello, I'm new to Alteryx and was hoping someone could give me some pointers on what tools would be best for me to use to parse out files like the one shown below. I need to capture the Store, Suite, Invoice Number, Order Date, Invoice Date, and Facility from the header and then add it to the detail lines.

I started using RegEx to capture the detail lines but I'm not sure how to collect the header fields and add to each detail record.

Is this possible in Alteryx?

Desired Result

Original Report File

Jerid_Ahrens · ‎11-22-2021

I figured out how to fill in the columns using the Multi-Row Formula tool.

I guess at this point I'm just looking for validation that RegEx and the Multi-Row Formula tools along with a Filter and or Select tools are my best option to parse files like this.

CharltonFranklin · ‎02-13-2025

Hi Jerid, you probably no longer need this solution, but I imagine that others looking this up will need a solution to this kind of problem. I've worked for banks and many core data providers for banks still provide a lot of data in this format. I've solved for this challenge with a few reports already. It's not simple, but it's not expert level either. I think you're heading down the right path with the Multi-Row formula tool.

Here's an example of what one of my workflows looks like. I later converted it to an analytic app so that people can convert files on their own. Also, I didn't screenshot the whole workflow, because there's other things I do with the data after I've reformatted it into column and row format.

Screenshot 2025-02-13 171623.png

First you want to write a formula to identify different row types: Report Header, Column Headers, data rows. This is the primary step and can take some time figuring out the unique logic in your report. Once you know how to identify the headers, filter them out. In my screenshot above, I wrote the logic right into the filter tool and it looks like this:

FILTER TOOL

NOT(
Contains([Field_1], '***') 
or Contains([Field_1], 'BRANCH  ACCOUNT NUMBER') 
or Contains([Field_1], 'ACH NAME') 
or Contains([Field_1], '309         ')
or Contains([Field_1], 'DEBITS                     ')
or Contains([Field_1], 'CREDITS                     ')
or IsEmpty([Field_1])
)

While you may need data from the header rows, that's fine, you can extract what you need from the "F" anchor and append it back to your data later. For example, the report date is likely in one of those header rows, but you'll want to add it to every row later.

Once I have the data rows isolated, I now have the issue that one record is on multiple rows. In order to resolve that, I write a formula to identify which rows are the first record and which ones are not. In my case, the first row always starts with a number in a specific position.

FORMULA TOOL (Added Column "Filter")

ToNumber(Trim(Substring([Field_1],11, 12)))

Then I use the multi-row formula tool to combine all the data into one row. Keep in mind your logic is likely to differ on these formulas.

MULTI-ROW FORMULA TOOL

if [Filter]=0 then '' 
elseif [Row+1:Filter]=0 then [Field_1]+[Row+1:Field_1] 
else [Field_1]
endif

To wrap up organizing my data, I filter out the blank rows or rows I don't want.

[Filter] != 0

Now I'm left with one column with all my data and I need to parse it. You can take two approaches. Use the formula tool to define each column and clean it up, or use the RegEx Parse tool. Here are examples:

FORMULA TOOL

Screenshot 2025-02-13 174853.png

REGEX PARSE

The full formula is

(.{26})(.{27})(.{19})(.{17})(.{43})(.{8})(.{24})(.{9})(.{9})(.{36})(.{29})(.{8})(.{7})

Where each part in () represents a column and the number represents the column length.

After this you can take a similar approach for the headers and append them. Let me know if you get stuck, but hopefully this sets you in the right direction.

OTrieger · ‎02-13-2025

@Jerid_Ahrens
What type of file is it, xml, csv, xlsx, json, etc?

Maybe the solution is much simpler and not so complex as different file type has some settings in the Data Input Tool that might give you almost an immediate result without so much data transformation.

Alteryx Designer Desktop Discussions

Parsing a Mainframe Report File

Re: Need to check if we can activate container bas...

[SHARING] How to: Dynamic Formula Example with new...

Re: Regex Line Extraction

Re: Rolling 12 months dates

Alteryx 2024.2 Upgrade Issue – Formula Tool Config...