I'm trying to parse the data from text files. The sample data on the text file is as below,
REPT NO.XXXXX PAGE 1
BANK NO. XXX MAINTENANCE REJECT REPORT
DATE 02/14/23
Y (IF CHANGED)
************************************************************************************************************************************
Account 001-0002-003-0004-0005
NAME 1 : Test,1
NAME 2 :
Y ADDRESS 1 : Address for Test 1 ADDRESS REJECTED
ADDRESS 2 :
Y CITY-STATE : Unknown, OH ADDRESS REJECTED
Y ZIP : 43230 ADDRESS REJECTED
HOME PHONE : 000-008-7666
OTHER PHONE : 000-000-0000
BUSINESS PHONE: 000-000-0000 EXT.
************************************************************************************************************************************
Account 121-0000-006-5785-6560
NAME 1 : Test,2
NAME 2 :
Y ADDRESS 1 : Address for Test 2 ADDRESS REJECTED
ADDRESS 2 :
Y CITY-STATE : Test, PUERTO RICO ADDRESS REJECTED
Y ZIP : 00729 ADDRESS REJECTED
HOME PHONE : 000-000-0000
OTHER PHONE : 000-000-0000
BUSINESS PHONE: 000-000-0000 EXT.
BUSINESS NAME :
Data for multiple accounts is split by using asterisk(******) in the line
The requirement is to extract the account number. If line item starts with Y then extract the details on it. For e:g Y ADDRESS 1 : Address for Test 1 ADDRESS REJECTED then ADDRESS 1 = Address for Test 1 ADDRESS REJECTED
Solved! Go to Solution.
Hi @Idyllic_Data_Geek ,
As you have not provided an example of what you require I have assumed you want an account per row with the values in columns.
This is achieved by a combination of regex to parse out the required information, and the Crosstab tool to pivot the data into an appropriate format.
I have done this in the following workflow:
The first step is to strip out unwanted empty rows and the rows at the top of the report. I then used regex to parse out the Account Code and a multi-Row formula to carry the Account down to all appropriate rows. This will be the grouping column.
I then used Regex to parse the names and the data of each row, and finally cross-tabbed into the following format:
I would strongly recommend looking at the following resources to help you with these tools as it will be greatly beneficial in the future:
Regex:
https://help.alteryx.com/20223/designer/regex-tool
https://www.phdata.io/blog/alteryx-regular-expressions/
Crosstab:
https://help.alteryx.com/20223/designer/cross-tab-tool
I have attached the workflow.
I hope this helps,
M.
@idels I'm going to assume a few things:
- You want a list of account with at least one change.
- You the list of fields and accounts with a change
- The format of the txt file is set.
I took the approach of assigning each row of data with the account that it is associated with using multi-row formula, then determine if it's needs base on your requirement of a leading "Y " denotes a change was made.
This approach relies heavily on your text file format remaining consistent.
I hope this helps.
Thanks for taking the time out to help me. It is working great so far. The Column Name in your output that starts with Y obviously has change that we want to track. Now in the data that I gave you it was for Address 1. In other scenarios it can be for Address 2, phone # , Cell # etc. One thing that I want to capture here is that in the output where ever column name is starting with 'Y_' then trim 'Y_' dynamically. How can I accomplish that?
I've updated the workflow to strip out the leading Y and space:
I recommend having a look at Regex as this is what you need for parsing data strings like this. If you apply any changes to the RegexOut1 and RegexOut1 columns before the Crosstab tool you can basically strip out anything you want.
You can find out more about regex at the following pages:
https://help.alteryx.com/20223/designer/regex-tool
https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-RegEx/ta-p/37689
I hope this helps,
M.