I have this text file which have more than 7,000 rows. How can I parse this to match the below desire output?
Solved! Go to Solution.
RegEx will get you there: there appears to be sufficient structure in the Text Input that a proper Regular Expression in the RegEx Tool will be able to Parse the data.
Expression: (\d+)(.*?)\s{2,}(.*?)\s{2,}(\w{2})(.*?)\s{2,}(\w)\s(\w)(.*$)
One of the key items is splitting sections on the \s{2,}, which looks for multiple connected spaces.
Hope this helps and Happy Solving!
CoG, thank you your help! The RegEx worked for the most part but there was roughly 900 rows where it was [Null]. At first I thought maybe the data had special characters, so I did a data cleansing to remove punctuation and leading/trailing whitespace. That didn't help. Not sure why the RegEx parse didn't work on all.
Not sure how to handle the first two cases, since the Short Code and Full Name are not separated by multiple spaces.
The last example ends in YSY20181214, but all other examples involve a space. I'm glad the expression worked for most of the values, but it seems like there are some inconsistencies (from what I had initially observed). Without well-defined rules, RegEx does not work. I clearly do not know all of the rules, so if you have more information about what the structure of the string is, that will be critical in refining the RegEx.
@dorchin @CoG for these case, mostly i will goto open in notepad first. to check the monospace view (where each letter is same size, where ease to see the pattern).
if it structure like this. mean we can direct split by num of char.
(this why i raise idea to have a monospace in the result window.)
so here the sample workflow to split by char, and use macro that i built before.
You may also use .flat file to read a data file which contain fixed-length fields.
help.alteryx.com : Flat (ascii) Files
You can define the metadata as described in the above help page with the extension as ".flat".
For your sample data, it would look like below;
<flatfile version="1">
<file
path="sample2025.txt"
eoltype="crlf"
trimWhiteSpace="t"
allowShortLines="t"
/>
<fields>
<field name="ABA" type="String" length="9" />
<field name="Bank Short Name" type="String" length="18" />
<field name="Bank Full Name" type="String" length="36" />
<field name="State" type="String" length="2" />
<field name="Town" type="String" length="25" />
<field name="Random1" type="String" length="2" />
<field name="Random2" type="String" length="1" />
<field name="Date" type="String" length="8" />
</fields>
</flatfile>
Then you can read the .flat file with Input Data Tool.
You can download and unzip the attached zip file and see how it works.
If you save the data file and the .flat file in separate folders, you need to set the path for the text data according to the actual location of the data file.
If you want to change the data type of any field, you can change "type" in the .flat file accordingly.
I hope this helps.
Check this one out, it is independent of length, so in future if your items length changes it will still work as expected.
What I am doing here is identifying converting all two consecutive spaces with "|".
Then I generate a list of possible combinations of "|" using generate rows.
Then I replace all the combinations of "|" with single "|".
Added a recordID, now I can split the whole data into individual rows.
Once done, I use Tile to group on recordID and assign a value to each unique record.
Cross tab the data group by recordID and header as tile sequence and values as Field_1.
Clean it, sort it and you have your data ready for you to work with.
Your workflow worked perfectly. Silly question 1: where can I find the "Char Columnizer" tool? I tried typing in the search bar and it didn't come up. Silly question 2: how do you know how many characters to split into column (9,18,36 etc). Thanks again!!
It is Macro, where is built by me. not a standard tool. It should be save along workflow.
To use it, you need to:
1. move the macro to same folder of your workflow.
2. right click -> insert -> macro... to browse the macro
3. then can move as normal tool.
To share the workflow that use macro, you need to export it and make sure the macro is included:
To identify the number of char for each column, either count manually or using notepad. it show the # of character in bottom left.
User | Count |
---|---|
107 | |
85 | |
76 | |
54 | |
40 |