Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Delimited Text File Normalization

mustufa2019
8 - Asteroid

I am trying to normalize a text file into a cleaner excel output file. There are headers and multiple rows here and there I do not need. I am trying to get all the users under the columns fields "ID", "Initials", and "Name". Please see screenshot below:

 

I usually do clean up in excel but wanted to create a workflow to make this a quicker process. 

 

Text File Question Alteryx.PNG

 

This is the output I am looking for 

 

ID    Initial   Name  

AU   AUI     Choi

7 REPLIES 7
TonyA
Alteryx Alumni (Retired)

There are  different techniques that can be used depending on the data.

 

The general steps to follow are:

1. Open the file as a txt file. This will open a "Resolve File Type" window.

2. Select Read as Delimited file and select OK.

3. If the data is in columns separated by spaces, you can define the columns in the next window by clicking on the character positions at the top of the window. Don't worry about unneeded line of characters before or after the data, just split the file based on the column boundaries of the data you care about. If the data is in columns separated by tabs, then you don't need to worry about column boundaries at this point. Take a look at this link for more info on how to split raw text data into columns: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Text-txt-file-Non-Delimited-Flat-File-...

4. Click OK and the you'll get a preview of the data.

5. Use a Sample tool to skip the rows you don't care about. After this is done, you should have the column headers as the top row of data.

6. If you had split the columns in Step 3, then skip this step. If you hadn't split the columns, and the column data are separated by tabs, then set a row ID, use a Text to Columns tool to split the data to rows, then use a crosstab to put the data into separate columns.

7. You should now have the data in columns with the column names in the first row. Use the Dynamic Rename Tool to promote the first row to be the column headers.

8. At this point you can use Select, Cleansing, and/or Autofield to clean up the columns, select the ones you want, and set the data types.

 

mustufa2019
8 - Asteroid

This is the below setting i put while importing the file and the second screenshot shows the output. Not sure if i am doing this right. 

 

mustufa2019_0-1572160010808.png

 

mustufa2019_2-1572160143617.png

 

mustufa2019_1-1572160094688.png

 

When i select none as my delimiter, i get the below report output

 

mustufa2019_0-1572160010808.png

mustufa2019_3-1572160240173.png

RolandSchubert
16 - Nebula
16 - Nebula

Hi @mustufa2019 ,

 

your file looks like a report file create by a program running on a mainframe computer. I suppose, it is a fixed width text file, so you should select "Read in as a fixed width text file".

 

27-10-_2019_13-30-06.png

 

The next step will prompt you to (visually) define fields:

 

27-10-_2019_13-30-21.png

 

Of course you will have to remove the report header rows (e.g. using Sample tool).

 

Hope this is helpful.

 

Best regards

 

Roland

danilang
19 - Altair
19 - Altair

Hi @mustufa2019 

 

@TonyA has a good solution for working on text files based on column position. Another technique you can use is to import the entire file with no delimiters and use a technique similar to this to parse out the data

 

w.png

 

Start with your input tool set to import the file with no delimiters (\0) and data in the 1st row

 

i.png

 

Then use a Multirow tool to find the rows that contain the information.  In this case it's all the rows from the ID header row to the next null row.  Use a filter to remove all the other rows, add a record ID and use use this split the "ID INITIALS NAME PRINTER" header row from the other data rows.  The data you're interested in is everything from the start of the line to character position just before the word "PRINTER", so find the PrinterPosition in the Header row and append that to data rows.  Trim the data rows so they contain everything up to but not including this position.  Then use  a Regex tool in Parse mode to split this on space(s) to give you

 

r.png

mustufa2019
8 - Asteroid

@TonyA 

 

Thank you so much! I actually went back and tried your approach and it worked! 

mustufa2019
8 - Asteroid

@danilang 

 

Thank you so much for the help! I followed @TonyA solution and it worked!

Abhishek_0331
5 - Atom

Hi Roland,

 

This was really helpful, I would like to know if the same formatting can be done for a CSV or a XLS file coming from a mainframe computer. If Yes, could you please suggest a method ?

 

Best 

Abhishek 

Labels