Delimited Text File Normalization
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I am trying to normalize a text file into a cleaner excel output file. There are headers and multiple rows here and there I do not need. I am trying to get all the users under the columns fields "ID", "Initials", and "Name". Please see screenshot below:
I usually do clean up in excel but wanted to create a workflow to make this a quicker process.
This is the output I am looking for
ID Initial Name
AU AUI Choi
Solved! Go to Solution.
- Labels:
- Input
- Preparation
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
There are different techniques that can be used depending on the data.
The general steps to follow are:
1. Open the file as a txt file. This will open a "Resolve File Type" window.
2. Select Read as Delimited file and select OK.
3. If the data is in columns separated by spaces, you can define the columns in the next window by clicking on the character positions at the top of the window. Don't worry about unneeded line of characters before or after the data, just split the file based on the column boundaries of the data you care about. If the data is in columns separated by tabs, then you don't need to worry about column boundaries at this point. Take a look at this link for more info on how to split raw text data into columns: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Text-txt-file-Non-Delimited-Flat-File-...
4. Click OK and the you'll get a preview of the data.
5. Use a Sample tool to skip the rows you don't care about. After this is done, you should have the column headers as the top row of data.
6. If you had split the columns in Step 3, then skip this step. If you hadn't split the columns, and the column data are separated by tabs, then set a row ID, use a Text to Columns tool to split the data to rows, then use a crosstab to put the data into separate columns.
7. You should now have the data in columns with the column names in the first row. Use the Dynamic Rename Tool to promote the first row to be the column headers.
8. At this point you can use Select, Cleansing, and/or Autofield to clean up the columns, select the ones you want, and set the data types.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This is the below setting i put while importing the file and the second screenshot shows the output. Not sure if i am doing this right.
When i select none as my delimiter, i get the below report output
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @mustufa2019 ,
your file looks like a report file create by a program running on a mainframe computer. I suppose, it is a fixed width text file, so you should select "Read in as a fixed width text file".
The next step will prompt you to (visually) define fields:
Of course you will have to remove the report header rows (e.g. using Sample tool).
Hope this is helpful.
Best regards
Roland
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @mustufa2019
@TonyA has a good solution for working on text files based on column position. Another technique you can use is to import the entire file with no delimiters and use a technique similar to this to parse out the data
Start with your input tool set to import the file with no delimiters (\0) and data in the 1st row
Then use a Multirow tool to find the rows that contain the information. In this case it's all the rows from the ID header row to the next null row. Use a filter to remove all the other rows, add a record ID and use use this split the "ID INITIALS NAME PRINTER" header row from the other data rows. The data you're interested in is everything from the start of the line to character position just before the word "PRINTER", so find the PrinterPosition in the Header row and append that to data rows. Trim the data rows so they contain everything up to but not including this position. Then use a Regex tool in Parse mode to split this on space(s) to give you
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Roland,
This was really helpful, I would like to know if the same formatting can be done for a CSV or a XLS file coming from a mainframe computer. If Yes, could you please suggest a method ?
Best
Abhishek
