Fixing issues in file import with Flat File
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi,
For data processing I import data using a Fixed width flat file which when the data in the input file (usually in .txt format) is correct will import it as shown below:
However let's say the input file has some issue (happens sometimes when a vendor sends it with wrong formatting) and there is an additional tab or space before the first character (or subsequent characters) on any row then the input gets skewed for that row alone as shown below (notice the first row and first column only has 5 numbers. First row and 2nd column has 5 numbers instead of 4 and everything is shifted by one character to the right)
If this was one file, I can manually fix it. But I would like to see if there is some way I can import the data from the text file and then make Alteryx fix these issues so that the output is per the 1st image shown above.
Below is the (correct) format in which data usually occurs in the text file.
And below is when I shifted just the 1st row by 1 space to the right. How can I make Alteryx import the data regardless and fix it in the format (as shown in the very first image)?
Each column has a fixed width and the spacing is also important. So I cannot just strip all spacing and then use substring extract (tried it and it didnt work because it was pulling the number as "520101" whereas the correct format is "5201 " (2 spaces after 5201)).
Thanks
- Labels:
- Datasets
- Parse
- Preparation
- Transformation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
There may be a more elegant way to do this, but you could try:
- Reading it in as a non-delimited text file so everything is in one field
- Trim leading/trailing spaces
- Use a regex tool to parse out the fields based on the fixed width definitions (see below screenshot)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Luke.. Thank you! Is there a way I can make Regex ignore the space before the 012200 (I am assuming what if the vendor had introduced an extra space in the middle rather than at the beginning) as shown below (first row). Using regex is there a way I can make it ignore this space before 0122
Without ignoring that extra space before 0122, my output becomes as shown below:
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@dpakapd please provide some sample data and width of each fields
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @dpakapd I would just use a formula tool right after the input that is TRIM([Field_1]). That will get rid of the leading space (if there is one)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@dpakapd missed your point about the extra space in the middle of the fields. That is trickier but should be able to get parsed out if you give some sample data (not screenshots).
Personally, if one of my vendors had agreed to a file format I would push back and get them to correct it :)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@binuacs : Pl see attached a sample data. The width of the fields is as shown in the screenshot below from the flat file.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Luke.. I have attached some sample data..
As of now we are pushing back on vendors to fix the file format. But it becomes a repetitive exercise with multiple vendors each month. That's why I want to try and see if there is someway I can take in the data they give and spit it out in the correct format.
Below is the screenshot from the flat file (not able to upload that).
